Async Python: The Different Forms of Concurrency

Posted by 汉子你别跑丫 on 2016-10-6 21:41:39
With the advent of Python 3, we are hearing a lot of buzz about "async" and "concurrency", so one might simply assume that Python recently introduced these concepts/capabilities. But that would be quite far from the truth. We have had async and concurrent operations for quite some time now. Also, many beginners may think that asyncio is the only/best way to do async/concurrent operations. In this post we shall explore the different ways we can achieve concurrency and their benefits and drawbacks.
  Defining The Terms

  Before we dive into the technical aspects, it is essential to have some basic understanding of the terms frequently used in this context.
  Sync vs Async

  In synchronous operations, tasks are executed in sync, one after another. In asynchronous operations, tasks may start and complete independently of each other. One async task may start and continue running while the execution moves on to a new task. Async tasks don't block operations (make the execution wait for their completion) and usually run in the background.
  For example, you have to call a travel agency to book your next vacation, and you need to send an email to your boss before you go on the tour. In synchronous fashion, you would first call the travel agency, and if they put you on hold for a moment, you keep waiting and waiting. Once that's done, you start writing the email to your boss. Here you complete one task after another. But if you are clever, then while you are waiting on hold, you could start writing up the email; when they talk to you, you pause writing the email, talk to them, and then resume writing. You could also ask a friend to make the call while you finish that email. This is asynchronicity. Tasks don't block one another.
  Concurrency and Parallelism

  Concurrency implies that two tasks make progress together. In our previous example, when we considered the async example, we were making progress on both the call with the travel agent and writing the email. This is concurrency.
  When we talked about taking help from a friend with the call, in that case both tasks would be running in parallel.
  Parallelism is in fact a form of concurrency. But parallelism is hardware dependent. For example if there’s only one core in the CPU, two operations can’t really run in parallel. They just share time slices from the same core. This is concurrency but not parallelism. But when we have multiple cores, we can actually run two or more operations (depending on the number of cores) in parallel.
  Quick Recap

  So this is what we have realized so far:
  
       
  • Sync: Blocking operations.   
  • Async: Non-blocking operations.   
  • Concurrency: Making progress together.   
  • Parallelism: Making progress in parallel.  

  Parallelism implies Concurrency. But Concurrency doesn't always mean Parallelism.

  Threads & Processes

  Python has had threads for a very long time. Threads allow us to run our operations concurrently. But there was/is a problem with the Global Interpreter Lock (GIL), because of which threading could not provide true parallelism. However, with multiprocessing, it is now possible to leverage multiple cores with Python.
  Threads

  Let's see a quick example. In the following code, the worker function will run on multiple threads, asynchronously and concurrently.
import threading
import time
import random


def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))


for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    t.start()

print("All Threads are queued, let's see when they finish!")
Here’s a sample output from a run on my machine:
$ python thread_test.py
All Threads are queued, let's see when they finish!
I am Worker 1, I slept for 1 seconds
I am Worker 3, I slept for 4 seconds
I am Worker 4, I slept for 5 seconds
I am Worker 2, I slept for 7 seconds
I am Worker 0, I slept for 9 seconds
So you can see that we start 5 threads and they make progress together. When we start the threads (and thus execute the worker function), the operation does not wait for the threads to complete before moving on to the next print statement. So this is an async operation.
  In our example, we passed a function to the Thread constructor. But if we wanted, we could also subclass Thread and implement the code as a method (in a more OOP way).
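Here is a minimal sketch of that subclassing approach (the SleepyWorker name and the shorter sleep range are my own, not from the original example):

```python
import threading
import time
import random


class SleepyWorker(threading.Thread):
    """The OOP way: subclass Thread and put the work in run()."""

    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        # start() arranges for run() to execute in the new thread
        sleep = random.randrange(1, 3)
        time.sleep(sleep)
        print("I am Worker {}, I slept for {} seconds".format(self.number, sleep))


workers = [SleepyWorker(i) for i in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # wait for every worker to finish before moving on
```

Calling join() at the end is optional; without it, the main thread simply continues (as in the example above) while the workers finish in the background.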
  Further Reading:

  To know about threads in detail, you can follow these resources:
  
       
  •       https://pymotw.com/3/threading/index.html  
  Global Interpreter Lock (GIL)

  The Global Interpreter Lock, aka the GIL, was introduced to make CPython's memory handling easier and to allow better integration with C (for example, extensions). The GIL is a locking mechanism that ensures the Python interpreter runs only one thread at a time; that is, only one thread can execute Python bytecode at any given time. The GIL makes sure that multiple threads DO NOT run in parallel.
  Quick facts about the GIL:
  
       
  • One thread can run at a time.   
  • The Python interpreter switches between threads to allow concurrency.   
  • The GIL is only applicable to CPython (the de facto implementation). Other implementations like Jython and IronPython don't have a GIL.   
  • The GIL makes single-threaded programs fast.   
  • For I/O bound operations, the GIL usually doesn't harm much.   
  • The GIL makes it easy to integrate non thread-safe C libraries; thanks to the GIL, we have many high performance extensions/modules written in C.   
  • For CPU bound tasks, the interpreter checks every N ticks and switches threads, so one thread does not block the others.
  Many people see the GIL as a weakness. I see it as a blessing, since it has made libraries like NumPy and SciPy possible, which have given Python a unique position in the scientific communities.
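To see the GIL's effect on CPU bound code, here is a small experiment (the countdown function and the loop count are illustrative; exact timings depend on your machine). On CPython, the two-thread version is typically no faster than the sequential one, and often a bit slower because of the switching overhead:

```python
import threading
import time


def countdown(n):
    # A pure Python, CPU bound loop: threads running this
    # compete for the GIL instead of running in parallel.
    while n > 0:
        n -= 1


N = 5_000_000

# Sequential: run the loop twice in the main thread.
start = time.time()
countdown(N)
countdown(N)
sequential = time.time() - start

# Two threads: only one can execute bytecode at a time,
# so this is concurrency without parallelism.
t1 = threading.Thread(target=countdown, args=(N,))
t2 = threading.Thread(target=countdown, args=(N,))
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start

print("sequential: {:.2f}s  threaded: {:.2f}s".format(sequential, threaded))
```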
  Further Reading:

  These resources can help dive deeper into the GIL:
  
       
  •       http://www.dabeaz.com/python/UnderstandingGIL.pdf  
  Processes

  To get parallelism, Python introduced the multiprocessing module, which provides APIs that will feel very familiar if you have used threading before.
  In fact, we will just go and change our previous example. Here's the modified version that uses Process instead of Thread.
import multiprocessing
import time
import random


def worker(number):
    sleep = random.randrange(1, 10)
    time.sleep(sleep)
    print("I am Worker {}, I slept for {} seconds".format(number, sleep))


for i in range(5):
    t = multiprocessing.Process(target=worker, args=(i,))
    t.start()

print("All Processes are queued, let's see when they finish!")
So what's changed? I just imported the multiprocessing module instead of threading, and then used Process instead of Thread. That's it, really! Now instead of multithreading, we are using multiple processes which run on different cores of your CPU (assuming you have multiple cores).
  With the Pool class, we can also distribute the execution of one function across multiple processes for different input values. If we take the example from the official docs:
from multiprocessing import Pool


def f(x):
    return x * x


if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
Here, instead of iterating over the list of values and calling f on them one by one, we are actually running the function in different processes. One process executes f(1), another runs f(2), and another runs f(3). Finally, the results are aggregated into a list again. This allows us to break down heavy computations into smaller parts and run them in parallel for faster calculation.
  Further Reading:

  
       
  •       https://pymotw.com/3/multiprocessing/index.html  
  The concurrent.futures module

  The concurrent.futures module packs some really great stuff for writing async code easily. My favorites are the ThreadPoolExecutor and the ProcessPoolExecutor. These executors maintain a pool of threads or processes. We submit our tasks to the pool, and it runs them in an available thread/process. A Future object is returned, which we can use to query and get the result when the task has completed.
  Here's an example of ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def return_after_5_secs(message):
    sleep(5)
    return message


pool = ThreadPoolExecutor(3)
future = pool.submit(return_after_5_secs, "hello")
print(future.done())
sleep(5)
print(future.done())
print(future.result())
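ProcessPoolExecutor has the same submit/result interface, but runs the tasks in separate processes, so it can use multiple cores for CPU bound work. A minimal sketch (the square function is my own illustration):

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    return x * x


if __name__ == '__main__':
    # Same API as ThreadPoolExecutor, but each task runs in a
    # separate process. The __main__ guard matters here: on
    # platforms that spawn new processes (e.g. Windows), the
    # module is re-imported by the workers.
    with ProcessPoolExecutor(max_workers=3) as pool:
        future = pool.submit(square, 7)
        print(future.result())  # 49
```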
I have a blog post on the concurrent.futures module here: http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html which might be helpful for exploring the module deeper.
  Further Reading:

  
       
  •       https://pymotw.com/3/concurrent.futures/  
  Asyncio - Why, What and How?

  You probably have the question many people in the Python community have: what does asyncio bring to the table that's new? Why did we need one more way to do async I/O? Did we not have threads and processes already? Let's see!
  Why do we need asyncio?

  Processes are costly to spawn, so threads are largely chosen for I/O. We know that I/O depends on external stuff: slow disks or nasty network lags make I/O often unpredictable. Now, let's assume we are using threads for I/O bound operations. Three threads are doing different I/O tasks. The interpreter needs to switch between the concurrent threads and give each of them some time in turn. Let's call the threads T1, T2, and T3. The three threads have started their I/O operations. T3 completes first. T2 and T1 are still waiting for I/O. The Python interpreter switches to T1, but it's still waiting. Fine, so it moves to T2; it's still waiting, and then it moves to T3, which is ready and executes the code. Do you see the problem here?
  T3 was ready, but the interpreter switched between T2 and T1 first; that incurred switching costs which we could have avoided if the interpreter had moved to T3 first, right?
  What is asyncio?

  Asyncio provides us with an event loop, along with other good stuff. The event loop tracks different I/O events, switches to tasks that are ready, and pauses the ones that are waiting on I/O. Thus we don't waste time on tasks that aren't ready to run right now.
  The idea is very simple. There's an event loop, and we have functions that run asynchronous I/O operations. We give our functions to the event loop and ask it to run them for us. The event loop gives us back a Future object; it's like a promise that we will get something back in the future. We hold on to the promise, check from time to time whether it has a value (when we feel impatient), and finally, when the future has a value, we use it in other operations.
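That flow can be sketched in a few lines (the compute coroutine and its 42 return value are made up for illustration; asyncio.run is Python 3.7+, on older versions use loop.run_until_complete):

```python
import asyncio


async def compute():
    await asyncio.sleep(0.1)  # stand-in for a slow I/O operation
    return 42


async def main():
    # ensure_future schedules the coroutine and hands back a
    # Future (Task) -- the "promise" we can query later.
    future = asyncio.ensure_future(compute())
    print(future.done())      # False: the task hasn't finished yet
    result = await future     # wait for the promise to resolve
    print(result)             # 42


asyncio.run(main())
```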
  Asyncio uses generators and coroutines to pause and resume tasks. You can read these posts for more details:
  
       
  •       http://masnun.com/2015/11/20/python-asyncio-future-task-and-the-event-loop.html   
  •       http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html  
  How do we use asyncio?

  Before we begin, let's see some example code:
import asyncio
import datetime
import random


async def my_sleep_func():
    await asyncio.sleep(random.randint(0, 5))


async def display_date(num, loop):
    end_time = loop.time() + 50.0
    while True:
        print("Loop: {} Time: {}".format(num, datetime.datetime.now()))
        if (loop.time() + 1.0) >= end_time:
            break
        await my_sleep_func()


loop = asyncio.get_event_loop()
asyncio.ensure_future(display_date(1, loop))
asyncio.ensure_future(display_date(2, loop))
loop.run_forever()
Please note that the async/await syntax is Python 3.5+ only. If we walk through the code:
  
       
  • We have an async function display_date, which takes a number (as an identifier) and the event loop as parameters.   
  • The function has an infinite loop that breaks after 50 seconds. But during this 50-second period, it repeatedly prints out the time and takes a nap. await can wait on other async functions (coroutines) to complete.   
  • We pass the function to the event loop (using the ensure_future function).   
  • We start running the event loop.  
  Whenever an await call is made, asyncio understands that the function will probably need some time. So it pauses the execution, starts monitoring any I/O events related to it, and allows other tasks to run. When asyncio notices that the paused function's I/O is ready, it resumes the function.
  Making the Right Choice

  We have walked through the most popular forms of concurrency. But the question remains: when should we choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudo code:
if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")

       
  • CPU Bound => Multi Processing   
  • I/O Bound, Fast I/O => Multi Threading   
  • I/O Bound, Slow I/O => Asyncio  
