asyncio internals
Saúl Ibarra Corretgé
@saghul
PyGrunn 2014
Friday, May 9, 14
Intro
New asynchronous I/O framework for Python
PEP-3156
Python >= 3.3 (backport available: Trollius)
Uses new language features: yield from
Designed to interoperate with other frameworks
You went to Rodrigo’s talk earlier today, right?
Architecture
Event loop
Coroutines, Futures and Tasks
(I’ll cover these)
Transports, Protocols and Streams
(Homework!)
Event Loop
[Diagram: Calculate poll time → Poll → Run callbacks, repeated in a loop]
There is no abstraction for an “event”
It runs callbacks which are put in a queue
Callbacks can be scheduled in response to I/O, timers or an explicit
user request
The event loop acts as an implicit scheduler
Simplified
def call_soon(self, callback, *args):
    handle = events.Handle(callback, args, self)
    self._ready.append(handle)
    return handle
events.Handle is a “callback wrapper”
The ready queue is a deque
Once per loop iteration, all handles in the ready queue
are executed
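A minimal sketch of a handle plus a ready queue, assuming a toy loop; `Handle`, `MiniLoop` and `run_once` here are hypothetical stand-ins that only imitate the shape of asyncio's private structures, not its actual code:

```python
from collections import deque


class Handle:
    """Hypothetical callback wrapper: a callback, its arguments and a cancelled flag."""

    def __init__(self, callback, args):
        self._callback = callback
        self._args = args
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def _run(self):
        self._callback(*self._args)


class MiniLoop:
    def __init__(self):
        self._ready = deque()  # FIFO of handles waiting to run

    def call_soon(self, callback, *args):
        handle = Handle(callback, args)
        self._ready.append(handle)
        return handle  # the caller may cancel it before it runs

    def run_once(self):
        # Run the handles that are queued when this iteration starts.
        for _ in range(len(self._ready)):
            handle = self._ready.popleft()
            if not handle._cancelled:
                handle._run()


loop = MiniLoop()
log = []
loop.call_soon(log.append, "first")
loop.call_soon(log.append, "second").cancel()
loop.call_soon(log.append, "third")
loop.run_once()
print(log)  # ['first', 'third']
```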
def call_later(self, delay, callback, *args):
    return self.call_at(self.time() + delay, callback, *args)

def call_at(self, when, callback, *args):
    timer = events.TimerHandle(when, callback, args, self)
    heapq.heappush(self._scheduled, timer)
    return timer
Simplified
Timers are stored in a heap (loop._scheduled)
TimerHandle subclasses Handle, but stores the time
when it’s due and has comparison methods for keeping
the heap sorted by due time
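A toy timer heap along the same lines, assuming a fake clock so the ordering is deterministic (`TimerLoop` and `advance` are hypothetical, not asyncio API). Only `__lt__` is defined because `heapq` compares items with `<`:

```python
import heapq


class TimerHandle:
    """Hypothetical timer: a callback plus the time it is due."""

    def __init__(self, when, callback, args):
        self._when = when
        self._callback = callback
        self._args = args

    def __lt__(self, other):
        # Keeps the earliest deadline at index 0 of the heap.
        return self._when < other._when


class TimerLoop:
    def __init__(self):
        self._now = 0.0       # fake clock, advanced by hand
        self._scheduled = []  # heap of TimerHandle

    def time(self):
        return self._now

    def call_at(self, when, callback, *args):
        timer = TimerHandle(when, callback, args)
        heapq.heappush(self._scheduled, timer)
        return timer

    def call_later(self, delay, callback, *args):
        return self.call_at(self.time() + delay, callback, *args)

    def advance(self, seconds):
        # Move the clock forward and run every timer that became due, earliest first.
        self._now += seconds
        while self._scheduled and self._scheduled[0]._when <= self._now:
            timer = heapq.heappop(self._scheduled)
            timer._callback(*timer._args)


loop = TimerLoop()
fired = []
loop.call_later(3, fired.append, "c")
loop.call_later(1, fired.append, "a")
loop.call_later(2, fired.append, "b")
loop.advance(2.5)
print(fired)  # ['a', 'b']
loop.advance(1)
print(fired)  # ['a', 'b', 'c']
```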
ntodo = len(self._ready)
for i in range(ntodo):
    handle = self._ready.popleft()
    if not handle._cancelled:
        handle._run()
handle = None  # break cycles
This is the single place where the ready queue is
iterated over
A thread-safe iteration method is used, since other
threads could modify the ready queue (see
call_soon_threadsafe)
If any handles are scheduled while the ready queue is
being processed, they will be run on the next loop
iteration
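A small sketch of the `ntodo` snapshot, assuming a plain deque of callables instead of real handles: a callback scheduled while the queue is being processed waits for the next iteration. Popping a fixed count also avoids the `RuntimeError` that iterating directly over a deque raises when it is mutated mid-iteration.

```python
from collections import deque

ready = deque()
log = []


def run_once(iteration):
    # Snapshot the length first: anything appended while we run waits for the next pass.
    ntodo = len(ready)
    for _ in range(ntodo):
        callback = ready.popleft()
        callback(iteration)


def ping(iteration):
    log.append(("ping", iteration))
    ready.append(pong)  # scheduled *during* processing


def pong(iteration):
    log.append(("pong", iteration))


ready.append(ping)
run_once(1)
run_once(2)
print(log)  # [('ping', 1), ('pong', 2)]
```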
I/O handling
Different polling mechanisms on Unix: select, poll, epoll,
kqueue, devpoll
Windows is a completely different beast
Different paradigms: readiness vs completion
APIs are provided for both
I/O handling APIs
Readiness style
add_reader/add_writer
remove_reader/remove_writer
Completion style
sock_recv/sock_sendall
sock_connect/sock_accept
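A sketch contrasting the two styles with the current `asyncio` API. It assumes a `SelectorEventLoop` (the proactor loop that is the default on Windows does not support `add_reader`) and a socketpair standing in for a real connection; modern code uses `async def`/`await` where the 2014 API used `yield from`:

```python
import asyncio
import socket


def readiness_and_completion():
    loop = asyncio.SelectorEventLoop()  # readiness-based loop, available on every platform
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    received = []

    # Readiness style: the loop only says "b is readable"; we do the recv() ourselves.
    def on_readable():
        received.append(b.recv(100))
        loop.remove_reader(b.fileno())
        loop.stop()

    loop.add_reader(b.fileno(), on_readable)
    a.send(b"ready")
    loop.run_forever()

    # Completion style: the loop performs the I/O and hands back the result.
    async def completion_style():
        await loop.sock_sendall(a, b"done")
        return await loop.sock_recv(b, 100)

    received.append(loop.run_until_complete(completion_style()))
    loop.close()
    a.close()
    b.close()
    return received


result = readiness_and_completion()
print(result)  # [b'ready', b'done']
```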
import selectors
New module in Python 3.4
Consistent interface to Unix polling mechanisms
On Windows it uses select()
Default* limit of 64 file descriptors - WEBSCALE!
IOCP is the way to go, but has a different API
Caveat emptor: doesn’t work for file I/O
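A minimal use of `selectors` on its own, assuming a socketpair as the watched file descriptor. Whatever is passed as `data` to `register` comes back on the `SelectorKey`, which is where asyncio keeps its handles:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # best available: e.g. epoll on Linux, kqueue on macOS/BSD
rsock, wsock = socket.socketpair()
rsock.setblocking(False)

# `data` is stored on the SelectorKey and handed back by select().
sel.register(rsock, selectors.EVENT_READ, data="reader for rsock")

before = sel.select(timeout=0)  # nothing readable yet: returns [] immediately
wsock.send(b"hi")
after = [
    (key.data, key.fileobj.recv(10))
    for key, mask in sel.select(timeout=1)
    if mask & selectors.EVENT_READ
]
print(before, after)  # [] [('reader for rsock', b'hi')]

sel.unregister(rsock)
sel.close()
rsock.close()
wsock.close()
```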
Simplified
def add_reader(self, fd, callback, *args):
    handle = events.Handle(callback, args, self)
    try:
        key = self._selector.get_key(fd)
    except KeyError:
        self._selector.register(fd, selectors.EVENT_READ, (handle, None))
    else:
        mask, (reader, writer) = key.events, key.data
        self._selector.modify(fd, mask | selectors.EVENT_READ, (handle, writer))
        if reader is not None:
            reader.cancel()
The selector key stores the fd, the events and arbitrary
user-provided data
In this case the data is the (reader, writer) handle
tuple
Only one reader and writer per fd are allowed
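A sketch of that bookkeeping on a bare selector, assuming a hypothetical `ReaderWriterRegistry` class. Unlike asyncio, which cancels the replaced reader handle, it simply returns the callback it pushed out:

```python
import selectors
import socket


class ReaderWriterRegistry:
    """Hypothetical helper: key.data holds a (reader, writer) tuple per fd."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def add_reader(self, fd, callback):
        """Make callback the only reader of fd; return the reader it replaced, if any."""
        try:
            key = self.selector.get_key(fd)
        except KeyError:
            self.selector.register(fd, selectors.EVENT_READ, (callback, None))
            return None
        reader, writer = key.data
        self.selector.modify(fd, key.events | selectors.EVENT_READ, (callback, writer))
        return reader

    def add_writer(self, fd, callback):
        """Make callback the only writer of fd; return the writer it replaced, if any."""
        try:
            key = self.selector.get_key(fd)
        except KeyError:
            self.selector.register(fd, selectors.EVENT_WRITE, (None, callback))
            return None
        reader, writer = key.data
        self.selector.modify(fd, key.events | selectors.EVENT_WRITE, (reader, callback))
        return writer


def first():
    pass


def second():
    pass


def on_write():
    pass


a, b = socket.socketpair()
reg = ReaderWriterRegistry()
reg.add_reader(a.fileno(), first)
reg.add_writer(a.fileno(), on_write)
replaced = reg.add_reader(a.fileno(), second)  # `first` is pushed out
key = reg.selector.get_key(a.fileno())
print(replaced is first, key.data == (second, on_write))  # True True
reg.selector.close()
a.close()
b.close()
```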
Polling for I/O
1. Calculate timeout
2. Block for I/O
3. Process I/O events: schedule callbacks
4. Process timers: schedule callbacks
5. Run pending callbacks
timeout = None
if self._ready:
    timeout = 0
elif self._scheduled:
    # Compute the desired timeout.
    when = self._scheduled[0]._when
    deadline = max(0, when - self.time())
    if timeout is None:
        timeout = deadline
    else:
        timeout = min(timeout, deadline)

event_list = self._selector.select(timeout)
self._process_events(event_list)

end_time = self.time()
while self._scheduled:
    handle = self._scheduled[0]
    if handle._when >= end_time:
        break
    handle = heapq.heappop(self._scheduled)
    self._ready.append(handle)

# run all handles in the ready queue...
Simplified
If timeout is None, the poll blocks indefinitely until an I/O event arrives
_process_events puts the read / write handles in the
ready queue, if applicable
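Putting the five steps together, here is a toy loop under stated assumptions: `TinyLoop` is hypothetical, stores timers as `(when, seq, callback, args)` tuples rather than `TimerHandle` objects, and uses a socketpair instead of a real connection. Unlike asyncio it has no self-pipe, so a `run_once` with nothing registered and nothing scheduled would block forever; the usage below avoids that case:

```python
import heapq
import itertools
import selectors
import socket
import time
from collections import deque


class TinyLoop:
    """Hypothetical loop: a ready deque, a timer heap and a selector."""

    def __init__(self):
        self._ready = deque()
        self._scheduled = []           # heap of (when, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared
        self._selector = selectors.DefaultSelector()

    def time(self):
        return time.monotonic()

    def call_soon(self, callback, *args):
        self._ready.append((callback, args))

    def call_later(self, delay, callback, *args):
        entry = (self.time() + delay, next(self._seq), callback, args)
        heapq.heappush(self._scheduled, entry)

    def add_reader(self, fileobj, callback, *args):
        self._selector.register(fileobj, selectors.EVENT_READ, (callback, args))

    def remove_reader(self, fileobj):
        self._selector.unregister(fileobj)

    def close(self):
        self._selector.close()

    def run_once(self):
        # 1. Calculate timeout: 0 if work is ready, None (block) if nothing is scheduled.
        timeout = None
        if self._ready:
            timeout = 0
        elif self._scheduled:
            timeout = max(0, self._scheduled[0][0] - self.time())
        # 2. Block for I/O.
        events = self._selector.select(timeout)
        # 3. Process I/O events: queue the reader callbacks.
        for key, _mask in events:
            self._ready.append(key.data)
        # 4. Process timers: move every timer that is due to the ready queue.
        end_time = self.time()
        while self._scheduled and self._scheduled[0][0] < end_time:
            _when, _seq, callback, args = heapq.heappop(self._scheduled)
            self._ready.append((callback, args))
        # 5. Run what is ready now; anything added meanwhile waits for the next pass.
        for _ in range(len(self._ready)):
            callback, args = self._ready.popleft()
            callback(*args)


loop = TinyLoop()
a, b = socket.socketpair()
log = []


def on_timer():
    log.append("timer")
    a.send(b"ping")


def on_readable():
    log.append(b.recv(16))
    loop.remove_reader(b)


loop.add_reader(b, on_readable)
loop.call_later(0.01, on_timer)
for _ in range(100):  # safety cap on iterations
    if len(log) == 2:
        break
    loop.run_once()
print(log)  # ['timer', b'ping']
loop.close()
a.close()
b.close()
```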
def call_soon_threadsafe(self, callback, *args):
    handle = self._call_soon(callback, args)
    self._write_to_self()
    return handle
Simplified
The event loop registers the read end of a socketpair
with the selector
When _write_to_self is called, a byte is written to the other end and the
loop is “woken up” from the select/poll/epoll_wait/kevent syscall
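The wakeup trick in isolation, as a sketch that assumes a `threading.Timer` plays the other thread and a bare selector plays the loop (`wait_for_wakeup` is a hypothetical name):

```python
import selectors
import socket
import threading
import time


def wait_for_wakeup(delay):
    selector = selectors.DefaultSelector()
    rsock, wsock = socket.socketpair()  # the loop's private "self-pipe"
    rsock.setblocking(False)
    selector.register(rsock, selectors.EVENT_READ)

    def write_to_self():
        # What another thread does after appending a callback to the ready queue.
        wsock.send(b"\0")

    waker = threading.Timer(delay, write_to_self)
    waker.start()
    start = time.monotonic()
    events = selector.select(timeout=None)  # would block forever without the byte
    elapsed = time.monotonic() - start
    rsock.recv(4096)  # drain, so the next select() blocks again
    waker.join()
    selector.close()
    rsock.close()
    wsock.close()
    return len(events), elapsed


count, elapsed = wait_for_wakeup(0.05)
print(count)  # 1
```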
Coroutines, Futures & Tasks
Coroutines
Generator functions that can also receive values
Use the @asyncio.coroutine decorator
  Does extra checks in debug mode
  Serves as documentation
Chain them with yield from
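Plain generators already show both properties, without asyncio or its decorator: values go in with `send()`, and `yield from` forwards them to a sub-generator and evaluates to its return value (`averager` and `report` are made-up examples):

```python
def averager():
    """Receives numbers via send(); a None ends it and returns their average."""
    total = 0.0
    count = 0
    while True:
        value = yield  # execution pauses here until send()
        if value is None:
            return total / count  # becomes StopIteration(total / count)
        total += value
        count += 1


def report():
    # yield from forwards send() calls into averager() and evaluates
    # to averager()'s return value once it finishes.
    average = yield from averager()
    return f"average={average}"


gen = report()
next(gen)  # prime: run up to the first yield inside averager()
for number in (10, 20, 30):
    gen.send(number)
try:
    gen.send(None)
except StopIteration as stop:
    final = stop.value
print(final)  # average=20.0
```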
Futures
Not actually PEP-3148 (concurrent.futures)
API almost identical
Represent a value which is not there yet
yield from can be used to wait for it!
asyncio.wrap_future can be used to wrap a PEP-3148
Future into one of these
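A sketch of `asyncio.wrap_future` with current `async`/`await` syntax, assuming a thread pool produces the PEP-3148 future (`blocking_square` is a made-up function):

```python
import asyncio
import concurrent.futures


def blocking_square(x):
    return x * x


async def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        cf_future = pool.submit(blocking_square, 7)  # a PEP-3148 future
        aio_future = asyncio.wrap_future(cf_future)  # an asyncio future tracking it
        return await aio_future  # `yield from aio_future` in the 2014 API


result = asyncio.run(main())
print(result)  # 49
```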
f = Future()
Usually a future will be the result of a function
f.set_result / f.set_exception
Someone will set the result eventually
yield from f
Wait until the result arrives
add_done_callback / remove_done_callback
Callback based interface
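The same API on a current asyncio Future, as a sketch: `loop.call_later` plays the “someone” who sets the result, and `await` stands in for `yield from`:

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()  # no result yet
    seen = []

    def record(f):
        seen.append(f.result())

    def never(f):
        seen.append("never")

    fut.add_done_callback(record)
    fut.add_done_callback(never)
    removed = fut.remove_done_callback(never)  # returns how many were removed

    loop.call_later(0.01, fut.set_result, "hello")  # someone sets it eventually
    value = await fut  # suspend until the result arrives
    return value, seen, removed


outcome = asyncio.run(main())
print(outcome)  # ('hello', ['hello'], 1)
```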
def set_result(self, result):
    if self._state != _PENDING:
        raise InvalidStateError('{}: {!r}'.format(self._state, self))
    self._result = result
    self._state = _FINISHED
    self._schedule_callbacks()

def _schedule_callbacks(self):
    callbacks = self._callbacks[:]
    if not callbacks:
        return
    self._callbacks[:] = []
    for callback in callbacks:
        self._loop.call_soon(callback, self)
After the result or exception is set, all callbacks added
with Future.add_done_callback are called
Note how callbacks are scheduled in the event loop
using call_soon
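A quick way to observe this with current asyncio, as a sketch: right after `set_result` the callback has not run yet; it runs once the loop gets to its ready queue:

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    calls = []
    fut.add_done_callback(lambda f: calls.append(f.result()))

    fut.set_result(42)
    before = list(calls)  # set_result() only *scheduled* the callback
    await asyncio.sleep(0)  # give the loop one iteration to run its ready queue
    return before, calls


outcome = asyncio.run(main())
print(outcome)  # ([], [42])
```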
Simplified
def sock_connect(self, sock, address):
    fut = futures.Future(loop=self)
    self._sock_connect(fut, False, sock, address)
    return fut

def _sock_connect(self, fut, registered, sock, address):
    fd = sock.fileno()
    if registered:
        # Called back because fd became writable: stop watching it.
        self.remove_writer(fd)
    if fut.cancelled():
        return
    try:
        if not registered:
            # First attempt: start a non-blocking connect.
            sock.connect(address)
        else:
            # The connect finished; SO_ERROR says whether it succeeded.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, 'Connect call failed %s' % (address,))
    except (BlockingIOError, InterruptedError):
        # Still in progress: try again once fd is writable.
        self.add_writer(fd, self._sock_connect, fut, True, sock, address)
    except Exception as exc:
        fut.set_exception(exc)
    else:
        fut.set_result(None)
Not a coroutine, but we can wait on it using yield from
because it returns a Future
The Uncallback Pattern (TM)
Hey, look at those nice exceptions: BlockingIOError,
InterruptedError
Much nicer than checking if errno is
EWOULDBLOCK or EINTR
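The same pattern applied to reading, as a sketch: `sock_recv_future` is a hypothetical helper (not asyncio API) that turns readiness callbacks into a Future, following the structure of `_sock_connect`. It assumes a `SelectorEventLoop`, since it relies on `add_reader`:

```python
import asyncio
import socket


def sock_recv_future(loop, sock, nbytes):
    """Hypothetical helper: a completion-style recv built from add_reader callbacks."""
    fut = loop.create_future()

    def attempt(registered):
        fd = sock.fileno()
        if registered:
            loop.remove_reader(fd)
        if fut.cancelled():
            return
        try:
            data = sock.recv(nbytes)
        except (BlockingIOError, InterruptedError):
            # Not ready yet: ask the loop to call attempt() again once fd is readable.
            loop.add_reader(fd, attempt, True)
        except Exception as exc:
            fut.set_exception(exc)
        else:
            fut.set_result(data)

    attempt(False)
    return fut  # callers only see a Future; the callbacks stay hidden


loop = asyncio.SelectorEventLoop()
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
fut = sock_recv_future(loop, b, 100)  # nothing to read yet, so a reader is registered
loop.call_later(0.01, a.send, b"hi")
data = loop.run_until_complete(fut)
print(data)  # b'hi'
loop.close()
a.close()
b.close()
```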
def run_until_complete(self, future):
    future = tasks.async(future, loop=self)
    future.add_done_callback(_raise_stop_error)
    self.run_forever()
    future.remove_done_callback(_raise_stop_error)
    if not future.done():
        raise RuntimeError('Event loop stopped before Future completed.')
    return future.result()
Loop.run_forever will run the loop until Loop.stop is
called
_raise_stop_error is an implementation detail: it raises
an exception that bubbles up and makes run_forever
return
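A sketch of the same logic using public API only. It assumes `loop.stop()` in a done callback in place of the private `_raise_stop_error`, and `asyncio.ensure_future`, the later name for `asyncio.async`:

```python
import asyncio


def run_until_complete(loop, future):
    """Sketch of the slide's logic with loop.stop() instead of _raise_stop_error."""
    future = asyncio.ensure_future(future, loop=loop)  # wraps a coroutine in a Task

    def stop_loop(_fut):
        loop.stop()  # run_forever() returns once the current iteration finishes

    future.add_done_callback(stop_loop)
    loop.run_forever()
    future.remove_done_callback(stop_loop)
    if not future.done():
        raise RuntimeError("Event loop stopped before Future completed.")
    return future.result()


async def answer():
    await asyncio.sleep(0.01)
    return 42


loop = asyncio.new_event_loop()
result = run_until_complete(loop, answer())
print(result)  # 42
loop.close()
```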
def __iter__(self):
    if not self.done():
        self._blocking = True
        yield self  # This tells Task to wait for completion.
    assert self.done(), "yield from wasn't used with future"
    return self.result()  # May raise too.
Returning a value from a generator such as __iter__ is the same as raising
StopIteration(value)
The _blocking flag is used to check whether yield future was
used instead of yield from future
A Task knows how to wait on a Future yielded to it, and it also
checks that yield from was used (the _blocking flag)
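A small experiment with a hypothetical `FlagFuture` (not the real Future) showing why the flag works: `yield from fut` runs `__iter__`, which sets `_blocking` and whose `return` value becomes `StopIteration.value`, while a bare `yield fut` hands the object over without ever calling `__iter__`:

```python
class FlagFuture:
    """Hypothetical stand-in for a Future that only tracks _blocking."""

    def __init__(self):
        self._blocking = False
        self._result = None

    def __iter__(self):
        self._blocking = True
        yield self  # hand ourselves to whoever drives the coroutine
        return self._result  # becomes StopIteration(self._result).value


def with_yield_from(fut):
    value = yield from fut
    return value


def with_bare_yield(fut):
    value = yield fut  # __iter__ is never called
    return value


good = FlagFuture()
gen = with_yield_from(good)
handed_over = next(gen)  # the coroutine suspends inside __iter__
good._result = "done"
try:
    gen.send(None)  # resume: __iter__ returns, and so does the coroutine
except StopIteration as stop:
    good_value = stop.value

bad = FlagFuture()
bad_gen = with_bare_yield(bad)
bad_handed_over = next(bad_gen)  # the same object reaches the driver...

print(handed_over is good, good._blocking, good_value)  # True True done
print(bad_handed_over is bad, bad._blocking)  # True False
```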
Tasks
Unit of concurrent asynchronous work
It’s actually a coroutine wrapped in a Future
Magic!
Schedules callbacks using loop.call_soon
Use asyncio.async to run a coroutine in a Task
import asyncio

@asyncio.coroutine
def f(n, x):
    while True:
        print(n)
        yield from asyncio.sleep(x)

loop = asyncio.get_event_loop()
asyncio.async(f('f1', 0.5))
asyncio.async(f('f2', 1.5))
loop.run_forever()
Both coroutines will run concurrently
asyncio.async returns a Task if a coroutine was passed,
or the unchanged value if a Future was passed
Go and check how asyncio.sleep is implemented; it’s
really simple!
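One way a sleep can be built, as a sketch assuming current `async`/`await` syntax (`my_sleep`, `ticker` and `_set_result_unless_cancelled` are hypothetical names; asyncio's own implementation differs in details): a Future plus a `call_later` timer that resolves it. The driver is a bounded version of the f1/f2 example:

```python
import asyncio


def _set_result_unless_cancelled(fut, result):
    # A cancelled sleep must not be resolved later by its timer.
    if not fut.cancelled():
        fut.set_result(result)


async def my_sleep(delay, result=None):
    """Sketch of a sleep: a Future that a call_later timer resolves."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    handle = loop.call_later(delay, _set_result_unless_cancelled, fut, result)
    try:
        return await fut
    finally:
        handle.cancel()  # no-op if the timer already fired


async def ticker(name, interval, count, log):
    for _ in range(count):
        log.append(name)
        await my_sleep(interval)


async def main():
    log = []
    # Both tickers make progress concurrently on the same loop.
    await asyncio.gather(ticker("f1", 0.05, 3, log), ticker("f2", 0.15, 1, log))
    return log


log = asyncio.run(main())
print(log)  # ['f1', 'f2', 'f1', 'f1']
```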
def __init__(self, coro, *, loop=None):
    assert iscoroutine(coro), repr(coro)  # Not a coroutine function!
    super().__init__(loop=loop)
    self._coro = iter(coro)  # Use the iterator just in case.
    self._fut_waiter = None
    self._loop.call_soon(self._step)
Simplified
Tasks are not run immediately; the actual work is done
by Task._step, which is scheduled with loop.call_soon
_fut_waiter is used to store a Future which this Task is
waiting for
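This is observable from the outside with current asyncio, as a small sketch: creating a Task only schedules its first step, so the creating code keeps running until it yields to the loop:

```python
import asyncio


async def main():
    log = []

    async def child():
        log.append("child ran")

    task = asyncio.create_task(child())  # only schedules the first step
    log.append("task created")  # so this line runs first
    await task
    return log


outcome = asyncio.run(main())
print(outcome)  # ['task created', 'child ran']
```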
Simplified
def _step(self, value=None, exc=None):
    assert not self.done(), '_step(): already done'
    coro = self._coro
    self._fut_waiter = None
    try:
        if exc is not None:
            result = coro.throw(exc)
        elif value is not None:
            result = coro.send(value)
        else:
            result = next(coro)
    except StopIteration as exc:
        self.set_result(exc.value)
    except Exception as exc:
        self.set_exception(exc)
    except BaseException as exc:
        self.set_exception(exc)
        raise
    else:
        if isinstance(result, futures.Future):
            # Yielded Future must come from Future.__iter__().
            if result._blocking:
                result._blocking = False
                result.add_done_callback(self._wakeup)
                self._fut_waiter = result
            else:
                ...  # error: yield was used instead of yield from
        elif result is None:
            # Bare yield relinquishes control for one event loop iteration.
            self._loop.call_soon(self._step)
        else:
            ...  # error: something other than a Future was yielded
The Magic (TM)
The coroutine is stepped through until it finishes
Note the check of _blocking to verify yield vs yield from
usage
The _wakeup done callback (scheduled with call_soon when the Future
completes) calls _step with either a result or an exception
At any point in time, either _step is scheduled or
_fut_waiter is not None
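To see the whole mechanism in one place, here is a toy version under stated assumptions: `MiniLoop`, `MiniFuture` and `MiniTask` are hypothetical, drive plain generators, and leave out cancellation, timers, I/O and most error checks:

```python
from collections import deque


class MiniLoop:
    """Hypothetical ready-queue loop: no I/O, no timers."""

    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args):
        self._ready.append((callback, args))

    def run_until_idle(self):
        while self._ready:
            callback, args = self._ready.popleft()
            callback(*args)


class MiniFuture:
    def __init__(self, loop):
        self._loop, self._callbacks, self._blocking = loop, [], False
        self._done, self._result, self._exception = False, None, None

    def done(self):
        return self._done

    def result(self):
        if self._exception is not None:
            raise self._exception
        return self._result

    def add_done_callback(self, fn):
        self._callbacks.append(fn)  # simplified: assumes the future is not done yet

    def _finish(self, result, exception):
        self._done, self._result, self._exception = True, result, exception
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            self._loop.call_soon(fn, self)  # scheduled, never called inline

    def set_result(self, result):
        self._finish(result, None)

    def set_exception(self, exception):
        self._finish(None, exception)

    def __iter__(self):
        if not self.done():
            self._blocking = True
            yield self  # tells the Task to wait for completion
        assert self.done(), "yield from wasn't used with future"
        return self.result()


class MiniTask(MiniFuture):
    def __init__(self, coro, loop):
        super().__init__(loop)
        self._coro, self._fut_waiter = coro, None
        loop.call_soon(self._step)  # not started immediately

    def _step(self, value=None, exc=None):
        self._fut_waiter = None
        try:
            if exc is not None:
                result = self._coro.throw(exc)
            else:
                result = self._coro.send(value)  # send(None) works like next()
        except StopIteration as stop:
            self.set_result(stop.value)
        except Exception as error:
            self.set_exception(error)
        else:
            if isinstance(result, MiniFuture) and result._blocking:
                result._blocking = False
                result.add_done_callback(self._wakeup)  # park until it is done
                self._fut_waiter = result
            elif result is None:
                self._loop.call_soon(self._step)  # bare yield
            else:
                error = RuntimeError("yield was used instead of yield from")
                self._loop.call_soon(self._step, None, error)

    def _wakeup(self, future):
        try:
            value = future.result()
        except Exception as error:
            self._step(None, error)
        else:
            self._step(value)


def worker(fut):
    value = yield from fut  # suspends until fut has a result
    yield  # bare yield: let other callbacks run first
    return value + 1


loop = MiniLoop()
fut = MiniFuture(loop)
task = MiniTask(worker(fut), loop)
loop.call_soon(fut.set_result, 41)
loop.run_until_idle()
print(task.result())  # 42
```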
There is a lot more in asyncio
Go read PEP-3156
Don’t be afraid of looking under the hood
Don’t rely on internals, they are implementation details
Join the mailing list, and check out the third-party libraries!
raise SystemExit
“I hear and I forget. I see and I remember.
I do and I understand.” - Confucius
Questions?
bettercallsaghul.com
@saghul