Developer-friendly taskqueues: What you should ask yourself before choosing one

Sylvain Zimmer / @sylvinus
PyParis 2017
DEVELOPER-FRIENDLY TASKQUEUES
WHAT WE LEARNED BUILDING MRQ
& WHAT YOU SHOULD ASK YOURSELF BEFORE CHOOSING ONE

/usr/bin/whoami
▸ (SpaceX nerd)
▸ Founder, dotConferences
▸ CTO, Pricing Assistant
▸ Co-organizer Paris.py meetup
▸ User of Python taskqueues for 10+ years
▸ Main contributor of MRQ

A typical job/task
def send_an_email(email_type, user):
    html = template(email_type, user)
    status = email.send(html, user["email"])
    metrics.send("email_%s" % status, 1)
    return status
KERNEL PANIC

Task properties
Re-entrant < Idempotent < Nullipotent
▸ Re-entrant: safe to interrupt and then retry
▸ Idempotent: safe to call multiple times; the result will be the same
▸ Nullipotent: free of side-effects
def reentrant(a):
    value = a + random()
    db.insert(value)

def idempotent(key, value):
    db.update(key, value)

def nullipotent(a):
    return a ** 2
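A minimal sketch of what these three properties mean when a task is delivered twice. The `db` dictionary and `log` list are in-memory stand-ins assumed for illustration; they are not part of MRQ.

```python
import random

db = {}    # hypothetical key/value store
log = []   # hypothetical append-only table


def reentrant(a):
    # Safe to interrupt and retry, but every completed call appends a new row.
    value = a + random.random()
    log.append(value)


def idempotent(key, value):
    # Calling this twice leaves the store in the same state as calling it once.
    db[key] = value


def nullipotent(a):
    # No side effects at all: nothing to undo or deduplicate on retry.
    return a ** 2


# Simulate a taskqueue delivering each task twice (e.g. after a timeout).
for _ in range(2):
    reentrant(1)
    idempotent("user:1", "active")
    result = nullipotent(3)
```

After the loop, `log` holds two rows while `db` holds a single entry, which is the practical difference between a task that is only re-entrant and one that is idempotent.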

Other task properties & best practices
▸ Serializable args, serializable result
▸ Args validation / documentation
▸ Least args possible
▸ Canonical path vs. registration
▸ Concurrent safety
▸ Statuses
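One way to enforce serializable, minimal args is to check them at queueing time. The sketch below assumes JSON as the wire format and uses a hypothetical `validate_params` helper; a real taskqueue may use another serializer.

```python
import json


def validate_params(params):
    """Check that task params are a JSON-serializable dict and return the payload."""
    if not isinstance(params, dict):
        raise TypeError("task params must be a dict")
    # json.dumps raises TypeError on values such as sets or open files.
    payload = json.dumps(params, sort_keys=True)
    # Round-trip to make sure the worker will see the same values.
    if json.loads(payload) != params:
        raise ValueError("params do not survive a JSON round-trip")
    return payload


# Pass a small identifier rather than a whole user object.
payload = validate_params({"user_id": 42, "email_type": "welcome"})
```

Values such as tuples pass `json.dumps` but come back as lists, which is why the round-trip comparison is there.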

Coroutines vs. Threads vs. Processes
▸ IO-bound tasks vs. CPU-bound tasks
▸ Threads offer few benefits for a Python worker (GIL)
▸ Coroutines/Greenlets are ideal for IO-bound tasks
▸ Processes are required for CPU-bound tasks
▸ If you have heterogeneous tasks, your TQ should support both!
$ mrq-worker --greenlets 25 --processes 4
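To illustrate why coroutines suit IO-bound tasks, here is a small sketch using the standard library's `asyncio` as a stand-in for greenlets (MRQ itself uses gevent, per the slides). The `io_bound_task` function and the 0.1 s wait are assumptions for the example.

```python
import asyncio
import time


async def io_bound_task(task_id):
    # asyncio.sleep stands in for a network call that waits on IO.
    await asyncio.sleep(0.1)
    return task_id


async def run_concurrently(n):
    # All n coroutines wait at the same time in a single thread.
    return await asyncio.gather(*(io_bound_task(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(run_concurrently(25))
elapsed = time.perf_counter() - start
print(f"25 tasks finished in {elapsed:.2f}s")
```

Run sequentially, the same 25 waits would take about 2.5 s. CPU-bound work gains nothing from this approach and needs separate processes instead.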

Performance: latency & throughput
[Diagram: App, Broker, Worker, Result store]

Errors
▸ Exception handlers
▸ Timeouts
▸ Retry rules
▸ Sentry & friends
▸ gevent: test your tracebacks!
▸ Priorities
▸ Human process to manage failed tasks!
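A retry rule can be sketched as a small wrapper with exponential backoff. The names `run_with_retries`, `MaxRetriesExceeded` and `flaky_task` are hypothetical and do not reflect any particular taskqueue's API.

```python
import time


class MaxRetriesExceeded(Exception):
    pass


def run_with_retries(task, max_retries=3, base_delay=0.01):
    """Run task(); on exception, retry with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_retries:
                # Out of retries: surface the failure so a human can triage it.
                raise MaxRetriesExceeded(str(exc)) from exc
            time.sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


def flaky_task():
    # Hypothetical task that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "sent"


status = run_with_retries(flaky_task)
```

Retrying like this is only safe if the task is at least re-entrant, and tasks that exhaust their retries still need someone to look at them.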

Task visibility
▸ Tasks by status, path, worker, ...
▸ Tracebacks & current stack
▸ Logs
▸ Timing info
▸ Cancel / Kill / Move tasks
▸ Progress

Memory leaks
▸ Workers = long-running processes
▸ gevent makes debugging harder
▸ Watch out for global variables or mutable class attributes!
▸ Python's ecosystem is surprisingly poor in this area
▸ guppy, objgraph can usually help
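The mutable class attribute pitfall can be shown in a few lines. Both task classes below are hypothetical; the sketch assumes a worker that creates a new task object for each job.

```python
class LeakyTask:
    # Mutable class attribute: shared by every instance for the worker's lifetime.
    seen = []

    def run(self, params):
        self.seen.append(params)  # grows forever in a long-running worker
        return len(self.seen)


class SafeTask:
    def __init__(self):
        # Per-instance state is released when the task object is discarded.
        self.seen = []

    def run(self, params):
        self.seen.append(params)
        return len(self.seen)


# Simulate a worker that instantiates a new task object for each job.
leaky_sizes = [LeakyTask().run({"job": i}) for i in range(3)]
safe_sizes = [SafeTask().run({"job": i}) for i in range(3)]
```

Here `leaky_sizes` keeps increasing with each job while `safe_sizes` stays at 1, even though both classes look almost identical.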

Misc tools
▸ Scheduler
▸ Command-line runner, e.g. mrq-run tasks.myTask '{"a": 1}'
▸ Autoscaling
▸ Profiler

Consistency guarantees
▸ At least once vs. At most once vs. Exactly once
▸ Ordering
▸ Critical operations:
  ▸ Queueing
  ▸ Marking tasks as started
  ▸ Timeouts & retries

Types of brokers
▸ Specialized message queues (RabbitMQ, SNS, Kafka, ...)
  ▸ Performance, complexity, poor visibility
▸ In-memory data stores (Redis, ...)
  ▸ Performance, simplicity, harder to scale
▸ Regular databases (MongoDB, PostgreSQL, ...)
  ▸ Often enough for the job!

At the heart of the broker
▸ Atomic update from "queued" to "started"
▸ MRQ with MongoDB broker: find_one_and_update()
▸ MRQ with Redis broker: Pushback in a ZSET
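The atomic "queued" to "started" transition is a compare-and-set. The sketch below uses SQLite from the standard library as a stand-in for a regular database broker; it is not MRQ's code, and the `jobs` table and `claim_job` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, path TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (path, status) VALUES (?, 'queued')",
    [("tasks.SendEmail",), ("tasks.SendEmail",)],
)
conn.commit()


def claim_job(conn, job_id):
    """Try to move one job from 'queued' to 'started'; return True if we won."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'started' WHERE id = ? AND status = 'queued'",
        (job_id,),
    )
    conn.commit()
    # rowcount is 1 only for the caller whose UPDATE matched the 'queued' row.
    return cur.rowcount == 1


first = claim_job(conn, 1)
second = claim_job(conn, 1)  # a second worker racing for the same job
```

Only the first caller gets `True`, because the status check and the update happen in a single statement. The same idea applies to MongoDB's `find_one_and_update()` with a filter on the status field.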

ZSETs in Redis
▸ Sorted sets with O(log(N)) scalability
▸ set/get by key, order by key, lookups by key or value
▸ Very interesting properties for task queues: uniqueness, ordering, atomicity of updates, performance
▸ MRQ's "Pushback" model:
  ▸ Queue with key=timestamp
  ▸ Unqueue by fetching key range & setting new keys in the future
  ▸ After completion the task adjusts or removes the key
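The pushback steps above can be sketched with a plain dictionary standing in for the sorted set. This is a single-threaded illustration under assumed names (`PushbackQueue`, `timeout`), not MRQ's implementation; against a real Redis server the fetch-and-update step would have to run atomically.

```python
class PushbackQueue:
    """In-memory stand-in for a sorted set: member -> score (a timestamp)."""

    def __init__(self):
        self.scores = {}

    def enqueue(self, job_id, now):
        # Members are unique, so queueing the same job twice keeps one entry.
        self.scores.setdefault(job_id, now)

    def dequeue(self, now, timeout, limit=1):
        # Fetch jobs whose score is due, oldest first.
        due = sorted((s, j) for j, s in self.scores.items() if s <= now)[:limit]
        for _, job_id in due:
            # Push the score into the future instead of deleting the job.
            self.scores[job_id] = now + timeout
        return [job_id for _, job_id in due]

    def complete(self, job_id):
        # A finished job removes its entry so it is never delivered again.
        self.scores.pop(job_id, None)


q = PushbackQueue()
q.enqueue("job-a", now=100)
q.enqueue("job-b", now=101)
first = q.dequeue(now=105, timeout=30)    # job-a becomes visible again at 135
second = q.dequeue(now=106, timeout=30)   # job-b, since job-a is pushed back
q.complete("job-b")
retry = q.dequeue(now=140, timeout=30)    # job-a was never completed
```

Because "job-a" is never completed, it becomes due again once its pushed-back timestamp passes, which gives at-least-once delivery if a worker dies mid-task.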

Consistency guarantees
▸ Must be thought of for the whole system, not just the broker!
▸ Brokers can be misused or misconfigured
▸ The workers can drop tasks if they want to ;-)
▸ Consistency starts at queueing time!

Think hard about what you need
▸ Will your taskqueue be the foundation of your architecture, or is it just a side project?
▸ What performance do you need? (IO vs. CPU, latency, ...)
▸ What level of visibility and control do you need on queued & running tasks?
▸ Can workers terminate abruptly? Lots of design consequences!
▸ What language interop do you need?

And then all the usual questions...
▸ Is it supported by a lively community?
▸ License
▸ Documentation
▸ Future plans

Which one to pick?
▸ Celery: High performance, large community, very complex, major upgrades painful
▸ RQ: Extremely simple to understand, low performance
▸ MRQ: Adjust task visibility vs. performance, simple to understand, 1.0 soon
▸ Lots of other valid options! Just be sure to ask yourself the right questions ;-)

REMINDER: BE GRATEFUL FOR THE OSS YOU USE!

QUESTIONS?
THANKS!
Photo credits: https://www.flickr.com/photos/spacex/
