Lock-free algorithms for Kotlin Coroutines

It is all about scalability
Presented at SPTCC 2017
/Roman Elizarov @ JetBrains

Speaker: Roman Elizarov
• 16+ years experience
• Previously developed high-perf
trading software @ Devexperts
• Teach concurrent & distributed
programming @ St. Petersburg
ITMO University
• Chief judge @ Northeastern
European Region of ACM ICPC
• Now work on Kotlin @ JetBrains

Agenda
• Kotlin coroutines overview & motivation for lock-free algorithms
• Lock-free doubly linked list
• Lock-free multi-word compare-and-swap
• Combining them to get more complex atomic
operations (without STM)

Kotlin basic facts
• Kotlin is a JVM language developed by JetBrains
• General purpose and statically-typed
• Object-oriented and functional paradigms
• Open source under Apache 2.0
• Reached version 1.0 in 2016
• Compatibility commitment
• Now at version 1.1
• Officially supported by Google on Android

Kotlin is …
• Modern
• Concise
• Safe
• Extensible
• Pragmatic
• Fun to work with!

Kotlin is pragmatic
… and easy to learn

Coroutines
Asynchronous programming made easy

How do we write code that waits for
something most of the time?

Blocking threads
fun postItem(item: Item) {
    val token = requestToken()
    val post = submitPost(token, item)
    processPost(post)
}

Callbacks
fun postItem(item: Item) {
    requestToken { token ->
        submitPost(token, item) { post ->
            processPost(post)
        }
    }
}

Futures/Promises/Rx
fun postItem(item: Item) {
    requestToken()
        .thenCompose { token ->
            submitPost(token, item)
        }
        .thenAccept { post ->
            processPost(post)
        }
}

Coroutines
fun postItem(item: Item) {
    launch(CommonPool) {
        val token = requestToken()
        val post = submitPost(token, item)
        processPost(post)
    }
}

CSP & Actor models
• A style of programming for modern systems
• Lots of concurrent tasks / jobs
• Waiting most of the time
• Communicating all the time
Share data by communicating

Kotlin coroutines primitives
• Jobs/Deferreds (futures)
• join/await
• Channels
• send & receive
• synchronous & buffered channels
• Select/alternatives
• Atomically wait on multiple events
• Cancellation
• Parent-child hierarchies

Implementation challenges
• Coroutines are like light-weight threads
• All the low-level scheduling & communication
mechanisms have to scale to lots of coroutines

Building blocks
• Single-word CAS (that’s all we have on JVM)
• Automatic memory management (GC)
• Practical lock-free algorithms
• Lock-Free and Practical Doubly Linked List-Based
Deques Using Single-Word Compare-and-Swap
by Sundell and Tsigas
• A Practical Multi-Word Compare-and-Swap Operation
by Timothy L. Harris, Keir Fraser and Ian A. Pratt.

Doubly linked list
[Diagram: H (head sentinel) → 1 → T (tail sentinel); every node has a next (N) and a prev (P) link]
• H and T are sentinels (use same node in practice)
• next links form logical list contents
• prev links are auxiliary

Insert
PushRight (like in queue)

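Doubly linked list (insert)
[Diagram sequence: H → 1 → T, with new node 2 inserted between 1 and T]
1. Create & init node 2: set its next and prev links
2. CAS the next link of node 1 to node 2; retry insert on CAS failure
3. CAS the prev link of T to node 2 (“finish insert”); ignore CAS failure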
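
The insert steps above can be sketched in plain Kotlin on top of `AtomicReference`. This is a minimal insert-only sketch, not the kotlinx.coroutines implementation: the names `ListNode`, `InsertOnlyList`, `addLast`, `findLast` and `toList` are hypothetical, removal is left out entirely, and head and tail are merged into one sentinel.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical node type for this sketch; the sentinel carries a null value.
class ListNode<T>(val value: T?) {
    // Both links start out pointing at the node itself,
    // so a fresh sentinel is already a valid empty list.
    val next = AtomicReference<ListNode<T>>(this)
    val prev = AtomicReference<ListNode<T>>(this)
}

// Insert-only list: removal (and therefore link marking) is not modelled here.
class InsertOnlyList<T> {
    private val sentinel = ListNode<T>(null)

    fun addLast(value: T) {
        val node = ListNode<T>(value)
        while (true) {
            val last = findLast()
            // Step 1: create & init (redone on every retry; node is not published yet)
            node.prev.set(last)
            node.next.set(sentinel)
            // Step 2: CAS of the next link publishes the node; retry on failure
            if (last.next.compareAndSet(sentinel, node)) {
                // Step 3: "finish insert"; failure is ignored, findLast() repairs prev later
                sentinel.prev.compareAndSet(last, node)
                return
            }
        }
    }

    // Simplified "correct prev": sentinel.prev may lag behind,
    // so follow next links until the node whose next is the sentinel.
    private fun findLast(): ListNode<T> {
        var p = sentinel.prev.get()
        while (true) {
            val n = p.next.get()
            if (n === sentinel) return p
            sentinel.prev.compareAndSet(p, n) // help the lagging prev link forward
            p = n
        }
    }

    // Traverse the logical contents through next links only.
    fun toList(): List<T> {
        val out = ArrayList<T>()
        var cur = sentinel.next.get()
        while (cur !== sentinel) {
            out.add(cur.value as T)
            cur = cur.next.get()
        }
        return out
    }
}
```

Because the prev link only ever moves forward along next links, a stale `sentinel.prev` costs a short walk but never loses an element. Supporting removal needs the marking protocol described on the following slides.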

Remove
PopLeft (like in queue)

Doubly linked list (remove)
[Diagram sequence: H → 1 → T, with node 1 being removed]
1. CAS: mark removed node’s next link; retry remove on CAS failure
   • Use wrapper object for mark in practice
   • Cache wrappers in pointed-to nodes
   • Don’t use AtomicMarkableReference
2. CAS: mark removed node’s prev link (“finish remove”); retry marking on CAS failure
3. CAS: “help remove”, fixup next links
4. CAS: “correct prev”, fixup prev links
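
The first removal step (marking the next link with a wrapper object that is cached in the pointed-to node) might look like the sketch below. The names `Removed`, `MarkNode`, `removedRef` and `markRemoved` are hypothetical, and only the marking CAS is shown; unlinking ("help remove", "correct prev") is not part of this sketch.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Wrapper that marks a link as "removed"; it remembers which node the link pointed to.
class Removed(val ref: MarkNode)

class MarkNode(val name: String) {
    // Holds either a MarkNode (live link) or a Removed wrapper (marked link).
    val next = AtomicReference<Any?>(null)

    // One wrapper per pointed-to node, created lazily and shared by
    // every link to this node that ever gets marked.
    private val cachedRemoved = AtomicReference<Removed?>(null)

    fun removedRef(): Removed {
        cachedRemoved.get()?.let { return it }
        cachedRemoved.compareAndSet(null, Removed(this))
        return cachedRemoved.get()!!
    }

    val isRemoved: Boolean
        get() = next.get() is Removed

    // Returns true if this call marked the node, false if it was already marked.
    fun markRemoved(): Boolean {
        while (true) {
            val n = next.get()
            if (n is Removed) return false
            val target = n as? MarkNode ?: error("node is not linked yet")
            // Swap the plain link for the wrapper cached in the node it points to
            if (next.compareAndSet(n, target.removedRef())) return true
            // next changed concurrently (for example an insert after this node): retry
        }
    }
}
```

Keeping the wrapper in the pointed-to node means marking allocates at most one object per node, which is one way to read the advice against `AtomicMarkableReference`: that class allocates a new internal pair on every successful update.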

Node states
State      next   prev   prev.next   next.prev
Init       Ok     Ok     --          --
Insert 1   Ok     Ok     me          --
Insert 2   Ok     Ok     me          me
Remove 1   Rem    Ok     me          me
Remove 2   Rem    Rem    me          me
Remove 3   Rem    Rem    ++          me
Remove 4   Rem    Rem    ++          ++
• Transitions done by helpers: “help remove” (fixes prev.next), “correct prev” (fixes next.prev)

Concurrent insert
[Diagram sequence: H → 1 → T; inserters I2 and I3 concurrently insert nodes 2 and 3 before T]
1. I2 and I3 both create & init their nodes
2. Both CAS the next link of node 1: one CAS ok, one CAS fail
3. The failed inserter detects wrong prev (t.prev.next != t)
4. It does “correct prev”
5. It does reinit & repeat

Concurrent remove
[Diagram sequence: H → 1 → 2 → T; removers R1 and R2 concurrently remove from the front]
1. R1 marks node 1
2. R2 finds node 1 already removed
3. “help remove” and mark prev
4. Retry with corrected next
5. “help remove”
6. “correct prev”

Concurrent remove & insert: when remove wins
[Diagram sequence: H → 1 → T; R1 removes node 1 while I2 inserts node 2 after it]
1. I2: create & init
2. R1: remove first
3. I2: CAS fail
4. I2: detect wrong prev (t.prev.next is removed), do “correct prev”
5. “correct prev”: mark prev, fixup next
6. “correct prev”: update prev
7. I2: reinit & repeat

Concurrent remove & insert: when insert wins
[Diagram sequence: H → 1 → T; R1 removes node 1 while I2 inserts node 2 after it]
1. I2’s CAS wins
2. R1 will succeed marking on remove retry
3. R1: “help remove”, mark prev
4. R1: “correct prev”. Remove is over!
5. “correct prev”

Takeaways
• A kind of algo you need a paper for
• Hard to improve w/o writing another paper
• Good news: stress tests uncover most impl bugs
• Bad news: when a stress test fails, you are in for long hours
• More bad news: hard to find bugs that violate lock-freedom of the algorithm

Summary: what we can do
• Insert items (at the end of the queue)
• Remove items (at the front of the queue)
• Traverse the list
• Remove items at arbitrary locations
• In O(1)

Linearizability
• Insert last
• Linearizes at CAS of next
• Remove first / arbitrary
• Success – at CAS of next
• Fail – at read of head.next

More about algorithm
• Sundell & Tsigas algo supports deque operations
• Can PushLeft & PopRight
• PopLeft is simple – read head.next & remove
• But cannot linearize them all at CAS points
• PushLeft, PushRight, PopRight - Ok
• PopLeft linearizes at head.next read (!!!)

Summary of impl notes
• Use GC (drop all memory management details)
• Merge head & tail into a single sentinel node
• Empty list is just one object (prev & next onto itself)
• One item += one object
• Reuse “remove mark” objects
• One-element lists reuse ptrs to the sentinel all the time
• Encapsulate!
[Diagram: queue Q with a single sentinel S and one node 1, linked by next (N) and prev (P)]

Mods
More complex atomic operations

Basic mods (1)
• Insert item conditionally on prev tail value
[Diagram: H → 1 → T, inserting node 2: check & bailout before CAS]

Basic mods (2)
• Remove head conditionally on prev head value
[Diagram: H → 1 → T, R1 removing node 1: check & bailout before CAS]

Practical use-case: synchronous channels
val channel = Channel<Int>()
// coroutine #1
for (x in 1..5) {
    channel.send(x * x)
}
// coroutine #2
repeat(5) {
    println(channel.receive())
}

Senders wait
[Diagram: H → Sender #1 → Sender #2 → Sender #3 → T; more senders arrive at the tail, incoming receivers arrive at the head]
• Receiver removes first if it is a sender node
• Sender inserts last if it is not a receiver node

Receivers wait
[Diagram: H → Receiver #1 → Receiver #2 → Receiver #3 → T; more receivers arrive at the tail, incoming senders arrive at the head]
• Sender removes first if it is a receiver node
• Receiver inserts last if it is not a sender node

Send function sketch
fun send(element: T) {
    while (true) {
        // try to add sender, unless prev is receiver
        if (enqueueSend(element)) break
        // try to remove first receiver
        val receiver = removeFirstReceiver()
        if (receiver != null) {
            receiver.resume(element) // resume receiver
            break
        }
    }
}

Channel use-case recap
• Uses insert/remove ops conditional on tail/head
node
• Can abort (cancel) wait to receive/send at any time
by using remove
• Full removal -- no garbage is left
• Pretty efficient in practice
• One item lists – one “garbage” object

Multi-word compare and
swap (CASN)
Build even bigger atomic operations

Use-case: select expression
val channel1 = Channel<Int>()
val channel2 = Channel<Int>()
select {
    channel1.onReceive { e -> ... }
    channel2.onReceive { e -> ... }
}

Impl summary: register
[Diagram: Select (status: NS) next to Channel1 Queue and Channel2 Queue]
• Select status: 1. Not selected (NS), 2. Selected (S)
• Add node N1 to channel1 queue if not selected (NS) yet
• Add node N2 to channel2 queue if not selected (NS) yet

Impl summary: wait
[Diagram: Select (status: NS); N1 in Channel1 Queue, N2 in Channel2 Queue]

Impl summary: select (resume)
[Diagram: Select (status: S); N1 in Channel1 Queue]
• Make selected and remove node from queue

Impl summary: clean up rest
[Diagram: Select (status: S); Channel1 Queue and Channel2 Queue]
• Remove non-selected waiters from queue
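
The "make selected" step is a single state transition from NS to S that exactly one clause may win. A minimal sketch of just that transition, with the hypothetical names `SelectState` and `trySelect`, could be:

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the select instance status: false = NS, true = S.
class SelectState {
    private val selected = AtomicBoolean(false)

    val isSelected: Boolean
        get() = selected.get()

    // Only the first caller moves NS -> S; every later caller gets false
    // and must leave the select alone.
    fun trySelect(): Boolean = selected.compareAndSet(false, true)
}
```

This sketch deliberately leaves out the harder part: adding a node to a channel queue only if the status is still NS touches two locations at once, which is what the DCSS building block is for.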

Double-Compare
Single-Swap (DCSS)
Building block for CASN

DCSS spec in pseudo-code
fun <A,B> dcss(
    a: Ref<A>, expectA: A, updateA: A,
    b: Ref<B>, expectB: B) =
    atomic {
        if (a.value == expectA && b.value == expectB) {
            a.value = updateA
        }
    }

DCSS: init descriptor
• DCSS Descriptor (a, expectA, updateA, b, expectB)
[Diagram: A holds expectA, B holds expectB; the descriptor carries updateA]

DCSS: prepare
• CAS ptr to descriptor if a.value == expectA

DCSS: read b.value
• A points to the descriptor while b.value is read

DCSS: complete (when success)
• B holds expectB
• CAS to updated value if a still points to descriptor

DCSS: complete (alternative)
• B holds !expectB

DCSS: complete (when fail)
• CAS to original value if a still points to descriptor

DCSS: States
• Init: A: ??? (desc created)
• prep ok: A: desc (A was expectA)
• prep fail: A: ??? (A was !expectA)
• success: A: updateA (B was expectB)
• fail: A: expectA (B was !expectB)
• Prep is done by one thread
• Any other thread encountering descriptor helps complete
• Originator cannot learn what was the outcome
• Lock-free algorithm without loops!
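
The DCSS steps can be sketched with a descriptor object stored directly in the cell. This is a sketch under stated assumptions rather than the library code: `Ref`, `rawGet`, `rawCas` and `DcssDescriptor` are hypothetical names, values are compared by reference (as `AtomicReference.compareAndSet` does), and B is assumed never to be the target of a DCSS itself. Unlike the loop-free version on the slide, this sketch retries the prepare step when it meets a foreign descriptor, so that a concurrent operation cannot cause a spurious failure.

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical cell: holds either a plain value or a DcssDescriptor.
class Ref<T>(initial: T) {
    private val cell = AtomicReference<Any?>(initial)

    // Reading helps any in-flight DCSS to complete, so callers never see a descriptor.
    @Suppress("UNCHECKED_CAST")
    val value: T
        get() {
            while (true) {
                val cur = cell.get()
                if (cur is DcssDescriptor<*, *>) cur.complete() else return cur as T
            }
        }

    fun rawGet(): Any? = cell.get()
    fun rawCas(expect: Any?, update: Any?): Boolean = cell.compareAndSet(expect, update)
}

class DcssDescriptor<A, B>(
    val a: Ref<A>, val expectA: A, val updateA: A,
    val b: Ref<B>, val expectB: B
) {
    // Any thread that meets the descriptor may call this; only the first CAS takes effect.
    fun complete() {
        // read b.value while A still points to the descriptor
        val result: Any? = if (b.value === expectB) updateA else expectA
        // CAS to updated (or original) value if a still points to descriptor
        a.rawCas(this, result)
    }
}

fun <A, B> dcss(a: Ref<A>, expectA: A, updateA: A, b: Ref<B>, expectB: B) {
    val desc = DcssDescriptor(a, expectA, updateA, b, expectB)
    while (true) {
        val cur = a.rawGet()
        if (cur is DcssDescriptor<*, *>) { // someone else's operation: help, then look again
            cur.complete()
            continue
        }
        if (cur !== expectA) return        // prep fail: A was !expectA
        if (a.rawCas(expectA, desc)) {     // prepare: CAS ptr to descriptor
            desc.complete()
            return
        }
    }
}
```

As on the slide, the originator gets no result back, so it cannot tell whether the update happened.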

Caveats
• A & B locations must be totally ordered
• or risk stack-overflow while helping
• One way to look at it: Restricted DCSS (RDCSS)

DCSS Mod: learn outcome
fun <A,B> dcssMod(
    a: Ref<A>, expectA: A, updateA: A,
    b: Ref<B>, expectB: B): Boolean =
    atomic {
        if (a.value == expectA && b.value == expectB) {
            a.value = updateA
            true
        } else
            false
    }

DCSS Mod: init descriptor
• DCSS Descriptor (a, expectA, updateA, b, expectB)
• Outcome: UNDECIDED (decided by consensus)

DCSS Mod: prepare
• Outcome: UNDECIDED

DCSS Mod: read b.value
• Outcome: UNDECIDED

DCSS Mod: reach consensus
• CAS(UNDECIDED, DECISION)
• Outcome: SUCCESS

DCSS Mod: complete
• Outcome: SUCCESS

DCSS Mod: States
• Init: A: ???, Outcome: UND (desc created)
• prep ok: A: desc, Outcome: UND (A was expectA)
• prep fail: A: ???, Outcome: FAIL (A was !expectA)
• success: A: desc, Outcome: SUCC (B was expectB)
• fail: A: desc, Outcome: FAIL
• After success: A: updateA
• After fail: A: expectA
• Prep is done by one thread
• Still no loops!
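
Adding an outcome field that is decided by a CAS lets the originator learn the result. The sketch below is self-contained and uses hypothetical names (`Cell`, `Outcome`, `DcssModDescriptor`); like any code built on `AtomicReference.compareAndSet` it compares values by reference, it assumes B is never itself the target of an operation, and it retries the prepare step on a foreign descriptor, which is a choice of this sketch rather than something the slides prescribe.

```kotlin
import java.util.concurrent.atomic.AtomicReference

enum class Outcome { UNDECIDED, SUCCESS, FAIL }

// Hypothetical cell: holds either a plain value or a DcssModDescriptor.
class Cell<T>(initial: T) {
    private val state = AtomicReference<Any?>(initial)

    // Reading helps any in-flight operation to complete first.
    @Suppress("UNCHECKED_CAST")
    val value: T
        get() {
            while (true) {
                val cur = state.get()
                if (cur is DcssModDescriptor<*, *>) cur.complete() else return cur as T
            }
        }

    fun rawGet(): Any? = state.get()
    fun rawCas(expect: Any?, update: Any?): Boolean = state.compareAndSet(expect, update)
}

class DcssModDescriptor<A, B>(
    val a: Cell<A>, val expectA: A, val updateA: A,
    val b: Cell<B>, val expectB: B
) {
    val outcome = AtomicReference(Outcome.UNDECIDED)

    // Called by the originator and by helpers; all of them agree on one outcome.
    fun complete(): Boolean {
        if (outcome.get() == Outcome.UNDECIDED) {
            val decision = if (b.value === expectB) Outcome.SUCCESS else Outcome.FAIL
            outcome.compareAndSet(Outcome.UNDECIDED, decision) // reach consensus
        }
        val success = outcome.get() == Outcome.SUCCESS
        // Replace the descriptor in A according to the agreed outcome
        a.rawCas(this, if (success) updateA else expectA)
        return success
    }
}

fun <A, B> dcssMod(a: Cell<A>, expectA: A, updateA: A, b: Cell<B>, expectB: B): Boolean {
    val desc = DcssModDescriptor(a, expectA, updateA, b, expectB)
    while (true) {
        val cur = a.rawGet()
        if (cur is DcssModDescriptor<*, *>) { // help the other operation, then retry
            cur.complete()
            continue
        }
        if (cur !== expectA) {                // prep fail: A was !expectA
            desc.outcome.set(Outcome.FAIL)
            return false
        }
        if (a.rawCas(expectA, desc)) return desc.complete() // prep ok
    }
}
```

Helpers may read different values of B, but only the first CAS on the outcome counts, so every thread replaces the descriptor in A with the same value.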

Compare-And-Swap
N-words (CASN)
The ultimate atomic update

CASN spec in pseudo-code
fun <A,B> cas2(
    a: Ref<A>, expectA: A, updateA: A,
    b: Ref<B>, expectB: B, updateB: B): Boolean =
    atomic {
        if (a.value == expectA && b.value == expectB) {
            a.value = updateA
            b.value = updateB
            true
        } else
            false
    }
For two words, for simplicity

CASN: init descriptor
• Descriptor (a, expectA, updateA, b, expectB, updateB)
• Outcome: UNDECIDED

CASN: prepare (1)
• CAS A to point to the descriptor
• Outcome: UNDECIDED

CASN: prepare (2)
• Use DCSS to update B if Outcome == UNDECIDED

CASN: decide
• CAS outcome
• Outcome: SUCCESS

CASN: complete (1)
• CAS A to updateA

CASN: complete (2)
• CAS B to updateB

CASN: States
Success path:
• Init: A: ???, B: ???, O: UND
• A: desc, B: ???, O: UND
• A: desc, B: desc, O: UND (B is updated with DCSS)
• A: desc, B: desc, O: SUCC
• A: updateA, B: desc, O: SUCC
• A: updateA, B: updateB, O: SUCC
Failure paths:
• A != expectA: A: ???, B: ???, O: FAIL
• B != expectB: A: desc, B: ???, O: FAIL, then A: expectA, B: ???, O: FAIL
Notes:
• The first step is done by one thread; after it the descriptor is known to other (helping) threads
• DCSS prevents going back in this state machine

Using it in practice
All the little things that matter

It is easy to combine multiple operations
with DCSS/CASN that linearize on a CAS
with descriptor parameters that are
known in advance

Trivial example: Treiber stack
[Diagram: TOP → 1 → 2; New node 3 is pushed with a CAS on TOP (expect: node 1, update: node 3)]
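
For reference, a Treiber stack needs only one CAS per operation, with the expected and updated values both known before the CAS. A minimal sketch (the class and method names are this sketch's own choice):

```kotlin
import java.util.concurrent.atomic.AtomicReference

class TreiberStack<T> {
    // Immutable nodes; with GC there is no ABA problem on the top pointer.
    private class Node<T>(val value: T, val next: Node<T>?)

    private val top = AtomicReference<Node<T>?>(null)

    fun push(value: T) {
        while (true) {
            val expect = top.get()            // current top
            val update = Node(value, expect)  // new node points to the old top
            if (top.compareAndSet(expect, update)) return
        }
    }

    // Returns null when the stack is empty.
    fun pop(): T? {
        while (true) {
            val expect = top.get() ?: return null
            if (top.compareAndSet(expect, expect.next)) return expect.value
        }
    }
}
```

Because `expect` and `update` are fixed before the CAS, they could be handed to a descriptor as `expectA` and `updateA`, which is what makes this the trivial case for combining with DCSS/CASN.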

Let’s go deeper
Into unpublished territory

Doubly linked list: insert last
[Diagram sequence: H → 1 → T, inserting node 2; the CAS is on the next link of node 1]

Operation Descriptor
• A ref: ???
• expectA: Sentinel
• updateA: Node #2
• …
• Outcome: UNDECIDED

• We know expected value for CAS in advance
• We know updated value for CAS in advance
• ???: can fill in A before CAS & update on retry
• DCSS here is needed (always!)

DCSS Descriptor
• affected node: #1
• operation ref

• Helpers are bound to stumble upon the same descriptor
• CAS can only succeed on last node
• Competing inserts will complete (help) us first
• desc is updated after successful DCSS (A ref: Node #1)
• Stays pointed until operation outcome is decided

Doubly linked list: remove first
[Diagram sequence: H → 1 → 2 → T, R1 removing node 1; the CAS is on the next link of node 1]

Operation Descriptor
• A ref: ???
• expectA: ???
• updateA: Rem[???]
• …
• Outcome: UNDECIDED

• A ref and expectA are both not known in advance
• updateA is a deterministic f(expectA)

DCSS Descriptor
• affected node: #1
• operation ref
• old value: #2

• It locks what node we are to remove
• Cannot change w/o removal of #1
• We don’t support PushLeft!!!
• desc is updated after successful DCSS (A ref: Node #1, expectA: Node #2, updateA: Rem[#2])
• Stays pointed until operation outcome is decided

Closing notes
• All we care about is CAS that linearizes operation
• Subsequent updates are helper moves
• Invoke regular help/correct functions
• Perfect algorithm to combine with optional
Hardware Transactional Memory (HTM)

Let’s enjoy what we’ve accomplished

References
• Kotlin language
• http://kotlinlang.org
• Kotlin coroutines support library
• http://github.com/kotlin/kotlinx.coroutines

Thank you
Any questions?
email me to elizarov at gmail
relizarov
