Using Control Theory to Keep
Compactions Under Control
Glauber Costa - VP of Field Engineering, ScyllaDB
WEBINAR
Glauber Costa
Glauber Costa is the VP of Field Engineering at ScyllaDB.
He shares his time between the engineering department,
working on upcoming Scylla features, and helping
customers succeed.
Before ScyllaDB, Glauber worked on virtualization in the
Linux kernel for 10 years, with contributions ranging from
the Xen hypervisor to all sorts of guest functionality and
containers.
About ScyllaDB
+ Next-generation NoSQL database
+ Drop-in replacement for Cassandra
+ 10X the performance & low tail latency
+ Open source and enterprise editions
+ Founded by the creators of the KVM hypervisor
+ HQs: Palo Alto, CA; Herzelia, Israel
+ Scylla Summit 2018: November 6-7, SF Bay
Join real-time big-data database developers and users from start-ups
and leading enterprises from around the globe for two days of sharing
ideas, hearing innovative use cases, and getting practical tips and tricks
from your peers and NoSQL gurus.
What are compactions?

[Diagram: Scylla's write path — incoming writes are recorded in the commit log and flushed to SSTables on disk, which are then merged by compaction.]
Compaction Strategy
+ Which SSTables to compact, and when?
+ This is called the compaction strategy
+ The goal of the strategy is low amplification:
+ Avoid read requests needing many SSTables: read amplification
+ Avoid overwritten/deleted/expired data staying on disk, and avoid excessive temporary disk space needs: space amplification
+ Avoid compacting the same data again and again: write amplification
The main compaction strategies
+ Size Tiered Compaction Strategy
+ compacts SSTables of roughly the same size together
+ Leveled Compaction Strategy
+ keeps SSTables in levels that are exponentially bigger
+ Time Window Compaction Strategy
+ each user-defined time window has a single SSTable
+ Major, or manual, compaction
+ compacts everything into a single* SSTable
* see next slide
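The size-tiered rule above ("compact SSTables of roughly the same size together") can be sketched as a simple bucketing pass. This is a minimal illustration, not Scylla's actual algorithm; the 0.5x-1.5x bucket bounds and the threshold of 4 SSTables are assumptions for the example:

```python
def size_tiered_buckets(sstable_sizes, low=0.5, high=1.5, min_threshold=4):
    """Group SSTables of roughly equal size into compaction buckets.

    A bucket accepts an SSTable whose size is within [low, high] times
    the bucket's running average size. Buckets with at least
    min_threshold members become compaction candidates.
    """
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg, members = bucket
            if low * avg <= size <= high * avg:
                members.append(size)
                bucket[0] = sum(members) / len(members)
                break
        else:
            buckets.append([size, [size]])
    return [members for avg, members in buckets
            if len(members) >= min_threshold]

# Four 1 GB SSTables and four 4 GB SSTables form two candidate buckets;
# two lone SSTables of very different sizes form none.
print(size_tiered_buckets([1, 1, 1, 1, 4, 4, 4, 4]))
print(size_tiered_buckets([1, 10]))
```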
Compactions in Scylla
+ Because all data is sharded, so are SSTables
+ and as a result, so are compactions
+ in a system with 64 vCPUs, expect 64 SSTables after a major compaction
+ the same logic applies to LeveledCompactionStrategy for the number of SSTables in each level.
Impact of compactions
+ Compaction too slow: reads will touch many SSTables and be slower.
+ Compaction too fast: the foreground workload will be disrupted.
+ A common solution is to use limits. Ex: Apache Cassandra
+ “Don’t allow compactions to run at more than 300 MB/s”
+ But how do you find that number?
+ And what if the workload changes?
+ And what if there is idle time now?
+ Another solution is to use ratios. Ex: ScyllaDB until 2.2
+ “Don’t allow compactions to use more than 20% of storage bandwidth/CPU”
+ Much better: it adapts automatically to resource capacity and uses idle time efficiently
+ But it has no temporal knowledge.
Compactions over time

[Chart: compactions run with limited, but still visible, impact.]

[Chart: at one point all shards are compacting; at another, almost no shards are.]
What is Control Theory?
+ Open-loop control system
+ there is some input, a function is applied, there is an output.
+ ex: a toaster
+ Closed-loop control system
+ We want the world to be in a particular state.
+ The current state of the world is fed back to the control system
+ The control system acts to bring the system back to the goal
Feedback Control Systems
1. Measure the state of the world
2. Transfer function
3. Actuator
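The three-step loop above can be sketched generically. This is a toy proportional controller; all names, the setpoint, and the gain are chosen purely for illustration:

```python
def control_step(measure, transfer, actuate):
    """One iteration of a closed feedback loop."""
    state = measure()         # 1. measure the state of the world
    signal = transfer(state)  # 2. transfer function: state -> control signal
    actuate(signal)           # 3. actuator applies the signal

# Toy usage: drive a value toward a setpoint with a proportional gain.
world = {"value": 0.0}
SETPOINT, GAIN = 10.0, 0.5

def measure():
    return world["value"]

def transfer(value):
    return GAIN * (SETPOINT - value)  # signal proportional to the error

def actuate(signal):
    world["value"] += signal

for _ in range(20):
    control_step(measure, transfer, actuate)
# world["value"] has converged close to the 10.0 setpoint
```

Each iteration halves the remaining error, so the loop settles at the setpoint rather than oscillating or drifting.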
Measuring - current state of all SSTables

[Diagram: the size components tracked per SSTable — partial new SSTable size, static SSTable size, SSTable uncompacted size, and partially compacted SSTable size.]
Actuators - Schedulers

[Diagram: the Query, Commitlog, and Compaction classes each have their own queue into the userspace I/O scheduler, which sits in front of storage. The scheduler admits requests up to the maximum useful disk concurrency, so no queues build up in the filesystem or device.]
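The role of the per-class queues can be illustrated with a toy shares-based dispatcher. This is stride scheduling in miniature, not Scylla's actual I/O scheduler; the class names and share values are made up for the example:

```python
import heapq

def dispatch(queues, shares, n):
    """Dispatch n requests across classes in proportion to their shares.

    Each dispatch advances the class's virtual time by 1/shares; the
    class with the lowest virtual time goes next (stride scheduling).
    """
    heap = [(0.0, name) for name in sorted(queues)]
    heapq.heapify(heap)
    order = []
    for _ in range(n):
        vtime, name = heapq.heappop(heap)
        order.append(queues[name].pop(0))
        heapq.heappush(heap, (vtime + 1.0 / shares[name], name))
    return order

# Toy usage: queries get 3x the shares of compaction, so over 8
# dispatches queries get roughly three slots for every compaction slot.
queues = {"query": ["q"] * 8, "compaction": ["c"] * 8}
order = dispatch(queues, {"query": 3, "compaction": 1}, 8)
```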
Transfer Function - Backlog
+ Each compaction strategy does a different amount of work
+ For each compaction strategy we determine when there is no more work to be done.
+ Examples:
+ SizeTiered: there is only one SSTable in the system.
+ TimeWindow: there is only one SSTable per time window.
+ The backlog B is: how many bytes do we expect to write before reaching zero backlog?
+ Controller output: f(B), a proportional function
+ This is a self-regulating system:
+ more compaction shares = fewer new writes = less compaction backlog
+ fewer compaction shares = more new writes = more compaction backlog
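The proportional controller f(B) and the self-regulation it produces can be seen in a toy simulation; all the constants here (max shares, write rate, drain rate) are invented for illustration, not Scylla's tuning:

```python
def compaction_shares(backlog, max_backlog, max_shares=1000):
    """Proportional transfer function: shares grow linearly with backlog B."""
    return min(max_shares, max_shares * backlog / max_backlog)

# Toy simulation of the self-regulating loop: each tick, writes add
# backlog and compaction drains it at a rate proportional to its shares.
backlog, MAX_BACKLOG = 0.0, 100.0
for tick in range(200):
    backlog += 2.0                                  # new writes create backlog
    shares = compaction_shares(backlog, MAX_BACKLOG)
    backlog -= min(backlog, shares / 1000 * 5.0)    # more shares -> faster drain
# backlog settles at a stable equilibrium instead of growing without bound
```

Because the drain rate rises with the backlog, the system finds the point where compaction exactly keeps up with incoming writes, with no hand-tuned MB/s cap.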
SizeTiered Backlog example
SizeTiered Backlog
+ each byte that is written now is rewritten T times, where T is the number of tiers
+ In SizeTiered, tiers are proportional to SSTable sizes.
+ The number of tiers is roughly proportional to the log of the SSTable's contribution to the total size
+ Ex: 4 SSTables of 1 GB, 4 SSTables of 4 GB. Total size = 20 GB
+ log4(20 / 1) ~ 2
+ log4(20 / 4) ~ 1
+ The backlog for one SSTable is its size times the backlog per byte:
+ B = SSTableSize * log4(TableSize / SSTableSize)
+ The backlog for the entire table is the sum of the backlogs of all its SSTables.
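The formula above translates directly into code; a straight transcription of B = SSTableSize * log4(TableSize / SSTableSize), summed over the table's SSTables:

```python
import math

def sstable_backlog(sstable_size, table_size):
    """Backlog for one SSTable: its size times its expected number of
    rewrites, estimated as log4 of its share of the total table size."""
    return sstable_size * math.log(table_size / sstable_size, 4)

def table_backlog(sstable_sizes):
    """Backlog for the whole table: the sum of its SSTables' backlogs."""
    total = sum(sstable_sizes)
    return sum(sstable_backlog(size, total) for size in sstable_sizes)

# The slide's example: 4 SSTables of 1 GB and 4 of 4 GB, 20 GB in total.
sizes_gb = [1, 1, 1, 1, 4, 4, 4, 4]
# sstable_backlog(1, 20) uses log4(20/1) ~ 2.16
# sstable_backlog(4, 20) uses log4(20/4) ~ 1.16
```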
Results: before vs after

[Chart]

Results: throughput vs CPU

[Chart: throughput plotted against % CPU time used by compactions.]

Results: changing workload

[Chart: when the workload changes, the controller adjusts automatically and reaches a new equilibrium.]

Results: impact on latency
+ 2 ms 99.9% latencies at 100% load
+ < 2 ms 99% latencies
+ 1 ms 95% latencies
Q&A
Stay in touch
Join us at Scylla Summit 2018
Pullman San Francisco Bay Hotel | November 6-7
scylladb.com/scylla-summit-2018
glauber@scylladb.com
@ScyllaDB
@glcst
United States
1900 Embarcadero Road
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzelia, Israel
www.scylladb.com
@scylladb
Thank You!
