Couchbase usage and performance
Criteo - Couchbase Live 2016 - Paris
About me
Pierre Mavro - Lead DevOps - NoSQL Team
Working at Criteo as Site Reliability Engineer
@deimosfr
Criteo
31 Offices
2000+ employees
Criteo technical insights
● 700 engineers
● 17K servers
● 27K displays per second
● 2.4M requests per second
Criteo SRE: biggest challenges
● Scaling
● Low latency
● High throughput
● Resiliency
● Automation
Couchbase figures at Criteo (Worldwide)
● 1300+ physical servers
● 100+ clusters (up to 50 servers each)
● 90 TB of data in memory
● 25M QPS
● < 8 ms constant latency
Couchbase usage at Criteo
● Storing UUIDs (< 30 B)
● Storing blobs (e.g. binary images)
● Key size sometimes larger than value size
● Serving between 100 Kqps and 2.5 Mqps per cluster
● Low latency: < 2 ms at the 99th percentile
● Data size per cluster between 500 GB and ~12 TB (with replicas)
● All data fits in memory
● Inter-datacenter replication (custom client driver)
What we wanted to solve
Legacy infrastructure
● Couchbase v1.8 legacy (80%) and v3.0.1 Community (20%)
● Slow rebalance (up to 48 h for one server)
● Rebalance failures on highly loaded clusters
● Max connections reached on v1.8 (9k limit)
Legacy infrastructure
● Persisted and non-persisted buckets shared the same clusters
● No dedicated latency monitoring tool
● No automatic restart/upgrade orchestrator
● Server benchmarks needed updating
● Lack of Couchbase best practices
How we achieved the change
1. Benchmarks
Benchmarks
● Couchbase Enterprise 3.1.3
● 3x HP DL360 Gen9 (256 GB RAM, 6x 400 GB SSD in RAID 10, 1 Gb network interface): 2 injectors + 1 server
● Key size: UUID string (36 bytes) + Couchbase metadata (56 bytes)
● Value size: uniform range between 750 B and 1250 B (avg 1 kB)
● Number of items: 50M/node (with replica) or 100M/node (without replica)
● Resident active items (= items fully in RAM): ~50%
● Value-only ejection mode (only the data value can be ejected from RAM; key + metadata stay in RAM)
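As a rough memory budget implied by this setup: at 100M items per node, keys plus metadata alone pin about 100M × (36 + 56) B ≈ 9.2 GB of RAM, and the ~50% resident 1 kB values add roughly another 50 GB, comfortably within the 256 GB per server.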
Benchmarks
Heavy Writes / Light Reads (reads fixed at 10 Kqps), without replica

| Write rate per node | Status | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---------------------|--------|------------------|-------------|-------------|-------------|---------------|
| 40 Kset/s           | OK     | 10M items        | 0.4 ms      | 0.7 ms      | 2 ms        | 8 ms          |
| 60 Kset/s           | OK     | 30M items        | 0.4 ms      | 0.7 ms      | 2 ms        | 20 ms         |
| 80 Kset/s           | OK     | 50M items        | 0.4 ms      | 2 ms        | 7 ms        | 30 ms         |
| 100 Kset/s          | OK     | 70M items        | 1.5 ms      | 5 ms        | 10 ms       | 40 ms         |
Benchmarks
Heavy Writes / Light Reads (reads fixed at 10 Kqps), with one replica

| Write rate per node | Status    | Disk write queue | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|---------------------|-----------|------------------|-------------|-------------|-------------|---------------|
| 20 Kset/s           | OK        | 12M items        | 0.4 ms      | 1 ms        | 2 ms        | 10 ms         |
| 30 Kset/s           | OK        | 33M items        | 0.5 ms      | 2 ms        | 4 ms        | 20 ms         |
| 40 Kset/s           | OK        | 60M items        | 0.6 ms      | 2 ms        | 5 ms        | 25 ms         |
| 50 Kset/s           | NOK (OOM) | >70M items       | 0.7 ms      | 5 ms        | 50 ms       | 75 ms         |
Benchmarks
Heavy Reads / Light Writes (writes fixed at 10 Kqps), with one replica

| Read rate per node | Status | Disk write queue  | Latency p50 | Latency p95 | Latency p99 | Latency p99.9 |
|--------------------|--------|-------------------|-------------|-------------|-------------|---------------|
| 25 Kget/s          | OK     | 130k items        | 0.4 ms      | 0.7 ms      | 4 ms        | 8 ms          |
| 50 Kget/s          | OK     | 130k items        | 0.4 ms      | 1 ms        | 5 ms        | 10 ms         |
| 75 Kget/s          | OK     | 130k items        | 0.4 ms      | 5 ms        | 15 ms       | 25 ms         |
| 100 Kget/s         | NOK    | 50k to 500k items | 16 ms       | 25 ms       | 45 ms       | 100 ms        |
Benchmarks
Conclusion for a single node:
● The 1 Gb network is the bottleneck
● Replicas introduce latency
● Reads are fast
● Max write rate with replica: 40 Kqps per node
● Max read rate with replica: 90 Kqps per node
● Max read/write rate without replica: 90 Kqps per node
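As a rough sanity check on the network bottleneck: ~90 Kqps of ~1 kB values is about 90 MB/s, i.e. ~720 Mb/s of payload before protocol, TCP/IP, and replication overhead, which is enough to saturate a 1 Gb/s interface.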
2. SLI, SLO & SLA
Metrics
Metrics are great!
● Total QPS (read + write)
● Total RAM usage
● Availability
● Number of items
● …
But these alone are not enough to know the global service status!
SLI: add the major missing metric
Adding latency monitoring as an SLI, to be part of our Couchbase SLOs and SLAs.
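As a starting point, one can sample the per-operation latency histograms Couchbase servers already expose; this is a generic sketch, not Criteo's actual tooling (node and bucket are placeholders):

cbstats <node>:11210 timings -b <bucket>   # per-operation latency histograms (get, set, ...)

Server-side timings exclude network and client time, so a client-side probe measuring full round-trip percentiles (p95/p99/p99.9) against each cluster is the better SLI; the histograms above remain a useful cross-check.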
3. Couchbase support
Support contract
● Get the latest Couchbase bug fixes
● Suggest Couchbase enhancements
● Speed up incident resolution with the help of support
● Get better Couchbase performance tuning recommendations
4. Refactoring infrastructure
Split usages
● High-load (QPS) buckets are on dedicated clusters
● Low-load (QPS) buckets are grouped on separate "shared" clusters
● Persisted and non-persisted buckets are no longer on the same servers
5. Administration Automation
Automation: why?
● Need to upgrade from the Community to the Enterprise edition
● Need to apply new configuration options that require a restart of all the nodes in a cluster
● Need to apply fixes that require a reboot of all the nodes in a cluster
● Need to reinstall servers from scratch
Automation: how?
● Criteo uses Chef to bootstrap servers and deploy applications and configuration
● We did not want to add yet another tool to the loop
● Nothing with the required features already existed
● We developed a FOSS Chef cookbook for this and other use cases: Choregraphie
https://github.com/criteo-cookbooks/choregraphie
Automation: Choregraphie
With Choregraphie we can perform:
● Rolling restart with rebalance
● Rolling upgrade with rebalance
● Rolling reboot with rebalance
● Rolling reinstall with rebalance
● Any of the above with an optional additional server to speed up the rebalance
Choregraphie is open source! Feel free to contribute.
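For one node at a time, such a rolling operation boils down to roughly the sequence below. This is a hand-written couchbase-cli sketch of the steps the cookbook automates, not Choregraphie's own code; exact flags vary across Couchbase versions, and cluster/node addresses are placeholders:

couchbase-cli failover -c <cluster>:8091 -u <Administrator> -p <pwd> --server-failover=<node>:8091   # graceful failover
service couchbase-server restart   # run on <node>: the restart, upgrade, or reboot happens here
couchbase-cli recovery -c <cluster>:8091 -u <Administrator> -p <pwd> --server-recovery=<node>:8091 --recovery-type=delta
couchbase-cli rebalance -c <cluster>:8091 -u <Administrator> -p <pwd>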
6. Couchbase and system tuning
Couchbase best practices / system tuning
● Minimize swap usage:
○ vm.swappiness = 0 (set to 1 for kernels > 3.5)
● Disable Transparent Huge Pages:
○ chkconfig disable-thp on
● Set the SSD I/O scheduler to deadline:
○ echo "deadline" > /sys/block/sdX/queue/scheduler
● Change the CPUFreq governor:
○ modprobe cpufreq_performance
● Raise the maximum number of connections:
○ max_conns_on_port_XXXX: 30000
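Taken together, a minimal boot-time script applying these settings might look like this (a sketch assuming recent-kernel sysfs paths; device names are placeholders):

sysctl -w vm.swappiness=0                                   # use 1 on kernels > 3.5
echo never > /sys/kernel/mm/transparent_hugepage/enabled    # disable THP
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo deadline > /sys/block/sdX/queue/scheduler              # repeat for each SSD
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > "$g"                                   # CPUFreq performance governor
done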
Couchbase tuning
● Raise the max_num_nonio ("Nonio") parameter to 8 to avoid rebalance failures on highly loaded clusters:
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "max_num_nonio=<N>"}]).' http://<NodeIP>:8091/diag/eval
● Disable the access log if you don't need it, to reduce disk usage (native in Couchbase 4.5):
○ curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("<bucketname>", [{extra_config_string, "access_scanner_enabled=false"}]).' http://<NodeIP>:8091/diag/eval
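For instance, applying the value of 8 mentioned above to a hypothetical bucket named profiles:

curl -i -u <Administrator>:<pwd> --data 'ns_bucket:update_bucket_props("profiles", [{extra_config_string, "max_num_nonio=8"}]).' http://<NodeIP>:8091/diag/eval

Keep in mind that /diag/eval is an internal endpoint; it is worth validating such changes with Couchbase support (see section 3) before rolling them out.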
Tuning: what's next?
● Network teaming with 802.3ad (bonding) over 2x 1 Gb cards
● 10 Gb network cards
● Upgrade to Couchbase 4.5
● Upgrade to a newer vanilla LTS kernel to enable SSD-specific enhancements (multi-queue SSD support)
● Switch to Mesos to reduce administration time
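For reference, an 802.3ad bond on a RHEL/CentOS-style system looks roughly like the sketch below (interface names and addresses are made up, and the switch ports must be configured for LACP as well):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
BOOTPROTO=none
IPADDR=10.0.0.10
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is identical apart from DEVICE)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

With layer3+4 hashing, flows spread across both links, but any single TCP connection stays capped at 1 Gb/s, which is why 10 Gb cards are on the list too.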
Questions?
Criteo - Couchbase Live 2016 - Paris
Pierre Mavro / @deimosfr
