Sharding in MongoDB allows for horizontal scaling of data and operations across multiple servers. When determining if sharding is needed, factors like available storage, query throughput, and response latency on a single server are considered. The number of shards can be calculated based on total required storage, working memory size, and input/output operations per second across servers. Different types of sharding include range, tag-aware, and hashed sharding. Choosing a high cardinality shard key that matches query patterns is important for performance. Reasons to shard include scaling to large data volumes and query loads, enabling local writes in a globally distributed deployment, and improving backup and restore times.
Introduction to sharding in MongoDB, agenda includes customer stories and various sharding methods.
Foursquare and CarFax examples illustrate usage of sharding: Foursquare with 50M users, 6B check-ins; CarFax with 13B+ documents, 12 shards.
Defines sharding and illustrates scaling through examples of read/write scalability in MongoDB clusters.
Criteria for when to shard: assessing disk space, query throughput, and IOPS; examples provided for estimating shard requirements.
Introduction to different sharding types: range, tag-aware, and hashed sharding; pros and cons of each method.
Characteristics of a good shard key for performance; importance of cardinality and distribution to avoid hotspots. Reasons to implement sharding: scale, local writes, tiered storage, and faster restore processes described.
Summary on determining number of shards based on storage, latency, throughput; scalability, and geo-aware considerations.
Resources for MongoDB sharding, webinars, and expert advice; invitation to a Q&A session.
Closing thanks and contact information for inquiries regarding MongoDB.
2
• Customer Stories
• Sharding for Performance/Scale
– When to shard?
– How many shards do I need?
• Types of Sharding
• How to Pick a Shard Key
• Sharding for Other Reasons
Agenda
5
• 50M users.
• 6B check-ins to date (6M per day growth).
• 55M points of interest / venues.
• 1.7M merchants using the platform for marketing
• Operations Per Second: 300,000
• Documents: 5.5B
Foursquare
6
• 11 MongoDB clusters
– 8 are sharded
• Largest cluster has 15 shards (check-ins)
– Sharded on user id
Foursquare clusters
19
• Have enough disk space to store all my data?
• Handle my query throughput (operations per second)?
• Respond to queries fast enough (latency)?
Does one server/replica set…
Server Specs
Disk Capacity
Disk IOPS
RAM
Network
22
• Sum of disk space across shards > required storage size
Disk Space: How Many Shards Do I Need?
Example
Storage size = 3 TB
Server disk capacity = 2 TB
2 Shards Required
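The example above is a straight round-up division; a minimal sketch (figures are the slide's illustrative numbers, not a benchmark):

```python
import math

# Figures from the slide's example (illustrative).
required_storage_tb = 3  # total data to store
server_disk_tb = 2       # disk capacity of a single shard's server

# Round up: a partial shard still occupies a whole server.
shards_for_disk = math.ceil(required_storage_tb / server_disk_tb)
print(shards_for_disk)  # 2
```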
23
• Working set should fit in RAM
– Sum of RAM across shards > Working Set
• Working Set = indexes plus the set of documents accessed frequently
• Working Set in RAM
– Shorter latency
– Higher throughput
RAM: How Many Shards Do I Need?
25
• Measuring Index Size and Working Set
db.stats() – index size of each collection
db.serverStatus({ workingSet: 1 }) – working set size estimate
RAM: How Many Shards Do I Need?
Example
Working Set = 428 GB
Server RAM = 128 GB
428/128 = 3.34
4 Shards Required
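The same round-up applies to RAM; a sketch of the slide's arithmetic (428/128 = 3.34, rounded up to whole shards):

```python
import math

# Figures from the slide's example (illustrative, not a benchmark).
# Working Set = indexes + frequently accessed documents, estimated with
# db.stats() and db.serverStatus({workingSet: 1}) as shown above.
working_set_gb = 428
server_ram_gb = 128

# 428 / 128 = 3.34, and a fractional shard still needs a whole server.
shards_for_ram = math.ceil(working_set_gb / server_ram_gb)
print(shards_for_ram)  # 4
```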
27
• Sum of IOPS across shards > required IOPS
• IOPS are difficult to estimate
– Update doc
– Update indexes
– Append to journal
– Log entry?
• Best approach – build a prototype and measure
Disk Throughput: How Many Shards Do I Need?
Example
Required IOPS = 11000
Server disk IOPS = 5000
3 Shards Required
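Again a round-up division; a sketch with the slide's example figures (per the slide, real IOPS numbers should come from measuring a prototype):

```python
import math

# Figures from the slide's example (illustrative).
required_iops = 11_000    # measured on a prototype
server_disk_iops = 5_000  # what one shard's disk subsystem sustains

shards_for_iops = math.ceil(required_iops / server_disk_iops)  # ceil(2.2)
print(shards_for_iops)  # 3
```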
29
• S = ops/sec of a single server
• G = required ops/sec
• N = # of shards
• G = N * S * 0.7
• N = G / (0.7 * S)
OPS: How Many Shards Do I Need?
Sharding Overhead
28.
30
• S = ops/sec of a single server
• G = required ops/sec
• N = # of shards
• G = N * S * 0.7
• N = G / (0.7 * S)
OPS: How Many Shards Do I Need?
Example
S = 4000
G = 10000
N = 3.57
4 Shards
33
Range Sharding
mongod mongod mongod mongod
Key Range 0..25
Key Range 26..50
Key Range 51..75
Key Range 76..100
Read/Write Scalability
34
Tag-Aware Sharding
mongod mongod mongod mongod
Shard Tags
Shard Tag   Start    End
Winter      23 Dec   21 Mar
Spring      22 Mar   21 Jun
Summer      21 Jun   23 Sep
Fall        24 Sep   22 Dec
Tag Ranges
Winter Spring Summer Fall
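A sketch of how the tag table above could map a document's date to a shard tag. The date ranges come from the slide; treating each start date as inclusive is an assumption for illustration:

```python
from datetime import date

# Map a date to a shard tag per the table above (start dates assumed inclusive).
def season_tag(d: date) -> str:
    md = (d.month, d.day)
    if (3, 22) <= md < (6, 21):
        return "Spring"
    if (6, 21) <= md < (9, 24):
        return "Summer"
    if (9, 24) <= md < (12, 23):
        return "Fall"
    return "Winter"  # 23 Dec through 21 Mar wraps the year boundary

print(season_tag(date(2024, 7, 4)))   # Summer
print(season_tag(date(2024, 1, 15)))  # Winter
```

In MongoDB itself this mapping is declared to the cluster as tag ranges rather than computed in application code.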
40
Shard Key characteristics
•A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query if possible
– scatter gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive
41
Low cardinality shard key
• Induces "jumbo chunks"
• Example: a boolean field
Shard 1
mongos
Shard 2 Shard 3 Shard N
[ a, b )
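To see why cardinality matters, a small simulation: hypothetical routing by MD5 hash across 4 shards (a toy model, not MongoDB's actual chunk mechanics). A boolean key can only ever reach two shards, while a high-cardinality user id spreads load everywhere:

```python
import hashlib

# Toy router: hash the key and pick a shard (illustrative only).
def shard_for(key: str, num_shards: int = 4) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_shards

bool_counts = [0] * 4
user_counts = [0] * 4
for i in range(10_000):
    bool_counts[shard_for(str(i % 2))] += 1   # low cardinality: 2 distinct values
    user_counts[shard_for(f"user-{i}")] += 1  # high cardinality: 10,000 values

# The boolean key concentrates all documents on at most two shards;
# the user-id key gives every shard a share of the load.
print(sum(c > 0 for c in bool_counts))
print(all(c > 0 for c in user_counts))
```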
42
Ascending shard key
• Monotonically increasing shard key values cause "hot spots" on inserts
• Examples: timestamps, _id
Shard 1
mongos
Shard 2 Shard 3 Shard N
[ ISODate(…), $maxKey )
46
• Save hardware costs
• Put frequently accessed documents on fast servers
– Infrequently accessed documents on less capable servers
• Use tag-aware sharding
Tiered Storage
mongod mongod mongod mongod
Current Current Archive Archive
SSD SSD HDD HDD
47
• 40 TB Database
• 2 shards of 20 TB each
• Challenge
– Cannot meet restore SLA after data loss
Fast Restore
mongod mongod
20 TB 20 TB
48
• 40 TB Database
• 4 shards of 10 TB each
• Solution
– Reduce the restore time by 50%
Fast Restore
mongod mongod
10 TB 10 TB
mongod mongod
10 TB 10 TB
50
• To determine the required # of shards, assess
– Storage requirements
– Latency requirements
– Throughput requirements
• Derive total
– Disk capacity
– Disk throughput
– RAM
• Calculate # of shards based upon individual
server specs
Determining the # of shards
Get Expert Advice on Scaling. For Free.
For a limited time, if you’re considering a commercial relationship with MongoDB, you can sign up for a free one-hour consult about scaling with one of our MongoDB Engineers.
Sign Up: http://bit.ly/1rkXcfN
MongoDB provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
MongoDB supports three types of sharding:
• Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values “close” to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range- based queries.
• Hash-based Sharding. Documents are uniformly distributed according to an MD5 hash of the shard key value. Documents with shard key values “close” to one another are unlikely to be co-located on the same shard. This approach guarantees a uniform distribution of writes across shards, but is less optimal for range-based queries.
• Tag-aware Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with shards. Users can optimize the physical location of documents for application requirements such as locating data in specific data centers.
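The contrast between range- and hash-based placement can be illustrated with a toy model (not MongoDB's chunk-splitting logic; `range_shard` and `hash_shard` are hypothetical helpers): under range sharding, "close" keys land on the same shard, while hashing scatters them.

```python
import hashlib

NUM_SHARDS = 4

# Range sharding: split the key space 0..100 into contiguous bands.
def range_shard(key: int, max_key: int = 100) -> int:
    return min(key * NUM_SHARDS // (max_key + 1), NUM_SHARDS - 1)

# Hash sharding: route by an MD5 hash of the key, as the text describes.
def hash_shard(key: int) -> int:
    return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % NUM_SHARDS

close_keys = [40, 41, 42]
print([range_shard(k) for k in close_keys])  # all land on the same shard
print([hash_shard(k) for k in close_keys])   # likely scattered
```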
MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.