GHX proprietary information: Please do not copy or distribute
GHX Use of MongoDB
Bill Perkins: Director of Data and Architecture
Jeff Sherard: Manager of Database Services
1
GHX proprietary information: Please do not copy or distribute
So there, what is a GHX?
2
What is a GHX?
• 500K+ documents a day
– Scaling to many millions
• Saved $5.5 Billion dollars over last 5 years
Healthcare
Providers
Healthcare
Providers
Healthcare
Providers
Healthcare
Providers
1000’s
Healthcare
Providers
Healthcare
Providers
Healthcare
Suppliers
100’s
85% of US market
Purchase Orders
Purchase Order Acknowledgement
Advanced Shipment Notice
Invoice
Contract
Item Master
Vendor Master…
GHX proprietary information: Please do not copy or distribute
Yah so, tell me what the Mongo
selection process felt like?
4
This is me being greeted by the committee for homogenous databases
In this picture I am standing to the right of the photographer (see arrow)
This is me after the selection of Mongo was approved
Weighted Selection Criteria
8
Criteria Weight
Scalability
Development throughput
Flexible Data Model
Developer Impact
Database Services Impact
3
1
3
2
2
Applicability of Different NoSQL Data Models
Characteristics
(1-4, lower is better)
Data
Model
Performance
Scalability
Flexibility
Complexity
Index
Support
Examples Applicability
Key-Value
Stores
1 3 1 1 4
Oracle NoSQL , Redis,
Riak, Membrain,
LevelDB, Voldemort,
Castle, Aerospike
LOW  Stores a key associated with a binary. Good
for cache. We will need multiple keys on objects,
Column
Store
1 1 2 3 3
Cassandra, HBase,
Hypertable, Acuru
MEDIUM  Very scalable, good tooling support, no
built-in second index, steep learning curve
Document
Store
2 2 1 1 1
MongoDB,
Couchbase, RavenDB,
RethinkDB, CouchDB
HIGH  Fields everything, flexible model, supports
multiple indexes. Low learning. Matches our model.
Graph
Database
2 3 4 4 2
Neo4j, InfiniteGraph,
OrientDB
LOW  Very good at quick storage. Queries are
harder. Horizontal scale is questionable. Our data is
not graph oriented.
9
DBs that made first cut
Platform Applicability Community / Market Traction
Oracle
noSQL
LOW Low traction, potential community is high due to Oracle. This is
BerkleyDB with an Oracle wrapper. May be good for cache.
Cassandra MEDIUM High traction, good community. Companies are using this to solve
very large problems. Scales well. Does not support secondary
indexes natively. Good tool support. Good integration with ETL /
BI tools.
Hbase MEDIUM High traction, good community. No one is using this successfully
for OLTP problems. Very complex.
MongoDB HIGH High traction, high community. Lots of features including many
types of indexes. Leading in deployments and growing fastest.
Developer friendly. Tooling for operations is command line.
10
GHX proprietary information: Please do not copy or distribute
So there, what was it like when
Mongo battled Cassandra?
11
The one who looks like “David” in this
picture is actually actually Bryan
Performance Test Results
• Mongo config:
– Primary and 2 replicas
• Cassandra config:
– 3 node ring
• Mongo outperformed Cassandra in
each test
• Different architectures: Mongo
performs best in read heavy scenarios,
while Cassandra is the opposite.
• Mongo’s latency was better on each
test.
13
0
5000
10000
15000
20000
25000
30000
35000
40000
99/1 90/10 70/30 50/50 30/70
OPS by Read/Write %
MongoDB Datastax
0
1
2
3
4
5
6
7
99/1 90/10 70/30 50/50 30/70
Latency by Read/Write %
MongoDB Datastax
Mongo: Index test results
• Added 3 secondary indexes to mongo collection
• YCSB test reflects the cost of index management in writes, but does not use secondary
indexes for reads
14
99/1 90/10 70/30 50/50 30/70
Primary Key Only 36274 29244 21042 17381 15270
Add 3 Indexes 31151 23297 15158 11742 9142
Performance Drop 14% 20% 28% 32% 40%
0
5000
10000
15000
20000
25000
30000
35000
40000
OperationsperSecond
0
1000
2000
3000
4000
5000
6000
7000
8000
99/1 30/70 50/50 70/30 90/10
Read/write ratio
Performance
Performance w/ indexes
ThroughputLatency
Mongo: Sharding test results
• 90/10 read / write ratio
• This was not the same configuration as previous results.
We are interested in the gain, not the raw value
15
Application
Server
(YCSB)
MongoS (router)
MongoD
(shard 1)
MongoD
(shard 2)
Application
Server
(YCSB)
MongoS (router)
Shards Ops/Second Gain %
1 12K ----
2 22.6K 88.3%
Rolling up Results to Match our Requirements
16
Criteria Weight Mongo Cassandra
Scalability
Development throughput
Flexible Data Model
Developer Impact
Database Services Impact
3
1
3
2
2
GHX proprietary information: Please do not copy or distribute
Yah so, what does your data look
like?
17
Event Driven Work Broker
Work Broker
Worker
A
Worker
A
Worker
A
Worker
A
Worker
A
Worker
B
Worker
A
Worker
A
Worker
C
2. Broker assigns it to
exactly one worker
Event A
1. Reads events
from Mongo
Work Event B
3. Work Log and Event B generated by Worker A
4. Repeat…
Distributed Event Broker
• Many readers reading from mongo trying to
assign to workers on as fast as they can take
the work
…
Assignment Approach 1: FindandModify
• Steps
– FindandModify
• Find event of certain type that are not assigned
• Mark them as assigned
• Do this BATCH_SIZE times
• Result
– Really high WriteLock% (90+)
– Upset stakeholder
– **mongo did not recommend this
Assignment Approach 2: N Updates and a find
• Steps
– Update event of certain type that are not assigned
• Mark them as assigned with batch ID
• Do this BATCH_SIZE times
– Find all with batch ID
• Result
– Lots of loss # modify > # find
– Upset stakeholder
– Only tested this, never used in prod
Assignment Approach 3: Find-Modify-Find
• Steps
– Find event _ID where event unassigned and of
certain type limit BATCH_SIZE
– Modify set of events found by writing a batch ID
to assign it
– Find the modified events by batch ID
• Result
– WriteLock% went down
– Lots of loss
• Count of # found > # modified > # found
Assignment Approach 4: Don’t do it
• Steps
– Assign workers a range of events on startup
– Mark every event with a random number token
(1-1000)
– Worker updates an event with token that is in its
range
• Results (moving to this now)
– No conflict or loss expected
– WorkerManager rebalances ranges, monitors
health of workers
Throughput Challenges
• 25MM customer documents * 20 activities per
doc * 4K per activity = 2TB per day throughput
• 90,000 operations per second at peak load
Scaling
• 7 to 10 shards (21-30 Mongod)
– 9,000 to 12,700 db operations per shard per second
Execution
< 1 day
Audit
30 days
Archive
quite a while
GHX proprietary information: Please do not copy or distribute
Yah good, how do you deploy
this thing?
26
GHX proprietary information: Please do not copy or distribute
So there, how does ops feel
about Mongo after being in prod
for 9 months?
29
GHX proprietary information: Please do not copy or distribute
Thank You
31

MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB

  • 1.
    GHX proprietary information:Please do not copy or distribute GHX Use of MongoDB Bill Perkins: Director of Data and Architecture Jeff Sherard: Manager of Database Services 1
  • 2.
    GHX proprietary information:Please do not copy or distribute So there, what is a GHX? 2
  • 3.
    What is aGHX? • 500K+ documents a day – Scaling to many millions • Saved $5.5 Billion dollars over last 5 years Healthcare Providers Healthcare Providers Healthcare Providers Healthcare Providers 1000’s Healthcare Providers Healthcare Providers Healthcare Suppliers 100’s 85% of US market Purchase Orders Purchase Order Acknowledgement Advanced Shipment Notice Invoice Contract Item Master Vendor Master…
  • 4.
    GHX proprietary information:Please do not copy or distribute Yah so, tell me what the Mongo selection process felt like? 4
  • 5.
    This is mebeing greeted by the committee for homogenous databases
  • 6.
    In this pictureI am standing to the right of the photographer (see arrow)
  • 7.
    This is meafter the selection of Mongo was approved
  • 8.
    Weighted Selection Criteria 8 CriteriaWeight Scalability Development throughput Flexible Data Model Developer Impact Database Services Impact 3 1 3 2 2
  • 9.
    Applicability of DifferentNoSQL Data Models Characteristics (1-4, lower is better) Data Model Performance Scalability Flexibility Complexity Index Support Examples Applicability Key-Value Stores 1 3 1 1 4 Oracle NoSQL , Redis, Riak, Membrain, LevelDB, Voldemort, Castle, Aerospike LOW  Stores a key associated with a binary. Good for cache. We will need multiple keys on objects, Column Store 1 1 2 3 3 Cassandra, HBase, Hypertable, Acuru MEDIUM  Very scalable, good tooling support, no built-in second index, steep learning curve Document Store 2 2 1 1 1 MongoDB, Couchbase, RavenDB, RethinkDB, CouchDB HIGH  Fields everything, flexible model, supports multiple indexes. Low learning. Matches our model. Graph Database 2 3 4 4 2 Neo4j, InfiniteGraph, OrientDB LOW  Very good at quick storage. Queries are harder. Horizontal scale is questionable. Our data is not graph oriented. 9
  • 10.
    DBs that madefirst cut Platform Applicability Community / Market Traction Oracle noSQL LOW Low traction, potential community is high due to Oracle. This is BerkleyDB with an Oracle wrapper. May be good for cache. Cassandra MEDIUM High traction, good community. Companies are using this to solve very large problems. Scales well. Does not support secondary indexes natively. Good tool support. Good integration with ETL / BI tools. Hbase MEDIUM High traction, good community. No one is using this successfully for OLTP problems. Very complex. MongoDB HIGH High traction, high community. Lots of features including many types of indexes. Leading in deployments and growing fastest. Developer friendly. Tooling for operations is command line. 10
  • 11.
    GHX proprietary information:Please do not copy or distribute So there, what was it like when Mongo battled Cassandra? 11
  • 12.
    The one wholooks like “David” in this picture is actually actually Bryan
  • 13.
    Performance Test Results •Mongo config: – Primary and 2 replicas • Cassandra config: – 3 node ring • Mongo outperformed Cassandra in each test • Different architectures: Mongo performs best in read heavy scenarios, while Cassandra is the opposite. • Mongo’s latency was better on each test. 13 0 5000 10000 15000 20000 25000 30000 35000 40000 99/1 90/10 70/30 50/50 30/70 OPS by Read/Write % MongoDB Datastax 0 1 2 3 4 5 6 7 99/1 90/10 70/30 50/50 30/70 Latency by Read/Write % MongoDB Datastax
  • 14.
    Mongo: Index testresults • Added 3 secondary indexes to mongo collection • YCSB test reflects the cost of index management in writes, but does not use secondary indexes for reads 14 99/1 90/10 70/30 50/50 30/70 Primary Key Only 36274 29244 21042 17381 15270 Add 3 Indexes 31151 23297 15158 11742 9142 Performance Drop 14% 20% 28% 32% 40% 0 5000 10000 15000 20000 25000 30000 35000 40000 OperationsperSecond 0 1000 2000 3000 4000 5000 6000 7000 8000 99/1 30/70 50/50 70/30 90/10 Read/write ratio Performance Performance w/ indexes ThroughputLatency
  • 15.
    Mongo: Sharding testresults • 90/10 read / write ratio • This was not the same configuration as previous results. We are interested in the gain, not the raw value 15 Application Server (YCSB) MongoS (router) MongoD (shard 1) MongoD (shard 2) Application Server (YCSB) MongoS (router) Shards Ops/Second Gain % 1 12K ---- 2 22.6K 88.3%
  • 16.
    Rolling up Resultsto Match our Requirements 16 Criteria Weight Mongo Cassandra Scalability Development throughput Flexible Data Model Developer Impact Database Services Impact 3 1 3 2 2
  • 17.
    GHX proprietary information:Please do not copy or distribute Yah so, what does your data look like? 17
  • 19.
    Event Driven WorkBroker Work Broker Worker A Worker A Worker A Worker A Worker A Worker B Worker A Worker A Worker C 2. Broker assigns it to exactly one worker Event A 1. Reads events from Mongo Work Event B 3. Work Log and Event B generated by Worker A 4. Repeat…
  • 20.
    Distributed Event Broker •Many readers reading from mongo trying to assign to workers on as fast as they can take the work …
  • 21.
    Assignment Approach 1:FindandModify • Steps – FindandModify • Find event of certain type that are not assigned • Mark them as assigned • Do this BATCH_SIZE times • Result – Really high WriteLock% (90+) – Upset stakeholder – **mongo did not recommend this
  • 22.
    Assignment Approach 2:N Updates and a find • Steps – Update event of certain type that are not assigned • Mark them as assigned with batch ID • Do this BATCH_SIZE times – Find all with batch ID • Result – Lots of loss # modify > # find – Upset stakeholder – Only tested this, never used in prod
  • 23.
    Assignment Approach 3:Find-Modify-Find • Steps – Find event _ID where event unassigned and of certain type limit BATCH_SIZE – Modify set of events found by writing a batch ID to assign it – Find the modified events by batch ID • Result – WriteLock% went down – Lots of loss • Count of # found > # modified > # found
  • 24.
    Assignment Approach 4:Don’t do it • Steps – Assign workers a range of events on startup – Mark every event with a random number token (1-1000) – Worker updates an event with token that is in its range • Results (moving to this now) – No conflict or loss expected – WorkerManager rebalances ranges, monitors health of workers
  • 25.
    Throughput Challenges • 25MMcustomer documents * 20 activities per doc * 4K per activity = 2TB per day throughput • 90,000 operations per second at peak load Scaling • 7 to 10 shards (21-30 Mongod) – 9,000 to 12,700 db operations per shard per second Execution < 1 day Audit 30 days Archive quite a while
  • 26.
    GHX proprietary information:Please do not copy or distribute Yah good, how do you deploy this thing? 26
  • 29.
    GHX proprietary information:Please do not copy or distribute So there, how does ops feel about Mongo after being in prod for 9 months? 29
  • 31.
    GHX proprietary information:Please do not copy or distribute Thank You 31