Elevate Your Enterprise Architecture with an In-Memory Computing Strategy
Dylan Tong
Principal Solutions Architect
dylan.tong@mongodb.com
In-Memory Computing
How can we process data as fast as possible
by leveraging in-memory speed at its best?
What are the possibilities if we could?
High-frequency trading (HFT) is a program trading platform that uses
powerful computers to transact a large number of orders at very fast
speeds. It uses complex algorithms to analyze multiple markets and
execute orders based on market conditions.
Typically, the traders with the fastest execution speeds are more
profitable than traders with slower execution speeds.
Source: Investopedia
Speed Matters…
Amazon found that it increased revenue by 1% for every 100 ms of
improvement. [Source: Amazon]
A 1-second delay in page load time equals 11% fewer page views,
a 16% decrease in customer satisfaction, and 7% loss in
conversions. [Source: Aberdeen Group]
A study found that 27% of the participants who did mobile shopping
were dissatisfied due to the experience being too slow. [Source:
Forrester Consulting]
How Fast?
Latency       Unit      Normalized to 1 s
RAM access    100s ns   ~6 min
SSD access    100s µs   ~6 days
HDD access    10s ms    ~12 months
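The deck doesn't show the scaling, but the third column is consistent with multiplying every latency by one constant: roughly the factor that stretches a ~0.3 ns CPU cycle into a full second. A back-of-the-envelope check (the 0.3 ns baseline is our assumption, not the slide's):

  // One scale factor reproduces all three "normalized" rows:
  var scale = 1 / 0.3e-9;               // ~3.3 billion: 0.3 ns becomes 1 s
  print(100e-9 * scale / 60);           // RAM, 100 ns -> ~5.6 minutes (slide: ~6 min)
  print(100e-6 * scale / 86400);        // SSD, 100 µs -> ~3.9 days (slide: ~6 days, nearer 150 µs)
  print(10e-3 * scale / (86400 * 30));  // HDD, 10 ms  -> ~12.9 "months" (slide: ~12 months)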
Why Now?
*Average $/GB of RAM by year:
2015  $4.37
2013  $5.50
2010  $12.37
2005  $189
2000  $1,107
1995  $30,875
1990  $103,880
1985  $859,375
1980  $6,328,125
[Chart: average $/GB of RAM, 2005-2015, falling from ~$189 to ~$4.37]
Last 10 Years…
“Generally affordable”
*http://www.statisticbrain.com/average-historic-price-of-ram/
Why Now?
[Chart: average $/GB of RAM, 2010-2015, falling from ~$12.37 to ~$4.37; same price data as above]
“An Option at Scale”
Last 5 Years…
*http://www.statisticbrain.com/average-historic-price-of-ram/
"This will process these data using algorithms for machine
learning and artificial intelligence before sending the data
back to the car.
The zFAS board will in this way continuously extend its
capabilities to master even complex situations increasingly
better," Audi stated. "The piloted cars from Audi thus learn
more every day and with each new situation they
experience.”
Source: T3.com
The possibilities…
Challenges: Scale
Challenges: Cost Viability
An AWS X1 instance (~2 TB of RAM) = $34,777/yr. → ~$1.74M/yr. for infrastructure to support 100 TB
Storage Type   Avg. Cost ($/GB)   Cost at 100 TB
RAM            $5.00              ~$500K
SSD            $0.47-$1.00        $47K to $100K
HDD            $0.03              ~$3K
http://www.statisticbrain.com/average-cost-of-hard-drive-storage/
http://www.myce.com/news/ssd-price-per-gb-drops-below-0-50-how-low-can-they-go-70703/
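The arithmetic behind these figures, as a quick sanity check. The per-GB prices and the $34,777/yr. figure are the deck's; the instance count (100 TB at ~2 TB of RAM per X1) follows the speaker notes:

  var gb = 100 * 1024;                 // 100 TB = 102,400 GB
  print(gb * 5.00);                    // RAM: ~$512K (slide rounds to ~$500K)
  print(gb * 0.47);                    // SSD, low end:  ~$48K
  print(gb * 1.00);                    // SSD, high end: ~$102K
  print(gb * 0.03);                    // HDD: ~$3K
  print(Math.ceil(100 / 2) * 34777);   // 50 X1 instances -> ~$1.74M/yr.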
Challenges: Durability
Volatile Memory
• What happens when things fail, and what data may be lost?
• How does the system synchronize with your durable storage? Does it do this well, and is it simple to implement?
Challenges: Design Still Matters on RAM
Scenario: eCommerce Modernization Initiative

Business Problem: Customer experience is suffering during high-traffic events.
Technology Limitations:
• Too expensive to scale the system to support spike events.
• Scaling the system is hard, and engineering teams can't react fast enough to unexpected growth.
• A caching solution is in place, but it mostly helps only with read performance; synchronizing writes has been a development nightmare.

Business Problem: A lack of mobile customers in Europe and Asia has been attributed to latency issues.
Technology Limitation: It is difficult to extend the data architecture globally, so the effort has been put on hold.
Scenario: eCommerce Modernization Initiative (continued)

Business Problem: Below-industry conversion rates have been attributed partly to poor personalization.
Technology Limitations:
• Customer info is siloed across the enterprise, and it's too complicated to bring this data together so effective models can be built to drive personalization.
• A “Big Data” project to bring the data together to drive machine learning and cognitive capabilities in the platform failed: data scientists reported the platform was too slow to develop on and its performance impractical.

Business Problem: Business analysts have siloed views of the eCommerce channel, and information isn't getting to them fast enough.
Technology Limitations:
• Related to the limitations above.
• Integrating data into the data warehouse is slow and hard to maintain.
Scenario: eCommerce Modernization Initiative
[Diagram: the current platform. A Platform API and Platform Services sit over the eCommerce datastores (Orders, Product Catalog, Customer Data: profile, sessions, carts, personalization; Inventory) spread across NoSQL and RDBMS engines, alongside dependent external data sources and integrations: CRM, ERP, PIM, a data warehouse, and BI tools.]
Siloed Data-Sources Problem
[Diagram: Customer Data (profile, sessions, carts, personalization) and the Product Catalog are drawn from silos: NoSQL/RDBMS stores, CRM, ERP, PIM, partner sources (supplier databases, etc.), and a legacy mainframe. SLOW AND POOR SCALABILITY.]
Operational Single View
[Diagram: MongoDB as an Enterprise Data Hub. Customer Data (profile, sessions, carts, personalization) and the Product Catalog are consolidated into one operational single view, fed from the same NoSQL/RDBMS stores, CRM, ERP, PIM, partner sources (supplier databases, etc.), and legacy mainframe.]
Reference: MetLife Wall Presentation
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}

{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray', ...],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}

{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}

Documents in the same product catalog collection in MongoDB
Dynamic Schema
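A short sketch of what the dynamic schema means for queries; the collection name and queries below are illustrative, not from the deck:

  // All three products live in one collection; a query matches whichever
  // documents carry the field, whether the value is a scalar or an array:
  db.catalog.find({ color: 'Red' })                 // matches the paint ('Red' is in the array)
  db.catalog.find({ wheel_size_in: { $gte: 26 } })  // matches only the mountain bike
  // One secondary index covers 'color' whether it is scalar or array-valued:
  db.catalog.createIndex({ color: 1 })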
Still Agile, Scalable and Simple
• Flexible Data Model: facilitates agile development and continuous delivery methodologies.
• Scalability: scale out dynamically as demand grows.
In-Memory Storage Engine

High Performance:
• More predictable and lower latency on less in-memory infrastructure.

Infrastructure Optimization:
• Assign a data subset to the In-Memory SE via Zone Sharding.
• Optimize cost vs. performance without silos.

Rich Query Capability:
• Full MongoDB query and indexing support.

[Diagram: a mixed cluster of In-Memory SE nodes and WiredTiger nodes]
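A minimal sketch of bringing up an In-Memory SE node; the engine ships with MongoDB Enterprise, and the path and size here are illustrative:

  // Start a mongod on the in-memory engine (launch command shown as a comment):
  //   mongod --storageEngine inMemory --dbpath /data/inmem --inMemorySizeGB 16
  // From the mongo shell, confirm which engine a given node is running:
  db.serverStatus().storageEngine.name   // "inMemory" here; "wiredTiger" on durable nodes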
[Diagram: a zone-sharded cluster spanning WEST and EAST regions.
  Shard 1 (TAG: WEST, IN_MEM)
  Shard 2 (TAG: WEST, WT)
  Shard 3 (TAG: EAST, IN_MEM)
  Shard 4 (TAG: EAST, WT)
Updates are local reads/writes with strong consistency; session data is geographically localized and served with in-memory engine latency.]
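A sketch of the tagging the diagram implies, using MongoDB's shard-tag (zone) helpers; shard names, database, and shard key are hypothetical:

  // Tag the in-memory shards by region:
  sh.addShardTag("shard1", "WEST_MEM")   // In-Memory SE, WEST
  sh.addShardTag("shard3", "EAST_MEM")   // In-Memory SE, EAST
  // Shard the session data on a region-prefixed key:
  sh.enableSharding("ecommerce")
  sh.shardCollection("ecommerce.sessions", { region: 1, sessionId: 1 })
  // Pin each region's sessions to its in-memory shards:
  sh.addTagRange("ecommerce.sessions",
      { region: "WEST", sessionId: MinKey }, { region: "WEST", sessionId: MaxKey }, "WEST_MEM")
  sh.addTagRange("ecommerce.sessions",
      { region: "EAST", sessionId: MinKey }, { region: "EAST", sessionId: MaxKey }, "EAST_MEM")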
In-Memory Storage Engine

Durability and Fault-Tolerance:
• Mixed replica sets allow data to be replicated from the In-Memory SE to the WiredTiger SE.
• Full high availability: automatic failover, across geographies.
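A sketch of such a mixed replica set (hostnames hypothetical): the in-memory members take the traffic, while a WiredTiger member maintains a durable, replicated copy and can never become primary:

  rs.initiate({
    _id: "sessions",
    members: [
      { _id: 0, host: "mem1.example.net:27017", priority: 2 },              // In-Memory SE
      { _id: 1, host: "mem2.example.net:27017", priority: 1 },              // In-Memory SE
      { _id: 2, host: "wt1.example.net:27017", priority: 0, hidden: true }  // WiredTiger: durable copy
    ]
  })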
Advanced Personalization
[Diagram: an Operational Unified View on MongoDB, fed by the platform databases (NoSQL/RDBMS) and dependent external data sources and integrations: CRM, ERP, PIM, partner sources (supplier databases, etc.), and a legacy mainframe.]
1. Train/re-train ML models
2. Apply the models to a real-time stream of interactions
3. Drive targeted content, recommendations, etc.
Why Spark?
Speed. By exploiting in-memory optimizations, Spark
has shown up to 100x higher performance than
MapReduce running on Hadoop.
Simplicity. Easy-to-use APIs for operating on large
datasets. This includes a collection of sophisticated
operators for transforming and manipulating
semi-structured data.
Unified Framework. Packaged with higher-level libraries,
including support for SQL queries, machine learning,
stream and graph processing. These standard libraries
increase developer productivity and can be combined to
create complex workflows.
Operational Single View + Spark Connector
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations (sketched below)
• Locality awareness to reduce data movement
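To make “predicate pushdown and in-database aggregation” concrete: instead of shipping whole collections to Spark, the connector can prepend an aggregation pipeline so MongoDB filters and projects first. An illustrative pipeline (collection and field names are hypothetical):

  // Filtering and projection run inside MongoDB, so only the needed fields
  // of matching documents ever travel to the Spark executors:
  db.interactions.aggregate([
    { $match: { channel: "mobile", ts: { $gte: ISODate("2016-01-01") } } },
    { $project: { _id: 0, customerId: 1, event: 1, ts: 1 } }
  ])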
Locality Awareness
[Diagram: the standard Spark topology: a driver program holding the SparkContext, a cluster manager, and tasks scheduled out to workers. Locality awareness schedules tasks near the MongoDB data they read, reducing data movement.]
Operational Single View + Spark Connector
Blend client data from multiple internal and external sources to drive real-time campaign optimization.
MongoDB + Spark at China Eastern
• 180M fare calculations & 1.6 billion searches per day.
• Their Oracle database peaked at 200 searches per second.
• Radically re-architected the fare engine to meet the required 100x growth in search traffic.
ETL: (Yesterday’s) Data at the Speed of Thought?
BI Connector

db.orders.aggregate([
  { $group: {
      _id: null,
      total: { $sum: "$price" }
  } }
])

SELECT SUM(price) AS total
FROM orders
Resources for You

Spark Connector
• Download: Spark Packages, GitHub
• Documentation
• Whitepaper: Turning Analytics into Real-Time Action
• Education: M233: Getting Started with Spark and MongoDB

In-Memory Storage Engine
• Download: Enterprise Server
• Documentation

BI Connector
• Download: BI Connector
• Documentation
Dylan Tong
Principal Solutions Architect
dylan.tong@mongodb.com
Q&A
Editor's Notes
  • #3 Put simply, there are two big questions that I think define and drive in-memory computing: How can we process data as fast as possible by leveraging in-memory speed at its best? Secondly, what are the possibilities if we could?
  • #4 Why do we care about speed? It matters in a lot of cases… In the financial world, it matters in areas like high-frequency trading, which is estimated to have accounted for 50-70% of trades over the past 5 years. HFT platforms transact a large number of orders at very fast speeds, and often use complex algorithms to analyze multiple markets and market conditions. Typically, the traders with the fastest execution speeds are more profitable than traders with slower execution speeds.
  • #5 Research by enterprises and analysts correlating performance, online experience and revenue is well documented. I list a few findings here from some analysts and Amazon, but there are other public studies from Google and Walmart demonstrating the same. The well-known Aberdeen Group study found that a 1-second delay in page load time equals 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions. Translated to dollars, if your business earns just $100,000 a day, this equates to $2.5M in potential sales annually; faster is better. Slow online experiences translate to lost opportunities, and we as users and consumers can relate.
  • #6 So, how fast is in-memory? Here are the rough units that best measure data access times across different storage mediums. (Click) If we normalize to 1 s, it is clear that the difference in speed between RAM and even fast SSD storage is drastic.
  • #7 Some may already be nodding their heads… RAM isn't new technology, and we're aware that the price of RAM has dropped drastically over the decades. By 2010, the sharp decline in average cost had made RAM “generally affordable” for mainstream use; however, it is far from cheap, especially when we consider the data volumes that we work with today.
  • #8 However, prices continue to fall, and an average price of $4.37/GB in 2015 makes RAM an option even at scale for greenfield projects that need the speed.
  • #9 IoT is certainly not a space short of innovation and possibilities, and the ability to scale in-memory performance only makes the possibilities more exciting. I came across an article where Audi discusses plans for their connected self-driving car: they intend to send data collected from the car's sensors back to the cloud, process it with machine learning, and send the results back to the car so it can learn and better adapt to complex situations. “…machine learning it will mean adverse weather conditions, such as snow, which can affect sensors will be less of a problem as cars will have a thorough understanding of the piece of tarmac it is traversing.” Consider the future: the scale of every vehicle on the road, and the amount of collected data that needs to be processed. In-memory computing solutions will be needed to process big data fast, especially in the world of smart cars, where information will drive important decisions in real time.
  • #10 Despite the significant increase in the amount of RAM you can put on a single server in the past couple of years, there are still limits, and the data volumes we work with today continue to grow due to the type of applications we build and the type of data sources we analyze and mine. For many organizations, the bulk of workloads are being moved to or are already in the cloud, and the ability to scale on cloud infrastructure is critical. The ability to scale out and fit large datasets in RAM across servers is critical; if not for data volume, then for the compute to support large-scale services in the cloud.
  • #11 We previously discussed how cost has dropped dramatically, and while in-memory is an option at scale, it can still be cost-prohibitive for certain projects. Consider AWS's X1 instance: it impressively provides nearly 2 TB of RAM, but at a hefty price. At a scale of 100 TB, $1.74M per year just for infrastructure isn't an option for certain projects. The question is: does the problem really require all your data to be in RAM?
  • #12 While memory is magnitudes faster than other storage mediums, the difference in relative cost is also significant. With that said, in-memory solutions shouldn't be designed around needing your enterprise data architecture, or even a single application, to run entirely in-memory. The value of the data and the problem you're solving should dictate the right medium, and an in-memory solution should integrate seamlessly into an Enterprise Data Architecture that supports all storage mediums.
  • #13 Generally, when we talk about memory we refer to what is readily available: volatile memory. If your server goes down, the data stored in that server's RAM is lost unless it has also been written to durable storage like disk. Trading off data loss for speed is, in most use cases, unacceptable. A good in-memory solution needs to provide fault tolerance and synchronize with durable storage, and, just as importantly, do so simply and reliably (which often isn't the case for some solutions, like external distributed caches).
  • #14 As fast as RAM is, it doesn't remedy bad design. More importantly, any in-memory computing technology shouldn't introduce new bottlenecks into the architecture or limit your data architecture's ability to address the biggest performance bottlenecks in your system. For instance: Does your in-memory computing solution require you to move large volumes of data around, and if so, is that creating bottlenecks in other ways? How does your solution bring data into RAM: is there an efficient caching algorithm, and is relevant data selected and filtered efficiently? How is your data processed in RAM: is there an efficient algorithm, or is it introducing inefficiencies and new performance bottlenecks by shuffling data unnecessarily across a distributed system?
  • #15 So now that we understand the challenges and core requirements around introducing in-memory technologies into your Enterprise Data Architecture, let's look at how MongoDB fits into the big picture and what it can offer in this area.
  • #19 Let's home in on the product catalog and customer session management parts of the system, where the problem is clearest. The customer session management component is key to driving customer experience features like personalization, and effective personalization needs to be based on a full picture of the customer. Realistically, in an enterprise, customer touchpoints and information are siloed across many systems, and rarely is there one place where an operational system can get everything it needs to know about the customer. Likewise, information about products will be siloed: perhaps some is stored within the eCommerce platform, but it likely has to be synchronized with external systems like PIMs and supplier systems. Additionally, a modern platform should keep availability up to date as part of product search, so problems aren't caused downstream in order fulfillment. Finally, business analysts need to analyze the same data sources. Consolidating these systems isn't realistic, so integration is necessary, and ideally it shouldn't involve heavy redundancy, for instance across operational and BI environments. Federated access to these systems isn't an option on many fronts due to performance and scale, and sufficient integration of data into the DW via traditional ETL is a huge effort and likely too slow to make happen.
  • #32 This component would be well served by MongoDB, and in fact, is one of the most common use cases for MongoDB.
  • #33 This component would be well served by MongoDB, and in fact, is one of the most common use cases for MongoDB.