#MDBLocal
Managing a Heterogeneous Stack with MongoDB and SQL
Felix Reichenbach
Solutions Architect @ MongoDB
felix@mongodb.com
MUNICH
#MDBLocal
Data technology is changing
Key question:
• How can I combine disparate technologies into a robust and feature-rich data stack to support a broad set of use cases?
#MDBLocal
Data technologies are like life
• To accomplish big things and solve difficult problems, systems must be able to talk to each other:
MongoDB and MQL are the best way to query data.
SQL is a widely adopted, powerful query language.
In the end, it is results that matter most!
#MDBLocal
Our goals today:
● Explain what the BI Connector is
● Explain how the BI Connector makes SQL access to MongoDB possible
● Teach you how to manage schema mappings
● Enable you to query MongoDB using SQL
#MDBLocal
When you leave this session:
● Use the default schema mapping to query MongoDB via SQL
● Create a custom schema mapping
● Query MongoDB via SQL using your custom schema
#MDBLocal
[Diagram: operational data layer architecture]
Producers (operational apps and systems of record): Mainframe Systems, CRM, ERP, Order Management, Supply Chain Mgmt, Human Capital Mgmt, Data Lake, Marketing Automation, Website, Social Media, Reference Data, Third-Party APIs, etc.
Data flows in via: Batch Load (API calls, batch file exports), Real-Time Data Changes, Delta Load, Change Data Capture (CDC), Extract Transform Load (ETL)
Operational Data Layer: MongoDB Cluster (document data model, distributed systems architecture, cloud | on-premises); MongoDB Change Streams; Write Back to Producer Systems (optional)
Consumers, via the Data Access API (MongoDB Native Drivers, MongoDB Connectors):
● Consuming Operational Apps and Services: internal apps, customer-facing services, and APIs for third-party consumption, across any channel
● Business Intelligence (BI) and Advanced Analytics: visualization and reporting, data analysis, artificial intelligence, machine learning and more
#MDBLocal
Co-locating operational and analytical workloads
#MDBLocal
What is the MongoDB BI Connector?
#MDBLocal
We need a relational schema to query!
● The BI Connector presents itself to clients as a MySQL server
● It maps between a tabular schema and the MongoDB document structure
● The schema can be defined in two ways:
○ Sampling
○ DRDL (Document Relational Definition Language)
#MDBLocal
What does the BI Connector provide?
● Read-only SQL access to any MongoDB standalone or replica set
● Translation of incoming SQL queries into MQL aggregation pipelines
○ The pipeline is executed on the MongoDB cluster
○ Tabular results are returned to the client via the BI Connector
● Support for:
○ ODBC
○ JDBC
○ MySQL
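The translation step described above can be illustrated with a deliberately tiny sketch. It handles only `SELECT <columns> FROM <table> [WHERE <column> = '<value>'] [LIMIT <n>]` and maps each clause to one aggregation stage. The function name `toy_translate`, its grammar, and the assumption that a table name equals the collection name are all hypothetical. mongosqld's real translator, which resolves the DRDL schema, joins, expressions and types, works differently.

```python
import re

# Hypothetical toy grammar: SELECT <cols> FROM <table> [WHERE <col> = '<value>'] [LIMIT <n>]
_TOY_SQL = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<wcol>[\w.]+)\s*=\s*'(?P<wval>[^']*)')?"
    r"(?:\s+LIMIT\s+(?P<limit>\d+))?\s*;?\s*$",
    re.IGNORECASE,
)


def toy_translate(sql):
    """Translate a tiny SQL subset into (collection, aggregation pipeline).

    Assumption: the SQL table name is used directly as the collection name,
    as for a root table in the default mapping.
    """
    m = _TOY_SQL.match(sql)
    if m is None:
        raise ValueError("unsupported SQL: " + sql)
    pipeline = []
    if m.group("wcol"):
        # WHERE col = 'value'  ->  $match on the (possibly dotted) field
        pipeline.append({"$match": {m.group("wcol"): m.group("wval")}})
    if m.group("limit"):
        # LIMIT n  ->  $limit
        pipeline.append({"$limit": int(m.group("limit"))})
    cols = [c.strip() for c in m.group("cols").split(",")]
    if cols != ["*"]:
        # Column list  ->  $project; _id is dropped unless it was selected
        project = {c: 1 for c in cols}
        if "_id" not in cols:
            project["_id"] = 0
        pipeline.append({"$project": project})
    return m.group("table"), pipeline


print(toy_translate("SELECT band FROM bands WHERE band = 'Slayer' LIMIT 1;"))
```

For this query the sketch produces a `$match`, `$limit`, `$project` pipeline against the `bands` collection. That is the same kind of result the BI Connector ships to the cluster before turning the documents back into rows.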
#MDBLocal
Maturity and Adoption
Releases (axis: Adoption): 1.x | 2.0 - 2.4 | 2.5 - 2.10 | 2.11+
● Proof of Concept
● Hyper-focus on Tableau
● Expanded SQL function support
● Performance improvements
● Improved usability
● Near 100% coverage of standard SQL functions
● Capable of displacing RDBMS systems
● Performance improvements
● Enterprise management features
● mongotranslate
● Query optimization
● Performance improvements
#MDBLocal
[Chart: MongoDB Atlas BI Connector (BIC), August 2019: SQL queries processed and full translation success rate]
Default Schema Mapping
Automatically create a tabular schema via sampling
#MDBLocal
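Schema Mapping: Via Document Sampling
1. mongosqld connects to MongoDB
2. Documents are sampled from the namespace(s)
3. The relational schema is made available to incoming connections
MongoDB Document               | Relational Schema
Database                       | Database
Collection                     | Root table
Field                          | Column
Embedded object fields         | Dotted columns in the parent table (e.g. formation.city)
Arrays (of scalars or objects) | Sub-tables, linked to the root table by _id as a foreign key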
#MDBLocal
> db.bands.findOne()
{
"_id": ObjectId("5bfabde76f280102ddf27969"),
"band": "Slayer",
"formation": {
"year": ISODate("1982-01-01T00:00:00Z"),
"city": "Los Angeles"
},
"popular_albums": [
"Show No Mercy!",
"Seasons in the Abyss",
"Haunting the Chapel",
"Divine Intervention"
],
"members": [
{
"name": "Tom Araya",
"dob": ISODate("1961-06-06T07:00:00Z"),
"primary_instrument": "Bass/Vocals"
},
{
"name": "Kerry King",
"dob": ISODate("1964-06-03T07:00:00Z"),
"primary_instrument": "Guitar"
}
]}
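The mapping rules from the previous slide can be applied to the sample document above in a short sketch. It is an illustration under simplifying assumptions, not mongosqld's sampling code. Embedded objects become dotted columns in the root table. Each top-level array becomes a sub-table named `<collection>_<field>` that carries the parent `_id` and a `<field>_idx` column, matching the mongodrdl output shown later. `derive_tables` and `_flatten` are hypothetical helpers, nested arrays are not handled, and ObjectIds and dates are kept as plain strings.

```python
def _flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names (e.g. formation.city)."""
    cols = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            cols.update(_flatten(value, name + "."))
        else:
            cols[name] = value
    return cols


def derive_tables(collection, doc):
    """Split one document into a root-table row and sub-table rows (hypothetical helper)."""
    root = {}
    tables = {}
    for key, value in doc.items():
        if isinstance(value, list):
            rows = []
            for idx, item in enumerate(value):
                row = {"_id": doc["_id"]}  # parent key, used as the join column
                if isinstance(item, dict):
                    row.update(_flatten(item, key + "."))  # e.g. members.name
                else:
                    row[key] = item  # scalar element, e.g. one album title
                row[key + "_idx"] = idx  # position of the element in the array
                rows.append(row)
            tables[collection + "_" + key] = rows
        elif isinstance(value, dict):
            root.update(_flatten(value, key + "."))
        else:
            root[key] = value
    tables[collection] = [root]
    return tables


slayer = {
    "_id": "5bfabde76f280102ddf27969",
    "band": "Slayer",
    "formation": {"year": "1982-01-01T00:00:00Z", "city": "Los Angeles"},
    "popular_albums": ["Show No Mercy!", "Seasons in the Abyss",
                       "Haunting the Chapel", "Divine Intervention"],
    "members": [
        {"name": "Tom Araya", "dob": "1961-06-06T07:00:00Z", "primary_instrument": "Bass/Vocals"},
        {"name": "Kerry King", "dob": "1964-06-03T07:00:00Z", "primary_instrument": "Guitar"},
    ],
}
tables = derive_tables("bands", slayer)
print(sorted(tables))
```

The table names line up with the `show tables` output on the next slide: `bands`, `bands_members` and `bands_popular_albums`.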
#MDBLocal
mysql> show tables;
+----------------------+
| Tables_in_sql        |
+----------------------+
| bands                |
| bands_members        |
| bands_popular_albums |
+----------------------+
mysql> SELECT * FROM bands
    -> JOIN bands_popular_albums ON bands._id = bands_popular_albums._id;
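Every sub-table row carries its parent's `_id`, so the JOIN above pairs the band with each of its albums. The following plain-Python sketch shows that inner equi-join on hand-written rows, with the root-table columns trimmed for brevity. It illustrates the SQL semantics only and says nothing about how mongosqld executes the join. `inner_join` is a hypothetical helper.

```python
def inner_join(left_name, left, right_name, right, key):
    """Inner equi-join of two lists of row dicts on one shared column."""
    # Index the right-hand rows by join key (a simple hash join)
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # Qualify column names with the table name, like bands._id in the query
            merged = {f"{left_name}.{col}": val for col, val in lrow.items()}
            merged.update({f"{right_name}.{col}": val for col, val in rrow.items()})
            joined.append(merged)
    return joined


band_id = "5bfabde76f280102ddf27969"
bands = [{"_id": band_id, "band": "Slayer"}]  # trimmed root-table row
albums = ["Show No Mercy!", "Seasons in the Abyss", "Haunting the Chapel", "Divine Intervention"]
bands_popular_albums = [
    {"_id": band_id, "popular_albums": title, "popular_albums_idx": i}
    for i, title in enumerate(albums)
]
rows = inner_join("bands", bands, "bands_popular_albums", bands_popular_albums, "_id")
for row in rows:
    print(row["bands.band"], "|", row["bands_popular_albums.popular_albums"])
```

One band document with four albums yields four joined rows, one per array element.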
Demo
I don’t trust slides.
Show me the real thing!
Custom Schema Mapping
Create a tabular schema to meet specific needs of users
#MDBLocal
Document Relational Definition Language (DRDL)
● Often, we need to adjust the default schema that the BIC's schema creation logic generates
● DRDL files are generated by mongodrdl
● mongodrdl uses the BIC's default sampling logic
● DRDL defines both names and data types
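To make "defines both names and data types" concrete, the following sketch treats a DRDL table's `columns` list as plain Python dicts and applies it to a document. The column keys (`Name`, `MongoType`, `SqlName`, `SqlType`) are the ones mongodrdl emits. Each `Name` is read as a dotted path into the document, and each `SqlName` is the column a SQL client sees. `apply_columns` is hypothetical, and its type handling covers only `varchar` and `int`. The `formation.city` to `city` rename is chosen purely for illustration; the default output keeps `formation.city`.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'formation.city'; return None if missing."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# Hypothetical conversions for a small subset of SqlType values; others pass through unchanged.
_CASTS = {"varchar": str, "int": int}


def apply_columns(columns, doc):
    """Project a document onto the SQL columns described by a DRDL 'columns' list."""
    row = {}
    for col in columns:
        value = get_path(doc, col["Name"])  # where the value lives in the document
        cast = _CASTS.get(col["SqlType"])  # how it is typed for SQL
        row[col["SqlName"]] = cast(value) if (cast and value is not None) else value
    return row


columns = [
    {"Name": "band", "MongoType": "string", "SqlName": "band", "SqlType": "varchar"},
    {"Name": "formation.city", "MongoType": "string", "SqlName": "city", "SqlType": "varchar"},
]
doc = {"_id": "5bfabde76f280102ddf27969", "band": "Slayer",
       "formation": {"year": "1982-01-01T00:00:00Z", "city": "Los Angeles"}}
print(apply_columns(columns, doc))  # {'band': 'Slayer', 'city': 'Los Angeles'}
```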
#MDBLocal
Schema Mapping: Via Custom Schema
1. Generate mongodrdl output
2. Edit output to create desired schema
3. Import schema to MongoDB
4. Provide unique name
5. Start mongosqld with named schema
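The five steps can be scripted. The sketch below only assembles the command lines used in this deck, so nothing is executed; they could later be handed to `subprocess.run`. The helper name and its parameters are hypothetical. The flags are exactly the ones from the slides, `<uniqueID>` is kept as the slides' placeholder for the uploaded schema's ID, and step 2 stays a manual edit.

```python
def custom_schema_steps(db, collection, generated_drdl, edited_drdl, schema_name,
                        schema_source="bic", schema_id="<uniqueID>"):
    """Return (step, argv, stdout_file) tuples; a dry run, nothing is executed."""
    return [
        ("Generate mongodrdl output",
         ["mongodrdl", "-d", db, "-c", collection], generated_drdl),
        # Step 2 is manual: copy generated_drdl to edited_drdl and edit it.
        ("Edit output to create desired schema", None, None),
        ("Import schema to MongoDB",
         ["mongodrdl", "upload", "--drdl", edited_drdl, "--schemaSource", schema_source], None),
        ("Provide unique name",
         ["mongodrdl", "name-schema", "--name", schema_name,
          "--schemaSource", schema_source, "--schema", schema_id], None),
        ("Start mongosqld with named schema",
         ["mongosqld", "--schemaSource=" + schema_source,
          "--schemaName=" + schema_name, "--schemaMode=custom"], None),
    ]


steps = custom_schema_steps("sql", "bands", "bands.drdl", "bands_flat.drdl", "bands_flat")
for number, (step, argv, stdout_file) in enumerate(steps, start=1):
    if argv is None:
        print(f"{number}. {step}: (manual)")
        continue
    # Step 1 writes its DRDL to stdout, which the slides redirect into a file
    command = " ".join(argv) + (f" > {stdout_file}" if stdout_file else "")
    print(f"{number}. {step}: {command}")
```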
#MDBLocal
Generate DRDL Output using mongodrdl
mongodrdl -d sql -c bands > bands.drdl
Step 1 of 5: Generate mongodrdl output
schema:
- db: sql
  tables:
  # Root table: bands
  - table: bands
    collection: bands
    pipeline: []
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: band
      MongoType: string
      SqlName: band
      SqlType: varchar
    - Name: formation.city
      MongoType: string
      SqlName: formation.city
      SqlType: varchar
    - Name: formation.year
      MongoType: date
      SqlName: formation.year
      SqlType: timestamp
  # Sub-table: bands_members
  - table: bands_members
    collection: bands
    pipeline:
    - $unwind:
        includeArrayIndex: members_idx
        path: $members
        preserveNullAndEmptyArrays: false
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: members.dob
      MongoType: date
      SqlName: members.dob
      SqlType: timestamp
    - Name: members.name
      MongoType: string
      SqlName: members.name
      SqlType: varchar
    - Name: members.primary_instrument
      MongoType: string
      SqlName: members.primary_instrument
      SqlType: varchar
    - Name: members_idx
      MongoType: int
      SqlName: members_idx
      SqlType: int
  # Sub-table: bands_popular_albums
  - table: bands_popular_albums
    collection: bands
    pipeline:
    - $unwind:
        includeArrayIndex: popular_albums_idx
        path: $popular_albums
        preserveNullAndEmptyArrays: false
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: popular_albums
      MongoType: string
      SqlName: popular_albums
      SqlType: varchar
    - Name: popular_albums_idx
      MongoType: int
      SqlName: popular_albums_idx
      SqlType: int
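Each sub-table above is defined by a single `$unwind` stage with `includeArrayIndex` and `preserveNullAndEmptyArrays: false`. The following function is a simplified re-implementation of the documented `$unwind` behaviour, for illustration only. It handles top-level fields only, and `unwind` is a hypothetical name, not a driver API. The function emits one output document per array element and stores the element's position under the `includeArrayIndex` name. By default it drops documents whose array is missing, null or empty.

```python
def unwind(docs, path, include_array_index=None, preserve_null_and_empty_arrays=False):
    """Simplified $unwind for a top-level field (illustration only)."""
    field = path.lstrip("$")
    out = []
    for doc in docs:
        value = doc.get(field)
        if isinstance(value, list) and value:
            # One output document per element; the element replaces the array
            for idx, element in enumerate(value):
                new_doc = dict(doc)
                new_doc[field] = element
                if include_array_index:
                    new_doc[include_array_index] = idx
                out.append(new_doc)
        elif value is None or value == []:
            # Missing, null or empty: dropped unless preserveNullAndEmptyArrays is true
            if preserve_null_and_empty_arrays:
                new_doc = dict(doc)
                if value == []:
                    del new_doc[field]  # an empty array is omitted from the output
                if include_array_index:
                    new_doc[include_array_index] = None
                out.append(new_doc)
        else:
            # A non-array value is treated as a one-element array; its index is null
            new_doc = dict(doc)
            if include_array_index:
                new_doc[include_array_index] = None
            out.append(new_doc)
    return out


docs = [{"_id": 1, "members": [{"name": "Tom Araya"}, {"name": "Kerry King"}]},
        {"_id": 2, "members": []}]
for row in unwind(docs, "$members", include_array_index="members_idx"):
    print(row)
```

With `preserveNullAndEmptyArrays: false`, as in the generated DRDL, a band without members contributes no rows to `bands_members`.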
#MDBLocal
Current vs Desired
Current:
mysql> show tables;
+----------------------+
| Tables_in_sql        |
+----------------------+
| bands                |
| bands_members        |
| bands_popular_albums |
+----------------------+
Desired:
mysql> show tables;
+---------------+
| Tables_in_sql |
+---------------+
| bands         |
+---------------+
Step 2 of 5: Edit output to create desired schema
#MDBLocal
Define new document structure with aggregation
Use aggregation to create the document structure:
[{$unwind: {
    path: "$popular_albums" }},
 {$project: {
    band: "$band",
    formed: "$formation.year",
    city: "$formation.city",
    popular_album: "$popular_albums" }}
]
Add the pipeline to the DRDL:
schema:
- db: sql
  tables:
  - table: bands
    collection: bands
    pipeline: [{$unwind: {path: "$popular_albums"}},
               {$project: {
                 band: "$band",
                 formed: "$formation.year",
                 city: "$formation.city",
                 popular_album: "$popular_albums"
               }}]
Step 2 of 5: Edit output to create desired schema
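The custom pipeline can be sanity-checked without a server. This sketch evaluates its two stages by hand on the sample Slayer document, with dates kept as strings. It yields one row per album containing the projected fields `band`, `formed`, `city` and `popular_album`, plus `_id`, which `$project` keeps unless it is excluded explicitly. `flatten_bands` and `get_path` are hypothetical helpers, not part of any MongoDB tool.

```python
_MISSING = object()


def get_path(doc, dotted):
    """Resolve a dotted path like 'formation.year'; return _MISSING if absent."""
    for part in dotted.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return _MISSING
        doc = doc[part]
    return doc


# The $project stage from the slide: output field -> source path
PROJECT = {
    "band": "band",
    "formed": "formation.year",
    "city": "formation.city",
    "popular_album": "popular_albums",
}


def flatten_bands(docs):
    """Evaluate [$unwind popular_albums, $project PROJECT] on a list of documents."""
    rows = []
    for doc in docs:
        albums = doc.get("popular_albums")
        if albums is None:
            continue  # missing/null: $unwind drops the document
        if not isinstance(albums, list):
            albums = [albums]  # a non-array value behaves like a one-element array
        for album in albums:  # empty arrays yield no rows
            unwound = dict(doc, popular_albums=album)
            row = {"_id": unwound["_id"]}  # $project keeps _id unless excluded
            for out_field, source in PROJECT.items():
                value = get_path(unwound, source)
                if value is not _MISSING:  # missing fields are omitted, not null
                    row[out_field] = value
            rows.append(row)
    return rows


slayer = {
    "_id": "5bfabde76f280102ddf27969",
    "band": "Slayer",
    "formation": {"year": "1982-01-01T00:00:00Z", "city": "Los Angeles"},
    "popular_albums": ["Show No Mercy!", "Seasons in the Abyss",
                       "Haunting the Chapel", "Divine Intervention"],
}
for row in flatten_bands([slayer]):
    print(row)
```

Because the pipeline itself flattens the document, a single `bands` table is enough. This is the "Desired" state from the previous slide.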
#MDBLocal
Edit table/field names and data types
schema:
- db: sql
  tables:
  - table: bands
    collection: bands
    pipeline: [{$unwind: {
                 path: "$popular_albums"
               }}, {$project: {
                 band: "$band",
                 formed: "$formation.year",
                 city: "$formation.city",
                 popular_album: "$popular_albums"
               }}]
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: objectid
    - Name: band
      MongoType: string
      SqlName: band
      SqlType: varchar
    - Name: city
      MongoType: string
      SqlName: city
      SqlType: timestamp
    - Name: popular_album
      MongoType: string
      SqlName: album
      SqlType: varchar
    - Name: formed
      MongoType: date
      SqlName: formed
      SqlType: timestamp
Step 2 of 5: Edit output to create desired schema
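The `columns` entries are edited by hand, so a quick consistency check before uploading can catch slips. The helper below is hypothetical and not part of mongodrdl. Its table of expected pairs contains only the MongoType/SqlType pairings that appear in the default mongodrdl output earlier in this deck. It reports any column whose pairing differs from that table for review. MongoTypes it does not know are skipped rather than judged.

```python
# MongoType -> SqlType pairs as they appear in the default mongodrdl output in this deck
EXPECTED = {"bson.ObjectId": "objectid", "string": "varchar", "date": "timestamp", "int": "int"}


def review_columns(columns):
    """Return (Name, MongoType, SqlType) for columns whose pairing differs from EXPECTED."""
    flagged = []
    for col in columns:
        expected = EXPECTED.get(col["MongoType"])
        if expected is not None and expected != col["SqlType"]:
            flagged.append((col["Name"], col["MongoType"], col["SqlType"]))
    return flagged


columns = [  # the edited bands_flat columns from the slide
    {"Name": "_id", "MongoType": "bson.ObjectId", "SqlName": "_id", "SqlType": "objectid"},
    {"Name": "band", "MongoType": "string", "SqlName": "band", "SqlType": "varchar"},
    {"Name": "city", "MongoType": "string", "SqlName": "city", "SqlType": "timestamp"},
    {"Name": "popular_album", "MongoType": "string", "SqlName": "album", "SqlType": "varchar"},
    {"Name": "formed", "MongoType": "date", "SqlName": "formed", "SqlType": "timestamp"},
]
print(review_columns(columns))  # [('city', 'string', 'timestamp')]
```

On the columns above, the check reports `city`, whose string values are exposed as `timestamp`. Whether that is intended is up to the schema author.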
#MDBLocal
Import schema and provide name
mongodrdl upload --drdl bands_flat.drdl --schemaSource bic
Step 3 of 5: Import schema to MongoDB
#MDBLocal
Import schema and provide name
mongodrdl name-schema --name bands_flat --schemaSource bic --schema <uniqueID>
Step 4 of 5: Provide unique name
#MDBLocal
Start mongosqld in “custom mode”
mongosqld --schemaSource=bic --schemaName=bands_flat --schemaMode=custom
Step 5 of 5: Start mongosqld with named schema
#MDBLocal
Updated schema:
mysql> describe bands;
+-------+----------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------------+------+------+---------+-------+
| _id | varchar(24) | YES | PRI | NULL | |
| album | varchar(65535) | YES | | NULL | |
| band | varchar(65535) | YES | | NULL | |
| city | datetime | YES | | NULL | |
+-------+----------------+------+------+---------+-------+
Demo
The easy stuff always works.
Show me something more complicated!
#MDBLocal
Using mongotranslate:
● Input
○ Schema: DRDL file
○ Query: string or file
● Output
○ Aggregation pipeline
#MDBLocal
mongotranslate --schema=kickoff_net_buckets.drdl --queryFile=returnPlayer.sql --dbName=nfl
SELECT AVG(_id) AS avg_bucket, COUNT(*) AS count, `returner.name` AS name
FROM kickoff_net_buckets_returner
GROUP BY name
ORDER BY count DESC
LIMIT 10;
[
{
"$group": {
"_id": {
"group_key_0": "$returner.name"
},
"avg(nfl_DOT_kickoff_net_buckets_returner_DOT__id)": {
"$avg": "$_id"
},
"count(*)": {
"$sum": NumberInt("1")
}
}
},
{
"$sort": {
"count(*)": NumberInt("-1")
}
},
{
"$limit": NumberLong("10")
},
{
"$project": {
"nfl_DOT_avg(nfl_DOT_kickoff_net_buckets_returner_DOT__id)": "$avg(nfl_DOT_kickoff_net_buckets_returner_DOT__id)",
"count(*)": "$count(*)",
"nfl_DOT_kickoff_net_buckets_returner_DOT_returner_DOT_name": "$_id.group_key_0",
"_id": NumberInt("0")
}
}
]
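Evaluating the same stages by hand can make the generated pipeline easier to read. The sketch below runs `$group` (average of `_id` and count per `returner.name`), `$sort` by count descending and `$limit` over a few rows, ending with the three output columns that `$project` selects. The input rows are hypothetical and invented for illustration, because the nfl dataset is not shown in this deck. The output keys use the SQL aliases rather than the mangled names mongotranslate generates, and `run_translated_pipeline` is not a real tool.

```python
from collections import defaultdict


def run_translated_pipeline(rows, limit=10):
    """Hand-evaluate the $group / $sort / $limit / $project stages shown above."""
    # $group on returner.name, accumulating $avg of _id and $sum: 1 (the COUNT(*))
    groups = defaultdict(list)
    for row in rows:
        name = (row.get("returner") or {}).get("name")
        groups[name].append(row["_id"])
    grouped = [
        {"avg_bucket": sum(ids) / len(ids), "count": len(ids), "name": name}
        for name, ids in groups.items()
    ]
    # $sort by count(*) descending; ties keep insertion order here, MongoDB makes no such promise
    grouped.sort(key=lambda g: g["count"], reverse=True)
    # $limit; the dicts above already have the three columns $project keeps
    return grouped[:limit]


sample = [  # hypothetical rows, not from the nfl dataset
    {"_id": 3, "returner": {"name": "Returner A"}},
    {"_id": 5, "returner": {"name": "Returner A"}},
    {"_id": 2, "returner": {"name": "Returner B"}},
]
for result in run_translated_pipeline(sample):
    print(result)
```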
#MDBLocal
Revisiting our goals today:
● Explain what the BI Connector is
● Explain how the BI Connector makes SQL access to MongoDB possible
● Teach you how to manage schema mappings
#MDBLocal
Now you can:
● Use the default schema mapping to query MongoDB via SQL
● Create a custom schema mapping
● Query MongoDB via SQL using your custom schema
Go and download the BI Connector!
https://downloads.mongodb.com
THANK YOU
MongoDB .local Munich 2019: Managing a Heterogeneous Stack with MongoDB & SQL