Global distribution Elastic scale out Guaranteed low latency Comprehensive SLAs
Azure Cosmos DB
Key-Value Column-Family Graph Documents
A multi-model, globally-distributed database service
Tunable Consistency
SQL
DocumentDB
Azure Tables
Global Distribution
Worldwide presence
Automatic multi-region replication
Multi-homing APIs
Manual and automatic failovers
Elastic Scale-out
Partition management is automatically taken care of for you
Independently scale storage and throughput
Scale storage from Gigabytes to Petabytes
Scale throughput from hundreds to hundreds of millions of requests per second
Dial up/down throughput and provision only what is needed
[Chart: hourly provisioned throughput (requests/sec) over time, Nov 2016 to Dec 2016; y-axis from 2,000,000 to 12,000,000; "Black Friday" annotated]
Guaranteed low latency
Globally distributed with requests served from local region
Write-optimized, latch-free database
Automatic Indexing
Five Consistency Models
Helps navigate Brewer's CAP theorem
Intuitive Programming
• Tunable well-defined consistency levels
• Override on per-request basis
Clear PACELC tradeoffs
• Partition – Availability vs Consistency
• Else – Latency vs Consistency
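The per-request override described above can be sketched in a few lines of Python. This is a minimal illustration, not the Cosmos DB SDK: the `ConsistencyLevel` enum and the `effective_level` helper are hypothetical names, and the rule that an override may only weaken the account default is stated here as an assumption.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """The five consistency levels, ordered from weakest to strongest."""
    EVENTUAL = 0
    CONSISTENT_PREFIX = 1
    SESSION = 2
    BOUNDED_STALENESS = 3
    STRONG = 4


def effective_level(account_default, request_override=None):
    """Return the consistency level used for a single request.

    Assumption: a per-request override may only relax (weaken) the
    account default; asking for a stronger level is rejected.
    """
    if request_override is None:
        return account_default
    if request_override > account_default:
        raise ValueError(
            f"{request_override.name} is stronger than the account "
            f"default {account_default.name}"
        )
    return request_override


# A read that trades consistency for latency on one request only.
level = effective_level(ConsistencyLevel.SESSION, ConsistencyLevel.EVENTUAL)
print(level.name)
```

Ordering the levels as integers makes the PACELC trade-off explicit: moving down the scale buys latency and availability at the cost of consistency.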
Comprehensive SLAs
99.99% availability
Durable, quorum-committed writes
Latency, consistency, and throughput are also covered by financially backed SLAs
Made possible by a highly redundant architecture
SLA
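To put the 99.99% availability figure in concrete terms, the short calculation below converts an availability percentage into a downtime budget. It is a sketch under one assumption: a 30-day measurement window, which is chosen here for illustration and is not taken from the slides. The function name is hypothetical.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Downtime budget, in minutes, for a given availability percentage.

    Assumption: the SLA is measured over a window of `days` days.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# 99.99% over 30 days leaves a budget of about 4.32 minutes.
print(round(allowed_downtime_minutes(99.99), 2))
```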
Managed Open Source Analytics for the cloud with a 99.9% SLA.
100% Open Source Hortonworks data platform
Clusters up and running in minutes
63% lower TCO than deploying your own Hadoop on-premises
Separation of compute and storage lets you scale clusters independently to reduce costs
Multi Region Availability
Available in more than 25 regions worldwide
Most recently launched in the US West 2 and UK regions
Available in China, Europe and US Gov clouds
Security and Compliance to enable OSS for Enterprises
Perimeter Level Security
Virtual Networks
Network Security Groups (firewalls)
Authentication
Azure Active Directory
Kerberos authentication
Authorization
Apache Ranger
RBAC for Admin
POSIX ACLs for Data Plane
Data Security
Server-side encryption at rest
HTTPS/TLS in transit
Developer ecosystem
Plugins for HDI available for most popular IDEs for agile development and debugging
Rich support for powerful notebooks used by data scientists
Develop in C#, deploy on Linux in Java via the HDI-developed SCP.NET technology
Easy ISV integration as you deploy the cluster
Reference Big Data Analytics Pipeline
Analytics modes: Realtime Analytics, Batch Analytics, Interactive Analytics
Pipeline stages:
• Data Sources
• Ingest
• Prepare (normalize, clean, etc.)
• Analyze (stat analysis, ML, etc.)
• Publish (for programmatic consumption, BI/visualization)
• Consume (alerts, operational stats, insights)
Components in the diagram:
• Machine Learning (Spark + Azure ML): failure and RCA predictions
• Realtime Machine Learning (anomaly detection)
• HDI + ISVs: OLAP for data warehousing
• HDI custom ETL: aggregate / partition
• Hive, Spark processing (big data processing)
• Big Data Storage
• Big Data Storage (shared with field ops, customers, MIS, and engineers)
• Stores: Azure Data Lake Store, Cosmos DB, Azure Blob Storage
• Power BI dashboard
Real-Time Analytics and Internet of Things
Azure IoT Hub
Apache Storm on Azure HDInsight
Azure Cosmos DB (Hot): telemetry and device state
High-fidelity events
Azure Web Jobs (change feed processor)
Azure Logic Apps
Latest state
Aggregated + Archived Events (Cold)
Power BI
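The hot and cold paths in this architecture can be illustrated with a small, self-contained sketch. It does not use the Cosmos DB change feed processor library; `process_change_feed`, the event fields, and the device names are all hypothetical, and the list of events stands in for an ordered change feed.

```python
from collections import Counter


def process_change_feed(events):
    """Consume an ordered stream of events.

    Keeps the most recent event per device (the "latest state" view)
    and a per-device event count (a simple aggregate for the cold path).
    """
    latest_state = {}
    events_per_device = Counter()
    for event in events:
        device = event["device_id"]
        latest_state[device] = event      # last write wins
        events_per_device[device] += 1    # aggregate for archival
    return latest_state, events_per_device


# Hypothetical telemetry, in the order the feed would deliver it.
EVENTS = [
    {"device_id": "car-1", "seq": 1, "speed_kmh": 62},
    {"device_id": "car-2", "seq": 1, "speed_kmh": 48},
    {"device_id": "car-1", "seq": 2, "speed_kmh": 70},
]

latest, counts = process_change_feed(EVENTS)
print(latest["car-1"]["speed_kmh"], counts["car-1"])
```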
Toyota drives connected car push forward with Azure Cosmos DB and Apache Storm on HDInsight
Business need
• Need to ingest massive volumes of diagnostic data from vehicles and take real-time actions as part of the connected car platform
• Management and operations of database infrastructure to handle exponential growth of data
Key benefits
• DocumentDB can scale elastically without the operational overhead of MongoDB
• Perform fast queries over events to deliver safety, diagnostic, and remote services to Toyota customers
Flight information
Global safety alerts
Weather
Data Science Scenarios
Device Notifications
Web / REST API
Azure Cosmos DB
Scale-out Computation
Scale-out Database
Spark connector for Azure Cosmos DB with HDInsight
Distributed Aggregations and Analytics
Spark connector for Azure Cosmos DB with HDInsight
Pushdown Predicate Filtering
Data Science Scenarios
Predicate: {city:SEA}
[Diagram: document index tree with nodes locations, headquarter, exports; country: Germany, France; city: Seattle, Paris, Moscow, Athens; Belgium]
{city:SEA, dst: POR, ...},
{city:SEA, dst: JFK, ...},
{city:SEA, dst: SFO, ...},
{city:SEA, dst: YVR, ...},
{city:SEA, dst: YUL, ...},
...
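The idea behind predicate pushdown is that the filter runs at the data source, so only matching documents cross the wire. The sketch below simulates that in plain Python and does not use Spark or the connector. `DocumentSource` and both query helpers are hypothetical, and the non-SEA records are invented for contrast.

```python
# Five SEA records mirror the slide; the rest are hypothetical non-matches.
FLIGHTS = [
    {"city": "SEA", "dst": "POR"},
    {"city": "SEA", "dst": "JFK"},
    {"city": "SEA", "dst": "SFO"},
    {"city": "SEA", "dst": "YVR"},
    {"city": "SEA", "dst": "YUL"},
    {"city": "JFK", "dst": "SEA"},
    {"city": "SFO", "dst": "YVR"},
    {"city": "YUL", "dst": "JFK"},
]


class DocumentSource:
    """Stand-in for a remote store that counts documents it sends out."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.transferred = 0

    def scan(self, predicate=None):
        # With a predicate, filtering happens here, at the source.
        matched = [d for d in self.docs if predicate is None or predicate(d)]
        self.transferred += len(matched)
        return matched


def query_without_pushdown(source, city):
    # Fetch everything, then filter on the compute side.
    return [d for d in source.scan() if d["city"] == city]


def query_with_pushdown(source, city):
    # Hand the predicate to the source so only matches are transferred.
    return source.scan(lambda d: d["city"] == city)


naive, pushed = DocumentSource(FLIGHTS), DocumentSource(FLIGHTS)
same = query_without_pushdown(naive, "SEA") == query_with_pushdown(pushed, "SEA")
print(same, naive.transferred, pushed.transferred)
```

Both queries return the same rows; the difference is how many documents the source had to ship to produce them.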
Spark connector for Azure Cosmos DB with HDInsight
Updateable Columns
Flight information
Data Science Scenarios
Device Notifications
Web / REST API
{
tripid: "100100",
delay: -5,
time: "01:00:01"
}
{
tripid: "100100",
delay: -30,
time: "01:00:01"
}
{delay:-30}
{delay:-30}
{delay:-30}
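The update shown above, where the record for trip 100100 changes its delay from -5 to -30, behaves like an upsert keyed on `tripid`. The following is a minimal sketch of that behavior using an in-memory dictionary. It is not the connector's API, and the `upsert` helper is a hypothetical name.

```python
def upsert(store, doc, key="tripid"):
    """Insert doc, or merge it into the existing record with the same key."""
    current = store.get(doc[key], {})
    store[doc[key]] = {**current, **doc}  # new field values win
    return store[doc[key]]


store = {}
upsert(store, {"tripid": "100100", "delay": -5, "time": "01:00:01"})
upsert(store, {"tripid": "100100", "delay": -30, "time": "01:00:01"})

# Still one record for the trip, now carrying the updated delay.
print(len(store), store["100100"]["delay"])
```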
Get started with Azure Cosmos DB
Get started with Hadoop on HDI
HDInsight edX Courses
HDInsight Channel9 Videos
HDI Spark + Cosmos DB Tutorial
AskOSSNoSQL@microsoft.com
Build 2017 - P4010 - A lap around Azure HDInsight and Cosmos DB Open Source Analytics + NoSQL