ClustrixDB
@ Samsung Cloud
Kwangbock Lee
Lead Database Architect
Samsung Electronics
Agenda
1. Introduction of Samsung Cloud Platform
2. Requirements & Features
3. Samsung Cloud + ClustrixDB Journey
4. Issues & Enhancements
5. Wrap Up
User Benefit
Backup and restore data and settings
Your photos on multiple devices any time
15 GB of free storage, Upgrade for more
- Home screen, App data, Contact, Messages, Device settings, Music, Documents, etc.
- Sync photos, videos, notes using native applications across Samsung devices
- Premium Plans
. Korea, 29 countries in EU (’16, Nov)
. US models (excl. VZW, ATT; ’17, Feb)
. Brazil (unlocked devices; ’18, Mar)
* No. 1 request from customers
Figures of Samsung Cloud
Members: Hundreds of millions
Daily Requests: Tens of billions
Storage: Hundreds of PiB
ClustrixDB
Cassandra
MySQL
DynamoDB
Samsung Cloud Architecture
Layers: Access Layer, Application Layer, Data Processing Layer, Data Layer
Components: API Gateway, Service Modules, Basic Modules, User Modules, Backend Modules
Databases: ClustrixDB, Cassandra
User Architecture – Before Migration
Shard Info
Shard #1: Master, Slaves
Shard #2: Master, Slaves
…
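The sharded layout above means every request first has to resolve which shard owns the user. A minimal sketch of that lookup follows. The host names, the modulo mapping, and the read-from-slave policy are all assumptions for illustration; the slides do not say how Shard Info maps users to shards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    master: str
    slaves: tuple


# Hypothetical shard directory ("Shard Info"); names are illustrative only.
SHARD_INFO = {
    0: Shard("shard1-master", ("shard1-slave-a", "shard1-slave-b")),
    1: Shard("shard2-master", ("shard2-slave-a", "shard2-slave-b")),
}


def route(user_id: int, write: bool) -> str:
    """Return the host that should serve this user's request."""
    # Assumed mapping: user id modulo the number of shards.
    shard = SHARD_INFO[user_id % len(SHARD_INFO)]
    if write:
        return shard.master  # writes always go to the shard's master
    # Assumed policy: spread reads across the shard's slaves.
    return shard.slaves[user_id % len(shard.slaves)]
```

This routing code, and the data movement needed whenever a shard is added, is the kind of application-side logic the later slides describe as no longer needed.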
Key Challenges
● RDBMS Scaling Strategy
○ Sharding Overhead
○ Migration Overhead
○ Additional code for both sharding & migration
● High Availability
● Analytic Query
○ Need to run the query on every shard DB and merge the results.
● Online Schema Change
● Online Backup / Restore
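The analytic-query challenge can be sketched as a scatter-gather: the same aggregate runs on each shard, and the application merges the partial results. The row shape and the `country` column below are hypothetical, and in-memory lists stand in for the per-shard databases.

```python
from collections import Counter


def count_by_country(shard_rows):
    """Per-shard step: the equivalent of a GROUP BY country / COUNT(*) on one shard."""
    return Counter(row["country"] for row in shard_rows)


def merge_counts(partials):
    """Application-side step: add up the partial counts from every shard."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Two in-memory "shards" standing in for separate shard databases.
shards = [
    [{"country": "KR"}, {"country": "US"}],
    [{"country": "KR"}, {"country": "BR"}, {"country": "KR"}],
]
merged = merge_counts(count_by_country(rows) for rows in shards)
```

Counts merge by simple addition. Aggregates such as averages or distinct counts need extra bookkeeping in the merge step, which is part of the overhead this slide refers to.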
Requirements & Clustrix Features
Requirements
● Scalability, No more Sharding!
● ACID Compliant
● MySQL Compatible
● Fault Tolerance, No SPOF!
● OLTP and Operational Analytics
● Online Schema Change
● Online Backup / Restore
Clustrix Features
● Scalable
● High-Volume, High Concurrent OLTP
● Automatic Data Distribution
● Distributed Query Execution
● Fault-Tolerant
● Flexible Deployment Options
● MySQL Compatible
● Easy to Migrate from MySQL
● Fast Backup and Restore
Key Features of ClustrixDB
Scalability
● Scalable Architecture
○ Can scale linearly as nodes are added
○ Automatically distributes both data and query execution to scale
○ Flex Up & Flex Down
● Rebalancer
○ Automatically manages the distribution of data across the cluster
○ Addresses read/write imbalance across nodes and zones (ranking replica)
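To make "automatic data distribution" and "Flex Up" concrete, here is a toy model: rows hash to a fixed number of slices, and slices are spread over the nodes. The CRC32 hash, the slice count, and the round-robin placement are assumptions for illustration only; this is not ClustrixDB's actual rebalancer algorithm.

```python
import zlib


def slice_of(key: str, num_slices: int) -> int:
    """Map a row key to a slice with a deterministic hash (assumed: CRC32)."""
    return zlib.crc32(key.encode()) % num_slices


def place_slices(num_slices: int, nodes: list) -> dict:
    """Assign slices to nodes round-robin so each node holds a near-equal share."""
    return {s: nodes[s % len(nodes)] for s in range(num_slices)}


def slices_per_node(placement: dict) -> dict:
    """Count how many slices each node holds."""
    counts = {}
    for node in placement.values():
        counts[node] = counts.get(node, 0) + 1
    return counts


# Flex Up: adding a fourth node lowers every node's share from 4 slices to 3.
before = slices_per_node(place_slices(12, ["n1", "n2", "n3"]))
after = slices_per_node(place_slices(12, ["n1", "n2", "n3", "n4"]))
```

A real rebalancer would also try to move as little data as possible when a node joins; the naive re-placement here ignores that cost.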
Key Features of ClustrixDB
Fault-Tolerant
● Built-in fault tolerance endures a single node failure and automatically maintains 2 copies of all data
● Replication
● Deploying Across Zones
○ AWS Availability Zones (requires 3 AZ)
● MAX_FAILURES
○ Number of failures that can occur simultaneously
○ ALTER CLUSTER SET MAX_FAILURES = number of simultaneous node failures
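The relationship between MAX_FAILURES and the number of data copies can be written as simple arithmetic. The sketch assumes replicas = MAX_FAILURES + 1, which matches the pairs shown on the Current Architecture slide (MAX_FAILURES = 1 with REPLICAS = 2, MAX_FAILURES = 2 with REPLICAS = 3). The function names and the storage estimate are illustrative.

```python
def replicas_needed(max_failures: int) -> int:
    """Copies of each piece of data needed to survive `max_failures` simultaneous failures."""
    if max_failures < 0:
        raise ValueError("max_failures must be non-negative")
    return max_failures + 1


def raw_storage(logical_size: float, max_failures: int) -> float:
    """Rough raw storage for a given logical data size (same unit in and out)."""
    return logical_size * replicas_needed(max_failures)
```

For example, tolerating two simultaneous failures means three copies, so 10 TiB of logical data would occupy roughly 30 TiB before any other overhead.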
Key Features of ClustrixDB
Online Schema Change
● Does not block reads or writes to the table
○ Requires more space to run
● Distributed Parallel Query Execution – FANOUT option
○ query_fanout
○ query_fanout_insert_select
○ query_fanout_all_writes
● Monitoring the Progress of an ALTER
○ system.alter_progress
Samsung Cloud + ClustrixDB Journey
From PoC to Expansion
PoC: ClustrixDB v7.6 (2016)
Pre-Launch Workshop: ClustrixDB v8.x (2017)
Go-Live: ClustrixDB v9.0 (2017)
Expansion: ClustrixDB v9.1 (2018)
Issues & Enhancements
Replication Configuration with MySQL 5.7
● For Migration Deployment
● MySQL 5.7 (master) – ClustrixDB (slave)
Stage: PoC, ClustrixDB v7.6 (2016)
Issues & Enhancements
Fast Backup and Restore
√ Fast Backup and Restore as a binary backup mechanism
√ Each node sends its data directly to the backup target in parallel
√ Provides SFTP for Backup and Restore
√ Can control concurrency
Diagram: ClustrixDB and FTP Server, connected over Secure FTP
Stage: PoC, ClustrixDB v7.6 (2016)
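The "each node sends its data in parallel, with controllable concurrency" idea can be sketched with a bounded worker pool. `send_node_backup` is a placeholder for the real transfer, and the node names and the `concurrency` parameter are hypothetical; this does not show ClustrixDB's actual backup command.

```python
from concurrent.futures import ThreadPoolExecutor


def send_node_backup(node: str) -> str:
    """Placeholder for one node streaming its own data to the backup target."""
    return f"{node}: uploaded"


def backup_cluster(nodes, concurrency: int):
    """Run per-node uploads in parallel, with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() returns results in the same order as `nodes`.
        return list(pool.map(send_node_backup, nodes))


results = backup_cluster(["node1", "node2", "node3", "node4"], concurrency=2)
```

Capping concurrency trades backup speed against load on the cluster and on the backup target.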
Issues & Enhancements
Replication Performance
● Write Intensive Workload
○ Replication Gap increasing
○ Binlogging Performance low
√ Zones (v9.x)
Diagram: ClustrixDB (Master), ClustrixDB (Slave)
Stage: Pre-Launch Workshop, ClustrixDB v8.x (2017)
Issues & Enhancements
Enhanced Security
● SSL
○ Supports SSL Encrypted Connections
○ Requires a mysql client 5.6.38 or higher
● SHA256 Password Plugin
○ Provides stronger user password credentials than the mysql_native_password plugin
● Audit (User Logging)
○ Provides audit logs of user login/logout (user.log)
○ SET GLOBAL session_log_users = true;
Stage: Expansion, ClustrixDB v9.1 (2018)
Issues & Enhancements
Monitoring Tools
● Built-in Monitoring tool - ClustrixGUI
● Network security policy blocks the use of ClustrixGUI
● Need long-term historical data
√ Monitoring with InfluxDB & Grafana
○ Collector script
○ Grafana dashboard
√ Other tools are available
Stage: Expansion, ClustrixDB v9.1 (2018)
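A collector script for the InfluxDB and Grafana setup typically turns sampled metrics into InfluxDB line protocol before writing them. The sketch below shows only that formatting step. The measurement, tag, and field names are hypothetical, all fields are written as floats, and escaping of special characters (spaces, commas, equals signs) is deliberately left out.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one sample as an InfluxDB line-protocol record (no escaping)."""
    head = measurement
    if tags:
        # Tags are sorted so the output is stable from run to run.
        head += "," + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # Every field is written as a float to keep the field type consistent.
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{head} {field_part} {ts_ns}"


line = to_line_protocol(
    "clustrix_node",                  # hypothetical measurement name
    {"node": "n1", "region": "r1"},   # hypothetical tags
    {"cpu_load": 0.73, "tps": 1200},  # hypothetical metrics
    1700000000000000000,              # timestamp in nanoseconds
)
```

Grafana can then chart these series over time, which covers the long-term history requirement listed on this slide.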
Current Architecture
Architecture #1
Zone 1 Zone 2 Zone 3
Master Slave
ClustrixDB ClustrixDB
Architecture #2
MAX_FAILURES = 2
REPLICAS = 3
MAX_FAILURES = 1
REPLICAS = 2
ClustrixDB
Current Deployment & Usage
Region #2
Region #1
Region #3
M SS
M SS
230 Million
TPS
16 Billion
Rows
2 Services
3 Regions
√ No Additional Resources for Migration or Sharding
√ Downsized Instance Spec.
√ No Standby Replicas for HA, Backup, Analytics
√ Less Man-Month
√ Easy Scalability
√ No SPOF, Strong HA
√ Better Maintenance & Monitoring
√ Analytic Query
√ Tech Support
√ Simplified Application Architecture
√ No Additional Code for Migration or Sharding
√ Focus on Service Logic Development
Benefits
Operation Cost
Wrap Up
● Future Work
○ BINLOG / Replication Enhancement
○ ETL Tools
● Q&A
THANK YOU!

Editor's Notes

  • #2 Title Slide for OpenWorks
  • #24 OpenWorks End Slide