Scaling Data On Public Clouds

        Liran Zelkha, Founder
    Liran.zelkha@scalebase.com
About Us
• ScaleBase is a new startup targeting the
  database-as-a-service market (DBaaS)
• We offer unlimited database scalability and
  availability using our Database Load Balancer
• We currently run in beta mode – contact me if
  you want to join
Problem Of Data
• Flickr just hit 5B pictures
• Facebook > 0.5B users
• Farmville has more monthly players than the
  population of France
Monday's Keynote
•   More data
•   More users
•   More complex actions
•   Shorter response times
Scalability Pain

[Chart: infrastructure cost vs. time – with traditional hardware, a large capital expenditure sized to predicted demand means opportunity cost when actual demand is lower and lost customers when it is higher; automated virtualization tracks actual demand]

   http://media.amazonwebservices.com/pdf/IBMWebinarDeck_Final.pdf
CAP vs. ACID
• CAP = Consistency, Availability, Partition
  Tolerance
• ACID = Atomicity, Consistency, Isolation,
  Durability

• Atomicity – a chain of actions is treated as one
  whole, inseparable action
• Isolation – consistent query snapshots and reads
  across concurrent writes; 4 isolation levels are
  supported (see the JDBC sketch below)
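To make atomicity and isolation concrete, here is a minimal JDBC sketch; the connection URL, credentials and the accounts table are hypothetical, not taken from the slides.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AcidExample {
    public static void main(String[] args) throws SQLException {
        // Connection URL and credentials are placeholders
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/bank", "user", "password")) {

            // Isolation: pick one of the four standard levels
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

            // Atomicity: group two updates into one all-or-nothing unit
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setInt(1, 100); debit.setInt(2, 1); debit.executeUpdate();
                credit.setInt(1, 100); credit.setInt(2, 2); credit.executeUpdate();
                conn.commit();   // both updates become visible together
            } catch (SQLException e) {
                conn.rollback(); // neither update is applied
                throw e;
            }
        }
    }
}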
ScaleBase
                        Database Scaling In A Box

• The first truly elastic, fault-tolerant SQL-based data layer
• Enables linear scaling of any SQL-based database

[Diagram: applications and legacy clients → ScaleBase → database instances]
ScaleBase
[Diagram: application/web servers → ScaleBase → shared-nothing DB machines (MySQL? Oracle?) on commodity hardware – scalable and highly available]
THE REQUIREMENTS FOR DATA SLA
IN PUBLIC CLOUD ENVIRONMENTS
What We Need
• Availability
• Consistency
• Scalability
Brewer's (CAP) Theorem
• It is impossible for a distributed computer
  system to simultaneously provide all three of
  the following guarantees:
  – Consistency (all nodes see the same data at the
    same time)
  – Availability (node failures do not prevent survivors
    from continuing to operate)
  – Partition Tolerance (the system continues to
    operate despite arbitrary message loss)

                            http://en.wikipedia.org/wiki/CAP_theorem
What It Means




[Diagram omitted – see the post on consistency models in non-relational databases:]
http://guyharrison.squarespace.com/blog/2010/6/13/consistency-models-in-non-relational-databases.html
Reading More On CAP
• This is an excellent read, and some of my
  samples are from this blog
  – http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
ACHIEVING DATA SCALABILITY
WITH RELATIONAL DATABASES
Databases And CAP
• ACID – Consistency
• Availability – tons of solutions, most of them
  not cloud oriented
  – Oracle RAC
  – MySQL Proxy
  – Etc.
  – Replication based solutions can solve at least read
    availability and scalability (see Azure SQL)
Database Cloud Solutions
• Amazon RDS
• NaviSite Oracle RAC
• Joyent + Zeus
So Where Is The Problem?
• Scaling problems (usually write but also read)
• Schema change problems
• BigData problems
Scaling Up
• Issues with scaling up when the dataset is just
  too big
• RDBMS were not designed to be distributed
• Began to look at multi-node database
  solutions
• Known as ‘scaling out’ or ‘horizontal scaling’
• Different approaches include:
  – Master-slave
  – Sharding
Scaling RDBMS – Master/Slave
• Master-Slave
  – All writes go to the master; all reads are
    performed against the replicated slave databases
    (see the routing sketch below)
  – Critical reads may be incorrect, as writes may not
    have been propagated down yet
  – Large data sets can pose problems, as the master
    needs to duplicate data to the slaves
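A minimal sketch of application-side read/write splitting under this model; the JDBC URLs and credentials are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ReadWriteRouter {
    // Hypothetical connection URLs for a master and one replicated slave
    private static final String MASTER_URL = "jdbc:mysql://master:3306/app";
    private static final String SLAVE_URL  = "jdbc:mysql://slave1:3306/app";

    /** Writes (INSERT/UPDATE/DELETE) must go to the master. */
    public Connection connectionForWrite() throws SQLException {
        return DriverManager.getConnection(MASTER_URL, "user", "password");
    }

    /** Reads can go to a slave, accepting possibly stale replicated data. */
    public Connection connectionForRead() throws SQLException {
        return DriverManager.getConnection(SLAVE_URL, "user", "password");
    }
}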
Scaling RDBMS - Sharding
• Partition or sharding
  – Scales well for both reads and writes
  – Not transparent; the application needs to be
    partition-aware (a shard-routing sketch follows this list)
  – Can no longer have relationships/joins across
    partitions
  – Loss of referential integrity across shards
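A minimal sketch of partition-aware routing: hash the split key to pick a shard. The shard URLs and credentials are hypothetical, and a real sharding layer adds rebalancing and cross-shard query handling on top of this.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

public class ShardRouter {
    // Hypothetical shard connection URLs
    private final List<String> shardUrls = Arrays.asList(
            "jdbc:mysql://shard0:3306/app",
            "jdbc:mysql://shard1:3306/app",
            "jdbc:mysql://shard2:3306/app");

    /** Hash the split key (e.g. a user or employee id) to pick the owning shard. */
    public Connection connectionFor(long splitKey) throws SQLException {
        int shard = (int) Math.floorMod(splitKey, shardUrls.size());
        return DriverManager.getConnection(shardUrls.get(shard), "user", "password");
    }
}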
Other ways to scale RDBMS
• Multi-Master replication
• INSERT only, not UPDATES/DELETES
• No JOINs, thereby reducing query time
  – This involves de-normalizing data
• In-memory databases
ACHIEVING DATA SLA WITH NOSQL
NoSQL
• A term used to designate databases which
  differ from classic relational databases in
  some way. These data stores may not require
  fixed table schemas, and usually
  avoid join operations and typically scale
  horizontally. Academics and papers typically
  refer to these databases as structured storage,
  a term which would include classic relational
  databases as a subset.
                             http://en.wikipedia.org/wiki/NoSQL
NoSQL Types
• Key/Value
  – A big hash table
  – Examples: Voldemort, Amazon Dynamo
• Big Table
  – Big table, column families
  – Examples: HBase, Cassandra
• Document based
  – Collections of collections
  – Examples: CouchDB, MongoDB
• Each solves a different problem
NO-SQL




  [Image omitted:]
  http://browsertoolkit.com/fault-tolerance.png
Pros/Cons
• Pros:
   – Performance
   – BigData
   – Most solutions are open source
   – Data is replicated to nodes and is therefore fault-tolerant (partitioning)
   – No fixed schema required
   – Can scale up and down
• Cons:
   – Code changes required
   – Limited framework support
   – Not ACID
   – Ecosystem (BI, backup)
   – There is always a database at the backend
   – Some APIs are just too simple
Amazon S3 Code Sample
// Open a connection using Amazon's S3 sample library
AWSAuthConnection conn = new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey, secure, server, format);

// Create a bucket
Response response = conn.createBucket(bucketName, location, null);

// Store a small text object under the given key
final String text = "this is a test";
response = conn.put(bucketName, key, new S3Object(text.getBytes(), null), null);
Cassandra Code Sample
CassandraClient cl = pool.getClient();
KeySpace ks = cl.getKeySpace("Keyspace1");

// insert 100 values
ColumnPath cp = new ColumnPath("Standard1", null,
        "testInsertAndGetAndRemove".getBytes("utf-8"));
for (int i = 0; i < 100; i++) {
    ks.insert("testInsertAndGetAndRemove_" + i, cp,
            ("testInsertAndGetAndRemove_value_" + i).getBytes("utf-8"));
}

// get the values back
for (int i = 0; i < 100; i++) {
    Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
    String value = new String(col.getValue(), "utf-8");
}

// remove the values
for (int i = 0; i < 100; i++) {
    ks.remove("testInsertAndGetAndRemove_" + i, cp);
}
Cassandra Code Sample – Cont’
// removing a non-existent row should not fail
try {
    ks.remove("testInsertAndGetAndRemove_not_exist", cp);
} catch (Exception e) {
    fail("removing a non-existent row should not throw exceptions");
}

// get the already-removed values
for (int i = 0; i < 100; i++) {
    try {
        Column col = ks.getColumn("testInsertAndGetAndRemove_" + i, cp);
        fail("the value should already be deleted");
    } catch (NotFoundException e) {
        // expected
    } catch (Exception e) {
        fail("unexpected exception, should be NotFoundException: " + e.toString());
    }
}

pool.releaseClient(cl);
pool.close();
Cassandra Statistics
• Facebook Inbox Search
• MySQL, > 50 GB of data
  – Writes average: ~300 ms
  – Reads average: ~350 ms
• Rewritten with Cassandra, > 50 GB of data
  – Writes average: 0.12 ms
  – Reads average: 15 ms
MongoDB
// connect to a local MongoDB instance and open a database
Mongo m = new Mongo();
DB db = m.getDB("mydb");

// list the collection names
Set<String> colls = db.getCollectionNames();
for (String s : colls) {
    System.out.println(s);
}
MongoDB – Cont’
// the original sample did not define coll; any collection name works here
DBCollection coll = db.getCollection("testCollection");

// build a document with a nested sub-document and insert it
BasicDBObject doc = new BasicDBObject();
doc.put("name", "MongoDB");
doc.put("type", "database");
doc.put("count", 1);

BasicDBObject info = new BasicDBObject();
info.put("x", 203);
info.put("y", 102);

doc.put("info", info);

coll.insert(doc);
THE BOTTOM LINE
Data SLA
• There is no golden hammer
   – See http://sourcemaking.com/antipatterns/golden-hammer
• Choose your tool wisely, based on what you need
• Usually
   – Start with RDBMS (shortest TTM, which is what we
     really care about)
   – When scale issues occur – start moving to NoSQL
     based on your needs
• You can get Data Scalability in the cloud – just
  think before you code!!!
A BIT MORE ON SCALEBASE
How ScaleBase Works
• ScaleBase takes an application
  database and splits its data
  across multiple, separate
  instances (a technique called
  sharding)
• Queries and DML are either:
   – Directed to the correct instance, or
   – Executed simultaneously across
     several instances
• Results are aggregated and
  returned to the original
  application

[Diagram: application → ScaleBase → database instances]
Example
Original table:

  ID    First name   Last name
  100   Steven       King
  101   Neena        Kochhar
  102   Lex          De Haan
  103   Alexander    Hunold
  104   Bruce        Ernst
  105   David        Austin
  106   Valli        Pataballa

Split across three database instances:

  Instance 1:
  ID    First name   Last name
  102   Lex          De Haan
  105   David        Austin

  Instance 2:
  ID    First name   Last name
  100   Steven       King
  103   Alexander    Hunold
  106   Valli        Pataballa

  Instance 3:
  ID    First name   Last name
  101   Neena        Kochhar
  104   Bruce        Ernst
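The split shown above is consistent with a simple hash split on the ID modulo 3. Here is a sketch of a single-row lookup routed with the hypothetical ShardRouter from the sharding section; the employees table name is also an assumption.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EmployeeLookup {
    private final ShardRouter router = new ShardRouter(); // from the sketch above

    /** Route a single-row lookup to the instance that holds the given ID. */
    public String lastNameFor(long id) throws SQLException {
        try (Connection conn = router.connectionFor(id); // id % 3 picks the instance
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT last_name FROM employees WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}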
ScaleBase Supports
• 3 table types
    – Master
    – Global
    – Split
•   Splitting according to Hash, List, Range
•   Rebalance, addition and removal of machines
•   Instance replication backup: Shadow and Master
•   Fully consistent 2-Phase Commit
•   Joins, Foreign Keys, Subqueries
•   DML, DDL, Batch updates, Prepared Statements
•   Aggregations, Group By, Order By, Auto Numbering,
    Timestamps
Sample Code
SELECT site_owner_id, count(*) FROM google.user_clicks
WHERE country = 'BRAZIL'
GROUP BY site_owner_id


• site_owner_id is the split key
• Perform the query on all DBs
• Simple aggregation of results

• No Code Change
Sample Code
SELECT country, count(*) FROM google.user_clicks
GROUP BY country



• Perform the query on all DBs
• Aggregation of the aggregations (see the merge
  sketch below)

• No Code Change
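A minimal client-side sketch of "aggregation of the aggregations": run the grouped count on every shard and sum the per-country partial counts. The list of shard connections is assumed to come from elsewhere; ScaleBase does this transparently, so this only illustrates the idea.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrossShardGroupBy {
    /** Run the same GROUP BY on every shard and merge the partial counts. */
    public Map<String, Long> clicksByCountry(List<Connection> shards) throws SQLException {
        Map<String, Long> merged = new HashMap<>();
        for (Connection shard : shards) {
            try (Statement st = shard.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT country, count(*) FROM google.user_clicks GROUP BY country")) {
                while (rs.next()) {
                    merged.merge(rs.getString(1), rs.getLong(2), Long::sum);
                }
            }
        }
        return merged;
    }
}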
Sample Code
PreparedStatement pstmt = conn.prepareStatement("INSERT INTO emp VALUES(?,?,?,?,?)");
for (int i = 0; i < 10; i++) {
    pstmt.setInt(1, 300 + i);
    pstmt.setString(2, "Something" + i);
    pstmt.setDate(3, new Date(System.currentTimeMillis()));
    pstmt.setInt(4, i);
    pstmt.setLong(5, i);
    pstmt.addBatch();
}
int[] result = pstmt.executeBatch();



• Is the split key dynamic or static?
• Each command is routed to the correct DB;
  execution runs on all relevant DBs (see the
  batching sketch below)

• No Code Change
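A sketch of the per-shard batching idea behind this: rows are grouped by the shard their split key hashes to, and each shard executes only its own batch. The shard URLs, credentials, and the two-column emp layout are assumptions for illustration, not ScaleBase internals.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

public class ShardedBatchInsert {
    // Hypothetical shard URLs, same layout as the sharding sketch above
    private static final String[] SHARD_URLS = {
            "jdbc:mysql://shard0:3306/app",
            "jdbc:mysql://shard1:3306/app",
            "jdbc:mysql://shard2:3306/app"};

    /** Group rows by the shard their split key (id) hashes to, then run one batch per shard. */
    public void insertEmployees(int[] ids, String[] names) throws SQLException {
        Map<Integer, PreparedStatement> batchPerShard = new HashMap<>();
        try {
            for (int i = 0; i < ids.length; i++) {
                int shard = Math.floorMod(ids[i], SHARD_URLS.length);
                PreparedStatement ps = batchPerShard.get(shard);
                if (ps == null) {
                    Connection conn = DriverManager.getConnection(
                            SHARD_URLS[shard], "user", "password");
                    ps = conn.prepareStatement("INSERT INTO emp (id, name) VALUES (?, ?)");
                    batchPerShard.put(shard, ps);
                }
                ps.setInt(1, ids[i]);
                ps.setString(2, names[i]);
                ps.addBatch();
            }
            // Each shard executes only the rows that belong to it
            for (PreparedStatement ps : batchPerShard.values()) {
                ps.executeBatch();
            }
        } finally {
            for (PreparedStatement ps : batchPerShard.values()) {
                Connection conn = ps.getConnection();
                ps.close();
                conn.close();
            }
        }
    }
}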
ScaleBase Solution
• Elastic SQL database scaling and high-availability
  solution
  – Complete
  – Transparent
  – Super scalable
  – Out of the box
  – Non-intrusive
  – Flexible
  – Manageable
With ScaleBase
• Pay much less for hardware and Database licenses
• Get more power, better data spreading and better
  availability
• Real linear scalability
• Go for real grid, cloud and virtualization

• ScaleBase is NOT:
   – NOT an RDBMS. It works with any secure, highly available,
     archivable RDBMS (Oracle, DB2, MySQL, any!)
   – Does NOT require schema structure modifications
   – Does NOT require additional sophisticated hardware
Moving To ScaleBase
• Implementing ScaleBase is done in minutes
• Just direct your application to your ScaleBase
  instance
• Point ScaleBase at your original database and at the
  target database instances
• ScaleBase will automatically migrate your schema
  and data to the new servers
• After all data is transferred ScaleBase will start
  working with target database instances, giving
  you linear scalability – with no down time!
Where ScaleBase Fits In
• Cloud databases
  – Use SQL databases in the cloud, and get Scale Out
    features and high availability
• High scale applications
  – Use your existing code, without migrating to
    NOSQL, and get unlimited SQL scalability
CUSTOMER CASE STUDIES
Public Cloud
• Scenario:
  – A startup developing a complex iPhone application
    running on EC2
  – Requires high scaling option for SQL based database
• Problem:
  – Amazon RDS offers only Scale Up option, no Scale Out
• Solution:
  – Customer switched their Amazon RDS-based
    application to ScaleBase on EC2
  – Gained transparent, linear scaling
  – Running 4 RDS instances behind ScaleBase
Private Cloud
• Scenario:
   – A company selling devices that ‘ping’ home every 5 minutes
   – An 8-digit number of devices sold
• Problem:
   – Evaluated MySQL and Oracle – no single machine could
     support the required number of devices
   – Clustering options were too expensive and offered limited scalability
• Solution:
   – Customer moved to ScaleBase with no code changes
   – Gained linear scalability; runs 4 MySQL databases behind
     ScaleBase
