introduction to cassandra
         eben hewitt


      september 29, 2010
         web 2.0 expo
         new york city
@ebenhewitt
• director, application architecture
 at a global corp


• focus on SOA, SaaS, Events


• i wrote this
agenda
•   context
•   features
•   data model
•   api
“nosql”  “big data”
•   mongodb
•   couchdb
•   tokyo cabinet
•   redis
•   riak
•   what about?
    – Poet, Lotus, Xindice
    – they’ve been around forever…
    – rdbms was once the new kid…
innovation at scale
• google bigtable (2006)
  – consistency model: strong
  – data model: sparse map
  – clones: hbase, hypertable
• amazon dynamo (2007)
  – O(1) dht
  – consistency model: client tune-able
  – clones: riak, voldemort



  cassandra ~= bigtable + dynamo
proven
• Facebook stores 150TB of data on 150 nodes



                 web 2.0
• used at Twitter, Rackspace, Mahalo, Reddit,
  Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX,
  others
cap theorem

• consistency
  – all clients have same view of data

• availability
  – writeable in the face of node failure

• partition tolerance
  – processing can continue in the face of network failure
    (crashed router, broken network)
daniel abadi: pacelc
write consistency

Level                        Description
ZERO                         Good luck with that
ANY                          1 replica (hints count)
ONE                          1 replica; read repair in background
QUORUM (DCQ for RackAware)   (N/2) + 1
ALL                          N = replication factor


read consistency

Level                        Description
ZERO                         Ummm…
ANY                          Try ONE instead
ONE                          1 replica
QUORUM (DCQ for RackAware)   Return most recent TS after (N/2) + 1 replicas report
ALL                          N = replication factor
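For example, with replication factor N = 3, QUORUM is (3/2) + 1 = 2 replicas. Writing and reading at QUORUM means every read overlaps at least one replica that saw the latest write (W + R > N), which is the knob clients turn toward stronger consistency.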
agenda
•   context
•   features
•   data model
•   api
cassandra properties
•   tuneably consistent
•   very fast writes
•   highly available
•   fault tolerant
•   linear, elastic scalability
•   decentralized/symmetric
•   ~12 client languages
    – Thrift RPC API
• ~automatic provisioning of new nodes
• O(1) dht
• big data
write op
Staged Event-Driven Architecture
• A general-purpose framework for high
  concurrency & load conditioning
• Decomposes applications into stages
  separated by queues
• Adopts a structured approach to event-driven
  concurrency
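As a rough illustration of the idea (a toy sketch, not Cassandra's internals): each stage gets its own thread pool, and stages hand work to one another through bounded queues, which is where the load conditioning comes from.

import java.util.concurrent.*;

public class SedaSketch {
    public static void main(String[] args) throws Exception {
        // bounded queue between the two stages provides backpressure
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(1024);
        ExecutorService parseStage = Executors.newFixedThreadPool(2);
        ExecutorService writeStage = Executors.newFixedThreadPool(2);

        // stage 1: "parse" incoming requests and enqueue them for the next stage
        parseStage.submit(() -> {
            for (int i = 0; i < 5; i++) parsed.put("request-" + i);
            return null;
        });

        // stage 2: drain the queue and "write"
        writeStage.submit(() -> {
            for (int i = 0; i < 5; i++) System.out.println("write stage handling " + parsed.take());
            return null;
        });

        parseStage.shutdown();
        writeStage.shutdown();
        writeStage.awaitTermination(5, TimeUnit.SECONDS);
    }
}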
instrumentation
data replication
partitioner smack-down

Random
• system will use MD5(key) to distribute data across nodes
• even distribution of keys from one CF across ranges/nodes

Order Preserving
• key distribution determined by token
• lexicographical ordering
• required for range queries
  – scan over rows like a cursor in an index
• can specify the token for this node to use
• ‘scrabble’ distribution
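A conceptual sketch (plain JDK, not Cassandra's own partitioner class) of what the random partitioner does: hash the row key with MD5 and treat the digest as the token; the node whose token range covers that value owns the row.

import java.math.BigInteger;
import java.security.MessageDigest;

public class RandomPartitionSketch {
    // conceptual only: MD5(key) read as an unsigned 128-bit token
    static BigInteger tokenFor(String rowKey) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(1, md5.digest(rowKey.getBytes("UTF-8")));
    }

    public static void main(String[] args) throws Exception {
        // similar keys land far apart, which is why distribution is even
        // but range queries over raw keys are impossible
        System.out.println(tokenFor("AZ:Scottsdale"));
        System.out.println(tokenFor("AZ:Scottsdale1"));
    }
}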
agenda
•   context
•   features
•   data model
•   api
structure
keyspace
• ~= database
• typically one per application
• some settings are configurable only per
  keyspace
column family
• group records of similar kind
• not same kind, because CFs are sparse tables
• ex:
  – User
  – Address
  – Tweet
  – PointOfInterest
  – HotelRoom
think of cassandra as row-oriented
• each row is uniquely identifiable by key
• rows group columns and super columns
column family

key 123:  user=eben     nickname=The Situation
key 456:  user=alison   icon=(image)   n=42
json-like notation
User {
  123 : { email: alison@foo.com,
          icon:          },

    456 : { email: eben@bar.com,
            location: The Danger Zone}
}
0.6 example
$ cassandra -f
$ bin/cassandra-cli
cassandra> connect localhost/9160


cassandra> set Keyspace1.Standard1['eben']['age']='29'
cassandra> set Keyspace1.Standard1['eben']['email']='e@e.com'
cassandra> get Keyspace1.Standard1['eben']['age']
=> (column=6e616d65, value=39,
  timestamp=1282170655390000)
a column has 3 parts
1. name
   –   byte[]
   –   determines sort order
   –   used in queries
   –   indexed
2. value
   –   byte[]
   –   you don’t query on column values
3. timestamp
   –   long (clock)
   –   last write wins conflict resolution
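Putting the three parts together, a minimal sketch using the same Thrift-generated Column and Clock types that appear in the write examples later in this deck (field shapes differ a bit across 0.6/0.7, so treat it as illustrative):

// 1. name, 2. value, 3. timestamp – mirrors the batch_mutate example later on
Column ageCol = new Column(
    "age".getBytes("UTF-8"),        // name: byte[], determines sort order, indexed
    "29".getBytes("UTF-8"),         // value: byte[], opaque – never queried on
    new Clock(System.nanoTime()));  // timestamp: long clock, last write wins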
column comparators
•   byte
•   utf8
•   long
•   timeuuid
•   lexicaluuid
•   <pluggable>
    – ex: lat/long
super column



super columns group columns under a common name
super column family
<<SCF>>PointOfInterest

key 10017:
  <<SC>>Central Park          desc=Fun to walk in.
  <<SC>>Empire State Bldg     phone=212.555.11212, desc=Great view from 102nd floor!

key 85255:
  <<SC>>Phoenix Zoo
super column family
PointOfInterest {
  key: 85255 {
    Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. },
    Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
  }, //end phx

  key: 10019 {
    Central Park { desc: Walk around. It's pretty. },
    Empire State Building { phone: 212-777-7777,
                            desc: Great view from 102nd floor. }
  } //end nyc
}
// row key → super column → column; the schema stays flexible per row
about super column families
• sub-column names in a SCF are not indexed
  – top level columns (SCF Name) are always indexed
• often used for denormalizing data from
  standard CFs
agenda
•   context
•   features
•   data model
•   api
slice predicate
• data structure describing columns to return
  – SliceRange
     •   start column name
     •   finish column name (can be empty; then count decides where to stop)
     •   reverse
     •   count (like LIMIT)
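A hedged sketch of building that predicate with the Thrift types the rest of this deck uses (the exact generated signatures vary between 0.6 and 0.7):

// "give me the first 10 columns of the row, in comparator order"
SliceRange range = new SliceRange(
    new byte[0],   // start column name: empty = from the first column
    new byte[0],   // finish column name: empty = let count stop the slice
    false,         // reversed?
    10);           // count, like LIMIT

SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(range);  // or setColumn_names(...) for specific columns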
read api

• get() : Column
    – get the Col or SC at given ColPath
      COSC cosc = client.get(key, path, CL);

• get_slice() : List<ColumnOrSuperColumn>
    – get Cols in one row, specified by SlicePredicate:
      List<ColumnOrSuperColumn> results =
        client.get_slice(key, parent, predicate, CL);

• multiget_slice() : Map<key, List<CoSC>>
    – get slices for list of keys, based on SlicePredicate
     Map<byte[],List<ColumnOrSuperColumn>> results =
        client.multiget_slice(rowKeys, parent, predicate, CL);

• get_range_slices() : List<KeySlice>
    – returns multiple Cols according to a range
    – range is startkey, endkey, starttoken, endtoken:
      List<KeySlice> slices = client.get_range_slices(
         parent, predicate, keyRange, CL);
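To round out the last call, a hedged sketch of the KeyRange it takes (keys are strings in 0.6 and bytes in later versions; parent, predicate, and CL are the same variables used in the snippets above):

// scan a contiguous range of row keys; only meaningful key-wise with the
// order-preserving partitioner
KeyRange keyRange = new KeyRange(100);   // max rows to return
keyRange.setStart_key("aaa");
keyRange.setEnd_key("zzz");

List<KeySlice> slices =
    client.get_range_slices(parent, predicate, keyRange, CL);
for (KeySlice slice : slices) {
    // each KeySlice is a row key plus the columns matching the predicate
    System.out.println(slice.getKey() + " -> " + slice.getColumns().size());
}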
write api

client.insert(userKeyBytes, parent,
    new Column("band".getBytes(UTF8),
    "Funkadelic".getBytes(), clock), CL);

batch_mutate
  – void batch_mutate(
    map<byte[], map<String, List<Mutation>>> , CL)
remove
  – void remove(byte[],
    ColumnPath column_path, Clock, CL)
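And a hedged sketch of a delete using the remove signature above; the ColumnPath names the column family plus (optionally) the super column and column, and the CF name here is only illustrative:

// delete only the "band" column from one row; leaving the column unset
// would tombstone the whole row in that column family
ColumnPath path = new ColumnPath("Standard1");   // column family
path.setColumn("band".getBytes("UTF-8"));        // specific column to remove

client.remove(userKeyBytes, path, new Clock(System.nanoTime()), CL);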
batch_mutate

//create the parameter map
Map<byte[], Map<String, List<Mutation>>> mutationMap =
   new HashMap<byte[], Map<String, List<Mutation>>>();

//create Cols for Muts, wrapping each Column in a ColumnOrSuperColumn
Column nameCol = new Column("name".getBytes(UTF8),
  "Funkadelic".getBytes("UTF-8"), new Clock(System.nanoTime()));
ColumnOrSuperColumn nameCosc = new ColumnOrSuperColumn();
nameCosc.column = nameCol;
Mutation nameMut = new Mutation();
nameMut.column_or_supercolumn = nameCosc; //also phone, etc.

Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>();
List<Mutation> cols = new ArrayList<Mutation>();
cols.add(nameMut);
cols.add(phoneMut);
muts.put(CF, cols);
//outer map key is a row key; inner map key is the CF name
mutationMap.put(rowKey.getBytes(), muts);
//send to server
client.batch_mutate(mutationMap, CL);
raw thrift: for masochists only

•   pycassa (python)
•   fauna (ruby)
•   hector (java)
•   pelops (java)
•   kundera (JPA)
•   hectorSharp (C#)
?
     what about…

    SELECT WHERE
        ORDER BY
         JOIN ON
           GROUP BY
rdbms: domain-based model
             what answers do I have?



cassandra: query-based model
             what questions do I have?
SELECT WHERE
               cassandra is an index factory
<<cf>>USER
Key: UserID
Cols: username, email, birth date, city, state

How to support this query?

SELECT * FROM User WHERE city = 'Scottsdale'

Create a new CF called UserCity:

<<cf>>USERCITY
Key: city
Cols: IDs of the users in that city.
Also uses the Valueless Column pattern
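Reading that hand-built index back is then a single-row slice: the city is the row key and the column names are the user IDs, with empty values per the Valueless Column pattern. A hedged sketch (CF and variable names are illustrative):

// fetch every user ID stored under the "Scottsdale" row of UserCity
ColumnParent userCity = new ColumnParent("UserCity");
SlicePredicate everything = new SlicePredicate();
everything.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1000));

List<ColumnOrSuperColumn> results =
    client.get_slice("Scottsdale".getBytes(), userCity, everything, CL);
for (ColumnOrSuperColumn cosc : results) {
    // the column *name* is the user ID; the value is intentionally empty
    System.out.println(new String(cosc.column.name));
}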
SELECT WHERE pt 2
• Use an aggregate key
  state:city: { user1, user2}

• Get rows between AZ: & AZ;
  for all Arizona users

• Get rows between AZ:Scottsdale &
  AZ:Scottsdale1
  for all Scottsdale users
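A hedged sketch of the Arizona scan, reusing the KeyRange idea from the read api section: ';' is the next ASCII character after ':', so the range below covers every key that begins with "AZ:" (order-preserving partitioner assumed):

// every row whose aggregate key starts with "AZ:", i.e. all Arizona users
KeyRange arizona = new KeyRange(1000);
arizona.setStart_key("AZ:");
arizona.setEnd_key("AZ;");   // ';' sorts immediately after ':'

List<KeySlice> azUsers =
    client.get_range_slices(parent, predicate, arizona, CL);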
ORDER BY

Columns
• are sorted according to CompareWith or CompareSubcolumnsWith

Rows
• are placed according to their Partitioner:
  – Random: MD5 of key
  – Order-Preserving: actual key
• are sorted by key, regardless of partitioner
is cassandra a good fit?
• you need really fast writes
• you need durability
• you have lots of data
  – > GBs
  – >= three servers
• your app is evolving
  – startup mode, fluid data structure
• loose domain data
  – “points of interest”

• your programmers can deal
  – documentation
  – complexity
  – consistency model
  – change
  – visibility tools
• your operations can deal
  – hardware considerations
  – can move data
  – JMX monitoring
thank you!
@ebenhewitt
