APACHE
CASSANDRA
Architecture & Internals
BHUVAN RAWAL
SNAPDEAL.COM
CASSANDRA - AN OVERVIEW
NOSQL-DATABASE.ORG
> MASSIVELY SCALABLE
> PARTITIONED ROW STORE
> MASTERLESS ARCHITECTURE
> LINEAR SCALABILITY
> NO SINGLE POINT OF FAILURE
> MULTIPLE DC SUPPORT OUT OF THE BOX
2008: Open sourced by Facebook on Google Code.
2009: Became an Apache Incubator project.
2010: Graduated to a top-level Apache project.
KEY FEATURES
GENERAL PURPOSE
Can be adapted to different classes of use cases
AVAILABLE
Stays available through the loss of a node, rack, or DC
DISTRIBUTED
Seamless distribution across datacentres and continents
KEY TUNABLES
JVM Heap & GC Algorithms
Compaction Strategy
Key Cache Size
Row Cache
Compression Chunk Size
Speculative Retries
Throughput vs Latency tuning
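
Several of the tunables above (compaction strategy, row and key caching, compression chunk size, speculative retries) are set per table, while JVM heap, GC and key cache size are node-level settings. As a rough illustration, the sketch below renders per-table settings as a CQL `ALTER TABLE` statement. The helper `render_alter_table`, the keyspace `shop` and the table `orders` are hypothetical, the option values are examples rather than recommendations, and the option names assume Cassandra 3.x CQL syntax (other versions may differ).

```python
# Hypothetical helper: renders per-table tunables as a CQL ALTER TABLE statement.
# Option names assume Cassandra 3.x CQL syntax; check them against your version.

def render_alter_table(keyspace, table, options):
    """Return an ALTER TABLE statement that sets the given table options."""
    parts = []
    for name, value in options.items():
        if isinstance(value, dict):
            # Map-valued options are written as CQL map literals.
            body = ", ".join(f"'{k}': '{v}'" for k, v in value.items())
            parts.append(f"{name} = {{{body}}}")
        else:
            parts.append(f"{name} = '{value}'")
    return f"ALTER TABLE {keyspace}.{table} WITH " + " AND ".join(parts) + ";"


# Example values only, not tuning advice.
TUNABLES = {
    "compaction": {"class": "LeveledCompactionStrategy"},
    "caching": {"keys": "ALL", "rows_per_partition": "NONE"},
    "compression": {"class": "LZ4Compressor", "chunk_length_in_kb": 64},
    "speculative_retry": "99PERCENTILE",
}

statement = render_alter_table("shop", "orders", TUNABLES)
print(statement)
```

Keeping the settings in a plain dictionary makes it easy to review a tuning change before it is applied to a cluster.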
SOME USERS
Cassandra is the most popular wide column store (source: Wikipedia)
Deployed by 400+ Fortune 500 firms
667 companies verified on Siftery
Apple: 100,000+ node deployment
Netflix: 95% of data on Cassandra
Uber: 20 Cassandra clusters, soon to be 100
Spotify: 100+ production clusters
PARTITIONER
Determines how data is distributed across the nodes
Must be the same across the whole cluster
Ordered Partitioner
Random Partitioner
Murmur3 Partitioner
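
To make the partitioner's role concrete, here is a minimal sketch of hash-based partitioning on a token ring. It is not Cassandra's implementation: MD5 stands in for the hash function (similar in spirit to the Random Partitioner, since MurmurHash3 is not in the Python standard library), and the `TokenRing` class, node names and token values are made up for illustration.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 used purely for illustration)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class TokenRing:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> the token assigned to that node.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        """Return the first node whose token is >= the key's token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token_for(partition_key))
        return self._ring[idx % len(self._ring)][1]


MAX_TOKEN = 2 ** 128  # MD5 produces 128-bit values
ring = TokenRing({
    "node-a": MAX_TOKEN // 4,
    "node-b": MAX_TOKEN // 2,
    "node-c": 3 * MAX_TOKEN // 4,
    "node-d": MAX_TOKEN - 1,
})

for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.owner(key))
```

Because every node computes the same token for a given key, any node can work out which node owns that key, which is why the partitioner has to be identical across the cluster.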
SNITCH
Determines node placement (which datacenter and rack each node belongs to)
Allows replicas to be spread widely enough to handle failures
Failure modes: Node -> Rack -> DC -> Region
Tries its best not to place two replicas of the same data on the same rack
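
The rack-avoidance idea can be sketched as follows. This is a simplified illustration, not Cassandra's actual replication strategy: the function `place_replicas`, the node names and the rack map (standing in for what a snitch would report) are all assumptions. It walks the ring clockwise and prefers nodes on racks that do not yet hold a replica.

```python
def place_replicas(ring, racks, start_index, replication_factor):
    """Pick replica nodes, preferring racks that do not hold a replica yet.

    ring:  node names in token order
    racks: mapping node -> rack, i.e. the topology a snitch would report
    """
    chosen, seen_racks, skipped = [], set(), []
    n = len(ring)
    for step in range(n):
        node = ring[(start_index + step) % n]
        if racks[node] not in seen_racks:
            chosen.append(node)
            seen_racks.add(racks[node])
        else:
            skipped.append(node)  # same rack as an existing replica
        if len(chosen) == replication_factor:
            return chosen
    # Fewer racks than replicas: fall back to nodes on already-used racks.
    return chosen + skipped[: replication_factor - len(chosen)]


ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3", "n6": "r3"}
print(place_replicas(ring, racks, start_index=0, replication_factor=3))
```

With three racks and a replication factor of 3, each replica lands on a different rack, so losing one rack still leaves two copies available.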
GOSSIP PROTOCOL
status
health
tokens
schema version
data size
phi_threshold
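
The phi threshold above relates to accrual-style failure detection: instead of a fixed timeout, a suspicion level (phi) grows the longer a peer stays silent relative to its usual heartbeat interval. The sketch below is a simplified model under an exponential inter-arrival assumption. The class `PhiAccrualDetector`, its window size and the threshold value of 8 are illustrative assumptions, not a copy of Cassandra's code.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Hypothetical accrual failure detector for a single peer."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold            # assumed conviction threshold
        self.intervals = deque(maxlen=window)  # recent heartbeat gaps
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat (gossip message) received at time `now`."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of the probability of a silence this long."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # Exponential model: P(silence >= elapsed) = exp(-elapsed / mean).
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        return self.phi(now) > self.threshold


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)

print(detector.phi(4.0), detector.is_down(4.0))
print(detector.phi(25.0), detector.is_down(25.0))
```

A higher threshold tolerates longer pauses (for example GC stalls or network jitter) before a peer is marked down, at the cost of slower detection of real failures.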
DATA MODEL
As with most databases, the data model is the key to successful deployments and scalability
Test thoroughly in a staging environment
Avoid client-side joins as far as possible
Materialized views: a boon for automated denormalization
Tune partition size so that oversized partitions do not degrade the cluster
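
One common way to keep partition size bounded is to add a time bucket to the partition key so that a single key cannot grow without limit. The sketch below assumes a hypothetical sensor-readings table partitioned by sensor and day; the function name, the column names and the choice of a daily bucket are illustrative only.

```python
from datetime import datetime, timezone


def bucketed_partition_key(sensor_id: str, event_time: datetime) -> tuple:
    """Return a composite partition key (sensor_id, day bucket).

    Corresponds to a hypothetical table declared with
    PRIMARY KEY ((sensor_id, day), event_time), so each partition
    holds at most one day of readings for one sensor.
    """
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


late = datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)
next_day = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
print(bucketed_partition_key("sensor-7", late))
print(bucketed_partition_key("sensor-7", next_day))
```

The bucket width is a trade-off: smaller buckets keep partitions small but mean a range query has to touch more partitions.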
ANALYTICS SUPPORT
DataStax driver for Spark:
-> Reads localized data off Cassandra nodes
-> Support for Hadoop
-> Pig, Hive, Sqoop, Mahout
-> Solr integration
STORAGE
-> Memtable
-> SSTable (Sorted String Table)
-> Index
-> Partition Summary
-> Bloom Filter
-> Compression
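
How these storage pieces fit together can be sketched with a toy write and read path: writes go to an in-memory memtable, a full memtable is flushed to an immutable sorted SSTable, and reads consult each SSTable's Bloom filter before searching it. This is a deliberately simplified model with hypothetical class names. It leaves out the commit log, on-disk index and partition summary (a binary search stands in for both), compression and compaction, and it cannot store `None` as a value.

```python
import bisect
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size, self.k, self.bits = size_bits, num_hashes, 0

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    """Immutable, key-sorted run produced by flushing a memtable."""

    def __init__(self, items):
        self.keys = sorted(items)
        self.values = [items[k] for k in self.keys]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent, skip the search
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None  # Bloom filter false positive


class Store:
    def __init__(self, memtable_limit=2):
        self.memtable, self.sstables, self.limit = {}, [], memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.sstables.append(SSTable(self.memtable))  # flush to an SSTable
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            value = table.get(key)
            if value is not None:
                return value
        return None


store = Store(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # memtable is full, so it is flushed
store.put("a", 3)   # newer value for "a" lives in the memtable
print(store.get("a"), store.get("b"), store.get("z"))
```

Reading newest data first is what lets an updated value shadow an older one that still sits in an earlier SSTable.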
FELLOW DATASTORES
HBASE
RIAK
MONGODB
AEROSPIKE
BIGTABLE
SCYLLA
THANK YOU!
Bhuvan Rawal
