Cassandra
Where Did Cassandra Come From
• Cassandra originated at Facebook in 2007 to
  solve that company’s inbox search problem
  – large volumes of data
  – many random reads
  – many simultaneous random writes
• It was released as an open source Google Code
  project in July 2008
• In March 2009 it moved to the Apache Incubator
• On February 17, 2010 it was voted into a top-level
  Apache project
Cassandra in 50 Words or Less
• Apache Cassandra is an
    –   open source
    –   distributed
    –   decentralized
    –   elastically scalable
    –   highly available
    –   fault-tolerant
    –   tuneably consistent
    –   column-oriented
•   database that
    –   bases its distribution design on Amazon’s Dynamo
    –   and its data model on Google’s Bigtable
•   Created at Facebook, it is now used at some of the most popular sites on the Web
Who Is Using Cassandra
• Twitter is using Cassandra for analytics.
• Mahalo uses it for its primary near-time data store.
• Facebook still uses it for inbox search, though they are using a
  proprietary fork.
• Digg uses it for its primary near-time data store.
• Rackspace uses it for its cloud service, monitoring, and logging.
• Reddit uses it as a persistent cache.
• Cloudkick uses it for monitoring statistics and analytics.
• Ooyala uses it to store and serve near real-time video analytics
  data.
• SimpleGeo uses it as the main data store for its real-time location
  infrastructure.
• Onespot uses it for a subset of its main data store.
Decentralized
• Decentralized: all nodes are the same; the failure
  of a node won’t disrupt service
• Master/slave: if the master node fails, the whole
  database is in jeopardy
Elastic Scalability
• Add another machine and Cassandra will find it
  and start sending it work
High Availability and Fault Tolerance
ACID
• Atomic
  – All or nothing
• Consistent
  – Data moves from one valid state to another
• Isolated
  – Two transactions modifying the same data do not interfere with each other
• Durable
  – Committed writes survive failure
Brewer’s CAP Theorem
• You can strongly support only two of the three:
  – Consistency
     • All database clients will read the same value for the same
       query, even given concurrent updates
  – Availability
     • All database clients will always be able to read and write
       data
  – Partition Tolerance
     • The database can be split across multiple machines
     • It can continue functioning in the face of network
       partitions
CAP
• (diagram: the CAP triangle, pairing two of Consistency, Availability, and Partition Tolerance)
Usage
•   connect localhost/9160;
•   show cluster name;
•   show keyspaces;
•   create keyspace XXXXX;
•   use XXXXX;
•   create column family YYYYY;
•   describe keyspace XXXXX;
• set YYYYY['XiaoMing']['name'] = '小明';
• get YYYYY['XiaoMing'];
• List
• Map
• Map<row_id, Map<column_name, value>>
• Column Family 列簇
• create column family User
  with key_validation_class=UTF8Type
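The nested set/get and the Map<row_id, Map> view above can be sketched in plain Java. This is an illustrative model only, not Cassandra's actual storage code; the class and method names are made up:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a column family modeled as nested sorted maps.
// Outer key = row key; inner map = column name -> value, kept in
// comparator order like Cassandra's columns.
public class ColumnFamilySketch {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    // like: set CF[rowKey][columnName] = value
    public void set(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    // like: get CF[rowKey] -> all columns for that row
    public Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("XiaoMing", "name", "小明"); // like: set User['XiaoMing']['name'] = '小明'
        System.out.println(user.get("XiaoMing")); // {name=小明}
    }
}
```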
Column family
• A container of rows; each row key maps to an ordered set of columns
Super column family
• A column family whose columns are grouped into super columns, each holding sub-columns
Clusters (Ring)
• If the first node goes down, a replica can
  respond to queries. The peer-to-peer protocol
  allows the data to replicate across nodes in a
  manner transparent to the user

• Replication factor
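The ring placement described above can be sketched as follows. This is a toy model of SimpleStrategy-style placement (hash the key to a token, then walk clockwise collecting nodes); the class name, integer tokens, and node labels are all illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: the key's token owner plus the next
// (replicationFactor - 1) nodes clockwise around the ring hold replicas.
public class RingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // token -> node

    public void addNode(int token, String name) { ring.put(token, name); }

    public List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        // Start at the first node whose token >= keyToken, wrapping around.
        Integer t = ring.ceilingKey(keyToken);
        if (t == null) t = ring.firstKey();
        while (replicas.size() < Math.min(replicationFactor, ring.size())) {
            replicas.add(ring.get(t));
            t = ring.higherKey(t);
            if (t == null) t = ring.firstKey(); // wrap past the end of the ring
        }
        return replicas;
    }
}
```

With replication factor 2, a key falling between two nodes is served by the next two nodes clockwise, so losing one of them still leaves a replica to answer queries.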
Keyspaces
• Don’t create too many keyspaces

• (a keyspace is analogous to a database in an RDBMS)
Gossip protocols
• Intra-ring communication so that each node
  can have state information about the other nodes
• Runs every second
• Gossip messages:
  – Send: GossipDigestSynMessage
  – Ack: GossipDigestAckMessage
  – Send: GossipDigestAck2Message
• Failure-detection algorithm:
  – Phi Accrual Failure Detection
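The core idea of accrual failure detection can be sketched in a few lines. Instead of a boolean alive/dead verdict, each node computes a suspicion level phi that grows the longer a peer stays silent relative to its usual heartbeat interval. This sketch assumes exponentially distributed heartbeat inter-arrivals for simplicity; the real detector estimates the distribution from observed samples:

```java
// Hypothetical sketch of the Phi Accrual idea:
// phi = -log10(P(a heartbeat is still coming)).
// Under an exponential model, P = exp(-t / mean), so
// phi = (t / mean) * log10(e): it rises smoothly with silence.
public class PhiSketch {
    public static double phi(double millisSinceLastHeartbeat, double meanIntervalMillis) {
        return (millisSinceLastHeartbeat / meanIntervalMillis) * Math.log10(Math.E);
    }

    public static void main(String[] args) {
        // A node that usually gossips every 1000 ms has been silent for 8 s.
        System.out.println(phi(8000, 1000)); // high suspicion
        System.out.println(phi(500, 1000));  // low suspicion
    }
}
```

A node is marked down once phi crosses a configurable threshold, which trades detection speed against false positives.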
Anti-entropy
• Anti-entropy is the replica synchronization
  mechanism in Cassandra for ensuring that
  data on different nodes is updated to the
  newest version
• Merkle tree
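The Merkle-tree comparison behind anti-entropy can be sketched as follows. The toy hash, fixed leaf layout, and class name are illustrative assumptions, not Cassandra's actual tree; the point is that matching root hashes prove the replicas agree without comparing every row:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of anti-entropy: hash ranges of data into leaves,
// combine them up to a root, compare roots, and repair only differing leaves.
public class MerkleSketch {
    // Leaf hashes for equally sized ranges (toy hash for illustration).
    static int[] leaves(String[] data) {
        int[] h = new int[data.length];
        for (int i = 0; i < data.length; i++) h[i] = Objects.hashCode(data[i]);
        return h;
    }

    // Root hash: combine leaf hashes pairwise up the tree.
    static int root(int[] h) {
        if (h.length == 1) return h[0];
        int[] up = new int[(h.length + 1) / 2];
        for (int i = 0; i < up.length; i++) {
            int right = (2 * i + 1 < h.length) ? h[2 * i + 1] : 0;
            up[i] = 31 * h[2 * i] + right;
        }
        return root(up);
    }

    // If roots match, replicas agree; otherwise list the leaf ranges to repair.
    public static List<Integer> rangesToRepair(String[] a, String[] b) {
        List<Integer> out = new ArrayList<>();
        int[] ha = leaves(a), hb = leaves(b);
        if (root(ha) == root(hb)) return out; // in sync, nothing to stream
        for (int i = 0; i < ha.length; i++) if (ha[i] != hb[i]) out.add(i);
        return out;
    }
}
```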
Memtable & SSTable & Commit Log
• Memtable
  – Value is written to a memory-resident data structure
• SSTable
  – Include: Data, Index, and Filter
  – concept borrowed from Google’s Bigtable
  – When the memtable reaches a threshold, it is flushed to disk
• Commit log
  – Flush status flag: 0 / 1
     • 1: flush started
     • 0: flush succeeded
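The memtable-to-SSTable flow above can be sketched in a few lines. This is a toy model under assumed names; the commit log, indexes, and filters are elided, and the "SSTable" here is just a frozen in-memory copy:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of the write path: writes land in a sorted in-memory
// memtable; when it reaches a size threshold it is sealed off as an
// immutable, sorted table and a fresh memtable takes new writes.
public class MemtableSketch {
    private final int flushThreshold;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> sstables = new ArrayList<>();

    public MemtableSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    public void write(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) flush();
    }

    private void flush() {
        sstables.add(memtable);     // becomes an immutable "on-disk" table
        memtable = new TreeMap<>(); // fresh memtable for new writes
    }

    public int sstableCount() { return sstables.size(); }
}
```

Because the memtable is already sorted, each flushed table is sorted too, which is what makes the later compaction merge cheap.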
hinted handoff & Compaction
• Hinted handoff
  – When a write's target node is unavailable
  – the coordinator creates a hint and delivers it to the node
    when it comes back online


• Compaction:
  – merges SSTables
  – the merged data is sorted
  – a new index is created over the sorted data
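The merge step can be sketched on top of the sorted tables from the flush. A sketch under assumed names, with newest-wins conflict resolution standing in for timestamp reconciliation:

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of compaction: merge several sorted SSTables into
// one, keeping the newest value per key (later tables win here).
public class CompactionSketch {
    public static TreeMap<String, String> compact(List<TreeMap<String, String>> sstables) {
        TreeMap<String, String> merged = new TreeMap<>(); // result stays sorted
        for (TreeMap<String, String> table : sstables) {
            merged.putAll(table); // later (newer) tables overwrite older values
        }
        return merged; // a real compaction would also rebuild the index here
    }
}
```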
Major compaction
• During a major compaction, SSTables are merged and Bloom filters rebuilt
• Bloom filters are stored in memory and used to improve
  performance by reducing disk access on key lookups
Tombstones 墓碑
• Known as a “soft delete”
• Data is not immediately deleted when a delete
  operation executes
• Garbage Collection Grace Seconds:
  – GCGraceSeconds
     • Default: 10 days (864000 sec)
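The tombstone lifecycle can be sketched as follows. A toy model under assumed names: a delete writes a timestamped marker rather than removing the value, and garbage collection purges only markers older than GCGraceSeconds:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tombstones ("soft deletes").
public class TombstoneSketch {
    static final long GC_GRACE_SECONDS = 864_000; // default: 10 days

    record Cell(String value, boolean tombstone, long writeTimeSeconds) {}

    private final Map<String, Cell> cells = new HashMap<>();

    public void put(String key, String value, long nowSeconds) {
        cells.put(key, new Cell(value, false, nowSeconds));
    }

    public void delete(String key, long nowSeconds) {
        cells.put(key, new Cell(null, true, nowSeconds)); // a marker, not a removal
    }

    // Garbage collection: drop only tombstones past the grace period.
    public void gc(long nowSeconds) {
        cells.values().removeIf(c ->
            c.tombstone() && nowSeconds - c.writeTimeSeconds() > GC_GRACE_SECONDS);
    }

    public boolean physicallyPresent(String key) { return cells.containsKey(key); }
}
```

The grace period gives failed replicas time to learn about the delete; purging the marker too early would let a stale replica resurrect the value.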
Staged Event-Driven Architecture
                (SEDA)
• originally proposed in a 2001 paper called “SEDA: An
  Architecture for Well-Conditioned, Scalable Internet
  Services”
• A stage consists of an incoming event queue, an event handler,
  and a thread pool; Cassandra’s stages include:
   –   Read
   –   Mutation
   –   Gossip
   –   Response
   –   Anti-Entropy
   –   Load Balance
   –   Migration
   –   Streaming
   –   …
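The stage structure above can be sketched minimally: an event queue drained by a dedicated worker, decoupling producers from the handler. A sketch under assumed names, with a single worker thread standing in for the stage's thread pool:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of a SEDA stage: submitters enqueue events and
// return immediately; the stage's worker drains the queue in order.
public class StageSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final Thread worker;

    public StageSketch(String name, Consumer<String> handler) {
        worker = new Thread(() -> {
            try {
                while (true) handler.accept(events.take()); // drain the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop signal
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
    }

    public void submit(String event) { events.add(event); }

    public void shutdown() { worker.interrupt(); }
}
```

The queue gives each stage backpressure and an observable backlog, which is the "well-conditioned" property the SEDA paper is after.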
Custom FactoryUtil
• Prevents version incompatibilities
Configuring Cassandra
• system_add_keyspace
   – Creates a keyspace.
• system_rename_keyspace
   – Changes the name of a keyspace after taking a snapshot of it. Note that this
     method blocks until its work is done.
• system_drop_keyspace
   – Deletes an entire keyspace after taking a snapshot of it.
• system_add_column_family
   – Creates a column family.
• system_drop_column_family
   – Deletes a column family after taking a snapshot of it.
• system_rename_column_family
   – Changes the name of a column family after taking a snapshot of it. Note that
     this method blocks until its work is done.
Creating a Column Family
•   column_type
      – Either Super or Standard.
•   clock_type
      – The only valid value is Timestamp.
•   comparator
– Valid options include AsciiType, BytesType, LexicalUUIDType, LongType, TimeUUIDType, and UTF8Type.
•   subcomparator
      – Name of comparator used for subcolumns when the column_type is Super. Valid options are the same as comparator.
•   reconciler
      – Name of the class that will reconcile conflicting column versions. The only valid value at this time is Timestamp.
•   comment
      – Any human-readable comment in the form of a string.
•   rows_cached
      – The number of rows to cache.
•   preload_row_cache
      – Set this to true to automatically load the row cache.
•   key_cache_size
      – The number of keys to pull into the cache.
•   read_repair_chance
      – Valid values are a number between 0.0 and 1.0.
Replicas
• Simple Strategy
  – RackUnawareStrategy
• Old Network Topology Strategy
  – RackAwareStrategy
• Network Topology Strategy
  – DataCenterShardStrategy
  – datacenter.properties
Replication Factor
• specifies how many copies of each piece of
  data will be stored and distributed throughout
  the Cassandra cluster
• Factor = 1: your data will exist only on a single
  node in the cluster. Losing that node means
  the data becomes unavailable
Increasing the Replication Factor
• As the cluster grows, you may need to increase the replication factor
• How to do it:
  – Ensure that all the data is flushed to the SSTables
     • nodetool -h 192.168.1.1 -p 9160 flush
  – Stop that node
  – Copy the data files from your keyspaces
  – Place those data files on the new node
Replica Placement Strategies
• Simple Strategy
• Old Network Topology Strategy
• Network Topology Strategy
Adding Nodes to a Cluster
• If you want to add a new seed node, then you should
  autobootstrap it first, and then change it to a seed
  afterward

• Node1:
   – listen_address: 192.168.1.1
   – rpc_address: 0.0.0.0
• Node2:
   – auto_bootstrap: true
   – listen_address: 192.168.2.34
   – rpc_address: 0.0.0.0
Hector
• Cluster myCluster = HFactory.getOrCreateCluster("Test Cluster", "192.168.2.3:9160");

• ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "nb", ComparatorType.UTF8TYPE);
• columnFamilyDefinition.setReplicateOnWrite(true);
Hector
• ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "bb", ComparatorType.UTF8TYPE);
• columnFamilyDefinition.setKeyValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
• columnFamilyDefinition.setDefaultValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
• //myCluster.addColumnFamily(columnFamilyDefinition);
• columnFamilyDefinition.setId(1013);
• myCluster.updateColumnFamily(columnFamilyDefinition);
Hector
• Keyspace myKeyspace = HFactory.createKeyspace("s3", myCluster);
• Mutator<String> mutator = HFactory.createMutator(myKeyspace, StringSerializer.get());

• mutator.insert("b", "bb", HFactory.createStringColumn("column1", "你好在"));
Hector
• ColumnQuery<String, String, String> q = HFactory.createColumnQuery(myKeyspace,
  StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
• // set key, name, cf and execute
• QueryResult<HColumn<String, String>> r = q
•      .setColumnFamily("bb")
•      .setKey("b")
•      .setName("column1")
•      .execute();
• // read the value from the result
• HColumn<String, String> c = r.get();
• String value = c.getValue();
• System.out.println(value);
