cassandra
Where Did Cassandra Come From
• Cassandra originated at Facebook in 2007 to
  solve that company’s inbox search problem
  – large volumes of data
  – many random reads
  – many simultaneous random writes
• It was released as an open source Google Code
  project in July 2008
• In March 2009 it was moved to an Apache Incubator
  project
• On February 17, 2010 it was voted into a top-level
  project
Cassandra in 50 Words or Less
• Apache Cassandra is an
    –   open source
    –   distributed
    –   decentralized
    –   elastically scalable
    –   highly available
    –   fault-tolerant
    –   tuneably consistent
    –   column-oriented
  database that
    –   bases its distribution design on Amazon’s Dynamo
    –   bases its data model on Google’s Bigtable
•   Created at Facebook, it is now used at some of the most popular sites on the Web
Who Is Using Cassandra
• Twitter is using Cassandra for analytics.
• Mahalo uses it for its primary near-time data store.
• Facebook still uses it for inbox search, though they are using a
  proprietary fork.
• Digg uses it for its primary near-time data store.
• Rackspace uses it for its cloud service, monitoring, and logging.
• Reddit uses it as a persistent cache.
• Cloudkick uses it for monitoring statistics and analytics.
• Ooyala uses it to store and serve near real-time video analytics
  data.
• SimpleGeo uses it as the main data store for its real-time location
  infrastructure.
• Onespot uses it for a subset of its main data store
Decentralized


• Decentralized:
  – all nodes are the same
  – the failure of a node won’t disrupt service
• Master/slave:
  – if the master node fails, the whole database is in jeopardy
Elastic Scalability
• add another machine—Cassandra will find it
  and start sending it work
High Availability and Fault Tolerance
ACID
• Atomic
  – All or nothing
• Consistent

• Isolated
  – When two transactions modify the same data, one must wait for the other
• Durable
Brewer’s CAP Theorem
• You can strongly support only two of the three:
  – Consistency
     • All database clients will read the same value for the same query,
       even given concurrent updates
  – Availability
     • All database clients will always be able to read and write
       data
  – Partition Tolerance
     • The database can be split across multiple machines
     • It can continue functioning in the face of network segmentation
       breaks
CAP




transaction
usage
•   connect localhost/9160;
•   show cluster name;
•   show keyspaces;
•   create keyspace XXXXX;
•   use XXXXX;
•   create column family YYYYY;
•   describe keyspace XXXXX;
• set YYYYY['XiaoMing']['name'] = '小明';
• get YYYYY['XiaoMing'];
• List
• Map
• MapList<row_id, Map>
• Column Family
• create column family User
  with key_validation_class=UTF8Type;
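The List / Map bullets above suggest reading a column family as a map of row keys to maps of columns. A minimal sketch of that reading follows; the class name `ColumnFamilySketch` and the use of `TreeMap` for ordering are assumptions made for illustration, not Cassandra's storage code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a column family: row key -> (column name -> value). */
class ColumnFamilySketch {
    // TreeMap keeps rows and columns sorted; it stands in for comparator-ordered storage.
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    /** Mirrors the CLI statement: set CF[rowKey][column] = value */
    void set(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Mirrors the CLI statement: get CF[rowKey]; an unknown row yields an empty map. */
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("XiaoMing", "name", "Xiao Ming");
        user.set("XiaoMing", "city", "Beijing"); // illustrative extra column
        System.out.println(user.get("XiaoMing"));
    }
}
```

Each row can carry a different set of columns, which is the main difference from a fixed relational table.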
Column family
Super column family
Clusters (Ring)
• If the first node goes down, a replica can
  respond to queries. The peer-to-peer protocol
  allows the data to replicate across nodes in a
  manner transparent to the user

• Replication factor
Keyspaces
• Don’t create too many keyspaces

• Roughly analogous to a database in a relational system
Gossip protocols
• Used for intra-ring communication so that each node
  has state information about other nodes
• Runs every second
• Gossip messages:
  – Send: GossipDigestSynMessage
  – Ack: GossipDigestAckMessage
  – Send: GossipDigestAck2Message
• Failure detection algorithm:
  – Phi Accrual Failure Detection
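The slide names Phi Accrual Failure Detection without showing the calculation. The sketch below is a simplified illustration, assuming heartbeat gaps follow an exponential distribution; the class name `PhiAccrualSketch` and the window size are hypothetical, and Cassandra's own detector differs in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified phi accrual failure detector; assumes exponentially distributed heartbeat gaps. */
class PhiAccrualSketch {
    private final Deque<Long> intervalsMillis = new ArrayDeque<>();
    private final int windowSize;          // hypothetical sliding-window size
    private long lastHeartbeatMillis = -1;

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Record a heartbeat (for example, a gossip message) received at the given time. */
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            intervalsMillis.addLast(nowMillis - lastHeartbeatMillis);
            if (intervalsMillis.size() > windowSize) {
                intervalsMillis.removeFirst();
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    /** phi = -log10(P(gap > elapsed)); with an exponential model P = exp(-elapsed / mean). */
    double phi(long nowMillis) {
        if (intervalsMillis.isEmpty()) {
            return 0.0; // not enough history to suspect anything
        }
        double mean = Math.max(1.0,
                intervalsMillis.stream().mapToLong(Long::longValue).average().orElse(1.0));
        double elapsed = nowMillis - lastHeartbeatMillis;
        return elapsed / (mean * Math.log(10.0));
    }

    public static void main(String[] args) {
        PhiAccrualSketch detector = new PhiAccrualSketch(100);
        for (long t = 0; t <= 5000; t += 1000) {
            detector.heartbeat(t); // one heartbeat per second
        }
        System.out.println("phi after 1 s of silence:  " + detector.phi(6000));
        System.out.println("phi after 20 s of silence: " + detector.phi(25000));
    }
}
```

Rather than a binary up/down verdict, the detector yields a suspicion level that grows with silence; a caller compares it with a threshold of its own choosing.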
Anti-entropy
• Anti-entropy is the replica synchronization
  mechanism in Cassandra for ensuring that
  data on different nodes is updated to the
  newest version
• Replicas are compared using Merkle trees (hash trees)
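To illustrate how a hash tree lets two replicas detect differences cheaply, here is a minimal sketch that reduces each replica's rows to a single root hash and compares the roots. The class `MerkleSketch`, the `key=value` row encoding, and the choice of SHA-256 are assumptions for the example, not Cassandra's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal Merkle-root comparison between two replicas' rows. */
class MerkleSketch {
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) {
                md.update(p);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    /** Hash each row into a leaf, then hash pairs upward until one root remains. */
    static byte[] root(List<String> rows) {
        List<byte[]> level = new ArrayList<>();
        for (String row : rows) {
            level.add(sha256(row.getBytes(StandardCharsets.UTF_8)));
        }
        if (level.isEmpty()) {
            return sha256(new byte[0]);
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // An odd node at the end of a level is paired with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }

    /** Matching roots mean the replicas almost certainly hold the same rows. */
    static boolean inSync(List<String> replicaA, List<String> replicaB) {
        return Arrays.equals(root(replicaA), root(replicaB));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("k1=v1", "k2=v2", "k3=v3");
        List<String> b = Arrays.asList("k1=v1", "k2=STALE", "k3=v3");
        System.out.println("a vs a in sync: " + inSync(a, a));
        System.out.println("a vs b in sync: " + inSync(a, b));
    }
}
```

A fuller version would keep the intermediate levels so that, on a root mismatch, only the differing subtrees need to be exchanged.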
Memtable, SSTable, and Commit Log
• Memtable
  – A written value goes to a memory-resident data structure
• SSTable
  – Includes: Data, Index, and Filter
  – Concept borrowed from Google’s Bigtable
  – When a memtable reaches a threshold, it is flushed to disk as an SSTable
• Commit log
  – Flush status: 0 / 1
     • 1: flush started
     • 0: flush succeeded
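The interplay of the three structures can be sketched as a toy write path: append to the commit log, update the memtable, and flush to an immutable sorted table once a threshold is reached. Everything here (the class `WritePathSketch`, an entry-count threshold, in-memory lists standing in for disk files) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy write path: commit log append, memtable update, flush to an immutable sorted table. */
class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();   // stands in for the on-disk log
    private TreeMap<String, String> memtable = new TreeMap<>(); // memory-resident, sorted by key
    private final List<SortedMap<String, String>> sstables = new ArrayList<>();
    private final int flushThreshold;                           // hypothetical: entries, not bytes

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. record the write for durability
        memtable.put(key, value);         // 2. apply it in memory
        if (memtable.size() >= flushThreshold) {
            flush();                      // 3. threshold reached
        }
    }

    /** Freeze the memtable as a read-only sorted table and start a fresh one. */
    private void flush() {
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear();                // flushed entries no longer need replaying
    }

    /** Check the memtable first, then the flushed tables from newest to oldest. */
    String read(String key) {
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            String value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int sstableCount() {
        return sstables.size();
    }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch(2);
        store.write("a", "1");
        store.write("b", "2"); // second entry triggers a flush
        store.write("a", "3"); // newer value lives in the new memtable
        System.out.println(store.read("a") + " " + store.sstableCount());
    }
}
```

Because flushed tables are never modified, writes stay sequential and reads must consult several tables, which is what compaction later tidies up.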
Hinted Handoff & Compaction
• Hinted handoff
  – Applies when the node that should receive a write is not available
  – Cassandra writes a hint to a live node so the write can be delivered once the target node is back


• Compaction:
  – Merges SSTables
  – Merged data is sorted
  – A new index is created over the sorted data
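The merge step of compaction can be sketched as folding several sorted tables into one, keeping the most recently written value for each key. The `Cell` class, its timestamp field, and the method name `compact` are hypothetical names for this example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class CompactionSketch {
    /** A stored value together with its write timestamp (hypothetical type). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merge several sorted tables into one; for duplicate keys the newest timestamp wins. */
    static SortedMap<String, Cell> compact(List<SortedMap<String, Cell>> sstables) {
        SortedMap<String, Cell> merged = new TreeMap<>();
        for (SortedMap<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> e : table.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (old, cur) -> cur.timestamp > old.timestamp ? cur : old);
            }
        }
        return merged; // sorted by key, ready to be indexed and written out
    }

    public static void main(String[] args) {
        SortedMap<String, Cell> older = new TreeMap<>();
        older.put("k1", new Cell("old", 1L));
        older.put("k2", new Cell("keep", 1L));
        SortedMap<String, Cell> newer = new TreeMap<>();
        newer.put("k1", new Cell("new", 2L));
        SortedMap<String, Cell> merged = compact(Arrays.asList(older, newer));
        System.out.println(merged.get("k1").value + " " + merged.size());
    }
}
```

The result does not depend on the order in which the tables are supplied, since conflicts are resolved by timestamp rather than by position.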
major compaction
• Bloom filters are stored in memory
• They improve performance by reducing disk
  access on key lookups
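An in-memory structure that cuts disk access on key lookups, as the bullets above describe, is commonly realised as a Bloom filter. The sketch below is a generic one, with a hypothetical class name and a simple double-hashing scheme built on `String.hashCode`; it is not the hashing Cassandra uses.

```java
import java.util.BitSet;

/** Generic Bloom filter: may report false positives, never false negatives. */
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** i-th probe position, derived from two hashes of the key (double hashing). */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false: the key was definitely never added, so the disk lookup can be skipped. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-1");
        System.out.println(filter.mightContain("row-1"));
    }
}
```

Only a `true` answer requires going to disk, and that answer may occasionally be wrong; a `false` answer is always safe to trust.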
Tombstones
• Known as a “soft delete”
• Data is not removed immediately when a
  delete operation executes; a tombstone marker is written instead
• Garbage Collection Grace Seconds:
  – GCGraceSeconds
     • Default: 10 days (864000 sec)
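The grace period can be read as a simple eligibility check: a tombstone may be discarded only once more than GCGraceSeconds have passed since the delete. A minimal sketch, with a hypothetical class and method name and timestamps expressed in epoch seconds:

```java
class TombstoneSketch {
    /** 10 days in seconds, the default quoted above. */
    static final long GC_GRACE_SECONDS = 10L * 24 * 60 * 60;

    /** True once the grace period has fully elapsed since the delete was issued. */
    static boolean purgeable(long deletedAtEpochSeconds, long nowEpochSeconds) {
        return nowEpochSeconds - deletedAtEpochSeconds > GC_GRACE_SECONDS;
    }

    public static void main(String[] args) {
        long deletedAt = 1_000_000L; // illustrative timestamp
        System.out.println(purgeable(deletedAt, deletedAt + 3_600));
        System.out.println(purgeable(deletedAt, deletedAt + GC_GRACE_SECONDS + 1));
    }
}
```

The delay gives replicas that were down at delete time a chance to learn about the tombstone before it disappears.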
Staged Event-Driven Architecture (SEDA)
• Originally proposed in a 2001 paper called “SEDA: An
  Architecture for Well-Conditioned, Scalable Internet
  Services”
• A stage consists of an incoming event queue, an event handler, and an associated thread pool
• Operations represented as stages include:
   –   Read
   –   Mutation
   –   Gossip
   –   Response
   –   Anti-Entropy
   –   Load Balance
   –   Migration
   –   Streaming
   –   …
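A single stage can be sketched with standard `java.util.concurrent` classes: a thread pool whose work queue plays the role of the stage's incoming event queue, plus a handler applied to each event. The class `StageSketch` and its methods are hypothetical and only illustrate the pattern.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** One SEDA-style stage: its own event queue, thread pool, and handler. */
class StageSketch<E> {
    private final String name;
    private final ThreadPoolExecutor pool;
    private final Consumer<E> handler;

    StageSketch(String name, int threads, Consumer<E> handler) {
        this.name = name;
        this.handler = handler;
        // The executor's work queue acts as the stage's incoming event queue.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    String name() {
        return name;
    }

    /** Hand an event to this stage; it is processed when one of the stage's threads is free. */
    void enqueue(E event) {
        pool.execute(() -> handler.accept(event));
    }

    /** Stop accepting events and wait for queued ones; returns false on timeout or interrupt. */
    boolean shutdown(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger applied = new AtomicInteger();
        StageSketch<String> mutation = new StageSketch<>("Mutation", 2, e -> applied.incrementAndGet());
        mutation.enqueue("write k1");
        mutation.enqueue("write k2");
        mutation.shutdown(1000);
        System.out.println(mutation.name() + " handled " + applied.get() + " events");
    }
}
```

Giving each kind of operation its own queue and pool lets a busy stage back up without starving the others.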
Custom FactoryUtil
• Guards against version incompatibility
Configuring Cassandra
• system_add_keyspace
   – Creates a keyspace.
• system_rename_keyspace
   – Changes the name of a keyspace after taking a snapshot of it. Note that this
     method blocks until its work is done.
• system_drop_keyspace
   – Deletes an entire keyspace after taking a snapshot of it.
• system_add_column_family
   – Creates a column family.
• system_drop_column_family
   – Deletes a column family after taking a snapshot of it.
• system_rename_column_family
   – Changes the name of a column family after taking a snapshot of it. Note that
     this method blocks until its work is done.
Creating a Column Family
•   column_type
      – Either Super or Standard.
•   clock_type
      – The only valid value is Timestamp.
•   comparator
      – Valid options include AsciiType, BytesType, LexicalUUIDType, LongType, TimeUUIDType, and UTF8Type.
•   subcomparator
      – Name of comparator used for subcolumns when the column_type is Super. Valid options are the same as comparator.
•   reconciler
      – Name of the class that will reconcile conflicting column versions. The only valid value at this time is Timestamp.
•   comment
      – Any human-readable comment in the form of a string.
•   rows_cached
      – The number of rows to cache.
•   preload_row_cache
      – Set this to true to automatically load the row cache.
•   key_cache_size
      – The number of keys to pull into the cache.
•   read_repair_chance
      – Valid values are a number between 0.0 and 1.0.
Replicas
• Simple Strategy
  – RackUnawareStrategy
• Old Network Topology Strategy
  – RackAwareStrategy
• Network Topology Strategy
  – DataCenterShardStrategy
  – datacenter.properties
Replication Factor
• specifies how many copies of each piece of
  data will be stored and distributed throughout
  the Cassandra cluster
• Factor = 1: your data will exist on only a single
  node in the cluster. Losing that node means
  that data becomes unavailable
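How a replication factor turns into concrete nodes can be sketched for the simplest placement: find the node that owns the key's token, then take the next nodes clockwise around the ring. The class `RingSketch`, the integer tokens, and the node names are assumptions for illustration; rack-aware and datacenter-aware strategies choose nodes differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy token ring with rack-unaware replica placement. */
class RingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // token -> node name

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    /** Owner is the first node with token >= keyToken; further replicas follow clockwise. */
    List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around past the highest token
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode(0, "A");
        ring.addNode(100, "B");
        ring.addNode(200, "C");
        System.out.println(ring.replicasFor(150, 1)); // a single copy: losing that node loses the data
        System.out.println(ring.replicasFor(150, 2));
    }
}
```

With a factor of 2 or more, a read for the same key can still be served by the next node on the ring when the owner is down.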
Increasing the Replication Factor
• As the number of nodes grows, you may want to increase the replication factor
• How to do it:
  – ensure that all the data is flushed to the SSTables
     • flush -h 192.168.1.1 -p 9160
  – stop that node
  – copy the data files from your keyspaces
  – paste those data files onto the new node
Replica Placement Strategies
• Simple Strategy
• Old Network Topology Strategy
• Network Topology Strategy
Adding Nodes to a Cluster
• If you want to add a new seed node, then you should
  autobootstrap it first, and then change it to a seed
  afterward

• Node1:
   – listen_address: 192.168.1.1
   – rpc_address: 0.0.0.0
• Node2:
   – auto_bootstrap: true
   – listen_address: 192.168.2.34
   – rpc_address: 0.0.0.0
Hector
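  import me.prettyprint.cassandra.service.ThriftCfDef;
  import me.prettyprint.hector.api.Cluster;
  import me.prettyprint.hector.api.ddl.ComparatorType;
  import me.prettyprint.hector.api.factory.HFactory;

  // Connect to (or reuse) the cluster named "Test Cluster" at the given host:port
  Cluster myCluster = HFactory.getOrCreateCluster("Test Cluster", "192.168.2.3:9160");

  // Define column family "nb" in keyspace "s3", comparing column names as UTF-8
  ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "nb", ComparatorType.UTF8TYPE);
  columnFamilyDefinition.setReplicateOnWrite(true);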
Hector
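  // Define column family "bb" in keyspace "s3" with UTF-8 keys and values
  ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "bb", ComparatorType.UTF8TYPE);
  columnFamilyDefinition.setKeyValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
  columnFamilyDefinition.setDefaultValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
  // To create the column family for the first time, add it instead:
  // myCluster.addColumnFamily(columnFamilyDefinition);
  // 1013 is the id used on the original slide, presumably that of the existing column family
  columnFamilyDefinition.setId(1013);
  myCluster.updateColumnFamily(columnFamilyDefinition);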
Hector
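  import me.prettyprint.cassandra.serializers.StringSerializer;
  import me.prettyprint.hector.api.Keyspace;
  import me.prettyprint.hector.api.mutation.Mutator;

  // Obtain a handle on keyspace "s3" and a mutator whose row keys are Strings
  Keyspace myKeyspace = HFactory.createKeyspace("s3", myCluster);
  Mutator<String> mutator = HFactory.createMutator(myKeyspace, StringSerializer.get());

  // Insert column1 = "你好在" into row "b" of column family "bb"
  mutator.insert("b", "bb", HFactory.createStringColumn("column1", "你好在"));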
Hector
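  import me.prettyprint.hector.api.beans.HColumn;
  import me.prettyprint.hector.api.query.ColumnQuery;
  import me.prettyprint.hector.api.query.QueryResult;

  // Key, column name, and value are all Strings
  ColumnQuery<String, String, String> q = HFactory.createColumnQuery(myKeyspace,
      StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
  // set key, name, cf and execute
  QueryResult<HColumn<String, String>> r = q
      .setColumnFamily("bb")
      .setKey("b")
      .setName("column1")
      .execute();
  // read value from the result
  HColumn<String, String> c = r.get();
  String value = c.getValue();
  System.out.println(value);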

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Cassandra

  • 2. Where Did Cassandra Come From
    • Cassandra originated at Facebook in 2007 to solve that company’s inbox search problem
      – large volumes of data
      – many random reads
      – many simultaneous random writes
    • It was released as an open source Google Code project in July 2008
    • In March 2009 it was moved to an Apache Incubator project
    • On February 17, 2010 it was voted into a top-level project
  • 3. Cassandra in 50 Words or Less
    • Apache Cassandra is an
      – open source
      – distributed
      – decentralized
      – elastically scalable
      – highly available
      – fault-tolerant
      – tuneably consistent
      – column-oriented
    • database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable
    • Created at Facebook, it is now used at some of the most popular sites on the Web
  • 4. Who Is Using Cassandra
    • Twitter is using Cassandra for analytics.
    • Mahalo uses it for its primary near-time data store.
    • Facebook still uses it for inbox search, though they are using a proprietary fork.
    • Digg uses it for its primary near-time data store.
    • Rackspace uses it for its cloud service, monitoring, and logging.
    • Reddit uses it as a persistent cache.
    • Cloudkick uses it for monitoring statistics and analytics.
    • Ooyala uses it to store and serve near real-time video analytics data.
    • SimpleGeo uses it as the main data store for its real-time location infrastructure.
    • Onespot uses it for a subset of its main data store.
  • 5. Decentralized
    • Decentralized: all nodes are the same, so the failure of a node won’t disrupt service
    • Master/slave: if the master node fails, the whole database is in jeopardy
  • 6. Elastic Scalability
    • Add another machine: Cassandra will find it and start sending it work
  • 7. High Availability and Fault Tolerance
  • 8. ACID
    • Atomic
      – All or nothing
    • Consistent
    • Isolated
      – Two transactions modifying the same data do not interfere with each other
    • Durable
  • 9. Brewer’s CAP Theorem
    • You can strongly support only two of the three:
      – Consistency: all database clients will read the same value for the same query, even given concurrent updates
      – Availability: all database clients will always be able to read and write data
      – Partition tolerance: the database can be split across multiple machines and can continue functioning in the face of network segmentation breaks
  • 11. Usage
    • connect localhost/9160;
    • show cluster name;
    • show keyspaces;
    • create keyspace XXXXX;
    • use XXXXX;
    • create column family YYYYY;
    • describe keyspace XXXXX;
  • 12.
    • set YYYYY['XiaoMing']['name'] = '小明';
    • get YYYYY['XiaoMing'];
  • 13. • List • Map • MapList<row_id, Map>
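One way to read the nesting on this slide is that a column family behaves like a sorted map from row id to a sorted map of columns. The sketch below only illustrates that reading in plain Java; the class and method names are made up here and are not part of Cassandra or Hector.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: row id -> (column name -> column value).
class ColumnFamilySketch {
    // Both levels are TreeMaps, so rows and columns stay sorted by name.
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    // Rough equivalent of: set YYYYY['row']['column'] = 'value';
    void set(String rowId, String column, String value) {
        rows.computeIfAbsent(rowId, k -> new TreeMap<>()).put(column, value);
    }

    // Rough equivalent of: get YYYYY['row'];
    Map<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, Collections.emptyMap());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("XiaoMing", "name", "小明");
        System.out.println(user.get("XiaoMing"));
    }
}
```

Rows that were never written simply come back as an empty map, which mirrors the schema-free feel of the CLI session on the previous slides.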
  • 14. Column Family
    • create column family User with key_validation_class=UTF8Type;
  • 17. Clusters (Ring)
    • If the first node goes down, a replica can respond to queries. The peer-to-peer protocol allows the data to replicate across nodes in a manner transparent to the user
    • Replication factor
  • 18. Keyspaces
    • A keyspace plays roughly the role of a database
    • Don’t create too many keyspaces
  • 19. Gossip protocols
    • Intra-ring communication so that each node can have state information about other nodes
    • Runs every second
    • Gossip messages:
      – Send: GossipDigestSynMessage
      – Ack: GossipDigestAckMessage
      – Send: GossipDigestAck2Message
    • Algorithm:
      – Phi Accrual Failure Detection
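To make the failure-detection bullet more concrete, here is a minimal sketch of a phi accrual detector. It assumes heartbeat intervals follow an exponential distribution, so that phi = elapsed / (mean interval × ln 10); the class name, window size, and any threshold you compare phi against are illustrative assumptions, not Cassandra's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical phi accrual detector for one peer, fed by gossip heartbeats.
class PhiAccrualSketch {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;
    private long lastHeartbeatMillis = -1;

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record the arrival time of a heartbeat from the peer.
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            intervals.addLast(nowMillis - lastHeartbeatMillis);
            if (intervals.size() > windowSize) {
                intervals.removeFirst(); // keep only the most recent intervals
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    // Suspicion level: grows the longer the peer stays silent.
    // Under the exponential assumption, phi = -log10(exp(-elapsed / mean)).
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0; // not enough history to judge
        }
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(1.0);
        double elapsed = nowMillis - lastHeartbeatMillis;
        return elapsed / (mean * Math.log(10.0));
    }

    public static void main(String[] args) {
        PhiAccrualSketch detector = new PhiAccrualSketch(100);
        for (long t = 0; t <= 3000; t += 1000) {
            detector.heartbeat(t);
        }
        System.out.println(detector.phi(4000));
        System.out.println(detector.phi(13000));
    }
}
```

Instead of a binary up/down verdict, the detector outputs a continuous suspicion value, and the caller decides which phi level counts as "down".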
  • 20. Anti-entropy
    • Anti-entropy is the replica synchronization mechanism in Cassandra for ensuring that data on different nodes is updated to the newest version
    • Merkle tree
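The Merkle tree idea can be sketched as follows: each replica hashes its rows into leaves, hashes pairs of nodes upward, and two replicas only need to compare root hashes to know whether they differ. This is a simplified illustration using SHA-256 over strings; the class name and the row encoding are assumptions, and Cassandra's real trees are built over token ranges.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Merkle root comparison between two replicas.
class MerkleSketch {
    // Hash the concatenation of the given byte arrays with SHA-256.
    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                md.update(part);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build the tree bottom-up and return the root hash.
    static byte[] root(List<String> rows) {
        List<byte[]> level = new ArrayList<>();
        for (String row : rows) {
            level.add(sha256(row.getBytes(StandardCharsets.UTF_8)));
        }
        if (level.isEmpty()) {
            return sha256(new byte[0]);
        }
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                // An unpaired last node is combined with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                next.add(sha256(left, right));
            }
            level = next;
        }
        return level.get(0);
    }

    // Replicas are considered in sync when their root hashes match.
    static boolean inSync(List<String> replicaA, List<String> replicaB) {
        return Arrays.equals(root(replicaA), root(replicaB));
    }

    public static void main(String[] args) {
        System.out.println(inSync(Arrays.asList("a=1", "b=2"), Arrays.asList("a=1", "b=2")));
        System.out.println(inSync(Arrays.asList("a=1", "b=2"), Arrays.asList("a=1", "b=3")));
    }
}
```

When the roots differ, a fuller implementation would descend into the subtrees whose hashes disagree, so only the mismatching ranges need to be exchanged.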
  • 21. Memtable & SSTable & CommitLog
    • Memtable
      – Value is written to a memory-resident data structure
    • SSTable
      – Includes: Data, Index, and Filter
      – Concept borrowed from Google’s Bigtable
      – When the Memtable reaches a threshold, it is flushed to disk
    • Commit log
      – Flush status: 0 / 1
        • 1: start to flush
        • 0: flush success
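The interplay of the three structures can be sketched roughly like this: every write is appended to the commit log and applied to the memtable, and when the memtable reaches a threshold it is frozen into an immutable, sorted SSTable. Everything here is an in-memory toy with hypothetical names (`WritePathSketch`, `flushThreshold`); real SSTables live on disk with separate data, index, and filter files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical single-node write path: commit log -> memtable -> SSTable.
class WritePathSketch {
    final List<String> commitLog = new ArrayList<>();
    SortedMap<String, String> memtable = new TreeMap<>();
    final List<SortedMap<String, String>> sstables = new ArrayList<>();
    private final int flushThreshold;

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. record the write for crash recovery
        memtable.put(key, value);         // 2. apply it to the in-memory table
        if (memtable.size() >= flushThreshold) {
            flush();                      // 3. threshold reached: flush to an SSTable
        }
    }

    void flush() {
        sstables.add(Collections.unmodifiableSortedMap(memtable)); // SSTables are immutable
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed writes no longer need to be replayed
    }

    // Check the memtable first, then SSTables from newest to oldest.
    String read(String key) {
        if (memtable.containsKey(key)) {
            return memtable.get(key);
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            String value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("a", "1");
        node.write("b", "2"); // triggers a flush
        node.write("a", "9");
        System.out.println(node.read("a") + " " + node.read("b"));
    }
}
```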
  • 22. Hinted handoff & Compaction
    • Hinted handoff:
      – Used when the target node of a write is not available
      – Cassandra creates a hint so the write can be delivered to that node later
    • Compaction:
      – Merges SSTables
      – Merged data is sorted
      – A new index is created over the sorted data
  • 23. Major compaction
    • Bloom filters are stored in memory
    • They are used to improve performance by reducing disk access on key lookups
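The in-memory structure that avoids disk access on key lookups matches the behaviour of a Bloom filter (the "Filter" part of an SSTable). The sketch below assumes that reading; the sizing and the double-hashing scheme built on `String.hashCode()` are illustrative choices, not Cassandra's.

```java
import java.util.BitSet;

// Hypothetical Bloom filter: answers "definitely absent" or "maybe present".
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position from two cheap hashes (double hashing).
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false: the key was never added, so the disk read can be skipped.
    // true: the key may be present, so the SSTable has to be consulted.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("XiaoMing");
        System.out.println(filter.mightContain("XiaoMing"));
    }
}
```

False positives are possible but false negatives are not, which is why a negative answer is enough to skip an SSTable entirely.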
  • 24. Tombstones
    • Known as a “soft delete”
    • Data is not removed immediately when a delete operation is executed
    • Garbage Collection Grace Seconds:
      – GCGraceSeconds
        • Default: 10 days (864000 sec)
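Compaction and tombstones fit together roughly as sketched here: SSTables are merged key by key, the cell with the newest timestamp wins, and a tombstone is only dropped once it is older than the grace period. The `Cell` type and the use of a `null` value as the tombstone marker are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.Arrays;

// Hypothetical compaction of several SSTables with tombstone purging.
class CompactionSketch {
    static final long GC_GRACE_SECONDS = 864_000L; // 10 days, the default quoted on the slide

    static final class Cell {
        final String value;      // null marks a tombstone (soft delete)
        final long timestampSec; // write time in seconds

        Cell(String value, long timestampSec) {
            this.value = value;
            this.timestampSec = timestampSec;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    static TreeMap<String, Cell> compact(List<Map<String, Cell>> sstables, long nowSec) {
        TreeMap<String, Cell> merged = new TreeMap<>(); // merged output stays sorted by key
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                Cell current = merged.get(entry.getKey());
                // The newest timestamp wins, whether it is a value or a tombstone.
                if (current == null || entry.getValue().timestampSec > current.timestampSec) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        // Tombstones are physically removed only after the grace period has passed.
        merged.values().removeIf(c -> c.isTombstone() && nowSec - c.timestampSec > GC_GRACE_SECONDS);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = new HashMap<>();
        older.put("a", new Cell("1", 100));
        Map<String, Cell> newer = new HashMap<>();
        newer.put("a", new Cell(null, 200)); // delete of "a"
        System.out.println(compact(Arrays.asList(older, newer), 300).keySet());
    }
}
```

Keeping the tombstone around for the grace period gives replicas that missed the delete time to learn about it before the marker disappears.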
  • 25. Staged Event-Driven Architecture (SEDA)
    • Originally proposed in a 2001 paper called “SEDA: An Architecture for Well-Conditioned, Scalable Internet Services”
    • A stage consists of an incoming event queue:
      – Read
      – Mutation
      – Gossip
      – Response
      – Anti-Entropy
      – Load Balance
      – Migration
      – Streaming
      – …
  • 26. Custom FactoryUtil
    • Prevents version incompatibilities
  • 27. Configuring Cassandra
    • system_add_keyspace
      – Creates a keyspace.
    • system_rename_keyspace
      – Changes the name of a keyspace after taking a snapshot of it. Note that this method blocks until its work is done.
    • system_drop_keyspace
      – Deletes an entire keyspace after taking a snapshot of it.
    • system_add_column_family
      – Creates a column family.
    • system_drop_column_family
      – Deletes a column family after taking a snapshot of it.
    • system_rename_column_family
      – Changes the name of a column family after taking a snapshot of it. Note that this method blocks until its work is done.
  • 28. Creating a Column Family
    • column_type
      – Either Super or Standard.
    • clock_type
      – The only valid value is Timestamp.
    • comparator
      – Valid options include AsciiType, BytesType, LexicalUUIDType, LongType, TimeUUIDType, and UTF8Type.
    • subcomparator
      – Name of comparator used for subcolumns when the column_type is Super. Valid options are the same as comparator.
    • reconciler
      – Name of the class that will reconcile conflicting column versions. The only valid value at this time is Timestamp.
    • comment
      – Any human-readable comment in the form of a string.
    • rows_cached
      – The number of rows to cache.
    • preload_row_cache
      – Set this to true to automatically load the row cache.
    • key_cache_size
      – The number of keys to pull into the cache.
    • read_repair_chance
      – Valid values are a number between 0.0 and 1.0.
  • 29. Replicas
    • Simple Strategy
      – RackUnawareStrategy
    • Old Network Topology Strategy
      – RackAwareStrategy
    • Network Topology Strategy
      – DataCenterShardStrategy
      – datacenter.properties
  • 30. Replication Factor
    • Specifies how many copies of each piece of data will be stored and distributed throughout the Cassandra cluster
    • Factor = 1: your data will exist only in a single node in the cluster. Losing that node means that data becomes unavailable
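As a rough picture of how the replication factor interacts with the ring, the sketch below places replicas the way a rack-unaware strategy might: find the first node at or after the key's token, then keep walking clockwise until enough nodes are collected. The integer tokens, node names, and class name are hypothetical, and one token per node is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical rack-unaware replica placement on a token ring.
class RingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // token -> node name

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // There cannot be more replicas than nodes.
        int wanted = Math.min(replicationFactor, ring.size());
        // First replica: the node owning the first token at or after the key's token.
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token); // next node clockwise
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        RingSketch cluster = new RingSketch();
        cluster.addNode(0, "A");
        cluster.addNode(100, "B");
        cluster.addNode(200, "C");
        System.out.println(cluster.replicasFor(150, 1));
        System.out.println(cluster.replicasFor(150, 2));
    }
}
```

With a factor of 1 only a single node holds the key, so losing that node makes the data unavailable; with a factor of 2 the next node on the ring can still answer.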
  • 31. Increasing the Replication Factor
    • As the number of nodes grows, you may want to increase the replication factor
    • How to do it:
      – Ensure that all the data is flushed to the SSTables
        • flush -h 192.168.1.1 -p 9160
      – Stop that node
      – Copy the datafiles from your keyspaces
      – Paste those datafiles to the new node
  • 32. Replica Placement Strategies
    • Simple Strategy
    • Old Network Topology Strategy
    • Network Topology Strategy
  • 33. Adding Nodes to a Cluster
    • If you want to add a new seed node, then you should autobootstrap it first, and then change it to a seed afterward
    • Node1:
      – listen_address: 192.168.1.1
      – rpc_address: 0.0.0.0
    • Node2:
      – auto_bootstrap: true
      – listen_address: 192.168.2.34
      – rpc_address: 0.0.0.0
  • 34. Hector
    • Cluster myCluster = HFactory.getOrCreateCluster("Test Cluster", "192.168.2.3:9160");
    • ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "nb", ComparatorType.UTF8TYPE);
    • columnFamilyDefinition.setReplicateOnWrite(true);
  • 35. Hector
    • ThriftCfDef columnFamilyDefinition = new ThriftCfDef("s3", "bb", ComparatorType.UTF8TYPE);
    • columnFamilyDefinition.setKeyValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
    • columnFamilyDefinition.setDefaultValidationClass("org.apache.cassandra.db.marshal.UTF8Type");
    • // myCluster.addColumnFamily(columnFamilyDefinition);
    • columnFamilyDefinition.setId(1013);
    • myCluster.updateColumnFamily(columnFamilyDefinition);
  • 36. Hector
    • Keyspace myKeyspace = HFactory.createKeyspace("s3", myCluster);
    • Mutator<String> mutator = HFactory.createMutator(myKeyspace, StringSerializer.get());
    • mutator.insert("b", "bb", HFactory.createStringColumn("column1", "你好 在"));
  • 37. Hector
    • ColumnQuery<String, String, String> q = HFactory.createColumnQuery(myKeyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    • // set key, name, cf and execute
    • QueryResult<HColumn<String, String>> r = q.setColumnFamily("bb").setKey("b").setName("column1").execute();
    • // read value from the result
    • HColumn<String, String> c = r.get();
    • String value = c.getValue();
    • System.out.println(value);