Cassandra20141009

Talk given at McGraw-Hill Financial in Oct 2014

Published in: Technology

  1. Details And Data Modeling
  2. Agenda
     • Quick Review Of Cassandra
     • New Developments In Cassandra
     • Basic Data Modeling Concepts
     • Materialized Views
     • Secondary Indexes
     • Counters
     • Time Series Data
     • Expiring Data
  3. Cassandra High Level
     Cassandra's architecture is based on the combination of two technologies:
     • Google BigTable: Data Model
     • Amazon Dynamo: Distributed Architecture
     • Cassandra = C*
  4. Architecture Basics & Terminology
     • Nodes are single instances of C*
     • Cluster is a group of nodes
     • Data is organized by keys (tokens), which are distributed across the cluster
     • Replication Factor (RF) determines how many copies of each key are kept
     • Data Center Aware
     • Consistency Level: a powerful feature to tune consistency vs. speed vs. availability
  5. C* Ring
  6. More Architecture
     • Information on who has what data and who is available is transferred using gossip.
     • No single point of failure (SPOF); every node can service requests.
     • Data Center Aware
  7. CAP Theorem
     • Distributed Systems Law:
         • Consistency
         • Availability
         • Partition Tolerance
       (you can only really have two in a distributed system)
     • Cassandra is AP with Eventual Consistency
  8. Consistency
     • Cassandra uses the concept of Tunable Consistency, which makes it very powerful and flexible for system needs.
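
Tunable consistency can be tried directly from cqlsh. The following is a minimal sketch, assuming a `users` table keyed by `emailaddress` (as in the lightweight-transaction and batch slides later in the deck) in a keyspace with a replication factor of 3. Note that `CONSISTENCY` is a cqlsh shell command, not a CQL statement.

```cql
-- cqlsh shell command: require a majority of replicas for this session
CONSISTENCY QUORUM;

-- With RF = 3, this read waits for 2 of the 3 replicas to answer
SELECT firstname, lastname FROM users
WHERE emailaddress = 'tpeters@example.com';

-- Trade consistency for lower latency and higher availability
CONSISTENCY ONE;
```

With RF = 3, QUORUM means 2 replicas, so writing and reading at QUORUM makes the read set overlap the write set (2 + 2 > 3). Lower levels such as ONE respond faster but may return stale data.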
  9. C* Persistence Model
  10. Read Path
  11. Write Path
  12. Data Model Architecture
      • Keyspace: container of column families (tables). Defines the RF, among other settings.
      • Table: a column family. Contains the schema definition.
      • Row: a “record” identified by a key
      • Column: a key and a value
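
As an illustration of a keyspace carrying the replication factor, here is a hedged sketch. The keyspace names `demo` and `demo_multi_dc` and the data center names `dc1` and `dc2` are hypothetical; real data center names must match those reported by the cluster's snitch.

```cql
-- Single data center: keep 3 copies of every row
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Data-center-aware placement: 3 copies in dc1 and 2 copies in dc2
CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};

-- Tables created after this statement live in the demo keyspace
USE demo;
```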
  14. Keys
      • Primary Key
          • Partition Key: identifies a row
          • Cluster Key: sorting within a row
      • Using CQL, these are defined together as a compound (composite) key
      • Compound keys are how you implement “wide rows”, which we will look at a lot!
  15. Single Primary Key
      create table users (
        user_id UUID PRIMARY KEY,
        firstname text,
        lastname text,
        emailaddress text
      );
      ** Cassandra Data Types: http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
  16. Compound Key
      create table users (
        emailaddress text,
        department text,
        firstname text,
        lastname text,
        PRIMARY KEY (emailaddress, department)
      );
      • Partition Key plus Cluster Key
      • emailaddress is the partition key
      • department is the cluster key
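
To see what the partition key and cluster key do in practice, here is a small sketch against the `users` table defined on this slide. The inserted values are made up for illustration.

```cql
-- Two rows that share a partition (same emailaddress), sorted by department
INSERT INTO users (emailaddress, department, firstname, lastname)
VALUES ('tpeters@example.com', 'engineering', 'tom', 'peters');
INSERT INTO users (emailaddress, department, firstname, lastname)
VALUES ('tpeters@example.com', 'sales', 'tom', 'peters');

-- Reads one partition; rows come back ordered by department
SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';

-- Narrows to a single row using the cluster key
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com' AND department = 'sales';

-- Reverses the on-disk clustering order for this query
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com'
ORDER BY department DESC;
```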
  17. Compound Key
      create table users (
        emailaddress text,
        department text,
        country text,
        firstname text,
        lastname text,
        PRIMARY KEY ((emailaddress, department), country)
      );
      • Partition Key plus Cluster Key
      • emailaddress and department together form the partition key
      • country is the cluster key
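
With a multi-column partition key, every partition key column has to be supplied to locate the partition. A brief sketch against the table on this slide, with illustrative values:

```cql
-- Both partition key columns are given, so a single partition is read
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com' AND department = 'sales';

-- The cluster key can additionally be sliced with a range
SELECT * FROM users
WHERE emailaddress = 'tpeters@example.com' AND department = 'sales'
  AND country >= 'DE' AND country < 'US';

-- Rejected as written: only part of the partition key is supplied
-- SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';
```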
  18. Deletions
      • Distributed systems present a unique problem for deletes. If data were actually deleted while a node was down, that node would miss the delete notice and would try to recreate the record when it came back online. So…
      • Tombstone: the data is replaced with a special value called a tombstone, which works within the distributed architecture
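
A hedged sketch of the statements that produce tombstones, assuming a `users` table whose only primary key column is `emailaddress` (as used in the batch example later in the deck):

```cql
-- Writes a row-level tombstone; the data is not removed immediately
DELETE FROM users WHERE emailaddress = 'jsmith@example.com';

-- Writes a tombstone for a single column value
DELETE lastname FROM users WHERE emailaddress = 'tpeters@example.com';

-- Tombstones are kept for gc_grace_seconds before compaction may purge them.
-- 864000 seconds (10 days) is the default; it is set explicitly here for illustration.
ALTER TABLE users WITH gc_grace_seconds = 864000;
```

The grace period gives a node that was down time to come back and learn about the delete before the tombstone disappears.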
  19. New Rules
      • Writes Are Cheap
      • Denormalize All You Need
      • Model Your Queries, Not Data (understand access patterns)
      • Application Worries About Joins
  20. What’s New In 2.0
      • Conditional DDL: IF EXISTS or IF NOT EXISTS
      • Drop Column Support
          ALTER TABLE users DROP lastname;
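
The conditional DDL mentioned on this slide might look like the following sketch. The column list and the `users_archive` table name are assumptions for illustration.

```cql
-- Succeeds whether or not the table already exists
CREATE TABLE IF NOT EXISTS users (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text
);

-- No error if the table is already gone (users_archive is a hypothetical name)
DROP TABLE IF EXISTS users_archive;
```

This makes schema scripts safe to re-run, since neither statement fails when the schema is already in the desired state.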
  21. More New Stuff
      • Triggers
          CREATE TRIGGER myTrigger ON myTable USING 'com.thejavaexperts.cassandra.updateevt';
      • Lightweight Transactions (CAS)
          UPDATE users SET firstname = 'tim' WHERE emailaddress = 'tpeters@example.com' IF firstname = 'tom';
        ** Not like an ACID Transaction!!
  22. CAS & Transactions
      • CAS: compare-and-set operations. A single, atomic operation compares the value of a column in the database and applies a modification depending on the result of the comparison.
      • Consider the performance hit. CAS is (was) considered an anti-pattern.
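
Besides the conditional `UPDATE` shown on the previous slide, a common compare-and-set use is inserting a row only if its key is not already taken. A minimal sketch, assuming a `users` table whose only primary key column is `emailaddress`:

```cql
-- Applied only if no row with this emailaddress exists yet
INSERT INTO users (emailaddress, firstname, lastname)
VALUES ('tpeters@example.com', 'tom', 'peters')
IF NOT EXISTS;

-- Applied only if the current value still matches the expectation
UPDATE users SET lastname = 'peterson'
WHERE emailaddress = 'tpeters@example.com'
IF lastname = 'peters';
```

Each conditional statement returns an `[applied]` column indicating whether the condition held. The extra coordination between replicas is the performance hit the slide warns about, so conditions are best reserved for writes that really need them.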
  23. Data Modeling… The Basics
      • Cassandra now feels very familiar to RDBMS/SQL users.
      • It very nicely hides the underlying data storage model.
      • You still have all the power of Cassandra; it is all in the key definition.
      RDBMS = model data
      Cassandra = model access (queries)
  24. Side-Note On Querying
      • Create table with compound key
      • Select using ALLOW FILTERING
      • Counts
      • Select using IN or =
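
The demo for this slide is not part of the transcript, so the following is only a sketch of what such a walkthrough could look like. The `employees` table and its values are hypothetical, and whether a particular filtered query is accepted depends on the Cassandra version.

```cql
-- Hypothetical table with a compound key: department partitions, emailaddress sorts
CREATE TABLE IF NOT EXISTS employees (
  department text,
  emailaddress text,
  firstname text,
  lastname text,
  PRIMARY KEY (department, emailaddress)
);

-- Equality on the partition key: reads a single partition
SELECT * FROM employees WHERE department = 'engineering';

-- IN on the partition key: reads several partitions in one statement
SELECT * FROM employees WHERE department IN ('engineering', 'sales');

-- Counting the rows of one partition
SELECT COUNT(*) FROM employees WHERE department = 'engineering';

-- No partition key given: the cluster has to be scanned, so Cassandra
-- asks for an explicit ALLOW FILTERING before it will run the query
SELECT * FROM employees
WHERE emailaddress = 'tpeters@example.com'
ALLOW FILTERING;
```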
  25. Batch Operations
      • Saves Network Roundtrips
      • Can contain INSERT, UPDATE, DELETE
      • Atomic by default (all or nothing)
      • Can use timestamp for specific ordering
  26. Batch Operation Example
      BEGIN BATCH
        INSERT INTO users (emailaddress, firstname, lastname, country)
          values ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
        INSERT INTO users (emailaddress, firstname, lastname, country)
          values ('tpeters@example.com', 'tom', 'peters', 'DE');
        INSERT INTO users (emailaddress, firstname, lastname, country)
          values ('jsmith@example.com', 'jim', 'smith', 'USA');
        INSERT INTO users (emailaddress, firstname, lastname, country)
          values ('arogers@example.com', 'alan', 'rogers', 'USA');
        DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
      APPLY BATCH;
      • select in cqlsh
      • List in cassandra-cli with timestamp
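
The point about using timestamps for ordering can be sketched as follows, reusing the `users` columns from the batch above. The timestamp values are illustrative microseconds since the epoch, not values from the talk.

```cql
BEGIN BATCH
  -- The delete carries the older timestamp...
  DELETE FROM users USING TIMESTAMP 1412856000000000
    WHERE emailaddress = 'jsmith@example.com';
  -- ...so this insert, with the newer timestamp, wins regardless of statement order
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('jsmith@example.com', 'jim', 'smith', 'USA')
    USING TIMESTAMP 1412856000000001;
APPLY BATCH;
```

Without explicit timestamps, all statements in a batch share one timestamp, and a delete takes precedence over a write to the same cell when timestamps tie.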
  27. More Data Modeling…
      • No Joins
      • No Foreign Keys
      • No Third (or any other) Normal Form Concerns
      • Redundant Data Encouraged. Apps maintain consistency.
  28. Secondary Indexes
      • Secondary indexes allow access by columns other than the partition key.
      • Each node has a local index for its data.
      • They have uses, but shouldn’t be used all the time without consideration.
      • We will look at alternatives.
  29. Secondary Index Example
      • Create a table
      • Try to select with a column not in the PK
      • Add a Secondary Index
      • Try the select again.
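
The demo itself is not in the transcript; the steps could be sketched like this. The table layout follows the columns used in the batch example, and the index name `users_country_idx` is an assumption.

```cql
-- Step 1: a table keyed only by emailaddress
CREATE TABLE users (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- Step 2: rejected at this point, because country is not part of the primary key
-- SELECT * FROM users WHERE country = 'USA';

-- Step 3: add a secondary index on the low-cardinality column
CREATE INDEX users_country_idx ON users (country);

-- Step 4: the same select is now accepted
SELECT * FROM users WHERE country = 'USA';
```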
  30. When to use?
      • Low Cardinality: small number of unique values
      • High Cardinality: high number of distinct values
      • Secondary Indexes are good for Low Cardinality, such as country codes or department codes. Not email addresses.
  31. Materialized View
      • If you want full distribution, you can use what is called the Materialized View pattern.
      • Remember, redundant data is fine.
      • Model the queries
  32. Materialized View Example
      • Show a normal table with a compound key and its querying limitations
      • Create a Materialized View table with a different compound key to support alternate access.
      • Selects use the partition key.
      • Secondary indexes are local, not distributed
      • ALLOW FILTERING can cause performance issues
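
A hedged sketch of the pattern: the same data is written to a second, application-maintained table whose key matches the alternate query. The table name `users_by_country` and the sample values are assumptions.

```cql
-- Base table: lookup by emailaddress
CREATE TABLE IF NOT EXISTS users (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- "Materialized view" table: same data, keyed for lookup by country
CREATE TABLE IF NOT EXISTS users_by_country (
  country text,
  emailaddress text,
  firstname text,
  lastname text,
  PRIMARY KEY (country, emailaddress)
);

-- The application writes both copies; a batch keeps them in step
BEGIN BATCH
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('tpeters@example.com', 'tom', 'peters', 'DE');
  INSERT INTO users_by_country (country, emailaddress, firstname, lastname)
    VALUES ('DE', 'tpeters@example.com', 'tom', 'peters');
APPLY BATCH;

-- The alternate access path is a plain partition key lookup
SELECT * FROM users_by_country WHERE country = 'DE';
```

Because the second table is partitioned by `country`, the query goes to the replicas that own that partition instead of consulting a local index on every node.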
  33. Counters
      • Updated in 2.1 and now work in a more distributed and accurate manner.
      • Table organization, example
      • How to update, view etc.
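
The counter example is not included in the transcript. A minimal sketch, with a hypothetical `page_views` table:

```cql
-- In a counter table, every non-key column must be a counter
CREATE TABLE page_views (
  page text PRIMARY KEY,
  views counter
);

-- Counters are changed with UPDATE only; the row is created on first use
UPDATE page_views SET views = views + 1 WHERE page = '/home';
UPDATE page_views SET views = views + 5 WHERE page = '/home';

-- Counters can also be decremented
UPDATE page_views SET views = views - 1 WHERE page = '/home';

-- Read the current value
SELECT page, views FROM page_views WHERE page = '/home';
```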
  34. Time Series Example….
      • Time series table model.
      • Need to consider the interval for event frequency and the wide row size.
      • Make the thing being tracked, together with the unit of interval, the partition key.
  35. Time Series Data
      • Due to its quick writing model, Cassandra is suited for storing time series data.
      • The Cassandra wide row is a perfect fit for modeling time series / time based events.
      • Let’s look at an example….
  36. Event Data
      • Notice the primary key and cluster key.
      • Insert some data
      • View in CQL, then in CLI as a wide row
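
The event table from the talk is not in the transcript. The following sketch shows one way to model it, with a hypothetical `sensor_events` table that buckets each sensor's events by day so that no single wide row grows without bound.

```cql
-- Partition key: what is tracked (sensor_id) plus the interval bucket (day)
-- Cluster key: event_time, so events are stored in time order within the bucket
CREATE TABLE sensor_events (
  sensor_id text,
  day text,
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, day, event_time, reading)
VALUES ('sensor-1', '2014-10-09', '2014-10-09 12:00:00+0000', 21.5);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
VALUES ('sensor-1', '2014-10-09', '2014-10-09 12:01:00+0000', 21.7);

-- Latest events first for one sensor and day, from a given time onward
SELECT event_time, reading FROM sensor_events
WHERE sensor_id = 'sensor-1' AND day = '2014-10-09'
  AND event_time >= '2014-10-09 12:00:00+0000'
LIMIT 10;
```

The bucket size (a day here) is a design choice: more frequent events call for a smaller interval to keep partitions at a manageable size.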
  37. TTL – Self Expiring Data
      • Another technique is to store data that has a defined lifespan.
      • For instance, session identifiers, temporary passwords etc.
      • For this, Cassandra provides a Time To Live (TTL) mechanism.
  38. TTL Example…
      • Create table
      • Insert data using TTL
      • Can update a specific column with a TTL
      • Show using selects.
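
The TTL demo is not in the transcript; these steps could be sketched as below. The `user_sessions` table, the UUID, and the lifetimes are all assumptions chosen for illustration.

```cql
CREATE TABLE user_sessions (
  session_id uuid PRIMARY KEY,
  emailaddress text,
  temp_password text
);

-- Every column written by this insert expires after one hour (3600 seconds)
INSERT INTO user_sessions (session_id, emailaddress, temp_password)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'tpeters@example.com', 'changeme')
USING TTL 3600;

-- Give a single column its own, shorter lifetime (5 minutes)
UPDATE user_sessions USING TTL 300
SET temp_password = 'newtemp'
WHERE session_id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- TTL() reports the seconds remaining before the column expires
SELECT temp_password, TTL(temp_password) FROM user_sessions
WHERE session_id = 62c36092-82a1-3a00-93d1-46196ee77204;
```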
  39. Questions
      • Email: brian.enochson@gmail.com
      • Twitter: @benochso
      • G+: https://plus.google.com/+BrianEnochson