Cassandra from the trenches: migrating Netflix
Jason Brown
Senior Software Engineer
Netflix
@jasobrown jasedbrown@gmail.com
http://www.linkedin.com/in/jasedbrown
Your host for the evening
• Sr. Software Engineer at Netflix > 3 years
  – Currently lead a team developing and operating
    AB testing infrastructure in EC2
  – Spent time migrating core e-commerce
    functionality out of PL/SQL and scaling it up
• MLB Advanced Media
  – Ran Ecommerce engineering group
• Wandered about in the wireless space
  (J2ME, BREW)
History
• In the beginning, there was the webapp
  – And a database, too
  – In one datacenter
• Then we grew, and grew, and grew
  – More databases, all conjoined
  – Database links with PL/SQL and M views
  – Multi-Master replication
History, 2
• Then it melted down (2008)
  – Oracle MMR between two databases
  – SPOF – one Oracle instance for website (no
    backup)
• Couldn’t ship DVDs for ~3 days
History, 3
• Time to rethink everything
  – Abandon datacenter for EC2
     • We’re not in the business of building datacenters
  – Ditch monolithic webapp for distributed systems
     • Greater independence for all teams/initiatives
  – Migrate SPOF database to …
History, 4
• SimpleDb/S3
  – Somebody else manages your database (yeah!)
  – Tried it out, but didn’t quite work well for us
  – High latency, rate limiting (throttling), no auto-sharding,
    no backups
• Time to try out one of them (other) newfangled
  NoSQL things…
Shiny new toy
• We selected Cassandra
  – Dynamo-model appealed to us
  – Column-based, key-value data model seemed
    sufficient for most needs
  – Performance looked great (rudimentary tests)
• Now what?
  – Put something into it
  – Run it in EC2
  – Sounds easy enough…
• Data Modeling
  – Where the rubber meets the road
About Netflix’s AB Testing
• We use it everywhere (no, really)
• Basic concepts
  – Test – An experiment where several competing
    behaviors are implemented and compared
  – Cell – different experiences within a test that are
    being compared against each other
  – Allocation – a customer-specific assignment to a
    cell within a test
     • Customer can only be in one cell of a test at a time
     • Generally immutable (very important for analysis)
Data Modeling - background
• AB has two sets of data
  – metadata about tests
  – allocations
• Both need to be migrated out of Oracle and
  into Cassandra in the cloud
AB - allocations
• Single table to hold allocations
  – Currently at ~950 million records
  – Plus indices!
• One record for every test that every customer
  is allocated into
• Unique constraint on customer/test
AB - metadata
• Fairly typical parent-child table relationship
• Not updated frequently, so service can cache
Data modeling in Cassandra
• Everywhere I looked, the internets told me to
  understand my data use patterns
  – Understand the questions that you need to
    answer from the data
     • Meaning: know how you will query your data, and
       structure the persistence model to match


• There’s no free lunch here, apparently
Identifying the AB questions that need to be answered
• get all allocations for a customer
• get count of customers in test/cell
• find all customers in a test/cell
  – So we can kick them out of the test
  – So we can clean up ancient data
  – So we can move them to a different cell in test
• find all customers allocated to test within a
  date range
  – So we can kick them out of the test
Modeling allocations in Cassandra
• As we’re read-heavy, read all allocations for a
  customer as fast as possible
  – Denormalize allocations into a single row
  – But, how do I denormalize?
• Find all customers in a test/cell = reverse
  index
• Get count of customers in test/cell = count the
  entries in the reverse index
Denormalization-HOWTO
• The internets talk about it, but offer no real-world
  examples
  – ‘Normalization is for sissies’, Pat Helland
• Denormalizing allocations per customer
  – Trivial with a schema-less database
Denormalized allocations
• Sample normalized data
• Sample denormalized data (sparse!)
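The sample tables from this slide did not survive in the transcript, so here is a minimal sketch of the idea instead. The customer IDs, the tuple layout, and the `denormalize` helper are all assumptions made for illustration; only the `cell` and `enabled` fields come from the talk.

```python
from collections import defaultdict

# Hypothetical normalized records, one per (customer, test) allocation:
# (customer_id, test_id, cell, enabled)
normalized_rows = [
    (1001, 42, 2, "Y"),
    (1001, 47, 0, "Y"),
    (1002, 42, 1, "Y"),
]


def denormalize(rows):
    """Fold one-record-per-allocation data into one wide row per customer."""
    wide = defaultdict(dict)
    for customer_id, test_id, cell, enabled in rows:
        wide[customer_id][test_id] = {"cell": cell, "enabled": enabled}
    return dict(wide)


wide_rows = denormalize(normalized_rows)

# The result is sparse: customer 1002 simply has no entry for test 47.
for customer_id, allocations in wide_rows.items():
    print(customer_id, allocations)
```

Reading every allocation for a customer then becomes a single lookup by customer ID rather than a scan over the allocation table.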
Implementing allocations
• As each allocation for a customer has a handful of
  data points, they can logically be grouped
  together
• Hello, super columns
• Avoided blobs, json or otherwise
  – data race concerns
  – BI integration
  – Serialization alg changes could tank the data
Implementing allocations, second round
• But Cassandra devs don’t enjoy (secretly despise)
  super columns
• Switched to standard column family, using
  composite columns
• Composite columns are sorted by each ‘token’
  in name
  – This sorts each allocation’s data together (by
    testId)
Composite columns
• Allocation column naming convention
  – <testId>:<field>
  – 42:cell = 2
  – 42:enabled = Y
  – 47:cell = 0
  – 47:enabled = Y
• Using terse field names, but still have column
  name overhead (~15 bytes)
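A rough sketch of why the `<testId>:<field>` convention keeps each allocation's data together. It models a composite column name as a Python tuple and assumes the testId component is compared as a number. The helper names are hypothetical and this is not a client API.

```python
def to_composite_columns(allocations):
    """Flatten {test_id: {field: value}} into columns sorted by (test_id, field)."""
    columns = {}
    for test_id, fields in allocations.items():
        for field, value in fields.items():
            columns[(test_id, field)] = value
    # Sort on the composite name only, component by component.
    return sorted(columns.items(), key=lambda item: item[0])


def slice_test(columns, test_id):
    """Pull out every field of one allocation from the sorted row."""
    return {field: value for (tid, field), value in columns if tid == test_id}


# Input order is deliberately scrambled.
row = to_composite_columns({
    47: {"enabled": "Y", "cell": 0},
    42: {"enabled": "Y", "cell": 2},
})
names = [f"{test_id}:{field}" for (test_id, field), _ in row]
print(names)
print(slice_test(row, 42))
```

Because the first component sorts first, all columns for test 42 sit next to each other, so one allocation can be read as a contiguous slice of the row.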
Implementing indices
• Cassandra’s secondary indices vs. hand-built
  and maintained alternate indices
• Secondary indices work great on uniform data
  between rows
• But sparse column data not so easy
Hand-built Indices, 1

• Reverse index
  – Test/cell (key) to custIds (columns)
     • Column value is timestamp
• Mutate on allocating a customer into test
Hand-built indices, 2
• Counter column family
  – Test/cell (key) to count of customers in test (column)
  – Mutate on allocating a customer into test
• Counters are not idempotent!
• Mutates need to write to every node that
  hosts that key
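To make the idempotency point concrete, here is a small in-memory sketch. Plain dicts stand in for the two hand-built column families, and the `allocate` function and the retry scenario are assumptions. Replaying the same allocation leaves the reverse index unchanged but bumps the counter a second time.

```python
from collections import defaultdict

# (test_id, cell) -> {customer_id: timestamp}; stands in for the reverse index
reverse_index = defaultdict(dict)
# (test_id, cell) -> count; stands in for the counter column family
counters = defaultdict(int)


def allocate(customer_id, test_id, cell, timestamp):
    """Apply both index mutations for one allocation."""
    # Writing the same column again just overwrites it: idempotent.
    reverse_index[(test_id, cell)][customer_id] = timestamp
    # Incrementing again changes the result: not idempotent.
    counters[(test_id, cell)] += 1


allocate(1001, 42, 2, timestamp=1)
# Hypothetical client retry after a timeout, replaying the same mutation.
allocate(1001, 42, 2, timestamp=1)

print(len(reverse_index[(42, 2)]))  # customers actually in the cell
print(counters[(42, 2)])            # what the counter claims
```

This is why a retried counter write can silently drift away from the true count, while the reverse index stays correct.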
Index rebuilding
• Yeah, even Oracle needs to have its indices
  rebuilt
• Easy enough to rebuild the reverse index, but
  how about that counter column?
  – Read the reverse index for the count and write
    that as counter’s value
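The rebuild described above might look roughly like this, again with dicts standing in for column families. The shape of `customer_rows`, the `allocated_at` field, and both function names are illustrative assumptions.

```python
# Hypothetical denormalized allocation rows: customer_id -> {test_id: fields}
customer_rows = {
    1001: {42: {"cell": 2, "allocated_at": 10}, 47: {"cell": 0, "allocated_at": 11}},
    1002: {42: {"cell": 2, "allocated_at": 12}},
}


def rebuild_reverse_index(rows):
    """Recreate (test_id, cell) -> {customer_id: timestamp} from the source rows."""
    index = {}
    for customer_id, allocations in rows.items():
        for test_id, fields in allocations.items():
            key = (test_id, fields["cell"])
            index.setdefault(key, {})[customer_id] = fields["allocated_at"]
    return index


def rebuild_counts(index):
    """Derive each count from the reverse index and set it as an absolute value."""
    return {key: len(customers) for key, customers in index.items()}


reverse_index = rebuild_reverse_index(customer_rows)
counts = rebuild_counts(reverse_index)
print(counts)
```

Setting the count from the index, rather than incrementing, means a rebuild can be rerun safely.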
Modeling AB metadata in Cassandra
• Explored several models, including json
  blobs, spreading across multiple CFs, differing
  degrees of denormalization
• Reverse index to identify all tests for loading
Implementing metadata
• One CF, one row for all of a test’s data
  – Every data point is a column – no blobs
• Composite columns
  – type:id:field
     • Types = base info, cells, allocation plans
     • Id = cell number, allocation plan (gu)id
     • Field = type-specific
        – Base info = test name, description, enabled
        – Cell’s name / description
        – Plan’s start/end dates, country to allocate to
Into the real world … here comes the hurt
Allocation mutates
• AB allocations are immutable, so how do you
  prevent mutating?
  – Oracle – unique constraint on table
  – Cassandra – read before write
• Read before write in a distributed system is a
  data race
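A deterministic sketch of that race. A dict stands in for the allocation column, and the two "clients" are just interleaved statements. The interleaving is an assumption chosen to show the failure, not a trace from a real cluster.

```python
store = {}  # (customer_id, test_id) -> cell


def read(key):
    return store.get(key)


def write(key, cell):
    store[key] = cell


key = (1001, 42)

# Both clients read before either one writes.
client_a_sees = read(key)
client_b_sees = read(key)

# Each saw "no allocation yet", so each believes its write is safe.
if client_a_sees is None:
    write(key, 1)
if client_b_sees is None:
    write(key, 2)

# Client A's "immutable" allocation to cell 1 has been overwritten.
print(store[key])
```

Nothing like Oracle's unique constraint rejects the second write, so the check only narrows the window without closing it.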
Running Cassandra
• Compactions happen
  – Part of the Cassandra lifestyle
  – Mutations are written to memory (memtable)
  – Flushed to disk (sstable) on triggering threshold
     • Time
     • Size
     • Operations against column family
  – Eventually, Cassandra decides to merge sstables as
    data for individual rows becomes scattered
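A toy model of the write path above, to show how one row's data ends up scattered and what a merge fixes. `ToyColumnFamily` is hypothetical, and its flush trigger (number of rows in the memtable) is a simplification of the time, size, and operation thresholds listed on the slide.

```python
class ToyColumnFamily:
    """In-memory stand-in for a memtable plus a list of flushed sstables."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.memtable = {}
        self.sstables = []  # oldest first; treated as immutable once flushed

    def write(self, row_key, column, value):
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}

    def sstables_holding(self, row_key):
        """How many sstables a read of this row would have to touch."""
        return sum(1 for table in self.sstables if row_key in table)

    def compact(self):
        """Merge all sstables into one; newer tables win on conflicts."""
        merged = {}
        for table in self.sstables:
            for row_key, columns in table.items():
                merged.setdefault(row_key, {}).update(columns)
        self.sstables = [merged]


cf = ToyColumnFamily(flush_threshold=2)
cf.write("cust-1", "42:cell", 2)
cf.write("cust-2", "42:cell", 1)   # second row triggers a flush
cf.write("cust-1", "47:cell", 0)
cf.write("cust-3", "42:cell", 0)   # triggers another flush

before = cf.sstables_holding("cust-1")
cf.compact()
after = cf.sstables_holding("cust-1")
print(before, after)
```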
Compactions, 2
• Spikes happen, esp. on read-heavy systems
  – Everything can slow down
  – Sometimes, average latency > 95%ile
  – Throttling in newer Cass versions helps, I think
  – Affects clients (hector, astyanax)
Repairs
• Different from read repair!
• Fix all the data in a single node by pulling
  shared ranges from neighbor nodes
Repairs, 2
• Replication factor determines number of
  nodes involved in repair of single node
• Neighbor nodes will perform validation
  compaction
  – Pushes disk and network hard, depending on data size
• Guess what happens when you run a multi-
  region cluster?
Client libraries
• Round-robin is not the way to go for
  connection pooling
  – Coordinator Cassandra nodes will incorrectly be
    marked down rather than the slow target node
• Token-aware is safer, faster, but harder to
  implement
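A simplified contrast between the two strategies. The node names, the MD5-based `token` function, and the one-token-per-node ring are assumptions made for illustration. Real client libraries also account for replication and node health.

```python
import hashlib
from bisect import bisect_left
from itertools import cycle

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster


def token(value):
    """Map a string onto the ring (illustrative hash choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


RING = sorted((token(node), node) for node in NODES)
TOKENS = [t for t, _ in RING]


def owner(row_key):
    """First node whose token is >= the key's token, wrapping around the ring."""
    i = bisect_left(TOKENS, token(row_key)) % len(RING)
    return RING[i][1]


_round_robin = cycle(NODES)


def pick_round_robin(row_key):
    """Ignores the key: any node may end up coordinating for a slow owner."""
    return next(_round_robin)


def pick_token_aware(row_key):
    """Sends the request straight to the node that owns the key."""
    return owner(row_key)


key = "customer:1001"
print([pick_round_robin(key) for _ in range(3)])
print([pick_token_aware(key) for _ in range(3)])
```

With round-robin, a healthy coordinator waits on the slow owner and takes the blame for the timeout. With token-aware routing, the slow node is the one the client observes as slow.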
Tunings, 1
• Key and row caches
  – Left unbounded can chew up jvm memory needed
    for normal work
  – Latencies will spike as the jvm needs to fight for
    memory
  – Off-heap row cache is better but still maintains
    data structures on-heap
Tunings, 2
• mmap() as in-memory cache
  – When the process terminates, mmap pages are added
    to the free list
Tunings, 3
• Sizing memtable flushes for optimizing
  compactions
  – Easier when writes are uniformly
    distributed, timewise – easier to reason about
    flush patterns
  – Best to optimize flushes based on memtable
    size, not time
Tunings, 4
• Sharding
  – Not dead yet!
  – If a single row has disproportionately high
    gets/mutates, the nodes holding it will become
    hot spots
  – If a row grows too large, it won’t fit into memory
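One common way to shard a hot or oversized row is to add a bucket to the row key. The talk does not say how this was done in practice, so `SHARD_COUNT`, the CRC32 bucketing, and the key format below are all assumptions.

```python
import zlib

SHARD_COUNT = 4  # hypothetical; chosen per row family based on size and load


def sharded_row_key(test_id, cell, customer_id):
    """Spread one logical test/cell row across SHARD_COUNT physical rows."""
    shard = zlib.crc32(str(customer_id).encode()) % SHARD_COUNT
    return f"{test_id}:{cell}:{shard}"


def all_shard_keys(test_id, cell):
    """Row keys to read when the whole logical row is needed."""
    return [f"{test_id}:{cell}:{shard}" for shard in range(SHARD_COUNT)]


key = sharded_row_key(42, 2, customer_id=1001)
print(key)
print(all_shard_keys(42, 2))
```

Writes for one test/cell now land on several row keys, and so potentially on several nodes. The cost is that reading the full set of customers takes one read per shard.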
Takeaways
• Netflix is making all of our components
  distributed and fault tolerant as we grow
  domestically and internationally.

• Cassandra is a core piece of our cloud
  infrastructure.
終わり(The End)


• Q&A



        @jasobrown jasedbrown@gmail.com

        http://www.linkedin.com/in/jasedbrown
References
• Pat Helland, “Normalization Is for Sissies”,
  http://blogs.msdn.com/b/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx
• btoddb, “Storage Sizing”, http://btoddb-cass-storage.blogspot.com/

  • 5. History,3 • Time to rethink everything – Abandon datacenter for EC2 • We’re not in the business of building datacenters – Ditch monolithic webapp for distributed systems • Greater independence for all teams/initiatives – Migrate SPOF database to …
  • 6. History,4 • SimpleDb/S3 – Somebody else manages your database (yeah!) – Tried it out, but didn’t quite work well for us – High latency, rate limiting (throttling), (no) auto-sharding, no backup problems • Time to try out one of them (other) new fangled NoSql things…
  • 7. Shiny new toy • We selected Cassandra – Dynamo-model appealed to us – Column-based, key-value data model seemed sufficient for most needs – Performance looked great (rudimentary tests) • Now what? – Put something into it – Run it in EC2 – Sounds easy enough…
  • 8. • Data Modeling – Where the rubber meets the road
  • 9. About Netflix’s AB Testing • We use it everywhere (no, really) • Basic concepts – Test – An experiment where several competing behaviors are implemented and compared – Cell – different experiences within a test that are being compared against each other – Allocation – a customer-specific assignment to a cell within a test • Customer can only be in one cell of a test at a time • Generally immutable (very important for analysis)
  • 10. Data Modeling - background • AB has two sets of data – metadata about tests – allocations • Both need to be migrated out of Oracle and into Cassandra in the cloud
  • 11. AB - allocations • Single table to hold allocations – Currently at ~950 million records – Plus indices! • One record for every test that every customer is allocated into • Unique constraint on customer/test
  • 12. AB - metadata • Fairly typical parent-child table relationship • Not updated frequently, so service can cache
  • 13. Data modeling in cassandra • Everywhere I looked, the internets told me to understand my data use patterns – Understand the questions that you need to answer from the data • Meaning: know how to query your data, and structure the persistence model to match • There’s no free lunch here, apparently
  • 14. Identifying the AB questions that need to be answered • get all allocations for a customer • get count of customers in test/cell • find all customers in a test/cell – So we can kick them out of the test – So we can clean up ancient data – So we can move them to a different cell in test • find all customers allocated to test within a date range – So we can kick them out of the test
  • 15. Modeling allocations in cassandra • As we’re read-heavy, read all allocations for a customer as fast as possible – Denormalize allocations into a single row – But, how do I denormalize? • Find all customers in a test/cell = reverse index • Get count of customers in test/cell = count the entries in the reverse index
  • 16. Denormalization-HOWTO • The internets talk about it, but no real world examples – ‘Normalization is for sissies’, Pat Helland • Denormalizing allocations per customer – Trivial with a schema-less database
  • 17. Denormalized allocations • Sample normalized data • Sample denormalized data (sparse!)
  • 18. Implementing allocations • As allocation for a customer has a handful of data points, they logically can be grouped together • Hello, super columns • Avoided blobs, json or otherwise – data race concerns – BI integration – Serialization alg changes could tank the data
  • 19. Implementing allocations, second round • But, cassandra devs don’t enjoy (secretly despise) super columns • Switched to standard column family, using composite columns • Composite columns are sorted by each ‘token’ in name – This sorts each allocation’s data together (by testId)
  • 20. Composite columns • Allocation column naming convention – <testId>:<field> – 42:cell = 2 – 42:enabled = Y – 47:cell = 0 – 47:enabled = Y • Using terse field names, but still have column name overhead (~15 bytes)
  • 21. Implementing indices • Cassandra’s secondary indices vs. hand-built and maintained alternate indices • Secondary indices work great on uniform data between rows • But sparse column data not so easy
  • 22. Hand-built Indices, 1 • Reverse index – Test/cell (key) to custIds (columns) • Column value is timestamp • Mutate on allocating a customer into test
  • 23. Hand-built indices, 2 • Counter column family – Test/cell to count of customers in test columns – Mutate on allocating a customer into test • Counters are not idempotent! • Mutates need to write to every node that hosts that key
  • 24. Index rebuilding • Yeah, even Oracle needs to have its indices rebuilt • Easy enough to rebuild the reverse index, but how about that counter column? – Read the reverse index for the count and write that as counter’s value
  • 25. Modeling AB metadata in cassandra • Explored several models, including json blobs, spreading across multiple CFs, differing degrees of denormalization • Reverse index to identify all tests for loading
  • 26. Implementing metadata • One CF, one row for all test’s data – Every data point is a column – no blobs • Composite columns – type:id:field • Types = base info, cells, allocation plans • Id = cell number, allocation plan (gu)id • Field = type-specific – Base info = test name, description, enabled – Cell’s name / description – Plan’s start/end dates, country to allocate to
  • 27. Into the real world … here comes the hurt
  • 28. Allocation mutates • AB allocations are immutable, so how do you prevent mutating? – Oracle – unique constraint on table – Cassandra – read before write • Read before write in a distributed system is a data race
  • 29. Running cassandra • Compactions happen – Part of the Cassandra lifestyle – Mutations are written to memory (memtable) – Flushed to disk (sstable) on triggering threshold • Time • Size • Operations against column family – Eventually, Cassandra decides to merge sstables as data for individual rows becomes scattered
  • 30. Compactions, 2 • Spikes happen, esp. on read-heavy systems – Everything can slow down – Sometimes, average latency > 95%ile – Throttling in newer Cass versions helps, I think – Affects clients (hector, astyanax)
  • 31. Repairs • Different from read repair! • Fix all the data in a single node by pulling shared ranges from neighbor nodes
  • 32. Repairs, 2 • Replication factor determines number of nodes involved in repair of single node • Neighbor nodes will perform validation compaction – Pushes disk and network hard dep. on data size • Guess what happens when you run a multi-region cluster?
  • 33. Client libraries • Round-robin is not the way to go for connection pooling – Coordinator Cassandra nodes will incorrectly be marked down rather than target slow node • Token-aware is safer, faster, but harder to implement
  • 34. Tunings, 1 • Key and row caches – Left unbounded can chew up jvm memory needed for normal work – Latencies will spike as the jvm needs to fight for memory – Off-heap row cache is better but still maintains data structures on-heap
  • 35. Tunings, 2 • mmap() as in-memory cache – When process terminated, mmap pages are added to the free list
  • 36. Tunings, 3 • Sizing memtable flushes for optimizing compactions – Easier when writes are uniformly distributed, timewise – easier to reason about flush patterns – Best to optimize flushes based on memtable size, not time
  • 37. Tunings, 4 • Sharding – Not dead yet! – If a single row has disproportionately high gets/mutates, the nodes holding it will become hot spots – If a row grows too large, it won’t fit into memory
  • 38. Takeaways • Netflix is making all of our components distributed and fault tolerant as we grow domestically and internationally. • Cassandra is a core piece of our cloud infrastructure.
  • 39. 終わり(The End) • Q&A @jasobrown jasedbrown@gmail.com http://www.linkedin.com/in/jasedbrown
  • 40. References • Pat Helland, “Normalization Is for Sissies” http://blogs.msdn.com/b/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx • btoddb, “Storage Sizing” http://btoddb-cass-storage.blogspot.com/
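
The allocation model from slides 15 to 24 (one denormalized row per customer with composite column names, a hand-built reverse index, and a counter rebuilt from that index) can be sketched in plain Python. This is only an illustration under loud assumptions: dictionaries stand in for column families, no Cassandra client is involved, and the names `allocate`, `columns_for`, and `rebuild_count` are hypothetical rather than anything from the talk.

```python
from collections import defaultdict

# In-memory stand-ins for three column families (assumption: plain dicts,
# no Cassandra client involved).
allocations = defaultdict(dict)    # row key: custId -> {(testId, field): value}
reverse_index = defaultdict(dict)  # row key: (testId, cell) -> {custId: timestamp}
counters = defaultdict(int)        # row key: (testId, cell) -> customer count


def allocate(cust_id, test_id, cell, timestamp):
    """Write one allocation and maintain both hand-built indices."""
    row = allocations[cust_id]
    # Composite column names: (testId, field), as in "42:cell" and "42:enabled".
    row[(test_id, "cell")] = cell
    row[(test_id, "enabled")] = "Y"
    reverse_index[(test_id, cell)][cust_id] = timestamp
    # Not idempotent: replaying this call inflates the count.
    counters[(test_id, cell)] += 1


def columns_for(cust_id):
    """Return a customer's columns sorted by each component of the name."""
    return sorted(allocations[cust_id].items())


def rebuild_count(test_id, cell):
    """Reset the counter from the reverse index, as slide 24 describes."""
    counters[(test_id, cell)] = len(reverse_index[(test_id, cell)])
    return counters[(test_id, cell)]


allocate(1001, 47, 0, timestamp=1)
allocate(1001, 42, 2, timestamp=2)
allocate(1001, 42, 2, timestamp=3)  # replayed write: counter drifts to 2
```

Sorting by the tuple name groups each test's fields together, which mirrors the `<testId>:<field>` convention on slide 20. The replayed write shows why the counter can drift while the reverse index stays correct, and why the rebuild reads the index rather than the counter.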

Editor's Notes

  1. Point of departure from the datacenter. Data modeling: relational to non-relational. Implementation(s). Real world: ops, tuning, compactions, gotchas
  2. Background as to why netflix has moved to the cloud and embraced new databases
  3. Circa mid-late 2010, we evaluated a bunch of database systems, primarily focusing on the new NoSQL breed.
  4. I lead AB testing and we’ll be using that data set as a model for discussion. I’ll describe the legacy oracle implementation and how I went about moving it to cass
  5. Show example of an AB test (1482) on the homepage
  6. Existing data sets in our legacy Oracle database that need to be migrated and transformed
  7. LAST SLIDE ON DATA MODELING! Next is running this in prod!
  8. Going to share real world issues from design, ops, performance
  9. For some systems, as long as one write wins (eventual consistency), all is fine
  10. Explain difference between read repair and node repair
  11. Makes minor compactions smoother
  12. Too large: AB indices ran afoul of this. Problem for reads, compactions, and repairs
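
One common way to keep a test/cell index row from growing too large or becoming a hot spot is to split the logical row across several physical rows. The sketch below is an assumption for illustration, not the scheme the talk used: the shard count, the key format, and the function names `index_row_key` and `all_index_row_keys` are all hypothetical.

```python
NUM_SHARDS = 16  # hypothetical shard count, chosen for illustration


def index_row_key(test_id, cell, cust_id, num_shards=NUM_SHARDS):
    """Spread one logical test/cell index row across several physical rows."""
    shard = cust_id % num_shards
    return f"{test_id}:{cell}:{shard}"


def all_index_row_keys(test_id, cell, num_shards=NUM_SHARDS):
    """Every physical row a reader must fetch to see the whole test/cell."""
    return [f"{test_id}:{cell}:{shard}" for shard in range(num_shards)]
```

The trade-off is that writes touch a single smaller row, while a full scan of a test/cell has to read every shard and merge the results.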