cassandra@Netflix

A brief overview of how Cassandra is being used at Netflix

Cassandra @ Netflix
Nitish Korla
Cloud Data Architect

Why Cassandra?
• High availability / fully distributed
• Scalability (linear)
• Write performance
• Multi-region replication support (bi-directional)
• Simple to install and operate

Cassandra footprint @ Netflix
• 50+ Cassandra clusters
• 1000+ nodes holding 100+ TB data
• AWS 500 IOPS -> 100,000 IOPS
• Streaming data completely persisted in Cassandra
• Related Open Source Projects
– Cassandra : in-house committer
– Priam : Cassandra Automation
– Test Tools : JMeter
– http://github.com/netflix

Device Keys - Cassandra
[Diagram: Cassandra spanning AWS EU-West, AWS US-East, and AWS US-West, with EU apps, US-E apps, and US-W apps shown alongside the regions]

Data Model
• Row-oriented
• Number of columns/names can differ per row
Row key   Columns
xyz       name: Paul, zip: 95123
abc       name: Adam, zip: 94538, sex: Male
nk12      name: Nitish
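
The sparse rows on this slide can be sketched as a mapping from row key to that row's own set of columns. This is only a toy illustration in plain Python, not a Cassandra client; the `rows` dictionary and the `get_column` helper are hypothetical names introduced here.

```python
# Toy model of schema-free rows: each row key maps to its own set of columns.
# Plain dictionaries stand in for a column family; this is not a Cassandra client.
rows = {
    "xyz": {"name": "Paul", "zip": "95123"},
    "abc": {"name": "Adam", "zip": "94538", "sex": "Male"},
    "nk12": {"name": "Nitish"},
}


def get_column(row_key, column_name, default=None):
    """Point read of one column; a column the row never stored is simply absent."""
    return rows.get(row_key, {}).get(column_name, default)


if __name__ == "__main__":
    for key, columns in rows.items():
        print(key, sorted(columns))
```

Rows that lack a column store nothing for it, which is why `get_column("nk12", "zip")` falls back to the default rather than returning an empty value.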

Read/Write performance
• Write performance : Superfast!!
– Sequential I/O
– In-memory write
– Zero locking
• Point reads : highly performant
• Range scans
– Need reverse-key indexes
– Assess the need for range scans (full-table scans)
– Use Netflix Astyanax client library
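
The write-path bullets (sequential I/O, in-memory write, zero locking) can be illustrated with a toy log-structured store: every write appends to a log and updates an in-memory table, and full tables are flushed as sorted, immutable segments. This is a simplified sketch of the general idea, assuming a tiny flush threshold for demonstration; `ToyWritePath` and its members are hypothetical names, not Cassandra or Astyanax APIs.

```python
class ToyWritePath:
    """Append-only log plus in-memory table, flushed to immutable sorted segments."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # stands in for a sequentially appended file
        self.memtable = {}     # in-memory writes, keyed by (row_key, column)
        self.segments = []     # flushed, immutable, sorted lists of entries
        self.flush_threshold = flush_threshold  # assumed value, for illustration

    def write(self, row_key, column, value):
        # A write never reads existing data: append to the log, update memory.
        self.commit_log.append((row_key, column, value))
        self.memtable[(row_key, column)] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out once, in sorted order, then start a fresh one.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, row_key, column):
        # Newest data wins: check memory first, then segments newest to oldest.
        key = (row_key, column)
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for entry_key, value in segment:
                if entry_key == key:
                    return value
        return None
```

The sketch also hints at why reads cost more than writes: a write touches only the log and memory, while a read may have to consult several segments.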

Wide-row implementation
• Viewing history
Row key   Columns (column name: value)
100       22-Jan: JSON, 1-Mar: JSON
501       24-Jan: JSON data, 25-Jan: JSON data, 26-Jan: data
1000      name: Nitish, 28-Jan: JSON data, 29-Jan: JSON data
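
A wide row like the viewing-history example keeps one row per key and adds a column per event, with columns held in sorted order so a date range can be read as a slice. The sketch below models that in plain Python, assuming dates as column names and JSON strings as values; `WideRow`, `viewing_history`, and `record_view` are hypothetical names, and the year used in the usage note is made up because the slide shows only day and month.

```python
import bisect
from datetime import date


class WideRow:
    """One row whose columns are kept sorted by column name (here, a date)."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # New column names are placed in sorted position; existing ones are overwritten.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


viewing_history = {}  # row key (subscriber id) -> WideRow


def record_view(subscriber_id, day, payload_json):
    """Add one column to the subscriber's row; no read of existing data is needed."""
    viewing_history.setdefault(subscriber_id, WideRow()).insert(day, payload_json)
```

For example, after `record_view(501, date(2013, 1, 24), '{"title": "example"}')`, calling `viewing_history[501].slice(date(2013, 1, 1), date(2013, 1, 31))` returns that row's January columns in date order.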

Think Data Archival
• Data stores at Netflix grow exponentially
• Have a process in place to archive data
– Work with Data Science Engineering / DW
– Move data to cheap hardware
– Set the right expectations w.r.t. latencies for historical data
• Cassandra TTLs
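
The idea behind a TTL is that a value is written with an expiry and stops being visible once that time has passed, so old data ages out without a separate cleanup job. The following is a minimal in-memory sketch of that behavior, not Cassandra's implementation; `TtlStore` and its injectable `clock` parameter are hypothetical and exist only to make the example testable.

```python
import time


class TtlStore:
    """Key-value store where each entry may carry a time-to-live in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._data = {}      # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=None):
        # The expiry is fixed at write time; None means the value never expires.
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: behave as if it was never written
            return None
        return value
```

Choosing the TTL at write time pushes the archival decision into the data model, which fits the slide's advice to plan for archiving up front.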

read-modify-write patterns
• Read portion drives the overall latency
• Revisit your architecture

Observations
• Cassandra scales linearly without any noticeable degradation to the running cluster
• Read performance is sufficient to remove memcached in some cases
• Self-healing : minimal operational noise
• Developers
– Mindset needed a shift from normalization to denormalization
– Need a reasonable understanding of Cassandra architecture

Avoid surprises
• Benchmark …
• AWS makes it easy for us

Editor's Notes

Share some practical data modeling lessons we have learned over the past 2 years. Understand your data use patterns and match them to your persistence store, at the cost of denormalization. It is very important to spend time coming up with an appropriate data model; the cost is high. Subscriber example.
Start with a live example, and then use it as a segue to cover some best practices.
Rows are indexed. Columns are sorted based on the comparator you specify, so use it to your benefit. Keep column names short, as they are repeated. Column size = 15 bytes + size of name + size of value. Don't store empty columns if there is no need (schema-free design).
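
The column-size rule of thumb in this note can be worked through with a few lines of arithmetic. The sketch below applies the quoted formula (15 bytes + size of name + size of value) to a short and a long column name; the `column_size` function and the `postal_code` name are hypothetical, chosen only to show the effect of name length.

```python
COLUMN_OVERHEAD_BYTES = 15  # fixed per-column overhead quoted in the note above


def column_size(name: str, value: str) -> int:
    """Estimated stored size of one column: overhead + name bytes + value bytes."""
    return COLUMN_OVERHEAD_BYTES + len(name.encode("utf-8")) + len(value.encode("utf-8"))


short_name = column_size("zip", "95123")          # 15 + 3 + 5 = 23
long_name = column_size("postal_code", "95123")   # 15 + 11 + 5 = 31

if __name__ == "__main__":
    print(short_name, long_name, long_name - short_name)
```

Because the name is stored with every column, the 8-byte difference here is paid again for each row that stores the column.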
Cassandra is for point queries. Still OK for a small set of rows.
We don't have linear growth. TTL is a fascinating feature, coming from an Oracle background.
gps 1.0
architecture to reap the benefits of distributed computing / high performance
