Main sponsor




The Apache Cassandra storage engine
Sylvain Lebresne
About me

• Sylvain Lebresne
• sylvain@datastax.com
• @pcmanus
• Work at
1. What is Apache Cassandra

2. Data Model

3. The storage engine
about:project

• Distributed data store aimed at big data.
• Apache project since 2010.
• Version 1.0 released last October.
• Proven in production (Netflix, Twitter,
  Reddit, Cisco, ...). The largest known cluster has
  over 300TB on over 400 machines.
Apache Cassandra
A database:
• distributed / decentralized
• replicated & durable
• scalable / elastic
• fault-tolerant / no SPOF
• highly available
• data center aware
[Figure: data centers in the US and Europe]
Data Model


• Not SQL (no transactions or joins), but
  more than key/value.
• Inspired by Google BigTable.
• Based on column families.
Ex: user profiles
“For each user, holds profile information”

Users (column family)
  50e8-e29b:  birth_year = 1994, fname = Justin, lname = Bieber
  2ab1-f1b7:  birth_year = 1978, email = a@kutcher.com, fname = Ashton, lname = Kutcher
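
A minimal sketch of the Users example, assuming a column family can be pictured as a map from row key to a set of named columns. The names `users`, `put` and `get_row` are hypothetical and are not Cassandra API; the point is only that rows in one column family need not share the same columns.

```python
# Illustrative model only: a column family as row key -> {column name: value}.
# `users`, `put` and `get_row` are hypothetical names, not Cassandra API.
users = {}


def put(column_family, row_key, columns):
    """Insert or overwrite columns in a row; rows need not share columns."""
    column_family.setdefault(row_key, {}).update(columns)


def get_row(column_family, row_key):
    """Return the row's columns as a list sorted by column name."""
    return sorted(column_family.get(row_key, {}).items())


put(users, "50e8-e29b", {"birth_year": 1994, "fname": "Justin", "lname": "Bieber"})
put(users, "2ab1-f1b7", {"birth_year": 1978, "email": "a@kutcher.com",
                         "fname": "Ashton", "lname": "Kutcher"})
```

Here only the second row has an `email` column, which is what "more than key/value" looks like in practice.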
Ex: user’s Tweets
“For each user, tweets he has made”

Timeline (column family)
  50e8-e29b:
    3: still a little tired. back in the studio today with Timbaland
    2: @MickyArison @miamiHEAT thanks for the gam tonight
    1: @NickDeMoura happy bday my dude.
    0: @LiveLoveKary glad you had a good birthday #muchlove
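
The Timeline example can be sketched as one wide row whose columns stay sorted by column name, so the newest tweets are a slice from the end of the row. This is an illustration under that assumption; `WideRow`, `insert` and `latest` are hypothetical names, not Cassandra API.

```python
import bisect


class WideRow:
    """One row whose columns are kept sorted by column name (here an integer)."""

    def __init__(self):
        self._names = []    # column names, always sorted
        self._values = {}   # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def latest(self, count):
        """Return up to `count` columns, highest column name first."""
        if count <= 0:
            return []
        newest_first = self._names[-count:][::-1]
        return [(name, self._values[name]) for name in newest_first]


timeline = {"50e8-e29b": WideRow()}
row = timeline["50e8-e29b"]
# Insertion order does not matter: the row keeps its columns sorted.
row.insert(1, "@NickDeMoura happy bday my dude.")
row.insert(0, "@LiveLoveKary glad you had a good birthday #muchlove")
row.insert(3, "still a little tired. back in the studio today with Timbaland")
row.insert(2, "@MickyArison @miamiHEAT thanks for the gam tonight")
recent = row.latest(2)
```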
There’s more


• Secondary indexes
• Distributed counters
• Composite columns
Goal


• Writes are harder than reads to scale
• Spinning disks aren’t good with random I/O
• Goal: minimize random I/O
A write’s journey
write( k1 , c1:v1 )

Memory (Memtable):
  k1 c1:v1
Hard drive (Commit log):
  k1 c1:v1
A write’s journey
ack

Memory (Memtable):
  k1 c1:v1
Hard drive (Commit log):
  k1 c1:v1
A write’s journey
write( k2 , c1:v1 c2:v2 )

Memory (Memtable):
  k1 c1:v1
  k2 c1:v1 c2:v2
Hard drive (Commit log):
  k1 c1:v1
  k2 c1:v1 c2:v2
A write’s journey
write( k1 , c1:v4 c3:v3 c2:v2 )

Memory (Memtable):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
Hard drive (Commit log):
  k1 c1:v1
  k2 c1:v1 c2:v2
  k1 c1:v4 c3:v3 c2:v2
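
The steps above can be sketched as follows: append the write to the commit log, update the memtable in memory, then acknowledge. This is a simplified illustration, not Cassandra's implementation. `WritePath` and its methods are hypothetical names, the log format is made up, and the sketch syncs the log on every write purely to keep the example short.

```python
import os
import tempfile


class WritePath:
    """Hypothetical sketch: sequential commit log append plus in-memory memtable."""

    def __init__(self, commit_log_path):
        self.commit_log = open(commit_log_path, "a", encoding="utf-8")
        self.memtable = {}  # row key -> {column name: value}

    def write(self, key, columns):
        # 1. Append to the commit log: sequential I/O only, no read, no seek.
        record = key + " " + " ".join(f"{c}:{v}" for c, v in columns.items())
        self.commit_log.write(record + "\n")
        self.commit_log.flush()
        os.fsync(self.commit_log.fileno())  # assumption: sync on every write
        # 2. Merge into the memtable; newer values overwrite older ones.
        self.memtable.setdefault(key, {}).update(columns)
        # 3. Acknowledge.
        return "ack"

    def close(self):
        self.commit_log.close()


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
wp = WritePath(log_path)
acks = [
    wp.write("k1", {"c1": "v1"}),
    wp.write("k2", {"c1": "v1", "c2": "v2"}),
    wp.write("k1", {"c1": "v4", "c3": "v3", "c2": "v2"}),
]
wp.close()
```

After the three writes the commit log holds three records in arrival order, while the memtable holds one merged entry per row key.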
A write’s journey
flush

Memory (Memtable): emptied by the flush
Hard drive (Commit log): cleanup
Hard drive (SSTable, with index):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
A write’s journey
more updates

Memory (Memtable):
  k1 c1:v5 c4:v4
  k2 c1:v2 c3:v3
Hard drive (Commit log):
  k2 c1:v2 c3:v3
  k1 c1:v5 c4:v4
Hard drive (SSTable, with index):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
A write’s journey
flush

Memory (Memtable): emptied by the flush
Hard drive (SSTable 1, with index):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
Hard drive (SSTable 2, with index):
  k1 c1:v5 c4:v4
  k2 c1:v2 c3:v3
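
A flush can be sketched as turning the memtable into an immutable, key-sorted structure plus an index, then emptying the memtable. This is an in-memory illustration only: `flush` is a hypothetical function, tuples stand in for the on-disk file, and the index maps a key to a position in the tuple, which is an assumption made for the example.

```python
def flush(memtable):
    """Return (sstable, index) and empty the memtable.

    sstable: immutable tuple of (key, sorted columns), ordered by key.
    index:   key -> position of that key's row in the sstable.
    """
    sstable = tuple(
        (key, tuple(sorted(memtable[key].items()))) for key in sorted(memtable)
    )
    index = {key: position for position, (key, _) in enumerate(sstable)}
    memtable.clear()  # the flushed data now lives only in the SSTable
    return sstable, index


memtable = {
    "k2": {"c1": "v1", "c2": "v2"},
    "k1": {"c1": "v4", "c3": "v3", "c2": "v2"},
}
sstable, index = flush(memtable)
```

Because the result is never modified after the flush, copying or snapshotting it is safe without any locking.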
Write properties


• No reads or seeks
• Only sequential I/O
• Immutable SSTables: easy snapshots
A read’s journey
read( k1 )

Memory: ?
Hard drive (SSTable 1, with index):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
Hard drive (SSTable 2, with index):
  k1 c1:v5 c4:v4
  k2 c1:v2 c3:v3
A read’s journey
merge (in memory): k1 c1:v5 c2:v2 c3:v3 c4:v4

Hard drive (SSTable 1, with index):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
Hard drive (SSTable 2, with index):
  k1 c1:v5 c4:v4
  k2 c1:v2 c3:v3
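
The merge on read can be sketched as collecting the columns for one key from every SSTable and letting the newer source win per column. The slides do not show how "newer" is decided; this sketch assumes the sources are simply passed oldest first. `read` is a hypothetical function and plain dicts stand in for SSTables.

```python
def read(key, sstables, memtable=None):
    """Merge the columns stored for `key`; `sstables` is ordered oldest to newest."""
    merged = {}
    for sstable in sstables:
        merged.update(sstable.get(key, {}))   # newer columns overwrite older ones
    if memtable:
        merged.update(memtable.get(key, {}))  # assumption: memtable is newest
    return dict(sorted(merged.items()))


sstable_1 = {"k1": {"c1": "v4", "c2": "v2", "c3": "v3"}, "k2": {"c1": "v1", "c2": "v2"}}
sstable_2 = {"k1": {"c1": "v5", "c4": "v4"}, "k2": {"c1": "v2", "c3": "v3"}}
result = read("k1", [sstable_1, sstable_2])
```

With the two SSTables from the slide, `result` holds the same four columns as the merged row shown above.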
Compaction

• Goal: keep the number of SSTables low
• Merge sort against multiple SSTables
• Sequential I/O

Input SSTable 1 (with index):
  k1 c1:v4 c2:v2 c3:v3
  k2 c1:v1 c2:v2
Input SSTable 2 (with index):
  k1 c1:v5 c4:v4
  k2 c1:v2 c3:v3
Compacted SSTable (with index):
  k1 c1:v5 c2:v2 c3:v3 c4:v4
  k2 c1:v2 c2:v2 c3:v3
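
A minimal sketch of compaction as a merge sort over key-sorted inputs, using Python's `heapq.merge` so that each input is read once, front to back. `compact` is a hypothetical function, lists of `(key, columns)` pairs stand in for SSTables, and "newer SSTable wins" is an assumption of the sketch.

```python
import heapq
from itertools import groupby


def compact(sstables):
    """Merge key-sorted SSTables (ordered oldest to newest) into one."""
    # Tag every row with the age of its SSTable so that rows with the same
    # key come out oldest first and newer columns overwrite older ones.
    streams = [
        [(key, age, columns) for key, columns in table]
        for age, table in enumerate(sstables)
    ]
    merged = heapq.merge(*streams, key=lambda row: (row[0], row[1]))
    compacted = []
    for key, rows in groupby(merged, key=lambda row: row[0]):
        columns = {}
        for _, _, row_columns in rows:
            columns.update(row_columns)
        compacted.append((key, dict(sorted(columns.items()))))
    return compacted


sstable_1 = [("k1", {"c1": "v4", "c2": "v2", "c3": "v3"}), ("k2", {"c1": "v1", "c2": "v2"})]
sstable_2 = [("k1", {"c1": "v5", "c4": "v4"}), ("k2", {"c1": "v2", "c3": "v3"})]
compacted = compact([sstable_1, sstable_2])
```

The output is again sorted by key, so it can itself be an input to a later compaction.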
Optimizations

• Row Cache
• Bloom filters: rule out whole SSTables for a read
• Key Cache
• Row and column indexes
• ...
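
To illustrate the Bloom filter bullet: a Bloom filter answers "definitely not present" or "possibly present" for a key, so a read can skip any SSTable whose filter says the key is absent. The sketch below is a generic textbook Bloom filter, not Cassandra's implementation; the class name, the bit-array size and the use of SHA-256 are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position // 8] |= 1 << (position % 8)

    def might_contain(self, key):
        return all(
            self.bits[position // 8] & (1 << (position % 8))
            for position in self._positions(key)
        )


# One filter per SSTable, filled with that SSTable's row keys.
sstable_filter = BloomFilter()
for row_key in ("k1", "k2"):
    sstable_filter.add(row_key)
```

A read for a key would check `might_contain` first and open the SSTable only when it returns true.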
Other features

• Compression
• Checksums
• Time to live
Questions?
• Cassandra 1.1 scheduled in a couple of
  weeks

• http://cassandra.apache.org/
• http://wiki.apache.org/cassandra/
• http://www.datastax.com/docs/1.0
