1,2,3,4
Add Another Data Store
(And Other Rhymes)


Eric Lubow
@elubow
elubow@simplereach.com
#cassandra12
Overview
•   SimpleReach
•   Definitions and Data Stores
•   Evolution to Polyglottany
•   Tie It Together
•   Questions


Socially Intelligent



Size
•   100m events
    recorded per day and
    growing
•   500m Pageviews per
    month and growing




Polyglot Persistence
Polyglot Persistence, like polyglot programming, is all
about choosing the right persistence option for the task
at hand.
                                   http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence




Right Tool For The Job




Why?
•   Heavier read loads vs. heavier write loads
•   Data relationships may be less important
•   Different aspects of a system have different requirements




No One Size Fits All




Tools




Free vs. Cost




Languages




Pre-Scale




Scale




SimpleReach Pre-Scale




SimpleReach




Cassandra
•   Large data volume ingestion
•   Really fast writes to many locations (eventual consistency)
•   Query by column groups within rows
•   Range queries in Hive (partial column family scans)
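
The "query by column groups within rows" point can be illustrated with a toy in-memory model. This is a minimal sketch and not Cassandra's API: `WideRow`, `insert`, and `slice` are hypothetical names, and the hourly column naming is an assumption made for illustration. It shows the idea that columns inside a row are kept sorted by name, so a contiguous group of columns can be read without scanning the whole row.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Writes are upserts: a new name is placed in sorted position,
        # an existing name just has its value overwritten.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        # Return the columns whose names fall in [start, finish], in order.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical row: event counts for one URL, one column per hour bucket.
row = WideRow()
for hour, views in [("hour-10", 120), ("hour-09", 80),
                    ("hour-11", 95), ("hour-12", 40)]:
    row.insert(hour, views)

morning = row.slice("hour-09", "hour-11")
print(morning)
```

Because the names are sorted, the slice costs two binary searches plus the columns actually returned, which is why time-bucketed column names are a common fit for this model.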




mongoDB
•   Fast atomic increments (Node.js is native JSON)
•   Sharding for faster distributed increments
•   Solid ODM for Rails (Mongoid)
•   Fast access for pub/sub of durable/persisted documents
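
What "atomic increment" buys you can be sketched without a database. The class below is a toy stand-in, not a MongoDB driver: `CounterStore` and its methods are hypothetical, and only the shape of the update document (`{"$inc": {field: amount}}`) follows MongoDB's `$inc` operator. The point is that concurrent writers never lose an update because the read-modify-write happens as one step inside the store.

```python
import threading


class CounterStore:
    """Toy document store supporting only an $inc-style update."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def update(self, doc_id, update):
        # Apply every field increment under one lock so concurrent callers
        # never overwrite each other's work.
        with self._lock:
            doc = self._docs.setdefault(doc_id, {})
            for field, amount in update["$inc"].items():
                doc[field] = doc.get(field, 0) + amount

    def get(self, doc_id):
        with self._lock:
            return dict(self._docs.get(doc_id, {}))


store = CounterStore()


def record_events(n):
    # Each call represents one tracked event for a hypothetical article.
    for _ in range(n):
        store.update("article-1", {"$inc": {"pageviews": 1}})


threads = [threading.Thread(target=record_events, args=(1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = store.get("article-1")
print(result)
```

The client sends only the delta, never the current value, so it does not need to read the document first.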




Redis
•   Supports hundreds of thousands of transactions per
    second
•   Great caching engine
•   Supports useful data types such as sorted sets
•   Pay a serialization/deserialization (SerDe) price on each access
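
The SerDe cost in the last bullet can be made concrete with a small sketch. It assumes a cache that stores only byte strings; `StringStore`, `cache_put`, and `cache_get` are hypothetical names and this is not a Redis client. Any structured object has to be serialized on the way in and parsed again on the way out.

```python
import json


class StringStore:
    """Toy key-value store that holds only bytes, like a string-valued cache."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def cache_put(store, key, obj):
    # Serialize: the structured object becomes a byte string on every write.
    store.set(key, json.dumps(obj, sort_keys=True).encode("utf-8"))


def cache_get(store, key):
    # Deserialize: the byte string is parsed again on every read.
    raw = store.get(key)
    return None if raw is None else json.loads(raw.decode("utf-8"))


store = StringStore()
# Hypothetical cached record.
article = {"url": "http://example.com/a", "shares": 12, "tags": ["news", "tech"]}
cache_put(store, "article:1", article)
round_tripped = cache_get(store, "article:1")
print(round_tripped == article)
```

The encode and decode work is paid by the application on every access, which is the trade-off to weigh against the raw speed of the store itself.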




InfiniDB and Infobright
•   Column Stores for ad-hoc analytics queries in SQL
•   Databases built for business intelligence
•   Heavy compression of data
•   Pre-aggregated data (Extents/Knowledge Grid)
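
The "pre-aggregated data" bullet refers to summary metadata that these engines keep about blocks of column values. The following is a minimal sketch of that general idea, assuming per-block min/max/sum summaries; the block size, function names, and sample values are hypothetical and do not reflect either product's actual storage format.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and precompute min/max/sum for each."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append({"rows": chunk, "min": min(chunk),
                       "max": max(chunk), "sum": sum(chunk)})
    return blocks


def sum_where_at_least(blocks, threshold):
    """SUM(value) WHERE value >= threshold, using metadata to avoid reads."""
    total = 0
    scanned = 0
    for block in blocks:
        if block["max"] < threshold:
            continue                   # no row can match: skip the block
        if block["min"] >= threshold:
            total += block["sum"]      # every row matches: use the stored sum
            continue
        scanned += 1                   # mixed block: the rows must be read
        total += sum(v for v in block["rows"] if v >= threshold)
    return total, scanned


column = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 20, 1]
blocks = build_blocks(column, block_size=4)
total, scanned = sum_where_at_least(blocks, threshold=10)
print(total, scanned)
```

Only blocks whose min/max range straddles the predicate need their rows read; the rest are answered from metadata alone.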




Ruby, Node.js, Python
•   Polyglottany doesn’t only apply to data stores
•   Each language has its own benefits for each data storage layer
•   Each language has its own individual benefits
•   JSON, APIs, Performance




Choice




Cons
•   Redis - Can only utilize a single core
•   MySQL Column Store - DELETEs/UPDATEs are VERY expensive
•   Cassandra - No B-tree indexes
•   Mongo - Queries slow down when shard count increases. Indexes must fit in memory
•   Python - Whitespace. Community
•   Ruby - Not high performance enough for our standards
•   JavaScript (Node.js) - Bad for CPU- or IO-intensive workloads


Tying It Together
•   Built in the cloud
•   Service Oriented Architecture (Internal API)
•   Built Helenus (Cassandra Node.js driver)
•   Data accuracy checks: visual and programmatic
•   Built framework for testing out storage engines
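
A programmatic data accuracy check across stores might look like the sketch below. It is an assumption about the general approach, not SimpleReach's implementation: `find_mismatches`, the tolerance parameter, and the sample counts are all hypothetical. Plain dicts stand in for the results of querying two different data stores for the same logical counters.

```python
def find_mismatches(primary, secondary, tolerance=0):
    """Compare per-key counts from two stores; return keys that disagree."""
    mismatches = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key)
        b = secondary.get(key)
        # A key missing from either side, or a gap beyond the tolerance,
        # is reported with both observed values.
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical counts for the same articles read from two different stores.
realtime_counts = {"article-1": 1000, "article-2": 250, "article-3": 75}
analytics_counts = {"article-1": 1000, "article-2": 248, "article-4": 10}

report = find_mismatches(realtime_counts, analytics_counts, tolerance=1)
print(report)
```

A small tolerance allows for events still in flight between stores; anything outside it is a candidate for the visual check.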




Service Architecture
[Diagram; labels: Analytics, Real-time, Internal API]


Helenus
•   Built Node.js driver for Cassandra
•   https://github.com/simplereach/helenus
•   CQL 2/3, Composite Column, Thrift Interface
•   More about Node.js and Cassandra




Points To Consider
•   Data consistency - Is the data the same in all data stores?
•   How important is data durability?
•   Managing many servers (Chef, AWS, CSSH)
•   Managing and learning many different applications and
    tuning for them




Summary
•   Polyglottany is not a sin
•   Know your data read/write patterns
•   Know the tools available to you
•   Know your compromises




We’re Hiring




Questions are guaranteed in life.
Answers aren’t.
               Eric Lubow
               @elubow
               elubow@simplereach.com
               #cassandra12

               Thank you.

1, 2, 3, 4, Add Another Data Store

  • 1. 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow elubow@simplereach.com #cassandra12
  • 2. Overview 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 3. Overview • SimpleReach 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 4. Overview • SimpleReach • Definitions and Data Stores 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 5. Overview • SimpleReach • Definitions and Data Stores • Evolution to Polyglottany 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 6. Overview • SimpleReach • Definitions and Data Stores • Evolution to Polyglottany • Tie It Together 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 7. Overview • SimpleReach • Definitions and Data Stores • Evolution to Polyglottany • Tie It Together • Questions 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 8. Socially Intelligent 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 9. Size 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 10. Size • 100m events recorded per day and growing 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 11. Size • 100m events recorded per day and growing • 500m Pageviews per month and growing 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 12. Polyglot Persistence Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand. http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 13. Right Tool For The Job 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 14. Why? 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 15. Why? • Heavier READ loads vs heavier write loads 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 16. Why? • Heavier READ loads vs heavier write loads • Data relationships may be less important 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 17. Why? • Heavier READ loads vs heavier write loads • Data relationships may be less important • Different aspects of a system have different requirements 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 18. No One Size Fits All 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 19. Tools 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 20. Free vs. Cost 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 21. Languages 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 22. Pre-Scale 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 23. Scale 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 24. SimpleReach Pre-Scale 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 25. SimpleReach 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 26. Cassandra 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 27. Cassandra • Large data volume ingestion 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 28. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 29. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) • Query by column groups within rows 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 30. Cassandra • Large data volume ingestion • Really fast writes to many locations (eventual consistency) • Query by column groups within rows • Range queries in Hive (partial CF scans) 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 31. mongoDB 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 32. mongoDB • Fast atomic increments (Node.js is native JSON) 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 33. mongoDB • Fast atomic increments (Node.js is native JSON) • Sharding for faster distributed increments 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 34. mongoDB • Fast atomic increments (Node.js is native JSON) • Sharding for faster distributed increments • Solid ORM for Rails (MongoID) 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 35. mongoDB • Fast atomic increments (Node.js is native JSON) • Sharding for faster distributed increments • Solid ORM for Rails (MongoID) • Fast access for pub/sub of durable/persisted documents 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 36. Redis 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 37. Redis • Supports hundreds of thousands transactions per second 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 38. Redis • Supports hundreds of thousands transactions per second • Great caching engine 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 39. Redis • Supports hundreds of thousands transactions per second • Great caching engine • Supports useful variable types like sorted set 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 40. Redis • Supports hundreds of thousands transactions per second • Great caching engine • Supports useful variable types like sorted set • Pay SerDe price on each access 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 41. InfiniDB and Infobright 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 42. InfiniDB and Infobright • Column Stores for ad-hoc analytics queries in SQL 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 43. InfiniDB and Infobright • Column Stores for ad-hoc analytics queries in SQL • Databases built for business intelligence 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 44. InfiniDB and Infobright • Column Stores for ad-hoc analytics queries in SQL • Databases built for business intelligence • Heavy compression of data 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 45. InfiniDB and Infobright • Column Stores for ad-hoc analytics queries in SQL • Databases built for business intelligence • Heavy compression of data • Pre-aggregated data (Extents/Knowledge Grid) 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 46. Ruby, Node.js, Python 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 47. Ruby, Node.js, Python • Polyglottany doesn’t only apply to data stores 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 48. Ruby, Node.js, Python • Polyglottany doesn’t only apply to data stores • Each language has its own benefit to each data storage layer 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 49. Ruby, Node.js, Python • Polyglottany doesn’t only apply to data stores • Each language has its own benefit to each data storage layer • Each language has its own individual benefits 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 50. Ruby, Node.js, Python • Polyglottany doesn’t only apply to data stores • Each language has its own benefit to each data storage layer • Each language has its own individual benefits • JSON, APIs, Performance 1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow @elubow
  • 51. Choice
  • 59. Cons • Redis - Can only utilize a single core • MySQL Column Store - DELETE/UPDATEs are VERY expensive • Cassandra - No btree indexes • Mongo - Queries slow down when shard count increases. Indexes must fit in memory • Python - Whitespace. Community • Ruby - Not high performance enough for our standards • Javascript (Node.js) - Bad for CPU or IO intensive workloads
  • 65. Tying It Together • Built in the cloud • Service Oriented Architecture (Internal API) • Built Helenus (Cassandra Node.js driver) • Data accuracy checks: visual and programmatic • Built framework for testing out storage engines
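A framework for testing out storage engines could, in its simplest form, run one workload against several engines behind a shared interface and report timing and correctness. The sketch below is an assumption about what such a harness might look like, not the framework the slide refers to: `DictEngine`, `ListEngine`, `run_trial`, and the workload are all hypothetical, in-memory stand-ins.

```python
import time

class DictEngine:
    """Hypothetical in-memory engine used as a stand-in for a real store."""
    name = "dict"

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)

class ListEngine:
    """Deliberately naive engine: appends on write, scans linearly on read."""
    name = "list"

    def __init__(self):
        self._rows = []

    def write(self, key, value):
        self._rows.append((key, value))

    def read(self, key):
        for stored_key, value in reversed(self._rows):  # newest write wins
            if stored_key == key:
                return value
        return None

def run_trial(engine, workload):
    """Write the workload, read it back, and report time and mismatches."""
    start = time.perf_counter()
    for key, value in workload:
        engine.write(key, value)
    mismatches = sum(1 for key, value in workload if engine.read(key) != value)
    elapsed = time.perf_counter() - start
    return {"engine": engine.name, "seconds": elapsed, "mismatches": mismatches}

workload = [(f"event:{i}", i * 2) for i in range(1000)]
results = [run_trial(engine, workload) for engine in (DictEngine(), ListEngine())]
for row in results:
    print(row["engine"], row["mismatches"], round(row["seconds"], 4))
```

Keeping the engines behind the same `write`/`read` pair means a real client could be swapped in without changing the harness.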
  • 66. Service Architecture (diagram: Analytics, Real-time, Internal API)
  • 71. Helenus • Built Node.js driver for Cassandra • https://github.com/simplereach/helenus • CQL 2/3, Composite Column, Thrift Interface • More about Node.js and Cassandra
  • 76. Points To Consider • Data consistency - Same in all data stores • How important is data durability? • Managing many servers (Chef, AWS, CSSH) • Managing and learning many different applications and tuning for them
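The data consistency point ("same in all data stores") lends itself to a programmatic check that compares the value each store holds for the same key. The sketch below is a minimal illustration under stated assumptions: each store is represented by a plain `{key: value}` snapshot, and the store names, keys, and counts are invented for the example.

```python
def find_inconsistencies(stores):
    """Compare per-key values across stores.

    `stores` maps a store name to a {key: value} snapshot. Returns a dict of
    key -> {store name: value or None} for every key whose values disagree.
    A key missing from a store is reported as None for that store.
    """
    all_keys = set()
    for snapshot in stores.values():
        all_keys.update(snapshot)

    problems = {}
    for key in sorted(all_keys):
        seen = {name: snapshot.get(key) for name, snapshot in stores.items()}
        if len(set(seen.values())) > 1:
            problems[key] = seen
    return problems

# Invented snapshots of a per-article share count in three stores.
snapshots = {
    "cassandra": {"article:1": 120, "article:2": 45, "article:3": 9},
    "mongo":     {"article:1": 120, "article:2": 44, "article:3": 9},
    "redis":     {"article:1": 120, "article:2": 45},
}
problems = find_inconsistencies(snapshots)
for key, values in problems.items():
    print(key, values)
```

Here `article:2` is flagged because one store disagrees on the count, and `article:3` because one store is missing the key entirely.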
  • 81. Summary • Polyglottany is not a sin • Know your data read/write patterns • Know the tools available to you • Know your compromises
  • 82. We’re Hiring
  • 83. Questions are guaranteed in life. Answers aren’t. Eric Lubow @elubow elubow@simplereach.com #cassandra12 Thank you.

Editor's Notes

  7. SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble and many more.