The Big Data Revolution is an Evolution

Eric Lubow
@elubow
elubow@simplereach.co

Overview
•   Evolution

•   SimpleReach

•   Data Stores / Languages

•   Architecture Implementation

                  Big Data Revolution is an   Eric Lubow  @elubow
                  Evolution                   #NYCassandra2013
We're in the midst of an
evolution, not a revolution.
The 2 Truths




The Real Truth
Even with the right tools, 80% of
the work of building a big data
system is acquiring and refining

30m plays/day + 4m user ratings + 75k movies metadata + 24.4m use metadata =

•   David Fincher + Kevin Spacey + British House of Cards

•   Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development

BRING IT TOGETHER

revolution
•   Insufficient Capabilities

•   Scale/Need Changes

evolution
•   New Products

•   Development & Integration

SimpleReach
•   Millions of URLs per day

•   Over 1 billion pageviews per month

•   250m events per day (~3k events/second)

•   Auto-scale 90-130 machines depending on traffic
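
The "~3k events/second" figure is the daily count averaged over a day. A minimal sketch of that arithmetic (the function name is illustrative, and real traffic will peak above the average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: int) -> float:
    """Average rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


avg = events_per_second(250_000_000)
print(f"{avg:,.0f} events/second on average")  # prints "2,894 events/second on average"
```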


HUMBLE BEGINNINGS




Scale


AND THEN...



 C*


Cassandra (C*)
•   Large data volume ingestion at high velocity

•   Really fast writes to many locations (eventual
    consistency)

•   Query by column groups within rows (slicing)

•   TTLs for small group aggregation

•   Wrote Helenus, Node.js driver for Cassandra
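
The slicing and TTL bullets can be pictured with a toy in-memory model. This is not the Cassandra API or Helenus; `WideRow`, the column names, and the timestamps are all made up for illustration. Columns stay sorted by name, a slice is a range read over those names, and a column written with a TTL stops being returned once it expires.

```python
import bisect


class WideRow:
    """Toy model of one wide row: sorted column names, optional expiry per column."""

    def __init__(self):
        self._names = []   # column names, kept sorted
        self._cells = {}   # name -> (value, expires_at or None)

    def insert(self, name, value, now, ttl=None):
        if name not in self._cells:
            bisect.insort(self._names, name)
        expires_at = now + ttl if ttl is not None else None
        self._cells[name] = (value, expires_at)

    def slice(self, start, finish, now):
        """Return live (name, value) pairs with start <= name <= finish."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        live = []
        for name in self._names[lo:hi]:
            value, expires_at = self._cells[name]
            if expires_at is None or now < expires_at:
                live.append((name, value))
        return live


row = WideRow()
row.insert("2013-03-20T10:00", 120, now=0, ttl=3600)    # expires at t=3600
row.insert("2013-03-20T10:05", 95, now=300, ttl=3600)   # expires at t=3900
row.insert("2013-03-20T11:00", 40, now=3600)            # no TTL

# Slice the 10:00 hour at t=3700: the first column has expired, the third is out of range.
recent = row.slice("2013-03-20T10:00", "2013-03-20T10:59", now=3700)
```

Because expired columns simply drop out of the slice, a short-lived aggregate over "the last hour" needs no separate cleanup job in this model.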

MongoDB
•   Fast atomic increments (Node.js is native JSON)

•   Sharding

•   Solid ORM for Rails (Mongoid)

•   B-Tree Indexes

•   Document based via JSON

•   TTLs for ephemeral data

Redis
•   Supports hundreds of thousands of transactions per
    second

•   Great caching engine

•   Supports useful data types like sets, sorted sets, and
    lists

•   Everything is guaranteed to be in memory

Infobright
•   Works with standard MySQL driver

•   Column Stores for ad-hoc analytics queries
    in SQL

•   Heavy compression of data (avg 12:1)




The c0dez
•   Polyglottany doesn’t only apply to data stores

•   Each language has its own benefit to each stack
    layer

•   Each language has its own individual benefits

•   Each language has its own development benefits



Cons
•   Redis - Can only utilize a single core. SerDe price.

•   Infobright - DELETE/UPDATEs are VERY expensive

•   Cassandra - No btree indexes or probabilistic counters

•   Mongo - Indexes must fit in memory. Forced Replica ping times

•   Python - Whitespace. Community

•   Ruby - Not high performance enough for our standards
Evolution Takes Work
•   Service Oriented Architecture (Internal API)

•   Data accuracy checks: visual and programmatic

•   Built framework for testing out engines (Storage,
    Queueing, etc)

•   Access to many toolsets (for all languages, DBs, Engines)
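
One way to picture a framework for testing out engines is a small shared interface plus checks that run unchanged against every candidate. The sketch below is an assumption about the general shape, not the framework described in the talk; `StorageEngine`, `InMemoryEngine`, `LossyEngine`, and `check_round_trip` are hypothetical names.

```python
from typing import Dict, List, Optional, Protocol


class StorageEngine(Protocol):
    """Minimal interface a candidate engine adapter would implement."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryEngine:
    """Reference engine backed by a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class LossyEngine(InMemoryEngine):
    """Deliberately broken engine that silently drops writes for keys ending in '2'."""

    def put(self, key: str, value: str) -> None:
        if not key.endswith("2"):
            super().put(key, value)


def check_round_trip(engine: StorageEngine, samples: Dict[str, str]) -> List[str]:
    """Write every sample, read it back, and return the keys that did not survive."""
    for key, value in samples.items():
        engine.put(key, value)
    return [key for key, value in samples.items() if engine.get(key) != value]


samples = {"url:1": "10", "url:2": "25"}
good = check_round_trip(InMemoryEngine(), samples)
bad = check_round_trip(LossyEngine(), samples)
```

The same check doubles as a programmatic data accuracy test: any engine that loses or alters a value shows up in the returned list.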




Service
[Diagram labels: Solr, C*, Real-time, C*, Internal API]

Path of a Packet
[Diagram, left to right: Internet -> Fire Hos / EP / API / SC -> Queue -> Consumers -> Internal API -> Solr, C*, Mongo, Redis, IB]
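
The middle of that path (queue, consumers, Internal API) can be sketched in a few lines. Everything here is a stand-in: `InternalAPI`, `record_pageview`, and the event shape are hypothetical, and plain Python objects take the place of the real data stores.

```python
import queue
from collections import defaultdict


class InternalAPI:
    """Hypothetical stand-in for the Internal API: the only code that touches the stores."""

    def __init__(self):
        self.counters = defaultdict(int)  # stand-in for a counter store
        self.raw_events = []              # stand-in for an append-only event store

    def record_pageview(self, url):
        self.counters[url] += 1
        self.raw_events.append({"type": "pageview", "url": url})


def consume(q, api):
    """Drain the queue, handing each event to the Internal API. Returns events handled."""
    handled = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return handled
        api.record_pageview(event["url"])
        handled += 1


# Edge services would enqueue events as they arrive from the Internet.
events = queue.Queue()
for url in ["/a", "/b", "/a"]:
    events.put({"url": url})

api = InternalAPI()
handled = consume(events, api)
```

Keeping consumers behind one API means a data store can be swapped without touching the code that reads from the queue.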

Architecture Distribution

US-EAST-1a           US-EAST-1b           US-EAST-1e
CASSANDRA-0001       CASSANDRA-0002       CASSANDRA-0003
CASSANDRA-0010       CASSANDRA-0011       CASSANDRA-0012
REDIS-0001A          REDIS-0001B
INFOBRIGHT-0001                           INFOBRIGHT-0002
MONGO-SHARD-0000-A                        MONGO-SHARD-0000-B
MONGO-SHARD-0001-B   MONGO-SHARD-0001-A
                     MONGO-SHARD-0002-B   MONGO-SHARD-0002-A
iAPI-0001            iAPI-0002            iAPI-0003
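
A layout like this can be checked programmatically. The sketch below encodes the placement as read from the slide (the column assignment is an interpretation of the flattened layout) and confirms that no Mongo shard keeps both of its members in one availability zone. `zones_by_shard` is a hypothetical helper.

```python
from collections import defaultdict

# Host placement per availability zone, as read from the slide.
PLACEMENT = {
    "us-east-1a": ["CASSANDRA-0001", "CASSANDRA-0010", "REDIS-0001A", "INFOBRIGHT-0001",
                   "MONGO-SHARD-0000-A", "MONGO-SHARD-0001-B", "iAPI-0001"],
    "us-east-1b": ["CASSANDRA-0002", "CASSANDRA-0011", "REDIS-0001B",
                   "MONGO-SHARD-0001-A", "MONGO-SHARD-0002-B", "iAPI-0002"],
    "us-east-1e": ["CASSANDRA-0003", "CASSANDRA-0012", "INFOBRIGHT-0002",
                   "MONGO-SHARD-0000-B", "MONGO-SHARD-0002-A", "iAPI-0003"],
}


def zones_by_shard(placement):
    """Map each Mongo shard id to the set of zones holding one of its members."""
    zones = defaultdict(set)
    for zone, hosts in placement.items():
        for host in hosts:
            if host.startswith("MONGO-SHARD-"):
                shard_id = host.rsplit("-", 1)[0]  # drop the trailing -A / -B
                zones[shard_id].add(zone)
    return dict(zones)


shard_zones = zones_by_shard(PLACEMENT)
# Shards whose members all sit in a single zone would not survive losing that zone.
single_zone = [shard for shard, zones in shard_zones.items() if len(zones) < 2]
```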

The Schrute of the Problem




Evolving Amazon Tools
•   Full Featured API

•   Simple Queue Service (SQS)

•   Data Pipeline

•   OpsWorks

•   CloudFormation

•   Redshift Analytics

•   CloudSearch

•   Elastic Beanstalk

•   Elastic MapReduce

•   Simple Workflow Service (SWF)

•   S3 / Glacier

DevOps Wizardry
•   Extensive use of AWS

•   Monitor: Nagios, StatsD, and Graphite

•   Manage: Chef, OpsWorks, csshX

•   Deployments
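
On the monitoring side, StatsD accepts plain-text metrics of the form `name:value|type` over UDP (port 8125 by default), which is part of why instrumenting a consumer is cheap. A minimal sketch, with hypothetical metric names and an assumed local StatsD address:

```python
import socket


def statsd_line(metric: str, value: int, kind: str = "c") -> bytes:
    """Encode one metric in the plain-text StatsD format: name:value|type."""
    if kind not in {"c", "g", "ms"}:  # counter, gauge, timer
        raise ValueError(f"unsupported metric type: {kind}")
    return f"{metric}:{value}|{kind}".encode("ascii")


def send(line: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; no reply is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))


line = statsd_line("events.consumed", 1)          # counter increment
timing = statsd_line("api.latency", 42, "ms")     # timer sample in milliseconds
```

Calling `send(line)` would emit the counter to a StatsD daemon on the local machine, which can then forward aggregates to Graphite.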




Summary
•   Solutions Require Evolution

•   Build, Use, and Integrate Tools

•   Abstraction

•   Distribution

•   Monitoring & Automation



Evolution Takes Time
A revolution only lasts fifteen
years, a period which
coincides with the


We’re
(Ask us about Food Coma Fridays)
Questions are guaranteed in life.
Answers aren’t.

Thank you.

Eric Lubow
@elubow
elubow@simplereach.co

