How to
   Win at Scale
      and its
Influence on People

     Philip (flip) Kromer
    CTO, Infochimps.com
Big Data is Inevitable

It Demands a New Approach
There’s Another Way

You’re Going to Have to
        Follow It

It Might be a Better Way
The Other Way
Massive component count
Federated Truth
[Figure: scattered sources of truth: email, spreadsheets, hipchat, salesforce, Chargify, hubspot, ZenDesk, google docs, ADP, BC/BS, zabbix, log files, MySQL, MongoDB, redis, elasticsearch, HBase, HDFS, s3]
Low Coupling
Reliable   Resilient
• Manage 100s of machines: architecture as code
• Contain system complexity: relentlessly decouple
• Maintain coherency: federated truth
• Manage true costs: optimize for people not machines
• Manage failure & change: resilience engineering
The Other Way

Declarative, not Homogeneous
Decoupled, not Standardized
Federated, not Centralized
Simple, not Performant
Resilient, not Reliable
Declarative
Architecture as Code
[Diagram: system stack. Boxes: Lightweight Dashboard (x2), HBase (x2), API, Data Transport (ESh, flume), ElasticSearch (x2), Operations (ops), Application (ics.com), Hadoop, On-Demand Hadoop]
Ironfan + Chef
[Diagram: the same system stack, zooming into one HBase cluster: HM, NN, ZK and six RS machines]
provision machine

run state

settings

standard components

cluster-specific

facet groups
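
The callouts above (provision machine, run state, settings, standard components, cluster-specific, facet groups) can be pictured as fields of a declarative cluster description. What follows is a minimal Python sketch under loud assumptions: `Facet`, `Cluster` and `expand` are hypothetical names invented for illustration, and this is not Ironfan's or Chef's actual API. It only shows the idea of describing machines as data and expanding that data into a per-machine plan.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    """A group of identical machines within a cluster (hypothetical model)."""
    name: str
    instances: int
    components: list = field(default_factory=list)   # cluster-specific parts


@dataclass
class Cluster:
    """Declarative description of a cluster: data, not provisioning steps."""
    name: str
    machine_type: str                                 # provision machine
    run_state: str                                    # e.g. "start" or "stop"
    settings: dict = field(default_factory=dict)
    standard_components: list = field(default_factory=list)
    facets: list = field(default_factory=list)        # facet groups


def expand(cluster):
    """Expand the description into one plan entry per machine."""
    plan = []
    for facet in cluster.facets:
        for index in range(facet.instances):
            plan.append({
                "machine": f"{cluster.name}-{facet.name}-{index}",
                "type": cluster.machine_type,
                "run_state": cluster.run_state,
                "settings": dict(cluster.settings),
                "components": cluster.standard_components + facet.components,
            })
    return plan


hbase = Cluster(
    name="hbase",
    machine_type="large",            # assumed label, not a real instance type
    run_state="start",
    settings={"region": "example"},  # assumed setting
    standard_components=["ssh", "nfs", "zbx", "log", "fw"],
    facets=[
        Facet("alpha", 1, ["hb master", "namenode", "zookeeper"]),
        Facet("gamma", 2, ["regionserver", "datanode", "stargate", "tasktracker"]),
    ],
)

plan = expand(hbase)
for machine in plan:
    print(machine["machine"], machine["components"])
```

Because the description is plain data, the same definition can be expanded again later to rebuild the cluster, or expanded with different settings to stand it up somewhere else.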
[Diagram: the same system stack, zooming from an HBase cluster (HM, NN, ZK and six RS machines) into one machine's components: regionserver, datanode, stargate, tasktracker, zookeeper, plus ssh, nfs, zbx, log, fw]
Wins
from Declarative
[Diagram: system stack (dashboards, HBase, API, data transport, ElasticSearch, ops, ics.com, Hadoop, On-Demand Hadoop)]
Recapitulatable
Portable
Decoupled
Our Stack
[Diagram: system stack (dashboards, HBase, API, data transport, ElasticSearch, ops, ics.com, Hadoop, On-Demand Hadoop)]
Our Stack
Engineer : System = 1:10


• >60 distinct components
• 50-150 machines
• 1 ops + 5 hackers + 1 analyst
Self-similar
[Diagram: the system stack expanded level by level: stack, then HBase cluster (HM, NN, ZK and six RS machines), then four facets]
  alpha: hb master, namenode, zookeeper
  beta:  hb 2d mstr, 2d nn, jobtracker, zookeeper
  gamma: regionserver, datanode, stargate, tasktracker, zookeeper
  delta: regionserver, datanode, stargate, tasktracker
  every machine: ssh, nfs, zbx, log, fw
Example: Scraper

Jobs -> Scraper -> disk -> tail’er -> decorator -> sink -> database
(tail’er, decorator and sink are flume)

  Scraper      while true: get_job, fetch_url, dump_to_disk
  tail’er      ensures reliable delivery
  decorator    parse raw => objects
  sink         store object => database
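
The scraper pipeline above can be sketched as four small stages that only share a file on disk and a stream of records. This is a rough Python illustration under stated assumptions, not the talk's actual code: `fetch_url` is a stub with no network access, the job queue is an in-memory `deque`, the "database" is a dict, and the spool file uses one JSON object per line. Plain generators stand in for flume, so the sketch does not provide the reliable-delivery guarantee the slide attributes to the tail'er.

```python
import json
import os
import tempfile
from collections import deque


def fetch_url(url):
    """Stand-in for a real HTTP fetch (assumption: no network in this sketch)."""
    return f"<html>content of {url}</html>"


def scrape(jobs, spool_path):
    """Scraper: get_job, fetch_url, dump_to_disk. It knows nothing downstream."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        while jobs:                   # the slide's `while true`, bounded here
            url = jobs.popleft()      # get_job
            body = fetch_url(url)     # fetch_url
            record = {"url": url, "body": body}
            spool.write(json.dumps(record) + "\n")   # dump_to_disk


def tail(spool_path):
    """Tail'er: yield each raw line the scraper wrote, in order."""
    with open(spool_path, encoding="utf-8") as spool:
        for line in spool:
            yield line


def decorate(raw_lines):
    """Decorator: parse raw lines into objects."""
    for line in raw_lines:
        record = json.loads(line)
        record["size"] = len(record["body"])
        yield record


def sink(records, database):
    """Sink: store each object in the database (a dict keyed by URL here)."""
    for record in records:
        database[record["url"]] = record


jobs = deque(["http://example.com/a", "http://example.com/b"])
database = {}
spool_path = os.path.join(tempfile.mkdtemp(), "scraper.spool")

scrape(jobs, spool_path)
sink(decorate(tail(spool_path)), database)
print(sorted(database))
```

The scraper's only contract is "append a line to a file", so it can be rewritten, restarted or run in many copies without touching the parsing or storage stages.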
[Figure: sequence of panels labeled alice and bob]
Simple
• Immediately Understandable
• Clear Interface
• Few Moving Parts
Federated
Data Stores in Production

• HBase           • MySQL
• ElasticSearch   • Redis
• Cassandra       • sqlite
• TokyoTyrant     • whisper (graphite)
• SimpleDB        • file system
• MongoDB         • S3
Programs Used for This Talk

• Emacs        • Skitch
• Keynote      • finder
• Preview      • flickr.com
• Chrome       • google image search
• ruby (pry)   • ssh
How’s my Batch Job Going?

• 1 x Job Status
• 1 x Counters & App Metrics
• N x Task Status
• M x Machine System Stats
• 1 x Cloud Status
• 1 x Chef Server
Dataflow is All
[Diagram: system stack (dashboards, HBase, API, data transport, ElasticSearch, ops, ics.com, Hadoop, On-Demand Hadoop)]
Labels on the diagram: System Diagram, Dataflow, Workflow, Org Chart
Robots are Cheap

People are Important
Expensive / Not Expensive
1 trillion 10 KB objects:
 • 100% in RAM:          $ 212,000 /mo
 • 10% in RAM:           $  21,000 /mo
 • On Disk:              $   3,000 /mo
 • On S3:                $   1,200 /mo
1 Intern, part-time:     $   1,500 /mo
Scalability
    is
  People
Monolithic Software




 means Meetings
Meetings




are Death
Decentralize. Decouple.
n^2 law of coupling

100 things          50 + 30 + 20 things
                    + 20 (tax)

10,000 things       2500 + 900 + 400 + 400
to go wrong         = 4200 things
                    to go wrong
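
The arithmetic on this slide can be checked in a few lines. The squares shown (2500 + 900 + 400 + 400) correspond to groups of 50, 30 and 20 things plus a decoupling tax of 20. The function name `coupling_cost` is an assumption made for this sketch, and "things to go wrong" is modeled simply as the sum of n^2 over independently coupled groups.

```python
def coupling_cost(group_sizes):
    """Sum of n^2 over groups whose members are coupled only to each other."""
    return sum(n * n for n in group_sizes)


monolith = coupling_cost([100])
decoupled = coupling_cost([50, 30, 20, 20])   # the last 20 is the "tax"

print(monolith)    # 10000
print(decoupled)   # 4200
```

Even after paying a tax of 20 extra things for the seams between groups, the decoupled system has fewer than half as many ways to go wrong.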
Infochimps.com 2011
[Diagram: boxes labeled infochimps.com, text search, API acct'g, models, A/B testing, cloud services, Planet of the APIs]
Infochimps.com 2012
[Diagram: boxes labeled datasets, API docs, content, dashboards, auth & layout, payment, console, blog, press, collateral, catalog API, text search, API acct'g, models, A/B testing, cloud services, Planet of the APIs]
Infochimps.com 2012
[Diagram: the same layout with boxes relabeled: icsexpl, capuchin, kanzi, beergoggls, george, george, alphamale, WPEngine, totem, hubspot, catalog API, elasticsrch, MongoDB, MySQL, redis, cloud services, Planet of the APIs, (infochimps), (saas)]
this drawing fits in my head


  datasets      catalog API



 this app fits in my head,
 and my laptop
fin.

     http://infochimps.com
http://github.com/infochimps-labs

The Other Way of Doing Big Data

  • 1. How to Win at Scale and its Influence on People Philip (flip) Kromer CTO, Infochimps.com
  • 2. Big Data is Inevitable It Demands a New Approach
  • 4. There’s Another Way You’re Going to Have to follow It
  • 5. There’s Another Way You’re Going to Have to follow It It Might be a Better Way
  • 8. Federated Truth email MySQL HBase s3 spreadsheets elasticsearch elasticsearch HDFS hipchat redis mongo MongoDB log files salesforce zabbix hubspot ADP Chargify BC/BS ZenDesk google docs
  • 10. Reliable Resilient
  • 11. • Manage 100s of machines: architecture as code • Contain system complexity: relentlessly decouple • Maintain coherency: federated truth • Manage true costs: optimize for people, not machines • Manage failure & change: resiliency engineering
  • 12. The Other Way Declarative, not Homogenous Decoupled, not Standardized Federated, not Centralized Simple, not Performant Resilient, not Reliable
  • 14. Architecture as Code: Ironfan + Chef [stack diagram labels: Lightweight Dashboard, HBase, API, Data Transport, ESh, flume, ElasticSearch, Operations, Application, ops, ics.com, On-Demand Hadoop, Hadoop]
  • 15. [stack diagram as on slide 14, with HBase cluster nodes labeled HM, NN, ZK and six RS]
  • 16. provision machine run state settings standard components cluster-specific facet groups
  • 18. [stack diagram as on slide 14; HBase cluster nodes HM, NN, ZK and six RS; one machine expanded into its components: regionserver, datanode, tasktracker, stargate, zookeeper, ssh, nfs, zbx, log, fw]
  • 19. Wins from Declarative [stack diagram as on slide 14]
  • 23. Our Stack [stack diagram as on slide 14]
  • 26. Engineer : System = 1:10 • >60 distinct components • 50-150 machines • 1 ops + 5 hackers + 1 analyst
  • 27. Self-similar [stack diagram as on slide 14; HBase cluster nodes HM, NN, ZK, RS; four panels labeled alpha, beta, gamma, delta listing per-machine components: hb master, namenode, jobtracker, zookeeper, regionserver, datanode, stargate, tasktracker, plus ssh, nfs, zbx, log, fw on each]
  • 28. Example: Scraper. Diagram labels: Jobs, Scraper, disk, tail’er, decorator, sink, database
  • 29. Scraper and flume. Scraper: while true: get_job, fetch_url, dump_to_disk. flume: tail’er, decorator, sink
  • 30. As slide 29, plus tail’er: ensures reliable delivery
  • 31. As slide 30, plus decorator: parse raw => objects
  • 32. As slide 31, plus sink: store object => database
  • 36. • Immediately Understandable • Clear Interface • Few Moving Parts
  • 38. Data Stores in Production • HBase • MySQL • ElasticSearch • Redis • Cassandra • sqlite • TokyoTyrant • whisper (graphite) • SimpleDB • file system • MongoDB • S3
  • 39. Programs Used for This Talk • Emacs • Skitch • Keynote • finder • Preview • flickr.com • Chrome • google image search • ruby (pry) • ssh
  • 40. How’s my Batch Job Going? • 1 x Job Status • 1 x Counters & App Metrics • N x Task Status • M x Machine System Stats • 1 x Cloud Status • 1 x Chef Server
  • 42. [stack diagram as on slide 14] System Diagram, Dataflow, Workflow
  • 43. [stack diagram as on slide 14] System Diagram, Dataflow, Workflow, Org Chart
  • 44. Robots are Cheap People are Important
  • 45. Expensive / Not Expensive 1 billion 10 kB objects (10 TB): • 100% in RAM: $212,000 /mo • 10% in RAM: $21,000 /mo • On Disk: $3,000 /mo • On S3: $1,200 /mo
  • 46. Expensive / Not Expensive 1 billion 10 kB objects (10 TB): • 100% in RAM: $212,000 /mo • 10% in RAM: $21,000 /mo • On Disk: $3,000 /mo • On S3: $1,200 /mo 1 Intern, part-time: $1,500 /mo
  • 47. Scalability is People
  • 52. n^2 law of coupling: 100 things vs. 5 + 3 + 2 things + 2 (tax)
  • 53. n^2 law of coupling: 100 things give 10,000 things to go wrong; 2500 + 900 + 400 + 400 = 4200 things to go wrong
  • 55. Infochimps.com 2011 text search Planet of the API acct'g APIs infochimps.com models A/B testing cloud services
  • 56. Infochimps.com 2012 datasets catalog API API docs text search content dashboards Planet of the API acct'g APIs auth & payment layout console models A/B testing blog press cloud services collateral
  • 57. Infochimps.com 2012 (infochimps) icsexpl catalog API (saas) capuchin elasticsrch kanzi beergoggls Planet of the MongoDB APIs george george alphamale MySQL redis WPEngine totem cloud services hubspot
  • 58. this drawing fits in my head datasets catalog API this app fits in my head, and my laptop
  • 59. Infochimps.com 2012 (infochimps) icsexpl catalog API (saas) capuchin elasticsrch kanzi beergoggls Planet of the MongoDB APIs george george alphamale MySQL redis WPEngine totem cloud services hubspot
  • 60. fin. http://infochimps.com http://github.com/infochimps-labs
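
The scraper example on slides 28 to 32 can be sketched as four small stages that only share a file on disk. This is a minimal illustration, not the talk's actual code: the function names, the JSON-lines spool format, and the stubbed `fake_fetch` are all assumptions, a plain file read stands in for flume's reliable delivery, and a finite job list replaces the slide's `while true` loop so the sketch terminates.

```python
import json
import os
import tempfile


def scrape(jobs, fetch_url, spool_path):
    """Scraper stage: get a job, fetch the URL, dump the raw result to disk.

    It knows nothing about parsing or storage downstream.
    """
    with open(spool_path, "a", encoding="utf-8") as spool:
        for url in jobs:
            raw = fetch_url(url)
            spool.write(json.dumps({"url": url, "raw": raw}) + "\n")


def tail(spool_path):
    """Tail'er stage (stand-in for flume): yield each spooled line in order."""
    with open(spool_path, encoding="utf-8") as spool:
        for line in spool:
            yield line


def decorate(lines):
    """Decorator stage: parse raw lines into objects."""
    for line in lines:
        record = json.loads(line)
        yield {"url": record["url"], "length": len(record["raw"])}


def sink(objects, database):
    """Sink stage: store each object in the database (a dict here)."""
    for obj in objects:
        database[obj["url"]] = obj


def fake_fetch(url):
    """Hypothetical fetcher so the sketch runs without network access."""
    return "<html>" + url + "</html>"


def run_demo():
    database = {}
    with tempfile.TemporaryDirectory() as tmp:
        spool_path = os.path.join(tmp, "spool.jsonl")
        jobs = ["http://example.com/a", "http://example.com/b"]
        scrape(jobs, fake_fetch, spool_path)
        sink(decorate(tail(spool_path)), database)
    return database


if __name__ == "__main__":
    print(run_demo())
```

Because the scraper only writes to disk, any stage after it can be swapped or restarted without touching the fetch loop, which is the decoupling the slides argue for.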

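The arithmetic behind the n^2 law of coupling on slides 52 and 53 can be checked with a short sketch. The component sizes 50, 30, 20 and a "tax" of 20 are an assumption inferred from the squares 2500, 900, 400 and 400 shown on slide 53; the transcript of slide 52 shows them as 5, 3, 2 and 2. The function name is hypothetical.

```python
def things_to_go_wrong(component_sizes):
    """Sum of squares: each component of n coupled things has n * n interactions."""
    return sum(n * n for n in component_sizes)


# One tightly coupled system of 100 things.
monolith = things_to_go_wrong([100])

# The same 100 things split into decoupled parts, plus an integration "tax".
# Sizes are inferred from the slide's squares (assumption, see lead-in).
decoupled = things_to_go_wrong([50, 30, 20, 20])

if __name__ == "__main__":
    print(monolith, decoupled)
```

Even after paying the tax for the extra glue, the decoupled layout has fewer than half the potential interactions of the single coupled system.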
Editor's Notes

  8. This is at a 15-person organization. Federated, meaning the data is semantically disparate.
  11. People are walking around as if we used to have one kind of database and now we have two. The important fact isn’t that one of them is sharded. The important fact is that they’re proliferating, and that’s a good thing.
  12. Google, Facebook, Amazon had to solve the scalability problem.
  38-39. Now I know this sounds like the lunacy of a ritalin-addled architecture astronaut spending too much time on StackOverflow.
  45-46. Summary: $200k on 146 Amazon EC2 m2.4xlarge; $20k on 57 Amazon EC2 m2.xlarge (10 TB data, 10% in RAM); $3k on 6 Amazon EC2 c1.xlarge (10 TB data on disk); $1.2k for 10 TB on S3.
      10 TB in RAM, on 146 Amazon EC2 m2.4xlarge: 10_000 * 2.00 * 24 * 30.25 / 68.4 = $212,280 / month
      10 TB data, 10% in RAM, on 57 Amazon EC2 m2.xlarge: 0.1 * 10_000 * 0.50 * 24 * 30.25 / 17.5 = $20,743 / month
      10 TB data on disk, on 6 Amazon EC2 c1.xlarge: machines, price, disk, ram = [6, 0.68, 1_690, 7] ; [(tot_disk = disk * machines), (machine_dollars_mo = (machines * price * 24 * 30.25).round)] gives $2,962 / month
      10 TB data on S3: $1,250 / month
      1 intern, $10/hr, 25 hrs/wk, not incl. overhead: $1,100 / month
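
The cost figures in the notes for slides 45 and 46 can be reproduced with a short sketch. The hourly prices, per-machine RAM and disk sizes, and the 24 * 30.25 hours-per-month convention are taken from the notes; the function name `monthly_cost` and the 52 / 12 weeks-per-month factor for the intern are assumptions added for illustration.

```python
HOURS_PER_MONTH = 24 * 30.25  # convention used in the notes: 726 hours


def monthly_cost(data_gb, gb_per_machine, dollars_per_hour, fraction=1.0):
    """Monthly cost of holding `fraction` of `data_gb` on machines of a given size.

    The machine count is fractional, matching the arithmetic in the notes.
    """
    machines = fraction * data_gb / gb_per_machine
    return machines * dollars_per_hour * HOURS_PER_MONTH


# 10 TB entirely in RAM on m2.4xlarge (68.4 GB RAM, $2.00/hr per the notes).
all_ram = monthly_cost(10_000, 68.4, 2.00)

# 10% of 10 TB in RAM on m2.xlarge (17.5 GB RAM, $0.50/hr per the notes).
tenth_ram = monthly_cost(10_000, 17.5, 0.50, fraction=0.1)

# 10 TB on disk: 6 c1.xlarge machines at $0.68/hr with 1,690 GB of disk each.
machines, price, disk_gb = 6, 0.68, 1_690
total_disk_gb = machines * disk_gb
on_disk = machines * price * HOURS_PER_MONTH

# Intern at $10/hr, 25 hrs/wk; 52 / 12 weeks per month is an assumption.
intern = 10 * 25 * 52 / 12

if __name__ == "__main__":
    for label, cost in [("100% RAM", all_ram), ("10% RAM", tenth_ram),
                        ("disk", on_disk), ("intern", intern)]:
        print(f"{label}: ${cost:,.0f} / month")
```

The fractional machine counts (about 146.2 and 57.1) line up with the 146 and 57 instances quoted in the notes, and the intern figure lands near the $1,100 a month given there.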