SlideShare a Scribd company logo
1 of 51
Download to read offline
Firehose
Engineering
designing
high-volume
data collection
systems
                   Josh Berkus
                    FOSS4G-NA
                  Wash. DC 2012
Firehose Database
   Applications (FDA)
(1) very high volume of data
    input from many automated
    producers
(2) continuous processing and
    aggregation of incoming
    data
Mozilla Socorro
Upwind
GPS Applications
Firehose Challenges
1. Volume




●   100's to 1000's facts/second
●   GB/hour
1. Volume




●   spikes in volume
●   multiple uncoorindated sources
1. Volume




volume always grows over time
2. Constant flow
since data arrives 24/7 …

while the user interface can be
down, data collection can never
be down
2. Constant flow
        ●   can't stop
            receiving to
            process
ETL     ●   data can arrive
            out of order
3. Database size
●       terabytes to petabytes
    ●   lots of hardware
    ●   single-node DBMSes aren't enough
    ●   difficult backups, redundancy,
        migration
    ●   analytics are resource-consumptive
3. Database size
●       database growth
    ●   size grows quickly
    ●   need to expand storage
    ●   estimate target data size
    ●   create data ageing policies
3. Database size

    “We will decide on a data
retention policy when we run out
         of disk space.”
         – every business user everywhere
4.
many components
 = many failures
4. Component failure
●       all components fail
    ●   or need scheduled downtime
    ●   including the network
●       collection must continue
●       collection & processing must
        recover
solving firehose problems
socorro project
http://crash-stats.mozilla.com
Mozilla Socorro

                          webservers




collectors

                           reports
             processors
socorro data volume
●       3000 crashes/minute
    ●   avg. size 150K
●       40TB accumulated raw data
    ●   500GB accumulated metadata /
        reports
dealing with volume




  load balancers   collectors
dealing with volume




monitor   processors
dealing with size
             data
            40TB
       expandible


   metadata
   views
   500GB
   fixed size
dealing with
       component failure
Lots of hardware ...
  ●   30 Hbase       ●   3 ES servers
      nodes          ●   6 collectors
  ●   2 PostgreSQL   ●   12 processors
      servers        ●   8 middleware &
  ●   6 load             web servers
      balancers

                     … lots of failures
load balancing &
   redundancy




    load balancers   collectors
elastic connections
●       components queue their data
    ●   retain it if other nodes are down
●       components resume work
        automatically
    ●   when other nodes come back up
elastic connections
                        crash mover

           collector




reciever               local file queue
server management
●       puppet
    ●   controls configuration of all servers
    ●   makes sure servers recover
    ●   allows rapid deployment of
        replacement nodes
Upwind
Upwind
 ●   speed
 ●   wind speed
 ●   heat
 ●   vibration
 ●   noise
 ●   direction
Upwind
 1. maximize
    power
    generation
 2. make sure
    turbine isn't
    damaged
dealing with volume
each turbine:
            90 to 700 facts/second
windmills per farm:      up to 100
number of farms:                40+
est. total: 300,000 facts/second
                         (will grow)
dealing with volume


   local     historian   analytic   reports
   storage               database




   local     historian   analytic   reports
   storage               database
dealing with volume


   local     historian   analytic
   storage               database




                                    master
                                    database
   local     historian   analytic
   storage               database
multi-tenant partitioning
●       partition the whole application
    ●   each customer gets their own
        toolchain
●       allows scaling with the
        number of customers
    ●   lowers efficiency
    ●   more efficient with virtualization
dealing with:
        constant flow and size

             minute       hours     days     months   years
historian
             buffer       table     table     table   table



                         minute     hours    days     months
            historian
                         buffer     table    table     table



                                    minute   hours    days
                        historian
                                    buffer   table    table
time-based rollups
●       continuously accumulate
        levels of rollup
    ●   each is based on the level below it
    ●   data is always appended, never
        updated
    ●   small windows == small resources
time-based rollups
●       allows:
    ●   very rapid summary reports for
        different windows
    ●   retaining different summaries for
        different levels of time
    ●   batch/out-of-order processing
    ●   summarization in parallel
firehose tips
data collection must be:

   ●   continuous
   ●   parallel
   ●   fault-tolerant
data processing must be:

   ●   continuous
   ●   parallel
   ●   fault-tolerant
every component
     must be able to fail
●   including the network
●   without too much data loss
●   other components must
    continue
5 tools to use
1.   queueing software
2.   buffering techniques
3.   materialized views
4.   configuration management
5.   comprehensive monitoring
4 don'ts
1.   use cutting-edge technology
2.   use untested hardware
3.   run components to capacity
4.   do hot patching
master your own
Contact
●       Josh Berkus: josh@pgexperts.com
    ●   blog: blogs.ittoolbox.com/database/soup
●       PostgreSQL: www.postgresql.org
    ●   pgexperts: www.pgexperts.com
●       Postgres BOF 3:30PM Today
    ●   pgCon, Ottawa, May 2012
    ●   Postgres Open, Chicago, Sept. 2012

        The text and diagrams in this talk is copyright 2011 Josh Berkus and is licensed under the creative
        commons attribution license. Title slide image and other fireman images are licensed from
        iStockPhoto and may not be reproduced or redistributed separate from this presentation. Socorro
        images are copyright 2011 Mozilla Inc.

More Related Content

What's hot

Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Realtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured DataRealtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured DataScyllaDB
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster PerformanceGluster.org
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiGluster.org
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvSahina Bose
 
Pgxc scalability pg_open2012
Pgxc scalability pg_open2012Pgxc scalability pg_open2012
Pgxc scalability pg_open2012Ashutosh Bapat
 
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data AnalyticsSupersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analyticsmason_s
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterRed_Hat_Storage
 
Google File Systems
Google File SystemsGoogle File Systems
Google File SystemsAzeem Mumtaz
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XCAshutosh Bapat
 
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Atin Mukherjee
 
2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph2019.06.27 Intro to Ceph
2019.06.27 Intro to CephCeph Community
 
Google file system GFS
Google file system GFSGoogle file system GFS
Google file system GFSzihad164
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightGluster.org
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...areej qasrawi
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDeltares
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudEqunix Business Solutions
 

What's hot (20)

Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Realtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured DataRealtime Indexing for Fast Queries on Massive Semi-Structured Data
Realtime Indexing for Fast Queries on Massive Semi-Structured Data
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan Rossi
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlv
 
Google file system
Google file systemGoogle file system
Google file system
 
Pgxc scalability pg_open2012
Pgxc scalability pg_open2012Pgxc scalability pg_open2012
Pgxc scalability pg_open2012
 
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data AnalyticsSupersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on gluster
 
Google File Systems
Google File SystemsGoogle File Systems
Google File Systems
 
GlusterFS And Big Data
GlusterFS And Big DataGlusterFS And Big Data
GlusterFS And Big Data
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XC
 
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015Gluster fs architecture_&_roadmap_atin_punemeetup_2015
Gluster fs architecture_&_roadmap_atin_punemeetup_2015
 
2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph
 
Google file system GFS
Google file system GFSGoogle file system GFS
Google file system GFS
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...MapReduce:Simplified Data Processing on Large Cluster  Presented by Areej Qas...
MapReduce:Simplified Data Processing on Large Cluster Presented by Areej Qas...
 
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukCloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
Cloud infrastructure. Google File System and MapReduce - Andrii Vozniuk
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De Boer
 
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien RouhaudPGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
PGConf.ASIA 2019 Bali - Performance Analysis at Full Power - Julien Rouhaud
 

Viewers also liked

Safety LAMP: data security & agile languages
Safety LAMP: data security & agile languagesSafety LAMP: data security & agile languages
Safety LAMP: data security & agile languagesPostgreSQL Experts, Inc.
 
Scaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsScaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsKonstantin Gredeskoul
 
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLFrom Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLKonstantin Gredeskoul
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALEPostgreSQL Experts, Inc.
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 

Viewers also liked (19)

The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
Pg py-and-squid-pypgday
Pg py-and-squid-pypgdayPg py-and-squid-pypgday
Pg py-and-squid-pypgday
 
Building a SQL Database that Works
Building a SQL Database that WorksBuilding a SQL Database that Works
Building a SQL Database that Works
 
Give A Great Tech Talk 2013
Give A Great Tech Talk 2013Give A Great Tech Talk 2013
Give A Great Tech Talk 2013
 
Ten Ways to Destroy Your Database
Ten Ways to Destroy Your DatabaseTen Ways to Destroy Your Database
Ten Ways to Destroy Your Database
 
Safety LAMP: data security & agile languages
Safety LAMP: data security & agile languagesSafety LAMP: data security & agile languages
Safety LAMP: data security & agile languages
 
HowTo DR
HowTo DRHowTo DR
HowTo DR
 
GUC Tutorial Package (9.0)
GUC Tutorial Package (9.0)GUC Tutorial Package (9.0)
GUC Tutorial Package (9.0)
 
Five steps perform_2013
Five steps perform_2013Five steps perform_2013
Five steps perform_2013
 
Scaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six MonthsScaling Wanelo.com 100x in Six Months
Scaling Wanelo.com 100x in Six Months
 
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLFrom Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
7 Ways To Crash Postgres
7 Ways To Crash Postgres7 Ways To Crash Postgres
7 Ways To Crash Postgres
 
Scaling postgres
Scaling postgresScaling postgres
Scaling postgres
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
Shootout at the PAAS Corral
Shootout at the PAAS CorralShootout at the PAAS Corral
Shootout at the PAAS Corral
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
 

Similar to Firehose Engineering

Проектирование крупномасштабных приложений сбора данных (Josh Berkus)
Проектирование крупномасштабных приложений сбора данных (Josh Berkus)Проектирование крупномасштабных приложений сбора данных (Josh Berkus)
Проектирование крупномасштабных приложений сбора данных (Josh Berkus)Ontico
 
Multi-Site Perforce at NetApp
Multi-Site Perforce at NetAppMulti-Site Perforce at NetApp
Multi-Site Perforce at NetAppPerforce
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
How to Design for Database High Availability
How to Design for Database High AvailabilityHow to Design for Database High Availability
How to Design for Database High AvailabilityEDB
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streamingdatamantra
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...Fred de Villamil
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Alluxio, Inc.
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingApache Apex
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionDataStax Academy
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Jon Haddad
 
OVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud DatabasesOVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud DatabasesOVHcloud
 

Similar to Firehose Engineering (20)

Проектирование крупномасштабных приложений сбора данных (Josh Berkus)
Проектирование крупномасштабных приложений сбора данных (Josh Berkus)Проектирование крупномасштабных приложений сбора данных (Josh Berkus)
Проектирование крупномасштабных приложений сбора данных (Josh Berkus)
 
Multi-Site Perforce at NetApp
Multi-Site Perforce at NetAppMulti-Site Perforce at NetApp
Multi-Site Perforce at NetApp
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
How to Design for Database High Availability
How to Design for Database High AvailabilityHow to Design for Database High Availability
How to Design for Database High Availability
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
WW Historian 10
WW Historian 10WW Historian 10
WW Historian 10
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in Production
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in Production
 
Big Data for QAs
Big Data for QAsBig Data for QAs
Big Data for QAs
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 
OVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud DatabasesOVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud Databases
 

More from PostgreSQL Experts, Inc.

Elephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsElephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsPostgreSQL Experts, Inc.
 
Introducing Windowing Functions (pgCon 2009)
Introducing Windowing Functions (pgCon 2009)Introducing Windowing Functions (pgCon 2009)
Introducing Windowing Functions (pgCon 2009)PostgreSQL Experts, Inc.
 

More from PostgreSQL Experts, Inc. (20)

92 grand prix_2013
92 grand prix_201392 grand prix_2013
92 grand prix_2013
 
PWNage: Producing a newsletter with Perl
PWNage: Producing a newsletter with PerlPWNage: Producing a newsletter with Perl
PWNage: Producing a newsletter with Perl
 
10 Ways to Destroy Your Community
10 Ways to Destroy Your Community10 Ways to Destroy Your Community
10 Ways to Destroy Your Community
 
Open Source Press Relations
Open Source Press RelationsOpen Source Press Relations
Open Source Press Relations
 
5 (more) Ways To Destroy Your Community
5 (more) Ways To Destroy Your Community5 (more) Ways To Destroy Your Community
5 (more) Ways To Destroy Your Community
 
Preventing Community (from Linux Collab)
Preventing Community (from Linux Collab)Preventing Community (from Linux Collab)
Preventing Community (from Linux Collab)
 
Development of 8.3 In India
Development of 8.3 In IndiaDevelopment of 8.3 In India
Development of 8.3 In India
 
PostgreSQL and MySQL
PostgreSQL and MySQLPostgreSQL and MySQL
PostgreSQL and MySQL
 
50 Ways To Love Your Project
50 Ways To Love Your Project50 Ways To Love Your Project
50 Ways To Love Your Project
 
8.4 Upcoming Features
8.4 Upcoming Features 8.4 Upcoming Features
8.4 Upcoming Features
 
Elephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsElephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and Variants
 
Writeable CTEs: The Next Big Thing
Writeable CTEs: The Next Big ThingWriteable CTEs: The Next Big Thing
Writeable CTEs: The Next Big Thing
 
PostgreSQL Development Today: 9.0
PostgreSQL Development Today: 9.0PostgreSQL Development Today: 9.0
PostgreSQL Development Today: 9.0
 
9.1 Mystery Tour
9.1 Mystery Tour9.1 Mystery Tour
9.1 Mystery Tour
 
Postgres Open Keynote: The Next 25 Years
Postgres Open Keynote: The Next 25 YearsPostgres Open Keynote: The Next 25 Years
Postgres Open Keynote: The Next 25 Years
 
9.1 Grand Tour
9.1 Grand Tour9.1 Grand Tour
9.1 Grand Tour
 
Keyvil Lightning Talk
Keyvil Lightning TalkKeyvil Lightning Talk
Keyvil Lightning Talk
 
Safe Data is Happy Data
Safe Data is Happy DataSafe Data is Happy Data
Safe Data is Happy Data
 
Unit Test Your Database! (PgCon 2009)
Unit Test Your Database! (PgCon 2009)Unit Test Your Database! (PgCon 2009)
Unit Test Your Database! (PgCon 2009)
 
Introducing Windowing Functions (pgCon 2009)
Introducing Windowing Functions (pgCon 2009)Introducing Windowing Functions (pgCon 2009)
Introducing Windowing Functions (pgCon 2009)
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Firehose Engineering

  • 2.
  • 3. Firehose Database Applications (FDA) (1) very high volume of data input from many automated producers (2) continuous processing and aggregation of incoming data
  • 8. 1. Volume ● 100's to 1000's facts/second ● GB/hour
  • 9. 1. Volume ● spikes in volume ● multiple uncoorindated sources
  • 10. 1. Volume volume always grows over time
  • 11. 2. Constant flow since data arrives 24/7 … while the user interface can be down, data collection can never be down
  • 12. 2. Constant flow ● can't stop receiving to process ETL ● data can arrive out of order
  • 13. 3. Database size ● terabytes to petabytes ● lots of hardware ● single-node DBMSes aren't enough ● difficult backups, redundancy, migration ● analytics are resource-consumptive
  • 14. 3. Database size ● database growth ● size grows quickly ● need to expand storage ● estimate target data size ● create data ageing policies
  • 15. 3. Database size “We will decide on a data retention policy when we run out of disk space.” – every business user everywhere
  • 16. 4.
  • 17. many components = many failures
  • 18. 4. Component failure ● all components fail ● or need scheduled downtime ● including the network ● collection must continue ● collection & processing must recover
  • 22.
  • 23. Mozilla Socorro webservers collectors reports processors
  • 24.
  • 25. socorro data volume ● 3000 crashes/minute ● avg. size 150K ● 40TB accumulated raw data ● 500GB accumulated metadata / reports
  • 26. dealing with volume load balancers collectors
  • 28. dealing with size data 40TB expandible metadata views 500GB fixed size
  • 29. dealing with component failure Lots of hardware ... ● 30 Hbase ● 3 ES servers nodes ● 6 collectors ● 2 PostgreSQL ● 12 processors servers ● 8 middleware & ● 6 load web servers balancers … lots of failures
  • 30. load balancing & redundancy load balancers collectors
  • 31. elastic connections ● components queue their data ● retain it if other nodes are down ● components resume work automatically ● when other nodes come back up
  • 32. elastic connections crash mover collector reciever local file queue
  • 33. server management ● puppet ● controls configuration of all servers ● makes sure servers recover ● allows rapid deployment of replacement nodes
  • 35. Upwind ● speed ● wind speed ● heat ● vibration ● noise ● direction
  • 36. Upwind 1. maximize power generation 2. make sure turbine isn't damaged
  • 37. dealing with volume each turbine: 90 to 700 facts/second windmills per farm: up to 100 number of farms: 40+ est. total: 300,000 facts/second (will grow)
  • 38. dealing with volume local historian analytic reports storage database local historian analytic reports storage database
  • 39. dealing with volume local historian analytic storage database master database local historian analytic storage database
  • 40. multi-tenant partitioning ● partition the whole application ● each customer gets their own toolchain ● allows scaling with the number of customers ● lowers efficiency ● more efficient with virtualization
  • 41. dealing with: constant flow and size minute hours days months years historian buffer table table table table minute hours days months historian buffer table table table minute hours days historian buffer table table
  • 42. time-based rollups ● continuously accumulate levels of rollup ● each is based on the level below it ● data is always appended, never updated ● small windows == small resources
  • 43. time-based rollups ● allows: ● very rapid summary reports for different windows ● retaining different summaries for different levels of time ● batch/out-of-order processing ● summarization in parallel
  • 45. data collection must be: ● continuous ● parallel ● fault-tolerant
  • 46. data processing must be: ● continuous ● parallel ● fault-tolerant
  • 47. every component must be able to fail ● including the network ● without too much data loss ● other components must continue
  • 48. 5 tools to use 1. queueing software 2. buffering techniques 3. materialized views 4. configuration management 5. comprehensive monitoring
  • 49. 4 don'ts 1. use cutting-edge technology 2. use untested hardware 3. run components to capacity 4. do hot patching
  • 51. Contact ● Josh Berkus: josh@pgexperts.com ● blog: blogs.ittoolbox.com/database/soup ● PostgreSQL: www.postgresql.org ● pgexperts: www.pgexperts.com ● Postgres BOF 3:30PM Today ● pgCon, Ottawa, May 2012 ● Postgres Open, Chicago, Sept. 2012 The text and diagrams in this talk is copyright 2011 Josh Berkus and is licensed under the creative commons attribution license. Title slide image and other fireman images are licensed from iStockPhoto and may not be reproduced or redistributed separate from this presentation. Socorro images are copyright 2011 Mozilla Inc.