Lessons Learned from OpenTSDB

Or why OpenTSDB is the way it is and how it changed iteratively to correct some of the mistakes made

Benoît “tsuna” Sigoure
tsuna@stumbleupon.com
Key concepts

•   Data Points
    (time, value)

•   Metrics
    proc.loadavg.1m

•   Tags
    host=web42     pool=static

•   Metric + Tags = Time Series

•   Order of magnitude: >10^6 time series, >10^12 data points

    put proc.loadavg.1m 1234567890 0.42 host=web42 pool=static
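
For concreteness, here is a minimal sketch (not part of the original deck) of how a collector could emit the put line above over the TSD's plain-text protocol. The host name is an assumption; 4242 is the usual TSD listen port.

import java.io.PrintWriter;
import java.net.Socket;
import java.util.Map;

/** Minimal sketch: send one data point to a TSD over the telnet-style protocol. */
public final class PutExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical TSD address; any running TSD listening on its telnet port works.
    try (Socket sock = new Socket("tsd.example.com", 4242);
         PrintWriter out = new PrintWriter(sock.getOutputStream(), true)) {
      String metric = "proc.loadavg.1m";
      long timestamp = 1234567890L;        // seconds since the epoch
      double value = 0.42;
      Map<String, String> tags = Map.of("host", "web42", "pool", "static");

      // Build: put <metric> <timestamp> <value> <tagk=tagv> ...
      StringBuilder line = new StringBuilder("put ")
          .append(metric).append(' ')
          .append(timestamp).append(' ')
          .append(value);
      tags.forEach((k, v) -> line.append(' ').append(k).append('=').append(v));
      out.println(line);                   // the TSD parses this into one data point
    }
  }
}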
OpenTSDB @ StumbleUpon
• Main production monitoring system for ~2 years
• Storing hundreds of billions of data points
• Adding over 1 billion data points per day
• 13000 data points/s → 130 QPS on HBase
• If you had a 5-node cluster, this load would hardly make it sweat
Do’s
• Wider rows to seek faster
  before: ~4KB/row, after: ~20KB
• Make writes idempotent and independent
  before: start rows at arbitrary points in time
  after: align rows on 10m (then 1h) boundaries
• Store more data per KeyValue
  Remember you pay for the full row key alongside each value
  in a row, so large keys are really expensive (see the sketch below)
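
A rough illustration of the “you pay for the key along each value” point: every HBase cell (KeyValue) stores the full row key, family, qualifier and timestamp next to the value. The byte counts below follow the classic KeyValue layout and are only an approximation (no compression or block overhead accounted for).

/**
 * Rough sketch of why large row keys are expensive in HBase: every cell
 * (KeyValue) stores the full row key again. Sizes below follow the classic
 * KeyValue layout; treat them as an approximation, not an exact accounting.
 */
public final class KeyValueCost {
  static long approxKeyValueSize(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
    return 4 + 4             // key length + value length prefixes
         + 2 + rowKeyLen     // row length + row key
         + 1 + familyLen     // family length + family name
         + qualifierLen      // column qualifier
         + 8 + 1             // timestamp + key type
         + valueLen;         // the actual value
  }

  public static void main(String[] args) {
    // One OpenTSDB-style cell: ~13-byte row key, 1-char family, 2-byte qualifier, 8-byte value.
    long perCell = approxKeyValueSize(13, 1, 2, 8);
    // 3600 cells in a one-hour row pay the row-key cost 3600 times.
    System.out.println(perCell + " bytes/cell, " + perCell * 3600 + " bytes/hour-row (uncompacted)");
  }
}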
Don’ts

• Use HTable / HTablePool in app servers
  asynchbase + Netty or Finagle = performance++
• Put variable-length fields in composite keys
  They’re hard to scan
• Exceed a few hundred regions per RegionServer
  “Oversharding” introduces overhead and makes
  recovering from failures more expensive
Use asynchbase

[Benchmark figure: three panels comparing HTable and asynchbase, one each for scan (y-axis 0-50s), sequential read (y-axis 0-500s), and sequential write (y-axis 0-200s), each plotted against the number of client threads (4, 8, 16, 24, 32).]
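
A minimal sketch of a non-blocking write with asynchbase, the client used in the benchmarks above. The ZooKeeper quorum, table, family, and byte values are placeholders, not OpenTSDB's actual schema.

import org.hbase.async.HBaseClient;
import org.hbase.async.PutRequest;
import com.stumbleupon.async.Callback;
import com.stumbleupon.async.Deferred;
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch of a non-blocking write with asynchbase (the client OpenTSDB
 * uses instead of HTable). Table, family and key bytes here are illustrative only.
 */
public final class AsyncWrite {
  public static void main(String[] args) throws Exception {
    final HBaseClient client = new HBaseClient("zkhost.example.com");  // ZK quorum spec (assumed)
    byte[] table  = "tsdb".getBytes(StandardCharsets.ISO_8859_1);
    byte[] key    = new byte[] { 0, 0, 1, 0x49, (byte) 0x93, (byte) 0xE6, 0x40 };  // made-up row key
    byte[] family = "t".getBytes(StandardCharsets.ISO_8859_1);
    byte[] qual   = new byte[] { 0x07, 0x62 };   // qualifier encoding the offset (illustrative)
    byte[] value  = new byte[] { 42 };

    Deferred<Object> d = client.put(new PutRequest(table, key, family, qual, value));
    d.addCallbacks(
        new Callback<Object, Object>() {           // success path, runs on an I/O thread
          public Object call(Object arg) { return arg; }
        },
        new Callback<Object, Exception>() {         // failure path
          public Object call(Exception e) { e.printStackTrace(); return e; }
        });
    d.join();                 // block here only for the demo; real servers stay async
    client.shutdown().join(); // flush buffered RPCs and release resources
  }
}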
How OpenTSDB came to be the way it is
Questions:
• How to store time series data efficiently in HBase?
• How to enable concurrent writes without
  synchronization between the writers?
• How to save space/memory when storing
  hundreds of billions of data items in HBase?
Time Series Data in HBase (Take 1)

  Key (timestamp)   Column (don’t care)   Value
  1234567890                              1
  1234567892                              2
  1234567894                              3

Simplest design: only 1 time series, 1 row with a
single KeyValue per data point.
Supports time-range scans.
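
A sketch of what such a time-range scan could look like with asynchbase, assuming the Take 1 layout where the timestamp is the row key; the table name and key encoding are illustrative.

import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;
import org.hbase.async.Scanner;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;

/**
 * Sketch of the time-range scan the Take 1 layout allows: the timestamp is
 * the row key, so [start, stop) translates directly into scanner bounds.
 * Table name and key encoding are assumptions for illustration.
 */
public final class TimeRangeScan {
  static byte[] tsKey(long ts) {   // big-endian 4-byte timestamp as the row key
    return new byte[] { (byte) (ts >>> 24), (byte) (ts >>> 16), (byte) (ts >>> 8), (byte) ts };
  }

  public static void main(String[] args) throws Exception {
    HBaseClient client = new HBaseClient("zkhost.example.com");
    Scanner scanner = client.newScanner("ts_take1".getBytes(StandardCharsets.ISO_8859_1));  // hypothetical table
    scanner.setStartKey(tsKey(1234567890L));
    scanner.setStopKey(tsKey(1234567900L));             // stop key is exclusive

    ArrayList<ArrayList<KeyValue>> rows;
    while ((rows = scanner.nextRows().join()) != null) {  // one batch of rows at a time
      for (ArrayList<KeyValue> row : rows) {
        for (KeyValue kv : row) {
          System.out.println(kv);                         // one KeyValue == one data point
        }
      }
    }
    scanner.close().join();
    client.shutdown().join();
  }
}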
Time Series Data in HBase (Take 2)

  Key (metric name, timestamp)   Value
  foo  1234567890                1
  foo  1234567892                3
  fool 1234567890                2

Metric name first in row key for data locality.
Problem: can’t store the metric as text in the row key
due to space concerns.
Time Series Data in HBase (Take 3)

  Data table (metric ID, timestamp)     Separate lookup table:
  Key                  Value              Key    Value
  0x1 1234567890       1                  0x1    foo
  0x1 1234567892       3                  0x2    fool
  0x2 1234567890       2                  foo    0x1
                                          fool   0x2

Use a separate table to assign unique IDs to
metric names (and tags, not shown here). IDs give us a
predictable length and achieve the desired data locality.
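
A sketch of the forward lookup (name to fixed-width ID) against such a UID table, assuming asynchbase; the table, family and qualifier names are illustrative and a real implementation would cache the result.

import org.hbase.async.GetRequest;
import org.hbase.async.HBaseClient;
import org.hbase.async.KeyValue;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;

/**
 * Sketch of the forward lookup "metric name -> fixed-width ID" against the
 * separate UID table from Take 3. Table, family and qualifier names below are
 * illustrative; real OpenTSDB has its own schema and caches these lookups.
 */
public final class UidLookup {
  public static void main(String[] args) throws Exception {
    HBaseClient client = new HBaseClient("zkhost.example.com");
    byte[] uidTable = "ts_uid".getBytes(StandardCharsets.ISO_8859_1);       // assumed name
    byte[] name     = "proc.loadavg.1m".getBytes(StandardCharsets.ISO_8859_1);

    GetRequest get = new GetRequest(uidTable, name)
        .family("id".getBytes(StandardCharsets.ISO_8859_1))
        .qualifier("metrics".getBytes(StandardCharsets.ISO_8859_1));
    ArrayList<KeyValue> cells = client.get(get).join();

    // The value is the fixed-width ID (e.g. 3 bytes), which then becomes the
    // first component of every row key for that metric, giving data locality.
    byte[] metricId = cells.isEmpty() ? null : cells.get(0).value();
    System.out.println(metricId == null ? "unassigned" : metricId.length + "-byte ID");
    client.shutdown().join();
  }
}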
Time Series Data in HBase (Take 4)

  Key               +0   +2
  0x1 1234567890     1    3
  0x1 1234567892     3
  0x2 1234567890     2

Reduce the number of rows by storing multiple
consecutive data points in the same row.
Fewer rows = faster to seek to a specific row.
Time Series Data in HBase (Take 4)

Misleading table representation:
  Key               +0   +2
  0x1 1234567890     1    3
  0x1 1234567892     3
  0x2 1234567890     2

Gotcha #1: wider rows don’t save any space*

Actual stored table:
  Key               Column   Value
  0x1 1234567890    +0       1
  0x1 1234567890    +2       3
  0x2 1234567890    +0       2

* Until magic prefix compression happens in upcoming HBase 0.94
Time Series Data in HBase (Take 4)

  Key               +0   +2
  0x1 1234567890     1    3
  0x1 1234567892     3
  0x2 1234567890     2

Devil is in the details: when to start new rows?
Naive answer: start on the first data point, then after some
time start a new row.
Time Series Data in HBase (Take 4)

  Key               +0
  0x1 1000000000     1

Client sends “foo 1000000000 1” to TSD1 (TSD2 idle).
First data point: start a new row.
Time Series Data in HBase (Take 4)

  Key               +0   +10   ...
  0x1 1000000000     1    2    ...

Client sends “foo 1000000010 2” to TSD1.
Keep adding points until...
Time Series Data in HBase (Take 4)

  Key               +0   +10   ...   +599
  0x1 1000000000     1    2    ...    42

Client sends “foo 1000000599 42” to TSD1.
... some arbitrary limit, say 10 min, is reached.
Time Series Data in HBase (Take 4)

  Key               +0   +10   ...   +599
  0x1 1000000000     1    2    ...    42
  0x1 1000000600          51

Client sends “foo 1000000610 51” to TSD1.
Then start a new row.
Time Series Data in HBase (Take 4)

  Key               +0
  0x1 1234567890     1

But this scheme fails with multiple TSDs.

Client sends “foo 1234567890 1” to TSD1, which creates a new row.
Time Series Data in HBase (Take 4)

  Key               +0   +2
  0x1 1234567890     1    3

Client sends “foo 1234567892 3” to TSD1, which adds to the row.
Time Series Data in HBase (Take 4)

  Key               +0   +2
  0x1 1234567890     1    3
  0x1 1234567892     3          <- Oops!

Maybe a connection failure occurred and the client is
retransmitting the data to another TSD: “foo 1234567892 3”
now goes to TSD2, which creates a new row, even though TSD1
had already added that point to the existing row.
Time Series Data in HBase (Take 5)

  Key               +90   +92
  0x1 1234567800      1     3
  0x2 1234567800      2
  (base timestamp always a multiple of 600)

In order to scale easily and keep TSDs stateless,
make writes independent & idempotent.
New rule: rows are aligned on 10 min boundaries.
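
A minimal sketch of the arithmetic behind this rule: round the timestamp down to the row interval to get the base time, and keep the remainder as the column offset. Any TSD doing this math addresses the same cell for the same data point, which is what makes retried writes idempotent.

/**
 * Sketch of the Take 5 addressing rule: round the timestamp down to the row
 * interval to get the base time (row key component), and keep the remainder
 * as the column offset. Any TSD doing this math writes to the same cell for
 * the same data point, so retried or duplicated writes are harmless.
 */
public final class RowAlignment {
  static final int ROW_INTERVAL = 600;   // 10-minute rows (Take 5); later widened to 3600

  static long baseTime(long ts) { return ts - (ts % ROW_INTERVAL); }
  static int  offset(long ts)   { return (int) (ts % ROW_INTERVAL); }

  public static void main(String[] args) {
    long ts = 1234567890L;
    System.out.println("row base = " + baseTime(ts) + ", column offset = +" + offset(ts));
    // row base = 1234567800, column offset = +90 (matches the Take 5 slide)
  }
}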
Time Series Data in HBase (Take 6)

  Key               +1890   +1892
  0x1 1234566000       1       3
  0x2 1234566000       2
  (base timestamp always a multiple of 3600)

1 data point every ~10s => 60 data points / row.
Not much. Go to wider rows to further increase
seek speed. One-hour rows = 6x fewer rows.
Time Series Data in HBase (Take 6)

  Key               +1890   +1892
  0x1 1234566000       1       3
  0x2 1234566000       2

Remember: wider rows don’t save any space!

Actual stored table:
  Key               Column   Value
  0x1 1234566000    +1890    1
  0x1 1234566000    +1892    3
  0x2 1234566000    +1890    2

The key is easily 4x bigger than the column + value,
and it is repeated for every cell.
Time Series Data in HBase (Take 7)

  Key               +1890   +1890,+1892   +1892
  0x1 1234566000      1        1, 3         3
  0x2 1234566000      2

Solution: “compact” columns by concatenation.

Actual stored table:
  Key               Column        Value
  0x1 1234566000    +1890         1
  0x1 1234566000    +1890,+1892   1, 3
  0x1 1234566000    +1892         3
  0x2 1234566000    +1890         2

Space savings on disk and in memory are huge:
data is 4x-8x smaller!
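
A sketch of the compaction idea in isolation: concatenate the row's qualifiers into one qualifier and its values into one value, write that single cell back, and delete the originals. The byte encodings below are illustrative, not OpenTSDB's actual qualifier format.

import java.io.ByteArrayOutputStream;

/**
 * Sketch of Take 7 "compaction": concatenate the qualifiers and the values of
 * a row's cells (already sorted by qualifier, as an HBase scan returns them)
 * into one qualifier and one value, then write that single cell back and
 * delete the originals. The real OpenTSDB encoding also carries flags and
 * variable-width values; this shows the idea only.
 */
public final class CompactRow {
  public static void main(String[] args) throws Exception {
    byte[][] qualifiers = { { 0x07, 0x62 }, { 0x07, 0x64 } };  // offsets +1890, +1892 (illustrative encoding)
    byte[][] values     = { { 1 },          { 3 } };

    ByteArrayOutputStream qual = new ByteArrayOutputStream();
    ByteArrayOutputStream val  = new ByteArrayOutputStream();
    for (int i = 0; i < qualifiers.length; i++) {
      qual.write(qualifiers[i]);    // concatenated qualifier: "+1890,+1892"
      val.write(values[i]);         // concatenated value: "1, 3"
    }
    // One KeyValue now holds the whole row: the large, repeated row key is
    // paid once instead of once per data point, hence the 4x-8x savings.
    System.out.println(qual.size() + " qualifier bytes, " + val.size() + " value bytes in one cell");
  }
}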
¿ Questions ?

opentsdb.net
Fork me on GitHub

Summary

• Use asynchbase                • Use Netty or Finagle
• Wider table > Taller table    • Short family names
• Make writes idempotent        • Make writes independent
• Compact your data             • Have predictable key sizes

Think this is cool? We’re hiring.
                                          Benoît “tsuna” Sigoure
                                          tsuna@stumbleupon.com

  • 25. ¿ Questions ? ub tH Gi on opentsdb.net e m kr Fo Summary • Use asynchbase • Use Netty or Finagle • Wider table > Taller table • Short family names • Make writes idempotent • Make writes independent • Compact your data • Have predictable key sizes ool? Thin k this is c Benoît “tsuna” Sigoure W e’re hiring tsuna@stumbleupon.com