collectd & PostgreSQL

       Mark Wong
 markwkm@postgresql.org
 mark.wong@myemma.com
         PDXPUG


    November 17, 2011
My Story



     • How did I get to collectd?
     • What is collectd
     • Hacking collectd
     • Using collectd with Postgres
     • Visualizing the data




    markwkm (PDXPUG)            collectd & PostgreSQL   November 17, 2011   2 / 43
Brief background




     • Working at a little company called Emma http://myemma.com
     • Collect performance data from production systems




What did we have?



     • A database with over 1 million database objects
           • >500,000 tables
           • >1,000,000 indexes

     • Tables alone generate 11,000,000 data points per sample




What did we try?


  Free options only:
      • Cacti http://www.cacti.net/
      • Ganglia http://ganglia.info/
      • Munin http://munin-monitoring.org/
      • Reconnoiter https://labs.omniti.com/labs/reconnoiter
      • Zenoss http://community.zenoss.org/




What doesn’t work

  Dependency on RRDtool; it can’t handle more than hundreds of thousands of
  metrics (Application Buffer-Cache Management for Performance: Running the
  World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA
  2007):
      • Cacti
      • Ganglia
      • Munin
      • Reconnoiter
      • Zenoss



Reconnoiter almost worked for us

  Pros:
      • Write your own SQL queries to collect data from Postgres
      • Used Postgres instead of RRDtool for storing data
      • JavaScript-based on-the-fly charting
      • Support for integrating many other health and stats collection solutions
  Cons:
      • Data collection still couldn’t keep up; maybe needed more tuning
      • Faster hardware? (using VMs)
      • More hardware? (scale out MQ processes)


Couldn’t bring myself to try anything else



      • Hands were tied; no resources available to help move forward.
      • Can we build something lightweight?
      • Played with collectd (http://collectd.org/) while evaluating
          Reconnoiter




What is collectd?



          collectd is a daemon which collects system performance
          statistics periodically and provides mechanisms to store the
          values in a variety of ways, for example in RRD files.

  http://collectd.org/




Does this look familiar?




  Note: RRDtool is an option, not a requirement
What is special about collectd?

  From their web site:
      • “it’s written in C for performance and portability”
      • “includes optimizations and features to handle hundreds of
        thousands of data sets”
      • PostgreSQL plugin enables querying the database
      • Can collect most operating system statistics (I say “most” because I
        don’t know if anything is missing)
      • Over 90 plugins in total
        http://collectd.org/wiki/index.php/Table_of_Plugins


collectd data description

      • time - when the data was collected
      • interval - frequency of data collection
      • host - server hostname
      • plugin - collectd plugin used
      • plugin_instance - additional plugin information
      • type - type of data collected for the set of values
      • type_instance - unique identifier of the metric
      • dsnames - names of the values collected
      • dstypes - data type of each value collected (e.g. counter, gauge)
      • values - array of values collected
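Put together, one sample is a single record whose dsnames, dstypes, and values fields are parallel arrays. A minimal sketch in Python (the sample values here are hypothetical):

```python
# One collectd value list, using the fields described above.
value_list = {
    "time": "2011-10-20 18:04:17-05",  # when the data was collected
    "interval": 300,                   # seconds between collections
    "host": "db1.example.com",
    "plugin": "postgresql",
    "plugin_instance": "sandbox",
    "type": "counter",
    "type_instance": "seq_scan-pg_catalog-pg_class",
    "dsnames": ["value"],              # one name per value
    "dstypes": ["counter"],            # counter, gauge, etc.
    "values": [249873],
}

# dsnames, dstypes, and values are parallel arrays of the same length.
assert len(value_list["dsnames"]) == len(value_list["dstypes"]) == len(value_list["values"])
```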

PostgreSQL plugin configuration
  Define custom queries in collectd.conf:

  LoadPlugin postgresql
  <Plugin postgresql>
     <Query magic>
         Statement "SELECT magic FROM wizard;"
         <Result>
             Type gauge
             InstancePrefix "magic"
             ValuesFrom magic
         </Result>
     </Query>
  ...

. . . per database.

...
   <Database bar>
       Interval 60
       Service "service_name"
       Query backend # predefined
       Query magic_tickets
   </Database>
</Plugin>


Full details at
http://collectd.org/wiki/index.php/Plugin:PostgreSQL

Hurdles



  More meta data:
      • Need a way to save schema, table, and index names; can’t differentiate
        stats between tables and indexes
      • Basic support of meta data in collectd but mostly unused
      • How to store data in something other than RRDtool




Wanted: additional meta data


  Hack the PostgreSQL plugin to create meta data for:
      • database - database name (maybe not needed, same as
         plugin_instance)
      • schemaname - schema name
      • tablename - table name
      • indexname - index name
      • metric - e.g. blks_hit, blks_read, seq_scan, etc.




Another database query for collecting a table statistic



  <Query table_stats>
      Statement "SELECT schemaname, relname, seq_scan
                 FROM pg_stat_all_tables;"
  </Query>




Identify the data



  <Result>
      Type counter
      InstancePrefix "seq_scan"
      InstancesFrom "schemaname" "relname"
      ValuesFrom "seq_scan"
  </Result>
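A way to read this block: InstancePrefix and the values of the columns named in InstancesFrom are joined with hyphens to form the type_instance, and ValuesFrom names the column supplying the value. A small sketch of that composition (the function name is mine, not collectd's):

```python
def make_type_instance(prefix, instance_columns):
    """Join InstancePrefix with the InstancesFrom column values,
    hyphen-separated, the way the PostgreSQL plugin builds type_instance."""
    return "-".join([prefix] + list(instance_columns))

# One row from pg_stat_all_tables: schemaname, relname, seq_scan.
row = {"schemaname": "pg_catalog", "relname": "pg_class", "seq_scan": 249873}
type_instance = make_type_instance("seq_scan", [row["schemaname"], row["relname"]])
# e.g. "seq_scan-pg_catalog-pg_class"
```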




Meta data specific parameters


  <Database postgres>
      Host "localhost"
      Query table_stats
      SchemanameColumn 0
      TablenameColumn 1
  </Database>



  Note: The database name is set by what is specified in the <Database> tag, if
  it is not retrieved by the query.

Example data

     • time: 2011-10-20 18:04:17-05
     • interval: 300
     • host: pong.int
     • plugin: postgresql
     • plugin_instance: sandbox
     • type: counter
     • type_instance: seq_scan-pg_catalog-pg_class
     • dsnames: {value}
     • dstypes: {counter}
     • values: {249873}

Example meta data



     • database: sandbox
     • schemaname: pg_catalog
     • tablename: pg_class
     • indexname:
     • metric: seq_scan




Now what?



  Hands were tied (I think I mentioned that earlier); open-sourced the work to date:

      • collectd forked with patches
        https://github.com/mwongatemma/collectd
      • YAMS https://github.com/myemma/yams




Yet Another Monitoring System




Switching hats and boosting code




  Used extracurricular time on equipment donated to the Postgres community by
  Sun, IBM, and HP to continue proving out the collectd changes.




How am I going to move the data?

  Options from available write plugins; guess which I used:
      • Carbon - Graphite’s storage API to Whisper
        http://collectd.org/wiki/index.php/Plugin:Carbon
      • CSV http://collectd.org/wiki/index.php/Plugin:CSV
      • Network - Send/Receive to other collectd daemons
        http://collectd.org/wiki/index.php/Plugin:Network
      • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD
      • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool
      • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog
      • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock
      • Write HTTP - PUTVAL (plain text), JSON
        http://collectd.org/wiki/index.php/Plugin:Write_HTTP

Process of elimination

  If RRDtool (written in C) can’t handle massive volumes of data, an RRD-like
  database written in Python probably can’t either:
       • Carbon
       • CSV
       • Network
       • RRDCacheD
       • RRDtool
       • SysLog
       • UnixSock
       • Write HTTP - PUTVAL (plain text), JSON

Process of elimination


  Writing to other collectd daemons or just locally doesn’t seem useful at the
  moment:
      • CSV
      • Network
      • SysLog
      • UnixSock
      • Write HTTP - PUTVAL (plain text), JSON




Process of elimination



  Let’s try CouchDB’s RESTful JSON API!
       • CSV
       • SysLog
       • Write HTTP - PUTVAL (plain text), JSON




Random: What Write HTTP PUTVAL data looks like
  Note: Each PUTVAL is a single line but is broken up into two lines to fit onto
  the slide.

  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
      interval=10 1251533299:197141504:175136768
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
      interval=10 1251533299:10765:12858
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
      interval=10 1251533299:5:140
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
      interval=10 1251533299:4658:29899
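Each PUTVAL line carries an identifier path (host/plugin-instance/type), options such as interval, and a colon-separated epoch-time:value list. A rough parser sketch (simplified: assumes integer values, whereas the protocol also allows floats and "U" for unknown):

```python
def parse_putval(line):
    """Parse one PUTVAL line into (identifier, interval, time, values)."""
    parts = line.split()
    assert parts[0] == "PUTVAL"
    identifier = parts[1]                              # host/plugin-inst/type
    options = dict(p.split("=") for p in parts[2:-1])  # e.g. interval=10
    fields = parts[-1].split(":")                      # time:value[:value...]
    return (identifier, int(options.get("interval", 0)),
            int(fields[0]), [int(v) for v in fields[1:]])

ident, interval, t, values = parse_putval(
    "PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets "
    "interval=10 1251533299:197141504:175136768")
# values -> [197141504, 175136768]
```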


Random: What the Write HTTP JSON data looks like
  Note: Write HTTP packs as much data as it can into a 4KB buffer.
   [ {
       "values": [197141504, 175136768],
       "dstypes": ["counter", "counter"],
       "dsnames": ["read", "write"],
       "time": 1251533299,
       "interval": 10,
       "host": "leeloo.lan.home.verplant.org",
       "plugin": "disk",
       "plugin_instance": "sda",
       "type": "disk_octets",
       "type_instance": ""
     }, ... ]
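Since dsnames, dstypes, and values are parallel arrays, each JSON object in the POST body flattens into one row per data source. A sketch of that unpacking, assuming a payload shaped like the example above:

```python
import json

payload = """[{
    "values": [197141504, 175136768],
    "dstypes": ["counter", "counter"],
    "dsnames": ["read", "write"],
    "time": 1251533299,
    "interval": 10,
    "host": "leeloo.lan.home.verplant.org",
    "plugin": "disk",
    "plugin_instance": "sda",
    "type": "disk_octets",
    "type_instance": ""
}]"""

rows = [
    # One row per (dsname, dstype, value) triple in each value list.
    (vl["host"], vl["plugin"], vl["type"], name, dstype, value)
    for vl in json.loads(payload)
    for name, dstype, value in zip(vl["dsnames"], vl["dstypes"], vl["values"])
]
```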
I didn’t know anything about CouchDB at the time



     • Query interface not really suited to retrieving data for visualization
     • Insert performance not suited to millions of metrics over short
         intervals (Postgres can insert the same data several orders of
         magnitude faster)




Now where am I going to put the data?



  Hoping that the Write HTTP plugin is still a good choice:
      • Write an ETL
            • Table partitioning logic; creation of partition tables
            • Transform JSON data into INSERT statements
      • Use Postgres




Database design
                Table "collectd.value_list"
       Column      |           Type           | Modifiers
  -----------------+--------------------------+-----------
   time            | timestamp with time zone | not null
   interval        | integer                  | not null
   host            | character varying(64)    | not null
   plugin          | character varying(64)    | not null
   plugin_instance | character varying(64)    |
   type            | character varying(64)    | not null
   type_instance   | character varying(64)    |
   dsnames         | character varying(512)[] | not null
   dstypes         | character varying(8)[]   | not null
   values          | numeric[]                | not null
Take advantage of partitioning




  At least table inheritance in Postgres’ case; partition data by plugin
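Routing a record to its partition is then just a naming convention; a tiny sketch (using the vl_<plugin> convention of the child table shown on the next slide):

```python
def child_table(plugin):
    """Map a value list's plugin to its child table of collectd.value_list
    (one child table per plugin)."""
    return "vl_%s" % plugin

# Rows from the postgresql plugin land in collectd.vl_postgresql.
table = child_table("postgresql")
```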




Child table
                Table "collectd.vl_postgresql"
        Column      |           Type           | Modifiers
   -----------------+--------------------------+-----------
    ...
    database        | character varying(64)    | not null
    schemaname      | character varying(64)    |
    tablename       | character varying(64)    |
    indexname       | character varying(64)    |
    metric          | character varying(64)    | not null
   Check constraints:
       "vl_postgresql_plugin_check" CHECK (plugin::text =
                                           'postgresql'::text)
   Inherits: value_list
How much partitioning?


  Lots of straightforward options:
      • Date
      • Database
      • Schema
      • Table
      • Index
      • Metric
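Any of these axes can be composed into a child-table naming scheme; a hypothetical sketch combining the plugin and date options (the naming function and scheme are mine, for illustration):

```python
import datetime

def partition_name(plugin, day):
    """Hypothetical naming scheme: partition by plugin, then by date."""
    return "vl_%s_%s" % (plugin, day.strftime("%Y%m%d"))

name = partition_name("postgresql", datetime.date(2011, 11, 17))
# "vl_postgresql_20111117"
```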




Back to the ETL


  Parameters set for the fastest path to a working prototype:
      • Keep using HTTP POST (Write HTTP plugin) for the HTTP protocol
        and JSON
      • Use Python for its built-in HTTP server and JSON parsing (Emma is
        primarily a Python shop)
      • Use SQLAlchemy/psycopg2




Back again to the ETL

  Python didn’t perform; the combination of JSON parsing, data transformation,
  and INSERT throughput was still several orders of magnitude below acceptable
  levels. The rework:
       • redis to queue data to transform
       • lighttpd for the HTTP interface
       • fastcgi C program to push things to redis
       • multi-threaded C program using libpq for the Postgres API
             • pop data out of redis
             • table partitioning creation logic
             • transform JSON data into INSERT statements
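The transform step boils down to turning one JSON value list into a parameterized INSERT against the right child table. A simplified Python sketch of that step (the production version was the multi-threaded C program using libpq; the PostgreSQL-specific meta data columns are omitted here):

```python
def to_insert(vl):
    """Build a psycopg2-style parameterized INSERT for one value list,
    targeting the per-plugin child table. Identifiers are quoted because
    "values" is a reserved word."""
    table = 'collectd."vl_%s"' % vl["plugin"]
    columns = ["time", "interval", "host", "plugin", "plugin_instance",
               "type", "type_instance", "dsnames", "dstypes", "values"]
    collist = ", ".join('"%s"' % c for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))  # psycopg2 markers
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, collist, placeholders)
    return sql, [vl[c] for c in columns]

sql, params = to_insert({
    "time": 1251533299, "interval": 10, "host": "leeloo",
    "plugin": "postgresql", "plugin_instance": "sandbox",
    "type": "counter", "type_instance": "seq_scan-pg_catalog-pg_class",
    "dsnames": ["value"], "dstypes": ["counter"], "values": [249873],
})
```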


Success?




     • Table statistics for 1 million tables are collected in approximately 12 minutes.
     • Is that acceptable?
     • Can we go faster?




If you don’t have millions of data points


  Easier ways to visualize the data:
       • RRDtool
       • RRDtool compatible front-ends
         http://collectd.org/wiki/index.php/List_of_front-ends
       • Graphite with the Carbon and Whisper combo
         http://graphite.wikidot.com/
       • Reconnoiter




__      __
          / ~~~/  . o O ( Thank you! )
    ,----(       oo     )
  /        __      __/
 /|            ( |(
^     /___ / |
    |__|    |__|-"




Acknowledgements

  Hayley Jane Wakenshaw

              __      __
            / ~~~/ 
      ,----(       oo     )
    /        __      __/
   /|            ( |(
  ^     /___ / |
      |__|    |__|-"



License



  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. To view a copy of this license, (a) visit
  http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a
  letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco,
  California, 94105, USA.





Why Teams call analytics are critical to your entire business
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

collectd & PostgreSQL

  • 1. collectd & PostgreSQL Mark Wong markwkm@postgresql.org mark.wong@myemma.com PDXPUG November 17, 2011
  • 2. My Story • How did I get to collectd? • What is collectd • Hacking collectd • Using collectd with Postgres • Visualizing the data markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 2 / 43
  • 3. Brief background • Working at a little company called Emma http://myemma.com • Collect performance data from production systems markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 3 / 43
  • 4. What did we have? • A database with over 1 million database objects • >500,000 tables • >1,000,000 indexes • Tables alone generate 11,000,000 data points per sample markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 4 / 43
  • 5. What did we try? Only free things: • Cacti http://www.cacti.net/ • Ganglia http://ganglia.info/ • Munin http://munin-monitoring.org/ • Reconnoiter https://labs.omniti.com/labs/reconnoiter • Zenoss http://community.zenoss.org/ markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 5 / 43
  • 6. What doesn’t work Dependency on RRDtool; can’t handle more than hundreds of thousands of metrics (Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA 2007): • Cacti • Ganglia • Munin • Reconnoiter • Zenoss markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 6 / 43
  • 7. Reconnoiter almost worked for us Pros: • Write your own SQL queries to collect data from Postgres • Used Postgres instead of RRDtool for storing data • JavaScript-based on-the-fly charting • Support for integrating many other health and stats collection solutions Cons: • Data collection still couldn’t keep up; maybe needed more tuning • Faster hardware? (using VM’s) • More hardware? (scale out MQ processes) markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 7 / 43
  • 8. Couldn’t bring myself to try anything else • Hands were tied, no resources available to help move forward. • Can we build something lightweight? • Played with collectd (http://collectd.org/) while evaluating Reconnoiter markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 8 / 43
  • 9. What is collectd? collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files. http://collectd.org/ markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 9 / 43
  • 10. Does this look familiar? Note: RRDtool is an option, not a requirement markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 10 / 43
  • 11. What is special about collectd? From their web site: • it’s written in C for performance and portability • includes optimizations and features to handle hundreds of thousands of data sets • PostgreSQL plugin enables querying the database • Can collect most operating system statistics (I say “most” because I don’t know if anything is missing) • Over 90 total plugins http://collectd.org/wiki/index.php/Table_of_Plugins markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 11 / 43
  • 12. collectd data description • time - when the data was collected • interval - frequency of data collection • host - server hostname • plugin - collectd plugin used • plugin instance - additional plugin information • type - type of data collected for set of values • type instance - unique identifier of the metric • dsnames - names for the values collected • dstypes - type of data for values collected (e.g. counter, gauge, etc.) • values - array of values collected markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 12 / 43
  • 13. PostgreSQL plugin configuration Define custom queries in collectd.conf: LoadPlugin postgresql <Plugin postgresql> <Query magic> Statement "SELECT magic FROM wizard;" <Result> Type gauge InstancePrefix "magic" ValuesFrom magic </Result> </Query> ... markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 13 / 43
  • 14. . . . per database. ... <Database bar> Interval 60 Service "service_name" Query backend # predefined Query magic_tickets </Database> </Plugin> Full details at http://collectd.org/wiki/index.php/Plugin:PostgreSQL markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 14 / 43
  • 15. Hurdles More meta data: • Need a way to save schema, table, and index names; can’t differentiate stats between tables and indexes • Basic support of meta data in collectd but mostly unused • How to store data in something other than RRDtool markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 15 / 43
  • 16. Wanted: additional meta data Hack the PostgreSQL plugin to create meta data for: • database - database name (maybe not needed, same as plugin instance) • schemaname - schema name • tablename - table name • indexname - index name • metric - e.g. blks hit, blks read, seq scan, etc. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 16 / 43
  • 17. Another database query for collecting a table statistic <Query table_stats> Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;" </Query> markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 17 / 43
  • 18. Identify the data <Result> Type counter InstancePrefix "seq_scan" InstancesFrom "schemaname" "relname" ValuesFrom "seq_scan" </Result> markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 18 / 43
  • 19. Meta data specific parameters <Database postgres> Host "localhost" Query table_stats SchemanameColumn 0 TablenameColumn 1 </Database> Note: The database name is set by what is specified in the <Database> tag, if it is not retrieved by the query. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 19 / 43
  • 20. Example data • time: 2011-10-20 18:04:17-05 • interval: 300 • host: pong.int • plugin: postgresql • plugin instance: sandbox • type: counter • type instance: seq scan-pg catalog-pg class • dsnames: {value} • dstypes: {counter} • values: {249873} markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 20 / 43
  • 21. Example meta data • database: sandbox • schemaname: pg catalog • tablename: pg class • indexname: • metric: seq scan markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 21 / 43
  • 22. Now what? Hands were tied (I think I mentioned that earlier); open-sourced work to date: • collectd forked with patches https://github.com/mwongatemma/collectd • YAMS https://github.com/myemma/yams markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 22 / 43
  • 23. Yet Another Monitoring System markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 23 / 43
  • 24. Switching hats and boosting code Using extracurricular time working on equipment donated to Postgres from SUN, IBM, and HP to continue proofing collectd changes. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 24 / 43
  • 25. How am I going to move the data? Options from available write plugins; guess which I used: • Carbon - Graphite’s storage API to Whisper http://collectd.org/wiki/index.php/Plugin:Carbon • CSV http://collectd.org/wiki/index.php/Plugin:CSV • Network - Send/Receive to other collectd daemons http://collectd.org/wiki/index.php/Plugin:Network • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock • Write HTTP - PUTVAL (plain text), JSON http://collectd.org/wiki/index.php/Plugin:Write_HTTP markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 25 / 43
  • 26. Process of elimination If RRDtool (written in C) can’t handle massive volumes of data, a Python RRD-like database probably can’t either: • Carbon • CSV • Network • RRDCacheD • RRDtool • SysLog • UnixSock • Write HTTP - PUTVAL (plain text), JSON markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 26 / 43
  • 27. Process of elimination Writing to other collectd daemons or just locally doesn’t seem useful at the moment: • CSV • Network • SysLog • UnixSock • Write HTTP - PUTVAL (plain text), JSON markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 27 / 43
  • 28. Process of elimination Let’s try CouchDB’s RESTful JSON API! • CSV • SysLog • Write HTTP - PUTVAL (plain text), JSON markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 28 / 43
  • 29. Random: What Write HTTP PUTVAL data looks like Note: Each PUTVAL is a single line but is broken up into two lines to fit onto the slide. PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets interval=10 1251533299:197141504:175136768 PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops interval=10 1251533299:10765:12858 PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time interval=10 1251533299:5:140 PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged interval=10 1251533299:4658:29899 markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 29 / 43
  • 30. Random: What the Write HTTP JSON data looks like Note: Write HTTP packs as much data as it can into a 4KB buffer. [ { "values": [197141504, 175136768], "dstypes": ["counter", "counter"], "dsnames": ["read", "write"], "time": 1251533299, "interval": 10, "host": "leeloo.lan.home.verplant.org", "plugin": "disk", "plugin_instance": "sda", "type": "disk_octets", "type_instance": "" }, ... ] markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 30 / 43
  • 31. I didn’t know anything about CouchDB at the time • Query interface not really suited for retrieving data to visualize • Insert performance not suited for millions of metrics over short intervals (can insert same data into Postgres several orders of magnitude faster) markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 31 / 43
  • 32. Now where am I going to put the data? Hoping that using the Write HTTP is still a good choice: • Write an ETL • Table partitioning logic; creation of partition tables • Transform JSON data into INSERT statements • Use Postgres markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 32 / 43
  • 33. Database design Table "collectd.value_list" Column | Type | Modifiers -----------------+--------------------------+----------- time | timestamp with time zone | not null interval | integer | not null host | character varying(64) | not null plugin | character varying(64) | not null plugin_instance | character varying(64) | type | character varying(64) | not null type_instance | character varying(64) | dsnames | character varying(512)[] | not null dstypes | character varying(8)[] | not null values | numeric[] | not null markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 33 / 43
  • 34. Take advantage of partitioning At least table inheritance in Postgres’ case; partition data by plugin markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 34 / 43
  • 35. Child table Table "collectd.vl_postgresql" Column | Type | Modifiers -----------------+--------------------------+----------- ... database | character varying(64) | not null schemaname | character varying(64) | tablename | character varying(64) | indexname | character varying(64) | metric | character varying(64) | not null Check constraints: "vl_postgresql_plugin_check" CHECK (plugin::text = ’postgresql’::text) Inherits: value_list markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 35 / 43
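Since each plugin gets its own child table like `vl_postgresql` above, the partition-creation logic boils down to templating the DDL per plugin. A hypothetical helper (my sketch; the real YAMS code may differ) could generate it:

```python
def child_table_ddl(plugin, extra_columns=()):
    """Generate DDL for a per-plugin child table inheriting value_list.

    Sketch mirroring the vl_postgresql example: extra_columns are
    (name, sql_type) pairs for plugin-specific meta data, and a CHECK
    constraint routes rows for this plugin to this child table.
    """
    cols = ",\n".join(f"    {name} {sqltype}" for name, sqltype in extra_columns)
    body = f"(\n{cols},\n" if cols else "(\n"
    return (
        f"CREATE TABLE collectd.vl_{plugin} {body}"
        f"    CHECK (plugin = '{plugin}')\n"
        f") INHERITS (collectd.value_list);"
    )

ddl = child_table_ddl("postgresql", [
    ("database", "varchar(64) NOT NULL"),
    ("schemaname", "varchar(64)"),
    ("tablename", "varchar(64)"),
    ("indexname", "varchar(64)"),
    ("metric", "varchar(64) NOT NULL"),
])
```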
  • 36. How much partitioning? Lots of straightforward options: • Date • Database • Schema • Table • Index • Metric markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 36 / 43
  • 37. Back to the ETL Parameters set for fastest path to working prototype: • Keep using HTTP POST (Write HTTP plugin) for HTTP protocol and JSON • Use Python for built-in HTTP server and JSON parsing (Emma is primarily a Python shop) • Use SQLAlchemy/psycopg2 markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 37 / 43
  • 38. Back again to the ETL Python didn’t perform; combination of JSON parsing, data transformation, and INSERT performance still several orders of magnitude below acceptable levels: • redis to queue data to transform • lighttpd for the HTTP interface • fastcgi C program to push things to redis • multi-threaded C program using libpq for Postgres API • pop data out of redis • table partitioning creation logic • transform JSON data into INSERT statements markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 38 / 43
  • 39. Success? • Table statistics for 1 million tables are collected in approximately 12 minutes. • Is that acceptable? • Can we go faster? markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 39 / 43
  • 40. If you don’t have millions of data points Easier ways to visualize the data: • RRDtool • RRDtool compatible front-ends http://collectd.org/wiki/index.php/List_of_front-ends • Graphite with the Carbon and Whisper combo http://graphite.wikidot.com/ • Reconnoiter markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 40 / 43
  • 41. __ __ / ~~~/ . o O ( Thank you! ) ,----( oo ) / __ __/ /| ( |( ^ /___ / | |__| |__|-" markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 41 / 43
  • 42. Acknowledgements Hayley Jane Wakenshaw __ __ / ~~~/ ,----( oo ) / __ __/ /| ( |( ^ /___ / | |__| |__|-" markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 42 / 43
  • 43. License This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA. markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 43 / 43