This presentation focuses on how to use collectd (http://collectd.org/) to gather data from the Postgres statistics tables, with examples of using collectd with Postgres. Some hackery is involved to make collectd do a little more and collect more meaningful data from Postgres; these small patches will be explored. A small portion of the discussion covers how to visualize the data.
1. collectd & PostgreSQL
Mark Wong
markwkm@postgresql.org
mark.wong@myemma.com
PDXPUG
November 17, 2011
2. My Story
• How did I get to collectd?
• What is collectd?
• Hacking collectd
• Using collectd with Postgres
• Visualizing the data
markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 2 / 43
3. Brief background
• Working at a little company called Emma http://myemma.com
• Collect performance data from production systems
4. What did we have?
• A database with over 1 million database objects
• >500,000 tables
• >1,000,000 indexes
• Tables alone generate 11,000,000 data points per sample
5. What did we try?
Only free things:
• Cacti http://www.cacti.net/
• Ganglia http://ganglia.info/
• Munin http://munin-monitoring.org/
• Reconnoiter https://labs.omniti.com/labs/reconnoiter
• Zenoss http://community.zenoss.org/
6. What doesn’t work
Dependency on RRDtool; can’t handle more than hundreds of thousands of
metrics (Application Buffer-Cache Management for Performance: Running the
World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA
2007):
• Cacti
• Ganglia
• Munin
• Reconnoiter
• Zenoss
7. Reconnoiter almost worked for us
Pros:
• Write your own SQL queries to collect data from Postgres
• Used Postgres instead of RRDtool for storing data
• JavaScript based on-the-fly charting
• Support for integrating many other health and stats collection solutions
Cons:
• Data collection still couldn’t keep up; maybe needed more tuning
• Faster hardware? (we were using VMs)
• More hardware? (scale out MQ processes)
8. Couldn’t bring myself to try anything else
• Hands were tied; no resources were available to help move forward.
• Can we build something lightweight?
• Played with collectd (http://collectd.org/) while evaluating
Reconnoiter
9. What is collectd?
collectd is a daemon which collects system performance
statistics periodically and provides mechanisms to store the
values in a variety of ways, for example in RRD files.
http://collectd.org/
10. Does this look familiar?
Note: RRDtool is an option, not a requirement
11. What is special about collectd?
From their web site:
• it’s written in C for performance and portability
• includes optimizations and features to handle hundreds
of thousands of data sets
• PostgreSQL plugin enables querying the database
• Can collect most operating system statistics (I say “most” because I
don’t know if anything is missing)
• Over 90 total plugins
http://collectd.org/wiki/index.php/Table_of_Plugins
12. collectd data description
• time - when the data was collected
• interval - frequency of data collection
• host - server hostname
• plugin - collectd plugin used
• plugin instance - additional plugin information
• type - type of data collected for set of values
• type instance - unique identifier of the metric
• dsnames - names for the values collected
• dstypes - type of data for values collected (e.g. counter, gauge, etc.)
• values - array of values collected
14. . . . per database.
...
<Database bar>
Interval 60
Service "service_name"
Query backend # predefined
Query magic_tickets
</Database>
</Plugin>
Full details at
http://collectd.org/wiki/index.php/Plugin:PostgreSQL
15. Hurdles
More meta data:
• Need a way to save schema, table, and index names; can’t differentiate
stats between tables and indexes
• Basic support of meta data in collectd but mostly unused
• How to store data in something other than RRDtool
16. Wanted: additional meta data
Hack the PostgreSQL plugin to create meta data for:
• database - database name (maybe not needed, same as
plugin instance)
• schemaname - schema name
• tablename - table name
• indexname - index name
• metric - e.g. blks_hit, blks_read, seq_scan, etc.
17. Another database query for collecting a table statistic
<Query table_stats>
SELECT schemaname, relname, seq_scan
FROM pg_stat_all_tables;
</Query>
18. Identify the data
<Result>
Type counter
InstancePrefix "seq_scan"
InstancesFrom "schemaname" "relname"
ValuesFrom "seq_scan"
</Result>
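Combined, the two fragments above form one query definition. A hedged sketch of the complete block (note that the stock plugin expects the SQL to be passed via a Statement directive):

```
<Query table_stats>
    Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
    <Result>
        Type counter
        InstancePrefix "seq_scan"
        InstancesFrom "schemaname" "relname"
        ValuesFrom "seq_scan"
    </Result>
</Query>
```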
19. Meta data specific parameters
<Database postgres>
Host "localhost"
Query table_stats
SchemanameColumn 0
TablenameColumn 1
</Database>
Note: The database name is set by what is specified in the <Database> tag if
it is not retrieved by the query.
20. Example data
• time: 2011-10-20 18:04:17-05
• interval: 300
• host: pong.int
• plugin: postgresql
• plugin instance: sandbox
• type: counter
• type instance: seq_scan-pg_catalog-pg_class
• dsnames: {value}
• dstypes: {counter}
• values: {249873}
21. Example meta data
• database: sandbox
• schemaname: pg_catalog
• tablename: pg_class
• indexname:
• metric: seq_scan
22. Now what?
Hands were tied (I think I mentioned that earlier); open-sourced the work to date:
• collectd forked with patches
https://github.com/mwongatemma/collectd
• YAMS https://github.com/myemma/yams
23. Yet Another Monitoring System
24. Switching hats and boosting code
Using extracurricular time on equipment donated to the Postgres community by
Sun, IBM, and HP to continue proving out the collectd changes.
25. How am I going to move the data?
Options from available write plugins; guess which I used:
• Carbon - Graphite’s storage API to Whisper
http://collectd.org/wiki/index.php/Plugin:Carbon
• CSV http://collectd.org/wiki/index.php/Plugin:CSV
• Network - Send/Receive to other collectd daemons
http://collectd.org/wiki/index.php/Plugin:Network
• RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD
• RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool
• SysLog http://collectd.org/wiki/index.php/Plugin:SysLog
• UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock
• Write HTTP - PUTVAL (plain text), JSON
http://collectd.org/wiki/index.php/Plugin:Write_HTTP
26. Process of elimination
If RRDtool (written in C) can’t handle massive volumes of data, a Python
RRD-like database probably can’t either:
• Carbon
• CSV
• Network
• RRDCacheD
• RRDtool
• SysLog
• UnixSock
• Write HTTP - PUTVAL (plain text), JSON
27. Process of elimination
Writing to other collectd daemons or just locally doesn’t seem useful at the
moment:
• CSV
• Network
• SysLog
• UnixSock
• Write HTTP - PUTVAL (plain text), JSON
29. Random: What Write HTTP PUTVAL data looks like
Note: Each PUTVAL is a single line but is broken up into two lines to fit onto
the slide.
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
interval=10 1251533299:197141504:175136768
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
interval=10 1251533299:10765:12858
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
interval=10 1251533299:5:140
PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
interval=10 1251533299:4658:29899
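As an illustration of consuming this format, a short Python helper (hypothetical; not part of collectd) can split a PUTVAL line into its identifier, interval, timestamp, and values:

```python
def parse_putval(line):
    """Parse one collectd PUTVAL line into its parts.

    Format: PUTVAL <host>/<plugin>/<type> interval=<n> <time>:<v1>:<v2>...
    The plugin and type parts may carry a "-instance" suffix (e.g. disk-sda).
    Counter values here are integers; a general parser would also handle
    floats and the 'U' (undefined) marker.
    """
    _, identifier, interval_opt, data = line.split()
    host, plugin, type_ = identifier.split("/")
    interval = int(interval_opt.split("=")[1])
    fields = data.split(":")
    return {
        "host": host,
        "plugin": plugin,
        "type": type_,
        "interval": interval,
        "time": int(fields[0]),
        "values": [int(v) for v in fields[1:]],
    }

sample = ("PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets "
          "interval=10 1251533299:197141504:175136768")
print(parse_putval(sample))
```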
30. Random: What the Write HTTP JSON data looks like
Note: Write HTTP packs as much data as it can into a 4KB buffer.
[ {
"values": [197141504, 175136768],
"dstypes": ["counter", "counter"],
"dsnames": ["read", "write"],
"time": 1251533299,
"interval": 10,
"host": "leeloo.lan.home.verplant.org",
"plugin": "disk",
"plugin_instance": "sda",
"type": "disk_octets",
"type_instance": ""
}, ... ]
31. I didn’t know anything about CouchDB at the time
• Query interface not really suited for retrieving data to visualize
• Insert performance not suited for millions of metrics over short
intervals (the same data can be inserted into Postgres several orders of
magnitude faster)
32. Now where am I going to put the data?
Hoping that using the Write HTTP plugin is still a good choice:
• Write an ETL
• Table partitioning logic; creation of partition tables
• Transform JSON data into INSERT statements
• Use Postgres
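A minimal sketch of that transform step in Python, assuming the Write HTTP JSON layout shown earlier (the function name and string building are illustrative only; production code should create partitions and use parameterized queries):

```python
import json

def json_to_inserts(payload):
    """Transform a collectd Write HTTP JSON payload into INSERT statements.

    Hypothetical sketch of the ETL transform; routes each value list to a
    per-plugin child table, matching the partition-by-plugin design.
    """
    stmts = []
    for vl in json.loads(payload):
        # One child table per plugin, e.g. collectd.vl_postgresql.
        table = "collectd.vl_%s" % vl["plugin"]
        stmts.append(
            'INSERT INTO %s (time, interval, host, plugin, plugin_instance, '
            'type, type_instance, dsnames, dstypes, "values") VALUES '
            "(to_timestamp(%d), %d, '%s', '%s', '%s', '%s', '%s', "
            "ARRAY%s, ARRAY%s, ARRAY%s);"
            % (table, vl["time"], vl["interval"], vl["host"], vl["plugin"],
               vl["plugin_instance"], vl["type"], vl["type_instance"],
               vl["dsnames"], vl["dstypes"], vl["values"])
        )
    return stmts

payload = """[{"values": [197141504, 175136768],
  "dstypes": ["counter", "counter"], "dsnames": ["read", "write"],
  "time": 1251533299, "interval": 10,
  "host": "leeloo.lan.home.verplant.org", "plugin": "disk",
  "plugin_instance": "sda", "type": "disk_octets", "type_instance": ""}]"""
print(json_to_inserts(payload)[0])
```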
33. Database design
Table "collectd.value_list"
Column | Type | Modifiers
-----------------+--------------------------+-----------
time | timestamp with time zone | not null
interval | integer | not null
host | character varying(64) | not null
plugin | character varying(64) | not null
plugin_instance | character varying(64) |
type | character varying(64) | not null
type_instance | character varying(64) |
dsnames | character varying(512)[] | not null
dstypes | character varying(8)[] | not null
values | numeric[] | not null
34. Take advantage of partitioning
At least table inheritance in Postgres’ case; partition data by plugin
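A hedged sketch of the DDL for one per-plugin child, consistent with the vl_postgresql table shown on the next slide (the ETL writes directly into the per-plugin children):

```sql
-- Child table for the postgresql plugin; inherits the common columns
-- (time, interval, host, plugin, ...) from collectd.value_list.
CREATE TABLE collectd.vl_postgresql (
    database   VARCHAR(64) NOT NULL,
    schemaname VARCHAR(64),
    tablename  VARCHAR(64),
    indexname  VARCHAR(64),
    metric     VARCHAR(64) NOT NULL,
    CONSTRAINT vl_postgresql_plugin_check CHECK (plugin = 'postgresql')
) INHERITS (collectd.value_list);
```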
35. Child table
Table "collectd.vl_postgresql"
Column | Type | Modifiers
-----------------+--------------------------+-----------
...
database | character varying(64) | not null
schemaname | character varying(64) |
tablename | character varying(64) |
indexname | character varying(64) |
metric | character varying(64) | not null
Check constraints:
"vl_postgresql_plugin_check" CHECK (plugin::text =
’postgresql’::text)
Inherits: value_list
36. How much partitioning?
Lots of straightforward options:
• Date
• Database
• Schema
• Table
• Index
• Metric
37. Back to the ETL
Parameters were set for the fastest path to a working prototype:
• Keep using HTTP POST (the Write HTTP plugin) with JSON
• Use Python for its built-in HTTP server and JSON parsing (Emma is
primarily a Python shop)
• Use SQLAlchemy/psycopg2
38. Back again to the ETL
Python didn’t perform; JSON parsing, data transformation, and INSERT
throughput combined were still several orders of magnitude below acceptable
levels. The rewrite:
• redis to queue data to transform
• lighttpd for the HTTP interface
• fastcgi C program to push things to redis
• multi-threaded C program using libpq for Postgres API
• pop data out of redis
• table partitioning creation logic
• transform JSON data into INSERT statements
39. Success?
• Table statistics for 1 million tables are collected in approximately 12 minutes.
• Is that acceptable?
• Can we go faster?
40. If you don’t have millions of data points
Easier ways to visualize the data:
• RRDtool
• RRDtool compatible front-ends
http://collectd.org/wiki/index.php/List_of_front-ends
• Graphite with the Carbon and Whisper combo
http://graphite.wikidot.com/
• Reconnoiter
43. License
This work is licensed under a Creative Commons Attribution 3.0 Unported
License. To view a copy of this license, (a) visit
http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a
letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco,
California, 94105, USA.