The Gnocchi Experiment
playing with timeseries
History
● Ceilometer started in 2012
○ Original mission: provide an infrastructure to collect any
information needed regarding OpenStack projects
● Added alarming in 2013
○ Create rules based on threshold conditions that, when broken,
trigger actions
● Added events in 2014
○ The state of an object in an OpenStack service at a point in time
● New mission
○ To reliably collect data on the utilization of the physical and
virtual resources comprising deployed clouds, persist these data for
subsequent retrieval and analysis, and trigger actions when defined
criteria are met
Ceilometer Architecture
[Diagram: OpenStack Services emit onto a Notification Bus consumed by
Notification Agents (Agent1…AgentN, each running a Pipeline); Polling
Agents (Agent1…AgentN, also with Pipelines) poll services directly.
Collectors (Collector1…CollectorN) persist Meters, Events, and Alarms
into the Databases, which an API exposes to External Systems; the
AlarmEvaluator and AlarmNotifier act on the stored data.]
this didn’t work.
Growing pains
● Too large a scope - we did everything
● Too complex - must deploy everything
● Too much data - all data in one place
● Too few resources - handful of developers
● Too generic a solution - storage designed to handle any
scenario
● Good at nothing, average/bad at everything
Ceilometer Architecture
[Diagram: the same pipeline, now split into components. OpenStack
Services feed the Notification Bus; Ceilometer's Notification Agents and
Polling Agents (Agent1…AgentN, with Pipelines) send data to Collectors
(Collector1…CollectorN). Metrics go to Gnocchi behind a MetricsAPI,
Events go to Panko behind an EventsAPI, and Alarms go to Aodh
(AlarmEvaluator, AlarmNotifier); External Systems consume the APIs.]
Componentisation
● Split functionality into its own projects
○ Faster rate of change
○ Less expertise required
● Important functionality lives on
● Ceilometer - data gathering and transformation service
● Gnocchi - time series storage service
● Aodh - alarming service
● Panko - event focused storage service
● They all work together and separately
Gnocchi
Gnocchi use cases
● Storage brick for a billing system
● Alarm-triggering or monitoring system
● Statistical usage of data
Ceilometer to Gnocchi
● Ceilometer legacy storage
captures full-resolution data
○ Each datapoint has:
Timestamp, measurement, IDs,
resource metadata, metric
metadata, etc…
● Gnocchi stores pre-aggregated
data in a time series
○ Each datapoint has:
Timestamp, measurement… that’s
it… and then it’s compressed
○ resource metadata is an
explicit subset AND not tied to
measurement
○ Defined archival rules
■ capture data at 1 min
granularity for 1 day AND
3 hr granularity for 1
month AND ...
Archive Policies
5 minute granularity for a day
1 day granularity for a year
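The two retention rules above can be written down as an archive-policy definition. A minimal sketch in Python, building the kind of policy body Gnocchi works with; the exact field names and helper here are illustrative, not Gnocchi's actual API:

```python
# Sketch: build an archive-policy definition matching the slide's example
# (5-minute granularity kept for a day, 1-day granularity kept for a year).
# Field names mirror Gnocchi's archive-policy concept; treat the exact
# shape as an assumption.

def archive_policy(name, definitions, aggregations=("mean", "min", "max")):
    """Return a dict describing an archive policy: for each (granularity,
    timespan) pair, timespan/granularity datapoints are retained."""
    return {
        "name": name,
        "aggregation_methods": list(aggregations),
        "definition": [
            {
                "granularity": gran_s,       # seconds between datapoints
                "timespan": span_s,          # how long to retain, in seconds
                "points": span_s // gran_s,  # datapoints stored per metric
            }
            for gran_s, span_s in definitions
        ],
    }

# 5 min for a day -> 288 points; 1 day for a year -> 365 points.
policy = archive_policy("low", [(300, 86400), (86400, 86400 * 365)])
```

Note how cheap this is to reason about: the policy alone bounds the number of stored points per metric, regardless of how fast samples arrive.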
How it all works...
Ceilometer
Raw sample
{
"user_id": "0d9d089b8f8340999fbe01354ef84643",
"resource_id": "a7c7cf84-5bf7-4838-a116-645ea376f4e0",
"timestamp": "2016-05-11T18:23:46.166000",
"meter": "disk.write.bytes",
"volume": 56114794496,
"source": "openstack",
"recorded_at": "2016-05-11T18:23:47.177000",
"project_id": "dec2b73655154e31be903fc93e575146",
"type": "cumulative",
"id": "7fbf56ca-17a5-11e6-a210-e8bdd1f62a56",
"unit": "B",
"metadata": {
"instance_host": "cloud03.wz",
"ephemeral_gb": "0",
"flavor.vcpus": "8",
"OS-EXT-AZ.availability_zone": "nova",
"memory_mb": "16384",
"display_name": "gord_dev",
"state": "active",
"flavor.id": "5",
"status": "active",
"ramdisk_id": "None",
"flavor.name": "m1.xlarge",
"disk_gb": "160",
"kernel_id": "None",
"image.id": "dba2c73c-3f11-45a1-998a-6a4ca2cf243e",
"flavor.ram": "16384",
"host":
"64fe410a8b602f69fe43a180c62b02d6c00e41c03caba40a092e2fb6",
"device": "['vda']",
"flavor.ephemeral": "0",
"image.name": "fedora-23-x86_64",
}
}
Separation of value
Resource
● Id
● User_id
● Project_id
● Start_timestamp: timestamp
● End_timestamp: timestamp
● Metadata: {attribute: value}
● Metric: list
Measurements
● [ (timestamp, value), ... ]
Metric
● Name
● archive_policy
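The separation above can be sketched as three plain data types: the resource carries the metadata and points at its metrics, while measurements are bare (timestamp, value) pairs. Class and field names follow the slide, not any actual Gnocchi code; the sample values are trimmed from the raw sample shown earlier:

```python
# Sketch of the value separation: resource metadata lives apart from the
# measurements, which carry nothing but a timestamp and a value.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Metric:
    name: str
    archive_policy: str
    measurements: List[Tuple[str, float]] = field(default_factory=list)

@dataclass
class Resource:
    id: str
    user_id: str
    project_id: str
    start_timestamp: str
    end_timestamp: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)
    metrics: List[Metric] = field(default_factory=list)

vm = Resource(id="a7c7cf84", user_id="0d9d089b", project_id="dec2b736",
              start_timestamp="2016-05-11T18:23:46",
              metadata={"flavor.name": "m1.xlarge"})
vm.metrics.append(Metric("disk.write.bytes", "low",
                         [("2016-05-11T18:23:46", 56114794496.0)]))
```

Compare this with the raw Ceilometer sample: the metadata blob that was repeated on every datapoint now exists once, on the resource.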
Gnocchi Architecture
[Diagram: the API records resources in the Indexer and writes metric
data to Storage; MetricD computation workers process the stored data.]
MetricD Aggregation
[Diagram: (1) the API drops a raw metric dump into Metric Storage;
(2) MetricD computation workers pick it up; (3) they write back the
computed aggregates and a backlog.]
1. Get unprocessed datapoints
2. Compute new aggregations
a. Update sum, avg, min, max, etc…
values based on defined policy
3. Add datapoints to backlog for next
computation
a. Delete datapoints not required for
future aggregations
b. By default, only keep backlog for a
single period.
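The steps above can be sketched as a small processing loop: merge the retained backlog with the unprocessed points, recompute per-period aggregates, then keep only the most recent period's raw points for the next run. This is a simplification under stated assumptions; real MetricD handles multiple granularities, aggregation methods, and a configurable back window:

```python
# Sketch of the MetricD loop: aggregate per period, trim the backlog.

def process(backlog, new_points, period=300):
    """backlog/new_points are (timestamp, value) pairs; timestamps in
    seconds. Returns ({period_start: aggregates}, new_backlog)."""
    points = sorted(backlog + new_points)
    buckets = {}
    for ts, value in points:
        bucket = ts - ts % period            # start of the period
        buckets.setdefault(bucket, []).append(value)
    computed = {
        bucket: {"sum": sum(v), "avg": sum(v) / len(v),
                 "min": min(v), "max": max(v)}
        for bucket, v in buckets.items()
    }
    # By default only the most recent period is kept as backlog; older raw
    # points are deleted because no future aggregate will need them.
    last = max(buckets)
    new_backlog = [(ts, v) for ts, v in points if ts >= last]
    return computed, new_backlog

computed, backlog = process([], [(0, 1.0), (60, 3.0), (300, 5.0)])
```

Dropping raw points once they can no longer affect an aggregate is what keeps the storage footprint bounded.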
Storage format
Metric Storage
raw metric dump
computed aggregates
backlog
● [ (timestamp, value), (timestamp,value) ]
● One object per write
● { values: { timestamp: value, timestamp:value },
block_size: max number of points,
back_window: number of blocks to retain}
● Binary serialised using msgpack
● One object per metric
● { first_timestamp: first timestamp of block,
aggregation_method: sum, min, max, etc…,
max_size: max number of points,
sampling: granularity (60s, 300s, etc…),
timestamps: [ time1, time2, … ],
values: [value1, value2, … ]}
● Binary serialised using msgpack
● Compressed with LZ4
● Split into chunks to minimise transfer when updating large series
● (potentially) multiple objects per aggregate per granularity per metric
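The aggregate-block layout above can be exercised end to end. A sketch under stated assumptions: Gnocchi serialises with msgpack and compresses with LZ4, but here json and zlib stand in so the example needs only the standard library; the field values are illustrative:

```python
# Sketch of one computed-aggregate block. json + zlib stand in for
# Gnocchi's actual msgpack + LZ4; the structure is what matters.
import json
import zlib

block = {
    "first_timestamp": 1460016000,
    "aggregation_method": "mean",
    "max_size": 3600,                    # max number of points in the block
    "sampling": 300,                     # granularity in seconds
    "timestamps": [1460016000, 1460016300, 1460016600],
    "values": [0.190, 0.186, 0.183],
}

serialized = zlib.compress(json.dumps(block).encode())
restored = json.loads(zlib.decompress(serialized))
```

Parallel timestamp/value arrays compress far better than interleaved pairs, which is part of why the per-point cost ends up in single-digit bytes.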
Query path
[Diagram: "What's the cpu utilisation for VM1?" — the API resolves the
resource_id to a metric_id via the Resource Indexer, then reads the
measures (all granularities) from Metric Storage:]
+---------------------------+-------------+----------------+
| timestamp | granularity | value |
+---------------------------+-------------+----------------+
| 2016-04-07T00:00:00+00:00 | 86400.0 | 0.30323927544 |
| 2016-04-07T17:00:00+00:00 | 3600.0 | 1.2855184725 |
| 2016-04-07T18:00:00+00:00 | 3600.0 | 0.188613527791 |
| 2016-04-07T19:00:00+00:00 | 3600.0 | 0.188871232024 |
| 2016-04-07T20:00:00+00:00 | 3600.0 | 0.188876901916 |
| 2016-04-07T21:00:00+00:00 | 3600.0 | 0.189646641908 |
| 2016-04-07T21:10:00+00:00 | 300.0 | 0.190019839676 |
| 2016-04-07T21:15:00+00:00 | 300.0 | 0.186565358466 |
| 2016-04-07T21:20:00+00:00 | 300.0 | 0.183166934543 |
| 2016-04-07T21:25:00+00:00 | 300.0 | 0.179994544916 |
| 2016-04-07T21:30:00+00:00 | 300.0 | 0.186649908928 |
| 2016-04-07T21:35:00+00:00 | 300.0 | 0.193315212093 |
| 2016-04-07T21:40:00+00:00 | 300.0 | 0.193272093903 |
| 2016-04-07T21:45:00+00:00 | 300.0 | 0.196677374077 |
| 2016-04-07T21:50:00+00:00 | 300.0 | 0.193300189049 |
+---------------------------+-------------+----------------+
Query path
[Diagram: "What's the metadata for VM1?" — the API looks up the
resource_id in the Resource Indexer alone; Metric Storage is never
touched:]
+-----------------------+----------------------------------------------------------------+
| Field | Value |
+-----------------------+----------------------------------------------------------------+
| created_by_project_id | f7481a38d7c543528d5121fab9eb2b99 |
| created_by_user_id | 9246f424dcb341478067967f495dc133 |
| display_name | test3 |
| ended_at | None |
| flavor_id | 1 |
| host | 7f218c8350a86a71dbe6d14d57e8f74fa60ac360fee825192a6cf624 |
| id | e90974a6-31bf-4e47-8824-ca074cd9b47d |
| image_ref | 671375cc-177b-497a-8551-4351af3f856d |
| metrics | cpu.delta: 20cd1d71-de2f-43d5-90a8-b23ad31a7d04 |
| | cpu_util: 22cd22e7-e48e-4f21-887a-b1c6612b4c98 |
| | disk.iops: 9611a114-d37e-42e7-9b0c-0fb5e61d96c8 |
| | disk.latency: 6205c66f-2a5d-49c8-85e6-aa7572cfb34a |
| | disk.root.size: c9f9ca31-7e54-4dd7-81ad-129d86951dbc |
| | disk.usage: 4f29ca2e-d58f-40a9-94a7-15084233c1bb |
| original_resource_id | e90974a6-31bf-4e47-8824-ca074cd9b47d |
| project_id | 71bf402adea343609f2192ce998fa38e |
| revision_end | None |
| revision_start | 2016-04-07T17:32:33.245924+00:00 |
| server_group | None |
| started_at | 2016-04-07T17:32:25.740862+00:00 |
| type | instance |
| user_id | fd3eb127863b4177bf1abb38dda1f557 |
+-----------------------+----------------------------------------------------------------+
Zero computation at
query. Only lookup.
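The "zero computation" claim can be made concrete with a toy model of the two stores: the indexer maps a resource to its metric ids, and metric storage hands back already-computed aggregates verbatim. The ids and rows below are illustrative (taken from the sample output above), not a real deployment:

```python
# Sketch of the lookup-only query path: two dictionary reads, no math.
indexer = {
    "vm1": {"cpu_util": "22cd22e7"},                 # resource -> metric ids
}
storage = {
    "22cd22e7": [                                    # precomputed aggregates:
        ("2016-04-07T21:45:00", 300.0, 0.196677374077),  # (ts, granularity,
        ("2016-04-07T21:50:00", 300.0, 0.193300189049),  #  value)
    ],
}

def get_measures(resource, metric):
    metric_id = indexer[resource][metric]            # indexer lookup
    return storage[metric_id]                        # straight read, no compute

rows = get_measures("vm1", "cpu_util")
```

All aggregation cost was paid at write time by MetricD, so query latency reduces to index resolution plus object retrieval.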
Results (benchmark data, Gnocchi 1.3.x)
Ceilometer to Gnocchi
Ceilometer legacy storage
● Single datapoint averages to
~1.5KB/point (mongodb) or
~150B/point (SQL)
● For 1000 VM, capturing 10
metrics/VM, every minute:
~15MB/minute, ~900MB/hour,
~21GB/day, etc…
Gnocchi
● Single datapoint AT MOST is
9B/point
● For 1000 VM, capturing 10
metrics/VM, every minute:
~90KB/minute, ~5.4MB/hour,
~130MB/day, etc…