SlideShare uma empresa Scribd logo
1 de 34
LongTerm
storage for
Prometheus
Far long ago….. in all datacentres
bigger average = less details
CONTENT
 Introduction
 Long-Term Storage Overview
 Thanos Architecture and Resources Usage
 VictoriaMetrics Architecture and Resources Usage
 Price comparison
INTRODUCTION
Thanos is a set of components that can be composed into a highly available metric system with
unlimited storage capacity, which can be added seamlessly on top of existing Prometheus
deployments.
Curren release 0.5.0 is designed to store old metrics (which reached retention period on
Prometheus nodes) on some S3 like storage for long-term.
Collected metrics can be accessed for reviewing via Grafana. Prometheus query dashboard will
show only data stored on Prometheus instances.
VictoriaMetrics is fast, cost-effective and scalable time-series database. It can be used as long-
term remote storage for Prometheus. It uses own data compression, it allows to store more data
on the same disk size.
Cortex provides horizontally scalable, highly available, multi-tenant, long term storage for
Prometheus.
Prometheus
node
Monitored service 1
Monitored service 2
Monitored service ...
Monitored service N
Storage
Grafana
Long-Term Storage
DataSource 1
DataSource 2
Alerts
Alertmanager
Store data after retention is reached
 Why do we need Long-Term storage:
 To store a historical data about your workloads
 To review an incidents
 To plan a scaling based on seasonal load
 To find a bottlenecks into infrastructure during continuous run/load
 What solutions can be used for storing Long-Term historical timeseries:
 Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics *
* https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
LONG-TERM STORAGE OVERVIEW
THANOS ARCHITECTURE
Prometheus POD
Prometheus Prometheus config reloader
Configmap reloader Thanos sidecar
Thanos query POD
Thanos query
Thanos compact
Thanos compact
Thanos Store Gateway POD
Thanos store gateway
* https://thanos.io/getting-started.md/
Storages for Thanos:
(stable)
- Google Cloud Storage
- AWS S3
- Azure Storage Account
(beta)
- OpenStack Swift
- Tencent COS
Thanos query POD
Thanos query
Thanos Store Gateway POD
Thanos store gateway
Prometheus 2
Grafana or Thanos UI
Prometheus 1
Bucket
AVANTAGES AND DISAVANTAGES
- Infinity retention without reconfiguring
srorage
- Collected data is available even if
infrastucture recreated (data is into bucket)
- Global query view over data collected from
multiple Prometheus instances and bucket
- Horizontal scalability
- Metrics compaction
- Full monitoring stack
- Complicated infrastructure
HOW IT WAS TESTED
NODE_0
NODE_2
NODE...
NODE_498
NODE_499
METRIC_0
METRIC_1
METRIC_2
METRIC_...
METRIC_999
NODE_49
9
NODE_
4
500
NODES
1000 METRIC PER
NODE
each 15 seconds
24 Hours
Scroll Bar (500 reporters)
500 nodes, 4 times per minute, 24 hours = 2 880 000 000 points
LOAD ON CLUSTER NODES
QUERIES VIA THANOS FROM BUCKET
MEMORY USAGE STABILIZATION ON CLUSTER NODES
SCRAPE DURATION
GKE CLUSTER DETAILS
pay attention on allocation )))
Between scrapes 30 sec, during this time we have 2 15-sec intervals,
So 4.37 sec prometheus needs to scrape 1 000 000 metrics
VICTORIAMETRICS ARCHITECTURE
VM-select_2VM-select_1 VM-select_3
VM-storage_2VM-storage_1 VM-storage_3
VM-insert_2VM-insert_1 VM-insert_3
LB/ClusterIP
LB/ClusterIP
STATEFUL
STATELESS
STATELESS
READ OPERATIONS
WRITE OPERATIONS
AVANTAGES AND DISAVANTAGES
- Infinity retention with reconfiguring storage
- Global query view over data collected from
storage
- Horizontal scalability
- Metrics compaction (multpile times better)
(floating to integer)
- Simple infrastructure
- No integration with Alert Manager
- Cloud storages are not supported yet
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129
- More load on hosts
24 Hours
Scroll Bar (500 reporters)
500 nodes, 4 times per minute, 24 hours = 2 880 000 000 points
THANOS VICTORIAMETRICS
- 12-15 GiB metrics per 1 day (2.88Bil)
- 16GiB memory used on nodes
- 2.1 – 2.4 CPU cores are used on nodes
- 2.8-3 GiB metrics per 1 day on each
storage(2.88Bil)
- 16GiB memory used on nodes
- 2.8 – 4 CPU cores are used on nodes
Storage price (Cloud Storage*):
15*365=5475 ~5500Gib
Storage total: $126.50 per month;
~$1500 in 1 year
* Based on retention we can move data to a
cold line storage class
Storage price (Persistent Disk Standard):
3*365=1095 ~1100Gib
$52.80 per month * Numer_of_Storages
Storage total: 52.8*3=158.4 per 1 month
* https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
If one of the storages lost – some part of data became unavailable
PRICE COMPARISON
Thanos Vicrotiametrics
Instance’s price 3* N-standard-4 4vCPU 15GB memory $97.49 monthly estimate
3*97.49=$292 Standard Provisioned Space: 1,500 GB - $60
CPU usage
Memory usage
50%
16GB
65%
16GB
Metrics per day 15GB 9GB
Metrics per minute 2 000 000 2 000 000
Metrics per one day 2 880 000 000 2 880 000 000
Scrape interval (1M metrics) 4.373 s 4.553 s
Historical data access 303-525 ms
(500 timeseies)
179-492 ms
(500 timeseies)
STORAGE PRICING
Q & A
To produce downsampled data, the Compactor continuously aggregates series down to five
minute and one hour resolutions. For each raw chunk, encoded with TSDB’s XOR
compression, it stores different types of aggregations, e.g. min, max, or sum in a single block.
This allows Querier to automatically choose the aggregate that is appropriate for a given
PromQL query.
VM Gorilla compression analysis
VM Gorilla compression analysis
VM Gorilla compression analysis
The only problem is the result may exceed 64 bits — default integer size used in modern computers.
How to deal with it? Normalize the integer by dividing by 10^M where M is the minimum value that
allows fitting all the time series values into 64 bits and removing common trailing decimal zeros.

Mais conteúdo relacionado

Mais procurados

[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codesNAVER D2
 
SkyhookDM - Towards an Arrow-Native Storage System
SkyhookDM - Towards an Arrow-Native Storage SystemSkyhookDM - Towards an Arrow-Native Storage System
SkyhookDM - Towards an Arrow-Native Storage SystemJayjeetChakraborty
 
Designing and Building Multi-Region Swift Deployment
Designing and Building Multi-Region Swift DeploymentDesigning and Building Multi-Region Swift Deployment
Designing and Building Multi-Region Swift DeploymentSiheon Kim
 
Storing metrics at scale with Gnocchi
Storing metrics at scale with GnocchiStoring metrics at scale with Gnocchi
Storing metrics at scale with GnocchiGordon Chung
 
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Igor Sfiligoi
 
Gnocchi Profiling 2.1.x
Gnocchi Profiling 2.1.xGnocchi Profiling 2.1.x
Gnocchi Profiling 2.1.xGordon Chung
 
Gnocchi Profiling v2
Gnocchi Profiling v2Gnocchi Profiling v2
Gnocchi Profiling v2Gordon Chung
 
Testing data and metadata backends with ClawIO
Testing data and metadata backends with ClawIOTesting data and metadata backends with ClawIO
Testing data and metadata backends with ClawIOHugo González Labrador
 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrQAware GmbH
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
Gnocchi v4 - past and present
Gnocchi v4 - past and presentGnocchi v4 - past and present
Gnocchi v4 - past and presentGordon Chung
 
Tajo case study bay area hug 20131105
Tajo case study bay area hug 20131105Tajo case study bay area hug 20131105
Tajo case study bay area hug 20131105Gruter
 
Cassandra&map reduce
Cassandra&map reduceCassandra&map reduce
Cassandra&map reducevlaskinvlad
 
Burst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runBurst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runIgor Sfiligoi
 
Object multifunctional indexing with an open API
Object multifunctional indexing with an open API Object multifunctional indexing with an open API
Object multifunctional indexing with an open API akvalex
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstIgor Sfiligoi
 
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
 NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic... NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...Igor Sfiligoi
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDBJason Terpko
 

Mais procurados (20)

[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codes
 
SkyhookDM - Towards an Arrow-Native Storage System
SkyhookDM - Towards an Arrow-Native Storage SystemSkyhookDM - Towards an Arrow-Native Storage System
SkyhookDM - Towards an Arrow-Native Storage System
 
Designing and Building Multi-Region Swift Deployment
Designing and Building Multi-Region Swift DeploymentDesigning and Building Multi-Region Swift Deployment
Designing and Building Multi-Region Swift Deployment
 
Storing metrics at scale with Gnocchi
Storing metrics at scale with GnocchiStoring metrics at scale with Gnocchi
Storing metrics at scale with Gnocchi
 
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
 
The new time series kid on the block
The new time series kid on the blockThe new time series kid on the block
The new time series kid on the block
 
Gnocchi Profiling 2.1.x
Gnocchi Profiling 2.1.xGnocchi Profiling 2.1.x
Gnocchi Profiling 2.1.x
 
Gnocchi Profiling v2
Gnocchi Profiling v2Gnocchi Profiling v2
Gnocchi Profiling v2
 
Testing data and metadata backends with ClawIO
Testing data and metadata backends with ClawIOTesting data and metadata backends with ClawIO
Testing data and metadata backends with ClawIO
 
A Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache SolrA Fast and Efficient Time Series Storage Based on Apache Solr
A Fast and Efficient Time Series Storage Based on Apache Solr
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Gnocchi v4 - past and present
Gnocchi v4 - past and presentGnocchi v4 - past and present
Gnocchi v4 - past and present
 
Tajo case study bay area hug 20131105
Tajo case study bay area hug 20131105Tajo case study bay area hug 20131105
Tajo case study bay area hug 20131105
 
Cassandra&map reduce
Cassandra&map reduceCassandra&map reduce
Cassandra&map reduce
 
Burst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runBurst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud run
 
Object multifunctional indexing with an open API
Object multifunctional indexing with an open API Object multifunctional indexing with an open API
Object multifunctional indexing with an open API
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
 NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic... NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
 
NBITSearch. Features.
NBITSearch. Features.NBITSearch. Features.
NBITSearch. Features.
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDB
 

Semelhante a ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019

(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackAdam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackShapeBlue
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationDenodo
 
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Amazon Web Services
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Big Data Spain
 
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostHow The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostDatabricks
 
958 and 959 sales exam prep
958 and 959 sales exam prep958 and 959 sales exam prep
958 and 959 sales exam prepJason Wong
 
Data Engineer's Lunch #23: Thanos/Cortex
Data Engineer's Lunch #23: Thanos/CortexData Engineer's Lunch #23: Thanos/Cortex
Data Engineer's Lunch #23: Thanos/CortexAnant Corporation
 
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Thomas Riley
 
RaptorX: Building a 10X Faster Presto with hierarchical cache
RaptorX: Building a 10X Faster Presto with hierarchical cacheRaptorX: Building a 10X Faster Presto with hierarchical cache
RaptorX: Building a 10X Faster Presto with hierarchical cacheAlluxio, Inc.
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scaleJuraj Hantak
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scaleAdam Hamsik
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...NETWAYS
 
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...SoftwareONEPresents
 
Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbZhangZhengming
 
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...InfluxData
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
 
How to Reduce Public Cloud Storage Costs
How to Reduce Public Cloud Storage CostsHow to Reduce Public Cloud Storage Costs
How to Reduce Public Cloud Storage CostsBuurst
 

Semelhante a ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019 (20)

(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStackAdam Dagnall: Advanced S3 compatible storage integration in CloudStack
Adam Dagnall: Advanced S3 compatible storage integration in CloudStack
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
Integrating On-premises Enterprise Storage Workloads with AWS (ENT301) | AWS ...
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
 
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostHow The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
 
958 and 959 sales exam prep
958 and 959 sales exam prep958 and 959 sales exam prep
958 and 959 sales exam prep
 
Data Engineer's Lunch #23: Thanos/Cortex
Data Engineer's Lunch #23: Thanos/CortexData Engineer's Lunch #23: Thanos/Cortex
Data Engineer's Lunch #23: Thanos/Cortex
 
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
Prometheus in Practice: High Availability with Thanos (DevOpsDays Edinburgh 2...
 
RaptorX: Building a 10X Faster Presto with hierarchical cache
RaptorX: Building a 10X Faster Presto with hierarchical cacheRaptorX: Building a 10X Faster Presto with hierarchical cache
RaptorX: Building a 10X Faster Presto with hierarchical cache
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scale
 
Monitoring with prometheus at scale
Monitoring with prometheus at scaleMonitoring with prometheus at scale
Monitoring with prometheus at scale
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
OSMC 2021 | pg_stat_monitor: A cool extension for better database (PostgreSQL...
 
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...
Azure and StorSimple for Disaster Recovery and Storage Management - SoftwareO...
 
Apache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdbApache con 2020 use cases and optimizations of iotdb
Apache con 2020 use cases and optimizations of iotdb
 
Symantec NetBackup na Nuvem AWS
Symantec NetBackup na Nuvem AWSSymantec NetBackup na Nuvem AWS
Symantec NetBackup na Nuvem AWS
 
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
Lessons Learned: Running InfluxDB Cloud and Other Cloud Services at Scale | T...
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
How to Reduce Public Cloud Storage Costs
How to Reduce Public Cloud Storage CostsHow to Reduce Public Cloud Storage Costs
How to Reduce Public Cloud Storage Costs
 

Mais de UA DevOps Conference

ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps» GO DevOps
ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps»  GO DevOpsІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps»  GO DevOps
ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps» GO DevOpsUA DevOps Conference
 
ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...
ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...
ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...UA DevOps Conference
 
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...UA DevOps Conference
 
ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...
ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...
ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...UA DevOps Conference
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...UA DevOps Conference
 
ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019
ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019
ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019UA DevOps Conference
 
КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...
КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...
КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...UA DevOps Conference
 
ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...
ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...
ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...UA DevOps Conference
 
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...UA DevOps Conference
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019UA DevOps Conference
 

Mais de UA DevOps Conference (10)

ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps» GO DevOps
ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps»  GO DevOpsІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps»  GO DevOps
ІЛЛЯ ЛУБЕНЕЦЬ «DevSecOps наступний етап розвитку DevOps» GO DevOps
 
ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...
ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...
ОЛЕКСАНДР СНІГОВИЙ «Continuous Deployment: Challenges, Solutions, and Lesson...
 
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...
АРТЕМ КОБРІН «Achieve Networking at Scale with a Self-Service Network Solutio...
 
ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...
ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...
ОЛЕКСАНДР СИРОТЕНКО «DataKernel: майструючи український фреймворк для highloa...
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 
ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019
ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019
ОЛЕКСАНДР ВІЛЬЧИНСЬКИЙ «DevOps culture» Lviv DevOps Conference 2019
 
КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...
КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...
КОСТЯНТИН СЕВЕРЕНЧУК «Monitoring and Automation in DevTestSecOps world» Lviv ...
 
ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...
ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...
ОЛЕКСАНДР СНІГОВИЙ «Extension of DevOps: Policy as Code» Lviv DevOps Confere...
 
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...
СТАНІСЛАВ КОЛЕНКІН «Cilium – Network security for microservices. Let’s see ho...
 
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
ОЛЕГ МАЦЬКІВ «Crash course on Operator Framework» Lviv DevOps Conference 2019
 

Último

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 

Último (20)

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 

ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019

  • 2. Far long ago….. in all datacentres bigger average = less details
  • 3. CONTENT  Introduction  Long-Term Storage Overview  Thanos Architecture and Resources Usage  VictoriaMetrics Architecture and Resources Usage  Price comparison
  • 4. INTRODUCTION Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments. Curren release 0.5.0 is designed to store old metrics (which reached retention period on Prometheus nodes) on some S3 like storage for long-term. Collected metrics can be accessed for reviewing via Grafana. Prometheus query dashboard will show only data stored on Prometheus instances. VictoriaMetrics is fast, cost-effective and scalable time-series database. It can be used as long- term remote storage for Prometheus. It uses own data compression, it allows to store more data on the same disk size. Cortex provides horizontally scalable, highly available, multi-tenant, long term storage for Prometheus.
  • 5. Prometheus node Monitored service 1 Monitored service 2 Monitored service ... Monitored service N Storage Grafana Long-Term Storage DataSource 1 DataSource 2 Alerts Alertmanager Store data after retention is reached
  • 6.  Why do we need Long-Term storage:  To store a historical data about your workloads  To review an incidents  To plan a scaling based on seasonal load  To find a bottlenecks into infrastructure during continuous run/load  What solutions can be used for storing Long-Term historical timeseries:  Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics * * https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage LONG-TERM STORAGE OVERVIEW
  • 7. THANOS ARCHITECTURE Prometheus POD Prometheus Prometheus config reloader Configmap reloader Thanos sidecar Thanos query POD Thanos query Thanos compact Thanos compact Thanos Store Gateway POD Thanos store gateway
  • 8. * https://thanos.io/getting-started.md/ Storages for Thanos: (stable) - Google Cloud Storage - AWS S3 - Azure Storage Account (beta) - OpenStack Swift - Tencent COS
  • 9. Thanos query POD Thanos query Thanos Store Gateway POD Thanos store gateway Prometheus 2 Grafana or Thanos UI Prometheus 1 Bucket
  • 10. AVANTAGES AND DISAVANTAGES - Infinity retention without reconfiguring srorage - Collected data is available even if infrastucture recreated (data is into bucket) - Global query view over data collected from multiple Prometheus instances and bucket - Horizontal scalability - Metrics compaction - Full monitoring stack - Complicated infrastructure
  • 11. HOW IT WAS TESTED NODE_0 NODE_2 NODE... NODE_498 NODE_499 METRIC_0 METRIC_1 METRIC_2 METRIC_... METRIC_999 NODE_49 9 NODE_ 4 500 NODES 1000 METRIC PER NODE each 15 seconds
  • 12. 24 Hours Scroll Bar (500 reporters) 500 nodes, 4 times per minute, 24 hours = 2 880 000 000 points
  • 14.
  • 15. QUERIES VIA THANOS FROM BUCKET
  • 16.
  • 17. MEMORY USAGE STABILIZATION ON CLUSTER NODES SCRAPE DURATION GKE CLUSTER DETAILS pay attention on allocation ))) Between scrapes 30 sec, during this time we have 2 15-sec intervals, So 4.37 sec prometheus needs to scrape 1 000 000 metrics
  • 18. VICTORIAMETRICS ARCHITECTURE VM-select_2VM-select_1 VM-select_3 VM-storage_2VM-storage_1 VM-storage_3 VM-insert_2VM-insert_1 VM-insert_3 LB/ClusterIP LB/ClusterIP STATEFUL STATELESS STATELESS READ OPERATIONS WRITE OPERATIONS
  • 19.
  • 20.
  • 21. AVANTAGES AND DISAVANTAGES - Infinity retention with reconfiguring storage - Global query view over data collected from storage - Horizontal scalability - Metrics compaction (multpile times better) (floating to integer) - Simple infrastructure - No integration with Alert Manager - Cloud storages are not supported yet https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129 - More load on hosts
  • 22.
  • 23. 24 Hours Scroll Bar (500 reporters) 500 nodes, 4 times per minute, 24 hours = 2 880 000 000 points
  • 24.
  • 25.
  • 26. THANOS VICTORIAMETRICS - 12-15 GiB metrics per 1 day (2.88Bil) - 16GiB memory used on nodes - 2.1 – 2.4 CPU cores are used on nodes - 2.8-3 GiB metrics per 1 day on each storage(2.88Bil) - 16GiB memory used on nodes - 2.8 – 4 CPU cores are used on nodes Storage price (Cloud Storage*): 15*365=5475 ~5500Gib Storage total: $126.50 per month; ~$1500 in 1 year * Based on retention we can move data to a cold line storage class Storage price (Persistent Disk Standard): 3*365=1095 ~1100Gib $52.80 per month * Numer_of_Storages Storage total: 52.8*3=158.4 per 1 month * https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134 If one of the storages lost – some part of data became unavailable PRICE COMPARISON
  • 27. Thanos Vicrotiametrics Instance’s price 3* N-standard-4 4vCPU 15GB memory $97.49 monthly estimate 3*97.49=$292 Standard Provisioned Space: 1,500 GB - $60 CPU usage Memory usage 50% 16GB 65% 16GB Metrics per day 15GB 9GB Metrics per minute 2 000 000 2 000 000 Metrics per one day 2 880 000 000 2 880 000 000 Scrape interval (1M metrics) 4.373 s 4.553 s Historical data access 303-525 ms (500 timeseies) 179-492 ms (500 timeseies)
  • 29. Q & A
  • 30.
  • 31. To produce downsampled data, the Compactor continuously aggregates series down to five minute and one hour resolutions. For each raw chunk, encoded with TSDB’s XOR compression, it stores different types of aggregations, e.g. min, max, or sum in a single block. This allows Querier to automatically choose the aggregate that is appropriate for a given PromQL query.
  • 34. VM Gorilla compression analysis The only problem is the result may exceed 64 bits — default integer size used in modern computers. How to deal with it? Normalize the integer by dividing by 10^M where M is the minimum value that allows fitting all the time series values into 64 bits and removing common trailing decimal zeros.