Operating Prometheus
Monitoring Study Group (モニタリング勉強会)
2017/10/27 @kfdm
Self Introduction
• Paul Traylor
• LINE Fukuoka development office (開発室)
• Currently responsible for updating the monitoring environment at LINE Fukuoka
• https://github.com/line/promgen
• https://promcon.io/2017-munich/talks/prometheus-as-a-internal-service/
Operating Prometheus at LINE Fukuoka
• 4 HA pairs
• ~2000 targets per machine
• ~800k samples per machine
• ~3.5 million samples
• ~7000 exporters
https://github.com/line/promgen
Scaling Prometheus ‒ HA
• Run multiple Prometheus instances with the same targets
• Alerts are de-duplicated by Alertmanager
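A minimal sketch of what "the same targets" can look like, assuming a single prometheus.yml is deployed unchanged to both servers of a pair. The host names, job name, label value, and file paths are illustrative assumptions, not details of the LINE Fukuoka setup. Alertmanager identifies an alert by its label set, so keeping the external labels identical on both servers is what allows the two copies of an alert to be treated as one.

```yaml
# prometheus.yml, deployed unchanged to both members of an HA pair.
# All host names and paths below are hypothetical.
global:
  scrape_interval: 15s
  # Identical external labels on both servers keep the alert label sets
  # identical, so Alertmanager can de-duplicate them.
  external_labels:
    cluster: example

alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager.example.internal:9093']

rule_files:
- /etc/prometheus/rules/*.yml

scrape_configs:
- job_name: node
  static_configs:
  - targets:
    - 'app-1.example.internal:9100'
    - 'app-2.example.internal:9100'
```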
Scaling Prometheus ‒ Shard
• Split targets across multiple servers
• Alertmanager de-duplicates alerts
• Proxy or remote read
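The slide does not say how the targets are split. One option built into Prometheus is hashmod relabelling, sketched below for a hypothetical two-server shard. The job name, file path, and shard count are assumptions for illustration. Splitting by team or project, with a separate target list per server, is an equally valid approach.

```yaml
# Scrape config for shard 0 of 2. Shard 1 uses the same block with regex: '1'.
scrape_configs:
- job_name: node
  file_sd_configs:
  - files: ['/etc/prometheus/targets/*.json']   # hypothetical path
  relabel_configs:
  # Hash each target address into one of two buckets (0 or 1).
  - source_labels: [__address__]
    modulus: 2
    target_label: __tmp_hash
    action: hashmod
  # Keep only the targets that fall into this server's bucket.
  - source_labels: [__tmp_hash]
    regex: '0'
    action: keep
```

Because each server then holds only part of the data, a query that spans shards needs one of the approaches in the last bullet: a proxy in front of the servers, or remote read.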
Prometheus 1.8 ‒ Storage Format
https://promcon.io/2016-berlin/talks/the-prometheus-time-series-database/
http://labs.gree.jp/blog/2017/10/16614/
• One series per file
• Rewrites may have to touch millions of files
• Queries may also touch millions of files
• No easy way to back up
Prometheus 2.0 ‒ New Storage Format
https://promcon.io/2017-munich/slides/storing-16-bytes-at-scale.pdf
https://fabxc.org/blog/2017-04-10-writing-a-tsdb/
• Chunks stored in buckets by time
• Chunks past retention setting are just deleted
• Easier to back up
• Easier to compress
Prometheus 2.0 ‒ Backups
├── 01BX40G8TA6T1MNSS8JJE7ENPY/
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
├── 01BX5Y9SSE10VBZK4CMZ86WDR6/
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
├── lock
└── wal/
    ├── 000760
    └── 000761
• https://github.com/Gouthamve/agni
Prometheus 2.0 ‒ Flag Changes
• Most flags move from single dash to double dash
• Many storage settings move to tsdb settings
• -config.file -> --config.file
• -storage.local.path -> --storage.tsdb.path
Prometheus 2.0 ‒ Rule Format Changes
https://www.robustperception.io/converting-rules-to-the-prometheus-2-0-format/
groups:
- name: alert.rules
  rules:
  - alert: HighErrorRate
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    annotations:
      summary: High request latency
  - alert: DailyTest
    expr: vector(1)
    for: 1m
    annotations:
      summary: Daily alert test
• ./promtool update rules /path/to/rules
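Recording rules move into the same YAML group structure, using a `record:` key in place of `alert:`. The sketch below shows how the `job:request_latency_seconds:mean5m` series used by the alert above could be defined. The group name and the underlying `request_latency_seconds_sum` and `request_latency_seconds_count` metrics are assumptions for illustration, not taken from the slides.

```yaml
groups:
- name: recording.rules   # hypothetical group name
  rules:
  # Mean request latency per job over the last 5 minutes,
  # computed from an assumed summary or histogram metric.
  - record: job:request_latency_seconds:mean5m
    expr: >
      sum by (job) (rate(request_latency_seconds_sum[5m]))
      /
      sum by (job) (rate(request_latency_seconds_count[5m]))
```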
Prometheus 2.0 ‒ Migration
Prometheus 2.0 ‒ Remote Read
• Prometheus 1.8 (Read)
• InfluxDB (Read and Write)
• Graphite (Write)
• OpenTSDB (Write)
• TimescaleDB (Read and Write)
• https://prometheus.io/docs/operating/integrations/
• https://github.com/prometheus/prometheus/tree/master/documentation/examples/remote_storage/remote_storage_adapter
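For the "Prometheus 1.8 (Read)" case, a minimal sketch of the configuration on the new 2.0 server, assuming the old 1.8 server stays running and reachable during the migration. The host name is hypothetical. The `/api/v1/read` path is the remote read endpoint that Prometheus 1.8 exposes.

```yaml
# prometheus.yml on the new Prometheus 2.0 server.
remote_read:
# Old Prometheus 1.8 server, kept running so its data stays queryable
# (hypothetical host name).
- url: 'http://prometheus-1x.example.internal:9090/api/v1/read'
```

With this in place, queries sent to the 2.0 server that reach back before the cut-over can also draw on the data still held by the 1.8 server, so the old storage does not need to be converted.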
Open Metrics
• https://github.com/RichiH/OpenMetrics
• https://github.com/RichiH/OpenMetrics/blob/master/CONTRIBUTORS.md
Questions?
