Grafana Optimization.pdf

•

0 gostou•24 visualizações

ShreyasKashyap12

Grafana Optimization

Engenharia

About me
• MitsuhiroTanda
• Infrastructure Engineer @ GREE
• Use Prometheus on AWS (1.5 year)
• Grafana committer
• @mtanda

Our environment
• deploy multiple Prometheus for each service
• each service launch 100 or more instances
• rarely use RDS, run MySQL on EC2
• various service, role, and many instances

Dashboard policy
• Adapt dynamic environment with Auto
scaling
• Avoid service specific parameter hard coding
• Reuse same dashboard for several service
• Prepare dashboard for drilldown analysis

Periodic check
Service trend
Alert!
Find a problem!
Drill down to the
Root cause
Operation flow

Key Grafana feature
• Templating
– Query parameter
– Datasource
• Panel Repeat
• Scripted dashboard
• Table panel (with Annotations)

Templated queries
Name Description
label_values(label) Returns a list of label values for the label in every metric.
label_values(metric, label) Returns a list of label values for the label in the specified metric.
metrics(metric) Returns a list of metrics matching the specified metric regex.
query_result(query) Returns a list of Prometheus query result for the query.
http://docs.grafana.org/datasources/prometheus/

Service trend
• Prepare dashboard for key metrics
– CPU Utilization
– Response time
– Etc.
• Filter by role, and check deeply

• Refresh option
– “OnTime Range Change”
– Query each time when time range changed
• label_values(metrics, label_key)
– Get label values from metrics
– Only match the metrics in current time range
– (match current active instances)

• Switch datasource quickly
• Can check several service on same dashboard

Alert
• We use PagerDuty to call on-call engineer
• Alert message also be posted to chat
• Message contains shortcut link to alert
dashboard

• Use Scripted dashboard
• Parse Prometheus alert view HTML
• And generate dashboard
• https://gist.github.com/mtanda/2aba0e96d2a8aace7b6b9a903bcd6b31

• Query “ALERTS” metrics of Prometheus
• Set alert annotation data toTable panel

Drilldown
• Show graphs for corresponding instance roles
• Host level metrics and systems metrics
• Need to create dashboard dynamically
• Use Scripted dashboard

graph definition (JSON file)
Instance role
Scripted dashboard
Generate dashboard!

• Filter by role and threshold
• Quickly find problematic instances

Wrap up
• Grafana is very powerful visualization tool
• It is little tricky, but very flexible
• Make better Grafana by contributing!

Mais conteúdo relacionado

Semelhante a Grafana Optimization.pdf

Introduction to Amazon AthenaAmazon Web Services

AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...Amazon Web Services

Prometheus - Utah Software Architecture Meetup - Clint Checkettsclintchecketts

Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Amazon Web Services

About QTP 9.2chandrasekhar

About Qtp_1 92techgajanan

About Qtp 92techgajanan

StackMate - CloudFormation for CloudStackChiradeep Vittal

On-boarding with JanusGraph PerformanceChin Huang

Introduction to Amazon AthenaAmazon Web Services

Apache Eagle: Secure Hadoop in Real TimeDataWorks Summit/Hadoop Summit

Apache Eagle at Hadoop Summit 2016 San JoseHao Chen

Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...Amazon Web Services

Optimising your Amazon Redshift Cluster for Peak PerformanceAmazon Web Services

Managing Millions of Tests Using DatabricksDatabricks

Large-Scale ETL Data Flows With Data Pipeline and DataductSourabh Bajaj

Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI

Managing tfsEsteban Garcia

Nov. 4, 2011 o reilly webcast-hbase- lars georgeO'Reilly Media

SQL Server 2008 For DevelopersJohn Sterrett

Semelhante a Grafana Optimization.pdf (20)

Introduction to Amazon Athena

AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...

Prometheus - Utah Software Architecture Meetup - Clint Checketts

Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...

About QTP 9.2

About Qtp_1 92

About Qtp 92

StackMate - CloudFormation for CloudStack

On-boarding with JanusGraph Performance

Introduction to Amazon Athena

Apache Eagle: Secure Hadoop in Real Time

Apache Eagle at Hadoop Summit 2016 San Jose

Optimizing Your Amazon Redshift Cluster for Peak Performance - AWS Summit Syd...

Optimising your Amazon Redshift Cluster for Peak Performance

Managing Millions of Tests Using Databricks

Large-Scale ETL Data Flows With Data Pipeline and Dataduct

Overview of data analytics service: Treasure Data Service

Managing tfs

Nov. 4, 2011 o reilly webcast-hbase- lars george

SQL Server 2008 For Developers

Último

UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N

Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665

US Department of Education FAFSA Week of ActionMebane Rash

lifi-technology with integration of IOT.pptxsomshekarkn64

Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066

young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR

Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER

Past, Present and Future of Generative AIabhishek36461

Design and analysis of solar grass cutter.pdfTagore Institute of Engineering And Technology

Correctly Loading Incremental Data at ScaleAlluxio, Inc.

Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis

Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721

8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191

TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1

Work Experience-Dalton Park.pptxfvvvvvvvLewisJB

Electronically Controlled suspensions system .pdfme23b1001

Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ

Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla

Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774

Introduction-To-Agricultural-Surveillance-Rover.pptxk795866

Grafana Optimization.pdf

1. Grafana optimization for Prometheus

2. About me • MitsuhiroTanda • Infrastructure Engineer @ GREE • Use Prometheus on AWS (1.5 year) • Grafana committer • @mtanda

3. Our environment • deploy multiple Prometheus for each service • each service launch 100 or more instances • rarely use RDS, run MySQL on EC2 • various service, role, and many instances

4. Dashboard policy • Adapt dynamic environment with Auto scaling • Avoid service specific parameter hard coding • Reuse same dashboard for several service • Prepare dashboard for drilldown analysis

5. Periodic check Service trend Alert! Find a problem! Drill down to the Root cause Operation flow

6. Key Grafana feature • Templating – Query parameter – Datasource • Panel Repeat • Scripted dashboard • Table panel (with Annotations)

7. Templated queries Name Description label_values(label) Returns a list of label values for the label in every metric. label_values(metric, label) Returns a list of label values for the label in the specified metric. metrics(metric) Returns a list of metrics matching the specified metric regex. query_result(query) Returns a list of Prometheus query result for the query. http://docs.grafana.org/datasources/prometheus/

8. Service trend • Prepare dashboard for key metrics – CPU Utilization – Response time – Etc. • Filter by role, and check deeply

9. Dynamic dashboard

10.

11. • Refresh option – “OnTime Range Change” – Query each time when time range changed • label_values(metrics, label_key) – Get label values from metrics – Only match the metrics in current time range – (match current active instances)

12. Datasource templating

13.

14. • Switch datasource quickly • Can check several service on same dashboard

15. Alert • We use PagerDuty to call on-call engineer • Alert message also be posted to chat • Message contains shortcut link to alert dashboard

16. Prometheus alert view

17. Grafana alert view

18. • Use Scripted dashboard • Parse Prometheus alert view HTML • And generate dashboard • https://gist.github.com/mtanda/2aba0e96d2a8aace7b6b9a903bcd6b31

19. Alert history

20.

21.

22. • Query “ALERTS” metrics of Prometheus • Set alert annotation data toTable panel

23. Drilldown • Show graphs for corresponding instance roles • Host level metrics and systems metrics • Need to create dashboard dynamically • Use Scripted dashboard

24. graph definition (JSON file) Instance role Scripted dashboard Generate dashboard!

25.

26. EBS latency dashboard

27. • Filter by role and threshold • Quickly find problematic instances

28. Wrap up • Grafana is very powerful visualization tool • It is little tricky, but very flexible • Make better Grafana by contributing!

Grafana Optimization.pdf

Recomendados

Recomendados

Mais conteúdo relacionado

Semelhante a Grafana Optimization.pdf

Semelhante a Grafana Optimization.pdf (20)

Último

Último (20)

Grafana Optimization.pdf