SlideShare a Scribd company logo
1 of 162
Download to read offline
Prometheus
Prometheus
Course Introduction
Spinnaker course: Edward Viaene & Jorn Jambers
Prometheus
• Thank you for taking this Prometheus Cours
e

• Open-source monitoring solution & time series database


• Was built by Soundclou
d

• Very active developer and user community


• Now its a standalone open source project


• Joined the Cloud Native Computing Foundation in 2016


• Ideal for monitoring on premise as well as cloud workloads
Prometheus course: Edward Viaene & Jorn Jambers
Course Overview
Introduction Monitoring Alerting Internals Use cases
What is Prometheus Client Libraries Introduction Storage Grafana Provisioning
Installing Prometheus
& Grafana
Pushing metrics Setting up alerts Security
Cloudwatch
Integration
Concepts Querying
Con
fi
guration Service Discovery
Monitoring Nodes Exporters
Architecture
Prometheus course: Edward Viaene & Jorn Jambers
Course Objectives
• To be able to use Prometheus


• To get familiar with the Prometheus ecosystem


• To setup a monitoring platform usin
g

• Prometheus


• To create alerts in Prometheus


• To be able to query Prometheus data
Prometheus course: Edward Viaene & Jorn Jambers
Who is Edward Viaene
• My name is Edward Viaene


• I am a consultant and trainer in Cloud and Big Data technologies


• I’m a big advocate of Agile and DevOps techniques


• I held various roles from banking to startups


• I have a background in Unix/Linux, Networks, Security, Risk, and
distributed computing


• Nowadays I specialize in everything that has to do with Cloud and
DevOps
Prometheus course: Edward Viaene & Jorn Jambers
Who is Jorn Jambers
• My name is Jorn Jambers


• I am a freelance DevOps consultant and trainer


• DevOps advocate


• Worked in banks, consultancy companies and startups


• In the latter I found my passion for DevOps


• I have a background in Unix/Linux, Hadoop, DBA, Networks, automations


• Today I help companies succeed on the public cloud
Prometheus course: Edward Viaene & Jorn Jambers
Online Training
• Online training on Udemy


• DevOps, Distributed Computing, Cloud, Big Data


• Using online video lectures


• 40,000+ enrolled students in 100+ countries
Prometheus
Introduction
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus
• Prometheus is an Open source monitoring solution


• Started at SoundCloud around 2012-2013, and was made public in early
2015


• Prometheus provides Metrics & Alerting


• It is inspired by Google’s Borgmon, which uses time-series data as a
datasource, to then send alerts based on this data


• It
fi
ts very well in the cloud native infrastructur
e

• Prometheus is also a member of the CNCF (Cloud Native Foundation)
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus
• In Prometheus we talk about Dimensional Data: time series are identi
fi
ed by metric
name and a set of key/value pairs


• Prometheus includes a Flexible Query Language


• Visualizations can be shown using a built-in expression browser or with
integrations like Grafana


• It stores metrics in memory and local disk in an own custom, ef
fi
cient format


• It is written in Go


• Many client libraries and integrations available
Metric name Label Sample
Temperature location=outside 90
Prometheus course: Edward Viaene & Jorn Jambers
How does Prometheus work?
• Prometheus collects metrics from monitored
targets by scraping metrics HTTP endpoint
s

• This is fundamentally different than other
monitoring and alerting systems, (except
this is also how Google’s Borgmon works)


• Rather than using custom scripts that
check on particular services and systems,
the monitoring data itself is use
d

• Scraping endpoints is much more ef
fi
cient
than other mechanisms, like 3rd party agents


• A single prometheus server is able to ingest
up to one million samples per second as
several million time series
Database
Windows Server
Application
Prometheus
HTTP
HTTP
HTTP
Prometheus
Installation
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus Installation
• I will install Prometheus using scripts from our GitHub repository (https://github.com/
in4it/prometheus-course)


• They will work on any modern Linux distribution


• I’ll install it on a DigitalOcean droplet


• Feel free to use the scripts with any Cloud Provider, Virtual Machine, or Docker
image, as long as it’s a recent Linux distribution


• To get a free $100 coupon on DigitalOcean, valid for 60-days with a valid payment
method added, use the following link:




https://m.do.co/c/b71b388ab76f
• $10 is enough to run a 2 GB memory droplet for one month
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus Installation
• If you do not want to use the provided scripts, you can download the full
distribution from https://github.com/prometheus/prometheus/releases


• MacOS, Windows, Linux, and some Unix distributions are supported


• After extracting you’ll get a prometheus executable (prometheus.exe for
windows), which you can use to run prometheus, for example:


• ./prometheus --con
fi
g.
fi
le /path/to/prometheus.yaml


• It’s best to use the scripts we provided so that your environment is the
same as ours when you follow the demos
Demo
Installing Prometheus & Grafana
Prometheus
Basic Concepts
Prometheus course: Edward Viaene & Jorn Jambers
Concepts
• All data is stored as time series


• Every time series is identi
fi
ed
by the “metric name” and a
set of key-value pairs,
called labels


• metric: go_memstat_alloc_bytes


• instance=localhost:9090


• job=prometheus
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus
• The time series data also consists of the actual data, called Samples:


• It can be a
fl
oat64 value


• or a millisecond-precision timestamp
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus
• The notation of time series is often using this notation:


• <metric name>{<label name>=<label value>, …}


• For example:


• node_boot_time{instance="localhost:9100",job="node_exporter"}
Prometheus
Prometheus Con
fi
guration
fi
le
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus Con
fi
guration
• The con
fi
guration is stored in the Prometheus con
fi
guration
fi
le, in yaml
format


• The con
fi
guration
fi
le can be changed and applied, without having to
restart Prometheus


• A reload can be done by executing kill -SIGHUP <pid>


• You can also pass parameters (
fl
ags) at startup time to ./prometheus


• Those parameters cannot be changed without restarting Prometheus


• The con
fi
guration
fi
le is passed using the
fl
ag --con
fi
g.
fi
le
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus Con
fi
guration
• The default con
fi
guration looks like this:
# my global config


global:


scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.


evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.


# scrape_timeout is set to the global default (10s).


# Alertmanager configuration


alerting:


alertmanagers:


- static_configs:


- targets:


# - alertmanager:9093


# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.


rule_files:


# - "first_rules.yml"


# - "second_rules.yml"
Prometheus course: Edward Viaene & Jorn Jambers
Prometheus Con
fi
guration
• To scrape metrics, you need to add con
fi
guration to the prometheus
con
fi
g
fi
le


• For example, to scrape metrics from prometheus itself, the following code
block is added by default
# A scrape configuration containing exactly one endpoint to scrape:


# Here it's Prometheus itself.


scrape_configs:


# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.


- job_name: 'prometheus'


# metrics_path defaults to '/metrics'


# scheme defaults to 'http'.


static_configs:


- targets: ['localhost:9090']
Demo
Prometheus Con
fi
guration
Prometheus
Monitoring Nodes
Prometheus course: Edward Viaene & Jorn Jambers
Monitor nodes
• To monitor nodes, you need to install the node-exporter


• The node exporter will expose machine metrics of Linux / *Nix machines


• For example: cpu usage, memory usage


• The node exporter can be used to monitor machines, and later on, you
can create alerts based on these ingested metric
s

• For Windows, there’s a WMI exporter (see https://github.com/martinlindhe/
wmi_exporter)
Prometheus course: Edward Viaene & Jorn Jambers
Monitor nodes
Linux machine
node exporter HTTP
Windows machine
WMI exporter HTTP
Prometheus
Prometheus
Demo
Node exporter
Demo
WMI exporter
Prometheus
Monitoring
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring - Introduction
• Client Libraries


• Pushing Metrics


• Querying


• Service Discovery


• Exporters
Prometheus
Client Libraries
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• Instrumenting your code


• Libraries


• Of
fi
cial: Go, Java/Scala, Python, Ruby


• Unof
fi
cial: Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for
Nginx, Lua for Tarantool, .NET / C#, Node.js, PHP, Rust


• No client library available?


• Implement it yourself in one of the supported exposition formats
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• Exposition formats:


• Simple text-based format


• Protocol-buffer format (Prometheus 2.0 removed support for the protocol-buffer format)
metric_name [


"{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"


] value [ timestamp ]


node_filesystem_avail_bytes{device="/dev/vda1",fstype="ext4",mountpoint="/"} 4.9386491904e+10


node_filesystem_avail_bytes{device="/dev/vda15",fstype="vfat",mountpoint="/boot/efi"} 1.05903104e+08


node_filesystem_avail_bytes{device="lxcfs",fstype="fuse.lxcfs",mountpoint="/var/lib/lxcfs"} 0


node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 2.01273344e+08


node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run/lock"} 5.24288e+06
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• 4 types of metrics


• Counte
r

• Gaug
e

• Histogra
m

• Summary
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• Counter




A value that only goes up (e.g. Visits to a website)
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• Gauge




Single numeric value that can go up and down (e.g. CPU load,
temperature)
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• Histogram




Samples observations (e.g. request durations or response sizes) and
these observations get counted into buckets. Includes (_count and _sum)




Main purpose is calculating quantiles
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Introduction
• Summary




Similar to a histogram, a summary samples observations (e.g. request
durations or response sizes). A summary also provides a total count of
observations and a sum of all observed values, it calculates con
fi
gurable
quantiles over a sliding time window.




Example: You need 2 counters for calculating the latency


1) total request(_count)


2) the total latency of those requests (_sum)




Take the rate() and divide = average latency
Prometheus
Instrumentation- Python
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• https://github.com/prometheus/client_python


• Of
fi
cially supported language


• pip install prometheus_client


• Supported metrics: Counter, Gauge, Summary and Histogram


Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint=“/prometheus-course/jorn", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


import random, time


from flask import Flask, render_template_string, abort


from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram


app = Flask(__name__)


REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code'])


IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests')


TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)')


@app.route('/')


@TIMINGS.time()


@IN_PROGRESS.track_inprogress()


def hello_world():


REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter


return 'Hello, World!'


@app.route('/prometheus-course/<name>')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def index(name):


REQUESTS.labels(method='GET', endpoint=“/prometheus-course/jorn", status_code=200).inc()


return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name)


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Python Example
• Python example:


@app.route('/metrics')


@IN_PROGRESS.track_inprogress()


@TIMINGS.time()


def metrics():


REQUESTS.labels(method='GET', endpoint="/metrics", status_code=200).inc()


return generate_latest(REGISTRY)


if __name__ == "__main__":


app.run(host='0.0.0.0')
Prometheus
Instrumentation - Go
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• https://github.com/prometheus/client_golang


• Of
fi
cially supported language


• Easy to implement:


• Supported metrics: Counter, Gauge, Summary and Histogram
package main


import (


"github.com/prometheus/client_golang/prometheus/promhttp"


"net/http"


)


func main() {


http.Handle("/metrics", promhttp.Handler())


panic(http.ListenAndServe(":8080", nil))


}
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Gauge


import "github.com/prometheus/client_golang/prometheus"


var jobsInQueue = prometheus.NewGauge(


prometheus.GaugeOpts{


Name: "jobs_queued",


Help: "Current number of jobs queued",


},


)


func init(){


promtheus.MustRegister(jobsQueued)


}




func enqueueJob(job Job) {


queue.Add(job)


jobsInQueue.Inc()


}


func runNextJob() {


job := queue.Dequeue()


jobsInQueue.Dec()


job.Run()


}
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Gauge


import "github.com/prometheus/client_golang/prometheus"


var jobsQueued = prometheus.NewGauge(


prometheus.GaugeOpts{


Name: "jobs_queued",


Help: "Current number of jobs queued",


},


)


func init(){


promtheus.MustRegister(jobsQueued)


}




func enqueueJob(job Job) {


queue.Add(job)


jobsQueued.Inc()


}


func runNextJob() {


job := queue.Dequeue()


jobsInQueue.Dec()


job.Run()


}
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Adding labels


import "github.com/prometheus/client_golang/prometheus"


var jobsQueued = prometheus.NewGaugeVec(


prometheus.GaugeOpts{


Name: "jobs_queued",


Help: "Current number of jobs in the queue",


},


[]string{"job_type"},


)


func init(){


promtheus.MustRegister(jobsQueued)


}




func enqueueJob(job Job) {


queue.Add(job)


jobsInQueue.WithLabelValues(job.Type()).Inc()


}


func runNextJob() {


job := queue.Dequeue()


jobsInQueue.WithLabelValues(job.Type()).Dec()


job.Run()


}
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Adding labels


import "github.com/prometheus/client_golang/prometheus"


var jobsQueued = prometheus.NewGaugeVec(


prometheus.GaugeOpts{


Name: "jobs_queued",


Help: "Current number of jobs in the queue",


},


[]string{"job_type"},


)


func init(){


promtheus.MustRegister(jobsQueued)


}




func enqueueJob(job Job) {


queue.Add(job)


jobsQueued.WithLabelValues(job.Type()).Inc()


}


func runNextJob() {


job := queue.Dequeue()


jobsQueued.WithLabelValues(job.Type()).Dec()


job.Run()


}
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Histogram


import "github.com/prometheus/client_golang/prometheus"


var jobsDurationHistogram = prometheus.NewHistogramVec(


prometheus.HistogramOpts{


Name: "jobs_duration_seconds",


Help: "Jobs duration distribution",


Buckets: []float64{1, 2, 5, 10, 20, 60},


},


[]string{"job_type"},


)


start := time.Now()


job.Run()


duration := time.Since(start)


jobsDurationHistogram.WithLabelValues(job.Type()).Observe(duration.Seconds())
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Histogram


import "github.com/prometheus/client_golang/prometheus"


var jobsDurationHistogram = prometheus.NewHistogramVec(


prometheus.HistogramOpts{


Name: "jobs_duration_seconds",


Help: "Jobs duration distribution",


Buckets: []float64{1, 2, 5, 10, 20, 60},


},


[]string{"job_type"},


)


start := time.Now()


job.Run()


duration := time.Since(start)


jobsDurationHistogram.WithLabelValues(job.Type()).Observe(duration.Seconds())
Prometheus course: Edward Viaene & Jorn Jambers
Client Libraries - Golang Example
• Summary


prometheus.NewSummary()
Prometheus
Pushing metrics
Prometheus course: Edward Viaene & Jorn Jambers
Pushing Metrics - Introduction
App Prometheus
Push Gateway
Push


Metrics
Pull


Metrics
• https://github.com/prometheus/pushgateway


• Diagram
Prometheus course: Edward Viaene & Jorn Jambers
Pushing Metrics - Introduction
• Sometimes metrics cannot be scraped


Example: batch jobs, servers are not reachable due to NAT,
fi
rewal
l

• Pushgateway is used as an intermediary service which allows you to push metrics.


• Pitfall
s

• Most of the times this is a single instance so this results in a SPOF


• Prometheus’s automatic instance health monitoring is not possible


• The Pushgateway never forgets the metrics unless they are deleted via the api


example:




curl -X DELETE http://localhost:9091/metrics/job/prom_course/instance/localhost


Prometheus course: Edward Viaene & Jorn Jambers
Pushing Metrics - Introduction
• Only 1 valid use case for the Pushgateway


• Service-level batch jobs and not related to a speci
fi
c machine


• If NAT or/both
fi
rewall is blocking you from using the pull mechanism


• Move the Prometheus server on the same network
Prometheus course: Edward Viaene & Jorn Jambers
Pushing Metrics - Python Example
• Python example:


• Pushgateway functions take a grouping key.


• push_to_gateway replaces metrics with the same grouping key


• pushadd_to_gateway only replaces metrics with the same name and grouping key


• delete_from_gateway deletes metrics with the given job and grouping key.


from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


registry = CollectorRegistry()


g = Gauge('job_last_success_unixtime', ‘Last time the course batch job has finished', registry=registry)


g.set_to_current_time()


push_to_gateway('localhost:9091', job='batchA', registry=registry)
Prometheus
Pushing Metrics - Go
Prometheus course: Edward Viaene & Jorn Jambers
Pushing Metrics - Go Example
• Go example:


package main


import (


"flag"


"log"


"net/http"


"github.com/prometheus/client_golang/prometheus/promhttp"


"github.com/prometheus/client_golang/prometheus/push"


)


gatewayUrl:="http://localhost:9091/"


 


throughputGuage := prometheus.NewGauge(prometheus.GaugeOpts{


Name: “throughput”,


Help: "Throughput in Mbps",


})


throughputGuage.Set(800)


 


if err := push.Collectors(


"throughput_job", push.HostnameGroupingKey(),


gatewayUrl, throughputGuage


); err != nil {


fmt.Println("Could not push completion time to Pushgateway:", err)


}
Prometheus
Querying
Prometheus course: Edward Viaene & Jorn Jambers
Querying Metrics - Introduction
• Prometheus provides a functional expression language called PromQL


• Provides built in operators and functions


• Vector-based calculations like Excel


• Expressions over time-series vectors


• PromQL is read-only


• Example:


100 - (avg by (instance) (irate(node_cpu_seconds_total{job='node_exporter',mode="idle"}[5m])) * 100)
Prometheus
Querying - Expressions
Prometheus course: Edward Viaene & Jorn Jambers
Querying Metrics - Introduction
• Instant vector - a set of time series containing a single sample for each
time series, all sharing the same timestamp


Example: node_cpu_seconds_total


• Range vector - a set of time series containing a range of data points over
time for each time series


Example: node_cpu_seconds_total[5m]


• Scalar - a simple numeric
fl
oating point value


Example: -3.14


• String - a simple string value; currently unused


Example: foobar
Prometheus
Querying - Operators
Prometheus course: Edward Viaene & Jorn Jambers
Querying Metrics - Introduction
• Arithmetic binary operators 


Example: - (subtraction), * (multiplication), / (division), % (modulo), ^ (power/exponentiation)


• Comparison binary operators 


Example: == (equal), != (not-equal), > (greater-than), < (less-than) ,>= (greater-or-equal), <=
(less-or-equal)


• Logical/set binary operators

Example: and (intersection), or (union), unless (complement)


• Aggregation operators 


Example:sum (calculate sum over dimensions), min (select minimum over dimensions) ,max
(select maximum over dimensions), avg (calculate the average over dimensions), stddev
(calculate population standard deviation over dimensions), stdvar (calculate population standard
variance over dimensions), count (count number of elements in the vector), count_values (count
number of elements with the same value), bottomk (smallest k elements by sample value), topk
(largest k elements by sample value), quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
Demo
Querying
Prometheus
Service Discovery
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - Introduction
• De
fi
nition:


Service discovery is the automatic detection of devices and services
offered by these devices on a computer network.


• Not really a service discovery mechanism


• Cloud support for (AWS, Azure, Google,…)


• Cluster managers (Kubernetes, Marathon, …)


• Generic mechanisms (DNS, Consul, Zookeeper, …)
static_configs:


- targets: ['localhost:9090']
Prometheus
Service Discovery - Example AWS
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - Introduction
• EC2 Example:


Add following con
fi
g to /etc/prometheus/prometheus.yml


• Make sure the user has the following IAM role: AmazonEC2ReadOnlyAcces
s

• Make sure you security groups allow access to port (9100, 9090)


global:


scrape_interval: 1s


evaluation_interval: 1s


scrape_configs:


- job_name: 'node'


ec2_sd_configs:


- region: eu-west-1


access_key: PUT_THE_ACCESS_KEY_HERE


secret_key: PUT_THE_SECRET_KEY_HERE


port: 9100
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - Introduction
• EC2 Example:


Only monitor instances started with the name PROD
global:


scrape_interval: 1s


evaluation_interval: 1s


scrape_configs:


- job_name: 'node'


ec2_sd_configs:


- region: eu-west-1


access_key: PUT_THE_ACCESS_KEY_HERE


secret_key: PUT_THE_SECRET_KEY_HERE


port: 9100


relabel_configs:


# Only monitor instances with a tag Name starting with “PROD"


- source_labels: [__meta_ec2_tag_Name]


regex: PROD.*


action: keep


# Use the instance ID as the instance label


- source_labels: [__meta_ec2_instance_id]


target_label: instance
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - Introduction
• EC2 Example:


Relabel ip adress to instance id for convenience
global:


scrape_interval: 1s


evaluation_interval: 1s


scrape_configs:


- job_name: 'node'


ec2_sd_configs:


- region: eu-west-1


access_key: PUT_THE_ACCESS_KEY_HERE


secret_key: PUT_THE_SECRET_KEY_HERE


port: 9100


relabel_configs:


# Only monitor instances with a tag Name starting with “PROD"


- source_labels: [__meta_ec2_tag_Name]


regex: PROD.*


action: keep


# Use the instance ID as the instance label


- source_labels: [__meta_ec2_instance_id]


target_label: instance
Prometheus
Service Discovery - Example Kubernetes
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - Introduction
• Kubernetes Example:


Add following con
fi
g to /etc/prometheus/prometheus.yml


- job_name:'kubernetes'


kubernetes_sd_configs:


-


api_servers:


- https://kubernetes.default.svc


in_cluster: true


basic_auth:


username: prometheus


password: secret


retry_interval:5s


- job_name:’kubernetes-service-endpoints'


kubernetes_sd_configs:


-


api_servers:


- https://kube-master.prometheuscourse.com


in_cluster: true
Prometheus
Service Discovery - Example DNS
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - DNS
• DNS Example:


Add following con
fi
g to /etc/prometheus/prometheus.yml


- job_name: mysql


dns_sd_configs:


- names:


- metrics.mysql.example.com


- job_name: haproxy


dns_sd_configs:


- names:


- metrics.haproxy.example.com
Prometheus
Service Discovery - Example using
fi
le
Prometheus course: Edward Viaene & Jorn Jambers
Service Discovery - Using
fi
le
• File Example:


Add following con
fi
g to /etc/prometheus/prometheus.yml


• Format target.json
scrape_configs:


- job_name: 'dummy' # This is a default value, it is
mandatory.


file_sd_configs:


- files:


- targets.json
[


{


"targets": [ "myslave1:9104", "myslave2:9104" ],


"labels": {


"env": "prod",


"job": "mysql_slave"


}


},


{


"targets": [ "mymaster:9104" ],


"labels": {


"env": "prod",


"job": "mysql_master"


}


}


]
Prometheus
Exporters
Prometheus course: Edward Viaene & Jorn Jambers
Exporters - Introduction
• Build for exporting prometheus metrics from existing third-party metrics


• When Prometheus is not able to pull metrics directly(Linux sys stats, haproxy, …)


• Examples:


MySQL server exporter


Memcached exporter


Consul exporter


Node/system metrics exporter


MongoDB


Redis


Many more….


• https://prometheus.io/docs/instrumenting/exporters/
Prometheus course: Edward Viaene & Jorn Jambers
Exporters - Introduction
• We are already using one :-)


Check /etc/prometheus/prometheus.yml
- job_name: 'node_exporter'


scrape_interval: 5s


static_configs:


- targets: ['localhost:9100']
Prometheus
Alerting
Prometheus
Alerting - Introduction
Prometheus course: Edward Viaene & Jorn Jambers
Alerting - Introduction
• Alerting in Prometheus is separated into 2 parts


• Alerting rules in Prometheus server


• Alertmanager
Receivers
SLACK
EMAIL
Prometheu
s

Server
Routes
Alertmanage
r

Push alert
Rule 1 Rule 2 Rule n
Prometheus
Alerting - Alerting rules
Prometheus course: Edward Viaene & Jorn Jambers
Alerting Rules
• Rules live in Prometheus server con
fi
g


• Best practice to separate the alerts from the prometheus con
fi
g


• Add an include to /etc/prometheus/prometheus.yml


rule_files:


- "/etc/prometheus/alert.rules"


• Alert format:


• Alert example:


groups:


- name: example


rules:


- alert: cpuUsge


expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job='node_exporter',mode="idle"}[5m])) * 100) >
95


for: 1m


labels:


severity: critical


annotations:


summary: Machine under healvy load


ALERT <alert name>


IF <expression>


[ FOR <duration> ]


[ LABELS <label set> ]


[ ANNOTATIONS <label set> ]
Prometheus course: Edward Viaene & Jorn Jambers
Alerting Rules
• Alerting rules allow you to de
fi
ne the alert conditions


• Alerting rules sent the alerts being
fi
red to an external service


• The format of these alerts is in the Prometheus expression language


• Example:


groups:


- name: Important instance


rules:


# Alert for any instance that is unreachable for >5 minutes.


- alert: InstanceDown


expr: up == 0


for: 5m


labels:


severity: page


annotations:


summary: "Instance {{ $labels.instance }} down"


description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
Prometheus
Alerting - Alertmanager
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• Alertmanager handles the alerts
fi
red by the prometheus server


• Handles deduplication, grouping and routing of alerts


• Routes alerts to receivers (Pagerduty, Opsgenie, email, Slack,…)
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• Alertmanager Con
fi
guration (/etc/alertmanager/alertmanager.yml):
global:


smtp_smarthost: 'localhost:25'


smtp_from: 'alertmanager@prometheus.com'


smtp_auth_username: ''


smtp_auth_password: ''


templates:


- '/etc/alertmanager/template/*.tmpl'


route:


repeat_interval: 1h


receiver: operations-team


receivers:


- name: 'operations-team'


email_configs:


- to: 'operations-team+alerts@example.org'


slack_configs:


- api_url: https://hooks.slack.com/services/XXXXXX/XXXXXX/XXXXXX


channel: '#prometheus-course'


send_resolved: true
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• Prometheus Con
fi
guration (/etc/prometheus/prometheus.yml):
# my global config


global:


scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.


evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.


# scrape_timeout is set to the global default (10s).


# Alertmanager configuration


alerting:


alertmanagers:


- static_configs:


- targets:


- localhost:9093


# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.


rule_files:


# - "first_rules.yml"


# - "second_rules.yml"


# A scrape configuration containing exactly one endpoint to scrape:


# Here it's Prometheus itself.


scrape_configs:


# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.


- job_name: 'prometheus'


# metrics_path defaults to '/metrics'


# scheme defaults to 'http'.


static_configs:


- targets: ['localhost:9090']
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• Concepts:


• Grouping: Groups similar alerts into 1 noti
fi
cation


• Inhibition: Silence other alerts if one speci
fi
ed alert is already
fi
red


• Silences: A simple way to mute certain noti
fi
cations
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• High availability


• You can create a high available Alertmanager cluster using mesh con
fi
g


• Do not load balance this service!


• Use a list of Alertmanager nodes in Prometheus con
fi
g


• All alerts are sent to all known Alertmanager nodes


• No need to monitor the monitoring


• Guarantees the noti
fi
cation is at least send once
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• Alert states:


Inactive - No rule is met


Pending - Rule is met but can be suppressed due to validations


Firing - Alert is sent to the con
fi
gured channel(mail,Slack,…)


• Runs on port :9093
Prometheus course: Edward Viaene & Jorn Jambers
Alertmanager
• Notifying multiple destinations
route:


repeat_interval: 1h


receiver: operations-team


receivers:


- name: 'operations-team'


email_configs:


- to: 'operations-team+alerts@example.org'


slack_configs:


- api_url: https://hooks.slack.com/services/XXXXXX/XXXXXX/XXXXXX


channel: '#prometheus-course'


send_resolved: true
Prometheus
Setting up alerts
Prometheus
Setting up alerts - Demo
Prometheus course: Edward Viaene & Jorn Jambers
Setting up alerts
• Install Alertmanager


• Create con
fi
g for the Alertmanager


• Mail


• Slack


• Alter prometheus con
fi
g


• Setup an alert


• See the noti
fi
cation coming in when an alert is
fi
red
Prometheus internals
Prometheus
Architecture
Prometheus course: Edward Viaene & Jorn Jambers
Architecture
From: https://github.com/prometheus/prometheus
Prometheus
Storage
Prometheus course: Edward Viaene & Jorn Jambers
Storage
• You can use the default local on-disk storage, or optionally the remote
storage syste
m

• Local storage: a local time series database in a custom Prometheus
format


• Remote storage: you can read/write samples to a remote system in a
standardized format


• Currently it uses a snappy-compressed protocol buffer encoding
over HTTP, but might change in the future (to use gRPC or HTTP/2)
Prometheus course: Edward Viaene & Jorn Jambers
Remote Storage
• Remote storage is primarily focussed at long term storage


• Currently there are adapters available for the following solutions:
AppOptics: write Graphite: write
Chronix: write In
fl
uxDB: read and write
Cortex: read and write OpenTSDB: write
CrateDB: read and write PostgreSQL/TimescaleDB: read and write
Gnocchi: write SignalFx: write
Source: https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• Prometheus >=2.0 uses a new storage engine which dramatically increases scalability


• Ingested samples are grouped in blocks of two hours


• Those 2h samples are stored in separate directories (in the data directory of
prometheus)


• Writes are batched and written to disk in chunks, containing multiple data points
directory 1 directory 2 directory 3
2h of data:


chunks/00000
1

chunks/000002
2h of data:


chunks/00000
1

2h of data:


chunks/00000
1

chunks/000002
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• Every directory also has an index
fi
le (index) and a metadata
fi
le (meta.json)


• It stores the metric names and the labels, and provides an index from the
metric names and labels to the series in the chunk
fi
les
directory 1 directory 2 directory 3
chunks/000001


chunks/000002


meta.jso
n

index
chunks/000001


meta.jso
n

index
chunks/000001


chunks/000002


meta.jso
n

index
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• The most recent data is kept in memory


• You don’t want to loose the in-memory data during a crash, so the data also
needs to be persisted to disk. This is done using a write-ahead-log (WAL)
directory 1 directory 2 directory 3
chunks/000001


chunks/000002


meta.json


index
chunks/000001


meta.json


index
chunks/000001


chunks/000002


meta.json


index
wal:
000001


000002
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• Write Ahead Log (WAL)


• It’s quicker to append to a
fi
le (like a log) than making (multiple)
random read/writes


• If there’s a server crash and the data from memory is lost, then the WAL
will be replayed


• This way, no data will be lost or corrupted during a crash
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• When series gets deleted, a tombstone
fi
le gets created


• This is more ef
fi
cient than immediately deleting the data from the chunk
fi
les, as
the actual delete can happen at a later time (e.g. when there’s not a lot of load)
directory 1 directory 2 directory 3
2h of data:


chunks/000001


chunks/000002


meta.json


index


tombstone
2h of data:


chunks/000001


chunks/000002


meta.json


index


tombstone
2h of data:


chunks/000001


chunks/000002


meta.json


index


tombstone
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• The initial 2-hour blocks are merged in the background to form longer
blocks


• This is called compaction
directory 1 directory 2+3
2h of data:


chunks/000001


chunks/000002
4h of data
:

chunks/000001


chunks/000002
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• Block characteristics:


• A block on the
fi
lesystem is a directory with chunks


• You can see each block as a fully independent database containing
all time series for the window


• Every block of data, except the current block, is immutable (no
changes can be made)


• These non-overlapping blocks are actually a horizontal partitioning of
the ingested time series data
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• This horizontal partitioning gives a lot of bene
fi
ts:


• When querying, the blocks not in the time range can be skipped


• When completing a block, data only needs to be added, and not
modi
fi
ed (avoids write-ampli
fi
cation)


• Recent data is kept in memory, so can be queried quicker


• Deleting old data is only a matter of deleting directories on the
fi
lesystem
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• Compaction:


• When querying, blocks have to be merged together to be able to
calculate the results


• Too many blocks could cause too much merging overhead, so blocks
are compacted


• 2 blocks are merged and form a newly created (often larger) block


• Compaction can also modify data: dropping deleted data or
restructuring the chunks to increase the query performance
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• The index:


• Having horizontal partitioning already makes most queries quicker,
but not those that need to go through all the data to get the result


• The index is an inverted index to provide better query performance,
also in cases where all data needs to be queried


• Each series is assigned a unique ID (e.g. ID 1, 2, and 3)


• The index will contain an inverted index for the labels, for example
for label env=production, it’ll have 1 and 3 as IDs if those series
contain the label env=production
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• What about Disk size?


• On average, Prometheus needs 1-2 bytes per sample


• You can use the following formula to calculate the disk space needed:
needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
Prometheus course: Edward Viaene & Jorn Jambers
Local Storage
• How to reduce disk size?


• You can increase the scrape interval, which will get you less data


• You can decrease the targets or series you scrape


• Or you can can reduce the retention (how long you keep the data)
--storage.tsdb.retention: This determines when to remove old data. Defaults to 15d.
Prometheus course: Edward Viaene & Jorn Jambers
References
• To read the full story of Prometheus time series database, read the blog
post from Fabian Reinartz at https://fabxc.org/tsdb/
Prometheus
Security
Prometheus course: Edward Viaene & Jorn Jambers
Security
• At the moment Prometheus doesn’t offer any support for authentication or
encryption (TLS) on the server components


• They argue that they’re focussing on building a monitoring solution, and
want to avoid having to implement complex security features


• You can still enable authentication and TLS, using a reverse proxy


• This is only valid for server components, prometheus can scrape TLS and
authentication enabled targets


• See tls_con
fi
g in the prometheus con
fi
guration to con
fi
gure a CA certi
fi
cate,
user certi
fi
cate and user key


• You’d still need to setup a reverse proxy for the targets itself
Demo
Prometheus TLS and authentication
Demo
Prometheus mutual TLS for targets
Prometheus Use Cases
Monitoring a web app
Prometheus with python-
fl
ask and MySQL
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• I’m going to integrate prometheus monitoring with a web application
based on python


• I’ll use the of
fi
cial prometheus_client library for Python


• Flask is the web framework I’m going to use


• It will create an http server and I’ll able to con
fi
gure routes (e.g. /query)


• I’ll use mysqlclient for python to query a MySQL database


• I’ll include one normal query and one “bad behaving” query that will
take between 0 and 10 seconds to execute
Prometheus course: Edward Viaene & Jorn Jambers
The web app
Python web app
fl
ask http server (port 5000)
prometheus monitoring (port 8000)
MySQL Database (port 3306)
Prometheus (port 9090)
Client (curl, wget, browser)
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• I’m going to use the Counter and the Histogram metric types to capture the data:


• A Counter to capture the amount of times an http endpoint is hit + to capture the
amount of times a MySQL query is executed


• The value of the Counter must always increase, that’s why you should take
the Counter type for these types of data


• A Histogram to capture the latency of the HTTP requests and the MySQL Queries


• A Histogram samples observations (like latencies) and counts them in
con
fi
gurable buckets. It also provides a sum of all observed values.


• The default buckets are intended to cover a typical web/rpc request from
milliseconds to seconds
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
from prometheus_client import Counter, Histogram


FLASK_REQUEST_LATENCY = Histogram('
fl
ask_request_latency_seconds', 'Flask Request Latency',


['method', 'endpoint'])


FLASK_REQUEST_COUNT = Counter('
fl
ask_request_count', 'Flask Request Count',


['method', 'endpoint', 'http_status'])


MYSQL_REQUEST_LATENCY = Histogram('mysql_query_latency_seconds', 'MYSQL Query Latency',


['query'])


MYSQL_REQUEST_COUNT = Counter('mysql_query_count', 'Flask Request Count',


['query'])
• This is how I’m going to de
fi
ne the data types in Python for Prometheus:
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
start_time = time.time()


sql = “select * from table”


# do the query


query_latency = time.time() - start_time


MYSQL_REQUEST_LATENCY.labels(sql[:50]).observe(query_latency)


MYSQL_REQUEST_COUNT.labels(sql[:50]).inc()


• This is how we can calculate the latency of a query:
Demo
Monitor a web application with Prometheus
Demo
Monitor a web application with Prometheus - apdex score
Monitoring a web app
Prometheus with Java Spring boot
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• Introduction


• Application's health + Metrics


• Notice unwanted behaviour


• Monoliths as well as microservices


• Crucial in microservices architecture


• “To measure is to know”
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• We are going to integrate prometheus monitoring with a web
application based on Java Spring Boot


• We will use following :


• Spring Boot


• Spring Boot Actuato
r

• Micromete
r

• We are also going to do a demo with an example.
Prometheus course: Edward Viaene & Jorn Jambers
The web app
Spring Boot web app
Spring Boot http server (port 8080
)

/api/demo & /api/delayed/demo
prometheus monitoring (port 8080
)

/actuator/prometheus
Prometheus
Client (curl, wget, browser)
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• Spring Boot Actuato
r

• Sub-Project of Spring boot


• Production ready endpoints


• /actuator is the common pre
fi
x of these endpoints


• Protected by default


• Adjustable in application.properties


• Expose all: management.endpoints.web.exposure.include=*
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• Micromete
r

• Vendor-Neutral application metrics facade


• Support for Prometheus and many others:


• AWS Cloudwatch, Datadog, In
fl
uxDB/Telegraf, New Relic, …


• Transforms /actuator/metrics data into data your monitoring system
understands


• Only a vendor-speci
fi
c micrometer dependency in your application is
required
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• Micrometer


• pom.xml example


<dependency>


<groupId>org.springframework.boot</groupId>


<artifactId>spring-boot-starter-actuator</artifactId>


</dependency>


<dependency>


<groupId>io.micrometer</groupId>


<artifactId>micrometer-core</artifactId>


</dependency>


<dependency>


<groupId>io.micrometer</groupId>


<artifactId>micrometer-registry-prometheus</artifactId>


</dependency>
Prometheus course: Edward Viaene & Jorn Jambers
Monitoring a web app
• Spring Boot


• Code example


…


import io.micrometer.core.instrument.Metrics;


…


private Counter runCounter = Metrics.counter("runCounter");


…


@GetMapping("/api/demo")


@Timed


public String apiUse() throws InterruptedException {


runCounter.increment();


log.info("Hello world app accessed on /api/demo");


return "Hello world";


}
Demo
Monitor and instrument a Spring boot application
Grafana
Grafana Provisioning
Prometheus course: Edward Viaene & Jorn Jambers
Grafana Provisioning
• In one of the
fi
rst lectures I showed you how to setup Grafana using the UI


• Rather than using the UI, you can also use yaml and json
fi
les to provision
Grafana with datasources and dashboards


• This is a much more powerful way of using Grafana, as you can test new
dashboards
fi
rst on a dev / test server, then import the newly created
dashboards to production


• You can do the import manually through the UI, or using yaml and json
fi
les


• When using
fi
les, you can keep
fi
les within version control to keep
changes, revisions and backups
Prometheus course: Edward Viaene & Jorn Jambers
Grafana Provisioning
• The con
fi
guration of Grafana is all kept in /etc/grafana:


• The data is kept in /var/lib/grafana:
/etc/grafana/:
 

-rw-r----- 1 root grafana 14K Jul 17 12:30 grafana.ini


-rw-r----- 1 root grafana 3.4K Jul 17 12:30 ldap.toml


drwxr-xr-x 4 root grafana 4.0K Jul 17 13:15 provisioning/


/etc/grafana/provisioning/
:

drwxr-xr-x 2 root grafana 4.0K Jul 17 14:56 dashboards/


drwxr-xr-x 2 root grafana 4.0K Jul 17 15:34 datasources/
/var/lib/grafana
:

drwxr-xr-x 2 root root 4.0K Jul 17 15:47 dashboards/


-rw-r----- 1 grafana grafana 500K Jul 17 15:48 grafana.db


drwxr-x--- 2 grafana grafana 4.0K Jul 17 12:31 plugins/


drwx------ 5 grafana grafana 4.0K Jul 17 12:40 sessions/
Prometheus course: Edward Viaene & Jorn Jambers
Grafana Provisioning
• You can change the database & paths in /etc/grafana/grafana.ini
[paths]


# Path to where grafana can store temp
fi
les, sessions, and the sqlite3 db (if that is used)


;data = /var/lib/grafana


# Directory where grafana can store logs


;logs = /var/log/grafana


# Directory where grafana will automatically scan and look for plugins


;plugins = /var/lib/grafana/plugins


# folder that contains provisioning con
fi
g
fi
les that grafana will apply on startup and while running.


;provisioning = conf/provisioning


…


[database]


# Either "mysql", "postgres" or "sqlite3", it's your choice


;type = sqlite3


;host = 127.0.0.1:3306


;name = grafana


;user = root


# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""


;password =
Demo
Grafana Provisioning
Prometheus Use Cases
Cloudwatch exporter
Prometheus course: Edward Viaene & Jorn Jambers
Use Cases - Cloudwatch
• Cloudwatch exporter


• Installation


• Con
fi
guration (exporter + AWS)


• Charges + measuring them


• Querying metrics
Prometheus Use Cases
Consul integration
Prometheus course: Edward Viaene & Jorn Jambers
Consul integration
• Consul is a distributed, highly available solution providing:


• A Service Mesh


• Service Discovery


• Health checks for your services


• A Key-Value store


• Secure Service Communications


• Multi-datacenter support


• Consul is often deployed in conjunction with Docker
Prometheus course: Edward Viaene & Jorn Jambers
Consul integration
• There are 2 integrations that are interesting to use:


• 1) Prometheus can scrape Consul’s metrics and provide you with all
sorts of information about your running services


• Consul provides Service Discovery, so it knows where services are
running and what the current state of it is


• 2) Consul can be integrated within Prometheus to automatically add the
services as target
s

• Consul will discover your services, and these can then be
automatically added to Prometheus as a target
Prometheus course: Edward Viaene & Jorn Jambers
Consul integration
• In the next demo I’ll focus on the Prometheus integration with Consul, not really on
implementing consul itself


• I’ll show you the installation of consul, but not how to integrate consul with your
infrastructure (it’s out of scope for this Prometheus course)


• I’ll manually register a service to consul, rather than setting up service
discovery


• In production environments, where you have a lot of services, service
discovery using consul will allow you to register your services automatically


• If you are interested in Consul, have a look at the documentation at https://
www.consul.io/ and you can
fi
nd the service registrator for docker at https://
github.com/gliderlabs/registrator
Demo
Consul integration
Prometheus Use Cases
EC2 auto discovery
Prometheus course: Edward Viaene & Jorn Jambers
EC2 auto discovery
• Service discovery is the automatic detection of devices and services
offered by these devices on a computer network.


• In this Use Case we will:


• Create prerequisites in AWS (IAM role, Security Groups, EC2 instances)


• Alter Prometheus con
fi
g (/etc/prometheus/prometheus.yml)


• Query the data in Grafana


• https://github.com/in4it/prometheus-course/blob/master/use-cases/ec2-
auto-discovery/lab.txt
Prometheus on Kubernetes
Getting Kubernetes metrics

More Related Content

What's hot

Terraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeTerraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeMartin Schütte
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With PrometheusKnoldus Inc.
 
Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Anton Babenko
 
Learning Docker from Square One
Learning Docker from Square OneLearning Docker from Square One
Learning Docker from Square OneDocker, Inc.
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingAraf Karsh Hamid
 
Terraform Introduction
Terraform IntroductionTerraform Introduction
Terraform Introductionsoniasnowfrog
 
Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overviewRyan Hodgin
 
Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018Anton Babenko
 
VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020VMware Tanzu
 
Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...
Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...
Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...Amazon Web Services
 
Building a well-engaged and secure AWS account access management - FND207-R ...
 Building a well-engaged and secure AWS account access management - FND207-R ... Building a well-engaged and secure AWS account access management - FND207-R ...
Building a well-engaged and secure AWS account access management - FND207-R ...Amazon Web Services
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaSyah Dwi Prihatmoko
 
Terraform introduction
Terraform introductionTerraform introduction
Terraform introductionJason Vance
 
Best Practices for Getting Started with AWS
Best Practices for Getting Started with AWSBest Practices for Getting Started with AWS
Best Practices for Getting Started with AWSAmazon Web Services
 

What's hot (20)

Final terraform
Final terraformFinal terraform
Final terraform
 
Terraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeTerraform -- Infrastructure as Code
Terraform -- Infrastructure as Code
 
Introduction to Amazon EKS
Introduction to Amazon EKSIntroduction to Amazon EKS
Introduction to Amazon EKS
 
Terraform
TerraformTerraform
Terraform
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019Terraform AWS modules and some best practices - September 2019
Terraform AWS modules and some best practices - September 2019
 
Learning Docker from Square One
Learning Docker from Square OneLearning Docker from Square One
Learning Docker from Square One
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
Terraform Introduction
Terraform IntroductionTerraform Introduction
Terraform Introduction
 
Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overview
 
Terraform Basics
Terraform BasicsTerraform Basics
Terraform Basics
 
Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018
 
VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020VMware Tanzu Introduction- June 11, 2020
VMware Tanzu Introduction- June 11, 2020
 
Terraform
TerraformTerraform
Terraform
 
Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...
Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...
Deconstructing SaaS: Deep Dive into Building Multi-Tenant Solutions on AWS (A...
 
Building a well-engaged and secure AWS account access management - FND207-R ...
 Building a well-engaged and secure AWS account access management - FND207-R ... Building a well-engaged and secure AWS account access management - FND207-R ...
Building a well-engaged and secure AWS account access management - FND207-R ...
 
DevOps on AWS
DevOps on AWSDevOps on AWS
DevOps on AWS
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Terraform introduction
Terraform introductionTerraform introduction
Terraform introduction
 
Best Practices for Getting Started with AWS
Best Practices for Getting Started with AWSBest Practices for Getting Started with AWS
Best Practices for Getting Started with AWS
 

Similar to Prometheus Monitoring Course Introduction

Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusGrafana Labs
 
Rock Solid Deployment of Web Applications
Rock Solid Deployment of Web ApplicationsRock Solid Deployment of Web Applications
Rock Solid Deployment of Web ApplicationsPablo Godel
 
Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Amazon Web Services
 
Anatomy of a Build Pipeline
Anatomy of a Build PipelineAnatomy of a Build Pipeline
Anatomy of a Build PipelineSamuel Brown
 
Automated Testing on Web Applications
Automated Testing on Web ApplicationsAutomated Testing on Web Applications
Automated Testing on Web ApplicationsSamuel Borg
 
Scaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosScaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosThomas Riley
 
Getting started with Octopus Deploy
Getting started with Octopus DeployGetting started with Octopus Deploy
Getting started with Octopus DeployKaroline Klever
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1MvkZ
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1sKaushikNarayanan
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1sKaushikNarayanan
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1MvkZ
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1MvkZ
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1MvkZ
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1sKaushikNarayanan
 
[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric CloudPerforce
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...PranavPatil822557
 
Docker presentasjon java bin
Docker presentasjon java binDocker presentasjon java bin
Docker presentasjon java binOlve Hansen
 
Testing - How Vital and How Easy to use
Testing - How Vital and How Easy to useTesting - How Vital and How Easy to use
Testing - How Vital and How Easy to useUma Ghotikar
 
No Compromise - Better, Stronger, Faster Java in the Cloud
No Compromise - Better, Stronger, Faster Java in the CloudNo Compromise - Better, Stronger, Faster Java in the Cloud
No Compromise - Better, Stronger, Faster Java in the CloudAll Things Open
 

Similar to Prometheus Monitoring Course Introduction (20)

Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Rock Solid Deployment of Web Applications
Rock Solid Deployment of Web ApplicationsRock Solid Deployment of Web Applications
Rock Solid Deployment of Web Applications
 
Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401Application Delivery Patterns for Developers - Technical 401
Application Delivery Patterns for Developers - Technical 401
 
Anatomy of a Build Pipeline
Anatomy of a Build PipelineAnatomy of a Build Pipeline
Anatomy of a Build Pipeline
 
Automated Testing on Web Applications
Automated Testing on Web ApplicationsAutomated Testing on Web Applications
Automated Testing on Web Applications
 
Scaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with ThanosScaling Prometheus on Kubernetes with Thanos
Scaling Prometheus on Kubernetes with Thanos
 
Getting started with Octopus Deploy
Getting started with Octopus DeployGetting started with Octopus Deploy
Getting started with Octopus Deploy
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Big datatraining.in devops-part1
Big datatraining.in devops-part1Big datatraining.in devops-part1
Big datatraining.in devops-part1
 
Application Delivery Patterns
Application Delivery PatternsApplication Delivery Patterns
Application Delivery Patterns
 
[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud[India Merge World Tour] Electric Cloud
[India Merge World Tour] Electric Cloud
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
 
Docker presentasjon java bin
Docker presentasjon java binDocker presentasjon java bin
Docker presentasjon java bin
 
Testing - How Vital and How Easy to use
Testing - How Vital and How Easy to useTesting - How Vital and How Easy to use
Testing - How Vital and How Easy to use
 
No Compromise - Better, Stronger, Faster Java in the Cloud
No Compromise - Better, Stronger, Faster Java in the CloudNo Compromise - Better, Stronger, Faster Java in the Cloud
No Compromise - Better, Stronger, Faster Java in the Cloud
 

Recently uploaded

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Prometheus Monitoring Course Introduction

  • 3. Spinnaker course: Edward Viaene & Jorn Jambers Prometheus • Thank you for taking this Prometheus Cours e • Open-source monitoring solution & time series database • Was built by Soundclou d • Very active developer and user community • Now its a standalone open source project • Joined the Cloud Native Computing Foundation in 2016 • Ideal for monitoring on premise as well as cloud workloads
  • 4. Prometheus course: Edward Viaene & Jorn Jambers Course Overview Introduction Monitoring Alerting Internals Use cases What is Prometheus Client Libraries Introduction Storage Grafana Provisioning Installing Prometheus & Grafana Pushing metrics Setting up alerts Security Cloudwatch Integration Concepts Querying Con fi guration Service Discovery Monitoring Nodes Exporters Architecture
  • 5. Prometheus course: Edward Viaene & Jorn Jambers Course Objectives • To be able to use Prometheus • To get familiar with the Prometheus ecosystem • To setup a monitoring platform usin g • Prometheus • To create alerts in Prometheus • To be able to query Prometheus data
  • 6. Prometheus course: Edward Viaene & Jorn Jambers Who is Edward Viaene • My name is Edward Viaene • I am a consultant and trainer in Cloud and Big Data technologies • I’m a big advocate of Agile and DevOps techniques • I held various roles from banking to startups • I have a background in Unix/Linux, Networks, Security, Risk, and distributed computing • Nowadays I specialize in everything that has to do with Cloud and DevOps
  • 7. Prometheus course: Edward Viaene & Jorn Jambers Who is Jorn Jambers • My name is Jorn Jambers • I am a freelance DevOps consultant and trainer • DevOps advocate • Worked in banks, consultancy companies and startups • In the latter I found my passion for DevOps • I have a background in Unix/Linux, Hadoop, DBA, Networks, automations • Today I help companies succeed on the public cloud
  • 8. Prometheus course: Edward Viaene & Jorn Jambers Online Training • Online training on Udemy • DevOps, Distributed Computing, Cloud, Big Data • Using online video lectures • 40,000+ enrolled students in 100+ countries
  • 10. Prometheus course: Edward Viaene & Jorn Jambers Prometheus • Prometheus is an Open source monitoring solution • Started at SoundCloud around 2012-2013, and was made public in early 2015 • Prometheus provides Metrics & Alerting • It is inspired by Google’s Borgmon, which uses time-series data as a datasource, to then send alerts based on this data • It fi ts very well in the cloud native infrastructur e • Prometheus is also a member of the CNCF (Cloud Native Foundation)
  • 11. Prometheus course: Edward Viaene & Jorn Jambers Prometheus • In Prometheus we talk about Dimensional Data: time series are identi fi ed by metric name and a set of key/value pairs • Prometheus includes a Flexible Query Language • Visualizations can be shown using a built-in expression browser or with integrations like Grafana • It stores metrics in memory and local disk in an own custom, ef fi cient format • It is written in Go • Many client libraries and integrations available Metric name Label Sample Temperature location=outside 90
  • 12. Prometheus course: Edward Viaene & Jorn Jambers How does Prometheus work? • Prometheus collects metrics from monitored targets by scraping metrics HTTP endpoint s • This is fundamentally different than other monitoring and alerting systems, (except this is also how Google’s Borgmon works) • Rather than using custom scripts that check on particular services and systems, the monitoring data itself is use d • Scraping endpoints is much more ef fi cient than other mechanisms, like 3rd party agents • A single prometheus server is able to ingest up to one million samples per second as several million time series Database Windows Server Application Prometheus HTTP HTTP HTTP
  • 14. Prometheus course: Edward Viaene & Jorn Jambers Prometheus Installation • I will install Prometheus using scripts from our GitHub repository (https://github.com/ in4it/prometheus-course) • They will work on any modern Linux distribution • I’ll install it on a DigitalOcean droplet • Feel free to use the scripts with any Cloud Provider, Virtual Machine, or Docker image, as long as it’s a recent Linux distribution • To get a free $100 coupon on DigitalOcean, valid for 60-days with a valid payment method added, use the following link: 
 
 https://m.do.co/c/b71b388ab76f • $10 is enough to run a 2 GB memory droplet for one month
  • 15. Prometheus course: Edward Viaene & Jorn Jambers Prometheus Installation • If you do not want to use the provided scripts, you can download the full distribution from https://github.com/prometheus/prometheus/releases • MacOS, Windows, Linux, and some Unix distributions are supported • After extracting you’ll get a prometheus executable (prometheus.exe for windows), which you can use to run prometheus, for example: • ./prometheus --con fi g. fi le /path/to/prometheus.yaml • It’s best to use the scripts we provided so that your environment is the same as ours when you follow the demos
  • 18. Prometheus course: Edward Viaene & Jorn Jambers Concepts • All data is stored as time series • Every time series is identi fi ed by the “metric name” and a set of key-value pairs, called labels • metric: go_memstat_alloc_bytes • instance=localhost:9090 • job=prometheus
  • 19. Prometheus course: Edward Viaene & Jorn Jambers Prometheus • The time series data also consists of the actual data, called Samples: • It can be a fl oat64 value • or a millisecond-precision timestamp
  • 20. Prometheus course: Edward Viaene & Jorn Jambers Prometheus • The notation of time series is often using this notation: • <metric name>{<label name>=<label value>, …} • For example: • node_boot_time{instance="localhost:9100",job="node_exporter"}
  • 22. Prometheus course: Edward Viaene & Jorn Jambers Prometheus Con fi guration • The con fi guration is stored in the Prometheus con fi guration fi le, in yaml format • The con fi guration fi le can be changed and applied, without having to restart Prometheus • A reload can be done by executing kill -SIGHUP <pid> • You can also pass parameters ( fl ags) at startup time to ./prometheus • Those parameters cannot be changed without restarting Prometheus • The con fi guration fi le is passed using the fl ag --con fi g. fi le
  • 23. Prometheus course: Edward Viaene & Jorn Jambers Prometheus Con fi guration • The default con fi guration looks like this: # my global config global: scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. # scrape_timeout is set to the global default (10s). # Alertmanager configuration alerting: alertmanagers: - static_configs: - targets: # - alertmanager:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. rule_files: # - "first_rules.yml" # - "second_rules.yml"
  • 24. Prometheus course: Edward Viaene & Jorn Jambers Prometheus Con fi guration • To scrape metrics, you need to add con fi guration to the prometheus con fi g fi le • For example, to scrape metrics from prometheus itself, the following code block is added by default # A scrape configuration containing exactly one endpoint to scrape: # Here it's Prometheus itself. scrape_configs: # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. - job_name: 'prometheus' # metrics_path defaults to '/metrics' # scheme defaults to 'http'. static_configs: - targets: ['localhost:9090']
  • 27. Prometheus course: Edward Viaene & Jorn Jambers Monitor nodes • To monitor nodes, you need to install the node-exporter • The node exporter will expose machine metrics of Linux / *Nix machines • For example: cpu usage, memory usage • The node exporter can be used to monitor machines, and later on, you can create alerts based on these ingested metric s • For Windows, there’s a WMI exporter (see https://github.com/martinlindhe/ wmi_exporter)
  • 28. Prometheus course: Edward Viaene & Jorn Jambers Monitor nodes Linux machine node exporter HTTP Windows machine WMI exporter HTTP Prometheus Prometheus
  • 32. Prometheus course: Edward Viaene & Jorn Jambers Monitoring - Introduction • Client Libraries • Pushing Metrics • Querying • Service Discovery • Exporters
  • 34. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • Instrumenting your code • Libraries • Of fi cial: Go, Java/Scala, Python, Ruby • Unof fi cial: Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for Nginx, Lua for Tarantool, .NET / C#, Node.js, PHP, Rust • No client library available? • Implement it yourself in one of the supported exposition formats
  • 35. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • Exposition formats: • Simple text-based format • Protocol-buffer format (Prometheus 2.0 removed support for the protocol-buffer format) metric_name [ "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}" ] value [ timestamp ] node_filesystem_avail_bytes{device="/dev/vda1",fstype="ext4",mountpoint="/"} 4.9386491904e+10 node_filesystem_avail_bytes{device="/dev/vda15",fstype="vfat",mountpoint="/boot/efi"} 1.05903104e+08 node_filesystem_avail_bytes{device="lxcfs",fstype="fuse.lxcfs",mountpoint="/var/lib/lxcfs"} 0 node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 2.01273344e+08 node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run/lock"} 5.24288e+06
  • 36. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • 4 types of metrics • Counte r • Gaug e • Histogra m • Summary
  • 37. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • Counter 
 
 A value that only goes up (e.g. Visits to a website)
  • 38. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • Gauge 
 
 Single numeric value that can go up and down (e.g. CPU load, temperature)
  • 39. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • Histogram 
 
 Samples observations (e.g. request durations or response sizes) and these observations get counted into buckets. Includes (_count and _sum) 
 
 Main purpose is calculating quantiles
  • 40. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Introduction • Summary 
 
 Similar to a histogram, a summary samples observations (e.g. request durations or response sizes). A summary also provides a total count of observations and a sum of all observed values, it calculates con fi gurable quantiles over a sliding time window. 
 
 Example: You need 2 counters for calculating the latency 
 1) total request(_count) 
 2) the total latency of those requests (_sum) 
 
 Take the rate() and divide = average latency
  • 42. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • https://github.com/prometheus/client_python • Of fi cially supported language • pip install prometheus_client • Supported metrics: Counter, Gauge, Summary and Histogram 

  • 43. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 44. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 45. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 46. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 47. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 48. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 49. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint=“/prometheus-course/jorn", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 50. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 import random, time from flask import Flask, render_template_string, abort from prometheus_client import generate_latest, REGISTRY, Counter, Gauge, Histogram app = Flask(__name__) REQUESTS = Counter('http_requests_total', 'Total HTTP Requests (count)', ['method', 'endpoint', 'status_code']) IN_PROGRESS = Gauge('http_requests_inprogress', 'Number of in progress HTTP requests') TIMINGS = Histogram('http_request_duration_seconds', 'HTTP request latency (seconds)') @app.route('/') @TIMINGS.time() @IN_PROGRESS.track_inprogress() def hello_world(): REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc() # Increment the counter return 'Hello, World!' @app.route('/prometheus-course/<name>') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def index(name): REQUESTS.labels(method='GET', endpoint=“/prometheus-course/jorn", status_code=200).inc() return render_template_string('<b>Hello {{name}} welcome!</b>!', name=name) @app.route('/metrics') @IN_PROGRESS.track_inprogress()
  • 51. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Python Example • Python example: 
 @app.route('/metrics') @IN_PROGRESS.track_inprogress() @TIMINGS.time() def metrics(): REQUESTS.labels(method='GET', endpoint="/metrics", status_code=200).inc() return generate_latest(REGISTRY) if __name__ == "__main__": app.run(host='0.0.0.0')
  • 53. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • https://github.com/prometheus/client_golang • Of fi cially supported language • Easy to implement: • Supported metrics: Counter, Gauge, Summary and Histogram package main import ( "github.com/prometheus/client_golang/prometheus/promhttp" "net/http" ) func main() { http.Handle("/metrics", promhttp.Handler()) panic(http.ListenAndServe(":8080", nil)) }
  • 54. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Gauge 
 import "github.com/prometheus/client_golang/prometheus" var jobsInQueue = prometheus.NewGauge( prometheus.GaugeOpts{ Name: "jobs_queued", Help: "Current number of jobs queued", }, ) func init(){ promtheus.MustRegister(jobsQueued) } 
 
 func enqueueJob(job Job) { queue.Add(job) jobsInQueue.Inc() } func runNextJob() { job := queue.Dequeue() jobsInQueue.Dec() job.Run() }
  • 55. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Gauge 
 import "github.com/prometheus/client_golang/prometheus" var jobsQueued = prometheus.NewGauge( prometheus.GaugeOpts{ Name: "jobs_queued", Help: "Current number of jobs queued", }, ) func init(){ promtheus.MustRegister(jobsQueued) } 
 
 func enqueueJob(job Job) { queue.Add(job) jobsQueued.Inc() } func runNextJob() { job := queue.Dequeue() jobsInQueue.Dec() job.Run() }
  • 56. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Adding labels 
 import "github.com/prometheus/client_golang/prometheus" var jobsQueued = prometheus.NewGaugeVec( prometheus.GaugeOpts{ Name: "jobs_queued", Help: "Current number of jobs in the queue", }, []string{"job_type"}, ) func init(){ promtheus.MustRegister(jobsQueued) } 
 
 func enqueueJob(job Job) { queue.Add(job) jobsInQueue.WithLabelValues(job.Type()).Inc() } func runNextJob() { job := queue.Dequeue() jobsInQueue.WithLabelValues(job.Type()).Dec() job.Run() }
  • 57. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Adding labels 
 import "github.com/prometheus/client_golang/prometheus" var jobsQueued = prometheus.NewGaugeVec( prometheus.GaugeOpts{ Name: "jobs_queued", Help: "Current number of jobs in the queue", }, []string{"job_type"}, ) func init(){ promtheus.MustRegister(jobsQueued) } 
 
 func enqueueJob(job Job) { queue.Add(job) jobsQueued.WithLabelValues(job.Type()).Inc() } func runNextJob() { job := queue.Dequeue() jobsQueued.WithLabelValues(job.Type()).Dec() job.Run() }
  • 58. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Histogram 
 import "github.com/prometheus/client_golang/prometheus" var jobsDurationHistogram = prometheus.NewHistogramVec( prometheus.HistogramOpts{ Name: "jobs_duration_seconds", Help: "Jobs duration distribution", Buckets: []float64{1, 2, 5, 10, 20, 60}, }, []string{"job_type"}, ) start := time.Now() job.Run() duration := time.Since(start) jobsDurationHistogram.WithLabelValues(job.Type()).Observe(duration.Seconds())
  • 59. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Histogram 
 import "github.com/prometheus/client_golang/prometheus" var jobsDurationHistogram = prometheus.NewHistogramVec( prometheus.HistogramOpts{ Name: "jobs_duration_seconds", Help: "Jobs duration distribution", Buckets: []float64{1, 2, 5, 10, 20, 60}, }, []string{"job_type"}, ) start := time.Now() job.Run() duration := time.Since(start) jobsDurationHistogram.WithLabelValues(job.Type()).Observe(duration.Seconds())
  • 60. Prometheus course: Edward Viaene & Jorn Jambers Client Libraries - Golang Example • Summary 
 prometheus.NewSummary()
  • 62. Prometheus course: Edward Viaene & Jorn Jambers Pushing Metrics - Introduction App Prometheus Push Gateway Push 
 Metrics Pull 
 Metrics • https://github.com/prometheus/pushgateway • Diagram
  • 63. Prometheus course: Edward Viaene & Jorn Jambers Pushing Metrics - Introduction • Sometimes metrics cannot be scraped 
 Example: batch jobs, servers are not reachable due to NAT, fi rewal l • Pushgateway is used as an intermediary service which allows you to push metrics. • Pitfall s • Most of the times this is a single instance so this results in a SPOF • Prometheus’s automatic instance health monitoring is not possible • The Pushgateway never forgets the metrics unless they are deleted via the api 
 example: 
 
 curl -X DELETE http://localhost:9091/metrics/job/prom_course/instance/localhost 

  • 64. Prometheus course: Edward Viaene & Jorn Jambers Pushing Metrics - Introduction • Only 1 valid use case for the Pushgateway • Service-level batch jobs and not related to a speci fi c machine • If NAT or/both fi rewall is blocking you from using the pull mechanism • Move the Prometheus server on the same network
  • 65. Prometheus course: Edward Viaene & Jorn Jambers Pushing Metrics - Python Example • Python example: • Pushgateway functions take a grouping key. • push_to_gateway replaces metrics with the same grouping key • pushadd_to_gateway only replaces metrics with the same name and grouping key • delete_from_gateway deletes metrics with the given job and grouping key. from prometheus_client import CollectorRegistry, Gauge, push_to_gateway registry = CollectorRegistry() g = Gauge('job_last_success_unixtime', ‘Last time the course batch job has finished', registry=registry) g.set_to_current_time() push_to_gateway('localhost:9091', job='batchA', registry=registry)
  • 67. Prometheus course: Edward Viaene & Jorn Jambers Pushing Metrics - Go Example • Go example: 
 package main import ( "flag" "log" "net/http" "github.com/prometheus/client_golang/prometheus/promhttp" "github.com/prometheus/client_golang/prometheus/push" ) gatewayUrl:="http://localhost:9091/"   throughputGuage := prometheus.NewGauge(prometheus.GaugeOpts{ Name: “throughput”, Help: "Throughput in Mbps", }) throughputGuage.Set(800)   if err := push.Collectors( "throughput_job", push.HostnameGroupingKey(), gatewayUrl, throughputGuage ); err != nil { fmt.Println("Could not push completion time to Pushgateway:", err) }
  • 69. Prometheus course: Edward Viaene & Jorn Jambers Querying Metrics - Introduction • Prometheus provides a functional expression language called PromQL • Provides built in operators and functions • Vector-based calculations like Excel • Expressions over time-series vectors • PromQL is read-only • Example: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job='node_exporter',mode="idle"}[5m])) * 100)
  • 71. Prometheus course: Edward Viaene & Jorn Jambers Querying Metrics - Introduction • Instant vector - a set of time series containing a single sample for each time series, all sharing the same timestamp 
 Example: node_cpu_seconds_total • Range vector - a set of time series containing a range of data points over time for each time series 
 Example: node_cpu_seconds_total[5m] • Scalar - a simple numeric fl oating point value 
 Example: -3.14 • String - a simple string value; currently unused 
 Example: foobar
  • 73. Prometheus course: Edward Viaene & Jorn Jambers Querying Metrics - Introduction • Arithmetic binary operators  
 Example: - (subtraction), * (multiplication), / (division), % (modulo), ^ (power/exponentiation) • Comparison binary operators  
 Example: == (equal), != (not-equal), > (greater-than), < (less-than) ,>= (greater-or-equal), <= (less-or-equal) • Logical/set binary operators
 Example: and (intersection), or (union), unless (complement) • Aggregation operators  
 Example:sum (calculate sum over dimensions), min (select minimum over dimensions) ,max (select maximum over dimensions), avg (calculate the average over dimensions), stddev (calculate population standard deviation over dimensions), stdvar (calculate population standard variance over dimensions), count (count number of elements in the vector), count_values (count number of elements with the same value), bottomk (smallest k elements by sample value), topk (largest k elements by sample value), quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
  • 76. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - Introduction • De fi nition: 
 Service discovery is the automatic detection of devices and services offered by these devices on a computer network. • Not really a service discovery mechanism • Cloud support for (AWS, Azure, Google,…) • Cluster managers (Kubernetes, Marathon, …) • Generic mechanisms (DNS, Consul, Zookeeper, …) static_configs: - targets: ['localhost:9090']
  • 78. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - Introduction • EC2 Example: 
 Add following con fi g to /etc/prometheus/prometheus.yml • Make sure the user has the following IAM role: AmazonEC2ReadOnlyAcces s • Make sure you security groups allow access to port (9100, 9090) 
 global: scrape_interval: 1s evaluation_interval: 1s scrape_configs: - job_name: 'node' ec2_sd_configs: - region: eu-west-1 access_key: PUT_THE_ACCESS_KEY_HERE secret_key: PUT_THE_SECRET_KEY_HERE port: 9100
  • 79. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - Introduction • EC2 Example: 
 Only monitor instances started with the name PROD global: scrape_interval: 1s evaluation_interval: 1s scrape_configs: - job_name: 'node' ec2_sd_configs: - region: eu-west-1 access_key: PUT_THE_ACCESS_KEY_HERE secret_key: PUT_THE_SECRET_KEY_HERE port: 9100 relabel_configs: # Only monitor instances with a tag Name starting with “PROD" - source_labels: [__meta_ec2_tag_Name] regex: PROD.* action: keep # Use the instance ID as the instance label - source_labels: [__meta_ec2_instance_id] target_label: instance
  • 80. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - Introduction • EC2 Example: 
 Relabel ip adress to instance id for convenience global: scrape_interval: 1s evaluation_interval: 1s scrape_configs: - job_name: 'node' ec2_sd_configs: - region: eu-west-1 access_key: PUT_THE_ACCESS_KEY_HERE secret_key: PUT_THE_SECRET_KEY_HERE port: 9100 relabel_configs: # Only monitor instances with a tag Name starting with “PROD" - source_labels: [__meta_ec2_tag_Name] regex: PROD.* action: keep # Use the instance ID as the instance label - source_labels: [__meta_ec2_instance_id] target_label: instance
  • 81. Prometheus Service Discovery - Example Kubernetes
  • 82. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - Introduction • Kubernetes Example: 
 Add following con fi g to /etc/prometheus/prometheus.yml 
 - job_name:'kubernetes' kubernetes_sd_configs: - api_servers: - https://kubernetes.default.svc in_cluster: true basic_auth: username: prometheus password: secret retry_interval:5s - job_name:’kubernetes-service-endpoints' kubernetes_sd_configs: - api_servers: - https://kube-master.prometheuscourse.com in_cluster: true
  • 84. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - DNS • DNS Example: 
 Add following con fi g to /etc/prometheus/prometheus.yml 
 - job_name: mysql dns_sd_configs: - names: - metrics.mysql.example.com - job_name: haproxy dns_sd_configs: - names: - metrics.haproxy.example.com
  • 85. Prometheus Service Discovery - Example using fi le
  • 86. Prometheus course: Edward Viaene & Jorn Jambers Service Discovery - Using fi le • File Example: 
 Add following con fi g to /etc/prometheus/prometheus.yml • Format target.json scrape_configs: - job_name: 'dummy' # This is a default value, it is mandatory. file_sd_configs: - files: - targets.json [ { "targets": [ "myslave1:9104", "myslave2:9104" ], "labels": { "env": "prod", "job": "mysql_slave" } }, { "targets": [ "mymaster:9104" ], "labels": { "env": "prod", "job": "mysql_master" } } ]
  • 88. Prometheus course: Edward Viaene & Jorn Jambers Exporters - Introduction • Build for exporting prometheus metrics from existing third-party metrics • When Prometheus is not able to pull metrics directly(Linux sys stats, haproxy, …) • Examples: 
 MySQL server exporter 
 Memcached exporter 
 Consul exporter 
 Node/system metrics exporter 
 MongoDB 
 Redis 
 Many more…. • https://prometheus.io/docs/instrumenting/exporters/
  • 89. Prometheus course: Edward Viaene & Jorn Jambers Exporters - Introduction • We are already using one :-) 
 Check /etc/prometheus/prometheus.yml - job_name: 'node_exporter' scrape_interval: 5s static_configs: - targets: ['localhost:9100']
  • 92. Prometheus course: Edward Viaene & Jorn Jambers Alerting - Introduction • Alerting in Prometheus is separated into 2 parts • Alerting rules in Prometheus server • Alertmanager Receivers SLACK EMAIL Prometheu s Server Routes Alertmanage r Push alert Rule 1 Rule 2 Rule n
  • 94. Prometheus course: Edward Viaene & Jorn Jambers Alerting Rules • Rules live in Prometheus server con fi g • Best practice to separate the alerts from the prometheus con fi g • Add an include to /etc/prometheus/prometheus.yml 
 rule_files: 
 - "/etc/prometheus/alert.rules" • Alert format: 
 • Alert example: 
 groups: - name: example rules: - alert: cpuUsge expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job='node_exporter',mode="idle"}[5m])) * 100) > 95 for: 1m labels: severity: critical annotations: summary: Machine under healvy load ALERT <alert name> IF <expression> [ FOR <duration> ] [ LABELS <label set> ] [ ANNOTATIONS <label set> ]
  • 95. Prometheus course: Edward Viaene & Jorn Jambers Alerting Rules • Alerting rules allow you to de fi ne the alert conditions • Alerting rules sent the alerts being fi red to an external service • The format of these alerts is in the Prometheus expression language • Example: groups: - name: Important instance rules: # Alert for any instance that is unreachable for >5 minutes. - alert: InstanceDown expr: up == 0 for: 5m labels: severity: page annotations: summary: "Instance {{ $labels.instance }} down" description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  • 97. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • Alertmanager handles the alerts fi red by the prometheus server • Handles deduplication, grouping and routing of alerts • Routes alerts to receivers (Pagerduty, Opsgenie, email, Slack,…)
  • 98. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • Alertmanager Con fi guration (/etc/alertmanager/alertmanager.yml): global: smtp_smarthost: 'localhost:25' smtp_from: 'alertmanager@prometheus.com' smtp_auth_username: '' smtp_auth_password: '' templates: - '/etc/alertmanager/template/*.tmpl' route: repeat_interval: 1h receiver: operations-team receivers: - name: 'operations-team' email_configs: - to: 'operations-team+alerts@example.org' slack_configs: - api_url: https://hooks.slack.com/services/XXXXXX/XXXXXX/XXXXXX channel: '#prometheus-course' send_resolved: true
  • 99. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • Prometheus Con fi guration (/etc/prometheus/prometheus.yml): # my global config global: scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. # scrape_timeout is set to the global default (10s). # Alertmanager configuration alerting: alertmanagers: - static_configs: - targets: - localhost:9093 # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. rule_files: # - "first_rules.yml" # - "second_rules.yml" # A scrape configuration containing exactly one endpoint to scrape: # Here it's Prometheus itself. scrape_configs: # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. - job_name: 'prometheus' # metrics_path defaults to '/metrics' # scheme defaults to 'http'. static_configs: - targets: ['localhost:9090']
  • 100. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • Concepts: • Grouping: Groups similar alerts into 1 noti fi cation • Inhibition: Silence other alerts if one speci fi ed alert is already fi red • Silences: A simple way to mute certain noti fi cations
  • 101. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • High availability • You can create a high available Alertmanager cluster using mesh con fi g • Do not load balance this service! • Use a list of Alertmanager nodes in Prometheus con fi g • All alerts are sent to all known Alertmanager nodes • No need to monitor the monitoring • Guarantees the noti fi cation is at least send once
  • 102. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • Alert states: 
 Inactive - No rule is met 
 Pending - Rule is met but can be suppressed due to validations 
 Firing - Alert is sent to the con fi gured channel(mail,Slack,…) • Runs on port :9093
  • 103. Prometheus course: Edward Viaene & Jorn Jambers Alertmanager • Notifying multiple destinations route: repeat_interval: 1h receiver: operations-team receivers: - name: 'operations-team' email_configs: - to: 'operations-team+alerts@example.org' slack_configs: - api_url: https://hooks.slack.com/services/XXXXXX/XXXXXX/XXXXXX channel: '#prometheus-course' send_resolved: true
  • 106. Prometheus course: Edward Viaene & Jorn Jambers Setting up alerts • Install Alertmanager • Create con fi g for the Alertmanager • Mail • Slack • Alter prometheus con fi g • Setup an alert • See the noti fi cation coming in when an alert is fi red
  • 109. Prometheus course: Edward Viaene & Jorn Jambers Architecture From: https://github.com/prometheus/prometheus
  • 111. Prometheus course: Edward Viaene & Jorn Jambers Storage • You can use the default local on-disk storage, or optionally the remote storage syste m • Local storage: a local time series database in a custom Prometheus format • Remote storage: you can read/write samples to a remote system in a standardized format • Currently it uses a snappy-compressed protocol buffer encoding over HTTP, but might change in the future (to use gRPC or HTTP/2)
  • 112. Prometheus course: Edward Viaene & Jorn Jambers Remote Storage • Remote storage is primarily focussed at long term storage • Currently there are adapters available for the following solutions: AppOptics: write Graphite: write Chronix: write In fl uxDB: read and write Cortex: read and write OpenTSDB: write CrateDB: read and write PostgreSQL/TimescaleDB: read and write Gnocchi: write SignalFx: write Source: https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
  • 113. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • Prometheus >=2.0 uses a new storage engine which dramatically increases scalability • Ingested samples are grouped in blocks of two hours • Those 2h samples are stored in separate directories (in the data directory of prometheus) • Writes are batched and written to disk in chunks, containing multiple data points directory 1 directory 2 directory 3 2h of data: chunks/00000 1 chunks/000002 2h of data: chunks/00000 1 2h of data: chunks/00000 1 chunks/000002
  • 114. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • Every directory also has an index fi le (index) and a metadata fi le (meta.json) • It stores the metric names and the labels, and provides an index from the metric names and labels to the series in the chunk fi les directory 1 directory 2 directory 3 chunks/000001 chunks/000002 meta.jso n index chunks/000001 meta.jso n index chunks/000001 chunks/000002 meta.jso n index
  • 115. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • The most recent data is kept in memory • You don’t want to loose the in-memory data during a crash, so the data also needs to be persisted to disk. This is done using a write-ahead-log (WAL) directory 1 directory 2 directory 3 chunks/000001 chunks/000002 meta.json index chunks/000001 meta.json index chunks/000001 chunks/000002 meta.json index wal: 000001 000002
  • 116. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • Write Ahead Log (WAL) • It’s quicker to append to a fi le (like a log) than making (multiple) random read/writes • If there’s a server crash and the data from memory is lost, then the WAL will be replayed • This way, no data will be lost or corrupted during a crash
  • 117. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • When series gets deleted, a tombstone fi le gets created • This is more ef fi cient than immediately deleting the data from the chunk fi les, as the actual delete can happen at a later time (e.g. when there’s not a lot of load) directory 1 directory 2 directory 3 2h of data: chunks/000001 chunks/000002 meta.json index tombstone 2h of data: chunks/000001 chunks/000002 meta.json index tombstone 2h of data: chunks/000001 chunks/000002 meta.json index tombstone
  • 118. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • The initial 2-hour blocks are merged in the background to form longer blocks • This is called compaction directory 1 directory 2+3 2h of data: chunks/000001 chunks/000002 4h of data : chunks/000001 chunks/000002
  • 119. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • Block characteristics: • A block on the fi lesystem is a directory with chunks • You can see each block as a fully independent database containing all time series for the window • Every block of data, except the current block, is immutable (no changes can be made) • These non-overlapping blocks are actually a horizontal partitioning of the ingested time series data
  • 120. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • This horizontal partitioning gives a lot of bene fi ts: • When querying, the blocks not in the time range can be skipped • When completing a block, data only needs to be added, and not modi fi ed (avoids write-ampli fi cation) • Recent data is kept in memory, so can be queried quicker • Deleting old data is only a matter of deleting directories on the fi lesystem
  • 121. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • Compaction: • When querying, blocks have to be merged together to be able to calculate the results • Too many blocks could cause too much merging overhead, so blocks are compacted • 2 blocks are merged and form a newly created (often larger) block • Compaction can also modify data: dropping deleted data or restructuring the chunks to increase the query performance
  • 122. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • The index: • Having horizontal partitioning already makes most queries quicker, but not those that need to go through all the data to get the result • The index is an inverted index to provide better query performance, also in cases where all data needs to be queried • Each series is assigned a unique ID (e.g. ID 1, 2, and 3) • The index will contain an inverted index for the labels, for example for label env=production, it’ll have 1 and 3 as IDs if those series contain the label env=production
  • 123. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • What about Disk size? • On average, Prometheus needs 1-2 bytes per sample • You can use the following formula to calculate the disk space needed: needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
  • 124. Prometheus course: Edward Viaene & Jorn Jambers Local Storage • How to reduce disk size? • You can increase the scrape interval, which will get you less data • You can decrease the targets or series you scrape • Or you can can reduce the retention (how long you keep the data) --storage.tsdb.retention: This determines when to remove old data. Defaults to 15d.
  • 125. Prometheus course: Edward Viaene & Jorn Jambers References • To read the full story of Prometheus time series database, read the blog post from Fabian Reinartz at https://fabxc.org/tsdb/
  • 127. Prometheus course: Edward Viaene & Jorn Jambers Security • At the moment Prometheus doesn’t offer any support for authentication or encryption (TLS) on the server components • They argue that they’re focussing on building a monitoring solution, and want to avoid having to implement complex security features • You can still enable authentication and TLS, using a reverse proxy • This is only valid for server components, prometheus can scrape TLS and authentication enabled targets • See tls_con fi g in the prometheus con fi guration to con fi gure a CA certi fi cate, user certi fi cate and user key • You’d still need to setup a reverse proxy for the targets itself
  • 128. Demo Prometheus TLS and authentication
  • 131. Monitoring a web app Prometheus with python- fl ask and MySQL
  • 132. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • I’m going to integrate prometheus monitoring with a web application based on python • I’ll use the of fi cial prometheus_client library for Python • Flask is the web framework I’m going to use • It will create an http server and I’ll able to con fi gure routes (e.g. /query) • I’ll use mysqlclient for python to query a MySQL database • I’ll include one normal query and one “bad behaving” query that will take between 0 and 10 seconds to execute
  • 133. Prometheus course: Edward Viaene & Jorn Jambers The web app Python web app fl ask http server (port 5000) prometheus monitoring (port 8000) MySQL Database (port 3306) Prometheus (port 9090) Client (curl, wget, browser)
  • 134. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • I’m going to use the Counter and the Histogram metric types to capture the data: • A Counter to capture the amount of times an http endpoint is hit + to capture the amount of times a MySQL query is executed • The value of the Counter must always increase, that’s why you should take the Counter type for these types of data • A Histogram to capture the latency of the HTTP requests and the MySQL Queries • A Histogram samples observations (like latencies) and counts them in con fi gurable buckets. It also provides a sum of all observed values. • The default buckets are intended to cover a typical web/rpc request from milliseconds to seconds
  • 135. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app from prometheus_client import Counter, Histogram FLASK_REQUEST_LATENCY = Histogram(' fl ask_request_latency_seconds', 'Flask Request Latency', ['method', 'endpoint']) FLASK_REQUEST_COUNT = Counter(' fl ask_request_count', 'Flask Request Count', ['method', 'endpoint', 'http_status']) MYSQL_REQUEST_LATENCY = Histogram('mysql_query_latency_seconds', 'MYSQL Query Latency', ['query']) MYSQL_REQUEST_COUNT = Counter('mysql_query_count', 'Flask Request Count', ['query']) • This is how I’m going to de fi ne the data types in Python for Prometheus:
  • 136. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app start_time = time.time() sql = “select * from table” # do the query query_latency = time.time() - start_time MYSQL_REQUEST_LATENCY.labels(sql[:50]).observe(query_latency) MYSQL_REQUEST_COUNT.labels(sql[:50]).inc() • This is how we can calculate the latency of a query:
  • 137. Demo Monitor a web application with Prometheus
  • 138. Demo Monitor a web application with Prometheus - apdex score
  • 139. Monitoring a web app Prometheus with Java Spring boot
  • 140. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • Introduction • Application's health + Metrics • Notice unwanted behaviour • Monoliths as well as microservices • Crucial in microservices architecture • “To measure is to know”
  • 141. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • We are going to integrate prometheus monitoring with a web application based on Java Spring Boot • We will use following : • Spring Boot • Spring Boot Actuato r • Micromete r • We are also going to do a demo with an example.
  • 142. Prometheus course: Edward Viaene & Jorn Jambers The web app Spring Boot web app Spring Boot http server (port 8080 ) /api/demo & /api/delayed/demo prometheus monitoring (port 8080 ) /actuator/prometheus Prometheus Client (curl, wget, browser)
  • 143. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • Spring Boot Actuato r • Sub-Project of Spring boot • Production ready endpoints • /actuator is the common pre fi x of these endpoints • Protected by default • Adjustable in application.properties • Expose all: management.endpoints.web.exposure.include=*
  • 144. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • Micromete r • Vendor-Neutral application metrics facade • Support for Prometheus and many others: • AWS Cloudwatch, Datadog, In fl uxDB/Telegraf, New Relic, … • Transforms /actuator/metrics data into data your monitoring system understands • Only a vendor-speci fi c micrometer dependency in your application is required
  • 145. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • Micrometer • pom.xml example 
 <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-actuator</artifactId> </dependency> <dependency> <groupId>io.micrometer</groupId> <artifactId>micrometer-core</artifactId> </dependency> <dependency> <groupId>io.micrometer</groupId> <artifactId>micrometer-registry-prometheus</artifactId> </dependency>
  • 146. Prometheus course: Edward Viaene & Jorn Jambers Monitoring a web app • Spring Boot • Code example 
 … import io.micrometer.core.instrument.Metrics; … private Counter runCounter = Metrics.counter("runCounter"); … @GetMapping("/api/demo") @Timed public String apiUse() throws InterruptedException { runCounter.increment(); log.info("Hello world app accessed on /api/demo"); return "Hello world"; }
  • 147. Demo Monitor and instrument a Spring boot application
  • 149. Prometheus course: Edward Viaene & Jorn Jambers Grafana Provisioning • In one of the fi rst lectures I showed you how to setup Grafana using the UI • Rather than using the UI, you can also use yaml and json fi les to provision Grafana with datasources and dashboards • This is a much more powerful way of using Grafana, as you can test new dashboards fi rst on a dev / test server, then import the newly created dashboards to production • You can do the import manually through the UI, or using yaml and json fi les • When using fi les, you can keep fi les within version control to keep changes, revisions and backups
  • 150. Prometheus course: Edward Viaene & Jorn Jambers Grafana Provisioning • The con fi guration of Grafana is all kept in /etc/grafana: • The data is kept in /var/lib/grafana: /etc/grafana/: -rw-r----- 1 root grafana 14K Jul 17 12:30 grafana.ini -rw-r----- 1 root grafana 3.4K Jul 17 12:30 ldap.toml drwxr-xr-x 4 root grafana 4.0K Jul 17 13:15 provisioning/ /etc/grafana/provisioning/ : drwxr-xr-x 2 root grafana 4.0K Jul 17 14:56 dashboards/ drwxr-xr-x 2 root grafana 4.0K Jul 17 15:34 datasources/ /var/lib/grafana : drwxr-xr-x 2 root root 4.0K Jul 17 15:47 dashboards/ -rw-r----- 1 grafana grafana 500K Jul 17 15:48 grafana.db drwxr-x--- 2 grafana grafana 4.0K Jul 17 12:31 plugins/ drwx------ 5 grafana grafana 4.0K Jul 17 12:40 sessions/
  • 151. Prometheus course: Edward Viaene & Jorn Jambers Grafana Provisioning • You can change the database & paths in /etc/grafana/grafana.ini [paths] # Path to where grafana can store temp fi les, sessions, and the sqlite3 db (if that is used) ;data = /var/lib/grafana # Directory where grafana can store logs ;logs = /var/log/grafana # Directory where grafana will automatically scan and look for plugins ;plugins = /var/lib/grafana/plugins # folder that contains provisioning con fi g fi les that grafana will apply on startup and while running. ;provisioning = conf/provisioning … [database] # Either "mysql", "postgres" or "sqlite3", it's your choice ;type = sqlite3 ;host = 127.0.0.1:3306 ;name = grafana ;user = root # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;""" ;password =
  • 154. Prometheus course: Edward Viaene & Jorn Jambers Use Cases - Cloudwatch • Cloudwatch exporter • Installation • Con fi guration (exporter + AWS) • Charges + measuring them • Querying metrics
  • 156. Prometheus course: Edward Viaene & Jorn Jambers Consul integration • Consul is a distributed, highly available solution providing: • A Service Mesh • Service Discovery • Health checks for your services • A Key-Value store • Secure Service Communications • Multi-datacenter support • Consul is often deployed in conjunction with Docker
  • 157. Prometheus course: Edward Viaene & Jorn Jambers Consul integration • There are 2 integrations that are interesting to use: • 1) Prometheus can scrape Consul’s metrics and provide you with all sorts of information about your running services • Consul provides Service Discovery, so it knows where services are running and what the current state of it is • 2) Consul can be integrated within Prometheus to automatically add the services as target s • Consul will discover your services, and these can then be automatically added to Prometheus as a target
  • 158. Prometheus course: Edward Viaene & Jorn Jambers Consul integration • In the next demo I’ll focus on the Prometheus integration with Consul, not really on implementing consul itself • I’ll show you the installation of consul, but not how to integrate consul with your infrastructure (it’s out of scope for this Prometheus course) • I’ll manually register a service to consul, rather than setting up service discovery • In production environments, where you have a lot of services, service discovery using consul will allow you to register your services automatically • If you are interested in Consul, have a look at the documentation at https:// www.consul.io/ and you can fi nd the service registrator for docker at https:// github.com/gliderlabs/registrator
  • 160. Prometheus Use Cases EC2 auto discovery
  • 161. Prometheus course: Edward Viaene & Jorn Jambers EC2 auto discovery • Service discovery is the automatic detection of devices and services offered by these devices on a computer network. • In this Use Case we will: • Create prerequisites in AWS (IAM role, Security Groups, EC2 instances) • Alter Prometheus con fi g (/etc/prometheus/prometheus.yml) • Query the data in Grafana • https://github.com/in4it/prometheus-course/blob/master/use-cases/ec2- auto-discovery/lab.txt
  • 162. Prometheus on Kubernetes Getting Kubernetes metrics