Our Drupal 8 websites are true applications, often very complex ones.
More and more workload is delegated to external systems, usually microservices, that are used for many different tasks.
Architectures are increasingly distributed and fragmented.
Tracing the lifecycle of a single request that originates in a client, passes through all Drupal subsystems, reaches external (micro)services and comes back is becoming mandatory to track down problems and to optimize performance. This is often time-consuming and, without the right tools, may become very difficult.
A simple unstructured log stream isn't enough anymore; we need a way to observe the details of what is going on.
Observability is all about this, and it is based on structured logs, metrics and traces. In this talk we will see how to implement these techniques in Drupal: which tools and modules to use to trace and log all requests that reach our website, and how to expose and display useful metrics.
We will integrate Drupal with OpenTracing, Prometheus, Monolog, Grafana and many more.
3. WE ARE A TECH COMPANY OF ENGINEERS,
DEVELOPERS AND DESIGNERS WHO WILL
THINK, DESIGN AND BUILD YOUR CUSTOM APPLICATIONS,
MODERNIZE YOUR LEGACY AND TAKE YOU TO THE
CLOUD NATIVE ERA.
SPARKFABRIK
4. We help Italian businesses to bridge
the gap with China thanks to our
Official Partnership with
Alibaba Cloud.
SparkFabrik is Cloud Native
Computing Foundation
Silver Member.
SparkFabrik is Google Cloud
Platform Technology Partner.
SparkFabrik is AWS
Official Partner.
PROUD OF OUR PARTNERSHIPS
5. Almost everyone is working with distributed systems.
There are microservices, containers, cloud, serverless,
and a lot of combinations of these technologies. All of
these increase the number of failures that systems may encounter
because there are too many parts interacting.
Because distributed systems are so diverse, it is
hard to understand present problems and to predict
future ones.
6. Observability is a measure of how well
internal states of a system can be
inferred from knowledge of its external
outputs
7. We want to observe production environments, and
generic metrics like CPU and memory usage
are not sufficient anymore
9. Tools
OpenTelemetry
OpenTelemetry is a collection of tools, APIs,
and SDKs. It can be used to instrument,
generate, collect, and export telemetry data
(metrics, logs, and traces) to help analyze
software’s performance and behavior.
OpenTelemetry is an incubating project
from the Cloud Native Computing
Foundation, created after the merger of
OpenCensus (from Google) and
OpenTracing (itself a CNCF project).
The data collected with OpenTelemetry is
vendor-agnostic and can be exported in
many formats.
https://opentelemetry.io
10. Cloud vendor
● AWS Distro for OpenTelemetry: aws-otel.github.io/
● Google Cloud OpenTelemetry: google-cloud-opentelemetry.readthedocs.io
● Azure Monitor OpenTelemetry: docs.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-overview
12. Tools
OpenTelemetry is an interesting project but it’s not yet ready to
cover the entire observability stack. We need other tools to
collect metrics and logs, and a way to store and visualize
collected data.
Logs -> Monolog
Metrics -> Prometheus (using OpenTelemetry API)
Traces -> OpenTelemetry
Storage and visualization -> Grafana
13. Tools
Monolog
Monolog is a standard PHP library and it can
be included in a Drupal website using a
contrib module (www.drupal.org/project/monolog)
Monolog sends logs to files, sockets,
inboxes, databases and various web
services.
Monolog implements the PSR-3 interface,
which you can type-hint against in your code
to keep maximum interoperability.
github.com/Seldaek/monolog
14. Tools
Prometheus
Prometheus is an open-source systems
monitoring and alerting toolkit originally built
at SoundCloud. It is now a standalone open
source project and maintained independently
of any company.
Prometheus collects and stores metrics as
time series data.
Prometheus was the second project to join
the Cloud Native Computing Foundation after
Kubernetes.
prometheus.io
15. Tools
Grafana
Grafana lets you query, visualize, alert on
and understand metrics, traces and logs no
matter where they are stored.
grafana.com
[Diagram: Applications 1, 2 and 3, each with an instrumentation library; Orchestrator; Grafana Agent; Loki, Tempo and Prometheus; Grafana (Grafana Cloud)]
23.
parameters:
  monolog.channel_handlers:
    default:
      handlers:
        - name: 'rotating_file'
          formatter: 'json'
  monolog.processors: [
    'message_placeholder', 'current_user',
    'request_uri', 'ip', 'referer',
    'filter_backtrace', 'introspection'
  ]
Then define handlers, formatters and processors using service
container parameters. Here we're configuring the default channel to
catch all log messages and to save them using the
monolog.handler.rotating_file service, in JSON format, after
they have been processed by a set of processors.
27. Structured logs make it simple to query them for any
sort of useful information
We can write custom Monolog processors to add
application’s custom data to our logs
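As a minimal sketch of such a processor: in Monolog 2.x a processor is any callable that receives the record as an array and returns it, usually after adding keys under `extra` (Monolog 3 passes a `LogRecord` object instead, so this would need adapting). The class name `AppContextProcessor` and the `app_version` field are hypothetical, chosen only for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Monolog 2.x-style processor: a callable that receives the record array.
 * The class name and the "app_version" key are illustrative assumptions.
 */
final class AppContextProcessor {

  public function __construct(private string $appVersion) {}

  public function __invoke(array $record): array {
    // Custom data goes under "extra" so it ends up in the JSON output.
    $record['extra']['app_version'] = $this->appVersion;
    return $record;
  }

}

// Simulate what Monolog does before handing the record to a handler.
$processor = new AppContextProcessor('1.4.2');
$record = $processor([
  'message' => 'User logged in',
  'level_name' => 'NOTICE',
  'channel' => 'devdays',
  'extra' => [],
]);

echo json_encode($record), "\n";
```

In a real site the processor would be registered as a service and listed in `monolog.processors`; the exact registration depends on the version of the Monolog module in use.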
28. In a Cloud Native environment, the application runs on multiple servers (or pods). We
need a way to export all those logs generated by every instance of the application.
In this case our logs are files stored in the local filesystem of every instance.
We have to discover, scrape and send them to a log collector.
Promtail is an agent which ships the contents of local logs to a private Grafana Loki
instance or Grafana Cloud. It is usually deployed to every machine that runs applications
that need to be monitored.
32. 1. Logs are about storing specific events
2. Metrics are a measurement at a point in time for the system
33. Examples of the sort of metrics you might have would be:
● the number of times you receive an HTTP request
● how much time was spent handling requests
● how many requests are currently in progress
● the number of errors that occurred
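The four examples above map onto two kinds of values: counters that only go up (requests, errors, total time) and a gauge that goes up and down (requests in progress). The sketch below is plain PHP with no metrics library; the class `RequestMetrics` and its property names are hypothetical, and the values live only in the memory of the current process.

```php
<?php
declare(strict_types=1);

/** Minimal in-memory metrics, only to illustrate the four examples. */
final class RequestMetrics {

  public int $requestsTotal = 0;    // Counter: only goes up.
  public int $errorsTotal = 0;      // Counter: only goes up.
  public int $inProgress = 0;       // Gauge: goes up and down.
  public float $secondsTotal = 0.0; // Sum of time spent handling requests.

  /** Runs $handler while recording the metrics above. */
  public function observe(callable $handler): void {
    $this->requestsTotal++;
    $this->inProgress++;
    $start = hrtime(true);
    try {
      $handler();
    }
    catch (Throwable $e) {
      // Swallowed to keep the sketch short; real code would rethrow.
      $this->errorsTotal++;
    }
    finally {
      $this->inProgress--;
      // hrtime(true) returns nanoseconds; convert to seconds.
      $this->secondsTotal += (hrtime(true) - $start) / 1e9;
    }
  }

}

$metrics = new RequestMetrics();
$metrics->observe(fn() => 'ok');
$metrics->observe(function (): void {
  throw new RuntimeException('boom');
});

printf(
  "requests=%d errors=%d in_progress=%d\n",
  $metrics->requestsTotal,
  $metrics->errorsTotal,
  $metrics->inProgress
);
```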
34. To instrument our application and record real-time
metrics we will use the Prometheus exporter
exposed by OpenTelemetry
36. There’s a module for that!
Observability suite
https://www.drupal.org/project/o11y
37. Prometheus scrapes data at the /metrics endpoint at a configured rate
PHP uses a shared-nothing architecture by default
o11y needs a way to store data between one scrape and the next
The default implementation uses Redis as a storage backend
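To see why external storage is needed, here is a rough sketch in which every "request" creates a fresh object, as a new PHP process would, and the counter survives only because it is persisted outside PHP memory. A JSON file stands in for Redis here, and the class `FileCounterStorage` is hypothetical; this is not how the o11y module is implemented.

```php
<?php
declare(strict_types=1);

/** Counter persisted outside PHP memory; a JSON file stands in for Redis. */
final class FileCounterStorage {

  public function __construct(private string $file) {}

  /** Adds 1 to the counter identified by $key and returns the new value. */
  public function increment(string $key): int {
    // Read-modify-write is not atomic; Redis INCR would be.
    $data = $this->all();
    $data[$key] = ($data[$key] ?? 0) + 1;
    file_put_contents($this->file, json_encode($data), LOCK_EX);
    return $data[$key];
  }

  /** Returns all counters, as a scrape of /metrics would read them. */
  public function all(): array {
    $raw = is_file($this->file) ? file_get_contents($this->file) : '';
    return ($raw === '' || $raw === false) ? [] : json_decode($raw, true);
  }

}

$file = tempnam(sys_get_temp_dir(), 'metrics_');

// Two separate "requests": nothing is shared between them in memory.
(new FileCounterStorage($file))->increment('entity.user.canonical');
(new FileCounterStorage($file))->increment('entity.user.canonical');

// The "scrape": a third process reads what the previous two stored.
$seen = (new FileCounterStorage($file))->all();
unlink($file);

print_r($seen);
```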
38. The o11y_metrics module automatically instruments a Drupal website to collect data about:
● number of requests (per route)
● time of the request (per route)
● used PHP memory
39. The module exposes a URL with metrics in
Prometheus format (/metrics)
# HELP php_info Information about the PHP environment.
# TYPE php_info gauge
php_info{version="8.1.3"} 1
# HELP requests The number of requests
# TYPE requests counter
requests{path="entity.user.canonical"} 1
# HELP memory The peak of memory allocated by PHP
# TYPE memory histogram
memory_bucket{path="entity.user.canonical",le="0.005"} 0
memory_bucket{path="entity.user.canonical",le="0.01"} 0
memory_bucket{path="entity.user.canonical",le="0.025"} 0
memory_bucket{path="entity.user.canonical",le="0.05"} 0
memory_bucket{path="entity.user.canonical",le="0.075"} 0
memory_bucket{path="entity.user.canonical",le="0.1"} 0
memory_bucket{path="entity.user.canonical",le="0.25"} 0
memory_bucket{path="entity.user.canonical",le="0.5"} 1
memory_bucket{path="entity.user.canonical",le="0.75"} 1
memory_bucket{path="entity.user.canonical",le="1"} 1
memory_bucket{path="entity.user.canonical",le="2.5"} 1
memory_bucket{path="entity.user.canonical",le="5"} 1
memory_bucket{path="entity.user.canonical",le="7.5"} 1
memory_bucket{path="entity.user.canonical",le="10"} 1
memory_bucket{path="entity.user.canonical",le="+Inf"} 1
memory_count{path="entity.user.canonical"} 1
memory_sum{path="entity.user.canonical"} 0.31189873814648
# HELP time The time of a request
# TYPE time histogram
time_bucket{path="entity.user.canonical",le="0.005"} 0
time_bucket{path="entity.user.canonical",le="0.01"} 0
time_bucket{path="entity.user.canonical",le="0.025"} 1
time_bucket{path="entity.user.canonical",le="0.05"} 1
time_bucket{path="entity.user.canonical",le="0.075"} 1
time_bucket{path="entity.user.canonical",le="0.1"} 1
time_bucket{path="entity.user.canonical",le="0.25"} 1
time_bucket{path="entity.user.canonical",le="0.5"} 1
time_bucket{path="entity.user.canonical",le="0.75"} 1
time_bucket{path="entity.user.canonical",le="1"} 1
time_bucket{path="entity.user.canonical",le="2.5"} 1
time_bucket{path="entity.user.canonical",le="5"} 1
time_bucket{path="entity.user.canonical",le="7.5"} 1
time_bucket{path="entity.user.canonical",le="10"} 1
time_bucket{path="entity.user.canonical",le="+Inf"} 1
time_count{path="entity.user.canonical"} 1
time_sum{path="entity.user.canonical"} 0.01953125
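Histogram buckets in this format are cumulative: each `le` line counts every observation less than or equal to that bound, which is why a single request of about 0.0195 seconds shows up in the `le="0.025"` bucket and in every larger one. The following sketch reproduces the `time` lines above from one observation; the function `render_histogram` is hypothetical and only illustrates the format, not the module's code.

```php
<?php
declare(strict_types=1);

/**
 * Renders one histogram in Prometheus text format.
 *
 * @param array $bounds Upper bounds ("le"), ascending, without +Inf.
 * @param float[] $observations Observed values.
 *
 * @return string[] One line per bucket, plus the _count and _sum lines.
 */
function render_histogram(string $name, string $path, array $bounds, array $observations): array {
  $lines = [];
  foreach ($bounds as $bound) {
    // Cumulative: count every observation at or below this bound.
    $count = count(array_filter($observations, fn(float $v): bool => $v <= $bound));
    $lines[] = sprintf('%s_bucket{path="%s",le="%s"} %d', $name, $path, $bound, $count);
  }
  // The +Inf bucket always equals the total number of observations.
  $lines[] = sprintf('%s_bucket{path="%s",le="+Inf"} %d', $name, $path, count($observations));
  $lines[] = sprintf('%s_count{path="%s"} %d', $name, $path, count($observations));
  $lines[] = sprintf('%s_sum{path="%s"} %s', $name, $path, array_sum($observations));
  return $lines;
}

$bounds = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10];
$lines = render_histogram('time', 'entity.user.canonical', $bounds, [0.01953125]);

echo implode("\n", $lines), "\n";
```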
42. Node exporter
Prometheus exporter for hardware and OS metrics
exposed by *NIX kernels, written in Go with pluggable
metric collectors, for example:
cpu
meminfo
filesystem
diskstats
netdev
44. The o11y module is a POC of what we can do with the OpenTelemetry API for metrics. If you
want a more robust solution, you can try the Prometheus.io Exporter module (also from
SparkFabrik):
https://www.drupal.org/project/prometheusio_exporter
46. 1. Logs are about storing specific events
2. Metrics are a measurement at a point in time for the system
3. Distributed traces deal with information that is request-scoped
47. We will use the Observability suite module to instrument our
application
Internally the module uses OpenTelemetry to do the hard work
48. Per-process logging and metric monitoring have their
place, but neither can reconstruct the elaborate
journeys that transactions take as they propagate
across a distributed system. Distributed traces are
these journeys
49. Take, for example, a Drupal 10 website
that renders a page with some data that comes from a
remote microservice
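50.
use Drupal\Core\Controller\ControllerBase;
use GuzzleHttp\Client;
use Symfony\Component\DependencyInjection\ContainerInterface;

class MicroserviceController extends ControllerBase {

  private Client $httpClient;

  public static function create(ContainerInterface $container) {
    return new static(
      $container->get('http_client')
    );
  }

  final public function __construct(Client $httpClient) {
    $this->httpClient = $httpClient;
  }

  public function view() {
    // Call the remote microservice and decode its JSON response.
    $response = $this->httpClient->get('http://ddev-drupal10-microservice:8080/hello-instrumented');
    $json = json_decode($response->getBody()->getContents());

    // getLogger() resolves the logger factory lazily; reading
    // $this->loggerFactory directly would fail because it is never set here.
    $this->getLogger('devdays')->notice($json->message);

    return [
      '#type' => 'markup',
      '#markup' => $json->message,
    ];
  }

}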
59. One last thing we need is to correlate traces with logs,
so when we find a problem with a request we can go
from the trace to the logs (and vice versa)
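Going from a trace to its logs can be pictured as a filter over structured log lines. The sketch below assumes each JSON log line carries the trace identifier under `extra.trace_id`; that location, the function `logs_for_trace` and the sample lines are all hypothetical, and in practice the same lookup would be a query in the log backend rather than PHP code.

```php
<?php
declare(strict_types=1);

/**
 * Returns the messages of all JSON log lines that belong to one trace.
 *
 * Assumes the trace id is stored under extra.trace_id (an assumption).
 *
 * @param string[] $jsonLines One JSON-encoded log record per entry.
 *
 * @return string[] Messages of the matching records, in input order.
 */
function logs_for_trace(array $jsonLines, string $traceId): array {
  $messages = [];
  foreach ($jsonLines as $line) {
    $record = json_decode($line, true);
    // Skip malformed lines and records without a matching trace id.
    if (is_array($record) && ($record['extra']['trace_id'] ?? null) === $traceId) {
      $messages[] = $record['message'] ?? '';
    }
  }
  return $messages;
}

// Illustrative sample data, not real output from the module.
$logLines = [
  '{"message":"Calling microservice","extra":{"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}}',
  '{"message":"Cron run completed","extra":{"trace_id":"00f067aa0ba902b7a3ce929d0e0e4736"}}',
  '{"message":"Microservice replied","extra":{"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736"}}',
];

$found = logs_for_trace($logLines, '4bf92f3577b34da6a3ce929d0e0e4736');
print_r($found);
```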
60. The O11y module provides a new processor for
Monolog that adds a trace_id argument to every log