@A_Bangser @FlowConFR #FlowCon
Observability - Experiencing the “why” behind the jargon
Abby Bangser
My slides are / will be available for you at: https://www.slideshare.net/AbigailBangser
Observability
In control theory, observability is a measure of how well
internal states of a system can be inferred from knowledge of
its external outputs.
“measure of how well” means observability is a scale
How easy is it to answer a new question without deploying new code?
[Figure: observability scale; labels: “Incident triage”, “happening?!”]
External outputs help us answer these questions
So you might be thinking… “right, monitoring”
https://bravenewgeek.com/wp-content/uploads/2019/10/monitoring_vs_observability_overlay-1024x539.png
True observability is discovering new behaviours
https://bravenewgeek.com/wp-content/uploads/2019/10/monitoring_vs_observability_overlay-1024x539.png
Characteristics of what generates valuable outputs
https://thenewstack.io/observability-a-3-year-retrospective/
➔ raw events
➔ no pre-aggregation
➔ structured data
➔ arbitrarily wide events
➔ schema-less-ness
➔ high cardinality dimensions
➔ oriented around the lifecycle of the request
➔ batched up context
➔ static dashboards don’t work, it must be exploratory
By Twitter, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=76921548
Let’s understand a couple of these through examples
The promise of monitoring vs my reality
My rollercoaster journey with understanding metrics and
pre-aggregation starts back in 2016...
Monitorama 2016 - an awakening
Lessons include…
➔ It is not just testing that is dead
➔ Wow! There is a world of available data I have no idea about
➔ These tools are so cool...wait, what are these tools?
Metrics can track success (and failure) of changes made
https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals
An ask: I want to monitor live systems
An opportunity: Help create a client’s first cloud infrastructure
An operations focused project changed my tool chain
And then… just like testability, operability became hard to prioritise
Two years and many projects later Hobbsy had a plan
Track latency over 4 weeks and alert when current trends exceed 2 standard deviations
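A minimal sketch of the plan above, assuming the 4 weeks of history have already been reduced to a list of latency samples. The class name, method names and sample values are hypothetical and not from the talk.

```java
import java.util.List;

final class LatencyTrendAlert {

    /** Arithmetic mean of the samples. */
    static double mean(List<Double> samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.size();
    }

    /** Population standard deviation of the samples. */
    static double stdDev(List<Double> samples) {
        double mean = mean(samples);
        double squaredDiffs = 0.0;
        for (double s : samples) {
            squaredDiffs += (s - mean) * (s - mean);
        }
        return Math.sqrt(squaredDiffs / samples.size());
    }

    /** True when the current value is more than 2 standard deviations above the historical mean. */
    static boolean shouldAlert(List<Double> history, double current) {
        return current > mean(history) + 2 * stdDev(history);
    }

    public static void main(String[] args) {
        // Hypothetical latencies in seconds, standing in for 4 weeks of history
        List<Double> history = List.of(0.20, 0.22, 0.21, 0.19, 0.23, 0.20, 0.22, 0.21);
        System.out.println(shouldAlert(history, 0.22)); // within the usual range
        System.out.println(shouldAlert(history, 0.40)); // well above the usual range
    }
}
```

The comparison only works if every service reports latency in the same way, which is why the plan leads to standardising metrics across the estate.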
To do this at MOO
s/MOO/any company over a few years old/
➔ 40 services
➔ 4 core languages
➔ 3 eras of architectural decisions
➔ 2 transport protocols (HTTP and gRPC)
...and a partridge in a pear tree
The plan: Standardise metrics across the estate
Consistency across services created so much learning
But...
Our data collection made certain assumptions which, in the end, required re-collecting in a different way
How histograms get generated in a time series DB
http_requests_seconds_bucket
le=0.05 le=0.1 le=0.5 le=1 le=5 le=+inf
* “le” stands for “less than or equal to”
www.moo.com in 0.25 seconds
www.moo.com/big_file in 5 seconds
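A rough sketch of how the two requests above end up as bucket counts. It assumes cumulative buckets, where an observation increments every bucket whose “le” bound is at least the observed value, as Prometheus-style histograms do. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class BucketedHistogram {
    // Upper bounds from the slide; +inf catches everything
    private final double[] upperBounds = {0.05, 0.1, 0.5, 1, 5, Double.POSITIVE_INFINITY};
    private final long[] counts = new long[upperBounds.length];

    /** Record one request duration: every bucket with le >= seconds is incremented. */
    void observe(double seconds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (seconds <= upperBounds[i]) {
                counts[i]++;
            }
        }
    }

    /** Copy of the per-bucket counts, in the same order as the bounds. */
    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        BucketedHistogram histogram = new BucketedHistogram();
        histogram.observe(0.25); // www.moo.com
        histogram.observe(5.0);  // www.moo.com/big_file
        System.out.println(Arrays.toString(histogram.counts()));
        // prints [0, 0, 1, 1, 2, 2]
    }
}
```

Only the counts survive. Once recorded, a 0.25 second request cannot be told apart from a 0.49 second one, and the URL it belonged to is gone.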
We collected counts of how many requests per bucket
http_requests_seconds_bucket
le=0.05 le=0.1 le=0.5 le=1 le=5 le=+inf
Offset: 1 week, 2 weeks, 3 weeks, 4 weeks (the same buckets at each offset)
The data we had collected, we had to throw away
http_requests_seconds_bucket
Offset: 1 week, 2 weeks, 3 weeks, 4 weeks
[Figure: the le=0.05, 0.1, 0.5, 1, 5 and +inf buckets at each weekly offset]
At least the update was made, so now we are all set, right?
Except, that 99th percentile...what does that actually mean?
Let’s see what our logs say about it
Just 1% of 500,000 requests applies to 56,000 people
To see >10 seconds, I would need the 99.996th percentile
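To make the percentile point concrete, here is a small sketch using the nearest-rank method on made-up latencies. The data and the class name are illustrative assumptions, not figures from the talk.

```java
import java.util.Arrays;

final class Percentiles {

    /** Nearest-rank percentile: the smallest value with at least p percent of samples at or below it. */
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up data: 1,000 requests, 995 fast ones and 5 very slow ones
        double[] latencies = new double[1000];
        Arrays.fill(latencies, 0.2);
        for (int i = 0; i < 5; i++) {
            latencies[i] = 12.0;
        }
        System.out.println(percentile(latencies, 99));   // prints 0.2
        System.out.println(percentile(latencies, 99.9)); // prints 12.0
    }
}
```

In this made-up data the 99th percentile looks healthy even though five requests took 12 seconds. The slow requests only show up once you ask for a much higher percentile.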
So, while consistent metrics trending over time was a big step forward...
In retrospect, these experiences were not mature observability
Why avoid pre-aggregation?
Because you can never regain the original context and detail; you can only ever ask predetermined questions
Data is not the same as information
Step one is accepting that, while sentences may be readable, <key : value> pairs are more easily queried.
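A small sketch of why pairs are easier to query than sentences. The log sentence, field names and class name here are made up for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SentenceVersusPairs {
    // Tied to the exact wording of one log sentence; breaks if the wording changes
    private static final Pattern RECEIVING = Pattern.compile("Receiving (\\S+) image to flip\\.");

    /** Pull the content type out of a human-readable sentence with a regex. */
    static Optional<String> contentTypeFromSentence(String line) {
        Matcher m = RECEIVING.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Pull the content type out of key : value pairs with a plain lookup. */
    static Optional<String> contentTypeFromPairs(Map<String, String> event) {
        return Optional.ofNullable(event.get("content.type"));
    }

    public static void main(String[] args) {
        String sentence = "Receiving image/png image to flip.";
        Map<String, String> event = Map.of("action", "flip", "content.type", "image/png");
        System.out.println(contentTypeFromSentence(sentence).orElse("?")); // prints image/png
        System.out.println(contentTypeFromPairs(event).orElse("?"));       // prints image/png
    }
}
```

Both approaches return the same value here, but the regex needs updating whenever someone rewords the sentence, while the lookup keeps working as more pairs are added.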
Even from the first “Hello World” we humans logged
And from there we wanted more information
So then we backfilled in structure
grok {
  # Split the "Request" field (a full URL) into scheme, host, port, path and query
  match => [
    "Request",
    "%{URIPROTO:request_uri_scheme}://%{HOSTNAME:request_uri_host}(?::%{POSINT:request_uri_port})?%{URIPATH:request_uri_path}(?:%{URIPARAM:request_uri_query})?"
  ]
}
And of course, from there we wanted more
mutate {
  # Break the path into segments, then expose the first few as their own fields
  split => { "uri_array" => "/" }
  add_field => {
    "uri_root" => ["/%{[uri_array][1]}"]
    "uri_first" => ["/%{[uri_array][2]}"]
    "uri_second" => ["/%{[uri_array][3]}"]
    "uri_root_first" => "%{uri_root}%{uri_first}"
    "uri_root_second" => "%{uri_root}%{uri_first}%{uri_second}"
  }
}
And even looking past the bad field values, lots of servers means lots of intermingled logs
Rewind...how are logs written during a request?
Detailing how logs get written during a request
@PostMapping("flip")
public ResponseEntity flipImage(@RequestParam("image") MultipartFile file,
                                @RequestParam(value = "vertical") Boolean vertical,
                                @RequestParam(value = "horizontal") Boolean horizontal) {
    // Reject non-image uploads. The exact check and the status code are assumptions.
    if (file.getContentType() == null || !file.getContentType().startsWith("image/")) {
        LOGGER.warn("Wrong content type uploaded: {}", file.getContentType());
        return new ResponseEntity<>("Wrong content type uploaded: " + file.getContentType(),
                HttpStatus.BAD_REQUEST);
    }
    // First log line written for this request
    LOGGER.info("Receiving {} image to flip.", file.getContentType());
    byte[] flippedImage = imageService.flip(file, vertical, horizontal);
    if (flippedImage == null) {
        return new ResponseEntity<>("Failed to flip image", HttpStatus.INTERNAL_SERVER_ERROR);
    }
    // Second, separate log line written for the same request
    LOGGER.info("Successfully flipped image id: {}", file.getId());
    return new ResponseEntity<>(flippedImage, headers, HttpStatus.OK);
}
Log outputs
In contrast, how an event is created during a request
@PostMapping("flip")
public ResponseEntity flipImage(...) {
    EVENT.addField("content.type", file.getContentType());
    EVENT.addField("action", "flip");
    EVENT.addField("image_id", file.getId());
    EVENT.addField("flip_vertical", vertical);
    EVENT.addField("flip_horizontal", horizontal);
    ...
    LOGGER.info("Receiving {} image to flip.", file.getContentType());
    byte[] flippedImage = imageService.flip(file, vertical, horizontal);
    ...
    LOGGER.info("Successfully flipped image id: {}", file.getId());
    EVENT.addField("action.success", "true");
    return new ResponseEntity<>(flippedImage, headers, HttpStatus.OK);
}
Comparing the outputs
Multiple logs
A single event
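A minimal sketch of the “single event” side of the comparison. `WideEvent` is a hypothetical stand-in, not the library behind `EVENT.addField` in the slides, and the field values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WideEvent {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Add one piece of context; nothing is emitted yet. */
    WideEvent addField(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    /** Render the whole event as a single key=value line, emitted once per request. */
    String render() {
        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(field.getKey()).append('=').append(field.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        WideEvent event = new WideEvent()
                .addField("content.type", "image/png")
                .addField("action", "flip")
                .addField("flip_vertical", true)
                .addField("flip_horizontal", false)
                .addField("action.success", true);
        System.out.println(event.render());
        // prints content.type=image/png action=flip flip_vertical=true flip_horizontal=false action.success=true
    }
}
```

Where the logging version writes a separate line at each step, this version accumulates context as the request proceeds and writes one record at the end, so every field stays attached to the request it describes.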
Making the information easy to query
And keeping the information in context
Most importantly, making it easy to add more!
In order to combat tribal-knowledge-based guessing when debugging our complex systems, we need:
➔ A low-friction way to add fields to your logs for structure and searchability
➔ Application and user context wrapped in a business context
CustomerID: 234567
VersionOfApp: 2
RequestedUri: www.
Let’s understand a couple of these through examples
➔ raw events
➔ no pre-aggregation
➔ structured data
➔ arbitrarily wide events
➔ schema-less-ness
➔ high cardinality dimensions
➔ oriented around the lifecycle of the request
➔ batched up context
➔ static dashboards don’t work, it must be exploratory
@A_Bangser @FlowConFR #FlowCon
Debugging distributed systems is hard
Especially when business impact is on the line.
Let’s talk outages
Hmmm, a warning alert has come in
This is an automated alert, based on a warning: a production service is sending a high percentage of 500s!
Yup, definitely an issue
All hands on deck, what is happening...and why?
2+ hrs in, and we still aren’t sure we know what happened
And then it keeps happening
Oncall engineers are not amused
But the service owners weren’t just lounging around
And these were some awesome dashboards
Let’s break down what this dashboard shows
[Figure: dashboard panels “Request Counts” and “Response Latency”; series labels: Original Images, Enhanced Images, Enhanced and resized]
This dashboard helped limit impact
~3 hours
40 min
And eventually, powerful human pattern matchers solved the problem
So what happens to this dashboard now?
They have been sent to a farm… with their other friends
Why ditch the dashboards?
The scar tissue of your past outages is not a sufficient
replacement for the creativity required to investigate your
future incidents
https://www.needpix.com/photo/907639/images-leash-leash-polaroid-free-pictures-free-photos-free-images-royalty-free
Let’s revisit those characteristics
➔ raw events
➔ no pre-aggregation
➔ structured data
➔ arbitrarily wide events
➔ schema-less-ness
➔ high cardinality dimensions
➔ oriented around the lifecycle of the request
➔ batched up context
➔ static dashboards don’t work, it must be exploratory
By Twitter, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=80936515
The only way to ask new questions is to keep the original raw data available and queryable
Make data easy to add details to and easy to query
Empower creative and shared exploration based on business context
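A minimal Python sketch of the characteristics above, under stated assumptions: the events, the field names (image_type, customer_id, build, duration_ms) and the helper functions are hypothetical and do not come from the talk. It keeps raw, wide, structured events and aggregates only at query time, so a new question becomes a new query rather than a new deploy.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw events: one wide, structured record per request.
# Every field name and value here is invented for illustration.
EVENTS = [
    {"request_id": "r1", "image_type": "original", "customer_id": "c-17",
     "build": "v41", "status": 200, "duration_ms": 35},
    {"request_id": "r2", "image_type": "enhanced", "customer_id": "c-17",
     "build": "v42", "status": 200, "duration_ms": 900},
    {"request_id": "r3", "image_type": "enhanced", "customer_id": "c-02",
     "build": "v41", "status": 200, "duration_ms": 60},
    {"request_id": "r4", "image_type": "enhanced_resized", "customer_id": "c-17",
     "build": "v42", "status": 500, "duration_ms": 1200},
    {"request_id": "r5", "image_type": "enhanced_resized", "customer_id": "c-02",
     "build": "v41", "status": 200, "duration_ms": 80},
]


def add_context(event, **fields):
    """Return a copy of the event widened with extra fields (no schema to migrate)."""
    return {**event, **fields}


def group_by(events, key, where=lambda e: True):
    """Collect durations of matching events, grouped by any field chosen at query time."""
    groups = defaultdict(list)
    for e in events:
        if where(e):
            groups[e.get(key, "<missing>")].append(e["duration_ms"])
    return dict(groups)


def summarize(groups):
    """Aggregate only when the question is asked: count and median per group."""
    return {k: {"count": len(v), "median_ms": median(v)} for k, v in groups.items()}


if __name__ == "__main__":
    # The question a fixed dashboard already answers: latency per image type.
    print(summarize(group_by(EVENTS, "image_type")))
    # A new question, answered from the same raw events: is one build slow?
    print(summarize(group_by(EVENTS, "build")))
    # Narrow down by a high-cardinality field, filtering to slow requests only.
    print(summarize(group_by(EVENTS, "customer_id",
                             where=lambda e: e["duration_ms"] > 500)))
```

Had only the per-image-type counts and latencies been stored, the build and customer questions could not be answered after the fact; here they are just different arguments to group_by.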
Looking back, journeys are never clear, so why do we still expect them to be when we start a new one?
➔ QA
➔ TWU
➔ Political Science Major
➔ Data analysis for investments
➔ A desire to learn how to code
➔ Automation FTW!
➔ An “analyst” computer
➔ A “DevOps” friend engaged me in his work
➔ Monitorama
➔ An infrastructure project
➔ Platform Engineering @
➔ Professional scuba diver
➔ A (slight) obsession with observability
Start where you are.
Use what you have.
Do what you can.
- Arthur Ashe
➔ All of tech and product is now asking more interesting questions
➔ We are expecting more of our tooling
➔ We are building new awareness about our services and system
Thank you!
www.SlideShare.net/AbigailBangser
@A_Bangser

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Observability - Experiencing the “why” behind the jargon (FlowCon 2019)

  • 1. @A_Bangser @FlowConFR #FlowCon Observability - Experiencing the “why” behind the jargon. Abby Bangser. My slides are / will be available for you at: https://www.slideshare.net/AbigailBangser
  • 2. @A_Bangser @FlowConFR #FlowCon Observability In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
  • 4. @A_Bangser @FlowConFR #FlowCon “measure of how well” means observability is a scale
  • 6. @A_Bangser @FlowConFR #FlowCon “measure of how well” means observability is a scale [Figure: a scale with the labels “Incident triage” and “happening?!”]
  • 7. @A_Bangser @FlowConFR #FlowCon “measure of how well” means observability is a scale: how easy is it to answer a new question without deploying new code? [Figure: the same scale, now also labelled “observability”]
  • 8. @A_Bangser @FlowConFR #FlowCon Observability In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
  • 9. @A_Bangser @FlowConFR #FlowCon External outputs help us answer these questions
  • 13. @A_Bangser @FlowConFR #FlowCon So you might be thinking… “right, monitoring” https://bravenewgeek.com/wp-content/uploads/2019/10/monitoring_vs_observability_overlay-1024x539.png
  • 16. @A_Bangser @FlowConFR #FlowCon True observability is discovering new behaviours https://bravenewgeek.com/wp-content/uploads/2019/10/monitoring_vs_observability_overlay-1024x539.png
  • 17. @A_Bangser @FlowConFR #FlowCon Observability In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
  • 18. @A_Bangser @FlowConFR #FlowCon Characteristics of what generates valuable outputs https://thenewstack.io/observability-a-3-year-retrospective/ ➔ raw events ➔ no pre-aggregation ➔ structured data ➔ arbitrarily wide events ➔ schema-less-ness ➔ high cardinality dimensions ➔ oriented around the lifecycle of the request ➔ batched up context ➔ static dashboards don’t work, it must be exploratory
  • 19. @A_Bangser @FlowConFR #FlowCon Characteristics of what generates valuable outputs https://thenewstack.io/observability-a-3-year-retrospective/ ➔ raw events ➔ no pre-aggregation ➔ structured data ➔ arbitrarily wide events ➔ schema-less-ness ➔ high cardinality dimensions ➔ oriented around the lifecycle of the request ➔ batched up context ➔ static dashboards don’t work, it must be exploratory (Image credit: By Twitter, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=76921548)
  • 20. @A_Bangser @FlowConFR #FlowCon Let’s understand a couple of these through examples ➔ raw events ➔ no pre-aggregation ➔ structured data ➔ arbitrarily wide events ➔ schema-less-ness ➔ high cardinality dimensions ➔ oriented around the lifecycle of the request ➔ batched up context ➔ static dashboards don’t work, it must be exploratory
  • 22. @A_Bangser @FlowConFR #FlowCon The promise of monitoring vs my reality My rollercoaster journey with understanding metrics and pre-aggregation starts back in 2016...
  • 23. @A_Bangser @FlowConFR #FlowCon Monitorama 2016 - an awakening Lessons include… ➔ It is not just testing that is dead ➔ Wow! There is a world of available data I have no idea about ➔ These tools are so cool...wait, what are these tools?
  • 24. @A_Bangser @FlowConFR #FlowCon Metrics can track success (and failure) of changes made https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals
  • 25. @A_Bangser @FlowConFR #FlowCon An ask: I want to monitor live systems. An opportunity: help create a client’s first cloud infrastructure.
  • 26. @A_Bangser @FlowConFR #FlowCon An operations focused project changed my tool chain
  • 28. @A_Bangser @FlowConFR #FlowCon And then… just like testability, operability became hard to prioritise
  • 29-30. @A_Bangser @FlowConFR #FlowCon Two years and many projects later, Hobbsy had a plan: track latency over 4 weeks and alert when current trends exceed 2 standard deviations
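The alerting plan on this slide is described only in words. A minimal Java sketch of that rule follows, assuming one average latency value per week and a population standard deviation. The class name, method names and sample values are illustrative and do not come from the talk.

```java
import java.util.List;

/** Hypothetical helper: flags a latency value that drifts beyond k standard deviations of a baseline. */
final class LatencyDeviationAlert {

    /** Arithmetic mean of the baseline samples. */
    static double mean(List<Double> samples) {
        double sum = 0.0;
        for (double sample : samples) {
            sum += sample;
        }
        return sum / samples.size();
    }

    /** Population standard deviation of the baseline samples. */
    static double stdDev(List<Double> samples) {
        double mean = mean(samples);
        double squaredDiffs = 0.0;
        for (double sample : samples) {
            squaredDiffs += (sample - mean) * (sample - mean);
        }
        return Math.sqrt(squaredDiffs / samples.size());
    }

    /** True when the current value is more than k standard deviations away from the baseline mean. */
    static boolean shouldAlert(List<Double> baseline, double current, double k) {
        return Math.abs(current - mean(baseline)) > k * stdDev(baseline);
    }

    public static void main(String[] args) {
        // Assumed data: average latency in seconds for each of the previous four weeks.
        List<Double> fourWeeks = List.of(0.20, 0.22, 0.18, 0.20);
        System.out.println(shouldAlert(fourWeeks, 0.21, 2.0)); // within the normal band
        System.out.println(shouldAlert(fourWeeks, 0.35, 2.0)); // far outside the band
    }
}
```

A rule like this only works if every service reports latency in a comparable way, which is why the following slides turn to standardising metrics.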
  • 31. @A_Bangser @FlowConFR #FlowCon To do this at MOO (s/MOO/any company over a few years old/): ➔ 40 services ➔ 4 core languages ➔ 3 eras of architectural decisions ➔ 2 transport protocols (HTTP and gRPC)
  • 32. @A_Bangser @FlowConFR #FlowCon To do this at MOO (s/MOO/any company over a few years old/): ➔ 40 services ➔ 4 core languages ➔ 3 eras of architectural decisions ➔ 2 transport protocols (HTTP and gRPC) ...and a partridge in a pear tree
  • 33. @A_Bangser @FlowConFR #FlowCon The plan: Standardise metrics across the estate
  • 34. @A_Bangser @FlowConFR #FlowCon Consistency across services created so much learning
  • 35. @A_Bangser @FlowConFR #FlowCon But...
  • 36. @A_Bangser @FlowConFR #FlowCon Our data collection made certain assumptions, which in the end required re-collecting in a different way
  • 37. @A_Bangser @FlowConFR #FlowCon How histograms get generated in a time series DB [Figure: http_requests_seconds_bucket with buckets le=0.05, le=0.1, le=0.5, le=1, le=5, le=+inf; “le” stands for “less than or equal to”]
  • 38-39. @A_Bangser @FlowConFR #FlowCon How histograms get generated in a time series DB [Figure: the same buckets, with a request to www.moo.com served in 0.25 seconds]
  • 40-41. @A_Bangser @FlowConFR #FlowCon How histograms get generated in a time series DB [Figure: the same buckets, with www.moo.com served in 0.25 seconds and www.moo.com/big_file served in 5 seconds]
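To make the bucket mechanics concrete, here is a small Java sketch of a histogram with the slide's `le` boundaries. It assumes cumulative buckets, as in Prometheus-style histograms, where a request is counted in every bucket whose upper bound it fits under. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of a cumulative ("le") latency histogram. */
final class BucketedHistogram {
    // Upper bounds in seconds, matching the slide: 0.05, 0.1, 0.5, 1, 5, +inf.
    private final double[] upperBounds = {0.05, 0.1, 0.5, 1.0, 5.0, Double.POSITIVE_INFINITY};
    private final long[] counts = new long[upperBounds.length];

    /** Records one request: every bucket whose bound is >= the duration is incremented. */
    void observe(double seconds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (seconds <= upperBounds[i]) {
                counts[i]++;
            }
        }
    }

    /** Returns a copy of the per-bucket counters, in the same order as the bounds. */
    long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        BucketedHistogram histogram = new BucketedHistogram();
        histogram.observe(0.25); // www.moo.com in 0.25 seconds
        histogram.observe(5.0);  // www.moo.com/big_file in 5 seconds
        System.out.println(Arrays.toString(histogram.counts())); // prints [0, 0, 1, 1, 2, 2]
    }
}
```

Only the counters survive. The durations 0.25 and 5.0 themselves are gone once they have been observed, which matters for the slides that follow.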
  • 42. @A_Bangser @FlowConFR #FlowCon We collected counts of how many requests per bucket [Figure: http_requests_seconds_bucket with buckets le=0.05, le=0.1, le=0.5, le=1, le=5, le=+inf, repeated at offsets of 1, 2, 3 and 4 weeks]
  • 43. @A_Bangser @FlowConFR #FlowCon The data we had collected, we had to throw away [Figure: the same http_requests_seconds_bucket buckets at offsets of 1, 2, 3 and 4 weeks]
  • 44. @A_Bangser @FlowConFR #FlowCon At least the update was made. Now we are all set, right?
  • 45. @A_Bangser @FlowConFR #FlowCon Except, that 99th percentile... what does that actually mean? [Figure: the le=0.05, le=0.1, le=0.5, le=1, le=5 and le=+inf buckets]
  • 46. @A_Bangser @FlowConFR #FlowCon Let’s see what our logs say about it
  • 47. @A_Bangser @FlowConFR #FlowCon Just 1% of 500,000 requests applies to 56,000 people
  • 48. @A_Bangser @FlowConFR #FlowCon To see >10 seconds, I would need the 99.996th percentile
  • 49. @A_Bangser @FlowConFR #FlowCon So, while consistent metrics trending over time was a big step forward... in retrospect, these experiences were not mature observability
  • 50. @A_Bangser @FlowConFR #FlowCon Why avoid pre-aggregation? Because you can never regain the original context and detail; you can only ever ask predetermined questions
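One way to see the loss is to bucket two very different sets of raw latencies and compare the results. The Java sketch below assumes the cumulative `le` buckets from the earlier slides. The sample durations and all names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo: different raw latencies can collapse into identical bucket counts. */
final class PreAggregationLoss {
    static final double[] UPPER_BOUNDS = {0.05, 0.1, 0.5, 1.0, 5.0, Double.POSITIVE_INFINITY};

    /** Cumulative counts: a duration is counted in every bucket whose bound it does not exceed. */
    static long[] bucketize(List<Double> durations) {
        long[] counts = new long[UPPER_BOUNDS.length];
        for (double duration : durations) {
            for (int i = 0; i < UPPER_BOUNDS.length; i++) {
                if (duration <= UPPER_BOUNDS[i]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed raw request durations in seconds for two different days.
        List<Double> calmDay = List.of(0.04, 0.3, 1.2, 5.5);
        List<Double> painfulDay = List.of(0.04, 0.3, 4.9, 45.0);

        // The aggregated view cannot tell the two days apart.
        System.out.println(Arrays.equals(bucketize(calmDay), bucketize(painfulDay)));

        // The raw events can: the slowest request differs by a wide margin.
        System.out.println(Collections.max(calmDay) + " vs " + Collections.max(painfulDay));
    }
}
```

Once only the bucket counts are stored, a question such as "which request took 45 seconds, and for whom?" can no longer be answered.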
  • 51. @A_Bangser @FlowConFR #FlowCon Let’s understand a couple of these through examples ➔ raw events ➔ no pre-aggregation ➔ structured data ➔ arbitrarily wide events ➔ schema-less-ness ➔ high cardinality dimensions ➔ oriented around the lifecycle of the request ➔ batched up context ➔ static dashboards don’t work, it must be exploratory
  • 52. @A_Bangser @FlowConFR #FlowCon Data is not the same as information. Step one is accepting that, while sentences may be readable, <key : value> pairs are more easily queried.
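A short Java sketch of the difference: pulling a value out of a log sentence needs a pattern that mirrors the exact wording, while key/value data can be filtered on any field directly. The log sentence matches the one used later in the talk; the image ids, field values and helper names are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical comparison of querying a log sentence versus querying key/value pairs. */
final class StructuredVsSentence {
    // Sentence style: the pattern breaks as soon as someone rewords the message.
    private static final Pattern FLIPPED = Pattern.compile("Successfully flipped image id: (\\S+)");

    /** Extracts the image id from a log sentence, or returns null if the wording does not match. */
    static String imageIdFromSentence(String line) {
        Matcher matcher = FLIPPED.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    /** Key/value style: counts events where the given field has the given value. */
    static long countWhere(List<Map<String, String>> events, String key, String value) {
        return events.stream().filter(event -> value.equals(event.get(key))).count();
    }

    public static void main(String[] args) {
        System.out.println(imageIdFromSentence("Successfully flipped image id: img-42"));

        List<Map<String, String>> events = List.of(
                Map.of("action", "flip", "image_id", "img-42", "action.success", "true"),
                Map.of("action", "flip", "image_id", "img-43", "action.success", "false"));
        System.out.println(countWhere(events, "action.success", "false"));
    }
}
```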
  • 53. @A_Bangser @FlowConFR #FlowCon Even from the first “Hello World” we humans logged
  • 54. @A_Bangser @FlowConFR #FlowCon And from there we wanted more information 7a82dd3a
  • 56. @A_Bangser @FlowConFR #FlowCon So then we backfilled in structure
      grok {
        match => [ "Request", "%{URIPROTO:request_uri_scheme}://%{HOSTNAME:request_uri_host}(?::%{POSINT:request_uri_port})?%{URIPATH:request_uri_path}(?:%{URIPARAM:request_uri_query})?" ]
      }
  • 57. @A_Bangser @FlowConFR #FlowCon And of course, from there we wanted more
      mutate {
        split => { "uri_array" => "/" }
        add_field => {
          "uri_root" => ["/%{[uri_array][1]}"]
          "uri_first" => ["/%{[uri_array][2]}"]
          "uri_second" => ["/%{[uri_array][3]}"]
          "uri_root_first" => "%{uri_root}%{uri_first}"
          "uri_root_second" => "%{uri_root}%{uri_first}%{uri_second}"
        }
      }
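The grok and mutate filters on slides 56 and 57 recover structure from a request string after it has been logged. As a rough illustration of what those filters compute, here is the same extraction in Java using `java.net.URI`. This is not how the talk's pipeline was implemented; the class name and the example URL are hypothetical, and the field names are borrowed from the slides.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical Java counterpart of the grok/mutate field extraction shown on the slides. */
final class UriFields {

    /** Splits a request URI into scheme, host, path and the leading path segments. */
    static Map<String, String> extract(String request) {
        URI uri = URI.create(request);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("request_uri_scheme", uri.getScheme());
        fields.put("request_uri_host", uri.getHost());
        fields.put("request_uri_path", uri.getPath());

        // Splitting "/a/b/c" on "/" yields ["", "a", "b", "c"], so index 1 is the first segment.
        String[] parts = uri.getPath().split("/");
        String root = parts.length > 1 ? "/" + parts[1] : "";
        String first = parts.length > 2 ? "/" + parts[2] : "";
        String second = parts.length > 3 ? "/" + parts[3] : "";

        fields.put("uri_root", root);
        fields.put("uri_root_first", root + first);
        fields.put("uri_root_second", root + first + second);
        return fields;
    }

    public static void main(String[] args) {
        // The path below is an invented example, not a real MOO endpoint.
        extract("https://www.moo.com/products/business-cards/square?size=small")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Either way, the structure is being reverse-engineered from a string, so short or unusual paths need special handling that would be unnecessary if the fields were recorded as data in the first place.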
  • 58-59. @A_Bangser @FlowConFR #FlowCon And even looking past the bad field values, lots of servers means lots of intermingled logs
  • 60. @A_Bangser @FlowConFR #FlowCon Rewind...how are logs written during a request?
  • 61-69. @A_Bangser @FlowConFR #FlowCon Detailing how logs get written during a request. Successive slides highlight each LOGGER.info call as the request reaches it: first "Receiving {} image to flip.", then "Successfully flipped image id: {}".
      @PostMapping("flip")
      public ResponseEntity flipImage(@RequestParam("image") MultipartFile file,
                                      @RequestParam(value = "vertical") Boolean vertical,
                                      @RequestParam(value = "horizontal") Boolean horizontal) {
          // Guard: an upload without a content type cannot be processed.
          if (file.getContentType() == null) {
              LOGGER.warn("Wrong content type uploaded: {}", file.getContentType());
              return new ResponseEntity<>("Wrong content type uploaded: " + file.getContentType(), HttpStatus.BAD_REQUEST);
          }
          // First log line of the request.
          LOGGER.info("Receiving {} image to flip.", file.getContentType());
          byte[] flippedImage = imageService.flip(file, vertical, horizontal);
          if (flippedImage == null) {
              return new ResponseEntity<>("Failed to flip image", HttpStatus.INTERNAL_SERVER_ERROR);
          }
          // Second, separate log line for the same request.
          LOGGER.info("Successfully flipped image id: {}", file.getId());
          return new ResponseEntity<>(flippedImage, headers, HttpStatus.OK);
      }
  • 70. @A_Bangser @FlowConFR #FlowCon Log outputs
  • 71-81. @A_Bangser @FlowConFR #FlowCon In contrast, how an event is created during a request. Successive slides show the fields accumulating on one event as the request progresses: content.type, action, image_id, flip_vertical, flip_horizontal and finally action.success.
      @PostMapping("flip")
      public ResponseEntity flipImage(...) {
          EVENT.addField("content.type", file.getContentType());
          EVENT.addField("action", "flip");
          EVENT.addField("image_id", file.getId());
          EVENT.addField("flip_vertical", vertical);
          EVENT.addField("flip_horizontal", horizontal);
          ...
          LOGGER.info("Receiving {} image to flip.", file.getContentType());
          byte[] flippedImage = imageService.flip(file, vertical, horizontal);
          ...
          LOGGER.info("Successfully flipped image id: {}", file.getId());
          EVENT.addField("action.success", "true");
          return new ResponseEntity<>(flippedImage, headers, HttpStatus.OK);
      }
  • 82. @A_Bangser @FlowConFR #FlowCon Comparing the outputs: multiple logs vs. a single event
  • 83. @A_Bangser @FlowConFR #FlowCon Making the information easy to query
  • 84. @A_Bangser @FlowConFR #FlowCon And keeping the information in context
  • 85. @A_Bangser @FlowConFR #FlowCon Most importantly, making it easy to add more!
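The slides do not show how the `EVENT` object is implemented, so the following Java sketch is only one possible shape for it: a hypothetical `WideEvent` class that collects fields over the life of a request and renders them once as a single line. The key=value output format is an assumption; a real system would more likely emit JSON or send the event to a backend.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical stand-in for the EVENT object on the slides: collect fields, emit once. */
final class WideEvent {
    // Insertion order is kept so the rendered line follows the order in which fields were added.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Adds or overwrites one field; returns this so calls can be chained. */
    WideEvent addField(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    /** Renders the whole event as one space-separated key=value line. */
    String render() {
        StringJoiner line = new StringJoiner(" ");
        fields.forEach((key, value) -> line.add(key + "=" + value));
        return line.toString();
    }

    public static void main(String[] args) {
        // Field names follow the slides; the values are invented for illustration.
        WideEvent event = new WideEvent()
                .addField("content.type", "image/png")
                .addField("action", "flip")
                .addField("image_id", "img-42")
                .addField("flip_vertical", true)
                .addField("flip_horizontal", false);
        // ... the request is processed here ...
        event.addField("action.success", true);
        System.out.println(event.render()); // one line per request instead of several log lines
    }
}
```

Adding another dimension is a one-line `addField` call, which is the low friction the talk is asking for.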
  • 86. @A_Bangser @FlowConFR #FlowCon In order to combat tribal-knowledge-based guessing when debugging our complex systems, we need a low-friction way to add fields to our logs for structure and searchability, allowing application and user context to be wrapped in a business context. [Example fields: CustomerID:234567 VersionOfApp:2 RequestedUri:www.]
  • 87. Let’s understand a couple of these through examples
      ➔ raw events
      ➔ no pre-aggregation
      ➔ structured data
      ➔ arbitrarily wide events
      ➔ schema-less-ness
      ➔ high cardinality dimensions
      ➔ oriented around the lifecycle of the request
      ➔ batched up context
      ➔ static dashboards don’t work, it must be exploratory
  • 88. Debugging distributed systems is hard, especially when business impact is on the line. Let’s talk outages
  • 89. Hmmm, a warning alert has come in: “This is an automated alert based on a warning: production service sending a high percent of 500s in production!”
  • 90. Yup, definitely an issue
  • 91-95. All hands on deck, what is happening... and why?
  • 96. 2+ hrs in and we still aren’t sure we know what happened
  • 97. And then it keeps happening
  • 98. On-call engineers are not amused
  • 99. But the service owners weren’t just lounging around
  • 100. And these were some awesome dashboards
  • 101-102. Let’s break down what this dashboard shows: Request Counts and Response Latency panels (slide 102 labels: Enhanced Images, Original Images, Enhanced Images, Enhanced and resized)
  • 103. This dashboard helped limit impact ~3 hours 40 min
  • 104. And eventually, powerful human pattern matchers solved the problem
  • 105. So what happens to this dashboard now?
  • 106. They have been sent to a farm… with their other friends
  • 107. Why ditch the dashboards? The scar tissue of your past outages is not a sufficient replacement for the creativity required to investigate your future incidents. Image: https://www.needpix.com/photo/907639/images-leash-leash-polaroid-free-pictures-free-photos-free-images-royalty-free
  • 108-112. Let’s revisit those characteristics
      ➔ raw events
      ➔ no pre-aggregation
      ➔ structured data
      ➔ arbitrarily wide events
      ➔ schema-less-ness
      ➔ high cardinality dimensions
      ➔ oriented around the lifecycle of the request
      ➔ batched up context
      ➔ static dashboards don’t work, it must be exploratory
    Takeaways added across slides 109-112:
      ➔ The only way to ask new questions is to keep the original raw data available and queryable
      ➔ Make data easy to add details to and easy to query
      ➔ Empower creative and shared exploration based on business context
    Image credit: By Twitter, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=80936515
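To make the point about raw, un-aggregated events concrete, here is a minimal Java sketch. It assumes events are held as plain string maps; the `RawEventQuery` class, its `countBy` helper, the field names, and the sample data are all made up for illustration and do not come from the talk. The same stored events answer two different questions without any new code being deployed to the service.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class RawEventQuery {
    // Count the events matching one field value, grouped by any other field.
    // Because the raw events are kept, groupKey can be chosen at query time,
    // including high-cardinality fields such as a customer id.
    static Map<String, Long> countBy(List<Map<String, String>> events,
                                     String filterKey, String filterValue,
                                     String groupKey) {
        return events.stream()
            .filter(e -> filterValue.equals(e.get(filterKey)))
            .collect(Collectors.groupingBy(
                e -> e.getOrDefault(groupKey, "(missing)"),
                TreeMap::new,
                Collectors.counting()));
    }

    // Made-up sample events; a real system would read these from an event store.
    static List<Map<String, String>> sampleEvents() {
        return List.of(
            Map.of("status", "500", "customer_id", "234567", "action", "flip"),
            Map.of("status", "200", "customer_id", "111111", "action", "flip"),
            Map.of("status", "500", "customer_id", "234567", "action", "resize"),
            Map.of("status", "500", "customer_id", "999999", "action", "flip"));
    }

    public static void main(String[] args) {
        // Question 1: which customers are seeing 500s?
        System.out.println(countBy(sampleEvents(), "status", "500", "customer_id"));
        // Question 2, thought of later: which actions are failing?
        System.out.println(countBy(sampleEvents(), "status", "500", "action"));
    }
}
```

A pre-aggregated counter of 500s per minute could answer neither question, since the customer and action dimensions would already have been discarded when the counter was incremented.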
  • 113. Looking back, journeys are never clear, so why do we still expect them to be when we start a new one? Labels on the slide: QA; TWU; Political Science Major; Data analysis for investments; A desire to learn how to code; Automation FTW!; An “analyst” computer; A “DevOps” friend engaged me in his work; Monitorama; An infrastructure project; Platform Engineering @; Professional scuba diver; A (slight) obsession with observability
  • 114. “Start where you are. Use what you have. Do what you can.” - Arthur Ashe
  • 115.
      ➔ All of tech and product is now asking more interesting questions
      ➔ We are expecting more of our tooling
      ➔ We are building new awareness about our services and system
  • 116. Thank you! www.slideshare.net/AbigailBangser @A_Bangser