Over the past decade, DevOps has empowered teams to break silos and create an environment of shared responsibility for delivering scalable applications.
At this breakout session, Remie Bolte, Marketplace Vendor and Cloud Solutions Architect, will explore how to break down one of the last silos still standing: application monitoring. You will learn about the history of monitoring and how it has evolved from basic systems monitoring to application performance monitoring. The session will also outline the common pitfalls of the most popular monitoring solutions and how they are antithetical to the DevOps movement.
To solve this, we'll introduce you to a new monitoring concept focused on developers: Monitoring as Code.
8. History of
Monitoring
Pre-historic
Single system monitoring
All tools were focused on the wellbeing of their host
system (top, vmstat, syslog). It was a symbiotic mess.
Command-line interfaces
Because... well… does this need explaining?
It makes me feel old.
Application? Say what?
It is hard enough to keep the systems running; who
cares about the actual applications?
9. History of
Monitoring
‘90s
Systems & network monitoring
Even back then we wanted to make sure amazon.com
was always online (we ❤ books!)
Web-based interfaces FTW!
Yes, finally, a web-based interface. Who doesn’t love
to configure CGI scripts in Apache 1.3?
Oh right, applications, yes, getting there
Ok, so you created this “website” you want to run on
my system. You want it to ALWAYS work. I get it.
10. History of
Monitoring
‘00s
Busy doing other stuff until the late ‘00s
Yeah sorry, life kept us busy doing other stuff.
Be the change you seek 😉
Agile. DevOps. Right… gotcha
Ops are still using Nagios. Devs spent the entire
decade reinventing almost everything.
What is APM again?
Install what New Relic agent now? JVM level
integration? Oh wow, those graphs are awesome 😍!
11. History of
Monitoring
Cloud age
Multi-faceted landscape monitoring
We have eyes on our on-prem, multi-cloud,
microservices-based infrastructure. We have 200 tools for it.
Still getting notifications at 3am
Oh well… some things never change 🤷
Your application are belong to us
We are using auto-scaling now, don’t really care if
your application is hoarding resources. We cool.
13. (false) positively cruel
fool me once, shame on you;
fool me twice, shame on me;
fool me at 3am and for the love of me
I will know where to find you.
14. 25% believe these
interruptions [..] make their
jobs unmanageable at times
2018 SURVEY OF OVER 800 IT PROFESSIONALS, PAGERDUTY
21. Monitoring
As Code
The problem
We are using the wrong metrics
Reactive monitoring based on thresholds determined
by historic trend analysis is not good enough anymore
We operate with a split brain
Developers write business logic in code, operations
crew recreates this in separate monitoring tooling
The learning curve is too steep
We can’t expect anyone to be a
full-stack-devops-rainbow-unicorn-centaur 🦄 🌈
22. Monitoring
As Code
The goal
Monitoring application state
We need to proactively track the actual real-time
state of the application
Monitoring should be SOLID and DRY
We should implement the same principles for
monitoring as we do with application development
Use modern development methods
There should be no new languages, no new techniques
and no context switching for monitoring
23. Monitoring
As Code
The solution
Nagios Core+Docker+TypeScript = 🤔😊😍
Create your checks in TypeScript and deploy them with
Docker
Incorporate monitoring in your application
Write checks like you write code
Use your existing skills, your existing CI/CD pipeline
and your existing process to develop monitoring
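As a rough illustration of what "write checks like you write code" could look like, here is a minimal sketch of a state-based check in TypeScript. It does not use the API of @remie/nagios-cli; the names `QueueState`, `checkQueue` and `formatPluginOutput` are hypothetical and exist only for this example. The one thing borrowed from Nagios is its plugin convention: return codes 0 to 3 for OK, WARNING, CRITICAL and UNKNOWN, and a single line of output with optional performance data after a `|`.

```typescript
// Nagios plugin return codes (standard plugin convention).
enum Status {
  OK = 0,
  WARNING = 1,
  CRITICAL = 2,
  UNKNOWN = 3,
}

// Hypothetical application state, as reported by the application itself.
interface QueueState {
  pendingJobs: number;
  oldestJobAgeSeconds: number;
}

interface CheckResult {
  status: Status;
  message: string;
  perfdata: Record<string, number>;
}

// Hypothetical check: judge the job queue by its real state instead of host metrics.
function checkQueue(state: QueueState, maxAgeSeconds: number): CheckResult {
  const perfdata: Record<string, number> = {
    pending: state.pendingJobs,
    oldest_age: state.oldestJobAgeSeconds,
  };
  const message = `oldest job has waited ${state.oldestJobAgeSeconds}s`;
  if (state.oldestJobAgeSeconds > maxAgeSeconds) {
    return { status: Status.CRITICAL, message, perfdata };
  }
  if (state.oldestJobAgeSeconds > maxAgeSeconds / 2) {
    return { status: Status.WARNING, message, perfdata };
  }
  return { status: Status.OK, message, perfdata };
}

// Render the single line of plugin output: "NAME STATUS - text | label=value ...".
function formatPluginOutput(name: string, result: CheckResult): string {
  const perf = Object.keys(result.perfdata)
    .map((label) => `${label}=${result.perfdata[label]}`)
    .join(" ");
  return `${name} ${Status[result.status]} - ${result.message} | ${perf}`;
}

const example = checkQueue({ pendingJobs: 12, oldestJobAgeSeconds: 900 }, 600);
console.log(formatPluginOutput("QUEUE", example));
// prints "QUEUE CRITICAL - oldest job has waited 900s | pending=12 oldest_age=900"
// A real plugin would finish by exiting with the numeric status (here 2).
```

The check is plain TypeScript with no monitoring-specific language, so it can live next to the business logic it describes and go through the same review and CI/CD steps.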
24. Monitoring
As Code
But why
Nagios though?
Proven technology
Conceived in the same year Toni Braxton wanted her
heart to be unbroken and Lauryn Hill was killed softly
Lightweight, super fast, fit for purpose
Written in C, focused on one thing. No fancy stuff, just a
very good task scheduler for monitoring & alerting
Active community, well documented
Hosted on GitHub, it is actively maintained with
regular stable releases and thorough documentation
32. Getting
Started
Identify
Monitor what matters to you
Make sure to identify which parts of your application
landscape require monitoring & alerting
Determine ownership
Who will be alerted when things go south? What do
you expect will happen at 3am?
Adjust your definition of done
Make sure to include writing monitoring checks as
part of your DoD, just like you’d do with tests
33. Getting
Started
Create
Write your checks
Add a folder called ‘monitoring’ (next to tests) and
initialize it:
$: mkdir monitoring
$: cd monitoring
$: npx @remie/nagios-cli init
This will install an example project that you can use to
start writing your checks.
Run it with `npm start` and check the results here:
http://localhost:8000
34. Getting
Started
Test
Write unit tests for your checks
Remember, we’re dealing with TypeScript. You can
write unit tests for it.
Run locally with Docker
You can run the checks against your local
development environment with Docker
Deploy to staging environment
By leveraging environment variables or IoC, you can easily
deploy the same checks to your staging environment
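The three points above can be sketched together. The example below is an assumption-heavy illustration, not the layout of the generated example project: `APP_BASE_URL`, the `/health` path, `checkHealth` and `HealthFetcher` are hypothetical names. It shows a check that reads its target from the environment and receives its network dependency from the caller (IoC), which is what makes it unit-testable without a running application.

```typescript
// Minimal environment shape; pass process.env in a real run.
type Env = Record<string, string | undefined>;

// Hypothetical injected dependency: fetches the application's health report.
type HealthFetcher = (url: string) => Promise<{ healthy: boolean }>;

// Nagios plugin return codes used below.
const OK = 0;
const CRITICAL = 2;
const UNKNOWN = 3;

// APP_BASE_URL is a hypothetical variable name; the default targets local development.
function targetUrl(env: Env): string {
  return env["APP_BASE_URL"] ?? "http://localhost:3000";
}

// The check receives its dependencies instead of creating them (IoC),
// so a unit test can replace the network call with a stub.
async function checkHealth(env: Env, fetchHealth: HealthFetcher): Promise<number> {
  try {
    const report = await fetchHealth(`${targetUrl(env)}/health`);
    return report.healthy ? OK : CRITICAL;
  } catch {
    return UNKNOWN; // the application could not be reached at all
  }
}

// Framework-free unit tests; any test runner would do the same job.
function expectEqual(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

async function runTests(): Promise<void> {
  const staging: Env = { APP_BASE_URL: "https://staging.example.test" };
  expectEqual(await checkHealth(staging, async () => ({ healthy: true })), OK, "healthy app");
  expectEqual(await checkHealth(staging, async () => ({ healthy: false })), CRITICAL, "unhealthy app");
  expectEqual(
    await checkHealth({}, async () => { throw new Error("connection refused"); }),
    UNKNOWN,
    "unreachable app",
  );
}

runTests().then(() => console.log("all check tests passed"));
```

Switching from local development to staging then only means setting a different value for the environment variable when the container starts; the check code itself stays the same.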
35. Getting
Started
Deploy
Include monitoring in your CI/CD process
Compile your TypeScript code and build the Docker
image in your existing CI/CD pipeline
Deploy to your Kubernetes/ECS cluster
Because containers, AM I RIGHT?
No but seriously, deployment is that simple
Connect with existing alerting solutions
Integrate with Slack, OpsGenie, PagerDuty or
StatusPage to receive alert notifications
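To give an idea of what connecting to an alerting channel involves, here is a small sketch that turns a check result into a Slack message. It is an illustration under assumptions, not how the presented tooling integrates with Slack: `AlertEvent` and `toSlackPayload` are hypothetical names. The only external fact it relies on is that Slack incoming webhooks accept a JSON body with a `text` field; sending the request is left out.

```typescript
// Hypothetical shape of an alert; the field names are illustrative.
interface AlertEvent {
  service: string;
  status: "OK" | "WARNING" | "CRITICAL" | "UNKNOWN";
  output: string;
}

// Build the JSON body for a Slack incoming webhook ("text" is the message).
function toSlackPayload(event: AlertEvent): { text: string } {
  const icon = event.status === "OK" ? ":white_check_mark:" : ":rotating_light:";
  return { text: `${icon} ${event.service} is ${event.status}: ${event.output}` };
}

const slackBody = toSlackPayload({
  service: "checkout-queue",
  status: "CRITICAL",
  output: "oldest job has waited 900s",
});
console.log(JSON.stringify(slackBody));
// The body would then be POSTed to your own webhook URL (not shown here).
```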