The document discusses how artificial intelligence can be applied to performance engineering to make it self-healing and self-service. It describes how monitoring needs have evolved from just looking at dashboards and logs to dealing with dynamic cloud environments. It outlines how AI can be used for full-stack monitoring with one agent, automated end-to-end tracing, automated log analytics and change detection. It then discusses how AI can enable shifting work left to break the pipeline earlier, improve mean time to resolution with auto-mitigation, and shift work right with tags, deployments and events to create actionable feedback loops across development, operations and business teams.
18. Improve MTTR: Automate Mitigation with AI Data
Auto Mitigate!
1 CPU Exhausted? Add a new service instance!
2 High Garbage Collection? Adjust/Revert Memory Settings!
3 Issue with BLUE only? Switch back to GREEN!
4 Hung threads? Restart Service!
5 Still ongoing? Initiate Rollback!
Still ongoing? Escalate
Escalate at 2AM?
Mark Bad Commits
Update Dev Tickets
…
Impact Mitigated?
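The mitigation ladder on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the root-cause labels, the action names, and the `run_action` and `impact_mitigated` callbacks are hypothetical placeholders, not the API of any particular monitoring product.

```python
from typing import Callable, Dict, List

# Hypothetical root-cause labels mapped to the first-line fixes from the slide.
# A real monitoring tool would supply its own problem and root-cause identifiers.
REMEDIATIONS: Dict[str, str] = {
    "cpu_exhausted": "add_service_instance",
    "high_garbage_collection": "adjust_or_revert_memory_settings",
    "blue_only_issue": "switch_back_to_green",
    "hung_threads": "restart_service",
}


def auto_mitigate(
    root_cause: str,
    run_action: Callable[[str], None],
    impact_mitigated: Callable[[], bool],
) -> List[str]:
    """Walk the mitigation ladder and return the actions that were taken."""
    taken: List[str] = []

    # Steps 1-4: apply the targeted fix if the root cause is a known one.
    action = REMEDIATIONS.get(root_cause)
    if action is not None:
        run_action(action)
        taken.append(action)
        if impact_mitigated():
            return taken

    # Step 5: still ongoing (or unknown root cause), so initiate a rollback.
    run_action("initiate_rollback")
    taken.append("initiate_rollback")
    if impact_mitigated():
        return taken

    # Still ongoing after the rollback: hand over to a human.
    run_action("escalate")
    taken.append("escalate")
    return taken
```

In practice `run_action` would call the deployment or orchestration tooling, and `impact_mitigated` would ask the monitoring system whether the problem is still open. Escalation is deliberately the last rung, so nobody is paged at 2AM unless the automated steps have failed.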
That may have worked well for static environments where you knew what you were looking at.
If your apps gave you logs, you could use log analytics to analyze the log files, provided you knew what to look for and the log messages were actually written.
We could also correlate logs and exceptions to identify strange patterns.
All of this worked well as long as the applications were rather static and not too large, and you had people who understood how to analyze the data provided by different tools.
BUT – the world has changed
This is the new technology stack we are dealing with, and the picture is far from complete.
New players keep coming and going, allowing us to implement new types of apps with new architectural and deployment options.
But there is more than production! There is more we can do throughout the whole DevOps Toolchain