My presentation from Velocity Europe 2013 in London: Beyond Pretty Charts… Analytics for the cloud infrastructure.
IT Ops collect tons of data on the status of their data center or cloud environment. Much of that data ends up as graphs on big screens so ops folks can keep an eye on the behavior of their systems. But unless a threshold is crossed, behavioral issues will often fall through the cracks. Thresholds are reactive, and humans are, well, human. Applying analytics and machine learning to detect anomalies in dynamic infrastructure environments can catch these behavioral changes before they become critical.
Current tools for monitoring web environments rely on fundamental assumptions that no longer hold, such as that the underlying system being monitored is relatively static, or that the behavioral limits of these systems can be defined by static rules and thresholds. As a result, interest in applying analytics and machine learning to predict and detect anomalies in these dynamic environments is gaining steam. However, working out which algorithms will accurately identify and predict anomalies within all the data we generate is not so easy.
This talk will begin with a brief definition of the types of anomalies commonly found in dynamic data center environments and then discuss some of the key elements to consider when thinking about anomaly detection such as:
Understanding your data’s characteristics
The two main approaches for analyzing operations data: parametric and non-parametric methods
Simple data transformations that can give you powerful results
By the end of this talk, attendees will understand the pros and cons of the key statistical analysis techniques and walk away with examples as well as practical rules of thumb and usage patterns.
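To make the parametric versus non-parametric distinction above a little more concrete, here is a minimal sketch in Python. It is not taken from the talk: the function names, the cutoffs (3 standard deviations, 1.5 times the interquartile range) and the sample latencies are all assumptions chosen for illustration. The first detector assumes the data is roughly normal and scores points against the mean and standard deviation. The second makes no distributional assumption and uses Tukey-style quartile fences.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Parametric sketch: flag points far from the mean, measured in
    standard deviations. Assumes roughly normal data (an assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def iqr_anomalies(values, k=1.5):
    """Non-parametric sketch: flag points outside quartile-based fences.
    Uses only rank statistics, so no distribution shape is assumed."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical response times in ms, with two spikes at positions 5 and 11.
sample = [100, 102, 98, 101, 99, 1000, 103, 97, 100, 102, 98, 1000, 101, 99, 100]

parametric_hits = zscore_anomalies(sample)
nonparametric_hits = iqr_anomalies(sample)
```

On this made-up sample the two spikes inflate the mean and standard deviation enough that neither one scores above 3, so the z-score check flags nothing, while the quartile fences flag both. That is one small instance of why understanding your data's characteristics matters before picking a technique. It should not be read as a general verdict on either approach.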