This presentation gives a good overview of why and how to track and monitor your cloud infrastructure. It lists the different types of cloud monitoring, from the underlying infrastructure all the way up the application stack. It also names relevant tools that support monitoring of cloud-hosted online applications.
2. whoami
Andreas Chatzakis
CTO & co-founder /
High traffic Greek Real Estate portal
Software delivery team management
IT Operations
co-founder of AWS Usergroup Greece
@achatzakis
3. Why monitoring
You need monitoring to act proactively on, or react to,
availability & performance risks and issues:
Detect problems before (many) users are aware
Alerts and notifications at 3 AM
Be informed of issues you wouldn't be able to recreate
Collect data to discover root cause of an incident
...and automate response for next time
Statistics and KPIs to track service quality trends
Visibility to prioritize optimization efforts
Make sense out of large quantity of logs and data
4. Monitoring in the cloud
Principles are not that different from
traditional infrastructure but...
Cloud allows us to build highly dynamic setups
More data
Our tools need to adapt
Ephemeral resources require centralized approach
Need aggregation based on server role
Cloud promises agility
Only possible when cost of failure is low
Being able to spot issues in a more automated
manner is key
The rise of DevOps
Developers need visibility to understand how their
code affects costs and impacts availability
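The point about aggregating by server role can be illustrated with a minimal sketch. It assumes every instance reports samples tagged with its role, so that a metric survives the individual (ephemeral) hosts that produced it. The instance IDs, role names and values below are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def aggregate_by_role(samples):
    """Average a metric per server role instead of per (short-lived) host.

    `samples` is an iterable of (instance_id, role, value) tuples.
    """
    by_role = defaultdict(list)
    for _instance_id, role, value in samples:
        by_role[role].append(value)
    return {role: mean(values) for role, values in by_role.items()}

# Hypothetical CPU utilisation samples (percent).
samples = [
    ("i-aaa", "web", 40.0),
    ("i-bbb", "web", 60.0),
    ("i-ccc", "db", 75.0),
]
print(aggregate_by_role(samples))
```

When an autoscaled web instance disappears, the "web" series keeps going, which is the property a per-host dashboard lacks.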
5. Types of monitoring
There is a variety of monitoring tools that
complement each other
External checks (is my app still up?)
Server monitoring (CPU, RAM, IO...)
Systems monitoring (MySQL, Apache etc. metrics)
Process monitoring (restart crashed services)
Application monitoring (bottlenecks in the code)
End user monitoring (client side performance)
Log aggregation & analysis (centralize storage)
Cloud Analytics (do I make the most out of AWS?)
6. Deployment models
Consider the deployment model of each
monitoring solution
Agent vs Agent-less
SaaS vs DIY on own computing instances
Consider running it in a different AZ or on a different provider
Least privilege principle (e.g. read-only
access for the agent)
7. Pricing models
Different pricing models offered by the
various solutions
Freeware
Per host
Per host-hour
Per user
Per alert
Per stored GB
8. External tests
External tests detect failure & alert you so
that you react
Treats your app as a black box
Periodic check from a bot
Define expected response (specific string)
Tests from different geographies
Report on average response time, latency etc
Alert via email, sms, phone
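A minimal sketch of such a black-box check, using only the Python standard library. The function names `fetch` and `check` are illustrative, not taken from any particular monitoring product, and a real service would run this periodically from several locations and feed the result into its alerting.

```python
import time
import urllib.request

def fetch(url, timeout=10):
    """Return (status_code, body_text) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")

def check(url, expected, fetcher=fetch):
    """Black-box check: the app counts as 'up' only if it answers 200 and
    the body contains the expected string. Returns (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        status, body = fetcher(url)
        ok = status == 200 and expected in body
    except OSError:
        # URLError, HTTPError (4xx/5xx) and timeouts all derive from OSError.
        ok = False
    return ok, time.monotonic() - start

# Example call (placeholder URL and string):
#   ok, seconds = check("https://www.example.com/", "Example Domain")
```

Checking for a specific string rather than just a 200 status catches the case where the server answers but returns an error page or an empty template.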
9. Server & Systems monitoring
Server monitoring collects data from OS and
Systems
Server metrics (CPU, Load Average, RAM, IO activity)
System metrics (Apache status, MySQL connections...)
Typically works via an agent or remote access
Can point towards root cause
But can't trace issues to specific parts of your code
Helps with capacity planning and scaling decisions
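As a rough illustration of what an agent collects, the sketch below reads a few server metrics with the Python standard library. It assumes a Unix-like host (`os.getloadavg` is not available on Windows), and the metric names in the returned dictionary are arbitrary choices for this example.

```python
import os
import shutil

def collect_server_metrics(path="/"):
    """Collect a few OS-level metrics using only the standard library."""
    load1, load5, load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(path)
    used_pct = 100.0 * disk.used / disk.total if disk.total else 0.0
    return {
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "cpu_count": os.cpu_count(),
        "disk_used_pct": used_pct,
    }

print(collect_server_metrics())
```

A real agent would sample these on a schedule and ship them to a central store. Comparing the load average with `cpu_count` over time is one simple input to scaling decisions.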
10. Process monitoring
Processes die or misbehave... Monitor their
health and automate response
Tools that check critical processes
Restart crashed processes
...or those using too many resources
Can configure complex scenarios
Beware of false positives
Beware of recurring restarts
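One way to guard against recurring restarts is to cap how many restarts are allowed within a time window. The class below is a hypothetical sketch of that idea, not the behaviour of any specific process monitor. The clock is injectable so the logic can be exercised without waiting.

```python
import time
from collections import deque

class RestartPolicy:
    """Allow at most `max_restarts` restarts within `window` seconds."""

    def __init__(self, max_restarts=3, window=60.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window = window
        self.clock = clock
        self._restarts = deque()  # timestamps of recent restarts

    def should_restart(self):
        now = self.clock()
        # Forget restarts that have fallen out of the sliding window.
        while self._restarts and now - self._restarts[0] > self.window:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # flapping: stop restarting and alert a human
        self._restarts.append(now)
        return True
```

A supervisor loop would call `should_restart()` each time it finds the process dead, and raise an alert instead of restarting once it returns `False`.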
11. Application monitoring
A 'Flight recorder' for your code helps you
fix real issues.
It is often hard to recreate a production issue.
Plugs into your app servers & tracks execution
Code tracing
Captures errors, input variables and debugging
info
Records performance metrics
Time spent on DB, Cache, external services
Overhead of specific classes or methods
Slow queries
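Commercial application monitoring tools instrument code automatically. The underlying idea of recording time spent per method can be sketched by hand with a decorator. Everything below (`traced`, `timings`, `load_listing`) is a made-up example rather than any product's API.

```python
import functools
import time
from collections import defaultdict

# Accumulated call counts and wall-clock seconds per function name.
timings = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

def traced(func):
    """Record call count and total time spent in `func`, even on errors."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = timings[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper

@traced
def load_listing(listing_id):
    # Hypothetical stand-in for a database query.
    time.sleep(0.01)
    return {"id": listing_id}

load_listing(42)
load_listing(43)
print(dict(timings))
```

Sorting `timings` by total seconds gives a crude view of where request time goes, which is the kind of question these tools answer in far more detail.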
12. End user monitoring
Get real data about the experience of your
app's users
It works for you. Does it work for them?
Servers running ok. What about that 3rd party widget?
Typically collects actual end user data via JavaScript
Capture performance issues faced by user segments
OS / browser / addons
Network connection speed
Geographical location
First time visit VS warm browser cache
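Once timing beacons from real users reach the server, segmenting them is a simple grouping step. The sketch below assumes each beacon is a dictionary with a `load_ms` field plus segment attributes. The field names and sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import median

def median_load_time_by(beacons, key):
    """Median page load time (ms) for each value of the segment `key`."""
    groups = defaultdict(list)
    for beacon in beacons:
        groups[beacon[key]].append(beacon["load_ms"])
    return {segment: median(values) for segment, values in groups.items()}

# Hypothetical beacons as they might be posted by a script in the page.
beacons = [
    {"browser": "Firefox", "country": "GR", "cache": "cold", "load_ms": 2400},
    {"browser": "Firefox", "country": "GR", "cache": "warm", "load_ms": 900},
    {"browser": "Chrome", "country": "DE", "cache": "cold", "load_ms": 1800},
    {"browser": "Chrome", "country": "GR", "cache": "warm", "load_ms": 700},
]
print(median_load_time_by(beacons, "cache"))
```

The median is used here because a handful of very slow page loads would distort an average.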
13. Log aggregators
Centralized storage of logs for cloud setups
with ephemeral instances
Logs are sent over to centralized repository
Persists after server has been decommissioned
Logs are captured, stored, archived & recycled
Logs are indexed and analyzed
Preconfigured analyzers for known apps
Free text analyzers for less known apps
Alerts based on specific patterns, frequencies
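Alerting on patterns and frequencies can be sketched as counting regular-expression matches over a batch of log lines and comparing the counts with thresholds. The pattern names, log lines and thresholds below are assumptions made for this example.

```python
import re
from collections import Counter

def count_matches(lines, patterns):
    """Count how many lines match each named regular expression."""
    counts = Counter()
    for line in lines:
        for name, regex in patterns.items():
            if regex.search(line):
                counts[name] += 1
    return counts

def alerts(counts, thresholds):
    """Names of patterns whose count reached their threshold."""
    return sorted(name for name, limit in thresholds.items() if counts[name] >= limit)

patterns = {
    "http_5xx": re.compile(r'" 5\d\d '),  # status field in an access log
    "oom": re.compile(r"Out of memory"),
}
lines = [
    '10.0.0.1 - - "GET / HTTP/1.1" 500 120',
    '10.0.0.2 - - "GET /search HTTP/1.1" 200 5120',
    '10.0.0.3 - - "GET /ad HTTP/1.1" 503 0',
    "kernel: Out of memory: Kill process 1234",
]
counts = count_matches(lines, patterns)
print(alerts(counts, {"http_5xx": 2, "oom": 5}))
```

In a real aggregator the batch would be a time window (for example, the last five minutes of logs from all instances of a role).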
14. Swiss knives
The future might belong to holistic
monitoring solutions
Monitoring at multiple levels
Correlating data can be a godsend for
devops
Cloud management tools might move to
integrate or provide such functionality
15. A common pitfall
While it does have its uses, you should not
rely on custom application logging
Typically inconsistent logging that is added
reactively
Developer bias and limited understanding
of operational issues
Logging only what you anticipate will go wrong
Increased code maintenance costs and risks
Can hurt performance if you are not careful
Instead, use a proper monitoring toolset
Let developers focus on building new
functionality
16. Cloud Analytics
Combine traditional monitoring with Newvem's
Analytics and make the most of the cloud
Powerful analytics of cloud usage data
Reveal security & availability issues in
your cloud infra
Get actionable insights
Identify opportunities for cost reductions
Spot overloaded resources requiring
vertical or horizontal scaling
Visibility and confidence that you are making the
most of the cloud