1. Monitoring your API
2. WHAT IS THIS TALK ABOUT?
• Passive monitoring with graphite (collect statistics).
• What metrics to monitor.
• What tools.
• Graph examples.
3. ASSUMPTIONS
• You are using Nginx as a proxy for your API.
• You are using Ubuntu (but this also works on other Linux distributions).
• You’ll be using graphite to store metrics sent by collectl for system metrics and logster for Nginx logs.
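As a rough illustration of what "sending metrics to graphite" means: carbon's plaintext listener accepts lines of the form `<metric path> <value> <timestamp>`, by default on port 2003. The sketch below builds such a line. The metric name `api.nginx.http_200` and the host `localhost` are placeholders, not names used by collectl or logster.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send already formatted lines to a carbon plaintext listener.

    host and port are assumptions; 2003 is carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric name, fixed timestamp so the output is reproducible.
line = format_metric("api.nginx.http_200", 42, timestamp=1400000000)
print(line, end="")
```

Calling `send_metrics([line])` would deliver it, assuming a carbon daemon is listening on that host and port.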
4. WHAT TO MONITOR?
“The 15 Essential Nginx Metrics to Monitor” by Scalyr
https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
• Requests per second
• Response time
• Active connections
• Connection backlog queue
• Response codes
• Process open file handles
• Process state*
• Server status*
• Server load average
• Server network usage
• Server disk space
• Hosting provider status*
• DNS expiration*
• SSL certificate expiration*
• User activity*
* Not the kind of thing you would measure, so this talk does not cover them.
5. WHAT TO MONITOR?
• “The USE Method” by Brendan Gregg http://www.brendangregg.com/usemethod.html
• Methodology for analyzing the performance of any system.
• Summarized as: “For every resource, check utilization, saturation, and errors.”
• Consider software a resource as well.
• “USE Method: Rosetta Stone of Performance Checklists” by Brendan Gregg http://www.brendangregg.com/USEmethod/use-rosetta.html
6. WHAT TO MONITOR?
Resource              | Utilization               | Saturation                 | Errors
App performance       | Response time, # requests | -                          | 5xx code
Nginx connections     | Active                    | Accepted - Handled         | -
Open file descriptors | # open files              | -                          | -
CPU                   | % Util                    | Run queue size             | -
Network               | Rx or Tx / Max            | Dropped                    | Errors
Memory                | Used                      | Swap                       | -
Disk                  | % Util                    | Wait time and queue length | -
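The Nginx connection columns (Active, Accepted - Handled) can be read from the status page of Nginx's `stub_status` module, assuming that module is enabled. A minimal sketch of parsing that page follows. The `SAMPLE` text is made-up data in the module's output format, and `parse_stub_status` is a hypothetical helper, not part of collectl or logster.

```python
import re

# Made-up sample in the format printed by Nginx's stub_status page.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630946 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Extract the connection counters and derive accepted-but-not-handled."""
    active = int(re.search(r"Active connections:[ \t]+(\d+)", text).group(1))
    # The counters line holds three integers: accepts, handled, requests.
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    accepts, handled, requests = map(int, counters.groups())
    return {
        "active": active,              # utilization
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "dropped": accepts - handled,  # saturation: Accepted - Handled
    }


stats = parse_stub_status(SAMPLE)
print(stats["active"], stats["dropped"])
```

Both `accepts` and `handled` are cumulative counters, so a graph would normally plot the change in `dropped` between two samples rather than the raw value.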
7. WHAT TOOLS?
Resource              | Utilization               | Saturation                 | Errors
App performance       | Response time, # requests | -                          | 5xx code
Nginx connections     | Active                    | Accepted - Handled         | -
Open file descriptors | # open files              | -                          | -
CPU                   | % Util                    | Run queue size             | -
Network               | Rx or Tx / Max            | Dropped                    | Errors
Memory                | Used                      | Swap                       | -
Disk                  | % Util                    | Wait time and queue length | -
8. WHAT TOOLS? COLLECTL
• Created by HP
• Low overhead
• Available in all major Linux distributions
• Measures a rich set of metrics
• Stores data locally and exports to ganglia and graphite; custom imports and exports can be added
• Problem: doesn’t export all metrics to graphite
11. WHAT TOOLS? LOGSTER
• Created by Etsy
• Exports to ganglia, graphite, statsd, cloudwatch, nagios
• Few dependencies
• New parsers can be added
• 1 minute resolution
• Problem: only sends requests / sec per response code
12. WHAT TOOLS? LOGSTER
• Nginx can log the request time via the $request_time variable
• I created a parser for logster that takes advantage of $request_time
• Sends percentiles and max
• DOESN’T USE AVERAGES
• Sends the total number of requests per response code
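To make the idea concrete, here is a minimal standalone sketch of the kind of computation such a parser performs. It is not the parser from this talk and does not use the logster API. It assumes a log format where `$request_time` is appended as the last field of Nginx's combined format; the sample lines and the names `summarize` and `percentile` are made up for illustration.

```python
import re
from collections import Counter

# Assumed format: Nginx "combined" log format with $request_time appended last.
LINE_RE = re.compile(r'"\s(?P<status>\d{3})\s.*\s(?P<request_time>\d+\.\d+)$')

SAMPLE_LINES = [
    '127.0.0.1 - - [10/Jun/2014:10:00:00 +0000] "GET /v1/users HTTP/1.1" 200 512 "-" "curl/7.35.0" 0.120',
    '127.0.0.1 - - [10/Jun/2014:10:00:01 +0000] "GET /v1/users HTTP/1.1" 200 498 "-" "curl/7.35.0" 0.080',
    '127.0.0.1 - - [10/Jun/2014:10:00:02 +0000] "GET /v1/items HTTP/1.1" 200 2048 "-" "curl/7.35.0" 0.300',
    '127.0.0.1 - - [10/Jun/2014:10:00:03 +0000] "POST /v1/items HTTP/1.1" 500 170 "-" "curl/7.35.0" 2.500',
    '127.0.0.1 - - [10/Jun/2014:10:00:04 +0000] "GET /v1/nope HTTP/1.1" 404 162 "-" "curl/7.35.0" 0.100',
]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1..100."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def summarize(lines):
    """Return request-time percentiles, max, and request totals per response code."""
    times, codes = [], Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        times.append(float(match.group("request_time")))
        codes["http_" + match.group("status")] += 1
    times.sort()
    metrics = {f"p{p}": percentile(times, p) for p in (50, 90, 99)}
    metrics["max"] = times[-1]
    metrics.update(codes)
    return metrics


print(summarize(SAMPLE_LINES))
```

Nearest-rank is only one of several percentile definitions; it is used here because it always returns a value that was actually observed.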
13. WHAT TOOLS? LOGSTER
• Why a new parser that doesn't use averages:
“#LatencyTipOfTheDay: Average (def): a random number that falls somewhere between the maximum and 1/2 the median. Most often used to ignore reality.” by Gil Tene
http://latencytipoftheday.blogspot.com.co/2014/06/latencytipoftheday-average-random.html
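A small made-up data set shows the effect the quote describes: 99 requests take 0.1 s and one takes 10 s. The numbers are invented for illustration and are not measurements from this talk.

```python
import statistics

# 99 fast requests and one very slow one (seconds); invented data.
latencies = [0.1] * 99 + [10.0]

mean = statistics.mean(latencies)
median = statistics.median(latencies)
worst = max(latencies)

# The mean lands near 0.2 s, a latency that no request actually had.
print(f"mean={mean:.3f} median={median:.3f} max={worst:.3f}")
```

The mean is about twice the median yet fifty times smaller than the slowest request, so on its own it describes neither the typical case nor the worst one.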
14. WHAT TOOLS? LOGSTER
• Why a new parser that doesn't use averages:
“#LatencyTipOfTheDay: If you are not measuring and/or plotting Max, what are you hiding (from)?” by Gil Tene
http://latencytipoftheday.blogspot.com.co/2014/06/latencytipoftheday-if-you-are-not.html
15. WHAT TOOLS? LOGSTER
• Why a new parser that doesn't use averages:
More resources about response times in web apps:
http://www.infoq.com/presentations/latency-pitfalls by Gil Tene
https://vimeo.com/104129953 by Andre Arko