Capacity Planning Free Solution
1. The Performance People
Performance Management with
Free and Bundled Tools
Adrian Cockcroft
Netflix Inc.
acockcroft@netflix.com
(Co-authored with Mario Jauvin
MFJ Associates
mario@mfjassociates.net)
14 May 2014
2. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Agenda
Overview of Capacity Planning Requirements
and Data Sources
Performance Data Collection
Free Network Monitoring Tools
Free System Monitoring Tools
Free Load Generation and Modelling Tools
Licences and References
3. May 14, 2014 Adrian Cockcroft and Mario Jauvin
What are we talking about?
Network
monitoring with
Wireshark, MRTG,
BigSister, Cacti,
Nagios, OpenNMS,
Zenoss, Openxtra,
ntop
Database Tier monitoring
With SEtoolkit, Orca,
XEtoolkit
Application Tier
monitoring with Orca,
Cacti, BigSister, Ganglia,
XEtoolkit
QA Load generation with
Grinder or SLAMD,
modelling with PDQ and R
4. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Capacity Planning
Requirements and Data
Sources
5. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Definitions
Capacity
– Resource utilization and headroom
Planning
– Predicting future needs by analyzing historical data
and modeling future scenarios
Performance Monitoring
– Collecting and reporting on performance data
Free Tools
– Bundled with the OS or available for no $$$
6. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Capacity Planning Requirements
We care about CPU, Memory, Network and Disk
resources, and Application response times
We need to know how much of each resource we
are using now, and will use in the future
We need to know how much headroom we have to
handle higher loads
We want to understand how headroom varies, and
how it relates to application response times and
throughput
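As a rough illustration of the headroom requirement above, the following Python sketch treats headroom as the gap between a chosen utilization limit and the observed peak. The function names, the 70% limit and the sample peak are assumptions made for illustration; they are not figures from the talk, and the growth estimate assumes load scales linearly with utilization.

```python
def headroom(peak_utilization, limit=0.70):
    """Return spare capacity as a fraction of total capacity.

    peak_utilization and limit are fractions between 0 and 1.
    The 0.70 default limit is an illustrative assumption.
    """
    if not 0.0 <= peak_utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return max(0.0, limit - peak_utilization)


def growth_factor(peak_utilization, limit=0.70):
    """How many times the current load fits under the limit (linear scaling assumed)."""
    if peak_utilization <= 0.0:
        return float("inf")
    return limit / peak_utilization


if __name__ == "__main__":
    peak = 0.35  # hypothetical measured peak CPU utilization
    print(f"headroom: {headroom(peak):.2f}")
    print(f"load can grow by a factor of {growth_factor(peak):.1f}")
```

Real systems rarely scale linearly all the way to the limit, which is why the later slides relate headroom to response time and queueing models.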
7. May 14, 2014 Adrian Cockcroft and Mario Jauvin
CPU Capacity Measurements
CPU Capacity is defined by CPU type and
clock rate, or a benchmark rating like
SPECint_rate2000
CPU utilization is defined as busy time
divided by elapsed time for each CPU
CPU load average measures the average
number of jobs running and ready to run
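The utilization definition above (busy time divided by elapsed time) can be sketched from two samples of cumulative CPU time counters. The dictionary keys `user`, `sys` and `idle` and the sample values are assumptions for illustration; real counter names and units vary by platform.

```python
def cpu_utilization(before, after):
    """Utilization over one interval from two cumulative counter samples.

    Each sample is a dict of cumulative ticks with the (assumed) keys
    'user', 'sys' and 'idle'. Busy time is user + sys; elapsed time is
    busy + idle.
    """
    busy = (after["user"] - before["user"]) + (after["sys"] - before["sys"])
    idle = after["idle"] - before["idle"]
    elapsed = busy + idle
    if elapsed <= 0:
        raise ValueError("samples must be taken at different times")
    return busy / elapsed


if __name__ == "__main__":
    # Hypothetical samples taken some seconds apart on one CPU.
    first = {"user": 1000, "sys": 500, "idle": 8500}
    second = {"user": 1300, "sys": 600, "idle": 9100}
    print(f"utilization: {cpu_utilization(first, second):.0%}")
```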
8. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Memory Capacity Measurements
Physical Memory Capacity Utilization and Limits
– Kernel memory
– Shared Memory segment
– Executable code, stack and heap
– File system cache usage
– Unused free memory
Virtual Memory Capacity - Swap Space
Memory Throughput
– Page in and page out rates
9. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Network Capacity Measurements
Network Interface Throughput
– Byte and packet rates input and output
TCP Protocol Specific Throughput
– TCP connection count and connection rates
– TCP byte rates input and output
NFS/SMB Protocol Specific Throughput
– Byte rates read and write
– NFS/SMB service response times
HTTP Protocol Specific Throughput
– HTTP operation rates
– Get and put payload byte rates and size distribution
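Byte and packet rates are normally derived from cumulative counters sampled at two points in time. The sketch below shows that calculation, including a single wrap of a fixed-width counter. The function name and the 32-bit default are illustrative assumptions, not part of any particular tool.

```python
def counter_rate(prev_value, curr_value, interval_seconds, counter_bits=32):
    """Per-second rate from two readings of a cumulative counter.

    If the counter went backwards, assume it wrapped exactly once
    at 2**counter_bits between the two readings.
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        delta += 2 ** counter_bits
    return delta / interval_seconds


if __name__ == "__main__":
    # Hypothetical byte counter readings taken 300 seconds apart.
    bytes_per_second = counter_rate(1_000, 4_000, 300)
    print(f"{bytes_per_second * 8:.0f} bits per second")
```

On fast links a 32-bit byte counter can wrap more than once per sample interval, in which case the rate is silently underestimated; shorter intervals or wider counters avoid this.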
10. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Disk Capacity Measurements
Detailed metrics vary by platform
Easy for the simple disk cases
Hard for cached RAID subsystems
Almost Impossible for shared disk
subsystems and SANs
– Another system or volume can share a
backend spindle; when it gets busy, your own
volume can saturate even though your own
workload did not change
11. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Capacity Planning Challenges
Constantly changing infrastructure
Limited attention span from staff
Horizontally scaled commodity systems
Per-node software licensing costs too much
Too many tools, too many agents per node
Too much data, not enough analysis
Non-linear and non-intuitive scalability
Lack of tools and metrics for virtualized resources
12. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Observability
Four different viewpoints
– Management
– Engineering
– QA Testing
– Operations
Each needs very different information
Ideal would be different views of the same
performance database
Reality is a mess of disjoint tools
13. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Management Viewpoint
Daily summary of status and problems
Business oriented metrics
Future scenario planning
Marketing and management input
Concise report with dashboard style status
indicators
Free tools: R, spreadsheets and web based
displays; no good summarization tools
14. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Engineering Viewpoint
Large volumes of detailed data at several different
time scales
Input to tuning, reconfiguring and future product
development
Low level problem diagnosis
Detailed reports with drill down and correlation
analysis
Free tools: XE/SE Toolkit, Orca, Ganglia, Cacti, R
15. May 14, 2014 Adrian Cockcroft and Mario Jauvin
QA Test Viewpoint
Workload specification tools
Load generation frameworks
Testing for functionality and performance
Regression tools to compare releases
Modelling difference between test configuration
and production configuration
Free Tools: The Grinder, SLAMD, R, PDQ
16. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Operations Viewpoint
Immediate timeframe
Real time display, updated in seconds
Alert based monitoring
High level problem diagnosis
Simple high level graphs and views
Free tools: BigSister, Nagios, OpenNMS,
MRTG, Cacti, Ganglia, Wireshark, ntop
17. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Measurement Data Interfaces
Several generic raw access methods
– Read the kernel directly (not a good idea)
– Structured system data (Solaris kstat, Linux /proc)
– Process data
– Network data
– Accounting data
– Application data
Command based data interfaces
– Scrape data from vmstat, iostat, netstat, sar, ps
– Higher overhead, lower resolution, missing metrics
Data available is platform specific either way
Much more detail on this topic in the Solaris/Linux Performance
Measurement and Tuning Class
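To make the "scrape data from commands" approach concrete, here is a minimal Python sketch that turns one sample of vmstat-style text into named values. The sample text is a made-up example in the Linux vmstat layout, and `parse_vmstat` is a hypothetical helper; Solaris vmstat prints different columns, which is the platform dependence noted above.

```python
# Made-up sample in the Linux vmstat layout (header, column names, one data row).
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 803452  51200 614400    0    0     5    12  110  220  3  1 96  0  0
"""


def parse_vmstat(text):
    """Map column names (second non-blank line) to the last data row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected a header, column names and a data row")
    names = lines[1].split()
    values = lines[-1].split()
    if len(names) != len(values):
        raise ValueError("column count does not match value count")
    return {name: int(value) for name, value in zip(names, values)}


if __name__ == "__main__":
    row = parse_vmstat(SAMPLE)
    print(f"run queue={row['r']} idle={row['id']}% free={row['free']} KB")
```

Note that the scraped row carries no timestamp of its own, so a collector has to add one at read time.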
18. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Free Network
Monitoring Tools
19. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SNMP
Simple network management protocol
UDP based protocol, agents listen on port 161
Client/server like
– Client is called management application entity
– Server is called an agent entity
Agent entity is designed to be implemented
on network hardware, router, switches, etc
20. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SNMP – MIBs
Management information base
Defines the structure and the semantics of the
information that can be reported on
Most commonly used is MIB-II which defines a set of
standard networking attributes
– Interface tables
– System level information
– Routing tables
Specified using ASN.1 (Abstract Syntax Notation One)
21. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SNMP – commands
Called PDUs (protocol data units)
GET
GETNEXT
GETBULK
SET
Encoded using BER (basic encoding rules)
22. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Versions
Version 1, original version, standardized as RFC 1157 in May 1990
Version 2, around 1993. Its security model failed
because the IETF credo of “rough consensus and
running code” could not be met on securing SNMP
It became V2c, which uses community string security
(like V1)
Version 3, added security and complexity in
1998
23. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SNMP tools
Too numerous to name all but…
OpenNMS
Nagios
Cacti
MRTG
Net-snmp
– See www.snmplink.org
24. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SNMP tools
snmpwalk – reports all data under a specified
MIB subtree
GetIf – reports data about interfaces and
includes a built-in MIB browser
snmptable – reports tabular data from MIB
tables
25. May 14, 2014 Adrian Cockcroft and Mario Jauvin
OpenNMS
Well…. it’s not that portable
– 95% Java is not 100% Java
– Requires about 20-30 different platform specific
packages (PostgreSQL, Perl, RRD tool, Tomcat 4
etc…)
– Difficult to install
– Easy auto discovery
– Web-based interface
26. May 14, 2014 Adrian Cockcroft and Mario Jauvin
OpenNMS
Main screen shot
27. May 14, 2014 Adrian Cockcroft and Mario Jauvin
OpenNMS
Node screen shot
28. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Nagios
Easy to build/compile (on Solaris 10)
Easy to install
Quick response from CGI
Configuration is manual and a pain
– 13 configuration files with all kinds of interrelated
entries
– Tedious and error prone
Requires plugins to do anything
29. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Nagios
Main screen shot
30. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Nagios
Host detail screen shot
32. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop
Similar to familiar UNIX top tool for
processes but used for network
Provide huge selection of real-time data
Can be found at http://www.openxtra.co.uk/
33. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop – Active Sessions
34. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop Hosts
35. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop Network Load
36. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop Network Throughput
37. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop Port Dist
38. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop Protocol Dist
39. May 14, 2014 Adrian Cockcroft and Mario Jauvin
ntop Protocols
40. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss
Open source monitoring and management of
IT infrastructure
Zenoss core is free
Other editions are for a fee
Get it from http://www.zenoss.com/download/
41. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss Architecture
42. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss Dash Config
43. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss Google
44. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss Google Alerts
45. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss Graphs
46. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Zenoss Topology
47. May 14, 2014 Adrian Cockcroft and Mario Jauvin
MRTG
Really simple to install and configure
Requires manual config file creation
Only for MIB-II interface plotting out of the
box
Graphing is not flexible (axes, time ranges, etc.)
48. May 14, 2014 Adrian Cockcroft and Mario Jauvin
MRTG
Interface screen shot
49. May 14, 2014 Adrian Cockcroft and Mario Jauvin
MRTG
Other CPU screen shot
50. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
Software to store, retrieve and graph
numerical time series data
Uses a round robin algorithm
Data files are a fixed size
– Don’t grow
– Don’t require maintenance
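The fixed-size, round robin storage idea can be sketched in a few lines of Python. This is only an in-memory illustration of the principle (the newest value overwrites the oldest slot); it is not RRDtool's file format, and the class and method names are made up for this example.

```python
class RoundRobinArchive:
    """Fixed number of slots; each update overwrites the oldest value."""

    def __init__(self, rows):
        if rows <= 0:
            raise ValueError("rows must be positive")
        self._slots = [None] * rows  # size never changes after creation
        self._next = 0               # index of the slot to overwrite next

    def update(self, value):
        self._slots[self._next] = value
        self._next = (self._next + 1) % len(self._slots)

    def values(self):
        """Stored values from oldest to newest, skipping never-written slots."""
        n = len(self._slots)
        ordered = [self._slots[(self._next + i) % n] for i in range(n)]
        return [v for v in ordered if v is not None]


if __name__ == "__main__":
    archive = RoundRobinArchive(rows=3)
    for sample in [1, 2, 3, 4, 5]:
        archive.update(sample)
    print(archive.values())
```

Because the slot count is fixed at creation, storage use is known in advance, which is why the data files do not grow or need pruning.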
51. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
Compiles on most platforms
Used by many SNMP based tools
– OpenNMS
– Cacti
– BigSister
– WeatherMap4RRD
– MailGraph
52. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
14all is a CGI script that plots data in a
similar way to MRTG
Configurable to collect data at different
intervals (unlike MRTG)
Flexible and variable in what data can be
collected
53. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
Sample screen shot
54. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
Screen shot
55. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
Create an RRD database
rrdtool create test.rrd \
  --start 920804400 \
  DS:speed:COUNTER:600:U:U \
  RRA:AVERAGE:0.5:1:24 \
  RRA:AVERAGE:0.5:6:10
56. May 14, 2014 Adrian Cockcroft and Mario Jauvin
RRD tool
Create a graph
rrdtool graph speed.png \
  --start 920804400 --end 920808000 \
  DEF:myspeed=test.rrd:speed:AVERAGE \
  LINE2:myspeed#FF0000
57. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Free Performance Data
Collection and Rules
Toolkits
58. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SE toolkit Example Tools
A free performance toolkit for rapidly creating custom data sources
Makes all the very extensive Solaris metrics easily available
Very system specific and not enough metrics exist to port to Linux
Written by Rich Pettit with contributions from Adrian Cockcroft
Get SE3.4 from http://sourceforge.net/projects/setoolkit/
Open source with support for SPARC & x86 Solaris 8, 9, 10
Function Example SE Programs
Rule Monitors cpg.se monlog.se mon_cm.se live_test.se percollator.se
zoom.se virtual_adrian.se virtual_adrian_lite.se
Disk Monitors siostat.se xio.se xiostat.se iomonitor.se iost.se xit.se disks.se
CPU Monitors cpu_meter.se vmmonitor.se mpvmstat.se
Process Monitors msacct.se pea.se ps-ax.se ps-p.se pwatch.se pw.se
Network Monitors net.se tcp_monitor.se netmonitor.se netstatx.se nfsmonitor.se nx.se
Clones iostat.se uname.se vmstat.se nfsstat-m.se perfmeter.se xload.se
Data browsers aw.se infotool.se multi_meter.se
Contributed Code anasa dfstats kview systune watch orcallator.se
Test Programs syslog.se cpus.se pure_test.se collisions.se uptime.se dumpkstats.se
net_example nproc.se kvmname.se
59. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SE language features
SE is a 64-bit interpreted dialect of C
– Not a new language to learn from scratch!
– Standard C /usr/ccs/bin/cpp used at runtime to preprocess SE scripts
– Main omissions - pointer types and goto
– Main additions - classes and “string” type
– powerful ways to handle dynamically allocated data
– built-in fast balanced tree routines for storing key indexed data
Dynamic linking to all existing C libraries
– Built-in classes access kernel data
– Supplied class code hides details, provides the data you want
Example scripts improve on basic utilities e.g. siostat.se, nx.se, pea.se
Example rule based monitors e.g. virtual_adrian.se, orcallator.se
60. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Creating Rules
Based on real experiences of all the things that go
wrong
Capture an approximation to intuition
Test and calibrate rules on as many systems as
possible
Easy??
61. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Configuring Rules
Thresholds should be configured
Very application dependent
Capture the operating envelope
– Measure the underlying values
– Measure peaks in normal operation
– Note values during problems
– Set thresholds to capture the difference
This applies to any tool
– SE Toolkit, Cacti, Ganglia, Nagios, OpenNMS etc.
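One way to read the "capture the operating envelope" steps is sketched below: take the peaks seen in normal operation and the values noted during problems, and place the threshold between them. The midpoint choice, the function name and the sample numbers are assumptions for illustration; the talk does not prescribe a formula.

```python
def suggest_threshold(normal_peaks, problem_values):
    """Suggest a threshold between normal peaks and problem values.

    Returns the midpoint between the highest normal peak and the lowest
    problem value, or None when the two ranges overlap and this metric
    alone cannot separate normal from problem behaviour.
    """
    highest_normal = max(normal_peaks)
    lowest_problem = min(problem_values)
    if lowest_problem <= highest_normal:
        return None
    return (highest_normal + lowest_problem) / 2.0


if __name__ == "__main__":
    # Hypothetical disk busy percentages.
    normal = [40, 55, 62]
    problems = [85, 93]
    print(suggest_threshold(normal, problems))
```

A `None` result is itself useful information: it suggests the metric needs to be combined with another one before it can drive an alert.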
62. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Rules as Objects
Define only the input and output information
Hide implementation details
Make high level rule objects trivial to use and
reuse
SE Toolkit does it in three lines of code:
– #include <rules file>
– Declare rule object as a typed variable
– Read and use or print object status
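The same "rules as objects" pattern can be sketched in Python. This is an analogue written for illustration, not SE Toolkit code: the class name, the threshold parameters and the green/amber/red state names are all hypothetical.

```python
class UtilizationRule:
    """Hypothetical rule object: a value goes in, a state and explanation come out."""

    def __init__(self, name, warn_at, alarm_at):
        self.name = name
        self.warn_at = warn_at
        self.alarm_at = alarm_at
        self.state = "unknown"
        self.explanation = "no data yet"

    def evaluate(self, value):
        """Update and return the rule state; callers never see the comparison logic."""
        if value >= self.alarm_at:
            self.state = "red"
        elif value >= self.warn_at:
            self.state = "amber"
        else:
            self.state = "green"
        self.explanation = f"{self.name} at {value:.0%}"
        return self.state


if __name__ == "__main__":
    # Usage mirrors the three steps above: bring in the rule code,
    # declare a rule object, then read its status.
    cpu_rule = UtilizationRule("cpu", warn_at=0.70, alarm_at=0.90)
    print(cpu_rule.evaluate(0.75), "-", cpu_rule.explanation)
```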
63. May 14, 2014 Adrian Cockcroft and Mario Jauvin
"virtual adrian" rules summary
Disk Rule for all disks at once
– Looks for slow disks and unbalanced usage
Network Rule for all networks at once
– Looks for slow nets and unbalanced usage
Swap Rule - Looks for lack of available swap space
RAM Rule - Looks for short page residence times
CPU Power Rule
– Scales on MP systems
– Looks for long run queue delays
Mutex Rule - Looks for kernel lock contention and high sys CPU time
TCP Rule
– Looks for listen queue problems
– Reports on connection attempt failures
64. May 14, 2014 Adrian Cockcroft and Mario Jauvin
XE Toolkit - www.xetoolkit.com
Complete re-write of SE Toolkit by Rich Pettit
– Extensible Java collector, customize with jar files
– Release 1.2 available April 2008
– Multi-platform support Solaris, Linux/x86, Windows, BSD,
OSX, HP-UX, AIX, Linux/s390, Linux/Power
Licensing
– Free GPL version for standard use and shared derivations
– Open source, hosted at http://sourceforge.net/projects/xe-toolkit/
– Commercial support available if needed
– Commercial product license for custom in-house derivations
Addresses all the issues people had with SE Toolkit!
65. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Captive Metrics / XE Toolkit
Architecture
66. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Free System Monitoring
Tools
67. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Collated Performance Data - Orca
Problems with time sync when collecting data from multiple tools
– No timestamp at all for vmstat, netstat, df...
– No timestamp by default for iostat and ps...
– No way to collect realtime stats from an http logfile
Use SE Toolkit to generate one timestamped row containing all the data
– First version of percollator.se written by Adrian Cockcroft in 1996
– Extended orcallator.se written by Blair Zajac a few years later
– Graphs generated by orca batch job feeding rrdtool based web pages
– Active community developing tool at http://www.orcaware.com
– Extended to collect much more data, including process workloads
– Basic data collection ported to Linux, HP-UX and Windows
Orca is basically MRTG for System metrics rather than Network
See http://www.orcaware.com/orca/docs/Orca_Understanding_Performance_Data.ppt
68. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Orca data collections
Collected using “procallator” reading info from /proc on Linux
[Uptime] [Average # Processes in Run Queue (Load Average)] [CPU Usage]
[New Process Spawn Rate] [Number of System & Running Processes]
[Context Switches & Interrupts Rate] [Interface Input Bits Per Second]
[Interface Output Bits Per Second] [Interface Input Packets Per Second]
[Interface Output Packets Per Second] [Interface Input Errors Per Second]
[Interface Output Errors Per Second] [Interface Input Dropped Per Second]
[Interface Output Dropped Per Second] [Interface Output Collisions]
[Interface Output Carrier Losses] [TCP Current Connections] [IP Statistics]
[TCP Statistics] [ICMP Statistics] [UDP Statistics]
[Disk System Wide Reads/Writes Per Second] [Disk System Wide Transfer Rate]
[Disk Reads/Writes Per Second] [Disk Transfer Rate] [Disk Space Percent
Usage] [Physical Memory Usage] [Swap Usage] [Page Ins & Outs Rate]
[Swap Ins & Outs Rate]
Orca on Solaris collects many more metrics than shown above
Strength of Orca is lots of detailed metrics with low overhead for collection
Easily customized to add more system metrics or application metrics
Orca can already track HTTP traffic and parse log files
69. May 14, 2014 Adrian Cockcroft and Mario Jauvin
All metrics are stored in
“round robin database” format
using RRDtool to generate
displays over different time
spans
Web page is simple collection
of plots with drill down by
metric or by time
Suitable for monitoring a
relatively small number of
systems in great detail, e.g.
backend database servers
70. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Cacti – www.cacti.net
Web based user interface based on RRDtool
More sophisticated GUI than Orca or MRTG
Less sophisticated system metric collection,
but more coverage of networking
Better management of groups of systems
and devices than Orca, useful for tens to
hundreds of nodes
Access control and personalization for users
73. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Ganglia – www.ganglia.info
Web based RRDtool GUI somewhat similar to Cacti
Better management of clusters of systems and
devices than Cacti, useful for hundreds to thousands
of nodes in a hierarchy of clusters
Provides many summary statistic plots at cluster
level and collects detailed configuration data
XML based data representation
Uses low overhead network protocol
In common use at hundreds of large HPC Grid sites,
less visibly in use at some large commercial sites
77. May 14, 2014 Adrian Cockcroft and Mario Jauvin
BigBrother and BigSister
Network and system dashboard alert monitor
Widely used at internet sites
BigBrother is at http://www.bb4.com
BigSister is at http://bigsister.graeff.com
BigSister seems to have more features: alert
logging, better portability and more efficient
data collection. It is a compatible update to BB4.
81. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Free QA Test and
Modelling Tools
82. May 14, 2014 Adrian Cockcroft and Mario Jauvin
QA Test Requirements
Generate test workload
– SLAMD, Grinder
Collect performance metrics
– Any of the tools already mentioned
Report regression against baseline
Predict capacity needed for production system
– Use spreadsheets for simple linear prediction
– Use modelling tools such as PDQ for queuing models
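To show why a queueing model can give a different answer from a spreadsheet's linear prediction, the sketch below uses two textbook relations: the Utilization Law (utilization = throughput × service time) and the single-server M/M/1 response time R = S / (1 − U). These formulas and the sample numbers are assumptions chosen for illustration; they are not taken from the talk and real systems may need a richer model.

```python
def utilization(throughput_per_second, service_time_seconds):
    """Utilization Law: this part of the prediction is linear in throughput."""
    return throughput_per_second * service_time_seconds


def mm1_response_time(service_time_seconds, util):
    """M/M/1 response time: grows non-linearly as utilization approaches 1."""
    if not 0.0 <= util < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time_seconds / (1.0 - util)


if __name__ == "__main__":
    service_time = 0.010  # hypothetical 10 ms per request
    for rate in (20, 50, 80, 90):  # hypothetical requests per second
        u = utilization(rate, service_time)
        r = mm1_response_time(service_time, u)
        print(f"{rate:3d} req/s  utilization {u:.0%}  response {r * 1000:.0f} ms")
```

Utilization rises in a straight line with load, but response time does not, which is the non-linear scalability listed earlier among the capacity planning challenges.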
83. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Grinder 3 - Powerful New Features
100% Pure Java - works on any hardware platform and any
operating system that supports J2SE 1.3 and above.
Java and Jython based load testing framework
– Web Browsers: simulate web browsers using HTTP, and HTTPS.
– Web Services: test interfaces using SOAP and XML-RPC.
– Database: test databases using JDBC.
– Middleware: RPC and MOM based systems using IIOP, RMI/IIOP,
RMI/JRMP, and JMS.
– Other Internet protocols: POP3, SMTP, FTP, and LDAP.
See http://grinder.sourceforge.net/g3/features.html
J2EE Performance Testing with BEA WebLogic Server by Peter
Zadrozny, Philip Aston and Ted Osborne, originally published
by Expert Press and now by APress uses Grinder 2 throughout.
84. May 14, 2014 Adrian Cockcroft and Mario Jauvin
SLAMD
Load generation framework, written in Java
Originally built to test LDAP servers by Sun
Extended to be very generic and published
as open source. Actively being developed.
Sophisticated functions and user interface
See http://www.slamd.com
Latest Release 2.0 has better usability focus
88. May 14, 2014 Adrian Cockcroft and Mario Jauvin
PDQ Modelling Tool
Dr Neil Gunther’s toolkit at
http://www.perfdynamics.com
Library usable from C or Perl that provides MVA
(Mean Value Analysis) queueing models
Use to calibrate in QA and predict in production
PDQ modelling tool details:
– The Practical Performance Analyst, Dr. Neil Gunther,
McGraw-Hill, 1998, ISBN 0-07-912946-3
– Analyzing Computer System Performance with Perl::PDQ,
Dr. Neil Gunther, Springer, 2004, ISBN 3-540-20865-8
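For readers unfamiliar with MVA, the following is a minimal sketch of the textbook exact Mean Value Analysis recursion for a closed network of queueing centers. It is written from the standard algorithm and is not PDQ's API; the function name, parameters and sample service demands are assumptions for illustration.

```python
def exact_mva(service_demands, customers, think_time=0.0):
    """Exact MVA for a closed, single-class network of queueing centers.

    service_demands: seconds of service needed per request at each center.
    customers: number of concurrent users (N).
    think_time: seconds each user waits between requests (Z).
    Returns (throughput, response_time) at N customers.
    """
    if customers < 1:
        raise ValueError("need at least one customer")
    queue_lengths = [0.0] * len(service_demands)
    throughput = 0.0
    response_time = 0.0
    for n in range(1, customers + 1):
        # Residence time at each center grows with the queue an arrival finds there.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue_lengths)]
        response_time = sum(residence)
        throughput = n / (think_time + response_time)
        # Little's Law gives the new mean queue length at each center.
        queue_lengths = [throughput * r for r in residence]
    return throughput, response_time


if __name__ == "__main__":
    # Hypothetical demands: 50 ms of CPU and 30 ms of disk per request.
    for users in (1, 5, 20):
        x, r = exact_mva([0.050, 0.030], users, think_time=1.0)
        print(f"{users:2d} users  {x:6.2f} req/s  {r * 1000:6.1f} ms")
```

In the calibrate-then-predict workflow described above, the service demands would be measured in QA and the customer count varied to explore production load levels.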
89. May 14, 2014 Adrian Cockcroft and Mario Jauvin
References and
Conclusion
90. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Licences for Free Tools
Open Source Initiative
– “OSI Approved licences”
– http://opensource.org/licenses/category
Comparisons of Common Licences
– http://zooko.com/license_quick_ref.html
91. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Web Pages and Books
Adrian’s Performance and other topics blog
– http://perfcap.blogspot.com
MFJ Associates performance tools link page
– http://www.mfjassociates.net/perf_links.html
More free tools compiled by John Sellens
– http://www.generalconcepts.com/resources/monitoring/
More tools compiled by Openxtra
– http://www.openxtra.co.uk/resource-center/open_source_network_monitor_tools.php
SE toolkit info: Sun Performance and Tuning - Java and the Internet - Adrian Cockcroft and Richard Pettit - Sun Press/Prentice Hall, 2nd Edition, 1998, ISBN 0-13-095249-4
Solaris 8 and Linux: System Performance Tuning, 2nd Edition – Gian-Paolo Musumeci, O’Reilly, 2002, ISBN 0-596-00284-X
Solaris Internals http://www.solarisinternals.com
– Richard McDougall and James Mauro - new 2nd edition and new performance book by Richard McDougall and Brendan Gregg
92. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Concluding Remarks
Many large installations depend on free tools
A full suite of functionality is available
Several tools are needed to cover the bases
Tradeoff between function and ease of use
Support may be available, but typically
Google is the best support tool
Functionality is increasing….
93. May 14, 2014 Adrian Cockcroft and Mario Jauvin
Questions?
acockcroft@netflix.com
mario@mfjassociates.net
Editor's Notes
"We reject kings, presidents, and voting;
we believe in rough consensus and running code."
-- David Clark, IAB chair, 1992
I added the SNMP-Informant and it started appearing automagically.
920804400 – Noon on 7th of March, 1999 in Central European Time (11:00 UTC)
DS – data source SPEED as counter, collected every 300 seconds (defaults)
600 is heartbeat – maximum time to wait after which data is unknown
U:U means unknown minimum and maximum
RRA – round robin archive
0.5 – xfiles factor (xff): fraction of a consolidation interval that may be unknown before the consolidated value is stored as unknown
1:24 average every 1 interval (no averaging) and keep 24 rows (2 hours' worth)
6:10 average every 6 values and keep 10 rows (5 hours' worth)
Start at noon, end at 13:00, plot the AVERAGE consolidation of data source speed as a 2 pixel thick red (#FF0000) line
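The arithmetic in these notes can be checked with a short Python sketch. It converts the epoch values used in the rrdtool commands to UTC and computes how much time each round robin archive covers, using the 300 second default step mentioned above. The helper names are made up for this example.

```python
from datetime import datetime, timezone

STEP_SECONDS = 300  # default step, as stated in the notes


def describe_timestamp(epoch_seconds):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def rra_span_seconds(steps_per_row, rows, step=STEP_SECONDS):
    """Time covered by one archive: steps per row x rows x step length."""
    return steps_per_row * rows * step


if __name__ == "__main__":
    print(describe_timestamp(920804400))          # graph start
    print(describe_timestamp(920808000))          # graph end, one hour later
    print(rra_span_seconds(1, 24) / 3600, "hours at 5 minute resolution")
    print(rra_span_seconds(6, 10) / 3600, "hours at 30 minute resolution")
```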