The Performance People
Performance Management with
Free and Bundled Tools
Adrian Cockcroft
Netflix Inc.
acockcroft@netflix.com
(Co-authored with Mario Jauvin
MFJ Associates
mario@mfjassociates.net)
14 May 2014
May 14, 2014 Adrian Cockcroft and Mario Jauvin
Agenda
 Overview of Capacity Planning Requirements
and Data Sources
 Performance Data Collection
 Free Network Monitoring Tools
 Free System Monitoring Tools
 Free Load Generation and Modelling Tools
 Licences and References
What are we talking about?
Network
monitoring with
WireShark, MRTG,
BigSister, Cacti,
Nagios, OpenNMS,
Zenoss, Openxtra,
ntop
Database Tier monitoring
With SEtoolkit, Orca,
XEtoolkit
Application Tier
monitoring with Orca,
Cacti, BigSister, Ganglia,
XEtoolkit
QA Load generation with
Grinder or SLAMD,
modelling with PDQ and R
Capacity Planning
Requirements and Data
Sources
Definitions
 Capacity
– Resource utilization and headroom
 Planning
– Predicting future needs by analyzing historical data
and modeling future scenarios
 Performance Monitoring
– Collecting and reporting on performance data
 Free Tools
– Bundled with the OS or available for no $$$
Capacity Planning Requirements
 We care about CPU, Memory, Network and Disk
resources, and Application response times
 We need to know how much of each resource we
are using now, and will use in the future
 We need to know how much headroom we have to
handle higher loads
 We want to understand how headroom varies, and
how it relates to application response times and
throughput
CPU Capacity Measurements
 CPU Capacity is defined by CPU type and
clock rate, or a benchmark rating like
SPECint_rate2000
 CPU utilization is defined as busy time
divided by elapsed time for each CPU
 CPU load average measures the average
number of jobs running and ready to run
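A minimal sketch of the utilization definition above, with headroom computed against a chosen limit. The sample busy times, the 60 second interval and the 70% limit are made-up illustration values, and the function names are hypothetical.

```python
def cpu_utilization(busy_seconds, elapsed_seconds):
    """Utilization as a fraction: busy time divided by elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return busy_seconds / elapsed_seconds


def headroom(utilization, limit=1.0):
    """Fraction of capacity left before the chosen limit is reached."""
    return max(0.0, limit - utilization)


# Hypothetical sample: busy seconds per CPU over a 60 second interval.
busy_by_cpu = {"cpu0": 45.0, "cpu1": 30.0, "cpu2": 15.0, "cpu3": 6.0}
per_cpu = {name: cpu_utilization(busy, 60.0) for name, busy in busy_by_cpu.items()}
overall = sum(per_cpu.values()) / len(per_cpu)

for name, util in per_cpu.items():
    print(f"{name}: {util:.0%} busy")
print(f"overall: {overall:.0%} busy, "
      f"headroom to a 70% limit: {headroom(overall, 0.70):.0%}")
```

Averaging across CPUs hides imbalance, which is why the per-CPU figures are printed as well.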
Memory Capacity Measurements
 Physical Memory Capacity Utilization and Limits
– Kernel memory
– Shared Memory segment
– Executable code, stack and heap
– File system cache usage
– Unused free memory
 Virtual Memory Capacity - Swap Space
 Memory Throughput
– Page in and page out rates
Network Capacity Measurements
 Network Interface Throughput
– Byte and packet rates input and output
 TCP Protocol Specific Throughput
– TCP connection count and connection rates
– TCP byte rates input and output
 NFS/SMB Protocol Specific Throughput
– Byte rates read and write
– NFS/SMB service response times
 HTTP Protocol Specific Throughput
– HTTP operation rates
– Get and put payload byte rates and size distribution
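Byte and packet rates are usually derived from ever-increasing counters sampled at two points in time. The sketch below assumes a 32-bit counter that may wrap at most once between samples; the function name and the sample values are hypothetical.

```python
def counter_rate(prev_value, curr_value, interval_seconds, counter_bits=32):
    """Per-second rate between two samples of a monotonically increasing counter.

    Assumes at most one wraparound happened between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter wrapped past 2**counter_bits since the previous sample.
        delta += 1 << counter_bits
    return delta / interval_seconds


# Hypothetical input byte counter sampled 10 seconds apart, wrapping in between.
bytes_per_second = counter_rate(4294967000, 704, 10)
print(f"{bytes_per_second:.1f} bytes/s, {bytes_per_second * 8:.1f} bits/s")
```

On fast interfaces a 32-bit byte counter can wrap more than once per sample interval, so shorter intervals or 64-bit counters are the safer choice.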
Disk Capacity Measurements
 Detailed metrics vary by platform
 Easy for the simple disk cases
 Hard for cached RAID subsystems
 Almost Impossible for shared disk
subsystems and SANs
– Another system or volume can be sharing a
backend spindle; when it gets busy your own
volume can saturate, even though you did not
change your own workload
Capacity Planning Challenges
 Constantly changing infrastructure
 Limited attention span from staff
 Horizontally scaled commodity systems
 Per node software licensing costs too much
 Too many tools, too many agents per node
 Too much data, not enough analysis
 Non-linear and non-intuitive scalability
 Lack of tools and metrics for virtualized resources
Observability
 Four different viewpoints
– Management
– Engineering
– QA Testing
– Operations
 Each needs very different information
 Ideal would be different views of the same
performance database
 Reality is a mess of disjoint tools
Management Viewpoint
 Daily summary of status and problems
 Business oriented metrics
 Future scenario planning
 Marketing and management input
 Concise report with dashboard style status
indicators
 Free tools: R, Spreadsheet and Web based
displays, no good summarization tools
Engineering Viewpoint
 Large volumes of detailed data at several different
time scales
 Input to tuning, reconfiguring and future product
development
 Low level problem diagnosis
 Detailed reports with drill down and correlation
analysis
 Free tools: XE/SE Toolkit, Orca, Ganglia, Cacti, R
QA Test Viewpoint
 Workload specification tools
 Load generation frameworks
 Testing for functionality and performance
 Regression tools to compare releases
 Modelling difference between test configuration
and production configuration
 Free Tools: The Grinder, SLAMD, R, PDQ
Operations Viewpoint
 Immediate timeframe
 Real time display, updated in seconds
 Alert based monitoring
 High level problem diagnosis
 Simple high level graphs and views
 Free tools: BigSister, Nagios, OpenNMS,
MRTG, Cacti, Ganglia, WireShark, ntop
Measurement Data Interfaces
 Several generic raw access methods
– Read the kernel directly (not a good idea)
– Structured system data (Solaris kstat, Linux /proc)
– Process data
– Network data
– Accounting data
– Application data
 Command based data interfaces
– Scrape data from vmstat, iostat, netstat, sar, ps
– Higher overhead, lower resolution, missing metrics
 Data available is platform specific either way
 Much more detail on this topic in the Solaris/Linux Performance
Measurement and Tuning Class
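As an illustration of the structured system data route, the sketch below parses the per-CPU tick counters from text in the Linux /proc/stat format. The sample text and its numbers are made up, and the function name is hypothetical.

```python
# Made-up sample in the /proc/stat layout (counters are clock ticks since boot).
SAMPLE_PROC_STAT = """\
cpu  4705 150 1120 16250 520 30 45 0 0 0
cpu0 2300 80 560 8100 260 15 20 0 0 0
cpu1 2405 70 560 8150 260 15 25 0 0 0
intr 114930548
ctxt 1990473
"""

# Names of the leading columns on each "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for every line starting with 'cpu'."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


stats = parse_proc_stat(SAMPLE_PROC_STAT)
for name, ticks in stats.items():
    busy = sum(ticks.values()) - ticks["idle"] - ticks["iowait"]
    print(name, "busy ticks:", busy, "idle ticks:", ticks["idle"])
```

On a Linux system the same function can be fed the contents of /proc/stat. Utilization over an interval comes from the difference between two such snapshots, not from a single reading.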
Free Network
Monitoring Tools
SNMP
 Simple network management protocol
 UDP based, agents listen on port 161
 Client/server like
– Client is called management application entity
– Server is called an agent entity
 Agent entity is designed to be implemented
on network hardware: routers, switches, etc.
SNMP – MIBs
 Management information base
 Defines the structure and the semantics of the
information that can be reported on
 Most commonly used is MIB-II which defines a set of
standard networking attributes
– Interface tables
– System level information
– Routing tables
 Specified using ASN.1 (Abstract Syntax Notation One)
SNMP – commands
 Called PDUs (protocol data units)
 GET
 GETNEXT
 GETBULK
 SET
 Encoded using BER (basic encoding rules)
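To make the BER step concrete, here is a small sketch that encodes an object identifier the way it appears inside a PDU: tag 0x06, a length byte, then the arcs in base 128 with the first two arcs combined. It only handles short-form lengths, and the function names are hypothetical.

```python
def encode_base128(value):
    """Base-128 encoding, high bit set on every byte except the last."""
    chunks = [value & 0x7F]
    value >>= 7
    while value:
        chunks.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(chunks))


def ber_encode_oid(dotted):
    """Encode a dotted OID string as a BER OBJECT IDENTIFIER (tag 0x06)."""
    arcs = [int(arc) for arc in dotted.strip(".").split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    # The first two arcs share one encoded value: 40 * first + second.
    body = encode_base128(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        body += encode_base128(arc)
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)]) + body


# sysDescr.0 from MIB-II
print(ber_encode_oid("1.3.6.1.2.1.1.1.0").hex())
```

Arcs of 128 or more take several bytes, which is why enterprise OIDs encode longer than their dotted form suggests.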
Versions
 Version 1, original version published in May 1990 (RFC 1157)
 Version 2, around 1993. Failed because the
IETF credo of “rough consensus and running
code” could not be met on securing SNMP
 Turned into V2c for community string security
(like V1)
 Version 3, added security and complexity in
1998
SNMP tools
 Too numerous to name all but…
 OpenNMS
 Nagios
 Cacti
 MRTG
 Net-snmp
– See www.snmplink.org
SNMP tools
 snmpwalk – will report all data in a specified
MIB subtree
 GetIf – will report data about interfaces and
includes built-in MIB browser
 snmptable – will report tabular data from MIB
tables
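A walk is built from repeated GETNEXT requests: each reply names the next object in OID order, and the walk stops when the reply leaves the requested subtree. The sketch below simulates that against an in-memory table. The agent data is made up and no network traffic is involved.

```python
def oid_key(dotted):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so OIDs compare numerically, arc by arc."""
    return tuple(int(part) for part in dotted.split("."))


# Hypothetical agent contents: a few MIB-II style objects with made-up values.
AGENT_DATA = {
    "1.3.6.1.2.1.1.1.0": "example router",
    "1.3.6.1.2.1.1.5.0": "rtr1",
    "1.3.6.1.2.1.2.1.0": 2,
    "1.3.6.1.2.1.2.2.1.2.1": "eth0",
    "1.3.6.1.2.1.2.2.1.2.2": "eth1",
    "1.3.6.1.2.1.2.2.1.10.1": 123456,
}


def get_next(oid):
    """Return (oid, value) for the first object after oid, or None at the end."""
    key = oid_key(oid)
    later = sorted(k for k in map(oid_key, AGENT_DATA) if k > key)
    if not later:
        return None
    next_oid = ".".join(str(arc) for arc in later[0])
    return next_oid, AGENT_DATA[next_oid]


def walk(root):
    """Repeat GETNEXT from root until the reply falls outside the subtree."""
    root_key = oid_key(root)
    results = []
    current = root
    while True:
        step = get_next(current)
        if step is None or oid_key(step[0])[:len(root_key)] != root_key:
            break
        results.append(step)
        current = step[0]
    return results


for oid, value in walk("1.3.6.1.2.1.2.2"):
    print(oid, "=", value)
```

Comparing OIDs as tuples of integers matters: a plain string sort would place arc 10 before arc 2.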
OpenNMS
 Well… it’s not that portable
– 95% Java is not 100% Java
– Requires about 20-30 different platform specific
packages (PostgreSQL, Perl, RRDtool, Tomcat 4,
etc.)
– Difficult to install
– Easy auto discovery
– Web-based interface
OpenNMS screen shots: main screen, node screen
Nagios
 Easy to build/compile (on Solaris 10)
 Easy to install
 Quick response from CGI
 Configuration is manual and a pain
– 13 configuration files with all kinds of interrelated
entries
– Tedious and error prone
 Requires plugins to do anything
Nagios screen shots: main screen, host detail
ntop
 Similar to the familiar UNIX top tool for
processes, but for network traffic
 Provides a huge selection of real-time data
 Can be found at http://www.openxtra.co.uk/
ntop screen shots: Active Sessions, Hosts, Network Load, Network Throughput,
Port Distribution, Protocol Distribution, Protocols
Zenoss
 Open source monitoring and management of
IT infrastructure
 Zenoss core is free
 Other editions are for a fee
 Get it from http://www.zenoss.com/download/
Zenoss screen shots: Architecture, Dash Config, Google, Google Alerts,
Graphs, Topology
MRTG
 Really simple to install and configure
 Requires manual config file creation
 Only for MIB-II interface plotting out of the
box
 Graphing is not flexible (axes, time spans, etc.)
MRTG screen shots: interface, other (CPU)
RRD tool
 Software to store, retrieve and graph
numerical time series data
 Uses a round robin algorithm
 Data files are a fixed size
– Don’t grow
– Don’t require maintenance
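The fixed-size property comes from overwriting the oldest slot once the archive is full. A minimal in-memory sketch of that idea follows. It is not RRDtool's file format or API, and the class name is hypothetical.

```python
class RoundRobinArchive:
    """Fixed number of slots; once full, each update overwrites the oldest value."""

    def __init__(self, rows):
        self.values = [None] * rows
        self.next_slot = 0
        self.count = 0

    def update(self, value):
        self.values[self.next_slot] = value
        self.next_slot = (self.next_slot + 1) % len(self.values)
        self.count += 1

    def fetch(self):
        """Return the stored values, oldest first."""
        if self.count < len(self.values):
            return self.values[:self.count]
        return self.values[self.next_slot:] + self.values[:self.next_slot]


archive = RoundRobinArchive(rows=3)
for sample in [1, 2, 3, 4, 5]:
    archive.update(sample)
print(archive.fetch())        # the three most recent samples, oldest first
print(len(archive.values))    # storage never grows beyond the configured rows
```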
RRD tool
 Compiles on most platforms
 Used by many SNMP based tools
– OpenNMS
– Cacti
– BigSister
– WeatherMap4RRD
– MailGraph
RRD tool
 14all CGI script that plots data similar to
MRTG
 Configurable to collect data at different
intervals (unlike MRTG)
 Flexible and variable in what data can be
collected
RRD tool screen shots (two samples)
RRD tool
 Create an RRD database
rrdtool create test.rrd \
    --start 920804400 \
    DS:speed:COUNTER:600:U:U \
    RRA:AVERAGE:0.5:1:24 \
    RRA:AVERAGE:0.5:6:10
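The two RRA definitions in the create command fix how much history is kept. A short worked calculation, assuming rrdtool's default step of 300 seconds since no --step is given (the helper name is hypothetical):

```python
STEP_SECONDS = 300  # rrdtool's default step when --step is not specified


def rra_span_seconds(steps_per_row, rows, step=STEP_SECONDS):
    """Time covered by an archive: rows, each consolidating steps_per_row steps."""
    return steps_per_row * rows * step


# (steps per row, rows) taken from the RRA definitions in the create command.
rras = {
    "RRA:AVERAGE:0.5:1:24": (1, 24),
    "RRA:AVERAGE:0.5:6:10": (6, 10),
}
for name, (steps, rows) in rras.items():
    span = rra_span_seconds(steps, rows)
    print(f"{name}: {rows} rows of {steps * STEP_SECONDS // 60} minutes, "
          f"covering {span / 3600:g} hours")
```

So the first archive keeps 24 five-minute averages and the second keeps 10 thirty-minute averages, and the file size is set at creation time.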
RRD tool
 Create a graph
rrdtool graph speed.png \
    --start 920804400 --end 920808000 \
    DEF:myspeed=test.rrd:speed:AVERAGE \
    LINE2:myspeed#FF0000
Free Performance Data
Collection and Rules
Toolkits
SE toolkit Example Tools
 A free performance toolkit for rapidly creating custom data sources
 Makes all the very extensive Solaris metrics easily available
 Very system specific and not enough metrics exist to port to Linux
 Written by Rich Pettit with contributions from Adrian Cockcroft
 Get SE3.4 from http://sourceforge.net/projects/setoolkit/
 Open source with support for SPARC & x86 Solaris 8, 9, 10
Function: Example SE Programs
Rule Monitors: cpg.se monlog.se mon_cm.se live_test.se percollator.se zoom.se virtual_adrian.se virtual_adrian_lite.se
Disk Monitors: siostat.se xio.se xiostat.se iomonitor.se iost.se xit.se disks.se
CPU Monitors: cpu_meter.se vmmonitor.se mpvmstat.se
Process Monitors: msacct.se pea.se ps-ax.se ps-p.se pwatch.se pw.se
Network Monitors: net.se tcp_monitor.se netmonitor.se netstatx.se nfsmonitor.se nx.se
Clones: iostat.se uname.se vmstat.se nfsstat-m.se perfmeter.se xload.se
Data browsers: aw.se infotool.se multi_meter.se
Contributed Code: anasa dfstats kview systune watch orcallator.se
Test Programs: syslog.se cpus.se pure_test.se collisions.se uptime.se dumpkstats.se net_example nproc.se kvmname.se
SE language features
 SE is a 64bit interpreted dialect of C
– Not a new language to learn from scratch!
– Standard C /usr/ccs/bin/cpp used at runtime to preprocess SE scripts
– Main omissions - pointer types and goto
– Main additions - classes and “string” type
– powerful ways to handle dynamically allocated data
– built-in fast balanced tree routines for storing key indexed data
 Dynamic linking to all existing C libraries
– Built-in classes access kernel data
– Supplied class code hides details, provides the data you want
 Example scripts improve on basic utilities e.g. siostat.se, nx.se, pea.se
 Example rule based monitors e.g. virtual_adrian.se, orcallator.se
Creating Rules
 Based on real experiences of all the things that go
wrong
 Capture an approximation to intuition
 Test and calibrate rules on as many systems as
possible
 Easy??
Configuring Rules
 Thresholds should be configured
 Very application dependent
 Capture the operating envelope
– Measure the underlying values
– Measure peaks in normal operation
– Note values during problems
– Set thresholds to capture the difference
 This applies to any tool
– SE Toolkit, Cacti, Ganglia, Nagios, OpenNMS etc.
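One simple way to "capture the difference" is to place the threshold between the highest value seen in normal operation and the lowest value seen during problems. The sketch below assumes a metric where higher is worse. The sample numbers and function name are hypothetical, and the midpoint is only one possible choice.

```python
def choose_threshold(normal_samples, problem_samples):
    """Midpoint between the normal peak and the lowest value seen during problems.

    Returns None when the two envelopes overlap, meaning this metric alone
    cannot separate normal operation from problem periods.
    """
    normal_peak = max(normal_samples)
    problem_floor = min(problem_samples)
    if problem_floor <= normal_peak:
        return None
    return (normal_peak + problem_floor) / 2


# Hypothetical disk service times in milliseconds.
normal = [12, 18, 25, 31]
during_problems = [55, 70, 64]
print("threshold:", choose_threshold(normal, during_problems))
```

A None result is a hint to pick a different underlying value or to combine several, instead of forcing a threshold that would alert during normal peaks.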
Rules as Objects
 Define only the input and output information
 Hide implementation details
 Make high level rule objects trivial to use and
reuse
 SE Toolkit does it in three lines of code:
– #include <rules file>
– Declare rule object as a typed variable
– Read and use or print object status
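The same pattern transposed to Python, as a sketch only: this is not SE Toolkit code, and the class, state names and thresholds are hypothetical. The caller sees only the input value and the resulting status.

```python
class ThresholdRule:
    """Hypothetical rule object: one input value in, a state and explanation out."""

    def __init__(self, name, warn_above, alarm_above):
        self.name = name
        self.warn_above = warn_above
        self.alarm_above = alarm_above
        self.state = "unknown"
        self.explanation = "no data yet"

    def update(self, value):
        """Evaluate the rule for a new measurement and return the new state."""
        if value > self.alarm_above:
            self.state = "alarm"
        elif value > self.warn_above:
            self.state = "warning"
        else:
            self.state = "ok"
        self.explanation = f"{self.name} = {value}"
        return self.state


# The three steps from the slide:
# 1. the include corresponds to the class definition above (or an import of it)
rule = ThresholdRule("run queue per CPU", warn_above=2.0, alarm_above=5.0)  # 2. declare
print(rule.update(3.5), "-", rule.explanation)                              # 3. read and print
```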
"virtual adrian" rules summary
 Disk Rule for all disks at once
– Looks for slow disks and unbalanced usage
 Network Rule for all networks at once
– Looks for slow nets and unbalanced usage
 Swap Rule - Looks for lack of available swap space
 RAM Rule - Looks for short page residence times
 CPU Power Rule
– Scales on MP systems
– Looks for long run queue delays
 Mutex Rule - Looks for kernel lock contention and high sys CPU time
 TCP Rule
– Looks for listen queue problems
– Reports on connection attempt failures
XE Toolkit - www.xetoolkit.com
 Complete re-write of SE Toolkit by Rich Pettit
– Extensible Java collector, customize with jar files
– Release 1.2 available April 2008
– Multi-platform support Solaris, Linux/x86, Windows, BSD,
OSX, HP-UX, AIX, Linux/s390, Linux/Power
 Licensing
– Free GPL version for standard use and shared derivations
– Open source, hosted at http://sourceforge.net/projects/xe-toolkit/
– Commercial support available if needed
– Commercial product license for custom in-house derivations
 Addresses all the issues people had with SE toolkit!
Captive Metrics / XE Toolkit Architecture (diagram)
Free System Monitoring
Tools
Collated Performance Data - Orca
 Problems with time sync when collecting data from multiple tools
– No timestamp at all for vmstat, netstat, df...
– No timestamp by default for iostat and ps...
– No way to collect realtime stats from an http logfile
 Use SE Toolkit to generate one timestamped row containing all the data
– First version of percollator.se written by Adrian Cockcroft in 1996
– Extended orcallator.se written by Blair Zajac a few years later
– Graphs generated by orca batch job feeding rrdtool based web pages
– Active community developing tool at http://www.orcaware.com
– Extended to collect much more data, including process workloads
– Basic data collection ported to Linux, HP-UX and Windows
 Orca is basically MRTG for System metrics rather than Network
 See http://www.orcaware.com/orca/docs/Orca_Understanding_Performance_Data.ppt
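The collation idea can be sketched as follows: sample every source in one pass and emit a single row stamped with one time. This is not orcallator.se. The collector functions, metric names and values are hypothetical stand-ins.

```python
import time


def collate(collectors, now=None):
    """Sample every collector and return one row sharing a single timestamp."""
    timestamp = int(time.time() if now is None else now)
    row = {"timestamp": timestamp}
    for prefix, collect in collectors.items():
        for name, value in collect().items():
            row[f"{prefix}.{name}"] = value
    return row


def format_row(row, columns):
    """Render a row as space-separated text; missing metrics become '-'."""
    return " ".join(str(row.get(column, "-")) for column in columns)


# Hypothetical collectors returning fixed values in place of real measurements.
collectors = {
    "cpu": lambda: {"usr": 12, "sys": 5},
    "net": lambda: {"ibytes": 3400},
}
columns = ["timestamp", "cpu.usr", "cpu.sys", "net.ibytes"]
print(" ".join(columns))
print(format_row(collate(collectors), columns))
```

With one timestamp per row, metrics from different subsystems can be correlated without guessing which samples belong together.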
Orca data collections
 Collected using “procallator” reading info from /proc on Linux
[Uptime] [Average # Processes in Run Queue (Load Average)] [CPU Usage]
[New Process Spawn Rate] [Number of System & Running Processes]
[Context Switches & Interrupts Rate] [Interface Input Bits Per Second]
[Interface Output Bits Per Second] [Interface Input Packets Per Second]
[Interface Output Packets Per Second] [Interface Input Errors Per Second]
[Interface Output Errors Per Second] [Interface Input Dropped Per Second]
[Interface Output Dropped Per Second] [Interface Output Collisions]
[Interface Output Carrier Losses] [TCP Current Connections] [IP Statistics]
[TCP Statistics] [ICMP Statistics] [UDP Statistics]
[Disk System Wide Reads/Writes Per Second] [Disk System Wide Transfer Rate]
[Disk Reads/Writes Per Second] [Disk Transfer Rate] [Disk Space Percent
Usage] [Physical Memory Usage] [Swap Usage] [Page Ins & Outs Rate]
[Swap Ins & Outs Rate]
 Orca on Solaris collects many more metrics than shown above
 Strength of Orca is lots of detailed metrics with low overhead for collection
 Easily customized to add more system metrics or application metrics
 Orca can already track HTTP traffic and parse log files
 All metrics are stored in “round robin database” format, using RRDtool
to generate displays over different time spans
 Web page is a simple collection of plots with drill down by metric or by time
 Suitable for monitoring a relatively small number of systems in great
detail, e.g. backend database servers
Cacti – www.cacti.net
 Web based user interface based on RRDtool
 More sophisticated GUI than Orca or MRTG
 Less sophisticated system metric collection,
but more coverage of networking
 Better management of groups of systems
and devices than Orca, useful for tens to
hundreds of nodes
 Access control and personalization for users
Ganglia – www.ganglia.info
 Web based RRDtool GUI somewhat similar to Cacti
 Better management of clusters of systems and
devices than Cacti, useful for hundreds to thousands
of nodes in a hierarchy of clusters
 Provides many summary statistic plots at cluster
level and collects detailed configuration data
 XML based data representation
 Uses low overhead network protocol
 In common use at hundreds of large HPC Grid sites,
less visibly in use at some large commercial sites
BigBrother and BigSister
 Network and system dashboard alert monitor
 Widely used at internet sites
 BigBrother is at http://www.bb4.com
 BigSister is at http://bigsister.graeff.com
 BigSister seems to have more features, alert
logging, better portability and more efficient
data collection. Compatible update to BB4.
Free QA Test and
Modelling Tools
QA Test Requirements
 Generate test workload
– SLAMD, Grinder
 Collect performance metrics
– Any of the tools already mentioned
 Report regression against baseline
 Predict capacity needed for production system
– Use spreadsheets for simple linear prediction
– Use modelling tools such as PDQ for queuing models
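The spreadsheet style linear prediction amounts to a least-squares trend line extrapolated to a limit. A sketch with made-up weekly utilization figures (function names are hypothetical):

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_reach(limit, slope, intercept):
    """x value where the trend line crosses limit, or None if it never rises to it."""
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical peak CPU utilization (%) measured over five consecutive weeks.
weeks = [0, 1, 2, 3, 4]
peak_util = [40, 42, 44, 46, 48]
slope, intercept = linear_fit(weeks, peak_util)
print(f"growth: {slope:.1f} points/week, "
      f"70% reached at week {time_to_reach(70, slope, intercept):.0f}")
```

A straight line ignores the non-linear response time growth near saturation, which is where queueing models take over.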
Grinder 3 - Powerful New Features
 100% Pure Java - works on any hardware platform and any
operating system that supports J2SE 1.3 and above.
 Java and Jython based load testing framework
– Web Browsers: simulate web browsers using HTTP and HTTPS.
– Web Services: test interfaces using SOAP and XML-RPC.
– Database: test databases using JDBC.
– Middleware: RPC and MOM based systems using IIOP, RMI/IIOP,
RMI/JRMP, and JMS.
– Other Internet protocols: POP3, SMTP, FTP, and LDAP.
 See http://grinder.sourceforge.net/g3/features.html
 J2EE Performance Testing with BEA WebLogic Server by Peter
Zadrozny, Philip Aston and Ted Osborne, originally published
by Expert Press and now by APress uses Grinder 2 throughout.
SLAMD
 Load generation framework, written in Java
 Originally built to test LDAP servers by Sun
 Extended to be very generic and published
as open source. Actively being developed.
 Sophisticated functions and user interface
 See http://www.slamd.com
 Latest Release 2.0 has better usability focus
PDQ Modelling Tool
 Dr Neil Gunther’s toolkit at
http://www.perfdynamics.com
 Library used from C or Perl provides MVA queueing
models
 Use to calibrate in QA and predict in production
 PDQ modelling tool details:
– The Practical Performance Analyst, Dr. Neil Gunther,
McGraw-Hill, 1998, ISBN 0-07-912946-3
– Analyzing Computer System Performance with Perl::PDQ,
2004, ISBN 3-540-20865-8
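For a feel of what an MVA model computes, here is a textbook exact Mean Value Analysis for a closed network of single-server queues. It is written from the standard algorithm and does not use the PDQ library or its API. The service demands and think time are made-up values.

```python
def mva(demands, think_time, users):
    """Exact MVA for a closed network of queueing centres.

    demands:    service demand in seconds at each centre, per transaction
    think_time: seconds each user waits between transactions
    users:      number of circulating users
    Returns (throughput per second, response time in seconds).
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    response = 0.0
    for n in range(1, users + 1):
        # Arrival theorem: an arriving job sees the queue left by n - 1 users.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]
    return throughput, response


# Hypothetical system: CPU demand 0.10 s, disk demand 0.05 s, think time 1 s.
for users in (1, 10, 20, 40):
    x, r = mva([0.10, 0.05], 1.0, users)
    print(f"{users:3d} users: {x:5.2f} tx/s, response {r:.3f} s")
```

Calibrating the demands from QA measurements and then raising the user count is the "calibrate in QA, predict in production" step in miniature.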
References and
Conclusion
Licences for Free Tools
 Open Source Initiative
– “OSI Approved licences”
– http://opensource.org/licenses/category
 Comparisons of Common Licences
– http://zooko.com/license_quick_ref.html
Web Pages and Books
 Adrian’s Performance and other topics blog
– http://perfcap.blogspot.com
 MFJ Associates performance tools link page
– http://www.mfjassociates.net/perf_links.html
 More free tools compiled by John Sellens
– http://www.generalconcepts.com/resources/monitoring/
 More tools compiled by Openxtra
– http://www.openxtra.co.uk/resource-center/open_source_network_monitor_tools.php
 SE toolkit info: Sun Performance and Tuning - Java and the Internet - Adrian
Cockcroft and Richard Pettit - Sun Press/Prentice Hall, 2nd Edition, 1998,
ISBN 0-13-095249-4
 Solaris 8 and Linux: System Performance Tuning 2nd Edition – Gian-Paolo Musumeci,
O’Reilly 2002, ISBN 0-596-00284-X
 Solaris Internals http://www.solarisinternals.com
– Richard McDougall and James Mauro - new 2nd edition and new performance book by
Richard McDougall and Brendan Gregg
Concluding Remarks
 Many large installations depend on free tools
 A full suite of functionality is available
 Several tools are needed to cover the bases
 Tradeoff between function and ease of use
 Support may be available, but typically
Google is the best support tool
 Functionality is increasing….
Questions?
acockcroft@netflix.com
mario@mfjassociates.net

 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Capacity Planning Free Solution

  • 1. The Performance People Performance Management with Free and Bundled Tools Adrian Cockcroft Netflix Inc. acockcroft@netflix.com (Co-authored with Mario Jauvin MFJ Associates mario@mfjassociates.net) 14 May 2014
  • 2. May 14, 2014 Adrian Cockcroft and Mario Jauvin Agenda  Overview of Capacity Planning Requirements and Data Sources  Performance Data Collection  Free Network Monitoring Tools  Free System Monitoring Tools  Free Load Generation and Modelling Tools  Licences and References
  • 3. May 14, 2014 Adrian Cockcroft and Mario Jauvin What are we talking about? Network monitoring with WireShark, MRTG, BigSister, Cacti, Nagios, OpenNMS, Zenoss, Openxtra, ntop Database Tier monitoring With SEtoolkit, Orca, XEtoolkit Application Tier monitoring with Orca, Cacti, BigSister, Ganglia, XEtoolkit QA Load generation with Grinder or SLAMD, modelling with PDQ and R
  • 4. May 14, 2014 Adrian Cockcroft and Mario Jauvin Capacity Planning Requirements and Data Sources
  • 5. May 14, 2014 Adrian Cockcroft and Mario Jauvin Definitions  Capacity – Resource utilization and headroom  Planning – Predicting future needs by analyzing historical data and modeling future scenarios  Performance Monitoring – Collecting and reporting on performance data  Free Tools – Bundled with the OS or available for no $$$
  • 6. May 14, 2014 Adrian Cockcroft and Mario Jauvin Capacity Planning Requirements  We care about CPU, Memory, Network and Disk resources, and Application response times  We need to know how much of each resource we are using now, and will use in the future  We need to know how much headroom we have to handle higher loads  We want to understand how headroom varies, and how it relates to application response times and throughput
  • 7. May 14, 2014 Adrian Cockcroft and Mario Jauvin CPU Capacity Measurements  CPU Capacity is defined by CPU type and clock rate, or a benchmark rating like SPECrateInt2000  CPU utilization is defined as busy time divided by elapsed time for each CPU  CPU load average measures the average number of jobs running and ready to run
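
The utilization definition on slide 7 (busy time divided by elapsed time for each CPU) can be sketched in a few lines of Python. This is an illustration only: the function name and the sample figures are assumptions, and it presumes you already have two snapshots of cumulative per-CPU busy seconds taken a known interval apart.

```python
def cpu_utilization(busy_start, busy_end, elapsed):
    """Return per-CPU utilization as busy time divided by elapsed time.

    busy_start, busy_end: cumulative busy seconds per CPU at each snapshot.
    elapsed: wall-clock seconds between the two snapshots.
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    return [(end - start) / elapsed for start, end in zip(busy_start, busy_end)]


# Hypothetical two-CPU system sampled 10 seconds apart.
util = cpu_utilization([100.0, 200.0], [105.0, 202.5], 10.0)

# Headroom per CPU is whatever fraction of the interval was not busy.
headroom = [1.0 - u for u in util]
```

Reporting per-CPU values rather than a single average keeps an overloaded CPU visible on a multiprocessor system.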
  • 8. May 14, 2014 Adrian Cockcroft and Mario Jauvin Memory Capacity Measurements  Physical Memory Capacity Utilization and Limits – Kernel memory – Shared Memory segment – Executable code, stack and heap – File system cache usage – Unused free memory  Virtual Memory Capacity - Swap Space  Memory Throughput – Page in and page out rates
  • 9. May 14, 2014 Adrian Cockcroft and Mario Jauvin Network Capacity Measurements  Network Interface Throughput – Byte and packet rates input and output  TCP Protocol Specific Throughput – TCP connection count and connection rates – TCP byte rates input and output  NFS/SMB Protocol Specific Throughput – Byte rates read and write – NFS/SMB service response times  HTTP Protocol Specific Throughput – HTTP operation rates – Get and put payload byte rates and size distribution
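
Byte and packet rates are normally derived from cumulative counters sampled at intervals. The sketch below assumes a 32-bit counter (such as the MIB-II interface octet counters) that wraps at most once between samples; the function name and sample values are illustrative.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_rate(prev, curr, interval, modulus=COUNTER32_MODULUS):
    """Return a per-second rate from two cumulative counter readings.

    Taking the difference modulo the counter size handles a single wrap
    between the two samples; more than one wrap cannot be detected.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    delta = (curr - prev) % modulus
    return delta / interval


# Hypothetical readings taken 300 seconds apart.
normal_rate = counter_rate(1000, 4000, 300)            # no wrap
wrapped_rate = counter_rate(2 ** 32 - 100, 200, 300)   # counter wrapped once
bits_per_second = normal_rate * 8
```

On fast interfaces a 32-bit byte counter can wrap more than once per sample interval, so a shorter interval or a 64-bit counter is the safer choice there.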
  • 10. May 14, 2014 Adrian Cockcroft and Mario Jauvin Disk Capacity Measurements  Detailed metrics vary by platform  Easy for the simple disk cases  Hard for cached RAID subsystems  Almost Impossible for shared disk subsystems and SANs – Another system or volume can be sharing a backend spindle, when it gets busy your own volume can saturate, even though you did not change your own workload
  • 11. May 14, 2014 Adrian Cockcroft and Mario Jauvin Capacity Planning Challenges  Constantly changing infrastructure  Limited attention span from staff  Horizontally scaled commodity systems  Per node software licencing costs too much  Too many tools, too many agents per node  Too much data, not enough analysis  Non-linear and non-intuitive scalability  Lack of tools and metrics for virtualized resources
  • 12. May 14, 2014 Adrian Cockcroft and Mario Jauvin Observability  Four different viewpoints – Management – Engineering – QA Testing – Operations  Each needs very different information  Ideal would be different views of the same performance database  Reality is a mess of disjoint tools
  • 13. May 14, 2014 Adrian Cockcroft and Mario Jauvin Management Viewpoint  Daily summary of status and problems  Business oriented metrics  Future scenario planning  Marketing and management input  Concise report with dashboard style status indicators  Free tools: R, Spreadsheet and Web based displays, no good summarization tools
  • 14. May 14, 2014 Adrian Cockcroft and Mario Jauvin Engineering Viewpoint  Large volumes of detailed data at several different time scales  Input to tuning, reconfiguring and future product development  Low level problem diagnosis  Detailed reports with drill down and correlation analysis  Free tools: XE/SE Toolkit, Orca, Ganglia, Cacti, R
  • 15. May 14, 2014 Adrian Cockcroft and Mario Jauvin QA Test Viewpoint  Workload specification tools  Load generation frameworks  Testing for functionality and performance  Regression tools to compare releases  Modelling difference between test configuration and production configuration  Free Tools: The Grinder, SLAMD, R, PDQ
  • 16. May 14, 2014 Adrian Cockcroft and Mario Jauvin Operations Viewpoint  Immediate timeframe  Real time display, updated in seconds  Alert based monitoring  High level problem diagnosis  Simple high level graphs and views  Free tools: BigSister, Nagios, OpenNMS, MRTG, Cacti, Ganglia, WireShark, ntop
  • 17. May 14, 2014 Adrian Cockcroft and Mario Jauvin Measurement Data Interfaces  Several generic raw access methods – Read the kernel directly (not a good idea) – Structured system data (Solaris kstat, Linux /proc) – Process data – Network data – Accounting data – Application data  Command based data interfaces – Scrape data from vmstat, iostat, netstat, sar, ps – Higher overhead, lower resolution, missing metrics  Data available is platform specific either way  Much more detail on this topic in the Solaris/Linux Performance Measurement and Tuning Class
  • 18. May 14, 2014 Adrian Cockcroft and Mario Jauvin Free Network Monitoring Tools
  • 19. May 14, 2014 Adrian Cockcroft and Mario Jauvin SNMP  Simple network management protocol  UDP protocol based on port 161  Client/server like – Client is called management application entity – Server is called an agent entity  Agent entity is designed to be implemented on network hardware, router, switches, etc
  • 20. May 14, 2014 Adrian Cockcroft and Mario Jauvin SNMP – MIBs  Management information base  Defines the structure and the semantic of the information that can be reported on  Most commonly used is MIB-II which defines a set of standard networking attributes – Interface tables – System level information – Routing tables  Specified using ASN.1 (abstract syntax notation 1)
  • 21. May 14, 2014 Adrian Cockcroft and Mario Jauvin SNMP – commands  Called PDU (protocol data units)  GET  GETNEXT  GETBULK  SET  Encoded using BER (basic encoding rules)
  • 22. May 14, 2014 Adrian Cockcroft and Mario Jauvin Versions  Version 1, original version done in May 1991  Version 2, around 1993. Failed because the IETF credo of “rough consensus and running code” could not be met on securing SNMP  Turned into V2c for community string security (like V1)  Version 3, added security and complexity in 1998
  • 23. May 14, 2014 Adrian Cockcroft and Mario Jauvin SNMP tools  Too numerous to name all but…  OpenNMS  Nagios  Cacti  MRTG  Net-snmp – See www.snmplink.org
  • 24. May 14, 2014 Adrian Cockcroft and Mario Jauvin SNMP tools  Snmpwalk – will report all data in a specified MIB  getIf – will report data about interfaces and includes built-in MIB browser  Snmptable – will report tabular data from MIB tables
  • 25. May 14, 2014 Adrian Cockcroft and Mario Jauvin OpenNMS  Well…. it’s not that portable – 95% java is not 100% java – Requires about 20-30 different platform specific packages (PostgreSQL, Perl, RRD tool, Tomcat 4 etc…) – Difficult to install – Easy auto discovery – Web-based interface
  • 26. May 14, 2014 Adrian Cockcroft and Mario Jauvin OpenNMS  Main screen shot
  • 27. May 14, 2014 Adrian Cockcroft and Mario Jauvin OpenNMS  Node screen shot
  • 28. May 14, 2014 Adrian Cockcroft and Mario Jauvin Nagios  Easy to build/compile (on Solaris 10)  Easy to install  Quick response from CGI  Configuration is manual and a pain – 13 configuration files with all kinds of interrelated entries – Tedious and error prone  Requires plugins to do anything
  • 29. May 14, 2014 Adrian Cockcroft and Mario Jauvin Nagios  Main screen shot
  • 30. May 14, 2014 Adrian Cockcroft and Mario Jauvin Nagios  Host detail screen shot
  • 32. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop  Similar to the familiar UNIX top tool for processes, but used for network traffic  Provides a huge selection of real-time data  Can be found at http://www.openxtra.co.uk/
  • 33. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop – Active Sessions
  • 34. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop Hosts
  • 35. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop Network Load
  • 36. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop_Network_Thruput
  • 37. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop Port Dist
  • 38. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop_Protocol_Dist
  • 39. May 14, 2014 Adrian Cockcroft and Mario Jauvin ntop Protocols
  • 40. May 14, 2014 Adrian Cockcroft and Mario Jauvin Zenoss  Open source monitoring and management of IT infrastructure  Zenoss core is free  Other editions are for a fee  Get it from http://www.zenoss.com/download/
  • 41. May 14, 2014 Adrian Cockcroft and Mario Jauvin zenoss Architecture
  • 42. May 14, 2014 Adrian Cockcroft and Mario Jauvin zenoss Dash Config
  • 43. May 14, 2014 Adrian Cockcroft and Mario Jauvin zenoss Google
  • 44. May 14, 2014 Adrian Cockcroft and Mario Jauvin zenoss Google Alerts
  • 45. May 14, 2014 Adrian Cockcroft and Mario Jauvin Zenoss Graphs
  • 46. May 14, 2014 Adrian Cockcroft and Mario Jauvin zenoss Topology
  • 47. May 14, 2014 Adrian Cockcroft and Mario Jauvin MRTG  Really simple to install and configure  Requires manual config file creation  Only for MIB-II interface plotting out of the box  Graphing is not flexible (axes, time ranges, etc.)
  • 48. May 14, 2014 Adrian Cockcroft and Mario Jauvin MRTG  Interface screen shot
  • 49. May 14, 2014 Adrian Cockcroft and Mario Jauvin MRTG  Other CPU screen shot
  • 50. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  Software to store, retrieve and graph numerical time series data  Uses a round robin algorithm  Data files are a fixed size – Don’t grow – Don’t require maintenance
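
The fixed-size, round robin idea on slide 50 can be illustrated with a small ring buffer. This is a conceptual sketch, not RRDtool's file format or API; the class and method names are hypothetical.

```python
class RoundRobinArchive:
    """Fixed-size store that overwrites its oldest value when full."""

    def __init__(self, rows):
        self.slots = [None] * rows  # storage is allocated once and never grows
        self.next = 0               # index of the slot to overwrite next

    def update(self, value):
        self.slots[self.next] = value
        self.next = (self.next + 1) % len(self.slots)

    def values(self):
        """Return stored values oldest first, skipping slots never written."""
        n = len(self.slots)
        ordered = [self.slots[(self.next + i) % n] for i in range(n)]
        return [v for v in ordered if v is not None]


archive = RoundRobinArchive(3)
for sample in [1, 2, 3, 4, 5]:
    archive.update(sample)
kept = archive.values()
```

Because the storage never grows, there is nothing to prune or rotate, which is why the data files need no maintenance.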
  • 51. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  Compiles on most platforms  Used by many SNMP based tools – OpenNMS – Cacti – BigSister – WeatherMap4RRD – MailGraph
  • 52. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  14all CGI script that plots data similar to MRTG  Configurable to collect data at different intervals (unlike MRTG)  Flexible and variable in what data can be collected
  • 53. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  Sample screen shot
  • 54. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  Screen shot
  • 55. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  Create a RRD database rrdtool create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24 RRA:AVERAGE:0.5:6:10
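
The two RRA definitions in the create command on slide 55 determine how much history each archive keeps. The arithmetic can be checked with a short Python sketch, assuming the default 300 second step that the editor's notes mention; the helper name is illustrative.

```python
STEP_SECONDS = 300  # default rrdtool step, assumed because the command sets no --step


def rra_coverage(steps_per_row, rows, step=STEP_SECONDS):
    """Return (seconds per row, total seconds retained) for one archive."""
    resolution = steps_per_row * step
    return resolution, resolution * rows


fine = rra_coverage(1, 24)    # RRA:AVERAGE:0.5:1:24
coarse = rra_coverage(6, 10)  # RRA:AVERAGE:0.5:6:10

fine_hours = fine[1] / 3600
coarse_hours = coarse[1] / 3600
```

The first archive keeps 2 hours at 5 minute resolution and the second keeps 5 hours at 30 minute resolution.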
  • 56. May 14, 2014 Adrian Cockcroft and Mario Jauvin RRD tool  Create a graph rrdtool graph speed.png --start 920804400 --end 920808000 DEF:myspeed=test.rrd:speed:AVERAGE LINE2:myspeed#FF0000
  • 57. May 14, 2014 Adrian Cockcroft and Mario Jauvin Free Performance Data Collection and Rules Toolkits
  • 58. May 14, 2014 Adrian Cockcroft and Mario Jauvin SE toolkit Example Tools  A free performance toolkit for rapidly creating custom data sources  Makes all the very extensive Solaris metrics easily available  Very system specific and not enough metrics exist to port to Linux  Written by Rich Pettit with contributions from Adrian Cockcroft  Get SE3.4 from http://sourceforge.net/projects/setoolkit/  Open source with support for SPARC & x86 Solaris 8, 9, 10  Example SE programs by function – Rule Monitors: cpg.se, monlog.se, mon_cm.se, live_test.se, percollator.se, zoom.se, virtual_adrian.se, virtual_adrian_lite.se – Disk Monitors: siostat.se, xio.se, xiostat.se, iomonitor.se, iost.se, xit.se, disks.se – CPU Monitors: cpu_meter.se, vmmonitor.se, mpvmstat.se – Process Monitors: msacct.se, pea.se, ps-ax.se, ps-p.se, pwatch.se, pw.se – Network Monitors: net.se, tcp_monitor.se, netmonitor.se, netstatx.se, nfsmonitor.se, nx.se – Clones: iostat.se, uname.se, vmstat.se, nfsstat-m.se, perfmeter.se, xload.se – Data browsers: aw.se, infotool.se, multi_meter.se – Contributed Code: anasa, dfstats, kview, systune, watch, orcallator.se – Test Programs: syslog.se, cpus.se, pure_test.se, collisions.se, uptime.se, dumpkstats.se, net_example, nproc.se, kvmname.se
  • 59. May 14, 2014 Adrian Cockcroft and Mario Jauvin SE language features  SE is a 64bit interpreted dialect of C – Not a new language to learn from scratch! – Standard C /usr/ccs/bin/cpp used at runtime to preprocess SE scripts – Main omissions - pointer types and goto – Main additions - classes and “string” type – powerful ways to handle dynamically allocated data – built-in fast balanced tree routines for storing key indexed data  Dynamic linking to all existing C libraries – Built-in classes access kernel data – Supplied class code hides details, provides the data you want  Example scripts improve on basic utilities e.g. siostat.se, nx.se, pea.se  Example rule based monitors e.g. virtual_adrian.se, orcallator.se
  • 60. Creating Rules  Based on real experiences of all the things that go wrong  Capture an approximation to intuition  Test and calibrate rules on as many systems as possible  Easy?? May 14, 2014 Adrian Cockcroft and Mario Jauvin
  • 61. May 14, 2014 Adrian Cockcroft and Mario Jauvin Configuring Rules  Thresholds should be configured  Very application dependent  Capture the operating envelope – Measure the underlying values – Measure peaks in normal operation – Note values during problems – Set thresholds to capture the difference  This applies to any tool – SE Toolkit, Cacti, Ganglia, Nagios, OpenNMS etc.
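
One way to read "set thresholds to capture the difference" on slide 61 is to place the threshold between the highest value seen in normal operation and the lowest value seen during problems. The Python sketch below takes the midpoint, which is an assumption made for illustration; the function name and sample data are also hypothetical.

```python
def pick_threshold(normal_samples, problem_samples):
    """Return a threshold between the normal peak and the problem floor.

    Returns None when the two ranges overlap, meaning this metric alone
    cannot separate normal operation from problem periods.
    """
    normal_peak = max(normal_samples)
    problem_floor = min(problem_samples)
    if problem_floor <= normal_peak:
        return None
    return (normal_peak + problem_floor) / 2


# Hypothetical disk service times in milliseconds.
separable = pick_threshold([10, 20, 30], [50, 70])
overlapping = pick_threshold([10, 60], [50, 70])
```

A None result suggests looking for a different metric, or a combination of metrics, before configuring an alert.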
  • 62. May 14, 2014 Adrian Cockcroft and Mario Jauvin Rules as Objects  Define only the input and output information  Hide implementation details  Make high level rule objects trivial to use and reuse  SE Toolkit does it in three lines of code: – #include <rules file> – Declare rule object as a typed variable – Read and use or print object status
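
The "rules as objects" pattern on slide 62 (declare a rule object, feed it data, read its status) might look like this in Python. It is an analogy, not SE Toolkit code: the class, its thresholds and the status names are all assumptions.

```python
class RunQueueRule:
    """Hypothetical rule that flags long CPU run queues, scaled by CPU count."""

    def __init__(self, warn_per_cpu=2.0, alarm_per_cpu=5.0):
        # Thresholds are illustrative defaults, not values from the slides.
        self.warn_per_cpu = warn_per_cpu
        self.alarm_per_cpu = alarm_per_cpu
        self.state = "green"

    def update(self, run_queue_length, cpu_count):
        """Evaluate one measurement and return the new state."""
        per_cpu = run_queue_length / cpu_count
        if per_cpu >= self.alarm_per_cpu:
            self.state = "red"
        elif per_cpu >= self.warn_per_cpu:
            self.state = "amber"
        else:
            self.state = "green"
        return self.state


rule = RunQueueRule()       # declare the rule object
state = rule.update(12, 4)  # 12 runnable jobs on 4 CPUs is 3.0 per CPU
```

The caller sees only the inputs and the resulting state, so the thresholds and scaling logic can change without touching the code that uses the rule.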
  • 63. May 14, 2014 Adrian Cockcroft and Mario Jauvin "virtual adrian" rules summary  Disk Rule for all disks at once – Looks for slow disks and unbalanced usage  Network Rule for all networks at once – Looks for slow nets and unbalanced usage  Swap Rule - Looks for lack of available swap space  RAM Rule - Looks for short page residence times  CPU Power Rule – Scales on MP systems – Looks for long run queue delays  Mutex Rule - Looks for kernel lock contention and high sys CPU time  TCP Rule – Looks for listen queue problems – Reports on connection attempt failures
  • 64. May 14, 2014 Adrian Cockcroft and Mario Jauvin XE Toolkit - www.xetoolkit.com  Complete re-write of SE Toolkit by Rich Pettit – Extensible Java collector, customize with jar files – Release 1.2 available April 2008 – Multi-platform support Solaris, Linux/x86, Windows, BSD, OSX, HP-UX, AIX, Linux/s390, Linux/Power  Licencing – Free GPL version for standard use and shared derivations – Open source, hosted at http://sourceforge.net/projects/xe-toolkit/ – Commercial support available if needed – Commercial product license for custom in-house derivations  Addresses all the issues people had with SE toolkit !
  • 65. May 14, 2014 Adrian Cockcroft and Mario Jauvin Captive Metrics / XE Toolkit Architecture
  • 66. May 14, 2014 Adrian Cockcroft and Mario Jauvin Free System Monitoring Tools
  • 67. May 14, 2014 Adrian Cockcroft and Mario Jauvin Collated Performance Data - Orca  Problems with time sync when collecting data from multiple tools – No timestamp at all for vmstat, netstat, df... – No timestamp by default for iostat and ps... – No way to collect realtime stats from an http logfile  Use SE Toolkit to generate one timestamped row containing all the data – First version of percollator.se written by Adrian Cockcroft in 1996 – Extended orcallator.se written by Blair Zajac a few years later – Graphs generated by orca batch job feeding rrdtool based web pages – Active community developing tool at http://www.orcaware.com – Extended to collect much more data, including process workloads – Basic data collection ported to Linux, HP-UX and Windows  Orca is basically MRTG for System metrics rather than Network  See http://www.orcaware.com/orca/docs/Orca_Understanding_Performance_Data.ppt
  • 68. May 14, 2014 Adrian Cockcroft and Mario Jauvin Orca data collections  Collected using “procallator” reading info from /proc on Linux [Uptime] [Average # Processes in Run Queue (Load Average)] [CPU Usage] [New Process Spawn Rate] [Number of System & Running Processes] [Context Switches & Interrupts Rate] [Interface Input Bits Per Second] [Interface Output Bits Per Second] [Interface Input Packets Per Second] [Interface Output Packets Per Second] [Interface Input Errors Per Second] [Interface Output Errors Per Second] [Interface Input Dropped Per Second] [Interface Output Dropped Per Second] [Interface Output Collisions] [Interface Output Carrier Losses] [TCP Current Connections] [IP Statistics] [TCP Statistics] [ICMP Statistics] [UDP Statistics] [Disk System Wide Reads/Writes Per Second] [Disk System Wide Transfer Rate] [Disk Reads/Writes Per Second] [Disk Transfer Rate] [Disk Space Percent Usage] [Physical Memory Usage] [Swap Usage] [Page Ins & Outs Rate] [Swap Ins & Outs Rate]  Orca on Solaris collects many more metrics than shown above  Strength of Orca is lots of detailed metrics with low overhead for collection  Easily customized to add more system metrics or application metrics  Orca can already track HTTP traffic and parse log files
  • 69. May 14, 2014 Adrian Cockcroft and Mario Jauvin All metrics are stored in “round robin database” format using RRDtool to generate displays over different time spans Web page is simple collection of plots with drill down by metric or by time Suitable for monitoring a relatively small number of systems in great detail, e.g. backend database servers
  • 70. May 14, 2014 Adrian Cockcroft and Mario Jauvin Cacti – www.cacti.net  Web based user interface based on RRDtool  More sophisticated GUI than Orca or MRTG  Less sophisticated system metric collection, but more coverage of networking  Better management of groups of systems and devices than Orca, useful for tens to hundreds of nodes  Access control and personalization for users
  • 73. May 14, 2014 Adrian Cockcroft and Mario Jauvin Ganglia – www.ganglia.info  Web based RRDtool GUI somewhat similar to Cacti  Better management of clusters of systems and devices than Cacti, useful for hundreds to thousands of nodes in a hierarchy of clusters  Provides many summary statistic plots at cluster level and collects detailed configuration data  XML based data representation  Uses low overhead network protocol  In common use at hundreds of large HPC Grid sites, less visibly in use at some large commercial sites
  • 77. May 14, 2014 Adrian Cockcroft and Mario Jauvin BigBrother and BigSister  Network and system dashboard alert monitor  Widely used at internet sites  Bigbrother is at http://www.bb4.com  Bigsister is at http://bigsister.graeff.com  Bigsister seems to have more features, alert logging, better portability and more efficient data collection. Compatible update to BB4.
  • 81. May 14, 2014 Adrian Cockcroft and Mario Jauvin Free QA Test and Modelling Tools
  • 82. May 14, 2014 Adrian Cockcroft and Mario Jauvin QA Test Requirements  Generate test workload – SLAMD, Grinder  Collect performance metrics – Any of the tools already mentioned  Report regression against baseline  Predict capacity needed for production system – Use spreadsheets for simple linear prediction – Use modelling tools such as PDQ for queuing models
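
The "simple linear prediction" mentioned on slide 82 is usually done in a spreadsheet, but the same least-squares fit is easy to sketch in Python. The measurements below are invented for illustration, and the approach assumes utilization really does grow linearly with load over the range of interest.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical QA results: requests per second against CPU utilization (%).
load = [100, 200, 300]
cpu_percent = [20, 40, 60]
slope, intercept = fit_line(load, cpu_percent)

# Load at which the fitted line reaches a 70% utilization ceiling.
load_at_70 = (70 - intercept) / slope
```

A straight line tends to be optimistic near saturation, where queueing makes response time grow non-linearly, which is the case for the queueing models the slide mentions.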
  • 83. May 14, 2014 Adrian Cockcroft and Mario Jauvin Grinder 3 - Powerful New Features  100% Pure Java - works on any hardware platform and any operating system that supports J2SE 1.3 and above.  Java and Jython based load testing framework – Web Browsers: simulate web browsers using HTTP, and HTTPS. – Web Services: test interfaces using SOAP and XML-RPC. – Database: test databases using JDBC. – Middleware: RPC and MOM based systems using IIOP, RMI/IIOP, RMI/JRMP, and JMS. – Other Internet protocols: POP3, SMTP, FTP, and LDAP.  See http://grinder.sourceforge.net/g3/features.html  J2EE Performance Testing with BEA WebLogic Server by Peter Zadrozny, Philip Aston and Ted Osborne, originally published by Expert Press and now by APress uses Grinder 2 throughout.
  • 84. May 14, 2014 Adrian Cockcroft and Mario Jauvin SLAMD  Load generation framework, written in Java  Originally built to test LDAP servers by Sun  Extended to be very generic and published as open source. Actively being developed.  Sophisticated functions and user interface  See http://www.slamd.com  Latest Release 2.0 has better usability focus
  • 88. May 14, 2014 Adrian Cockcroft and Mario Jauvin PDQ Modelling Tool  Dr Neil Gunther’s toolkit at http://www.perfdynamics.com  Library used from C or Perl provides MVA queueing models  Use to calibrate in QA and predict in production  PDQ modelling tool details: – The Practical Performance Analyst, Dr. Neil Gunther, McGraw-Hill, 1998, ISBN 0-07-912946-3 – Analyzing Computer System Performance with Perl::PDQ, 2004, ISBN 3-540-20865-8
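
To show the kind of calculation an MVA (mean value analysis) queueing model performs, here is a standalone Python sketch of exact MVA for a closed network of single-server queueing centers. It does not use or imitate the PDQ library's API; the function name, parameters and example demands are assumptions.

```python
def mva(demands, users, think_time=0.0):
    """Exact mean value analysis for a closed queueing network.

    demands: service demand in seconds at each queueing center.
    users: number of circulating customers (for example, load generator threads).
    think_time: seconds each customer waits between requests.
    Returns (throughput, response_time, queue_lengths) at the given population.
    """
    queues = [0.0] * len(demands)
    throughput = 0.0
    response = 0.0
    for n in range(1, users + 1):
        # An arriving customer sees the queue left by a population of n - 1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queues)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queues = [throughput * r for r in residence]
    return throughput, response, queues


# Hypothetical system: one center with a 1 second demand, two users, no think time.
x, r, q = mva([1.0], 2)
```

Calibrating the service demands from QA measurements and then raising `users` to production levels is one way to follow the "calibrate in QA, predict in production" advice.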
  • 89. May 14, 2014 Adrian Cockcroft and Mario Jauvin References and Conclusion
  • 90. May 14, 2014 Adrian Cockcroft and Mario Jauvin Licences for Free Tools  Open Source Initiative – “OSI Approved licences” – http://opensource.org/licenses/category  Comparisons of Common Licences – http://zooko.com/license_quick_ref.html
  • 91. May 14, 2014 Adrian Cockcroft and Mario Jauvin Web Pages and Books  Adrian’s Performance and other topics blog – http://perfcap.blogspot.com  MFJ Associates performance tools link page – http://www.mfjassociates.net/perf_links.html  More free tools compiled by John Sellens – http://www.generalconcepts.com/resources/monitoring/  More tools compiled by Openxtra – http://www.openxtra.co.uk/resource-center/open_source_network_monitor_tools.php  SE toolkit info: Sun Performance and Tuning - Java and the Internet - Adrian Cockcroft and Richard Pettit - Sun Press/Prentice Hall, 2nd Edition, 1998 ISBN 0-13-095249-4  Solaris 8 and Linux: System Performance Tuning 2nd Edition – Gian-Paolo Musumeci, O’Reilly 2002 ISBN: 0-596-00284-X  Solaris Internals http://www.solarisinternals.com – Richard McDougall and James Mauro - new 2nd edition and new performance book by Richard McDougall and Brendan Gregg
  • 92. May 14, 2014 Adrian Cockcroft and Mario Jauvin Concluding Remarks  Many large installations depend on free tools  A full suite of functionality is available  Several tools are needed to cover the bases  Tradeoff between function and ease of use  Support may be available, but typically Google is the best support tool  Functionality is increasing….
  • 93. May 14, 2014 Adrian Cockcroft and Mario Jauvin Questions? acockcroft@netflix.com mario@mfjassociates.net

Editor's Notes

  1. "We reject kings, presidents, and voting; we believe in rough consensus and running code." -- David Clark, IAB chair, 1992
  2. I added the SNMP-Informant and it started appearing automagically.
  3. 920804400 is noon on 7 March 1999. DS defines a data source named speed of type COUNTER, collected every 300 seconds (the default step). 600 is the heartbeat: the maximum time to wait between updates, after which the data is treated as unknown. U:U means the minimum and maximum are unspecified. RRA defines a round robin archive. 0.5 is the xfiles factor: the fraction of an interval that may be unknown before the consolidated value is itself unknown. 1:24 stores every sample without averaging and keeps 24 rows (2 hours' worth). 6:10 averages every 6 samples and keeps 10 rows (5 hours' worth).
  4. Start at noon and end at 13:00; plot the AVERAGE archive of the data source speed as a 2-pixel-wide line in red (#FF0000).