PuppetConf 2016: An Introduction to Measuring and Tuning PE Performance – Charlie Sharpsteen, Puppet

Here are the slides from Charlie Sharpsteen's PuppetConf 2016 presentation called An Introduction to Measuring and Tuning PE Performance. Watch the videos at https://www.youtube.com/playlist?list=PLV86BgbREluVjwwt-9UL8u2Uy8xnzpIqa


  1. Keeping an Eye on the PE Stack: An Introduction to Measuring and Tuning PE Performance
     Charlie Sharpsteen, Puppet Inc.
  2. Overview
     • How do I measure PE performance? What sources of data are available?
     • What numbers are actually important?
     • What settings can I adjust when important metrics start showing unhealthy trends?
  3. Gathering Data From PE Services: JVM Logging and Metrics
  4. PE Server Components
     • TrapperKeeper JVM: Puppet Server, PuppetDB, Console Services, Orchestration Services
     • JVM: ActiveMQ
     • Other: PostgreSQL, NGINX
     Mostly Java based with shared logging and metrics interfaces.
  5. TrapperKeeper Logging
     • Configuration for main logs can be found in:
       /etc/puppetlabs/<service name>/logback.xml
     • Controls output destinations, log levels and message formatting.
     • Ship to a log aggregator to provide context for investigations.
     • Default log pattern is:
       Date Level [Java Namespace] message
     • Puppet Server also includes thread ID:
       Date Level [thread] [Java Namespace] message
     • Thread ID is useful for grouping activity related to a single request.
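As a rough illustration of grouping activity by thread ID, the sketch below parses lines that follow the Puppet Server pattern above. The exact timestamp format, the thread names and the sample lines are assumptions made for the example; adjust the regular expression to match the pattern configured in logback.xml.

```python
import re
from collections import defaultdict

# Assumed layout: "<date> <time> <LEVEL> [thread] [namespace] message".
# The timestamp format and the sample lines below are hypothetical.
LINE_RE = re.compile(
    r"^(?P<date>\S+ \S+) (?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\] \[(?P<namespace>[^\]]+)\] (?P<message>.*)$"
)


def group_by_thread(lines):
    """Return {thread_id: [message, ...]} for lines that match the pattern."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            grouped[match.group("thread")].append(match.group("message"))
    return dict(grouped)


sample = [
    "2016-10-20 13:30:01,123 INFO  [qtp1-65] [puppetserver] Compiling catalog",
    "2016-10-20 13:30:01,456 INFO  [qtp1-66] [puppetserver] Handling request",
    "2016-10-20 13:30:02,789 INFO  [qtp1-65] [puppetserver] Compiled catalog",
]
by_thread = group_by_thread(sample)
```

Here `by_thread["qtp1-65"]` holds both messages logged by that thread, in order.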
  6. TrapperKeeper Logging
     • Configuration for access logs can be found in:
       /etc/puppetlabs/<service name>/request-logging.xml
     • Default format is Apache Combined Log + request duration.
     • Easily parsed by most log processors.
     • Can add additional bits of information such as request headers.
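A minimal sketch of pulling slow requests out of an access log in this format. It assumes the request duration is appended as a trailing integer in milliseconds; the sample lines, hosts and user agent are hypothetical, so check the pattern in request-logging.xml before relying on the field order.

```python
import re

# Apache Combined Log fields followed by an assumed trailing duration (ms).
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "[^"]*" (?P<duration>\d+)$'
)


def slow_requests(lines, threshold_ms):
    """Return (path, duration) pairs at or above threshold_ms, slowest first."""
    slow = []
    for line in lines:
        match = ACCESS_RE.match(line)
        if match and int(match.group("duration")) >= threshold_ms:
            slow.append((match.group("path"), int(match.group("duration"))))
    return sorted(slow, key=lambda item: item[1], reverse=True)


sample = [
    '10.0.0.5 - - [20/Oct/2016:13:30:01 +0000] '
    '"GET /puppet/v3/node/web01 HTTP/1.1" 200 1024 "-" "Puppet/4.7.0" 85',
    '10.0.0.6 - - [20/Oct/2016:13:30:02 +0000] '
    '"POST /puppet/v3/catalog/web02 HTTP/1.1" 200 204800 "-" "Puppet/4.7.0" 9200',
]
slow = slow_requests(sample, 1000)
```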
  7. TrapperKeeper Metrics
     • Metrics are recorded using JMX MBeans.
     • Metrics that measure activity over time are weighted to represent the last 5 minutes.
     • Metrics can be retrieved via the JMX protocol.
       • Full access to all available metrics and all available measurements.
       • Can attach tools such as JConsole and JVisualVM.
       • Requires additional ports to be opened; configuration can be complex. Java tools only.
     • Metrics can be retrieved as JSON over HTTP:
       • For a curated set of common metrics: status/v1?level=debug
       • For access to all available metrics: metrics/v1/mbeans
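Retrieval over HTTP could look roughly like the sketch below. The host name is a placeholder, port 8140 is the Puppet Server port that appears later in the deck (other services listen elsewhere), and the paths are copied as abbreviated on the slide, so the full path on a real install may carry a longer prefix. Treat all of these as assumptions.

```python
import json
import ssl
import urllib.request


def endpoint_url(host, path, port=8140):
    """Build an HTTPS URL for one of the metrics paths named on the slide."""
    return "https://{}:{}/{}".format(host, port, path.lstrip("/"))


def fetch_json(url, ca_file=None):
    """GET a URL and decode the JSON body; ca_file is the CA bundle to trust."""
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return json.load(response)


# Hypothetical host; nothing is fetched until fetch_json() is called.
status_url = endpoint_url("puppet.example.com", "status/v1?level=debug")
mbeans_url = endpoint_url("puppet.example.com", "metrics/v1/mbeans")
```

Calling `fetch_json(status_url, ca_file=...)` with the CA certificate of the PE install would return the decoded JSON document, which can then be shipped to whatever monitoring system is in use.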
  8. TrapperKeeper Configuration
     • Configuration files are stored under:
       /etc/puppetlabs/<service name>/conf.d
     • Most important settings are managed by puppet_enterprise::profile classes and are tunable via the Console and Hiera.
     • JVM settings are specified in /etc/sysconfig or /etc/default.
     • The JVM memory limit, -Xmx, is the primary tunable setting. Enable the G1 garbage collector when using limits higher than 10 GB: -XX:+UseG1GC
     • These flags are configurable via the java_args parameter on profile classes.
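The heap-size rule of thumb above can be written down as a small helper. This is only a sketch of the rule as stated on the slide; the function name is hypothetical and the heap size is assumed to be a whole number of gigabytes.

```python
def jvm_flags(heap_gb):
    """Return JVM flags for a heap limit, adding G1 above 10 GB."""
    flags = ["-Xmx{}g".format(heap_gb)]
    if heap_gb > 10:
        # The slide recommends the G1 collector for limits higher than 10 GB.
        flags.append("-XX:+UseG1GC")
    return flags


small = jvm_flags(4)
large = jvm_flags(16)
```

`small` contains only the `-Xmx` flag, while `large` also carries `-XX:+UseG1GC`.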
  9. Puppet Server: It’s all about the JRubies.
  10. Puppet Server Metrics Overview
      ● JVM resource usage: status-service
        ● JMX namespace: java.lang:*
      ● HTTP request times per endpoint: pe-master
        ● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.http.*
      ● Catalog Compilation metrics: pe-puppet-profiler
        ● JMX namespaces:
          puppetserver:name=puppetlabs.<fqdn>.compiler.*
          puppetserver:name=puppetlabs.<fqdn>.functions.*
          puppetserver:name=puppetlabs.<fqdn>.puppetdb.*
      ● JRuby Metrics: pe-jruby-metrics
        ● JMX namespace: puppetserver:name=puppetlabs.<fqdn>.jruby.*
  11. New PE 2016.4.0 Features
      ● The metrics/v1/mbeans endpoint has been added to Puppet Server. It must be enabled via Hiera:
        puppet_enterprise::master::puppetserver::metrics_webservice_enabled: true
      ● The Graphite metrics reporter has been optimized and extended:
        ● Only a subset of available metrics is reported by default.
        ● Reported metrics can be customized using the metrics_puppetserver_metrics_allowed parameter of the puppet_enterprise::profile::master class.
  12. JRuby Metrics
      ● Almost all Puppet Server requests must be handled by a JRuby instance, which makes JRuby availability the primary performance bottleneck.
      ● num-free-jrubies
        ● Measures spare capacity for incoming requests.
      ● average-wait-time
        ● Should never grow to a significant fraction of HTTP request times.
        ● Impacted by agent checkin distribution, resource availability, Puppet plugins and code.
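One way to turn these two metrics into an alert is sketched below. The 10% cut-off for a "significant fraction" is an assumption chosen for illustration, not a value from the deck, and the function name is hypothetical.

```python
def jruby_pressure(num_free_jrubies, average_wait_ms, average_request_ms,
                   max_fraction=0.1):
    """Return a list of warnings derived from the JRuby metrics.

    max_fraction is an assumed threshold for how large the wait time may
    grow relative to the average HTTP request time.
    """
    findings = []
    if num_free_jrubies == 0:
        findings.append("no free JRubies")
    if average_request_ms > 0:
        fraction = average_wait_ms / average_request_ms
        if fraction > max_fraction:
            findings.append(
                "wait time is {:.0%} of request time".format(fraction)
            )
    return findings


healthy = jruby_pressure(2, 5, 500)
overloaded = jruby_pressure(0, 250, 1000)
```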
  13. Agent Checkin Activity
      ● Agents check in runinterval after starting their last run, which can lead to pile-ups or “thundering herds”. Be careful of:
        ● Starting or re-starting a group of agents without the splay setting enabled.
        ● Triggering a group of agent runs via: mco puppet runonce
      ● Monitor average-requested-jrubies and Puppet Server access logs for spikes in agent activity.
      ● Use PostgreSQL to pull a histogram of agent start times from report data:
        sudo su - pe-postgres -s /bin/bash -c "psql -d pe-puppetdb"
        SELECT date_part('minute', start_time), count(*)
        FROM reports
        WHERE start_time BETWEEN '2016-10-20 13:30:00' AND '2016-10-20 14:30:00'
        GROUP BY date_part('minute', start_time)
        ORDER BY date_part('minute', start_time) ASC;
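The same per-minute histogram can be built client-side from a list of report start times, for example after exporting them from PuppetDB. The sketch below mirrors the grouping by minute of the hour; the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime


def minute_histogram(start_times):
    """Count runs per minute of the hour, sorted by minute."""
    counts = Counter(t.minute for t in start_times)
    return sorted(counts.items())


# Hypothetical agent start times within a one-hour window.
sample = [
    datetime(2016, 10, 20, 13, 30, 5),
    datetime(2016, 10, 20, 13, 30, 40),
    datetime(2016, 10, 20, 13, 45, 0),
    datetime(2016, 10, 20, 14, 30, 0),
]
histogram = minute_histogram(sample)
```

Minutes with counts far above the rest point at a pile-up of agents checking in at the same time.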
  14. Re-balancing Agent Checkins
      ● Use MCollective to orchestrate a batched re-start:
        su - peadmin -c "mco rpc service stop service=puppet"
        su - peadmin -c "mco rpc service start service=puppet --batch 1 --batch-sleep <runinterval in seconds / #nodes>"
      ● Batching is not necessary if the agents have splay enabled.
      ● For a stable distribution that isn’t affected by re-starts, puppet agent -t can be run on a schedule determined by the fqdn_rand() function instead of using the service.
      ● Load due to agent activity can be cut dramatically by shifting to the Direct Puppet workflow, where Orchestrator or MCollective is used to push catalog updates.
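The batch-sleep arithmetic is simple enough to show directly. The second helper illustrates the idea behind a stable per-node schedule by hashing the node name; it is not Puppet's fqdn_rand() and will not produce the same numbers. Function names and the sample node name are hypothetical.

```python
import hashlib


def batch_sleep_seconds(runinterval_seconds, node_count):
    """Seconds to wait between single-node batches to spread one runinterval."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return runinterval_seconds / node_count


def stable_offset_minutes(fqdn, runinterval_minutes):
    """Deterministic minute offset for a node (stand-in for fqdn_rand)."""
    digest = hashlib.sha256(fqdn.encode("utf-8")).hexdigest()
    return int(digest, 16) % runinterval_minutes


# 600 nodes on a 30-minute (1800 second) runinterval.
sleep = batch_sleep_seconds(1800, 600)
offset = stable_offset_minutes("web01.example.com", 30)
```

With 600 nodes and an 1800-second runinterval, `sleep` works out to 3 seconds between nodes.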
  15. Adding More JRuby Capacity
      ● JRuby count is set via jruby_max_active_instances, constrained by available CPU and RAM:
        ● Compile masters tend to top out around NCPU - 1. Monolithic masters need to share with PuppetDB and tend more towards (NCPU / 2 - 1).
        ● RAM requirements are 512 MB per JRuby, but may need to be increased if catalog compilation uses large datasets or dozens of environments are in use.
      ● The environment_timeout setting can be used to reduce the CPU requirements of catalog compilation. Set to 0 globally and unlimited for long-lived environments with lots of agents.
        ● Each environment using an unlimited timeout will add to the per-JRuby RAM requirements. Monitor memory usage of pre-2016.4.0 installations closely when using unlimited timeouts.
        ● Code Manager should be enabled when an unlimited timeout is used so that caches are flushed when new code is deployed.
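The sizing guidance above translates into a quick starting-point calculation. The sketch assumes integer division for the monolithic case and a floor of one JRuby on very small machines; both choices are assumptions, and the results are starting values to validate against the JRuby metrics rather than fixed answers.

```python
def suggested_jrubies(ncpu, monolithic=False):
    """Starting JRuby count: NCPU - 1, or NCPU / 2 - 1 on a monolithic master."""
    count = ncpu // 2 - 1 if monolithic else ncpu - 1
    return max(count, 1)  # assumed floor so small hosts still get one JRuby


def jruby_heap_mb(jruby_count, mb_per_jruby=512):
    """Baseline heap needed by the JRuby pool at 512 MB per instance."""
    return jruby_count * mb_per_jruby


compile_master = suggested_jrubies(8)
monolithic_master = suggested_jrubies(8, monolithic=True)
heap_mb = jruby_heap_mb(compile_master)
```

An 8-core compile master starts at 7 JRubies and roughly 3584 MB of heap for the pool; an 8-core monolithic master starts at 3.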
  16. Investigating Compile Times
      ● PE Puppet Server tracks compilation time on several different levels: per-node, per-environment, per-resource, per-function, and more.
      ● Top 10 resources and functions are available via the status API and Puppet Server performance dashboard:
        https://<puppetmaster>:8140/puppet/experimental/dashboard.html
      ● Full access available through JMX and the metrics API.
      ● Detailed timing on catalog compilation can be obtained by setting the Puppet Server log level to DEBUG and running puppet agent -t --profile on nodes of interest.
  17. Investigating Agent Run Times
      ● Agent run summaries are stored at:
        /opt/puppetlabs/puppet/cache/state/last_run_summary.yaml
      ● Summaries are also stored by PuppetDB and can be viewed from the PE Console, or queried:
        reports[metrics] {
          latest_report? = true and certname = '<node name>'
        }
      ● The time section shows the amount of time taken per resource type, along with config_retrieval, which measures the amount of time it took to receive a catalog.
      ● Per-resource timing can be logged by running: puppet agent -t --evaltrace
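Once the summary has been loaded into a dictionary (for example with a YAML parser), ranking the time section takes a few lines. The sample values are invented, and the sketch assumes the section also carries bookkeeping keys named total and last_run that should not be ranked.

```python
def slowest_entries(summary, top=3):
    """Return the largest entries of the time section, slowest first."""
    times = summary.get("time", {})
    # Assumed bookkeeping keys that are not per-type timings.
    skipped = {"total", "last_run"}
    entries = [(name, value) for name, value in times.items()
               if name not in skipped]
    return sorted(entries, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical contents of a parsed last_run_summary.yaml.
summary = {
    "time": {
        "config_retrieval": 12.5,
        "file": 3.2,
        "package": 40.1,
        "service": 1.0,
        "total": 56.8,
        "last_run": 1476970200,
    }
}
top_two = slowest_entries(summary, top=2)
```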
  18. PuppetDB: Processing Time and Storage Space
  19. PuppetDB Storage Usage
      ● Monitor disk space!
        /opt/puppetlabs/server/data/postgresql/
        /opt/puppetlabs/server/data/puppetdb/
      ● If disk space runs out, there are two options for returning space to the operating system:
        ● The existing volume can be enlarged so that a VACUUM FULL can be run.
        ● Alternatively, a new volume can be attached for a database backup and restore.
      ● The primary source of disk usage is report storage. This can be tuned by setting: report-ttl
      ● For infrastructure with high node turnover, consider setting node-purge-ttl to remove data related to decommissioned nodes.
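A basic disk-space check for the two directories can be sketched with the standard library. The 80% threshold is an assumption for illustration, and paths that do not exist on the host are skipped rather than treated as errors.

```python
import os
import shutil

# Data directories named on the slide.
PUPPETDB_PATHS = [
    "/opt/puppetlabs/server/data/postgresql/",
    "/opt/puppetlabs/server/data/puppetdb/",
]


def percent_used(path):
    """Percentage of the volume holding path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def volumes_over(paths, threshold=80.0):
    """Return {path: percent_used} for existing paths above the threshold."""
    over = {}
    for path in paths:
        if os.path.exists(path):
            used = percent_used(path)
            if used > threshold:
                over[path] = used
    return over
```

Running `volumes_over(PUPPETDB_PATHS)` from a cron job or monitoring agent gives an early warning before PostgreSQL runs out of room.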
  20. PuppetDB Command Processing
      ● Every PuppetDB operation, aside from queries, is executed by an asynchronous command processing queue. This queue is managed by an internal ActiveMQ server:
        org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=puppetlabs.puppetdb.commands
      ● Important metrics:
        ● Backlog of commands waiting for processing: QueueSize
        ● Largest command seen: MaxMessageSize
        ● Available memory for in-flight commands: MemoryPercentUsage
      ● Increase PuppetDB heap size along with the command-processing.memory-usage setting if the percentage spikes close to 100%. This will prevent ActiveMQ from paging commands to disk.
  21. PuppetDB Command Processing
      ● Command processing rates:
        puppetlabs.puppetdb.mq:name=global.processing-time
        puppetlabs.puppetdb.storage:name=replace-facts-time
        puppetlabs.puppetdb.storage:name=replace-catalog-time
        puppetlabs.puppetdb.storage:name=store-report-time
      ● Additional processing threads can be added using the command-processing.threads setting.
      ● On a monolithic install, PuppetDB processing threads must be balanced against Puppet Server JRubies and the number of CPU cores available.
  22. PostgreSQL Query Performance
      ● PostgreSQL configuration can be found in:
        /opt/puppetlabs/server/data/postgresql/9.4/data/postgresql.conf
      ● Add settings to improve logging around slow queries:
        log_min_duration_statement = 3000ms
        log_temp_files = 0
      ● If a temp file shows up in the logs, Postgres had to perform an operation outside of RAM, which is slow. Consider increasing the work_mem setting to be greater than the size of the temp files used.
      ● If query performance has been dropping over time, a database VACUUM may be needed:
        su - pe-postgres -s /bin/bash -c "vacuumdb --analyze --verbose --all"
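With log_temp_files = 0 set, the temp file sizes can be pulled from the PostgreSQL log to guide a work_mem value. The sketch assumes log lines of the form `temporary file: path "...", size N` with the size in bytes; the sample lines are invented. Note that work_mem applies per sort or hash operation rather than per server, so large values should be weighed against the number of concurrent queries.

```python
import re

# Assumed shape of the message PostgreSQL logs for each temp file.
TEMP_FILE_RE = re.compile(r'temporary file: path "[^"]+", size (\d+)')


def largest_temp_file_bytes(log_lines):
    """Return the largest temp file size seen in the log, or 0 if none."""
    largest = 0
    for line in log_lines:
        match = TEMP_FILE_RE.search(line)
        if match:
            largest = max(largest, int(match.group(1)))
    return largest


def work_mem_mb_above(size_bytes):
    """Smallest whole number of MB strictly greater than size_bytes."""
    return size_bytes // (1024 * 1024) + 1


sample = [
    'LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4242.0", size 12345678',
    'LOG:  duration: 3501.220 ms  statement: SELECT 1',
    'LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4242.1", size 33554432',
]
largest = largest_temp_file_bytes(sample)
suggested_mb = work_mem_mb_above(largest)
```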
  23. Resources
      This Slide Deck: https://goo.gl/ytzCA5
  24. Resources
      Logging:
      • Directing Output: http://logback.qos.ch/manual/appenders.html
      • Formatting Main Logs: http://logback.qos.ch/manual/layouts.html
      • Formatting Access Logs: http://logback.qos.ch/manual/layouts.html#logback-access
      JMX:
      • Configuration: https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
      • Metric Polling Tool: https://github.com/jmxtrans/jmxtrans
  25. Resources
      Puppet Server:
      • Metrics Reference: https://docs.puppet.com/pe/2016.4/puppet_server_metrics.html
      • Configuration Reference: https://docs.puppet.com/puppetserver/2.6/configuration.html
      • Direct Puppet Workflow: https://docs.puppet.com/pe/2016.4/direct_puppet_workflow.html
      PuppetDB:
      • Metrics Reference: https://docs.puppet.com/puppetdb/4.2/api/metrics/v1/mbeans.html
      • Configuration Reference: https://docs.puppet.com/puppetdb/4.2/configure.html
      • Backup Procedures: https://docs.puppet.com/pe/2016.4/maintain_console-db.html
      • PostgreSQL Maintenance: https://github.com/npwalker/pe_databases
