Wai Keen Woon, CTO of OnApp's CDN division in Malaysia, gave an interesting overview of what the Puppet architecture at OnApp looks like. The CDN division at OnApp is a large provider of CDN services, which makes it a very interesting candidate for a case study.
3. About OnApp
A leading provider of software for hosts:
- The leading cloud management software for hosts
- The instant global CDN for hosts
- OnApp launched July 1st 2010
- Deep industry knowledge
- Backed by LDC
- 100+ employees in US, EU, APAC
4. Vital Statistics
- 1 in 3 public clouds
- 800+ cloud deployments
- 300+ global clients
8. Systems Overview
- Core & Development
  - ~20 physical servers
  - ~200 VMs
  - Homogeneous environment – 64-bit Debian everywhere
  - Mainly use OpenVZ and KVM for virtualization
- CDN Delivery Edge Servers
  - 100+ servers in 60+ cities
  - Running on the OnApp platform – either Xen or KVM
- Puppet integral to our setup – since day 1
9. Why Puppet?
- More reliable configuration of servers. Less need to "run ssh in a for loop" and miss something.
- Self-documenting – our manifests are almost able to bootstrap an empty server.
  - Our manifests can't bootstrap an empty environment yet.
  - Limitation – manifests describe what/where/how something is set up, but not *why*.
- Nice syntax – easy on the eyes. Comprehensive built-in resource types. Able to fall back to dumb ways of doing things if required (use file, exec et al.).
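The "dumb fallback" mentioned above could look something like this sketch; the file path, module name, and command are hypothetical, not taken from OnApp's manifests:

```puppet
# Hypothetical fallback using the generic file and exec types:
# push a config file, and rerun a command whenever it changes.
file { "/etc/myapp/myapp.conf":
  source => "puppet:///modules/myapp/myapp.conf",
  notify => Exec["regen-myapp-cache"],
}

exec { "regen-myapp-cache":
  command     => "/usr/bin/myapp --regen-cache",
  refreshonly => true,   # only runs when notified by the file resource
}
```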
11. What Would OnApp Set Up...
- Essential utilities (tcpdump, less, vim, etc.).
- Users & their SSH keys, sudoers.
- Developers' shell => /bin/false if production.
- Base firewall rules.
- Nagios agent.
- Uniform locality settings: UTC timezone, en_US.UTF-8 locale.
- SMTP that smarthosts to our central relay.
- Syslogd forwarding remote logs to the central logging server.
- Finally, the services.
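A hedged sketch of how a couple of these items might be expressed in a manifest. The user name and package list are hypothetical, and testing Puppet's $environment variable is only one plausible way the production shell rule could be written:

```puppet
# Hypothetical excerpt from a "base" class covering a few of the items above.
package { ["tcpdump", "less", "vim"]:
  ensure => installed,
}

# Developers get /bin/false in production, a real shell elsewhere.
$dev_shell = $environment ? {
  "production" => "/bin/false",
  default      => "/bin/bash",
}

user { "devuser":
  ensure => present,
  shell  => $dev_shell,
}
```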
12. Core Infra Manifest Excerpt
# BLUE – env config definitions
$portal_domain = "portal.alpha.onappcdn.com"
$portal_db_host = "portal.alpha.onappcdn.com"
$portal_db_user = "aflexi_webportal"
$auth_nameservers = { "ns1" => "175.143.72.214",
                      "ns2" => "175.143.72.214",
                      "ns3" => "175.143.72.214",
                      "ns4" => "175.143.72.214",
}
$monitoring_host_server = [ "monitoring.alpha.onappcdn.com",
                            "dns.alpha.onappcdn.com" ]

# RED – node definitions
node "monitoring.alpha.onappcdn.com" {
  include base
  include s_db_monitoring
  include s_monitoring_server
  include collectd::rrdcached
  include s_munin
  include s_monitoring_alerts
  include s_monitoring_graph
}

# GREEN – class definitions
class collectd::rrdcached {
  package { "rrdcached":
    ensure => latest,
  }
  service { "rrdcached":
    ensure => running,
  }
}
13. Package Repo Integration
- Jenkins builds debs of our code and stores them in an apt repository for the environment they are built for.
- Puppet keeps packages up-to-date (ensure => latest) and restarts services on package upgrades.

puppet-agent[25431]: (/Stage[main]/Debian/Exec[apt-get-update]/returns) executed successfully
puppet-agent[25431]: (/Stage[main]/Python::Aflexi::Mq/Package[python-aflexi-mqcore]/ensure) ensure changed '7065.20120530.113915-1' to '7066.20120604.090916-1'
puppet-agent[25431]: (/Stage[main]/S_mq/Service[worker-rabbitmq]) Triggered 'refresh' from 1 events
puppet-agent[25431]: Finished catalog run in 16.08 seconds
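The ensure => latest plus service-refresh pattern visible in that log could be written roughly like this; the subscribe relationship is an assumption about how the restart is wired up, with names taken from the log excerpt:

```puppet
package { "python-aflexi-mqcore":
  ensure => latest,   # picks up new builds from the apt repo on each run
}

service { "worker-rabbitmq":
  ensure    => running,
  subscribe => Package["python-aflexi-mqcore"],  # restart on package upgrade
}
```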
15. Nagios Integration
Server manifest – exports the service that is checked:

@@nagios_service { "check_load_$fqdn":
  check_command       => "check_nrpe_1arg!check_load",
  use                 => "generic-service",
  host_name           => $fqdn,
  service_description => "check_load",
  tag                 => $domain,
}

Nagios service manifest – collects the resources to check:

Nagios_service <<| tag == "onappcdn.cm" |>> {
  target  => "/etc/n3/conf.d/services.cfg",
  require => Package["nagios3"],
  notify  => Exec["reload-nagios"],
}
16. Nagios Integration
- What's logged on the nagios server when puppet runs?

puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_host[hrm.onappcdn.com]/ensure) created
puppet-agent[15293]: (/Stage[main]/Nagios::Monitor_private/Nagios_service[check_load_hrm.onappcdn.com]/ensure) created
nagios3: Nagios 3.2.1 starting... (PID=5601)
puppet-agent[15293]: (/Stage[main]/Nagios::Base/Exec[reload-nagios]) Triggered 'refresh' from 8 events
17. Monitoring Puppet Itself
- Lots of tools/dashboards out there to achieve this.
- For us: "grep -i err */syslog". Dumb, but it works until we need to Really Address it.
- Common issues:
  - Puppet gets "stuck" – and only one puppet instance can run at any one time.
  - Manifest errors – syntax, merge issues.
  - Badly-written manifests (vague dependencies, conditions/commands not robust enough).
  - An important dependent resource failing (e.g. apt-get install fails due to a dpkg-configure error).
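The grep above can be sketched as a tiny script; the per-host directory layout and log format are assumptions, not OnApp's actual setup:

```python
# Minimal stand-in for "grep -i err */syslog": scan each host's
# collected syslog for lines that mention an error.
import glob
import re


def find_puppet_errors(pattern="*/syslog"):
    """Return (path, line) pairs for log lines that mention an error."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                if re.search("err", line, re.IGNORECASE):
                    hits.append((path, line.rstrip()))
    return hits
```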
18. File/Dir Organization
- We use git to revision control our puppet manifests.
- Style we adopted mainly comes from Hunter Haugen*.
- A branch for each environment, plus a "common" branch.
- Each branch checked out as a separate directory in /etc/puppet/environments/$env
- And puppetmaster's includedir configured to that directory.

Common branch:
  Manifests/
    alpha.pp
    beta.pp
  Modules/
    Base/
    Users/
Alpha env branch:
  Modules/
    Python/
    Services/
    Nameserver/
Beta env branch:
  Modules/
    Python/
    Services/
    Nameserver/

* - http://hunnur.com/blog/2010/10/dynamic-git-branch-puppet-environments/
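On a puppetmaster of that era, the checkout-per-environment layout might be wired up along these lines; this is a sketch, as the exact puppet.conf settings OnApp used are not shown in the deck:

```ini
# Hypothetical /etc/puppet/puppet.conf excerpt: one section per
# environment, each pointing at its branch checkout plus common.
[alpha]
  modulepath = /etc/puppet/environments/alpha/modules:/etc/puppet/environments/common/modules
  manifest   = /etc/puppet/environments/common/manifests/alpha.pp

[beta]
  modulepath = /etc/puppet/environments/beta/modules:/etc/puppet/environments/common/modules
  manifest   = /etc/puppet/environments/common/manifests/beta.pp
```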
19. File/Dir Organization
- Common goes into its own branch – for convenience; less merging needed for manifests that we are Really Sure won't differ between environments.
- System manifest goes into common/manifests/$env.pp
  - Initially tried putting the manifest into the alpha/beta/omega branches as site.pp – merge hell.
- Introduced an extra variable – $effective_env
  - Abstracts the puppet environment name from the environment that the manifest runs in.
20. File/Dir Organization
- Hotfixes branch off omega and are merged to alpha/beta/omega.
- Development branches off alpha.
  - This branch can be trialed as a separate environment (use --environment to specify a custom env on the puppet client).
  - Merge to alpha → beta → omega.
  - Or merge as a feature branch to any other environment.
- "git diff branchA branchB" – differences between environments are shown clearly.
21. Edge Servers
- Our edge servers are hosted on OnApp cloud (only).
- When creating an edge server, the cloud control panel:
  - Instantiates a VM from a lightly-customized Debian image.
  - Configures the package repositories.
  - Issues a puppet run to set it up.
- Advantage of setting it up through puppet instead of a "gold image": our system can be installed on bare metal if needed, and can be reproducibly installed on $future_debian_release.
22. Edge Servers
- Our edge servers are hosted on OnApp cloud (only).
- When creating an edge server, the control panel instantiates a VM from a lightly-customized Debian image, and issues a puppet run to set it up.
23. Edge Servers – External Node Classifier
- No text manifest – all code, using an "external node classifier".
- Assign variables and classes specific to the edge server through the node classifier, e.g. its password, the services it runs.
- In Python:

import yaml

output = {}
output["classes"] = ["class1", "class2"]
output["parameters"] = {"param1": "value1"}
print yaml.dump(output)
24. Edge Servers – External Node Classifier
- This YAML-encoded structure...

$ puppet-nodeclassifier 85206671.onappcdn.com
classes: [ base, nginx ]
parameters: { edge_secret_key: 86zFsrM7Ma, monitoring_domain: monitoring.alpha.onappcdn.com }

- ... is equivalent to this textual manifest:

node "85206671.onappcdn.com" {
  $edge_secret_key = "86zFsrM7Ma"
  $monitoring_domain = "monitoring.alpha.onappcdn.com"
  include base
  include nginx
}
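As a runnable sketch of the classifier idea above (using PyYAML, as the slides do): the in-memory lookup table here stands in for whatever database or API OnApp's real classifier queries, which the deck does not show:

```python
# Hypothetical external node classifier: map a node's certname to the
# classes and parameters Puppet should apply, emitted as YAML.
import yaml

# Stand-in for the real per-edge-server data store (an assumption).
EDGE_NODES = {
    "85206671.onappcdn.com": {
        "classes": ["base", "nginx"],
        "parameters": {
            "edge_secret_key": "86zFsrM7Ma",
            "monitoring_domain": "monitoring.alpha.onappcdn.com",
        },
    },
}


def classify(certname):
    """Return the YAML document Puppet expects from an ENC."""
    node = EDGE_NODES.get(certname, {"classes": [], "parameters": {}})
    return yaml.dump(node, default_flow_style=False)
```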
25. Edge Servers – Storedconfigs
- Puppet stores facts about the edge servers into MySQL.
- We make minimal use of this – for example, sizing nginx's in-memory cache depending on the amount of memory the server has.
- Could probably use more, e.g. set the number of threads based on CPU core count.
- The data's always there if we ever want to query it...
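A minimal sketch of the memory-based sizing idea; the fact name, the quarter-of-RAM ratio, and the template path are assumptions, not OnApp's actual numbers:

```puppet
# Hypothetical: derive nginx's cache size from a Facter memory fact.
$cache_mb = $::memorysize_mb / 4   # use a quarter of RAM (assumed ratio)

file { "/etc/nginx/conf.d/cache.conf":
  content => template("nginx/cache.conf.erb"),  # template reads $cache_mb
  notify  => Service["nginx"],
}
```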