Puppet Camp CERN Geneva
1. A Puppet Infrastructure at CERN
Steve Traylen
CERN IT Department
steve.traylen@cern.ch
Puppet Camp, Geneva, CH.
11 July 2012
2. Outline
• CERN and Computing for High Energy Physics
• Today's CERN IT Deployment
  – Why and what's changing
• Adoption of Puppet, Foreman, …
  – Progress, Integration
  – Difficulties
  – Future
3. CERN
• Conseil Européen pour la Recherche Nucléaire
• aka European Laboratory for Particle Physics
• Facilities for fundamental research
• Between Geneva and the Jura mountains, straddling the Swiss-French border
• Founded in 1954
4. The Large Hadron Collider
• Accelerator for protons against protons – 14 TeV collision energy
• By far the world's most powerful accelerator
• Tunnel of 27 km circumference, 4 m diameter, 50–150 m below ground
• Detectors at four collision points
5. The LHC Computing Challenge
• Data volume
  – 15 PetaBytes of new data each year
• Global compute power
  – 250k CPU cores
  – 100 PB of disk storage
• Worldwide analysis & funding
  – Distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
  – Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
  – Distributed for funding and sociological reasons
6. Motivation to Change Tools
• CERN data centre is reaching its limits:
  – IT staff numbers remain fixed
  – More computing capacity is needed
• Inefficiencies exist, but the root causes cannot be easily identified
  – Tools are becoming increasingly brittle and difficult to adapt
    • E.g. porting the tools to IPv6 would need a development project
  – Some core components cannot be scaled up
7. Second CERN Data Centre
• Wigner Institute in Budapest, Hungary
• Hands-off facility, hardware support only
• Deploying 2012 to 2014
8. Infrastructure Tools Evolution
• We had to develop our own toolset in 2002
  – "Extremely Large Fabric Management System" or http://cern.ch/ELFms
  – Included Quattor for configuration
• Nowadays,
  – CERN compute capacity is no longer leading edge
  – Many options available for open source fabric management
  – We need to scale to meet the upcoming capacity increase
• If there is a requirement which is not available through an open source tool, we should question the need
  – If we are the first to need it, contribute it back to the open source tool
9. Infrastructure as a Service
• Goals
  – Improve repair processes with virtualisation
  – More efficient use of our hardware
  – Better tracking of usage
  – Enable remote management for the new data centre
  – Support potential new use cases, e.g. cloud
  – Sustainable support model
• At scale for 2015
  – 15,000 servers
  – 90% of hardware virtualized
  – 300,000 VMs needed
• Plan = OpenStack adoption
10. Chose Puppet for Configuration
• The tool space has exploded in the last few years
  – In configuration management and ops
  – Large, shared 'tool forges', and lots of experience
• Puppet and Chef are the clear leaders for the 'core' tool
• Many large-scale enterprises use Puppet
  – Its declarative approach fits better with what we are used to in Quattor
  – Large installations: a friendly, broad community, plus commercial support and training
  – You can buy books on it
  – You can employ people who know Puppet better than you do
12. Starting with Puppet
• Puppet was and is trivial to set up:
  – Anyone can do it in a day
  – Configuring something with Puppet is easy
• What's hard:
  – Deciding module scope and how modules interact with one another
    • Three modules editing grub.conf, or one? (see the sketch below)
  – We started in early 2012 with very little plan in the area of module organization
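One way to keep grub.conf under a single owner is sketched below, assuming the stock Augeas Grub lens is available on the node; the module name, defined type and tree paths are illustrative, not our actual module:

    # Hypothetical grub module: the one place grub.conf is edited.
    # Other modules declare intent instead of templating the file.
    define grub::kernel_arg ($value) {
      augeas { "grub_kernel_arg_${name}":
        incl    => '/boot/grub/grub.conf',
        lens    => 'Grub.lns',
        context => '/files/boot/grub/grub.conf',
        changes => "set title[1]/kernel/${name} ${value}",
      }
    }

    # e.g. a batch-system module asks for a setting without
    # knowing how grub.conf is laid out:
    grub::kernel_arg { 'elevator': value => 'deadline' }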
13. Downloading Puppet Modules
• Expectation at the start – all done for us:
  – ssh, iptables, sysctl, apache, mysql all done
  – example42 or similar can do everything
• Reality
  – Modules are often not quite correct:
    • Too simple
      – e.g. I want my sshd_config to be different in two places
    • Too much abstraction
      – I want to use Puppet, not some abstraction with 100s of variables covering every possible case
        » e.g. puppet with(out) passenger – I only want one
  – Parameterized classes and Foreman don't really work together
    • The resulting modules are not shareable – ENC globals vs parameters (see the sketch below)
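As a sketch of why the two don't mix, here are the two styles for a hypothetical ssh module (alternatives, not code meant to coexist); a Foreman-style ENC of this era can hand a node class names and top-scope globals, but not class parameters:

    # Style A: parameterized class – data arrives at declaration time.
    class ssh ($permit_root = 'no') {
      file { '/etc/ssh/sshd_config':
        content => template('ssh/sshd_config.erb'),  # uses @permit_root
      }
    }
    class { 'ssh': permit_root => 'yes' }

    # Style B: ENC globals – the class reads top-scope variables
    # that the ENC set, and is declared with a bare include.
    class ssh {
      $permit_root = $::ssh_permit_root   # global supplied by Foreman
      file { '/etc/ssh/sshd_config':
        content => template('ssh/sshd_config.erb'),
      }
    }
    include ssh

A module written in style A cannot be assigned by an ENC that only speaks style B, which is exactly what makes such modules hard to share.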
14. Sharing and Fixing Modules
• Not as easy as it should be:
  – Our modules are littered with CERNisms
    • NTP servers, subnets, authorization systems, ...
    • Adaptations to work with Foreman
  – All of us learning Puppet and doing things quickly (badly)
• Hiera is being used now:
  – Provides the code vs data separation we had with Quattor
  – Dozens of ways to set up and (ab)use Hiera
  – Little experience with this anywhere yet
  – Hiera should make modules more shareable across sites (see the sketch below)
    • Looking forward to it becoming the normal, standard thing that modules use, so everyone benefits
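A sketch of that separation for a hypothetical ntp module, using the hiera() function from hiera-puppet (current at the time); the lookup key and server names are illustrative:

    # Site data lives in Hiera's YAML files, e.g.:
    #   ntp::servers:
    #     - ntp1.cern.ch     # hypothetical site values
    #     - ntp2.cern.ch
    class ntp {
      # the CERNism leaves the module; other sites supply their
      # own data, or fall back to the default below
      $servers = hiera('ntp::servers', ['pool.ntp.org'])

      file { '/etc/ntp.conf':
        content => template('ntp/ntp.conf.erb'),  # iterates over @servers
      }
    }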
15. Sharing Modules With All
• A big aim is to share our modules as much as possible with everyone, but in particular:
  – CERN IT is not the only Puppet deployment at CERN
    • The ATLAS Point 1 farm at CERN runs Puppet
    • ATLAS analysis in the cloud has used Puppet
  – International HEP labs use or are switching to Puppet
  – Puppet was the "winner" at the recent CHEP fabric session
    • Presentations from CERN, BNL, PIC, ATLAS
• We will share here, but it's early days:
  – http://github.com/cernops
16. Organizing Modules On Disk
• Started with all modules in one directory in git:
  – Obviously wrong; great confusion for newcomers
• Current situation: two directories in git:
  – Modules – reusable items, e.g. firewall, apache, sysctl, ...
  – Manifests – top-level services, e.g. batch machine, public login machine
• Future plans:
  – Split modules into local and downloaded (see the sketch below)
    • Modules like puppetlabs-firewall are mixed in with our own junk
    • Will allow us to track / contribute to upstream better
  – In line with Puppet's upcoming vendor path
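A sketch of the split in a puppet.conf of the era, with illustrative paths; modulepath takes a colon-separated list, so unmodified upstream clones can sit beside our own modules and be tracked against their origin:

    # puppet.conf on the master (paths illustrative)
    [master]
    # local CERN modules first, clean upstream clones second
    modulepath = /etc/puppet/modules/local:/etc/puppet/modules/upstream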
17. Configuration Complexity
• 150 clusters, ranging from 1 to 3000 hosts
• We have many configurations of service
  – Puppet handles this diversity well
• We have many administrators: >= 300
  – These admins change, and are on different continents
  – It is less obvious what to do about this with Puppet
18. Trust Amongst SysAdmins
• All teams share one git repository
  – Rely on code review, git branches and environments
• Teams use their own Puppet masters
  – Puppet master(s) for SysAdmin Team A serve Team A's nodes; likewise for Team B
  – A hiera-gpg key for each team (see the sketch below)
  – Host ACLs on the Puppet masters
• The full implications of this lack of trust between admins are unclear
  – Interested to hear what others have done
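A sketch of the per-team secrets piece, assuming the hiera-gpg backend with an illustrative hierarchy and paths; each team's data file is encrypted to that team's key, so sharing one repository does not mean sharing secrets:

    # hiera.yaml on a team's puppet master (layout illustrative)
    :backends:
      - yaml
      - gpg
    :hierarchy:
      - "teams/%{team}"    # team-specific, GPG-encrypted data
      - common
    :gpg:
      :key_dir: /etc/puppet/gpg   # holds only this team's private key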
19. Change Control, Dev Cycle
• A core team maintains the OS and the basics:
  – Hardware monitoring, NTP configuration, accounts, ...
• Specialized teams maintain the services on top:
  – They are ultimately responsible for service stability
  – We don't want NTP configured 150 different ways
• Requirements:
  – Some services will follow core updates
  – Some services will choose when to take core updates
  – Parts of services may follow the latest updates
  – The LHC has physical shutdowns for doing timely updates
20. Change Control, Dev Cycle
• Puppet environments map to git branches (see the sketch below):
  – Nodes sit in the production, testing and devel branches
  – Big new configurations are tested in feature branches
    • A few nodes sit in these feature branches
  – Some services live isolated in their own branch
    • Risk of divergence
• Current process:
  – A blind weekly devel -> production merge
• Next process:
  – Use Atlassian's Crucible and FishEye products to code-review Puppet configuration
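A sketch of the branch-to-environment mapping as configured in Puppet 2.7/3.0-era puppet.conf, with illustrative paths; each section points at a checkout of the matching git branch:

    # puppet.conf on the master – one section per git branch
    [production]
    modulepath = /etc/puppet/environments/production/modules
    manifest   = /etc/puppet/environments/production/manifests/site.pp

    [devel]
    modulepath = /etc/puppet/environments/devel/modules
    manifest   = /etc/puppet/environments/devel/manifests/site.pp

    # a node opts into a branch in its own agent config
    [agent]
    environment = devel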
21. Crucible Reviewing Manifests
• Atlassian themselves use Puppet and do this
  – http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/
22. Hardware Provisioning
• Up to now, a homegrown tool has been in use:
  – It has strong similarities to Puppet Labs' new Razor
    • Razor is being followed and tracked for the moment
  – The final step of the tool adds the host to Foreman
• We are using Foreman and are happy with it:
  – Kickstart templating is great
  – Organising hosts into hostgroups is great
  – We will now invest time to integrate Foreman with CERN services:
    • The CERN network database, our master for switches, DNS, ...
    • AIMS, a Kerberos-managed TFTP server
    • The CERN CA – we have our own CA, also used by other services
      – We will use this for Puppet too
23. Virtual Machine Provisioning
• Existing Microsoft Hyper-V infrastructure:
  – 3000 virtual machines, of which 70 are Puppet managed
  – VMs pre-seeded into a Foreman hostgroup
  – VMs being kickstarted onto Puppet and Foreman
• Puppet-managed OpenStack Nova:
  – Today aiming at 200 hypervisors with up to 4000 Puppet-managed VMs
  – Machine images created with Oz
  – Machines NOT pre-seeded in Foreman or Puppet
    • They register at boot time
  – amiconfig and cloud-init for contextualizing
    • Pass the Puppet server and Foreman hostgroup to the image (see the sketch below)
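As a sketch of the contextualization step, cloud-init's built-in puppet handler can point the agent at a master from user-data; the server name is hypothetical, and the Foreman hostgroup hand-off done alongside this is not shown in the slides:

    #cloud-config
    # illustrative user-data passed to the VM at boot
    puppet:
      conf:
        agent:
          server: "puppet.cern.ch"   # hypothetical master name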
24. Next Steps till End of Year
• Migrate to PuppetDB
  – (300,000 nodes => 300 GB RAM)
• Look at Puppet Dashboard
• Use MCollective for something:
  – Necessary as the node count increases
  – Currently set up but not being used much
• Check Foreman's integration with OpenStack
• Migrate more services from Quattor to Puppet
• Decide on a scheme for secure blob delivery (see the sketch below):
  – hiera-gpg or an ACL'ed Puppet fileserver
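A sketch of the fileserver option in the fileserver.conf syntax of the time, with an illustrative mount; %H expands to the client's certname, so each node can only fetch its own files:

    # fileserver.conf on the master (mount and ACL illustrative)
    [secrets]
    path /etc/puppet/private/%H
    allow *.cern.ch

    # consumed from a manifest as:
    #   file { '/etc/hostkey.pem':
    #     source => 'puppet:///secrets/hostkey.pem',
    #   }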
25. Conclusions
• Migrating to Puppet
  – The largest change in our deployment in 5 years
• It has all been fairly painless. Difficulties:
  – Sometimes forced to integrate with existing infrastructure
  – Doing things wrong the first time
    • Lack of in-house experience
• 300,000 VMs in 2015?
  – Puppet is easy to scale; more hardware can be added
  – We expect to dedicate up to 100 cores to Puppet
• It's a joy to work with an active community