Puppet Camp CERN Geneva
1. A Puppet Infrastructure at CERN
Steve Traylen
CERN IT Department
steve.traylen@cern.ch
Puppet Camp, Geneva, CH.
11 July 2012
2. Outline
• CERN and Computing for High Energy Physics
• Today's CERN IT Deployment
  – Why and what's changing
• Adoption of Puppet, Foreman, …
  – Progress, Integration
  – Difficulties
  – Future
3. CERN
• Conseil Européen pour la Recherche Nucléaire
• aka European Laboratory for Particle Physics
• Facilities for fundamental research
• Between Geneva and the Jura mountains, straddling the Swiss-French border
• Founded in 1954
4. The Large Hadron Collider
• Accelerator for protons against protons – 14 TeV collision energy
• By far the world's most powerful accelerator
• Tunnel of 27 km circumference, 4 m diameter, 50–150 m below ground
• Detectors at four collision points
5. The LHC Computing Challenge
• Data volume
  – 15 PetaBytes of new data each year
• Global compute power
  – 250k CPU cores
  – 100 PB of disk storage
• Worldwide analysis & funding
  – Distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
  – Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
  – Distributed for funding and sociological reasons
6. Motivation to Change Tools
• CERN data centre is reaching its limits:
  – IT staff numbers remain fixed
  – More computing capacity is needed
• Inefficiencies exist, but the root causes cannot be easily identified
  – Tools are becoming increasingly brittle and difficult to adapt
    • E.g. porting the tools to IPv6 would need a development project
  – Some core components cannot be scaled up
7. Second CERN Data Centre
• Wigner Institute in Budapest, Hungary
• Hands-off facility, hardware support only
• Deploying 2012 to 2014
8. Infrastructure Tools Evolution
• We had to develop our own toolset in 2002
  – "Extremely Large Fabric Management System" or http://cern.ch/ELFms
  – Included Quattor for configuration
• Nowadays,
  – CERN compute capacity is no longer leading edge
  – Many options available for open source fabric management
  – We need to scale to meet the upcoming capacity increase
• If there is a requirement which is not available through an open source tool, we should question the need
  – If we are the first to need it, contribute it back to the open source tool
9. Infrastructure as a Service
• Goals
  – Improve repair processes with virtualisation
  – More efficient use of our hardware
  – Better tracking of usage
  – Enable remote management for the new data centre
  – Support potential new use cases, e.g. cloud
  – Sustainable support model
• At scale for 2015
  – 15,000 servers
  – 90% of hardware virtualized
  – 300,000 VMs needed
• Plan = OpenStack adoption
10. Chose Puppet for Configuration
• The tool space has exploded in the last few years
  – In configuration management and ops
  – Large, shared 'tool forges', and lots of experience
• Puppet and Chef are the clear leaders for the 'core' tool
• Many large-scale enterprises use Puppet
  – Its declarative approach fits better with what we are used to in Quattor
  – Large installations: a friendly, broad community, plus commercial support and training
  – You can buy books on it
  – You can employ people who know Puppet better than you do
12. Starting with Puppet
• Puppet was and is trivial to set up:
  – Anyone can do it in a day
  – Configuring something with Puppet is easy
• What's hard:
  – Deciding module scope and how modules interact with one another
    • Three modules editing grub.conf, or one? (see the sketch below)
  – We started in early 2012 with very little plan in the area of module organization
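One way to keep grub.conf under a single owner is sketched below, assuming the stock Augeas Grub lens is available on the node; the module name, defined type and tree paths are illustrative, not our actual module:

    # Hypothetical grub module: the one place grub.conf is edited.
    # Other modules declare intent instead of templating the file.
    define grub::kernel_arg ($value) {
      augeas { "grub_kernel_arg_${name}":
        incl    => '/boot/grub/grub.conf',
        lens    => 'Grub.lns',
        context => '/files/boot/grub/grub.conf',
        changes => "set title[1]/kernel/${name} ${value}",
      }
    }

    # e.g. a batch-system module asks for a setting without
    # knowing how grub.conf is laid out:
    grub::kernel_arg { 'elevator': value => 'deadline' }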
13. Downloading Puppet Modules
• Expectation at the start – all done for us:
  – ssh, iptables, sysctl, apache, mysql all done
  – example42 or similar can do everything
• Reality
  – Modules are often not quite correct:
    • Too simple
      – e.g. I want my sshd_config to be different in two places
    • Too much abstraction
      – I want to use Puppet, not some abstraction with 100s of variables covering every possible case
        » e.g. puppet with(out) passenger – I only want one
  – Parameterized classes and Foreman don't really work together
    • The resulting modules are not shareable – ENC globals vs parameters (see the sketch below)
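As a sketch of why the two don't mix, here are the two styles for a hypothetical ssh module (alternatives, not code meant to coexist); a Foreman-style ENC of this era can hand a node class names and top-scope globals, but not class parameters:

    # Style A: parameterized class – data arrives at declaration time.
    class ssh ($permit_root = 'no') {
      file { '/etc/ssh/sshd_config':
        content => template('ssh/sshd_config.erb'),  # uses @permit_root
      }
    }
    class { 'ssh': permit_root => 'yes' }

    # Style B: ENC globals – the class reads top-scope variables
    # that the ENC set, and is declared with a bare include.
    class ssh {
      $permit_root = $::ssh_permit_root   # global supplied by Foreman
      file { '/etc/ssh/sshd_config':
        content => template('ssh/sshd_config.erb'),
      }
    }
    include ssh

A module written in style A cannot be assigned by an ENC that only speaks style B, which is exactly what makes such modules hard to share.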
14. Sharing and Fixing Modules
• Not as easy as it should be:
  – Our modules are littered with CERNisms
    • NTP servers, subnets, authorization systems, ...
    • Adaptations to work with Foreman
  – All of us learning Puppet and doing things quickly (badly)
• Hiera is being used now:
  – Provides the code vs data separation we had with Quattor
  – Dozens of ways to set up and (ab)use Hiera
  – Little experience with this anywhere yet
  – Hiera should make modules more shareable across sites (see the sketch below)
    • Looking forward to it becoming the normal, standard thing that modules use, so everyone benefits
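A sketch of that separation for a hypothetical ntp module, using the hiera() function from hiera-puppet (current at the time); the lookup key and server names are illustrative:

    # Site data lives in Hiera's YAML files, e.g.:
    #   ntp::servers:
    #     - ntp1.cern.ch     # hypothetical site values
    #     - ntp2.cern.ch
    class ntp {
      # the CERNism leaves the module; other sites supply their
      # own data, or fall back to the default below
      $servers = hiera('ntp::servers', ['pool.ntp.org'])

      file { '/etc/ntp.conf':
        content => template('ntp/ntp.conf.erb'),  # iterates over @servers
      }
    }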
15. Sharing Modules With All
• A big aim is to share our modules as much as possible with everyone, but in particular:
  – CERN IT is not the only Puppet deployment at CERN
    • The ATLAS Point 1 farm at CERN runs Puppet
    • ATLAS analysis in the cloud has used Puppet
  – International HEP labs use or are switching to Puppet
  – Puppet was the "winner" at the recent CHEP fabric session
    • Presentations from CERN, BNL, PIC, ATLAS
• We will share here, but it's early days:
  – http://github.com/cernops
16. Organizing Modules On Disk
• Started with all modules in one directory in git:
  – Obviously wrong; great confusion for newcomers
• Current situation: two directories in git:
  – Modules – reusable items, e.g. firewall, apache, sysctl, ...
  – Manifests – top-level services, e.g. batch machine, public login machine
• Future plans:
  – Split modules into local and downloaded (see the sketch below)
    • Modules like puppetlabs-firewall are mixed in with our own junk
    • Will allow us to track / contribute to upstream better
  – In line with Puppet's upcoming vendor path
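A sketch of the split in a puppet.conf of the era, with illustrative paths; modulepath takes a colon-separated list, so unmodified upstream clones can sit beside our own modules and be tracked against their origin:

    # puppet.conf on the master (paths illustrative)
    [master]
    # local CERN modules first, clean upstream clones second
    modulepath = /etc/puppet/modules/local:/etc/puppet/modules/upstream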
17. Configuration Complexity
• 150 clusters, ranging from 1 to 3000 hosts
• We have many configurations of service
  – Puppet handles this diversity well
• We have many administrators: >= 300
  – These admins change, and are on different continents
  – It is less obvious what to do about this with Puppet
18. Trust Amongst SysAdmins
• All teams share one git repository
  – Rely on code review, git branches and environments
• Teams use their own Puppet masters
  – Puppet master(s) for SysAdmin Team A serve Team A's nodes; likewise for Team B
  – A hiera-gpg key for each team (see the sketch below)
  – Host ACLs on the Puppet masters
• The full implications of this lack of trust between admins are unclear
  – Interested to hear what others have done
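A sketch of the per-team secrets piece, assuming the hiera-gpg backend with an illustrative hierarchy and paths; each team's data file is encrypted to that team's key, so sharing one repository does not mean sharing secrets:

    # hiera.yaml on a team's puppet master (layout illustrative)
    :backends:
      - yaml
      - gpg
    :hierarchy:
      - "teams/%{team}"    # team-specific, GPG-encrypted data
      - common
    :gpg:
      :key_dir: /etc/puppet/gpg   # holds only this team's private key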
19. Change Control, Dev Cycle
• A core team maintains the OS and the basics:
  – Hardware monitoring, NTP configuration, accounts, ...
• Specialized teams maintain the services on top:
  – They are ultimately responsible for service stability
  – We don't want NTP configured 150 different ways
• Requirements:
  – Some services will follow core updates
  – Some services will choose when to take core updates
  – Parts of services may follow the latest updates
  – The LHC has physical shutdowns for doing timely updates
20. Change Control, Dev Cycle
• Puppet environments map to git branches (see the sketch below):
  – Nodes sit in the production, testing and devel branches
  – Big new configurations are tested in feature branches
    • A few nodes sit in these feature branches
  – Some services live isolated in their own branch
    • Risk of divergence
• Current process:
  – A blind weekly devel -> production merge
• Next process:
  – Use Atlassian's Crucible and FishEye products to code-review Puppet configuration
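A sketch of the branch-to-environment mapping as configured in Puppet 2.7/3.0-era puppet.conf, with illustrative paths; each section points at a checkout of the matching git branch:

    # puppet.conf on the master – one section per git branch
    [production]
    modulepath = /etc/puppet/environments/production/modules
    manifest   = /etc/puppet/environments/production/manifests/site.pp

    [devel]
    modulepath = /etc/puppet/environments/devel/modules
    manifest   = /etc/puppet/environments/devel/manifests/site.pp

    # a node opts into a branch in its own agent config
    [agent]
    environment = devel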
21. Crucible Reviewing Manifests
• Atlassian themselves use Puppet and do this
  – http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/
22. Hardware Provisioning
• Up to now, a homegrown tool has been in use:
  – It has strong similarities to Puppet Labs' new Razor
    • Razor is being followed and tracked for the moment
  – The final step of the tool adds the host to Foreman
• We are using Foreman and are happy with it:
  – Kickstart templating is great
  – Organising hosts into hostgroups is great
  – We will now invest time to integrate Foreman with CERN services:
    • The CERN network database, our master for switches, DNS, ...
    • AIMS, a Kerberos-managed TFTP server
    • The CERN CA – we have our own CA, also used by other services
      – We will use this for Puppet too
23. Virtual Machine Provisioning
• Existing Microsoft Hyper-V infrastructure:
  – 3000 virtual machines, of which 70 are Puppet managed
  – VMs pre-seeded into a Foreman hostgroup
  – VMs being kickstarted onto Puppet and Foreman
• Puppet-managed OpenStack Nova:
  – Today aiming at 200 hypervisors with up to 4000 Puppet-managed VMs
  – Machine images created with Oz
  – Machines NOT pre-seeded in Foreman or Puppet
    • They register at boot time
  – amiconfig and cloud-init for contextualizing
    • Pass the Puppet server and Foreman hostgroup to the image (see the sketch below)
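As a sketch of the contextualization step, cloud-init's built-in puppet handler can point the agent at a master from user-data; the server name is hypothetical, and the Foreman hostgroup hand-off done alongside this is not shown in the slides:

    #cloud-config
    # illustrative user-data passed to the VM at boot
    puppet:
      conf:
        agent:
          server: "puppet.cern.ch"   # hypothetical master name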
24. Next Steps till End of Year
• Migrate to PuppetDB
  – (300,000 nodes => 300 GB RAM)
• Look at Puppet Dashboard
• Use MCollective for something:
  – Necessary as the node count increases
  – Currently set up but not being used much
• Check Foreman's integration with OpenStack
• Migrate more services from Quattor to Puppet
• Decide on a scheme for secure blob delivery (see the sketch below):
  – hiera-gpg or an ACL'ed Puppet fileserver
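A sketch of the fileserver option in the fileserver.conf syntax of the time, with an illustrative mount; %H expands to the client's certname, so each node can only fetch its own files:

    # fileserver.conf on the master (mount and ACL illustrative)
    [secrets]
    path /etc/puppet/private/%H
    allow *.cern.ch

    # consumed from a manifest as:
    #   file { '/etc/hostkey.pem':
    #     source => 'puppet:///secrets/hostkey.pem',
    #   }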
25. Conclusions
• Migrating to Puppet
  – The largest change in our deployment in 5 years
• It has all been fairly painless. Difficulties:
  – Sometimes forced to integrate with existing infrastructure
  – Doing things wrong the first time
    • Lack of in-house experience
• 300,000 VMs in 2015?
  – Puppet is easy to scale; more hardware can be added
  – We expect to dedicate up to 100 cores to Puppet
• It's a joy to work with an active community