A Puppet Infrastructure at CERN

            Steve Traylen
        CERN IT Department
        steve.traylen@cern.ch

      Puppet Camp, Geneva, CH.
             11 July 2012
Outline

•  CERN and Computing for High Energy
   Physics
•  Today’s CERN IT Deployment
  –  Why and What’s changing
•  Adoption of Puppet, Foreman, …
  –  Progress, Integration
  –  Difficulties
  –  Future




                               Puppet Camp Geneva - CERN
CERN
§  Conseil Européen
  pour la Recherche
  Nucléaire
   §  aka European
       Laboratory for
       Particle Physics
   §  Facilities for
       fundamental research
§  Between Geneva and
    the Jura mountains,
    straddling the Swiss-
    French border
§  Founded in 1954
The Large Hadron Collider
§  Accelerator for
  protons against
  protons – 14 TeV
  collision energy
   §  By far the world’s
       most powerful
       accelerator
§  Tunnel of 27 km
    circumference, 4 m
    diameter, 50…150 m
    below ground
§  Detectors at four
    collision points
The LHC Computing Challenge

•  Data volume
   –  15 petabytes of new data each year
•  Global compute power
   –  250k CPU cores
   –  100 PB of disk storage
•  Worldwide analysis & funding
   –  Distributed computing infrastructure to provide the production and
      analysis environments for the LHC experiments
   –  Managed and operated by a worldwide collaboration between the
      experiments and the participating computer centres
   –  Distributed for funding and sociological reasons
Motivation to Change Tools

•  CERN data centre is reaching its limits:
   –  IT staff numbers remain fixed
   –  more computing capacity is needed
•  Inefficiencies exist but root cause cannot be easily
   identified
   –  Tools becoming increasingly brittle and difficult to adapt
      •  E.g. porting the tools to IPv6 would need a development project
   –  Some core components cannot be scaled up




Second CERN Data Centre




•  Wigner Institute in Budapest, Hungary
•  Hands-off facility, hardware support only
•  Deploying 2012 to 2014
Infrastructure Tools Evolution

•  We had to develop our own toolset in 2002
   –  “Extremely Large Fabric Management System” or http://cern.ch/ELFms
   –  Included Quattor for configuration


•  Nowadays,
   –  CERN compute capacity is no longer leading edge
   –  Many options available for open source fabric management
   –  We need to scale to meet the upcoming capacity increase
•  If there is a requirement which is not available through an
   open source tool, we should question the need
   –  If we are the first to need it, contribute it back to the open source tool




Infrastructure as a Service
•  Goals
   –  Improve repair processes with virtualisation
   –  More efficient use of our hardware
   –  Better tracking of usage
   –  Enable remote management for new data centre
   –  Support potential new use cases, e.g. cloud
   –  Sustainable support model
•  At scale for 2015
   –  15,000 servers
   –  90% of hardware virtualized
   –  300,000 VMs needed
•  Plan = OpenStack Adoption
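
A rough back-of-the-envelope check of the 2015 targets above, sketched in Python. It assumes (the slide does not say this) that all 300,000 VMs would run on the virtualized 90% of the 15,000 servers; the function name is illustrative only.

```python
def vms_per_hypervisor(servers: int, virtualized_fraction: float, vms: int) -> float:
    """Average number of VMs each virtualized server would have to host."""
    hypervisors = servers * virtualized_fraction
    return vms / hypervisors


# Figures taken from the slide: 15,000 servers, 90% virtualized, 300,000 VMs.
density = vms_per_hypervisor(15_000, 0.90, 300_000)
print(f"{density:.1f} VMs per hypervisor")
```

Under that assumption the target works out to a little over 22 VMs per virtualized server.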
Chose Puppet for Configuration
•  The tool space has exploded in the last few years
    –  In configuration management and ops
    –  Large, shared ‘tool forges’, and lots of experience
•  Puppet and Chef are the clear leaders for the ‘core’ tool
•  Many large-scale enterprises use Puppet
    –  Its declarative approach fits better with what we are used to in Quattor.
    –  Large installations: friendly, wide-base community and commercial support
       and training
    –  You can buy books on it
    –  You can employ people who know puppet better than you do




Deployed System
Starting with Puppet

•  Puppet was and is trivial to set up:
   –  Anyone can do it in a day
•  Configuring something with puppet is easy
•  What’s hard:
   –  Deciding module scope and how modules interact with one another
      •  Three modules editing grub.conf, or one?
   –  We started in early 2012 with very little planning around
      module organization




Downloading Puppet Modules
•  Expectation at start – all done for us:
   –  ssh, iptables, sysctl, apache, mysql all done
   –  example42 or similar can do everything.
•  Reality
   –  Modules often not quite correct.
      •  Too simple,
           –  e.g. I want my sshd_config to be different in two places.
      •  Too much abstraction
           –  I want to use puppet and not some abstraction of 100s of
              variables covering every possible case
                »  e.g. puppet with(out) passenger. I only want one
   –  Parameterized classes and Foreman don’t really work
      •  Resulting modules are not shareable – ENC globals vs params

Sharing and Fixing Modules
•  Not as easy as it should be:
   –  Our modules are littered with CERNisms
      •  ntpservers, subnets, authorization systems, ..
      •  Adaptation to work with foreman
      •  All of us learning puppet and doing things quickly (badly)
•  Hiera is being used now:
   –  Provides the code vs data separation we had with
      Quattor
   –  Dozens of ways to set up and (ab)use hiera
   –  Little experience with this anywhere yet
   –  Hiera should make modules more sharable across sites
      •  Looking forward to it becoming the standard thing that
         modules use and everyone benefits from
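
The code versus data separation mentioned above can be illustrated with a toy hierarchical lookup in Python. This is only a sketch of the idea, not Hiera's implementation: the hierarchy level names, keys, host names and values are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical site data, keyed by hierarchy level. None of these names
# or values come from the talk; they only illustrate the idea.
DATA: Dict[str, Dict[str, Any]] = {
    "host/login01": {"sshd::permit_root_login": "no"},
    "cluster/batch": {"ntp::servers": ["ntp-batch.example.org"]},
    "common": {
        "ntp::servers": ["ntp1.example.org", "ntp2.example.org"],
        "sshd::permit_root_login": "without-password",
    },
}


def lookup(key: str, hierarchy: List[str], default: Any = None) -> Any:
    """Return the value from the first (most specific) level defining key."""
    for level in hierarchy:
        level_data = DATA.get(level, {})
        if key in level_data:
            return level_data[key]
    return default


# Each node searches its own levels, most specific first.
batch_node = ["host/batch0001", "cluster/batch", "common"]
login_node = ["host/login01", "cluster/login", "common"]

print(lookup("ntp::servers", batch_node))
print(lookup("ntp::servers", login_node))
print(lookup("sshd::permit_root_login", login_node))
```

The module code only asks for a key; site-specific values such as NTP servers live in the data, so a setting can differ in two places without editing the module.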

Sharing Modules With All

•  A big aim is to share our modules as much as
   possible with everyone but in particular:
   –  CERN IT is not the only puppet deployment at CERN
      •  ATLAS Point 1 farm at CERN runs puppet
   –  ATLAS analysis in the cloud has used puppet
   –  International HEP Labs use or are switching to puppet
   –  Puppet was the “winner” at recent CHEP fabric session
      •  Presentations from CERN, BNL, PIC, ATLAS
•  We will share here but it's early days:
   –  http://github.com/cernops



Organizing Modules On Disk

•  Started with all modules in one directory in git:
   –  Obviously wrong, great confusion for newcomers
•  Current situation: two directories in git:
   –  Modules – reusable items, e.g. firewall, apache, sysctl, ..
   –  Manifests – top-level services, e.g. batch machine, public
      login machine
•  Future plans:
   –  Split up modules into local and downloaded
      •  modules like puppetlabs-firewall mixed with our own junk
      •  Will allow us to track /contribute to upstream better
   –  In line with puppet’s upcoming vendor path


Configuration Complexity

150 clusters, ranging from 1 to 3000 hosts.




•  We have many configurations of service.
  –  Puppet handles this diversity well
•  We have many administrators (>= 300)
  –  These admins change, are on different continents
  –  Less obvious what to do with Puppet
Trust Amongst SysAdmins
•  All share one git repository
   –  Rely on code review
   –  git branches and environments
•  Teams use their own puppet masters
•  hiera-gpg key for each team
•  Host ACL on puppet masters

[Diagram: Git Repository; Puppet Master(s) for SysAdmin Team A with Team A’s Nodes; Puppet Master(s) for SysAdmin Team B with Team B’s Nodes]


•  The full implications of this lack of trust between
   admins are unclear
   –  Interested to hear what others have done.
Change Control, Dev Cycle

•  Core team maintaining OS and basics:
  –  Hardware monitoring, ntp configuration, accounts, ..
•  Specialized teams maintaining services on top:
  –  They are ultimately responsible for service stability
  –  We don’t want NTP configured 150 different ways
•  Requirements:
  –  Some services will follow core updates
  –  Some services will choose when to take core updates
  –  Parts of services may follow latest updates
  –  LHC has physical shutdowns for doing timely updates



Change Control, Dev Cycle

•  Puppet Environments map to Git Branches:
  –  Nodes in Production, Testing and Devel branches
  –  Big new configurations being tested in feature branches
     •  A few nodes in these feature branches
  –  Some services live isolated in their own branch
     •  Risk of divergence
•  Current process:
  –  A blind weekly devel -> production merge
•  Next process:
  –  Use Atlassian’s Crucible and Fisheye products to code
     review puppet configuration
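
The branch-to-environment mapping above can be sketched in Python as follows. This is a hypothetical helper, not the tooling used at CERN: it assumes environment names should contain only letters, digits and underscores, so other characters in a branch name are replaced, and the feature branch name is made up for illustration.

```python
import re
from typing import Dict, Iterable


def branch_to_environment(branch: str) -> str:
    """Replace characters assumed to be invalid in an environment name."""
    return re.sub(r"[^A-Za-z0-9_]", "_", branch)


def environments_for(branches: Iterable[str]) -> Dict[str, str]:
    """Return {environment: branch}, refusing ambiguous mappings."""
    envs: Dict[str, str] = {}
    for branch in branches:
        env = branch_to_environment(branch)
        if env in envs:
            raise ValueError(f"{branch!r} and {envs[env]!r} both map to {env!r}")
        envs[env] = branch
    return envs


# Branch names from the slide plus one hypothetical feature branch.
envs = environments_for(
    ["production", "testing", "devel", "feature/openstack-nova"]
)
for env, branch in envs.items():
    print(f"{env} <- {branch}")
```

The collision check matters because two distinct branches (say `a-b` and `a_b`) would otherwise silently share one environment.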


Crucible Reviewing Manifest




•  Atlassian themselves use puppet and do this
  –  http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/




Hardware Provisioning

•  Up to now a homegrown tool in use:
  –  Has strong similarities to Puppet Labs’ new Razor
     •  Razor is being tracked for the moment
  –  Final step of tool adds host to foreman
•  We are using foreman – happy with it:
  –  Kickstart templating is great
  –  Organising hosts into hostgroups is great
  –  We will now invest time to integrate foreman with CERN
     services:
     •  CERN network database, our master for switches, DNS, …
     •  AIMS kerberos managed tftp server
     •  CERN CA – We have our own CA used by other services also
          –  We will use this for puppet also
Virtual Machine Provisioning
•  Existing Microsoft Hyper-V infrastructure:
   –  3000 virtual machines, of which 70 are puppet managed
   –  VMs pre-seeded into a foreman hostgroup
   –  VMs being kickstarted onto puppet and foreman
•  Puppet managed OpenStack Nova
   –  Today aiming at 200 hypervisors with up to 4000 puppet
      managed VMs.
   –  Machine Images created with Oz
   –  Machines NOT pre-seeded in foreman or puppet
      •  Register at boot time
   –  amiconfig and cloud-init for contextualizing
      •  pass puppet server and foreman hostgroup to image
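
Contextualization as described above amounts to handing the image a small user-data payload at creation time and reading it back at boot. A minimal Python sketch of that round trip follows; the INI-style section and key names, the server name and the hostgroup are all hypothetical and are not claimed to be the format that amiconfig, cloud-init or CERN actually use.

```python
import configparser
import io
from typing import Dict

SECTION = "contextualization"  # hypothetical section name


def render_user_data(puppet_server: str, hostgroup: str) -> str:
    """Serialize the two values the image needs into an INI-style payload."""
    cfg = configparser.ConfigParser()
    cfg[SECTION] = {
        "puppet_server": puppet_server,
        "foreman_hostgroup": hostgroup,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def parse_user_data(text: str) -> Dict[str, str]:
    """What a boot-time script might do: read the payload back."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return dict(cfg[SECTION])


user_data = render_user_data("puppet.example.org", "batch/worker")
settings = parse_user_data(user_data)
print(settings["puppet_server"], settings["foreman_hostgroup"])
```

Because the node is not pre-seeded anywhere, these two values are all it needs in order to register itself with the right master and hostgroup at boot.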

Next Steps till End of Year

•  Migrate to PuppetDB
  –  (300,000 nodes => 300 GB RAM)
•  Look at puppet dashboard
•  Use mcollective for something:
  –  Necessary as node number increases
  –  Currently set up but not being used particularly
•  Check Foreman’s integration with OpenStack
•  Migrate more services from Quattor to Puppet
•  Decide a scheme for secure blob delivery:
  –  hiera-gpg or ACL’ed puppet fileserver
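
The PuppetDB figure above (300,000 nodes => 300 GB RAM) works out to roughly 1 MB of RAM per node, taking 1 GB as 1000 MB. The Python sketch below makes that arithmetic explicit and, purely as an assumption, extrapolates linearly to other node counts; it is not a PuppetDB sizing rule.

```python
def ram_per_node_mb(total_gb: float, nodes: int) -> float:
    """Per-node memory implied by a total estimate (1 GB = 1000 MB)."""
    return total_gb * 1000 / nodes


def estimated_ram_gb(nodes: int, mb_per_node: float) -> float:
    """Linear extrapolation; assumes memory scales with node count."""
    return nodes * mb_per_node / 1000


# Figures from the slide: 300 GB for 300,000 nodes.
per_node = ram_per_node_mb(300, 300_000)
print(f"{per_node} MB per node")

# The same rate applied to the 15,000 physical servers planned for 2015.
print(f"{estimated_ram_gb(15_000, per_node)} GB for 15,000 nodes")
```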


Conclusions

•  Migrating to Puppet
   –  Largest change in our deployment for 5 years
•  Has all been fairly painless. Difficulties:
   –  Sometimes forced to integrate with existing stuff
   –  Doing things wrong first time
      •  lack of in house experience
•  300,000 VMs in 2015?
   –  puppet easy to scale, more hardware can be added
   –  We expect to dedicate up to 100 of cores to puppet
•  It’s a joy to work with an active community



 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Puppet Camp CERN Geneva

  • 1. A Puppet Infrastructure at CERN
      Steve Traylen, CERN IT Department, steve.traylen@cern.ch
      Puppet Camp, Geneva, CH. 11 July 2012
  • 2. Outline
      • CERN and Computing for High Energy Physics
      • Today’s CERN IT Deployment
          – Why and what’s changing
      • Adoption of Puppet, Foreman, …
          – Progress, Integration
          – Difficulties
          – Future
  • 3. CERN
      • Conseil Européen pour la Recherche Nucléaire
          – aka European Laboratory for Particle Physics
          – Facilities for fundamental research
      • Between Geneva and the Jura mountains, straddling the Swiss-French border
      • Founded in 1954
  • 4. The Large Hadron Collider
      • Accelerator colliding protons against protons at 14 TeV collision energy
          – By far the world’s most powerful accelerator
      • Tunnel of 27 km circumference, 4 m diameter, 50 to 150 m below ground
      • Detectors at four collision points
  • 5. The LHC Computing Challenge
      • Data volume
          – 15 petabytes of new data each year
      • Global compute power
          – 250k CPU cores
          – 100 PB of disk storage
      • Worldwide analysis & funding
          – Distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
          – Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
          – Distributed for funding and sociological reasons
  • 6. Motivation to Change Tools
      • CERN data centre is reaching its limits:
          – IT staff numbers remain fixed
          – More computing capacity is needed
      • Inefficiencies exist but the root cause cannot be easily identified
          – Tools becoming increasingly brittle and difficult to adapt
              • E.g. porting of tools to IPv6 would need a development project
          – Some core components cannot be scaled up
  • 7. Second CERN Data Centre
      • Wigner Institute in Budapest, Hungary
      • Hands-off facility, hardware support only
      • Deploying 2012 to 2014
  • 8. Infrastructure Tools Evolution
      • We had to develop our own toolset in 2002
          – “Extremely Large Fabric Management System” or http://cern.ch/ELFms
          – Included Quattor for configuration
      • Nowadays:
          – CERN compute capacity is no longer leading edge
          – Many options available for open source fabric management
          – We need to scale to meet the upcoming capacity increase
      • If there is a requirement which is not available through an open source tool, we should question the need
          – If we are the first to need it, contribute it back to the open source tool
  • 9. Infrastructure as a Service
      • Goals
          – Improve repair processes with virtualisation
          – More efficient use of our hardware
          – Better tracking of usage
          – Enable remote management for new data centre
          – Support potential new use cases, e.g. cloud
          – Sustainable support model
      • At scale for 2015
          – 15,000 servers
          – 90% of hardware virtualized
          – 300,000 VMs needed
      • Plan = OpenStack adoption
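The 2015 targets on this slide imply an average VM density per hypervisor. The sketch below works that out in Python. It assumes that "90% of hardware virtualized" means 90% of the 15,000 servers act as hypervisors and together host all 300,000 VMs; the slide does not state this, and the function name is illustrative only.

```python
# Figures taken from the slide; how they combine is an assumption.
SERVERS = 15_000
VIRTUALISED_FRACTION = 0.90
TARGET_VMS = 300_000


def vms_per_hypervisor(servers, virtualised_fraction, vms):
    """Average VMs per hypervisor if the virtualised servers host all VMs."""
    hypervisors = servers * virtualised_fraction
    return vms / hypervisors


if __name__ == "__main__":
    density = vms_per_hypervisor(SERVERS, VIRTUALISED_FRACTION, TARGET_VMS)
    print(f"about {density:.1f} VMs per hypervisor")
```

Under that reading, each hypervisor would carry roughly 22 VMs on average.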
  • 10. Chose Puppet for Configuration
      • The tool space has exploded in the last few years
          – In configuration management and ops
          – Large, shared ‘tool forges’, and lots of experience
      • Puppet and Chef are the clear leaders for the ‘core’ tool
      • Many large-scale enterprises use Puppet
          – Its declarative approach fits better with what we are used to in Quattor
          – Large installations: friendly, wide-base community and commercial support and training
          – You can buy books on it
          – You can employ people who know Puppet better than you do
  • 12. Starting with Puppet
      • Puppet was and is trivial to set up:
          – Anyone can do it in a day
              • Configuring something with Puppet is easy
      • What’s hard:
          – Deciding module scope and how modules interact with one another
              • Three modules editing grub.conf, or one?
          – We started in early 2012 with very little of a plan for module organization
  • 13. Downloading Puppet Modules
      • Expectation at start: everything is already done for us
          – ssh, iptables, sysctl, apache, mysql all done
          – example42 or similar can do everything
      • Reality
          – Modules often not quite correct
              • Too simple
                  – e.g. I want my sshd_config to be different in two places
              • Too much abstraction
                  – I want to use Puppet and not some abstraction of 100s of variables covering every possible case
                      » e.g. Puppet with(out) Passenger; I only want one
          – Parameterized classes and Foreman don’t really work together
              • Resulting modules are not shareable: ENC globals vs params
  • 14. Sharing and Fixing Modules
      • Not as easy as it should be:
          – Our modules are littered with CERNisms
              • ntpservers, subnets, authorization systems, ..
              • Adaptation to work with Foreman
              • All of us learning Puppet and doing things quickly (badly)
      • Hiera is being used now:
          – Provides the code vs data separation we had with Quattor
          – Dozens of ways to set up and (ab)use Hiera
          – Little experience with this anywhere yet
          – Hiera should make modules more shareable across sites
              • Looking forward to it becoming the normal, standard thing that modules use and everyone benefits from
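The code vs data separation mentioned on this slide can be illustrated with a small Python sketch of a hierarchical lookup in which the most specific level that defines a key wins. This is a toy model, not Hiera's implementation: the level names, the `ntp_servers` key and the example.org hostnames are all hypothetical.

```python
def lookup(key, hierarchy, data, default=None):
    """Return the value from the first hierarchy level that defines key."""
    for level in hierarchy:
        values = data.get(level, {})
        if key in values:
            return values[key]
    return default


def hierarchy_for(site):
    """Most specific level first, generic fallback last (hypothetical layout)."""
    return [f"site/{site}", "common"]


# Hypothetical data: site-specific values live here, not in module code.
DATA = {
    "common": {"ntp_servers": ["pool.ntp.org"]},
    "site/cern": {"ntp_servers": ["ntp1.example.org", "ntp2.example.org"]},
}

if __name__ == "__main__":
    print(lookup("ntp_servers", hierarchy_for("cern"), DATA))
    print(lookup("ntp_servers", hierarchy_for("elsewhere"), DATA))
```

A module written against such a lookup carries no site-specific values itself, which is what would let another site reuse it by supplying only its own data.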
  • 15. Sharing Modules With All
      • A big aim is to share our modules as much as possible with everyone, but in particular:
          – CERN IT is not the only Puppet deployment at CERN
              • ATLAS Point 1 farm at CERN runs Puppet
          – ATLAS analysis in the cloud has used Puppet
          – International HEP labs use or are switching to Puppet
          – Puppet was the “winner” at the recent CHEP fabric session
              • Presentations from CERN, BNL, PIC, ATLAS
      • We will share here, but it’s early days:
          – http://github.com/cernops
  • 16. Organizing Modules On Disk
      • Started with all modules in one directory in git:
          – Obviously wrong, great confusion for newcomers
      • Current situation, two directories in git:
          – Modules: reusable items, e.g. firewall, apache, sysctl, ..
          – Manifests: top-level services, e.g. batch machine, public login machine
      • Future plans:
          – Split up modules into local and downloaded
              • Modules like puppetlabs-firewall are mixed with our own junk
              • Will allow us to track and contribute to upstream better
          – In line with Puppet’s upcoming vendor path
  • 17. Configuration Complexity
      150 clusters ranging from 1 to 3000 hosts.
      • We have many configurations of service
          – Puppet handles this diversity well
      • We have many administrators (>= 300)
          – These admins change, are on different continents
          – Less obvious what to do with Puppet
  • 18. Trust Amongst SysAdmins
      [Diagram: one Git repository feeds the Puppet Master(s) for SysAdmin Team A and the Puppet Master(s) for SysAdmin Team B, which serve Team A’s nodes and Team B’s nodes respectively.]
      • All share one git repository. Rely on code review. Git branches and environments.
      • Teams use their own Puppet masters. hiera-gpg key for each team.
      • Host ACL on Puppet masters.
      • The full implications of this lack of trust between admins are unclear
          – Interested to hear what others have done.
  • 19. Change Control, Dev Cycle
      • Core team maintaining OS and basics:
          – Hardware monitoring, ntp configuration, accounts, ..
      • Specialized teams maintaining services on top:
          – They are ultimately responsible for service stability
          – We don’t want NTP configured 150 different ways
      • Requirements:
          – Some services will follow core updates
          – Some services will choose when to take core updates
          – Parts of services may follow latest updates
          – LHC has physical shutdowns for doing timely updates
  • 20. Change Control, Dev Cycle
      • Puppet environments map to git branches:
          – Nodes in Production, Testing and Devel branches
          – Big new configurations being tested in feature branches
              • A few nodes in these feature branches
          – Some services live isolated in their own branch
              • Risk of divergence
      • Current process:
          – A blind weekly devel -> production merge
      • Next process:
          – Use Atlassian’s Crucible and Fisheye products to code review Puppet configuration
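One practical detail when mapping git branches to Puppet environments is that branch names such as `feature/new-apache` contain characters that are not safe in an environment name. The Python sketch below shows one possible normalisation, assuming environment names should contain only lowercase letters, digits and underscores. The rule and the function name are assumptions for illustration; the slides do not describe how CERN derives environment names.

```python
import re


def environment_for_branch(branch):
    """Derive an environment name from a git branch name.

    Assumption: anything other than a lowercase letter, digit or
    underscore is replaced with an underscore.
    """
    return re.sub(r"[^a-z0-9_]", "_", branch.lower())


if __name__ == "__main__":
    for branch in ["production", "testing", "devel", "feature/new-apache"]:
        print(branch, "->", environment_for_branch(branch))
```

With a rule like this, the long-lived Production, Testing and Devel branches keep their names, while feature branches get a predictable environment name that a few test nodes can be pointed at.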
  • 21. Crucible Reviewing Manifest
      • Atlassian themselves use Puppet and do this
          – http://blogs.atlassian.com/2011/09/puppet_change_management_for_devops/
  • 22. Hardware Provisioning
      • Up to now a homegrown tool in use:
          – Has strong similarities to Puppet Labs’ new Razor
              • Razor is being followed, tracked for the moment
          – Final step of tool adds host to Foreman
      • We are using Foreman and are happy with it:
          – Kickstart templating is great
          – Organising hosts into hostgroups is great
          – We will now invest time to integrate Foreman with CERN services:
              • CERN network database, our master for switches, DNS, …
              • AIMS, a Kerberos-managed tftp server
              • CERN CA: we have our own CA, used by other services also
                  – We will use this for Puppet also
  • 23. Virtual Machine Provisioning
      • Existing Microsoft Hyper-V infrastructure:
          – 3000 virtual machines, of which 70 are Puppet managed
          – VMs pre-seeded into a Foreman hostgroup
          – VMs being kickstarted onto Puppet and Foreman
      • Puppet-managed OpenStack Nova
          – Today aiming at 200 hypervisors with up to 4000 Puppet-managed VMs
          – Machine images created with Oz
          – Machines NOT pre-seeded in Foreman or Puppet
              • Register at boot time
          – amiconfig and cloud-init for contextualizing
              • Pass Puppet server and Foreman hostgroup to image
  • 24. Next Steps till End of Year
      • Migrate to PuppetDB
          – (300,000 nodes => 300 GB RAM)
      • Look at Puppet Dashboard
      • Use MCollective for something:
          – Necessary as node number increases
          – Currently set up but not being used particularly
      • Check Foreman’s integration with OpenStack
      • Migrate more services from Quattor to Puppet
      • Decide a scheme for secure blob delivery:
          – hiera-gpg or ACL’ed Puppet fileserver
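The PuppetDB estimate on this slide (300,000 nodes => 300 GB RAM) works out to about 1 MB of RAM per node. The Python sketch below makes that arithmetic explicit and applies the same per-node figure to a smaller node count. It assumes decimal units (1 GB = 1000 MB) and linear scaling, neither of which the slide states.

```python
# Figures from the slide.
NODES = 300_000
RAM_GB = 300


def ram_per_node_mb(nodes, ram_gb):
    """RAM budget per node in MB, using decimal units (1 GB = 1000 MB)."""
    return ram_gb * 1000 / nodes


def ram_needed_gb(nodes, mb_per_node=1.0):
    """RAM in GB for a given node count, assuming linear scaling."""
    return nodes * mb_per_node / 1000


if __name__ == "__main__":
    per_node = ram_per_node_mb(NODES, RAM_GB)
    print(f"{per_node:.1f} MB per node")
    # The 4000 puppet-managed VMs mentioned for the OpenStack deployment:
    print(f"{ram_needed_gb(4000, per_node):.1f} GB for 4000 nodes")
```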
  • 25. Conclusions
      • Migrating to Puppet
          – Largest change in our deployment for 5 years
      • Has all been fairly painless
      • Difficulties:
          – Forced to integrate with existing stuff sometimes
          – Doing things wrong first time
              • Lack of in-house experience
      • 300,000 VMs in 2015?
          – Puppet is easy to scale, more hardware can be added
          – We expect to dedicate up to 100 of cores to puppet
      • It’s a joy to work with an active community
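A rough load estimate puts the 300,000-VM figure in perspective. The Python sketch below assumes every node requests a catalog once per 30-minute run interval (Puppet's default, which the slides do not confirm for this deployment) and reads the core count on the slide as 100 cores. Both values are assumptions for illustration only.

```python
NODES = 300_000
CORES = 100                 # assumption: the slide read as 100 cores
RUN_INTERVAL_S = 30 * 60    # assumption: 30-minute agent run interval


def requests_per_second(nodes, run_interval_s):
    """Average catalog requests per second across all masters."""
    return nodes / run_interval_s


def core_seconds_per_catalog(nodes, cores, run_interval_s):
    """Upper bound on core time per catalog if all cores are kept busy."""
    return cores * run_interval_s / nodes


if __name__ == "__main__":
    rate = requests_per_second(NODES, RUN_INTERVAL_S)
    budget = core_seconds_per_catalog(NODES, CORES, RUN_INTERVAL_S)
    print(f"{rate:.1f} catalog requests per second")
    print(f"{budget:.2f} core-seconds available per catalog")
```

Under these assumptions the masters would need to serve roughly 167 catalogs per second, leaving well under a second of core time per catalog, which shows why adding hardware or lengthening the run interval matters at this scale.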