This presentation was given at LinuxCon 2010.
The proliferation of cloud computing is inevitable: hosted apps, software-as-a-service, and now dynamic, on-demand utility computing are becoming the norm. The session will be a “fire-side” chat style discussion of the types of challenges faced by IT operations management personnel and how they can manage cloud infrastructure using open source tools. The talk will discuss options for deploying and integrating tools that provision, configure, orchestrate, and monitor cloud (and physical) infrastructure. The session will appeal to IT professionals (sysadmins, net-ops, developers) who develop and manage infrastructure that resides in hosted environments like Amazon EC2, without disregarding traditionally hosted internal infrastructure.
2. Mark R. Hinkle
VP of Community
Zenoss Inc.
mrhinkle@zenoss.com
mrhinkle@gmail.com
Twitter: @mrhinkle
John M. Willis
VP of Services
Opscode Inc.
john@opscode.com
botchagalupe@gmail.com ABSENT
Twitter: @botchagalupe
Monday, August 9, 2010
3. %whoami
• Former Linux Desktop Advocate
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium Conspirator
• Open Desktop Consortium Instigator
• Author - “Windows to Linux Business Desktop Migration” - Thomson
• NetDirector Project - Open Source Configuration Management Project
4. Today’s Agenda
• Definitions (Toolchains, Systems Management, Cloud Computing)
• Bad Jokes
• Overview of Open Source Management Tools
• Culture Changes
• Alien Autopsy Photos
• Example Cloud Computing Toolchains
5. Toolchain
A set of programs where the output of one program forms the input of another program.
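The definition above can be illustrated with a minimal sketch in which each stage is a plain function and the output of one stage is passed as the input to the next. The stage names (`provision`, `configure`, `register_monitoring`), the host names, and the record fields are hypothetical and are not taken from any tool discussed in this talk.

```python
from functools import reduce


def provision(hostnames):
    """Stage 1: emit one record per newly built host (hypothetical data)."""
    return [{"host": name, "os": "linux"} for name in hostnames]


def configure(hosts):
    """Stage 2: attach a role to each host record produced by stage 1."""
    return [
        dict(host, role="web" if host["host"].startswith("web") else "db")
        for host in hosts
    ]


def register_monitoring(hosts):
    """Stage 3: turn configured host records into monitoring check names."""
    return [f"check_{host['role']}@{host['host']}" for host in hosts]


def run_toolchain(stages, initial_input):
    """Feed the output of each stage into the next stage, in order."""
    return reduce(lambda data, stage: stage(data), stages, initial_input)


checks = run_toolchain([provision, configure, register_monitoring], ["web01", "db01"])
print(checks)  # ['check_web@web01', 'check_db@db01']
```

In a real toolchain the stages would be separate programs exchanging files, API calls, or messages, but the shape is the same: each stage only needs to agree with its neighbor on the format of the data handed over.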
6. Open Source Management Tools Adoption
• 98% of enterprises use open source systems management tools
• 76% indicate they prefer to use open source whenever possible
• The most compelling factor for using open source is flexibility, followed by cost savings
• 50% are already using some form of cloud technology, including but not limited to hosted applications, Amazon Web Services, and/or hosted storage
• Top IT management priorities for 2010: monitoring, configuration management, patching and
Source: 2010 Zenoss Open Source Management Survey
7. Cloud Computing Hype is Becoming a Reality
• UBS says Web Services will be a $15 billion+ market by 2014
• IBM says Cloud Computing will be a $126 billion market by 2012
• 20% of Businesses won’t have IT Assets by 2012 - Gartner
Bottom line: a large portion of our infrastructure no longer lives in our data center... but we gotta manage it.
8. Systems Management Disciplines
Provisioning
Installation of operating systems and other software
Configuration Management and Automation
Sets the parameters for servers, starts and stops services, rotates logs, and handles other menial tasks
Monitoring
Monitoring queries the servers for overall health and alerts administrators to problems
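As a rough illustration of the monitoring discipline, the sketch below polls a list of TCP endpoints once and raises an alert for each one that does not answer. It is an assumption-laden toy, not how any of the tools in this talk work: the function names `check_tcp` and `poll_once`, the target list, and the use of `print` as the alert channel are all hypothetical.

```python
import socket


def check_tcp(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def poll_once(targets, alert, check=check_tcp):
    """Check every (host, port) target once and call alert() for each failure."""
    failures = []
    for host, port in targets:
        if not check(host, port):
            failures.append((host, port))
            alert(f"{host}:{port} is unreachable")
    return failures


if __name__ == "__main__":
    # Hypothetical targets; replace with real hosts and ports.
    poll_once([("127.0.0.1", 22), ("127.0.0.1", 80)], alert=print)
```

The `check` parameter is injectable so that other probes (ICMP, SNMP, HTTP) could be swapped in without changing the polling loop.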
9. How to Choose Open Source Cloud Tools
• Open Source (OSI Approved License)
• Manage Legacy Infrastructure and Cloud
• Extensible (Plugins, accept code contributions)
• Vibrant Communities (activity in Downloads, Forums, Extensions)
• Client/server (or at least network aware)
10. Comparison of Provisioning Tools
Tool                                Year Started  Language            License  Installation Targets
Cobbler                             2007          Python              GPL      Red Hat, Fedora, OpenSuSE, Debian, Ubuntu
Fully Automatic Installation (FAI)  2000          Perl                GPL      Debian
Kickstart                           ?             Python              GPL      Most .deb and Fedora based Linux
OpenQRM                             2005          PHP                 GPL      Linux, Solaris, Windows
Spacewalk                           2008          Perl, Python, Java  GPL      Fedora, CentOS
Viper                               2008          Perl                GPL      Debian
11. Comparison of Configuration Management & Automation Tools
Tool        Year Started  Language  License  Client/Server                      Backing
AutomateIT  2009          Ruby      GPL      No                                 None
bcfg2       2003          Python    BSD      Yes                                Argonne National Labs?
Cfengine    1993          C         GPL      Yes                                Cfengine Inc.
Chef        2009          Ruby      Apache   Chef Solo - No; Chef Server - Yes  Opscode
Puppet      2004          Ruby      GPL      Yes                                PuppetLabs
12. Comparisons of Open Source Monitoring Tools
Tool     Year Started  License  Language  Monitoring Type                              Collection Methods
Cacti    2001          GPL      PHP       Performance                                  SNMP, Syslog
Nagios   1999          GPL      C/PHP     Availability                                 SNMP, TCP, ICMP, IPMI, Syslog
OpenNMS  2000          GPL      Java      Availability, Performance                    SNMP, JMX, HTTP
Zabbix   2001          GPL      C/PHP     Availability, Performance, and more          SNMP, TCP, ICMP, IPMI, Synthetic Transactions
Zenoss   2005          GPL      Python    Availability, Performance, Event Management  SNMP, SSH, Syslog, Event Management, Synthetic Transactions
13. CloudOps: Change in Culture
GapingVoid - http://www.gapingvoidgallery.com/product_info.php?products_id=1643&osCsid=bc3tdqg6fuh8gato04m9obr0o1
14. Old Systems Management Practices
• Human Powered, Labor Intensive
• Repetitive Tasks
• “Meat Cloud”
• How many servers do you manage per admin?
15. Cloud Changes Everything
• Hyperscalable
• Hardware Abstraction
• Dynamic Infrastructure
• Geography Independent
• Fast & Flexible
• How many cloud instances do you manage per admin?
16. The Myth of the Nines
Availability %        Downtime per Year  Downtime per Month  Downtime per Week
99.9% (three nines)   8.76 hours         43.2 minutes        10.1 minutes
99.95%                4.38 hours         21.56 minutes       5.04 minutes
99.99% (four nines)   52.6 minutes       4.32 minutes        1.01 minutes
99.999% (five nines)  5.26 minutes       25.9 seconds        6.05 seconds
99.9999% (six nines)  31.5 seconds       2.59 seconds        0.605 seconds
• Average polling interval for monitoring? 5 minutes?
• Even superhuman operations people can’t be alerted and take action in under 5 minutes.
• One outage per year could drop service level to three nines or worse.
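The downtime budgets can be recomputed directly from the availability percentage. The sketch below assumes a 365-day year, a 30-day month, and a 7-day week; the function names are hypothetical. It also shows the point of the polling bullets: a single outage that goes unnoticed for one 5-minute polling interval uses up roughly 95% of the yearly budget at five nines.

```python
SECONDS_PER = {
    "year": 365 * 24 * 3600,   # assumes a non-leap year
    "month": 30 * 24 * 3600,   # assumes a 30-day month
    "week": 7 * 24 * 3600,
}


def allowed_downtime(availability_pct):
    """Return the allowed downtime in seconds per year, month, and week."""
    unavailable_fraction = 1 - availability_pct / 100
    return {
        period: seconds * unavailable_fraction
        for period, seconds in SECONDS_PER.items()
    }


def budget_consumed(availability_pct, outage_seconds, period="year"):
    """Return the fraction of the downtime budget used by one outage."""
    return outage_seconds / allowed_downtime(availability_pct)[period]


for pct in (99.9, 99.95, 99.99, 99.999, 99.9999):
    budget = allowed_downtime(pct)
    print(
        f"{pct}%: {budget['year'] / 60:.2f} min/year, "
        f"{budget['month']:.1f} s/month, {budget['week']:.2f} s/week"
    )

# One outage lasting a single 5-minute polling interval, measured against five nines.
print(f"{budget_consumed(99.999, 5 * 60):.0%} of the yearly budget")
```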
18. Cultural Changes
Agile IT and DevOps movements
• Operations and Developers should collaborate with each other to deliver excellent products
• Systems Administrators need to become Systems Engineers, building automated, fault-tolerant systems, not just maintaining infrastructure
• More frequent changes, more outages (albeit short) to rapidly improve IT products and services
• Process, version control, and automation are important
19. Systems Management Tools for Cloud Computing
• Provisioning: Kickstart, Spacewalk, Cobbler, OpenQRM
• Configuration Management and Automation: AutomateIT, Chef, Control Tier, Puppet
• Monitoring: Ganglia, Nagios, OpenNMS, Zabbix, Zenoss Core
20. “Off the shelf” Open Source Toolchains
• OpenQRM and Nagios
• Cobbler and Puppet
• Zenoss & [Chef, Cfengine and Puppet]
• Spacewalk and Cobbler
• OpenNMS and Rancid
• OpenNMS and Puppet
21. DevOps ToolChain Project
Project centered around how to automate and improve
infrastructure management using Agile/DevOps methodologies
Discussion Topics
• Open questions on unified pipe architecture
• Distribution methods: package vs file, rsync/murder vs yum/rpm vs DFS
• Configuration management: RPMs vs puppet/cfengine/chef tool?
• Rollback methodologies for package and config management tools
• Controlling and timing package release and config management tools
• Log management (aggregating, crunching, charting)
• Change detection
http://code.google.com/p/devops-toolchain/
22. Cloud Computing Changes Everything
• “Meat Cloud” Can’t Keep Up with Cloud Computing
• DevOps & Agile IT Philosophy
• Script Repetitive Tasks
• Automate, Automate, Automate
23. Example Cloud ToolChain
• Multiple Cloud Providers
• Mix and match tools
• Portability, Flexibility, and
24. Example - Geeknet
Hundreds of servers, serving web, databases, and other infrastructure for some of the world’s most highly trafficked websites – over 40 million visitors per month.
• Servers are automatically built using configuration management software
• Discovery tool finds infrastructure and populates a CMDB then spits out information to scripts that translate information to BIND configurations for DNS
• Monitoring tool adds hosts to polling tool to check servers for availability
• As infrastructure changes, systems are updated automatically
• Servers can be spun up and managed in minutes, not hours automatically with little or no human interaction
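The CMDB-to-DNS step described above might look something like the following sketch, which turns a list of discovered hosts into BIND-style A records. This is not Geeknet's actual script: the CMDB record layout (`name`, `ip`), the function name `to_bind_records`, and the addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions. The output is only a fragment of resource records; a complete zone file would also need SOA and NS records.

```python
def to_bind_records(cmdb_hosts, ttl=3600):
    """Render a $TTL line plus one A record per CMDB host, sorted by name."""
    lines = [f"$TTL {ttl}"]
    for host in sorted(cmdb_hosts, key=lambda entry: entry["name"]):
        # Left-align the host name in a 12-character column for readability.
        lines.append(f"{host['name']:<12} IN A {host['ip']}")
    return "\n".join(lines) + "\n"


# Hypothetical CMDB export, as a discovery tool might produce it.
cmdb = [
    {"name": "web01", "ip": "192.0.2.10"},
    {"name": "db01", "ip": "192.0.2.20"},
]

print(to_bind_records(cmdb), end="")
```

Regenerating the fragment from the CMDB on every change keeps DNS in step with the infrastructure without anyone editing zone files by hand.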
25. Summary
• Automate to improve service, apply leverage
• Rethink how your operations work
• Choose tools that can extend and adapt to new types of infrastructure (what does cloud look like in 2015?)