Imagine it's eight o'clock on a Thursday morning and you awake to see a bulldozer outside your window, ready to plow over your data center. Normally you might consult the Encyclopedia Galactica to discern the best course of action, but your copy is likely out of date. And while the Hitchhiker's Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn't cover the nuances of cloud computing. That's why you need the Hitchhiker's Guide to Cloud Computing (HHGTCC), or at least to attend this talk to understand the state of open source cloud computing. Specifically, this talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data, and how to take advantage of these technologies more effectively using open source software. Technologies covered include Apache CloudStack, Chef, CloudFoundry, NoSQL, OpenStack, Puppet and many more.
Linux Foundation Collaboration Summit: Hitchhiker's Guide to the Cloud
1. Hitchhiker’s Guide to the Open Cloud
Linux Foundation Collaboration Summit 2013
Mark R. Hinkle
Sr. Director, Open Source Solutions
Citrix Systems Inc.
@mrhinkle
mrhinkle@gmail.com
2. Mark Hinkle, Sr. Director, Open Source Solutions
• Dedicated to the success of the Apache CloudStack, OpenDaylight & Xen Project communities on Citrix's behalf
• Run BuildACloud.org learning activities all over the world
• Joined Citrix via Cloud.com acquisition, July 2011
• Zenoss Core Open Source project to 100,000 users, 1.5 million downloads
• Former LinuxWorld Magazine Editor-in-Chief
• Open Management Consortium organizer
• Author - “Windows to Linux Business Desktop Migration” – Thomson
• NetDirector Project - Open Source Configuration Management
• Sometimes Author and Blogger at SocializedSoftware.com
• NetworkWorld Open Source Subnet
Hitchhiker’s Guide to the Open Cloud by @mrhinkle
3. Why Open Source and Cloud Computing?
• User-Driven Context from Solving Real Problems
• Lower Barrier to Participation
• Larger user base, users helping users
• Aggressive release cycles stay current with the state-of-the-art
• Open Source innovating faster than commercial
• Open data, Open standards, Open APIs
4. Quick Cloud Computing Overview, or the Obligatory “What Is the Cloud?” Explanation
Infinite Improbability Drive
5. Five Characteristics of Cloud
1. On-Demand Self-Service
2. Broad Network Access
3. Resource Pooling
4. Rapid Elasticity
5. Measured Service
6. Cloud Computing Service Models
USER CLOUD a.k.a. SOFTWARE AS A SERVICE
Single application, multi-tenancy, network-based, one-to-many delivery of
applications, all users have same access to features.
Examples: Salesforce.com, Google Docs, Red Hat Network/RHEL
DEVELOPMENT CLOUD a.k.a. PLATFORM-AS-A-SERVICE
Application developer model, Application deployed to an elastic service that
autoscales, low administrative overhead. No concept of virtual machines or
operating system. Code it and deploy it.
Examples: VMware CloudFoundry, Google AppEngine, Windows Azure,
Rackspace Sites, Red Hat OpenShift, Active State Stackato, Appfog
SYSTEMS CLOUD a.k.a. INFRASTRUCTURE-AS-A-SERVICE
Servers and storage are made available in a scalable way over a network.
Examples: EC2, Rackspace CloudFiles, OpenStack, CloudStack,
Eucalyptus, OpenNebula
10. Hypervisors
Open Source
• Xen Project, Xen Cloud Platform (XCP)
• KVM – Kernel-based Virtual Machine
• VirtualBox* - Oracle supported Virtualization Solutions
• OpenVZ* - Container-based, Similar to Solaris Containers or BSD Jails
• LXC – User Space chrooted installs
Proprietary
• VMware
• Citrix Xenserver (based
• Microsoft Hyper-V
• OracleVM (Based on OS Xen)
11. Open Virtual Machine Formats
Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
Formats for hypervisors/cloud technologies:
• Amazon – AMI
• KVM – QCOW2
• VMware – VMDK
• Xen Project – IMG
• Hyper-V – VHD (Virtual Hard Disk)
12. Sourcing Cloud Appliances
Tool/Project | What you can do with them
Bitnami | BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
Boxgrinder | BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers
Oz | Command-line tool that has the ability to create images for common Linux distributions to run on KVM
SUSE Studio | SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.
13. Scale-Up or Scale-Out
Vertical Scaling (Scale-Up)
Allocate additional resources to VMs, requires a reboot, no need for distributed app logic, single point of OS failure
Horizontal Scaling (Scale-Out)
Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache, Hadoop)
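To make the scale-out case concrete, here is a minimal sketch of the round-robin dispatch that a load balancer such as HA-Proxy can perform in front of identical application servers. The `RoundRobinBalancer` class, the `distribute` helper and the backend names are hypothetical and exist only for illustration; real balancers add health checks, weighting and session handling.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy dispatcher: hands each request to the next backend in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._ring = cycle(self._backends)

    def route(self, request):
        # Pick the next backend; the request itself is not inspected.
        return next(self._ring)


def distribute(balancer, requests):
    """Return a mapping of backend -> list of requests it received."""
    assignment = {}
    for request in requests:
        assignment.setdefault(balancer.route(request), []).append(request)
    return assignment
```

Adding capacity in this model means adding another name to the backend list, which is the essence of scaling out: the application must tolerate any request landing on any server.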
14. Compute Clouds (IaaS)
Project | Year Started | License | Virtualization Technologies
Apache CloudStack | 2008 | Apache | Xenserver, Xen Cloud Platform, KVM, VMware (Hyper-V developing)
Eucalyptus | 2006 | GPL | Xen, KVM, VMware (commercial version)
OpenNebula | 2005 | Apache | Xen, KVM, VMware
OpenStack | 2010 (previously developed for NASA by Anso Labs) | Apache | VMware ESX and ESXi, Xen, Xen Cloud Platform, KVM, LXC, QEMU and VirtualBox
Numerous companies are building cloud software on OpenStack including Nebula, Piston Inc., CloudScaling
15. OpenStack – Ecosystem of Projects
[Diagram: OpenStack components]
• Dashboard “Horizon”
• Compute “Nova”
• Object Storage “Swift”
• Image Service “Glance”
• Identity Services “Keystone” API
• Enterprise Message Queue based on RabbitMQ (ESB)
• Hypervisors: KVM, VMware, Xen Cloud Platform
• Storage: Ceph, Gluster
• Quantum Networking Fabric: REST API, Plugins (Open vSwitch, Quantum plug-ins)
• Advanced Cloud and Networking services accessing the Quantum API: Firewall Service, Gateway Service
20+ Collective projects hosted at: https://launchpad.net/openstack
16. Cloud APIs
• jclouds
• libcloud
• deltacloud
• fog
17. Cloud Computing Storage
Project | Description
Ceph | Distributed file storage system developed by DreamHost
GlusterFS | Scale Out NAS system aggregating storage over Ethernet or Infiniband
OpenStack Storage | Long-term object storage system
Riak CS | Riak CS is open source software designed to provide simple, available, distributed cloud storage at any scale. Riak CS is S3-API compatible and supports per-tenant reporting for billing and metering use cases.
Sheepdog | Distributed storage for KVM hypervisors
18. Platform-as-a-Service (PaaS)
Project | Year Started | Sponsors | Languages/Frameworks
CloudFoundry | 2011 | VMware | Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
Cloudify | 2012 | Gigaspaces | [Groovy for deployment recipes]
OpenShift ** | 2011 | Red Hat | Java, Ruby, PHP, Perl and Python
Stackato* | 2012 | ActiveState | Java, Python, PHP, Ruby, Perl, Node.js, others
WSO2 Stratus | 2010 | WSO2 | Jboss, Java EE6
20. Overview of Software Defined Networking
[Diagram: SDN architecture in three layers]
• Application Layer: Business Applications
• Control Layer: SDN Control Software and Network Services, reached by applications through APIs
• Infrastructure Layer: Network Devices
• Control Data Plane Interface (e.g. OpenFlow) between the Control Layer and the Infrastructure Layer
21. Cloud Promise, Reality and Networks
Cloud Promise | Cloud Reality
Centralized Configuration and Automation | Without true virtualization, network devices must still be manually configured.
Instant Self-Service Provisioning | In a physical network, it could take a long time for a network engineer to provision new services.
Elasticity and Scalability | By horizontally scaling up the physical network, elasticity is lost.
Designed for Failure | Failover can be automated and physical network limitations can be alleviated.
Source: Midokura
22. OpenFlow
OpenFlow enables networks to evolve by giving a remote controller the power to modify the behavior of network devices through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.
Image from http://www.openflow.org/documents/openflow-wp-latest.pdf
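The idea of a controller programming a device's forwarding behavior can be sketched as a small match/action table. This is a simplified model under stated assumptions, not the OpenFlow wire protocol: the `FlowRule` and `Switch` classes, the field names (`ip_dst`, `tcp_dst`) and the action strings are hypothetical, and sending unmatched packets to the controller is just the choice made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    priority: int
    match: dict   # header field -> required value; absent fields act as wildcards
    action: str   # illustrative action label, e.g. "output:2" or "drop"


@dataclass
class Switch:
    table: list = field(default_factory=list)

    def install(self, rule):
        """Called by the remote controller to add a forwarding rule."""
        self.table.append(rule)
        # Keep highest-priority rules first so lookup stops at the first hit.
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet):
        """Return the action of the first rule whose match fields all agree."""
        for rule in self.table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # No rule matched: in this sketch the controller is asked to decide.
        return "send_to_controller"
```

The switch itself holds no policy; everything it does is determined by the rules the controller installs, which is the separation of control and forwarding that the slide describes.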
23. Software Defined Networking (SDN)
Project | Description
Floodlight | The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller.
Indigo | Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.
OpenDaylight | Linux Foundation Collaborative Project based on Cisco One Controller and
OpenStack Networking “Quantum” | Pluggable, scalable, API-driven network and IP management
Open vSwitch | Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).
26. Twitter at 400M Tweets Per Day – June 2012
[Chart: tweets per day in millions (vertical axis, 0 to 450), by month from Jan-07 to May-12 (horizontal axis)]
Source: TheBigDataGroup.com
27. Big Data and Storage Infrastructure
Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning.
- Gartner, Big Data Challenges for the IT Infrastructure Team
29. Open Source NoSQL Databases
Name | Type | Description
Apache Cassandra | Wide Column Store/Families | API: many » Query Method: MapReduce, Replication: , Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
CouchDB | Document Store | API: Memcached API+protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
HBase | Wide Column Store/Families | API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
Hypertable | Wide Column Store/Families | API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable.
MongoDB | Document Store | API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++, Concurrency
Redis | Key Value/Tuple Store | API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronous disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.
Riak | Key Value/Tuple Store | API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters; Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)
30. MapReduce
[Diagram: the Master Node splits Problem Data across Worker Node 1, Worker Node 2 and Worker Node 3 (Map), then their results are combined into Solution Data (Reduce)]
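The map and reduce steps can be illustrated with the classic word-count example. This is a single-process sketch of the idea, not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the workers run one after another here rather than on separate nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Worker-side map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # The "master" hands one chunk to each worker; here they run in turn.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across as many worker nodes as are available.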
31. Apache Hadoop
Overview
• Handles large amounts of data
• Stores data in native format
• Delivers linear scalability at low cost
• Resilient in case of infrastructure failures
• Transparent application scalability
Facts
• Apache top-level open source project
• One framework for storage and compute
– HDFS – Scalable storage in Hadoop Distributed File System (HDFS)
– Compute via the MapReduce distributed processing platform
• Domain Specific Language (DSL) - Java
32. Hadoop Architecture
[Diagram: Hadoop components]
• Hadoop Common
• HDFS: Distributes & replicates data across machines
• MapReduce: Distributes & monitors tasks. Parallel programming; handles large data blocks
• Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured
• Non-Relational DB – HBase: Column-oriented schema-less distributed DB modeled after Google’s BigTable. Random real time read/write.
• Scripting – Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts.
• Machine Learning – Mahout: Machine learning libraries for recommendations, clustering, classifications and item sets.
• Management – Chukwa, Zookeeper
33. Big Data Summary
• Quantity of Machine Created Data Increasing Drastically (examples: networked sensor data from mobile phones and GPS devices)
• Data manipulation moving from batched to real-time
• Cloud services giving everyone Big Data tools
• Consumer company speed and scale requirements driving efficiencies in Big Data storage and analytics
• New and broader number of data sources being meshed together
• Big Data Apps means using Big Data is faster and easier
35. Automation in the Cloud
Meat Cloud Cloud Operations
36. 4 Types of Management Tools
Provisioning
Installation of operating systems and other software
Configuration Management
Sets the parameters for servers, can specify installation parameters
Orchestration/Automation
Automate tasks across systems
Monitoring
Records errors and health of IT infrastructure
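As an illustration of what a configuration management tool does when it "sets the parameters for servers", here is a minimal sketch of desired-state convergence. The `converge` function and the setting names are hypothetical and do not reflect the API of any tool named in this talk; the point is only that applying the same policy twice changes nothing the second time.

```python
def converge(current, desired):
    """Bring `current` settings in line with `desired`; return the changes made.

    Each change is recorded as key -> (old value, new value). Settings that
    already match are left alone, so a second run returns an empty dict.
    """
    changes = {}
    for key, value in desired.items():
        if current.get(key) != value:
            changes[key] = (current.get(key), value)
            current[key] = value
    return changes
```

A usage example: calling `converge(server, policy)` on a server whose settings differ from the policy returns the list of corrections, and an immediate second call returns `{}`, which is the repeatable behavior that makes unattended configuration runs safe.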
37. Management Toolchains
Configuration
Patching and Provisioning
Monitoring
Toolchain (n): A set of tools where the output of one tool becomes the input of another tool
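The toolchain definition above maps directly onto function composition. The sketch below assumes each tool can be modeled as a function that takes a state dictionary and returns an updated one; `run_toolchain`, `provision`, `configure` and `monitor` are hypothetical stand-ins, not wrappers around real tools.

```python
from functools import reduce


def run_toolchain(tools, initial):
    """Feed `initial` to the first tool, then each output to the next tool."""
    return reduce(lambda data, tool: tool(data), tools, initial)


# Hypothetical stand-ins for real tools; each takes and returns a dict.
def provision(state):
    return {**state, "os": "installed"}


def configure(state):
    return {**state, "services": ["httpd"]}


def monitor(state):
    return {**state, "checks": [f"{s} is running" for s in state["services"]]}
```

The order of the list is the order of the chain: `run_toolchain([provision, configure, monitor], {"host": "web1"})` only works because `monitor` consumes the `services` key that `configure` produced.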
38. Provisioning
Project | Installation Targets
Apache Provisionr (incubating) | Can provision 10s to 1000s of machines on various clouds.
Cobbler | Distributed virtual infrastructure using koan (kickstart over a network to PXE boot VMs) for Red Hat, OpenSUSE, Fedora, Debian, Ubuntu VMs
Crowbar | (Bare metal provisioning)
JuJu | Public Clouds - Amazon Web Services, HP Cloud, Private OpenStack clouds, Bare Metal via MAAS.
Salt Cloud | Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ
39. Configuration Management Tools
Project | Year Started | Language | License | Client/Server
Cfengine | 1993 | C | Apache | Yes
Chef | 2009 | Ruby | Apache | Chef Solo – No; Chef Server – Yes
Puppet | 2004 | Ruby | GPL | Yes & standalone
Salt | 2011 | Python | Apache | Yes
40. Automation/Orchestration Tools
Project | Description
Ansible | Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately.
Capistrano | Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles
RunDeck | Rundeck is an open-source process automation and command orchestration tool with a web console.
Func | Func provides a two-way authenticated system for generically executing tasks, integrations with puppet and cobbler.
MCollective | The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.
Salt | Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
Scalr | Provide scaling across multiple cloud computing platforms, integrates with Chef.
41. Conceptual Automated Toolchain
[Diagram: stages of an automated toolchain and example tools]
• BootStrapped Image: CloudStack, OpenStack
• Configuration: Puppet, Chef
• Start/Stop Services: RunDeck, Capistrano, MCollective
• Provision: Cobbler, SUSE Studio
• Monitoring: Nagios, Zenoss, Cacti
• Generate Images: SUSE Studio, BoxGrinder
42. NetFlix Open Source ToolBag for AWS
ASGARD ASTYANAX EDDA
EUREKA PRIAM SIMIAN ARMY
http://netflix.github.com
45. Questions?
Slides Can be Viewed and Downloaded at:
http://www.slideshare.net/socializedsoftware/
Copyright 2012-2013 Mark R. Hinkle, available under the CC BY-SA license, some rights reserved.
46. Contact Me
Professional: mark.hinkle@citrix.com
Personal: mrhinkle@gmail.com
Phone: 919.228.8049
Personal: http://www.socializedsoftware.com
Twitter: @mrhinkle
Mark R. Hinkle
Senior Director,
Open Source Solutions
Citrix Systems Inc.
Open Source Enthusiast
48. Additional Resources
• Devops Toolchains Group
• Software Defined Networking: The New Norm for Networks (Whitepaper)
• DevOps Wikipedia Page
• NoSQL-Database.org – Ultimate Guide to the Non-Relational Universe
• Open Cloud Initiative
• NIST Cloud Computing Platform
• Open Virtualization Format Specs
• Clouderati Twitter Account
• Planet DevOps
• Nicira Whitepaper – It’s Time to Virtualize the Network
• Why Open vSwitch FAQ
49. Monitoring Tools
Project | License | Type of Monitoring | Collection Methods
Cacti / RRDTool | GPL | Performance | SNMP, syslog
Graphite | Apache 2.0 | Performance | Agent
Nagios | GPL | Availability | SNMP, TCP, ICMP, IPMI, syslog
Zabbix | GPL | Availability/Performance and more | SNMP, TCP/ICMP, IPMI, Synthetic Transactions
Zenoss | GPL | Availability, Performance, Event Management | SNMP, ICMP, SSH, syslog, WMI
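Of the collection methods listed, a TCP check is the simplest to show: an availability monitor tries to open a connection and records whether it succeeded. The sketch below uses only the Python standard library; `tcp_check` is a hypothetical helper and is not how any of the tools above implement their checks.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A monitoring loop would call something like this on a schedule and raise an event when the result flips from True to False.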
Editor's Notes
Infinite Improbability Drive
The Infinite Improbability Drive is a faster-than-light drive. The most prominent usage of the drive is in the starship Heart of Gold. It is based on a particular perception of quantum theory: a subatomic particle is most likely to be in a particular place, such as near the nucleus of an atom, but there is also an infinitesimally small probability of it being found very far from its point of origin (for example close to a distant star). Thus, a body could travel from place to place without passing through the intervening space (or hyperspace, for that matter), if you had sufficient control of probability.
Reference: Michael Lockwood (2005). The Labyrinth of Time: Introducing the Universe. Oxford University Press. ISBN 0-19-924995-4.
Cloud Software as a Service (SaaS) – The Application Cloud
The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Cloud Platform as a Service (PaaS) – The Development Cloud
The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
Cloud Infrastructure as a Service (IaaS) – The Systems Cloud
The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Private cloud
The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.
Public cloud
The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud
The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
The Kill-o-Zap is a weapon first appearing in the novel The Hitchhiker's Guide to the Galaxy, wielded by the police from Blagulon Kappa when they come to Magrathea to arrest Zaphod. It is referenced throughout the series in the role of a standard and widespread brand of raygun.
In the novel The Restaurant at the End of the Universe it is described in more detail: The designer of the gun had clearly not been instructed to beat about the bush. 'Make it evil,' he'd been told. 'Make it totally clear that this gun has a right end and a wrong end. Make it totally clear to anyone standing at the wrong end that things are going badly for them. If that means sticking all sorts of spikes and prongs and blackened bits all over it then so be it. This is not a gun for hanging over the fireplace or sticking in the umbrella stand, it is a gun for going out and making people miserable with.'
In the novel Life, the Universe and Everything, the group arms themselves with Kill-o-Zap guns against the Krikkiters. Arthur "fumbled to release the safety catch and engage the extreme danger catch as Ford had shown him. He was shaking so much that if he'd fired at anybody at that moment he probably would have burnt his signature on them."
In the 2005 movie adaptation, the gun has a sophisticated look. It is more of a white circle that covers the hand and has a trigger on the inside. This version is wielded by Marvin.
Derived from the NIST diagram:
• Physical Resources: Networking, Compute, Storage, BIOS/Firmware
• Software: Kernel, Operating Systems with Type II Hypervisors, VM Manager (VMM) – Type 1 Hypervisors
• Virtualized Resources: Networking, Compute, Storage
• Virtualized Resources: Metadata, Virtual Machine Images
Top choices for Cloud Computing are Xen and KVM. OpenVZ, container virtualization for Linux, is an interesting option because it scales application space with very little overhead, similar to containers like BSD Jails. Its advantage is that memory allocation is soft, so unutilized memory can be used by other applications.
OVF
An OVF package consists of several files, placed in one directory. A one-file alternative is the OVA package, which is a TAR file with the OVF directory inside. OVF is a packaging format for software appliances. From a technical point of view, an OVF is a transport mechanism for virtual machine templates. One OVF may contain a single VM, or many VMs (it is left to the software appliance developer to decide which arrangement best suits their application). OVFs must be installed before they can be run; a particular virtualization platform may run the VM from the OVF, but this is not required. If this is done, the OVF itself can no longer be viewed as a “golden image” version of the appliance, since run-time state for the virtual machine(s) will pervade the OVF. Moreover, the digital signature that allows the platform to check the integrity of the OVF will be invalid.
Amazon AMI
An Amazon Machine Image (AMI) is a special type of virtual appliance which is used to instantiate (create) a virtual machine within the Amazon Elastic Compute Cloud. It serves as the basic unit of deployment for services delivered using EC2. Like all virtual appliances, the main component of an AMI is a read-only filesystem image which includes an operating system (e.g., Linux, UNIX, or Windows) and any additional software required to deliver a service or a portion of it. The AMI filesystem is compressed, encrypted, signed, split into a series of 10MB chunks and uploaded into Amazon S3 for storage. An XML manifest file stores information about the AMI, including name, version, architecture, default kernel id, decryption key and digests for all of the filesystem chunks. An AMI does not include a kernel image, only a pointer to the default kernel id, which can be chosen from an approved list of safe kernels maintained by Amazon and its partners (e.g., RedHat, Canonical, Microsoft). Users may choose kernels other than the default when booting an AMI.
QCOW2 – QEMU “Copy on Write” Version 2
qcow stands for "QEMU Copy On Write" and denotes a disk storage optimization strategy that delays allocation of storage until it is actually needed. QEMU is an emulator and virtual machine container, and it can use a variety of virtual disk images which are generally associated with specific guest operating systems. qcow2 is a newer version of the qcow format. QEMU can use a base image which is read-only, and store all writes to the qcow2 image. Among the QEMU supported formats, this is the most versatile format. Features include smaller images (useful if the filesystem does not support holes, for example on FAT32), optional AES encryption, zlib based compression and support of multiple VM snapshots. qemu and xen have retained the qcow format for backwards compatibility. Users can easily convert qcow disk images to the qcow2 format.
VMDK – Virtual Machine Disk
VMDK (Virtual Machine Disk) is a file format used for virtual appliances developed for VMware products. The format is a container for virtual hard disk drives to be used in virtual machines like VMware Workstation or VirtualBox. VMDK is an open format.
IMG
The IMG file extension is used by files which are standardized raw dumps of a disk, and by files in various formats created by different imaging programs. Xen can use raw disk images and physical disks as filesystems for a Xen based domainU. Another option is to use the disk images used by QEMU.
VHD – Virtual Hard Disk
Virtual Hard Disk format started by Connectix (now part of Microsoft), made open through the Microsoft Open Specification Promise. VHDs are implemented as files that reside on the native host file system. The following types of VHD formats are supported by Microsoft Virtual PC and Virtual Server:
• Fixed hard disk image: a file that is allocated to the size of the virtual disk. Fixed VHDs consist of a raw disk image followed by a VHD footer (512 or formerly 511 bytes).
• Dynamic hard disk image: a file that at any given time is as large as the actual data written to it, plus the size of the header and footer. Dynamic and differencing VHDs begin with a copy of the VHD footer (padded to 512 bytes), and for dynamic or differencing VHDs created by Microsoft products this results in a VHD-cookie string conectix at the beginning of the VHD file.
• Differencing hard disk image: a set of modified blocks (maintained in a separate file referred to as the "child image") in comparison to a parent image. The Differencing hard disk image format allows the concept of Undo Changes: when enabled, all changes to a hard drive contained within a VHD (the parent image) are stored in a separate file (the child image). Options are available to undo the changes to the VHD, or to merge them permanently into the VHD. Different child images based on the same parent image also allow "cloning" of VHDs; at least the globally unique identifier (GUID) must be different.
• Linked to a hard disk: a file which contains a link to a physical hard drive or partition of a physical hard drive
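The footer layout described in the VHD note is enough to tell the two main image kinds apart from the file alone. The following is a minimal sketch based only on that description: it looks for the `conectix` cookie at the start of the file (dynamic or differencing) or at the start of the final 512 bytes (fixed). The `vhd_kind` function is a hypothetical helper, it ignores the legacy 511-byte footer, and it does not validate the rest of the footer.

```python
import os

VHD_COOKIE = b"conectix"  # cookie string at the start of a VHD footer
FOOTER_SIZE = 512         # current footer size in bytes


def vhd_kind(path):
    """Return 'dynamic-or-differencing', 'fixed', or None if no cookie is found."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Dynamic and differencing images begin with a copy of the footer.
        if f.read(len(VHD_COOKIE)) == VHD_COOKIE:
            return "dynamic-or-differencing"
        # Fixed images are raw disk data followed by the footer at the very end.
        if size >= FOOTER_SIZE:
            f.seek(size - FOOTER_SIZE)
            if f.read(len(VHD_COOKIE)) == VHD_COOKIE:
                return "fixed"
    return None
```

A result of `None` only means the cookie was not found where this sketch looks; the file may still be a raw IMG dump or another format entirely.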
Appliances are like toasters, they do one thing very well.
Bitnami
BitNami Cloud Images allow BitNami Stacks to run in a cloud computing environment. BitNami offers Amazon Machine Images (AMIs) for running BitNami Stacks on the Amazon Cloud, as well as BitNami Cloud Hosting, a service that simplifies the process of running open source applications on Amazon EC2.
BoxGrinder
BoxGrinder supports many virtualization and Cloud platforms like EC2, Xen, KVM, VMware. You can create an appliance based on Fedora, Red Hat Enterprise Linux or CentOS. You are of course free to write your own plugin to support any other virtualization platform or operating system.
Oz
Oz is a command-line tool that has the ability to create images for common Linux distributions. There are lots of tools for image building. Oz is a bit different from most others in that it actually spawns a VM to do an install, while most other tools simply use a loopback mounted filesystem.
SUSE Studio
• SUSE Studio allows you to use a hosted build service and an on-premise virtual build system.
• Has a RESTful API to make calls to SUSE Studio
• openSUSE, SUSE Enterprise Linux (SuSE) and JeOS
• Integrates with SUSE Lifecycle Management Server and WebYaST
• Can share images in the SUSE Studio Gallery
Other projects:
• Imagefactory - http://imgfac.org/ - imagefactory builds images for a variety of operating system/cloud combinations.
• UShareSoft – Create cloud Server Templates on any OS in minutes via a SaaS
Scale Up Scale Out
CloudStack – www.cloudstack.org – CloudStack is an Apache Software Foundation project released under ASL 2.0 that provides a highly capable IaaS solution for service providers and enterprises.
• Robust Web Interface
• Comprehensive API
• Secure Single Sign-On
• Dynamic Workload Management
• Xenserver, Xen Cloud Platform, KVM, VMware, OracleVM support
• Secure AJAX Console for VMs
• Networking-as-a-Service (Create VLANs to segregate traffic)
• EC2 API Compatibility
• Usage Metering
Eucalyptus – http://open.eucalyptus.com – IaaS platform originally targeted to provide a migration path from Amazon EC2 to private cloud.
• Amazon AWS Interface Compatibility
• Supports Amazon AMI
• High Availability
• Network Management, Security Groups, Traffic Isolation
• Self Service
• S3-compatible, bucket-based storage
• Xen and KVM Hypervisor Support (VMware in Enterprise Edition)
• User Group and Role-Based Management
• Single Data Center
OpenStack – www.openstack.org
OpenStack Compute (Nova) – Nova is a cloud orchestration platform similar to Amazon EC2.
• Orchestration of popular hypervisors (Xen, Xenserver, KVM, Hyper-V, VMware, Linux Containers)
• Floating IP Addresses (keep IPs and DNS correct when restarting VMs)
• VNC proxy through the Web
• Apache 2.0 License
• Android/iOS Clients
• Block Storage Support (AOE, iSCSI, Sheepdog)
OpenStack Storage (Swift) – An S3-style solution used for long-term storage, not real-time access. Swift is used for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data.
Features:
• Store and manage files programmatically
• Create public and private folders
• Uses commodity hardware
• Fault tolerant (Nodes/HDD)
• Scale-out, Scale-Up
OpenStack Image Service (Glance) – OpenStack Image Service (code-named Glance) provides discovery, registration, and delivery services for virtual disk images.
Features:
• Provides images-as-a-service
• Supports Raw, VHD, VDI, qcow2, VMDK, OVF
• RESTful API
• Backend Options – Swift, Local, S3, HTTP
• Version Control and Logging
OpenNebula – http://www.opennebula.org/ – Cloud Computing Toolkit, Apache license
Nova – Compute fabric controller, similar to Amazon EC2
Nova is the project name for OpenStack Compute, a cloud computing fabric controller, the main part of an IaaS system. Individuals and organizations can use Nova to host and manage their own cloud computing.
• Component-based architecture: quickly add new behaviors
• Highly available: scale to very serious workloads
• Fault-tolerant: isolated processes avoid cascading failures
• Recoverable: failures should be easy to diagnose, debug, and rectify
• Open standards: be a reference implementation for a community-driven API
• API compatibility: Nova strives to be API-compatible with popular systems like Amazon EC2
Object Storage – Swift – Object storage like Amazon S3
Object Storage is ideal for cost-effective, scale-out storage. It provides a fully distributed, API-accessible storage platform that can be integrated directly into applications or used for backup, archiving and data retention.
Block Storage allows block devices to be exposed and connected to compute instances for expanded storage, better performance and integration with enterprise storage platforms, such as NetApp, Nexenta and SolidFire.
Image Service – “Glance”
The OpenStack Image Service provides discovery, registration and delivery services for disk and server images. The ability to copy or snapshot a server image and immediately store it away is a powerful capability of the OpenStack cloud operating system. Stored images can be used as a template to get new servers up and running quickly, and more consistently if you are provisioning multiple servers, than installing a server operating system and individually configuring additional services. It can also be used to store and catalog an unlimited number of backups.
Identity Service – “Keystone”
OpenStack Identity provides a central directory of users mapped to the OpenStack services they can access. It acts as a common authentication system across the cloud operating system and can integrate with existing backend directory services like LDAP. It supports multiple forms of authentication including standard username and password credentials, token-based systems and AWS-style logins.
Additionally, the catalog provides a queryable list of all of the services deployed in an OpenStack cloud in a single registry. Users and third-party tools can programmatically determine which resources they can access.
As an administrator, OpenStack Identity enables you to:
• Configure centralized policies across users and systems
• Create users and tenants and define permissions for compute, storage and networking resources using role-based access control (RBAC) features
• Integrate with an existing directory like LDAP, allowing for a single source of identity authentication across the enterprise
Quantum
The Quantum API allows for creation and management of “virtual networks”, each of which can have one or more “ports”. A port on a virtual network can be attached to a “network interface”, where a “network interface” is anything which can source traffic, such as a vNIC exposed by a virtual machine, an interface on a load balancer, and so on. These abstractions offered by Quantum (virtual networks, virtual ports, and network interfaces) are the building blocks for building and managing logical network topologies. The technology that implements Quantum is fully decoupled from the API (that is, the backend is “pluggable”).
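The virtual network, port and network interface abstractions described for Quantum can be sketched as a small data model. This is an illustrative sketch only, not the Quantum API: the class names, the methods create_port, plug and unplug, and the interface identifier "vm-42-vnic0" are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

_port_ids = count(1)  # hypothetical ID generator for this sketch


@dataclass
class Port:
    port_id: int
    # Anything that can source traffic, e.g. a VM's vNIC (hypothetical name).
    attached_interface: Optional[str] = None


@dataclass
class VirtualNetwork:
    name: str
    ports: Dict[int, Port] = field(default_factory=dict)

    def create_port(self) -> Port:
        """Add a new, unattached port to this virtual network."""
        port = Port(port_id=next(_port_ids))
        self.ports[port.port_id] = port
        return port

    def plug(self, port_id: int, interface: str) -> None:
        """Attach a network interface to a port; a port holds one interface."""
        port = self.ports[port_id]
        if port.attached_interface is not None:
            raise ValueError(f"port {port_id} is already in use")
        port.attached_interface = interface

    def unplug(self, port_id: int) -> None:
        """Detach whatever interface is attached to the port."""
        self.ports[port_id].attached_interface = None


net = VirtualNetwork("tenant-a-web")
p = net.create_port()
net.plug(p.port_id, "vm-42-vnic0")
```

Because callers only see networks, ports and interfaces, the mechanism behind plug (VLANs, tunnels, an SDN controller) could be swapped without changing this interface, which is the point of a pluggable backend.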
Types of Tasks Accomplished by an API
• Provisioning (creating, re-creating, moving, or deleting components, e.g. virtual machines, VLANs)
• Configuration (assigning or changing attributes of the architecture such as security and network settings)
Cloud Providers
• jclouds – Java API abstraction
• Libcloud – started by CloudKick (now Rackspace) to abstract clouds, Apache incubator project
• Deltacloud – started by Red Hat to abstract clouds, Apache incubator project
• Fog – provider and abstraction-level API across compute and storage, written in Ruby
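The idea behind these abstraction libraries, one interface for provisioning and configuration regardless of provider, can be sketched as follows. This is a minimal sketch under stated assumptions: CloudDriver, InMemoryDriver and provision_web_tier are hypothetical names invented for illustration and do not reproduce the API of jclouds, Libcloud, Deltacloud or Fog.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CloudDriver(ABC):
    """Hypothetical provider-neutral interface."""

    @abstractmethod
    def create_vm(self, name: str, size: str) -> str:
        """Provisioning: create a VM and return its ID."""

    @abstractmethod
    def destroy_vm(self, vm_id: str) -> None:
        """Provisioning: delete a VM."""

    @abstractmethod
    def set_security_group(self, vm_id: str, group: str) -> None:
        """Configuration: change an attribute of an existing VM."""


class InMemoryDriver(CloudDriver):
    """Stand-in for a real provider so the sketch runs without credentials."""

    def __init__(self) -> None:
        self.vms: Dict[str, Dict[str, str]] = {}

    def create_vm(self, name: str, size: str) -> str:
        vm_id = f"vm-{len(self.vms) + 1}"
        self.vms[vm_id] = {"name": name, "size": size, "group": "default"}
        return vm_id

    def destroy_vm(self, vm_id: str) -> None:
        del self.vms[vm_id]

    def set_security_group(self, vm_id: str, group: str) -> None:
        self.vms[vm_id]["group"] = group


def provision_web_tier(driver: CloudDriver, how_many: int) -> List[str]:
    """Works against any driver: provision first, then configure."""
    ids = [driver.create_vm(f"web{i}", "small") for i in range(how_many)]
    for vm_id in ids:
        driver.set_security_group(vm_id, "web")
    return ids


driver = InMemoryDriver()
web_ids = provision_web_tier(driver, 2)
```

Only InMemoryDriver would need to be replaced to target a different cloud; provision_web_tier stays unchanged.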
Primary Storage, Secondary Storage
GlusterFS
GlusterFS is an open source scale-out NAS solution. The software is a powerful and flexible solution that simplifies the task of managing unstructured file data whether you have a few terabytes of storage or multiple petabytes.
Ceph
Ceph is a distributed network storage and file system designed to provide excellent performance, reliability, and scalability. Ceph is based on a reliable and scalable distributed object store, with a distributed metadata management cluster layered on top to provide a distributed file system with POSIX semantics. There are a variety of ways to interact with the system. Used by Bloomberg and DreamHost.
OpenStack Storage (code-named Swift)
OpenStack Storage is open source software for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. It is not a file system or real-time data storage system, but rather a long-term storage system for a more permanent type of static data that can be retrieved, leveraged, and then updated if necessary. Primary examples of data that best fit this type of storage model are virtual machine images, photo storage, email storage and backup archiving. Having no central "brain" or master point of control provides greater scalability, redundancy and permanence.
Riak Cloud Storage
Riak Cloud Storage is simple, available storage software built on top of Riak. It features:
• Large object support (up to 5GB/object)
• S3-compatible API and authentication
• Multi-tenancy and per-user reporting
• Per-node or capacity-based pricing
• Multi-datacenter replication
Sheepdog
Sheepdog is a distributed storage system for QEMU/KVM. It provides highly available block-level storage volumes that can be attached to QEMU/KVM virtual machines. Sheepdog scales to several hundred nodes, and supports advanced volume management features such as snapshot, cloning, and thin provisioning.
Cloud Foundry
Cloud Foundry is a VMware-led project for building a Platform as a Service (PaaS) offering. Cloud Foundry provides a platform for building, deploying, and running cloud apps using Spring for Java developers, Rails and Sinatra for Ruby developers, Node.js, and other JVM frameworks including Grails.
Cloudify
Cloudify is designed to bring any app to any cloud, enabling enterprises, ISVs, and managed service providers alike to quickly benefit from the cloud automation and elasticity organizations need today. Cloudify helps you maximize application onboarding and automation by externally orchestrating the application deployment and runtime. Cloudify’s DevOps approach treats infrastructure as code, enabling you to describe deployment and post-deployment steps for any application through an external blueprint (also known as a recipe), which you can then take from cloud to cloud, unchanged. Cloudify recipes are available on GitHub.
OpenShift
A free Platform-as-a-Service that enables developers to deploy apps written in multiple frameworks and languages across clouds. Open source licensing is forthcoming.
Stackato
Stackato enables you to create a private PaaS hosted on the cloud of your choice (your own or with a hosting provider) to empower your developers to deploy, run, and manage their applications in the cloud. Stackato includes a multi-choice cloud application platform with automatic provisioning:
• Choice of language (Java, Python, PHP, Ruby, Perl, Node.js, Erlang, Scala, Clojure)
• Choice of framework (popular frameworks for each of the languages above, such as Spring, Django, Pyramid, Rails, Mojolicious, Catalyst and more)
• Choice of data service (MySQL, PostgreSQL, Redis, MongoDB) plus the ability to connect to others
WSO2
The WSO2 middleware platform offers a full range of core services: application server, enterprise service bus (ESB), governance registry and repository, identity and access management, business process management (BPM), business activity monitor (BAM), portal server and more.
WSO2 Stratos monitors CPU, memory and bandwidth utilization, and SLAs. Then it automatically scales up or down depending on the load. When new resources are needed, WSO2 Stratos transparently adds services, and when load goes down, WSO2 Stratos automatically brings services down. Dynamic discovery enables services to automatically detect when resource allocations change; there is no need for manual monitoring or reconfiguration.
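The scale-up and scale-down behaviour described above can be illustrated with a simple threshold rule. This is a generic sketch, not how WSO2 Stratos is implemented: the Metrics fields, the 75% and 25% thresholds, and the instance limits are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Utilization as fractions between 0.0 and 1.0 (hypothetical shape)."""
    cpu: float
    memory: float
    bandwidth: float


def desired_instances(current: int, m: Metrics,
                      high: float = 0.75, low: float = 0.25,
                      min_n: int = 1, max_n: int = 10) -> int:
    """Return the instance count to run next, one step at a time.

    The busiest resource drives the decision: add an instance when any
    resource is above `high`, remove one when all are below `low`.
    """
    peak = max(m.cpu, m.memory, m.bandwidth)
    if peak > high:
        return min(current + 1, max_n)
    if peak < low:
        return max(current - 1, min_n)
    return current


busy = desired_instances(2, Metrics(cpu=0.90, memory=0.30, bandwidth=0.20))
idle = desired_instances(2, Metrics(cpu=0.10, memory=0.20, bandwidth=0.10))
```

The gap between the two thresholds keeps the instance count from flapping when load hovers near a single cut-off.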
In the SDN architecture, the control and data planes are decoupled, network intelligence and state are logically centralized, and the underlying network infrastructure is abstracted from the applications. As a result, enterprises and carriers gain unprecedented programmability, automation, and network control, enabling them to build highly scalable, flexible networks that readily adapt to changing business needs.
Software Defined Networking (SDN) is an emerging network architecture where network control is decoupled from forwarding and is directly programmable. This migration of control, formerly tightly bound in individual network devices, into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity. This figure depicts a logical view of the SDN architecture. Network intelligence is (logically) centralized in software-based SDN controllers, which maintain a global view of the network. As a result, the network appears to the applications and policy engines as a single, logical switch. With SDN, enterprises and carriers gain vendor-independent control over the entire network from a single logical point, which greatly simplifies the network design and operation. SDN also greatly simplifies the network devices themselves, since they no longer need to understand and process thousands of protocol standards but merely accept instructions from the SDN controllers.
The limitations of the hardware-dependent network are preventing enterprises from realizing the full potential of their cloud, and vastly limiting the return on their investment. To get the most from your cloud, you must untether your network.
OpenFlow
OpenFlow is an open standard that enables researchers to run experimental protocols in the campus networks we use every day. OpenFlow is added as a feature to commercial Ethernet switches, routers and wireless access points, and provides a standardized hook to allow researchers to run experiments without requiring vendors to expose the internal workings of their network devices. OpenFlow is currently being implemented by major vendors, with OpenFlow-enabled switches now commercially available.
In a classical router or switch, the fast packet forwarding (data path) and the high-level routing decisions (control path) occur on the same device. An OpenFlow Switch separates these two functions. The data path portion still resides on the switch, while high-level routing decisions are moved to a separate controller, typically a standard server. The OpenFlow Switch and Controller communicate via the OpenFlow protocol, which defines messages such as packet-received, send-packet-out, modify-forwarding-table, and get-stats.
The data path of an OpenFlow Switch presents a clean flow table abstraction; each flow table entry contains a set of packet fields to match, and an action (such as send-out-port, modify-field, or drop). When an OpenFlow Switch receives a packet it has never seen before, for which it has no matching flow entries, it sends this packet to the controller. The controller then makes a decision on how to handle this packet. It can drop the packet, or it can add a flow entry directing the switch on how to forward similar packets in the future.
OpenFlow is the first standard communications interface defined between the control and forwarding layers of an SDN architecture. OpenFlow allows direct access to and manipulation of the forwarding plane of network devices such as switches and routers, both physical and virtual (hypervisor-based). It is the absence of an open interface to the forwarding plane that has led to the characterization of today’s networking devices as monolithic, closed, and mainframe-like. No other standard protocol does what OpenFlow does, and a protocol like OpenFlow is needed to move network control out of the networking switches to logically centralized control software.
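The flow table behaviour described above (match on packet fields, apply an action, and hand unmatched packets to the controller, which may install a new entry) can be modelled in a few lines. This is a toy model, not the OpenFlow wire protocol: the FlowEntry, Switch and Controller classes, the "dst" field and the action strings are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List

Packet = Dict[str, str]  # header field name -> value


@dataclass
class FlowEntry:
    match: Dict[str, str]  # every listed field must equal the packet's value
    action: str            # e.g. "output:2" or "drop"

    def matches(self, packet: Packet) -> bool:
        return all(packet.get(k) == v for k, v in self.match.items())


class Controller:
    """Control path: decides what to do with packets the switch cannot match."""

    def __init__(self, port_for_dst: Dict[str, int]) -> None:
        self.port_for_dst = port_for_dst

    def packet_in(self, switch, packet: Packet) -> str:
        port = self.port_for_dst.get(packet["dst"])
        action = f"output:{port}" if port is not None else "drop"
        # Install a flow entry so similar packets are handled on the switch.
        switch.flow_table.append(FlowEntry({"dst": packet["dst"]}, action))
        return action


class Switch:
    """Data path: forwards by flow table, asks the controller on a miss."""

    def __init__(self, controller: Controller) -> None:
        self.controller = controller
        self.flow_table: List[FlowEntry] = []
        self.misses = 0

    def receive(self, packet: Packet) -> str:
        for entry in self.flow_table:
            if entry.matches(packet):
                return entry.action
        self.misses += 1
        return self.controller.packet_in(self, packet)


sw = Switch(Controller({"10.0.0.2": 2}))
first = sw.receive({"src": "10.0.0.1", "dst": "10.0.0.2"})   # table miss
second = sw.receive({"src": "10.0.0.9", "dst": "10.0.0.2"})  # table hit
```

Only the first packet for a destination reaches the controller; later packets are matched by the installed entry, which is why the split between control path and data path does not slow down steady-state forwarding.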
Floodlight - http://floodlight.openflowhub.org/
The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller. It is supported by a community of developers including a number of engineers from Big Switch Networks.
OpenFlow is an open standard managed by the Open Networking Foundation (ONF). It specifies a protocol through which a remote controller can modify the behavior of networking devices through a well-defined “forwarding instruction set”. Floodlight is designed to work with the growing number of switches, routers, virtual switches, and access points that support the OpenFlow standard.
Open Daylight – http://www.opendaylight.com
The adoption of new technologies and pursuit of programmable networks has the potential to significantly improve levels of functionality, flexibility and adaptability of mainstream datacenter architectures. To leverage this abstraction to its fullest requires the network to adapt and evolve to a Software-Defined architecture. One of the architectural elements required to achieve this goal is a Software-Defined Networking (SDN) platform that enables network control and programmability.
OpenStack Networking “Quantum” – https://www.openstack.org/software/openstack-networking/
OpenStack Networking is a pluggable, scalable and API-driven system for managing networks and IP addresses. Like other aspects of the cloud operating system, it can be used by administrators and users to increase the value of existing datacenter assets. OpenStack Networking ensures the network will not be the bottleneck or limiting factor in a cloud deployment and gives users real self service, even over their network configurations.
Networking Capabilities
• OpenStack provides flexible networking models to suit the needs of different applications or user groups. Standard models include flat networks or VLANs for separation of servers and traffic.
• OpenStack Networking manages IP addresses, allowing for dedicated static IPs or DHCP. Floating IPs allow traffic to be dynamically rerouted to any of your compute resources, which allows you to redirect traffic during maintenance or in the case of failure.
• Users can create their own networks, control traffic and connect servers and devices to one or more networks.
• The pluggable backend architecture lets users take advantage of commodity gear or advanced networking services from supported vendors.
• Administrators can take advantage of software-defined networking (SDN) technology like OpenFlow to allow for high levels of multi-tenancy and massive scale.
• OpenStack Networking has an extension framework allowing additional network services, such as intrusion detection systems (IDS), load balancing, firewalls and virtual private networks (VPN) to be deployed and managed.
Open vSwitch
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag). In addition, it is designed to support distribution across multiple physical servers similar to VMware's vNetwork distributed vswitch or Cisco's Nexus 1000V.
Deep Thought is a computer that was created by the pan-dimensional, hyper-intelligent species of beings (whose three-dimensional protrusions into our universe are ordinary white mice) to come up with the Answer to The Ultimate Question of Life, the Universe, and Everything. Deep Thought is the size of a small city. When, after seven and a half million years of calculation, the answer finally turns out to be 42, Deep Thought admonishes Loonquawl and Phouchg (the receivers of the Ultimate Answer) that "[he] checked it very thoroughly, and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question was."
Deep Thought does not know the ultimate question to Life, the Universe and Everything, but offers to design an even more powerful computer, Earth, to calculate it. After ten million years of calculation, the Earth is destroyed by Vogons five minutes before the computation is complete.
NoSQL
In computing, NoSQL (commonly interpreted as "not only SQL") is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation.
NoSQL database systems are often highly optimized for retrieval and appending operations and often offer little functionality beyond record storage (e.g. key-value stores). The reduced run-time flexibility compared to full SQL systems is compensated by marked gains in scalability and performance for certain data models.
Apache Cassandra
The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.
Cassandra's ColumnFamily data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching. Cassandra is in use at Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Digg, CloudKick, Ooyala, and more companies that have large, active data sets. The largest known Cassandra cluster has over 300 TB of data in over 400 machines.
Hypertable
Hypertable is based on a design developed by Google (i.e. it is a BigTable clone) to meet their scalability requirements and solves the scale problem better than any of the other NoSQL solutions out there.
MongoDB
Redis
Redis is an open source, BSD-licensed, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
Riak
MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers. The model is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms. MapReduce libraries have been written in many programming languages. A popular free implementation is Apache Hadoop.
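The model can be illustrated with the usual word-count example. The sketch below runs in a single process and only shows the shape of the map, shuffle and reduce phases; it is not how Hadoop or Google's implementation distributes work, and the function names are chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the cloud", "The open cloud"])
```

Because each map call looks at one record and each reduce call at one key, both phases can in principle be spread across many machines, which is what a framework such as Hadoop takes care of.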
Chukwa - http://incubator.apache.org/chukwa/
Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and MapReduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results, in order to make the best use of this collected data.
ZooKeeper - http://zookeeper.apache.org/
• MeatCloud can’t keep up with cloud computing
• DevOps & Agile IT philosophy
• Script repetitive tasks
• Automate, automate, automate
Other disciplines like backup, log management, performance and security (virus and intrusion detection) are important but not core to the delivery of cloud computing systems.
Ideally, you create management toolchains that automate the management of your cloud, so that the output of one tool informs the input of another.
These tools are all appropriate for Linux guest operating systems; Windows operating system provisioning is not well addressed in OSS.
Axembler
Provisionr
Provisionr solves the problem of cloud portability by completely hiding the APIs and focusing only on building a cluster that matches the same set of assumptions on all clouds, such as a specific OS, pre-installed packages and binaries, sane DNS settings, SSH and VPN access, etc. Think of it as a solid foundation for configuration. As a secondary goal, Provisionr will also provide primitives for building automatic or semi-automatic workflows for configuring and monitoring services, workflows that assume that all the machines share a common set of characteristics as described above.
Cobbler
Cobbler is a Linux installation server that allows for rapid setup of network installation environments. It glues together and automates many associated Linux tasks so you do not have to hop between lots of various commands and applications when rolling out new systems, and, in some cases, changing existing ones. With a simple series of commands, network installs can be configured for PXE, reinstallations, media-based net-installs, and virtualized installs (supporting Xen, qemu, KVM, and some variants of VMware). Cobbler uses a helper program called 'koan' (which interacts with Cobbler) for reinstallation and virtualization support.
Crowbar
Bare metal provisioning for CloudStack developed by Dell using Opscode Chef.
Juju
Metal as a Service (MAAS)
MAAS offers a nice UI to provision your Ubuntu servers. Each physical server (“node”) will be commissioned automatically on first boot. During the commissioning process administrators are able to configure hardware settings manually before an automated smoke test and burn-in test are done. Once commissioned, a node can be deployed on demand by name, or allocated to a queue for dynamic allocation to services being deployed on this MAAS.
Salt Cloud
Salt Cloud is a tool for provisioning salted minions across various cloud providers. Currently supported providers are:
• Amazon EC2
• GoGrid
• HP Cloud (using OpenStack)
• Joyent
• Linode
• OpenStack
• Rackspace (using OpenStack)
The salt-cloud command can be used to query configured providers, create VMs on them, deploy salt-minion on those VMs and destroy them when no longer needed.
Salt Cloud requires Salt to be installed, but does not require any Salt daemons to be running. However, if used in a salted environment, it is best to run Salt Cloud on the salt-master, so that it can properly lay down salt keys when it deploys machines, and then properly remove them later. If Salt Cloud is run in this manner, minions will automatically be approved by the master; there is no need to manually authenticate them later.
Deprecated
Spacewalk
Spacewalk manages software content updates for Red Hat derived distributions such as Fedora, CentOS, and Scientific Linux, within your firewall. You can stage software content through different environments, managing the deployment of updates to systems and allowing you to view which update level any given system is at across your deployment. A clean central web interface allows viewing of systems and their software update status, and initiating update actions.
Salt - https://github.com/saltstack/salt
Ansible
Ansible's SSH-key based access allows contributors to the Fedora Project to assist in automating infrastructure while having access limited appropriately. Ansible is also used to roll out and manage clusters of machines and ISV software, such as Basho's flagship key-value store Riak.
Capistrano
Capistrano is a developer tool for deploying web applications. It is typically installed on a workstation, and used to deploy code from your source code management (SCM) system to one or more servers. Capistrano recently added classes capabilities that match Cobbler.
RunDeck
RunDeck is cross-platform open source software that helps you automate ad-hoc and routine procedures in data center or cloud environments. RunDeck allows you to run tasks on any number of nodes from a web-based or command-line interface. RunDeck also includes other features that make it easy to scale up your scripting efforts including: access control, workflow building, scheduling, logging, and integration with external sources for node and option data.
Func
Func allows for running commands on remote systems in a secure way, like SSH, but offers several improvements.
• Func allows you to manage an arbitrary group of machines all at once.
• Func automatically distributes certificates to all "slave" machines. There's almost nothing to configure.
• Func comes with a command line for sending remote commands and gathering data.
• There are lots of modules already provided for common tasks.
• Anyone can write their own modules using the simple Python module API.
• Everything that can be done with the command line can be done with the Python client API. The hack potential is unlimited.
• You'll never have to use "expect" or other ugly hacks to automate your workflow.
• It's really simple under the covers. Func works over XMLRPC and SSL.
• Since Func uses certmaster, any program can use Func certificates, latch on to them, and take advantage of secure master-to-slave communication.
• There are no databases or crazy stuff to install and configure. Again, certificate distribution is automatic too.
MCollective
The Marionette Collective, AKA MCollective, is a framework to build server orchestration or parallel job execution systems. MCollective is used as a means of programmatic execution of systems administration actions on clusters of servers. MCollective uses modern tools like publish-subscribe middleware and modern philosophies like real-time discovery of network resources using metadata rather than hostnames, delivering a very scalable and very fast parallel execution environment.
Scalr
Scalr is a pretty darn good open source cloud management tool. It provides both an automation framework (do Foo when Bar) and a web interface (where is this volume mounted) for managing infrastructure on the cloud, like EC2.
FEATURES
• Integrated into Opscode Chef, for configuration management
• Pre-automated software, such as nginx, mysql, redis, mongo, and rabbitmq
• Blazing fast UI
• Multi-cloud
• More at http://scalr.net/features/
ROADMAP
• http://wiki.scalr.net/Roadmap
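The MCollective note above contrasts discovery by metadata with addressing machines by hostname. A rough sketch of that idea follows. It is not MCollective's API: the inventory dictionary, the fact names and the discover and run_on helpers are hypothetical, and real MCollective does this over publish-subscribe middleware rather than a local dictionary.

```python
from typing import Callable, Dict, List

Facts = Dict[str, str]

# Hypothetical inventory: each node advertises facts about itself.
inventory: Dict[str, Facts] = {
    "node-a": {"role": "web", "env": "prod", "os": "centos"},
    "node-b": {"role": "web", "env": "dev", "os": "centos"},
    "node-c": {"role": "db", "env": "prod", "os": "ubuntu"},
}


def discover(nodes: Dict[str, Facts], **wanted: str) -> List[str]:
    """Select nodes whose facts match every requested key/value pair."""
    return sorted(
        name for name, facts in nodes.items()
        if all(facts.get(k) == v for k, v in wanted.items())
    )


def run_on(targets: List[str], action: Callable[[str], str]) -> Dict[str, str]:
    """Apply an action to each discovered node and collect the replies."""
    return {name: action(name) for name in targets}


prod_web = discover(inventory, role="web", env="prod")
replies = run_on(prod_web, lambda name: f"restarted nginx on {name}")
```

The caller never lists hostnames, so nodes that appear or disappear are picked up by the next discovery without any change to the command.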
Automated Toolchain (for Linux guests)
• A bootstrapped image is launched from a template in the cloud provider, then searches for the Cobbler server.
• Post-install from Cobbler kicks off Puppet with a defined management class to configure the server using roles.
• Services can then be started and stopped with RunDeck or post-install scripts.
• RunDeck can then insert new hosts in Zenoss or Nagios.
• Finally, as network conditions change, Zenoss can remediate via other tools based on situational awareness.
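The toolchain idea, where the output of one tool becomes the input of the next, can be sketched as a chain of stages. The stage functions below only stand in for the tools named above and call none of their real APIs; the host dictionary, its keys and the role-to-service mapping are hypothetical.

```python
from typing import Callable, Dict, List

Host = Dict[str, object]
Stage = Callable[[Host], Host]


def provision(host: Host) -> Host:
    """Stands in for the Cobbler install step."""
    return {**host, "os_installed": True}


def configure(host: Host) -> Host:
    """Stands in for Puppet applying a role; needs an installed OS."""
    if not host.get("os_installed"):
        raise RuntimeError("cannot configure a host with no OS")
    return {**host, "role_applied": host["role"]}


def start_services(host: Host) -> Host:
    """Stands in for RunDeck starting the services the role calls for."""
    services = {"web": ["nginx"], "db": ["mysql"]}.get(host["role_applied"], [])
    return {**host, "running": services}


def register_monitoring(host: Host) -> Host:
    """Stands in for adding the host to Zenoss or Nagios."""
    return {**host, "monitored": True}


STAGES: List[Stage] = [provision, configure, start_services, register_monitoring]


def run_toolchain(host: Host, stages: List[Stage]) -> Host:
    """Feed each stage's output into the next stage."""
    for stage in stages:
        host = stage(host)
    return host


result = run_toolchain({"name": "web01", "role": "web"}, STAGES)
```

Each stage checks only what the previous one produced, so a tool can be replaced as long as its replacement emits the same fields.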
Netflix AWS Toolbag – http://netflix.github.com
Over 25 projects developed by Netflix to manage their cloud deployments.
Asgard
Asgard is a web-based tool for managing cloud-based applications and infrastructure.
Astyanax
Astyanax is a high-level Java client for Apache Cassandra. Apache Cassandra is a highly available column-oriented database.
Edda
Edda is a service to track changes in your cloud deployments.
Eureka
Eureka is a REST (Representational State Transfer) based service that is primarily used in the AWS cloud for locating services for the purpose of load balancing and failover of middle-tier servers. At Netflix, Eureka is used for the following purposes apart from playing a critical part in mid-tier load balancing:
• For aiding Netflix Asgard, an open source service which makes cloud deployments easier, in:
  • Fast rollback of versions in case of problems, avoiding the re-launch of hundreds of instances, which could take a long time.
  • Rolling pushes, for avoiding propagation of a new version to all instances in case of problems.
• For our Cassandra deployments, to take instances out of traffic for maintenance.
• For our memcached caching services, to identify the list of nodes in the ring.
Priam
Priam is a process/tool that runs alongside Apache Cassandra to automate the following:
• Backup and recovery (complete and incremental)
• Token management
• Seed discovery
• Configuration
• Support for AWS environments
Simian Army
The Simian Army is a suite of tools for keeping your cloud operating in top form. Chaos Monkey, the first member, is a resiliency tool that helps ensure that your applications can tolerate random instance failures.
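The Chaos Monkey idea, deliberately removing a random instance and checking that the service still holds up, can be sketched as below. This is not Netflix's implementation: the unleash and service_is_healthy helpers, the instance IDs and the health rule (at least one instance left) are assumptions made for illustration.

```python
import random
from typing import List, Optional


def unleash(instances: List[str], rng: random.Random) -> Optional[str]:
    """Remove and return one randomly chosen instance, or None if empty."""
    if not instances:
        return None
    victim = rng.choice(instances)
    instances.remove(victim)
    return victim


def service_is_healthy(instances: List[str], minimum: int = 1) -> bool:
    """Hypothetical health rule: enough instances remain to serve traffic."""
    return len(instances) >= minimum


original = ["i-001", "i-002", "i-003"]
pool = list(original)
victim = unleash(pool, random.Random(0))  # seeded so the run is repeatable
healthy = service_is_healthy(pool)
```

Running such a test regularly against a pool with spare capacity is what gives confidence that an unplanned instance failure will not take the application down.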
Cacti
Cacti is a complete network graphing solution designed to harness the power of RRDtool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy-to-use interface that makes sense for LAN-sized installations up to complex networks with hundreds of devices.
RRDtool
RRDtool is the open source industry standard, high-performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, Perl, Python, Ruby, Lua or Tcl applications.
Graphite
Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.
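Sending a data point to carbon, as described for Graphite above, can be sketched with carbon's plaintext protocol, in which each line has the form "metric.path value timestamp". The metric name, the host "localhost" and the use of port 2003 (carbon's usual plaintext listener) are assumptions about the deployment; send_metric is defined but not called here because it needs a running carbon daemon.

```python
import socket
import time
from typing import Optional


def format_metric(path: str, value: float,
                  timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(line: str, host: str = "localhost", port: int = 2003) -> None:
    """Send a formatted line to carbon over TCP (host and port are assumed)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


line = format_metric("servers.web01.load", 0.42, 1366000000)
```

A collector would typically call format_metric in a loop and pass each line to send_metric; the dotted path determines where the series appears in Graphite's metric tree.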