A look at a selection of the private cloud use cases Paul has encountered, what they're doing, features they're using and features they'd like to see added or enhanced.
Paul will give examples of private CloudStack deployments, highlighting the interesting constraints and requirements that they had. Paul will explain how these requirements have been met, as well as where they haven't, leading into a look at features on the enterprises' wish list for CloudStack.
1. What’s The Use?!
Real Customer Use Cases
Paul Angus
Cloud Architect
paul.angus@shapeblue.com
Twitter: @ShapeBlue
2. @ShapeBlue #CloudStack #CCC13 CloudStack Collaboration Conference 2013
Real Customer Use Cases
How they’re using CloudStack
Their challenges
Their solutions (if they’ve got them)
Features and improvements they would like to see.
What’s The Use?!
3. About Me
Who am I
Cloud Architect with ShapeBlue
Worked with CloudStack since 2.2.13
View CloudStack from a ‘What can I practically do with it’ point of view
4. About ShapeBlue
“ShapeBlue are expert builders of public & private clouds. They are the leading global independent CloudStack / CloudPlatform integrator & consultancy”
6. Acknowledgements
The UK’s Largest Satellite Broadcasting Company
Noel King / John Turner - Paddy Power
Lee Walker - Trader Media
Stuart Jennings - Citrix
Geoff Higginbottom - ShapeBlue
8. Use Cases
Test & Dev
Highly scalable public facing applications
High speed server resource deployment
Reduced reliance on corporate infrastructure teams
I FIND YOUR LACK OF TESTING DISTURBING
10. A National Industry Regulator
Use case:
Create replicas of the production environment in which to develop / experiment
Once development is completed, the environment is used for UAT
Once UAT is passed, the environment becomes the ‘production’ environment
11. A National Industry Regulator
Work Flow
[Diagram: workflow linking Production Environment, Snapshot Backups, Production Templates, DEV / UAT Environment, Machine Templates and ISO Images]
12. Trader Media
Owns ‘AutoTrader’, which started in print media
Website receives 10 million unique users per month
An average of 833,000,000 page views per month
CloudStack environment is still in development
13. Trader Media
Traffic Profile (indicative)
[Chart: Hits across the day (Early Morning, Lunchtime, Late Evening) plotted against a Burst Threshold line]
14. Trader Media
‘Standard deployment’
[Diagram: two data centres, DC1 and DC2, each with multiple Hosts; two MySQL servers, two CloudStack management servers (CS Man) and a load balancer (LB)]
15. Trader Media
[Diagram: two data centres, DC1 and DC2, each with multiple Hosts; four CloudStack management servers (CS Man), MySQL Galera clusters and F5 load balancers]
17. Paddy Power
“If we haven’t yet done the marketing equivalent of running up and slapping you in the face, then please allow us to introduce ourselves. Paddy Power is Ireland’s biggest, most successful, security conscious and innovative bookmaker”
18. Paddy Power
Facts and figures
paddypower.com has 1.6 million users*
Annual revenue of $535m from online users*
CloudStack environment is still in development
*2012 figures
19. Paddy Power
Environment Templates
[Diagram: repeated environment templates, each made up of NEW APP, Tomcat and NOSQL components]
20. Paddy Power
Chaos Monkey / Simian Army (Problems as a Service)
Guest instances which are designed specifically to disrupt or test elements to prove the robustness of the overall architecture.
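To make the "Problems as a Service" idea concrete, here is a minimal Python sketch of the selection step such a disruptive guest instance might perform. It is not Paddy Power's or Netflix's implementation: the function name `pick_victims`, the instance names and the `fraction` parameter are all assumptions for illustration, and the sketch only chooses targets; it does not stop anything.

```python
import random


def pick_victims(instances, fraction=0.1, protected=(), rng=None):
    """Pick a random subset of instance names to disrupt (hypothetical helper).

    instances: iterable of instance names.
    fraction:  share of eligible instances to pick (at least one is picked).
    protected: names that must never be chosen.
    rng:       optional random.Random, so runs can be made repeatable.
    """
    if rng is None:
        rng = random.Random()
    # Sort so the candidate order does not depend on the input container.
    candidates = sorted(name for name in instances if name not in protected)
    if not candidates:
        return []
    count = max(1, int(len(candidates) * fraction))
    return rng.sample(candidates, count)


if __name__ == "__main__":
    # Illustrative fleet; the names are made up.
    fleet = ["web-1", "web-2", "app-1", "app-2", "nosql-1", "nosql-2"]
    for name in pick_victims(fleet, fraction=0.25, protected={"nosql-1"}):
        print(f"would disrupt {name}")
```

Passing a seeded `random.Random` makes a test run repeatable, which helps when a failure needs to be reproduced.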
21. Paddy Power
Faster transition from Development to Production
[Diagram: Development n-tier Environment (Web, App, DB, Virtual Cisco ASA) alongside Production n-tier Environment (Web, App, DB, Physical Cisco ASA)]
22. UK Satellite Broadcaster
Mobile viewing application has 3.26 million users
51 million monthly streamed items
12 million monthly streamed VOD views
Mobile Apps have a combined 6.3 million user base
23 million weekly on-demand downloads
23. UK Satellite Broadcaster
Security Groups - VPC Bottleneck
[Diagram: CloudStack Public network, GW and LB, with Web Tier, App Tier and Data Tier connected via the VPC Router]
24. UK Satellite Broadcaster
Security Groups
[Diagram: CloudStack External network, GW and LB, with Web Tier, App Tier, CF Tier, Data Tier and Mgmt Tier under Security Group Separation]
25. UK Satellite Broadcaster
SSL Offload
Cavium Nitrox SSL accelerator card in every host
Host level scalability & redundancy
Specific Nitrox card provides 8 virtual interfaces
Enable PCI pass-through from host (requires customised kernel)
Xen & KVM drivers available
26. UK Satellite Broadcaster
SSL Offload – PCI Pass-through
Guest Instance Configuration
Create guest instance through CloudStack
Shut down the instance (virsh)
Edit the XML definition of the instance
Restart the instance (virsh)
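The "edit the XML definition" step can be sketched in Python. The example below adds a libvirt `<hostdev>` PCI pass-through element to a domain definition, which is the kind of change the slide describes making by hand between a `virsh shutdown` and a `virsh start`. This is a sketch under stated assumptions, not the broadcaster's procedure: the helper name `add_pci_hostdev`, the instance name and the PCI address are hypothetical, and the real address of a Nitrox virtual interface would have to be looked up on the host.

```python
import xml.etree.ElementTree as ET


def add_pci_hostdev(domain_xml, bus, slot, function, domain="0x0000"):
    """Return domain XML with a PCI pass-through <hostdev> element appended.

    The bus/slot/function values identify the host PCI device; the values
    used in the example below are placeholders, not a real Nitrox address.
    """
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    # libvirt's element for assigning a host PCI device to a guest.
    hostdev = ET.SubElement(
        devices, "hostdev",
        {"mode": "subsystem", "type": "pci", "managed": "yes"},
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source, "address",
        {"domain": domain, "bus": bus, "slot": slot, "function": function},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # In practice the XML would come from `virsh dumpxml <instance>` after a
    # `virsh shutdown`, and be re-applied with `virsh define` before `virsh start`.
    example = "<domain type='kvm'><name>example-vm</name><devices/></domain>"
    print(add_pci_hostdev(example, bus="0x06", slot="0x00", function="0x1"))
```

Because the change is made directly against libvirt, CloudStack itself has no record of it, which is why native support for pass-through appears on the wish list.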
27. UK Satellite Broadcaster
Ultra Low Latency Pipelines
[Diagram: six DEA instances and six Cassandra instances, with two CF Routers]
28. UK Satellite Broadcaster
CloudFoundry
[Diagram: Web Tier, App Tier and Data Tier. Components shown: f5, two CF Routers, six App(s) + DEA instances, four Cassandra instances, Health Manager, Cloud Controller, Stager, Data Services and the CF Bus (NATS). Labels: SSL Termination; LB & Response Level Cache]
29. UK Satellite Broadcaster
Cassandra
Requires anti-affinity of instances
‘Snitch’ maps IPs to racks and data centers – requires control over IP addressing in conjunction with VM placement
Example IP address: 110.100.200.105
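The reason IP addressing matters can be shown with a short Python sketch. It assumes the slide refers to behaviour like Cassandra's RackInferringSnitch, which reads the data center from the second octet of a node's IP address and the rack from the third. The names `Location` and `infer_location` are made up for this example.

```python
import ipaddress
from typing import NamedTuple


class Location(NamedTuple):
    datacenter: str
    rack: str
    node: str


def infer_location(ip):
    """Infer (datacenter, rack, node) from an IPv4 address.

    Mirrors the octet convention of a rack-inferring snitch:
    second octet = data center, third = rack, fourth = node.
    """
    # IPv4Address validates the input and raises ValueError if it is malformed.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return Location(datacenter=octets[1], rack=octets[2], node=octets[3])


if __name__ == "__main__":
    loc = infer_location("110.100.200.105")
    print(loc.datacenter, loc.rack, loc.node)
```

For the address on the slide this gives data center 100, rack 200 and node 105. If the orchestrator hands out addresses without regard to where a VM physically lands, the inferred topology stops matching the real one, hence the requirement to control addressing together with placement.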
30. UK Satellite Broadcaster
Bursting to Amazon
Requires VPN/direct link to maintain database consistency
32. Feature Requests
Storage Options
Multiple local storage pools per host
Kerberos (Read Active Directory)
Real thin provisioning monitoring
RBAC
Low latency Pipelines (Affinity / Anti-Affinity + vApps)
33. Feature Requests
Public – Private cloud integration
Security Groups in Advanced Zones
Multiple VLAN separated networks with SG separation within them.
Post deployment actions
Push configuration information to VMs without requiring virtual router DHCP. (Altiris-like behaviour)
35. What’s The Use?!
Paul Angus
Cloud Architect
paul.angus@shapeblue.com
Twitter: @ShapeBlue
Editor's Notes
Can’t name some clients because how they run their clouds is sensitive commercially or ‘for public safety’. But these are all real customer stories.
A brief word on ShapeBlue. What we’re saying is: this is what we do, this is our day job, this feeds our families, etc.
Names will be changed to protect the innocent / guilty / culpable. Some very large customers’ logos are not shown, although I will talk about their use cases.
Special thanks to these guys, plus others I can’t mention.
What kind of uses are we talking about?
We throw out ‘Test & Dev’ as our go-to use case for private clouds; we’ll look at what this actually means. Scalable public facing apps: we’ll indicate the kind of scale we’re talking about later. High speed deployment: to quote one client, physical tin had a 15 working day lead time, and virtualisation changed that to 18 days (because it’s a black art, with therefore more internal checks and balances). No names as to who said reduced reliance (it’s been cited more than once): networking & VM teams, firewall rules, etc.
Those are the generalities. Now for the specifics.
Facts from http://www.tradermedia.co.uk/media-centre/key-facts.aspx and annual report 2012. CloudStack environment is still in development. Note MASSIVE scale.
Load is split between AWS and CloudStack: bursts above the threshold go to AWS. It is cheaper to run the base load in their own datacentre. They have done the maths to figure out where the line should be.
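The split described in this note can be written as a tiny Python function. It is only a sketch of the arithmetic: the name `split_load` and the numbers in the example are invented, and the real threshold is whatever Trader Media's own cost calculation produced.

```python
def split_load(hits, burst_threshold):
    """Split a traffic level into (private cloud share, AWS burst share).

    Everything up to burst_threshold is served from the private CloudStack
    environment; anything above it bursts to AWS.
    """
    if hits < 0 or burst_threshold < 0:
        raise ValueError("hits and burst_threshold must be non-negative")
    base = min(hits, burst_threshold)
    return base, hits - base


if __name__ == "__main__":
    # Illustrative numbers only, not Trader Media's real traffic.
    for label, hits in [("early morning", 300), ("lunchtime", 1200)]:
        base, burst = split_load(hits, burst_threshold=1000)
        print(f"{label}: {base} on CloudStack, {burst} on AWS")
```

Choosing the threshold is the cost question mentioned above: set it too low and too much traffic runs at on-demand prices, set it too high and owned capacity sits idle outside the peaks.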
Loss of the link to DC2 removes the ability to orchestrate DC2.
Really cool setup from Trader Media Group: loss of an entire datacenter does not affect the other datacenter. The two f5 load balancers are separate virtual servers from the same physical box.
RightScale templates for each environment. A unique code ties a ‘large tomcat server’ instance in AWS to a ‘large tomcat server’ instance in CloudStack. RightScale can be set to start instances on an automated time-based
Paddy Power: based in Dublin (Ireland). This was the least offensive ad I could find.
2012 figures
Templating environments allows developers to ‘call up’ a standard environment and start developing quickly, then discard it and start again.
Chaos Monkey is not a reference to any ShapeBlue consultants or Project Managers [humour]. http://techblog.netflix.com/2011/07/netflix-simian-army.html
Latency Monkey induces artificial delays in our RESTful client-server communication layer to simulate service degradation and measures if upstream services respond appropriately. In addition, by making very large delays, we can simulate a node or even an entire service downtime (and test our ability to survive it) without physically bringing these instances down. This can be particularly useful when testing the fault-tolerance of a new service by simulating the failure of its dependencies, without making these dependencies unavailable to the rest of the system.
Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down. For example, we know that if we find instances that don’t belong to an auto-scaling group, that’s trouble waiting to happen. We shut them down to give the service owner the opportunity to re-launch them properly.
Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. Once unhealthy instances are detected, they are removed from service and after giving the service owners time to root-cause the problem, are eventually terminated.
Janitor Monkey ensures that our cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.
Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal.
10-18 Monkey (short for Localization-Internationalization, or l10n-i18n) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets.
Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
Isolation allows dev to much more closely match prod (also see the regulatory authority). Faster transition from dev to prod through closer replication of the environment, including constraints (i.e. firewall ACLs). Take configs from the virtual ASA in the dev environment, then ratify them and place them on the prod environment devices. The prod environment may or may not be in CloudStack.
UK’s Largest Pay-TV Broadcaster. Massive scale. A number of elements of their infrastructure will be run on CloudStack.
Traffic must traverse the VPC router multiple times to answer a single ‘query’. Tens of thousands of transactions per second are required in the environment.
Using security groups there is no bottleneck. Arrows here show the allowed direction of requests. CF = Cloud Foundry.
Custom kernel for KVM. Because scaling is at host level, adding one host adds an equivalent amount of SSL capacity.
Currently they are stopping, editing and restarting the instance outside of CloudStack. Pushing Cavium to get involved in PCI pass-through development.
Avoid ANY hops through switches. Requires either advanced logic around affinity/anti-affinity. DEA = Droplet Execution Agent. Unit of scale is a single host including SSL Offload.
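The affinity / anti-affinity logic mentioned here can be illustrated with a small placement check in Python. This is a sketch of the idea, not a CloudStack feature or API: the function `placement_violations`, the dictionary keys and the VM and host names are all assumptions. It treats each pipeline as a group that should sit on one host (affinity, so traffic never crosses a switch) and requires Cassandra nodes to sit on different hosts (anti-affinity).

```python
from collections import defaultdict


def placement_violations(vms):
    """Return a list of affinity / anti-affinity violations.

    vms: iterable of dicts with the (hypothetical) keys
         'name', 'role', 'pipeline' and 'host'.
    """
    hosts_by_pipeline = defaultdict(set)
    cassandra_by_host = defaultdict(list)
    for vm in vms:
        hosts_by_pipeline[vm["pipeline"]].add(vm["host"])
        if vm["role"] == "cassandra":
            cassandra_by_host[vm["host"]].append(vm["name"])

    violations = []
    # Affinity: every member of a pipeline should share one host.
    for pipeline, hosts in sorted(hosts_by_pipeline.items()):
        if len(hosts) > 1:
            violations.append(
                f"affinity: pipeline {pipeline} spans hosts {sorted(hosts)}")
    # Anti-affinity: no two Cassandra nodes on the same host.
    for host, names in sorted(cassandra_by_host.items()):
        if len(names) > 1:
            violations.append(
                f"anti-affinity: {sorted(names)} share host {host}")
    return violations


if __name__ == "__main__":
    layout = [
        {"name": "dea-1", "role": "dea", "pipeline": "p1", "host": "host-a"},
        {"name": "cass-1", "role": "cassandra", "pipeline": "p1", "host": "host-a"},
        {"name": "dea-2", "role": "dea", "pipeline": "p2", "host": "host-b"},
        {"name": "cass-2", "role": "cassandra", "pipeline": "p2", "host": "host-a"},
    ]
    for problem in placement_violations(layout):
        print(problem)
```

A check like this only reports problems after the fact; the feature request is for the orchestrator to enforce such rules at deployment time.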
NATS - Not Another Tibco Server = lightweight [cloud] messaging system. DEA takes care of managing an application instance's lifecycle. It can be instructed by the Cloud Controller to start and stop application instances. It keeps track of all started instances, and periodically broadcasts messages about their state over NATS (meant to be picked up by the Health Manager). The ROUTER routes traffic coming into Cloud Foundry to the appropriate component, usually the Cloud Controller or a running application on a DEA node. App instances live on DEA VMs.
I’m not a Cassandra expert so I’m going to keep this simple. In a basic zone, map pod to cluster; more difficult in advanced zones.
In a basic zone, map pod to cluster; more difficult in advanced zones.
Some of these may be slated for 4.2, completely or partially.
Multiple local storage pools: FusionIO, SSD, different RAID configurations. Often asked for in relation to Role Based Access Control, particularly read only and start/stop only.
Public – Private cloud integration: the ability to see Amazon as another CloudStack zone, giving a ‘single pane of glass’, with orchestration of Amazon zones, leading to LB rules across CloudStack. VLAN separation of accounts with security group isolation of VMs within that account (see broadcaster use cases). Post deployment actions (without Chef / Puppet).