From The Coalface CCCEU13
1. From The Coalface
Real Customer Use Cases
Paul Angus
Cloud Architect
paul.angus@shapeblue.com
Twitter: @ShapeBlue
2. From The Coalface
Real Customer Use Cases
How they’re using CloudStack
CloudStack Infrastructure
@ShapeBlue #CloudStack #CCCEU13
3. About Me
Who am I
Cloud Architect with ShapeBlue
Worked with CloudStack since 2.2.13
Specialising in deployment of CloudStack and surrounding infrastructure
Orange, TomTom, Paddy Power, Ascenty, BSkyB
I view CloudStack from a ‘What can cloud consumers practically do with it’ point of view
5. About ShapeBlue
“ShapeBlue are expert builders of public &
private clouds. They are the leading global
independent CloudStack / CloudPlatform
integrator & consultancy”
7. Use Cases
Test & Dev
Highly scalable public facing applications
High speed server resource deployment
Reduced reliance on corporate infrastructure teams
9. Trader Media
Owns ‘AutoTrader’, which started in print media
Website receives 10 million unique users per month
An average of 833,000,000 page views per month
CloudStack environment is still in development
14. Paddy Power
“If we haven’t yet done the marketing equivalent of running up and slapping you in the
face, then please allow us to introduce ourselves. Paddy Power is Ireland’s biggest, most
successful, security conscious and innovative bookmaker”
15. Paddy Power
Facts and figures
paddypower.com has 1.6 million users*
Annual revenue of $535m from online users*
CloudStack environment is still in development
*2012 figures
17. Paddy Power
Chaos Monkey / Simian Army (Problems as a Service)
Guest instances which are designed specifically to disrupt or test elements, to prove the robustness of the overall architecture.
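The selection step behind a Chaos Monkey style disruptor can be sketched roughly as below. This is only an illustration, not Paddy Power's or Netflix's implementation: the inventory list, the `protected` opt-out flag and the `pick_victim` helper are all hypothetical names, and a real setup would fetch the instance list from the cloud API rather than hard-coding it.

```python
import random


def pick_victim(instances, rng=None):
    """Return one running, unprotected instance at random, or None if none qualify."""
    rng = rng or random.Random()
    candidates = [
        inst for inst in instances
        if inst["state"] == "Running" and not inst.get("protected", False)
    ]
    return rng.choice(candidates) if candidates else None


# Hypothetical inventory; names and states are made up for illustration.
instances = [
    {"name": "web-1", "state": "Running"},
    {"name": "web-2", "state": "Running"},
    {"name": "db-1", "state": "Running", "protected": True},  # opted out of disruption
    {"name": "app-1", "state": "Stopped"},                    # not running, so skipped
]

victim = pick_victim(instances)
print(victim["name"] if victim else "nothing to disrupt")
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which helps when a disruption needs to be reproduced.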
18. Paddy Power
Faster transition from Development to Production
[Diagram: Development n-tier environment (Virtual Cisco ASA, Web, App, DB) alongside Production n-tier environment (Physical Cisco ASA, Web, App, DB)]
19. UK Satellite Broadcaster
Mobile viewing application has 3.26 million users
51 million monthly streamed items
12 million monthly streamed VOD views
Mobile Apps have a combined 6.3 million user base
23 million weekly on-demand downloads
20. UK Satellite Broadcaster
Security Groups VPC Bottleneck
[Diagram labels: GW, LB, CloudStack Public, VPC Router, Web Tier, App Tier, Data Tier]
21. UK Satellite Broadcaster
Security groups in advanced zones
[Diagram labels: CloudStack External, LB, five GWs, Security Group Separation, Web Tier, CF Tier, App Tier, Mgmt Tier, Data Tier]
22. UK Satellite Broadcaster
Cassandra
Requires anti-affinity of instances
‘Snitch’ maps IPs to racks and data centers – requires control over IP addressing in conjunction with VM placement
110.100.200.105
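As a rough illustration of why IP addressing matters here: the slide does not say which snitch is in use, but Cassandra's RackInferringSnitch, for example, treats the second octet of a node's IP as the data center and the third octet as the rack. The sketch below mimics that convention in Python; `infer_location` is a hypothetical helper, not part of Cassandra.

```python
import ipaddress


def infer_location(ip):
    """Derive data center and rack from an IPv4 address, octet-convention style.

    Assumes the RackInferringSnitch-like convention:
    second octet = data center, third octet = rack.
    """
    octets = ipaddress.IPv4Address(ip).packed  # four raw bytes
    return {"datacenter": str(octets[1]), "rack": str(octets[2])}


print(infer_location("110.100.200.105"))
# {'datacenter': '100', 'rack': '200'}
```

Under this convention, a VM that lands on a different rack but keeps an address from the old range is reported in the wrong place, which is why placement and addressing have to be controlled together.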
23. UK Satellite Broadcaster
Bursting to Amazon
Requires VPN/direct link to maintain database consistency
Uses Akamai
31. From The Coalface
Real Customer Use Cases
Paul Angus
Cloud Architect
paul.angus@shapeblue.com
Twitter: @CloudyAngus @ShapeBlue
Editor's Notes
Can’t name some clients because how they run their clouds is sensitive commercially or ‘for public safety’. But these are all real customer stories.
Names will be changed to protect the innocent / guilty / culpable. Some very large customers’ logos not shown – although I will talk about their use cases.
A brief word on ShapeBlue. What we’re saying is: this is what we do, this is our day job, this feeds our families, etc.
What kind of uses are we talking about?
We throw out ‘Test & Dev’ as our go-to use case for private clouds – we’ll look at what this actually means.
Scalable public-facing apps – we’ll indicate the kind of scale we’re talking about later.
High-speed deployment – to quote one client: physical tin was a 15 working day lead time; virtualisation changed that – to 18 days (because it’s a black art, with therefore more internal checks and balances).
No names as to who said reduced reliance (it’s been cited more than once) – networking & VM teams, firewall rules, etc.
That’s the generalities. Now for the specifics.
Facts from http://www.tradermedia.co.uk/media-centre/key-facts.aspx and annual report 2012. CloudStack environment is still in development. Note the MASSIVE scale.
Load split between AWS and CloudStack – burst above threshold goes to AWS. Cheaper to run base load in own datacentre. They have done the maths to figure out where the line should be.
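The base-load/burst split described in this note can be sketched as a simple threshold rule. This is a minimal illustration only: the units, the numbers and the `split_load` helper are hypothetical, and where the customer actually draws the line is not stated in the talk.

```python
def split_load(demand, base_capacity):
    """Split demand into the part served in-house and the part burst to AWS.

    Everything up to base_capacity stays in the own-datacentre CloudStack
    environment; anything above that threshold goes to AWS.
    """
    local = min(demand, base_capacity)
    burst = max(demand - base_capacity, 0)
    return local, burst


# Hypothetical figures in arbitrary capacity units.
print(split_load(80, 100))   # (80, 0): below the line, nothing bursts
print(split_load(130, 100))  # (100, 30): 30 units burst to AWS
```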
Loss of link to DC2 stops ability to orchestrate DC2
Really cool setup – Trader Media Group. Loss of an entire datacenter does not affect the other datacenter. Two F5 load balancers are separate virtual servers from the same physical box.
RightScale templates for each environment. A unique code ties a ‘large tomcat server’ instance in AWS to a ‘large tomcat server’ instance in CloudStack. RightScale can be set to start instances on an automated, time-based schedule.
Paddy Power – based in Dublin (Ireland). This was the least offensive ad I could find.
2012 figures
Templating environments allows developers to ‘call up’ a standard environment and start developing quickly, discard and start again.
Chaos Monkey is not a reference to any ShapeBlue consultants or Project Managers [humour].
http://techblog.netflix.com/2011/07/netflix-simian-army.html
Latency Monkey induces artificial delays in our RESTful client-server communication layer to simulate service degradation and measures if upstream services respond appropriately. In addition, by making very large delays, we can simulate a node or even an entire service downtime (and test our ability to survive it) without physically bringing these instances down. This can be particularly useful when testing the fault-tolerance of a new service by simulating the failure of its dependencies, without making these dependencies unavailable to the rest of the system.
Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down. For example, we know that if we find instances that don’t belong to an auto-scaling group, that’s trouble waiting to happen. We shut them down to give the service owner the opportunity to re-launch them properly.
Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health (e.g. CPU load) to detect unhealthy instances. Once unhealthy instances are detected, they are removed from service and after giving the service owners time to root-cause the problem, are eventually terminated.
Janitor Monkey ensures that our cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.
Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured AWS security groups, and terminates the offending instances. It also ensures that all our SSL and DRM certificates are valid and are not coming up for renewal.
10-18 Monkey (short for Localization-Internationalization, or l10n-i18n) detects configuration and run time problems in instances serving customers in multiple geographic regions, using different languages and character sets.
Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone. We want to verify that our services automatically re-balance to the functional availability zones without user-visible impact or manual intervention.
Isolation allows dev to much more closely match prod (also see regulatory authority). Faster transition from dev to prod through closer replication of the environment, including constraints (i.e. firewall ACLs). Take configs from the virtual ASA in the dev environment -> ratify and place on prod environment devices. The prod environment may or may not be in CloudStack.
UK’s largest pay-TV broadcaster. Massive scale. A number of elements of their infrastructure will be run on CloudStack.
Traffic must traverse the VPC router multiple times to answer a single ‘query’. Tens of thousands of transactions per second are required in the environment.
Using security groups there is no bottleneck. Arrows here show the allowed direction of requests. CF = Cloud Foundry.
I’m not a Cassandra expert so I’m going to keep this simple. In a basic zone – map pod to cluster; more difficult in advanced zones.
That’s the generalities. Now for the specifics.
Deployment server – Chef/Puppet/Ansible (rise of DevOps).
Load balancers – physical (F5/NetScaler) or virtual (Zen Load Balancer or HAProxy).
CS Manager usually in active/passive configuration (some clients looking at a dedicated API CS manager).
MySQL still in master/slave configuration. MySQL HA / DR is still a headache for most clients -> DRBD / MySQL clustering / Galera all being trialled.