Visit this page to view a recording of this webinar - http://www.acquia.com/resources/acquia-tv/conference/acquia-managed-cloud-highly-available-architecture-highly
Acquia Managed Cloud: Highly Available Architecture for Highly Unpredictable Traffic
1. Acquia Managed Cloud:
Highly Available Architecture for Highly Unpredictable Traffic
Kieran Lal, Technical Director, Acquia
Jess Iandiorio, Sr. Director, Cloud Product Marketing, Acquia
January 19th, 2012
2. Your Drupal Application Life Stages
Set-up/Launch: Build
• Load balancers
• Fast page cache
• App Servers
• Database
• File systems
• Web servers
• App Configuration
• HA architecture
Set-up/Launch: Deploy
• Load testing
• Integrated Git/SVN
• Drag and drop content management
Production: Application updates
• Drupal App code
Production: Infrastructure updates
• OS
• Security
Production: Operations
• 24X7 monitoring & alerts
• Backups
Crisis: Diagnosis
• Site failure
• Infrastructure failure
• Application errors
Crisis: Resolution
• Debugging
• Resize
• Launch new virtual servers
• Multi-region failover
3. Capacity Planning Options
Options
(1) Over Plan: Over Pay
[Chart: Users hitting your site, Jul to Dec]
4. Capacity Planning Options
Options
(1) Over Plan: Over Pay
(2) Under Plan: Expect Outages
[Chart: Users hitting your site, Jul to Dec]
5. Capacity Planning Options
Options
(1) Over Plan: Over Pay
(2) Under Plan: Expect Outages
(3) Acquia Plan: No Failure
[Chart: Users hitting your site, Jul to Dec]
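The trade-off between the three options can be illustrated with a small sketch. The monthly demand figures and capacity units below are hypothetical (the chart's actual values are not recoverable from the slide), and the "elastic" plan is modeled simply as demand plus one unit of headroom, which is an assumption rather than Acquia's actual sizing method.

```python
from dataclasses import dataclass

MONTHS = ["Jul", "Aug", "Sept", "Oct", "Nov", "Dec"]
# Hypothetical peak demand per month, in arbitrary capacity units (not from the slide).
DEMAND = [2, 3, 2, 9, 4, 3]


@dataclass
class PlanResult:
    idle_units: int      # capacity paid for but unused, summed over all months
    outage_months: list  # months in which demand exceeded capacity


def evaluate(capacity_per_month, demand=DEMAND):
    """Compare provisioned capacity against demand, month by month."""
    idle = 0
    outages = []
    for month, capacity, need in zip(MONTHS, capacity_per_month, demand):
        if need > capacity:
            outages.append(month)
        else:
            idle += capacity - need
    return PlanResult(idle, outages)


# Option 1: fixed capacity sized above the worst expected peak.
over_plan = evaluate([10] * 6)
# Option 2: fixed capacity sized for a typical month.
under_plan = evaluate([4] * 6)
# Option 3 (assumed model): capacity resized each month to demand plus one unit.
elastic = evaluate([need + 1 for need in DEMAND])

for name, result in [("over", over_plan), ("under", under_plan), ("elastic", elastic)]:
    print(name, result.idle_units, result.outage_months)
```

With these made-up numbers, the fixed high plan never fails but carries the most idle capacity, the fixed low plan is cheap but fails in the spike month, and the resized plan avoids both.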
6. Unpredictable Traffic Victims
Events Businesses: Challenges
• Plagued by prior event stats
• Failure extends beyond web
Events Businesses: Consequences of failure
• Sales (tickets)
• Brand Damage
• Missed donation opportunities
News/M&E Organizations: Challenges
• You never know when you’ll be “Huff Po’d”
• Time-to-market is critical
News/M&E Organizations: Consequences of failure
• Loss of credibility
• Readership
• Contractual failures per advertising agreements
• Impact to the ad sales cycle
High Growth Sites: Challenges
• Lack of experience/skill set
• No prior benchmarking data
High Growth Sites: Consequences of failure
• Missed opportunities
• Discouraged users
• Loss of confidence
7. The Framework
(1) Planned Successfully: Test early, often
Profile
• Companies that are experienced with resizing exercises
• Allocate 3+ weeks for resizing exercises combined with load testing
• Don’t underestimate administrative challenges
(2) Planned Unsuccessfully: Best Effort Not Enough
Profile
• Companies that plan to handle it themselves but don’t have the “crisis” speed skill set
• Web teams that have no prior experience manually scaling servers
• Web teams who don’t have a triage plan in place for evaluating application v. infrastructure failures
(3) Unplanned: “Crisis mode”
Profile
• Companies with truly volatile businesses
• Mission-critical sites where failure isn’t an option
• Web teams that haven’t invested in HA architecture
• Web teams that have separate application and infrastructure support
• Companies that are unlucky
8. Planned Successfully
(1) Planned Successfully: Test early, often
Profile
• Advance notice
• Work with our team to develop a plan and load test it
Acquia:
• Plan development
• Provision resources
• Continuous monitoring on the day of the event
10. Planned Successfully
(1) Test early, often
The King Center
The Players
Customer: The King Center
Partner: Palantir, Soasta
Acquia: Sales, Operations, Support
Triage to Resolution: 3 Weeks
11. Planned Unsuccessfully
(2) Planned Unsuccessfully: Best Effort Not Enough
Profile
• Advance notice
• Tried to plan for the “worst case scenario”
• Planning fell short of the worst case scenario
Acquia:
• Immediate detection & resolution of infrastructure issues
13. Planned Unsuccessfully
(2) Best Effort Not Enough
The BRIT Awards
The Players
Customer: The BRIT Awards
Acquia: Support, Operations, Cloud Engineering
Triage to Resolution: 20 minutes
15. Unplanned
(3) Unplanned: “Crisis mode”
Profile
• No advance notice
• Resources not available
• Site goes down
• Panic
Acquia:
• Triage the issue: code, attack or capacity?
• Resolve
25. The Acquia Triage Checklist
Determine nature of the problem: 10 to 30 minutes
• Check monitoring
• Check logs
Mitigate problem: 30 minutes to 2+ hrs
• Code: Roll back or remediate
• Attack (DOS): Block offending IP
• Attack (DDOS): Bring in DOSarrest
• Resize (automatic): Server HA, Web/DB failover
• Resize (manual):
- Clone site for internal testing (Nagios)
- Increase size of DB
- Faster load balancers
- Larger Varnish Page Caching
- File system updates (GlusterFS)
- Increase web servers
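The checklist above can be sketched as a simple lookup from the diagnosed cause to the mitigation steps. This is only an illustration of the decision structure on the slide: the `Cause` enum, the `triage_plan` function, and the grouping of automatic and manual resize steps into one list are hypothetical names and choices, not an Acquia tool or API.

```python
from enum import Enum


class Cause(Enum):
    """Outcome of the 'determine nature of the problem' phase (hypothetical names)."""
    CODE = "code"
    DOS = "dos"
    DDOS = "ddos"
    CAPACITY = "capacity"


# Phase 1 on the slide: 10 to 30 minutes.
DIAGNOSIS_STEPS = ["Check monitoring", "Check logs"]

# Phase 2 on the slide: 30 minutes to 2+ hrs.
AUTOMATIC_RESIZE = ["Server HA", "Web/DB failover"]
MANUAL_RESIZE = [
    "Clone site for internal testing",
    "Increase size of DB",
    "Faster load balancers",
    "Larger Varnish page caching",
    "File system updates (GlusterFS)",
    "Increase web servers",
]

MITIGATIONS = {
    Cause.CODE: ["Roll back or remediate"],
    Cause.DOS: ["Block offending IP"],
    Cause.DDOS: ["Bring in DOSarrest"],
    # Automatic measures are listed first, then the manual options.
    Cause.CAPACITY: AUTOMATIC_RESIZE + MANUAL_RESIZE,
}


def triage_plan(cause):
    """Return the diagnosis steps followed by the mitigation steps for a cause."""
    return DIAGNOSIS_STEPS + MITIGATIONS[cause]


for step in triage_plan(Cause.DDOS):
    print(step)
```

Keeping the checklist as data rather than branching logic makes it easy to review and extend when a new failure category appears.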
27. Underlying Elastic Technology Stack
Caching Load Balancer: Page Caching, Load Balancing
Drupal Application Servers: Web Servers, Drupal Modules, PHP Caching
Data Services: MySQL, File Storage, Memcache, Email
Secure Infrastructure: International Data Centers, Monitoring, Amazon AWS, Backups
Each layer is composed of multiple redundant servers. If one fails, there is little or no downtime.
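The benefit of redundant servers in each layer can be estimated with a back-of-the-envelope calculation. The sketch below assumes servers fail independently and that a layer is up as long as at least one of its servers is up; the 99% per-server availability and the two-servers-per-layer layout are hypothetical figures, not numbers from the presentation.

```python
import math


def layer_availability(server_availability, servers):
    """Probability that at least one of `servers` independent servers is up."""
    if servers < 1:
        raise ValueError("a layer needs at least one server")
    return 1.0 - (1.0 - server_availability) ** servers


def stack_availability(layers):
    """Probability that every layer is up; layers maps name -> (availability, servers)."""
    total = 1.0
    for server_availability, servers in layers.values():
        total *= layer_availability(server_availability, servers)
    return total


# Hypothetical stack: each server is up 99% of the time.
SINGLE = {name: (0.99, 1) for name in ["cache/lb", "app", "data", "infra"]}
REDUNDANT = {name: (0.99, 2) for name in ["cache/lb", "app", "data", "infra"]}

print(f"single servers:    {stack_availability(SINGLE):.6f}")
print(f"redundant servers: {stack_availability(REDUNDANT):.6f}")
```

Under these assumptions, doubling the servers in each layer moves a single layer from 99% to 99.99% availability, which is why the whole stack benefits even though it has more components.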
28. Multi-region replication & failover
For Backups across Borders
• Acquia can deploy instances in any of these Amazon EC2 regions:
- US East
- US West
- Europe
- Singapore
- Japan
• Who is this for?
- Organizations that see significant risk in hosting their sites out of one geographic location
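The failover idea can be sketched as choosing the first healthy region from a preference list. The region names come from the slide; the `pick_active_region` function and the health map are hypothetical, and a real deployment would also need data replication and DNS or load balancer changes that this sketch does not cover.

```python
# Regions listed on the slide, in an assumed order of preference.
REGIONS = ["US East", "US West", "Europe", "Singapore", "Japan"]


def pick_active_region(preference, healthy):
    """Return the first region in `preference` reported healthy, or None if none is."""
    for region in preference:
        # Regions missing from the health map are treated as unhealthy.
        if healthy.get(region, False):
            return region
    return None


# Hypothetical health check results: the primary region is down.
health = {"US East": False, "US West": True, "Europe": True}
print(pick_active_region(REGIONS, health))
```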
29. Lessons Learned
(1) Planned Successfully: Test early, often
(2) Planned Unsuccessfully: Best Effort Not Enough
(3) Unplanned: “Crisis mode”
How can I be successful?
You need elastic infrastructure
You need scaling automation
You need a team that can do diagnosis
You need 24X7 support
Engage Acquia early and often
30. Conclusion
Acquia won’t let you fail
We have the talent & infrastructure in place to ensure you’re successful
We’ll find the needle in a haystack, and ensure your best day will never be your worst
Predictable outcomes for unpredictable businesses
31. For more information about Managed Cloud
Check out our website: http://www.acquia.com/products-services/acquia-managed-cloud
Speak to a Sales rep
32. Questions
• For more information visit:
http://www.acquia.com
• Contact us: sales@acquia.com or 888.9.ACQUIA
• Follow us: @acquia
• Comments welcome:
- Jess.iandiorio@Acquia.com
- Kieran.Lal@Acquia.com
Today's webinar recording will be posted to:
http://acquia.com/resources/recorded_webinars