How we built, architected and scaled Defensio, from our prototype to the version currently in production.
Presented at ConFoo in Montreal on March 12, 2010.
Building Scalable Web Applications For The Cloud
1. Building Scalable Web Applications for the Cloud
Carl Mercier (@cmercier)
Director of software development, Websense Inc.
Founder, Defensio.com
cmercier@websense.com
OUTSMARTING EVIL SPAM
Friday, March 12, 2010
2. Security for the Social Web
We protect your website from spam,
malicious content,
unwanted URLs and profanity.
3. The Cloud is different
4. Architecture challenges
• We’re an API, not a website
• Many millions of requests per day, nonstop
• Each request can be fast or slow
• Very little caching possible
5. Architecture challenges
• Write intensive
• Traffic comes in spikes
• Any downtime is catastrophic
• 2 different versions of our APIs
• Bootstrapped startup. We’re broke!
6. Getting technical
• Built in Ruby (Rails, Merb and pure Ruby)
• External services written in Perl and C
• 100% hosted on Amazon EC2
• Mix of 32-bit and 64-bit machines
• mostly m1.small (the cheapest ones)
7. Prototyping/1.0 beta
aka The Spaghetti Release
• Single Ruby on Rails application
• No direction whatsoever
• A few small EC2 instances
• A single MySQL server
8. Prototyping/1.0 beta
aka The Spaghetti Release
• Horizontal scaling: start more instances
• This also scaled the website
• Eventually moved MySQL to m1.large
[Diagram: DNS Round Robin -> NGINX + API + WEB (several instances) -> MySQL]
9. What was wrong?
• Unmaintainable code
• Why did it even work?
• but it REALLY did work, and well! :)
• DNS Round Robin
• Very database intensive
10. The Big Rewrite
• Complete code rewrite
• Proper code separation
• Completely tested
• Ruby + Merb + DataMapper
• Replaced DNS RR with HAProxy
• Added Memcached to the mix
11. The Big Rewrite
architecture
HAProxy
NGINX + API (Merb) NGINX + API (Merb) NGINX + API (Merb)
MySQL + Memcached
12. Later Improvements
• Dumped HAProxy (single point of failure)
• replaced with Amazon ELB
• Moved Memcached to its own machine
• Decoupled resource-intensive parts
• turned them into web services
13. The Big Rewrite
architecture, revisited
Amazon ELB
NGINX + API (Merb)
many EC2 instances
MySQL
Memcached
Web Service 1 Web Service n
14. Advantages of this architecture
• Easy to scale horizontally OR vertically
• Each unit can be scaled & tweaked independently
• Easy to maintain
• Increased redundancy
15. MySQL Pain
• Traffic keeps growing
• Adding millions of records per day
• Database size growing exponentially
• Most of this data was non-critical
• Stuck with our schemas and indexes
16. Scaling MySQL on EC2
• If your DB fits in memory, don’t worry, be happy!
• It’s painful.
• You should be on EBS or equivalent
• permanent and robust storage
• EBS snapshots
17. Scaling MySQL on EC2
• Scale up (move to a bigger machine)
• More RAM
• Database is often I/O bound
• RAID 0 (stripe)
• Inconsistent EBS snapshots
18. Scaling MySQL on EC2
• Replication
• headache
• all writes go to master
• Split database
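The replication bullets imply a routing rule: every write must reach the master, and only reads can be spread over replicas. A minimal sketch of that rule, written in Python for illustration (the stack described in this talk is Ruby); the class name and host names are hypothetical and not taken from Defensio's code:

```python
import itertools


class ReplicatedRouter:
    """Pick a database host for a SQL statement (illustrative only)."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, replicas):
        self.master = master
        # Reads rotate over the replicas; with no replicas they hit the master.
        self._readers = itertools.cycle(replicas or [master])

    def host_for(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master        # all writes go to the master
        return next(self._readers)    # reads are spread over the replicas


router = ReplicatedRouter("db-master", ["db-replica-1", "db-replica-2"])
```

Adding replicas in this scheme raises read capacity only; write throughput is still bounded by the single master, which matters for a write-intensive service.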
19. MongoDB
• Document-oriented database
• Schema-less
• Fast
• Replication, fail-over, auto-sharding
20. Three Data Stores
• MySQL (critical data)
• accounts, keys, account settings, statistics
• MongoDB (semi non-critical)
• documents, reputations
• Memcached (non-critical data)
• short term, very fast updates
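The split above can be read as a lookup from kind of data to store. A small Python sketch of that idea (illustration only; the key names and the default-to-Memcached rule are assumptions of this sketch, not Defensio's actual code):

```python
# Hypothetical mapping from a kind of data to the store named on the slide.
STORE_FOR = {
    "account": "mysql",
    "api_key": "mysql",
    "account_settings": "mysql",
    "statistics": "mysql",
    "document": "mongodb",
    "reputation": "mongodb",
}


def store_for(kind):
    """Return the store for a kind of data.

    Anything not listed is treated as short-lived, non-critical data and
    sent to Memcached (an assumption made for this sketch).
    """
    return STORE_FOR.get(kind, "memcached")
```

Keeping the decision in one place makes it cheap to move a kind of data to a different store later.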
21. Three Data Stores
Amazon ELB
NGINX + API (Merb)
many EC2 instances
MySQL
m1.small
MongoDB
64-bit
Memcached
Web Service 1 Web Service n
22. API 2.0 Challenges
• Completely new API to the user
• Keep support for 1.x
• Asynchronous
• New features, can’t just wrap API 1.x
23. Frontend
• Ruby on Rails
• Accepts HTTP connections
• Knows the API definition for both 1.x and 2.0
• Converts API calls into “jobs”
24. Frontend
• Jobs are put in a queue
• Backend responds with generic response
• Frontend converts response and renders
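The frontend's two translations (API call to job, generic response to rendered reply) might look like the Python sketch below. It is an illustration only: the parameter names, the job fields and both output formats are hypothetical and do not describe the real Defensio 1.x or 2.0 APIs.

```python
import json


def to_job(api_version, params):
    """Translate a version-specific API call into a version-neutral job."""
    if api_version == "1.x":
        content = params["comment-content"]   # hypothetical 1.x parameter name
    elif api_version == "2.0":
        content = params["content"]           # hypothetical 2.0 parameter name
    else:
        raise ValueError("unknown API version: %s" % api_version)
    return {"type": "analyze", "content": content}


def render(api_version, response):
    """Turn the backend's generic response into a version-specific body."""
    if api_version == "1.x":
        return "spam: %s" % str(response["spam"]).lower()
    if api_version == "2.0":
        return json.dumps({"allow": not response["spam"]})
    raise ValueError("unknown API version: %s" % api_version)
```

Because both versions produce the same job, everything behind the queue can stay unaware of which API the client used.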
25. Queue/Messaging: RabbitMQ
• Messaging (AMQP)
• Ultra-fast
• Feature-rich
• Complex (too complex for our needs)
26. Queue/Messaging: Beanstalkd
• Ultra-simple queue
• Not a messaging server (hack it to make it behave like one!)
• Just as fast as RabbitMQ
• Delayed jobs
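To make "delayed jobs" and the messaging hack concrete, here is an in-memory Python model of a queue with named tubes. It is not the beanstalkd protocol or a client for it, and the per-request reply tube is only one possible way to get request/response behaviour out of a plain queue; the talk does not say how Defensio did it. All names are hypothetical.

```python
import heapq
import itertools


class TinyQueue:
    """In-memory stand-in for a work queue with named tubes and delayed jobs."""

    def __init__(self):
        self._tubes = {}                # tube name -> heap of (ready_at, seq, body)
        self._seq = itertools.count()   # tie-breaker that keeps FIFO order

    def put(self, tube, body, now, delay=0):
        """Queue a job that becomes ready `delay` seconds after `now`."""
        heap = self._tubes.setdefault(tube, [])
        heapq.heappush(heap, (now + delay, next(self._seq), body))

    def reserve(self, tube, now):
        """Return the next ready job body, or None if nothing is ready yet."""
        heap = self._tubes.get(tube)
        if heap and heap[0][0] <= now:
            return heapq.heappop(heap)[2]
        return None


# Request/response on top of a plain queue: the job names a reply tube.
q = TinyQueue()
q.put("jobs", {"id": 7, "reply_to": "reply.7"}, now=0)
job = q.reserve("jobs", now=0)
q.put(job["reply_to"], {"id": job["id"], "spam": False}, now=0)
reply = q.reserve("reply.7", now=0)
```

The clock is passed in explicitly so the delay behaviour is easy to test without sleeping.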
27. Backend
• Previously our “API” servers
• Doesn’t accept HTTP connections anymore
• Communicates through jobs/response (queue)
• API agnostic. Only knows about jobs/response
• All processing/logic
• Spits a response back in the queue
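A backend of this shape reduces to a loop that takes a job, processes it, and queues a generic response. The Python sketch below uses the standard library's `queue.Queue` in place of the real queue server; the handler, its keyword check and the field names are placeholders invented for illustration, not Defensio's filtering logic.

```python
import queue


def handle(job):
    """Hypothetical handler: all processing lives here, none in the frontend."""
    text = job["content"].lower()
    return {"job_id": job["id"], "spam": "cheap pills" in text}


def work(jobs, responses):
    """Drain the job queue, putting one generic response per job."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        responses.put(handle(job))


jobs, responses = queue.Queue(), queue.Queue()
jobs.put({"id": 1, "content": "Nice post!"})
jobs.put({"id": 2, "content": "CHEAP PILLS here"})
work(jobs, responses)
```

The worker never sees HTTP or an API version, so the same loop serves both 1.x and 2.0 traffic.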
28. Current Architecture API 2.0
Amazon ELB
Cluster n
API Frontend (Unicorn + Rails)
many EC2 instances
Queue/Messaging
(Beanstalkd)
Backend (hacked Merb)
many EC2 instances
MySQL MongoDB
Memcached
m1.small 64-bit
Web Service 1 Web Service n
29. Advantages
• Awesome fault tolerance
• Horizontal scaling is easy
• Add capacity to a cluster
• Add clusters
• No more MySQL scaling worries
• Complete schema flexibility w/ MongoDB
[Diagram: the API 2.0 architecture from slide 28, repeated]
30. When to scale “out”
(horizontally)
• All instances are identical clones
• Redundancy
• Fast & easy scaling
• Instance is “irrelevant”
31. What we scale “out”
(horizontally)
• Frontend
• Backend
• Internal web services
32. When to scale “up”
(vertically)
• Multiple instances are hard to manage (e.g., a database)
• CPU or memory intensive applications
• Scaling out becomes impractical
• Scaling out becomes cost-ineffective
33. I really like scaling out vs. scaling up
O U T S M A R T I N G E V I L S PA M
Friday, March 12, 2010
34. Bulletproof your app
35. Scale & shrink fast
even automatically
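Scaling and shrinking automatically comes down to a rule that turns load into an instance count. A Python sketch of one such rule, sized by queue backlog; the metric, the per-instance capacity and the bounds are all assumptions made for illustration, not values from the talk:

```python
import math


def desired_instances(queued_jobs, jobs_per_instance, minimum=2, maximum=20):
    """Size a pool of identical workers from the current queue backlog.

    The result is clamped so the pool never shrinks below `minimum`
    (redundancy) or grows past `maximum` (cost ceiling).
    """
    needed = math.ceil(queued_jobs / jobs_per_instance)
    return max(minimum, min(maximum, needed))
```

A rule like this only works when instances are interchangeable clones, which is the precondition for scaling out.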
36. Most cost-effective
37. Things I learned
• Cloud instances are disposable
• Architect your app accordingly
• Instances should be killed, not fixed
38. Things I learned
• Pre-optimizing is useless
• Be aware of your bottlenecks
• Architect your application for flexibility
• Deploy different parts to different servers
• Secure your important data
39. Things I learned (about EC2)
• It is pretty reliable; anything else you've heard is a myth
• When shit hits the fan, you’re on your own
• Create images
• Automate as much as you can
40. Things I learned (about EC2)
• Auto-scaling is easy, but rarely needed
• I/O is inconsistent and mostly sucks
• Slowish (Rackspace Cloud is much faster)
• Large(r) instances are too expensive
41. Questions?
Twitter: @cmercier and @defensio
Email: cmercier@websense.com
Web: www.defensio.com