Building Scalable Cloud Applications - Presentation at VCCF 2012
1. Building Scalable Applications
for the Modern Cloud
Presentation @ VCCF 2012 Tech Labs
Fotis Stamatelopoulos
(http://www.linkedin.com/in/fstamatelopoulos)
Christos Stathis
(http://www.linkedin.com/in/chstath)
2. About this presentation
● This is not a live demo presentation
● It does not present a cloud product
● It focuses instead on
○ designing software specifically for the modern cloud
○ building scalable applications
● Discusses actual case study implementations
4. Why/when deploy to the Cloud?
● Financial reasons: pay only for what you use
● Scalability: fast & easy addition of resources
● Elasticity: adapt to traffic
● Ideal for SaaS offerings
Rule of thumb:
If your traffic/load is constant,
then build your own data center instead
5. Designing SW for the Cloud
● The IaaS environment
○ monolithic architectures are not an option
○ design to scale-up / down
○ monitor everything! have alerts
○ handle latency and replication delays
● The PaaS environment
○ the cloud handles scaling
○ coding / lib limitations
○ modular design
○ service oriented approach
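The "design to scale-up / down" point for IaaS can be illustrated with a small scaling rule. This is only a sketch: the function name, the load metric (average CPU percent per VM) and the min/max bounds are assumptions for illustration, not something taken from the presentation or from any cloud provider's API.

```python
import math


def desired_instances(current: int, avg_load: float, target_load: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return how many VMs to run so the average load per VM nears target_load.

    avg_load and target_load use the same unit (here: CPU percent per VM).
    All names and default bounds are hypothetical.
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be > 0")
    # Total load is current * avg_load; divide it by the per-VM target.
    wanted = math.ceil(current * avg_load / target_load)
    # Clamp so the app never scales to zero or beyond the budget.
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_instances(current=4, avg_load=75, target_load=50))  # scale up
    print(desired_instances(current=4, avg_load=10, target_load=50))  # scale down
```

A rule like this only helps if the monitoring from the same slide feeds it fresh load figures.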
6. Designing SW for the Cloud
Examples:
● break down your app into multiple server components
hosted in multiple VMs
● use scalable back-ends: your own DB/NoSQL cluster
or services like GAE datastore, AWS SimpleDB
● handle PaaS limitations like:
○ maximum request process time
○ limitations in Java / Python lib support (GAE)
○ no filesystem I/O
○ limitations of highly available, distributed services like AWS SQS
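One way to live with a maximum request process time is to do work in slices and hand back a resume point. The sketch below assumes a hypothetical 25-second budget and a caller that re-submits the returned index in a follow-up request; neither detail comes from the slides or from a specific PaaS.

```python
import time
from typing import Callable, Optional, Sequence


def process_within_budget(items: Sequence, handle: Callable[[object], None],
                          start_index: int = 0, budget_seconds: float = 25.0,
                          clock: Callable[[], float] = time.monotonic) -> Optional[int]:
    """Process items until the time budget runs out.

    Returns the index to resume from, or None when everything is done.
    The budget value and the resume protocol are illustrative assumptions.
    """
    deadline = clock() + budget_seconds
    index = start_index
    while index < len(items):
        if clock() >= deadline:
            return index  # stop before the platform kills the request
        handle(items[index])
        index += 1
    return None


if __name__ == "__main__":
    done = []
    resume = process_within_budget(list(range(5)), done.append)
    print(done, resume)
```

The check happens before each item, so a single item that takes longer than the remaining budget can still overrun; items should be kept small.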
8. Case Study 1:
GSS / MyNetworkFolders
● SaaS offering implemented by EBS.gr
● A distributed, scalable file storage platform:
○ supporting access via multiple interfaces (web
browser, mobile devices, desktop, WebDAV)
○ provides a RESTful API
○ based on our FOSS project gss-project.org (used by
GRNET Pithos and the Univ of Zagreb)
● It was designed from scratch as a scalable
cloud application
11. Design Decisions
● Clustering without session replication
● Stateless, RESTful design
○ easier to scale up
○ allows building of multiple clients
● No HTTP session / no sticky session
● Multiple levels of caching (client, front-end,
second-level, etc)
● Use polyglot storage
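The multi-level caching decision can be sketched as a read-through lookup: a per-server local cache in front of a shared second-level cache, with the storage back-end hit only on a miss at both levels. The class and its names are hypothetical, and a plain dict stands in for the shared cache; this is not the GSS implementation.

```python
from typing import Any, Callable, Dict


class TwoLevelCache:
    """Read-through cache: local dict first, then a shared cache, then the loader."""

    def __init__(self, shared: Dict[str, Any], loader: Callable[[str], Any]):
        self.local: Dict[str, Any] = {}  # first level, private to this server
        self.shared = shared             # second level, stand-in for a shared cache
        self.loader = loader             # back-end storage lookup

    def get(self, key: str) -> Any:
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.loader(key)     # only reached on a miss at both levels
            self.shared[key] = value
        self.local[key] = value
        return value

    def invalidate(self, key: str) -> None:
        # Clears this server's copy and the shared copy only.
        self.local.pop(key, None)
        self.shared.pop(key, None)


if __name__ == "__main__":
    shared: Dict[str, Any] = {}
    cache = TwoLevelCache(shared, loader=lambda k: k.upper())
    print(cache.get("folder"), cache.get("folder"))
```

Note that invalidate does not reach the local caches of other servers; a real deployment would need expiry times or a notification mechanism for that.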
13. Deployment to Amazon AWS
● Components hosted in AWS EC2 instances (VMs)
● AWS S3 handles files (replicated and highly
available storage - PaaS)
● EBS volumes, backed up to S3, used for
persistent VM storage
● Planned move to a distributed NoSQL
(MongoDB, PaaS AWS service)
14. Findings
● Administration effort is minimal
● S3 rules! as a reliable file storage back-end
● With an IaaS provider like AWS you can
achieve real elasticity
● Availability and stability of services &
resources is excellent, but you still need:
○ Monitoring & alerts
○ A consistent backup plan
● Fine tuning and good design will save you $$
15. Case Study 2: EUDOXUS
● Stakeholder: Hellenic Ministry of Education
● Developed, operated and supported by
GRNET http://grnet.gr/
● A distributed system that supports and
streamlines the processes and operations related
to textbook distribution for the higher
education institutions of Greece
● Handles ~ 300K named users per semester
http://eudoxus.gr
16. Design Goals - Challenges
● Multiple APIs and integration with other ISs
● Tight schedule, fixed milestones - deadlines
○ Incrementally release modules in production
● Requirements not finalized
○ Be flexible, handle changing requirements
● High availability
○ Redundant architecture
○ Live application updates – no downtime
● Handle fluctuating load & traffic
17. Design decisions
● Rich Web-based clients (UIs)
○ State is handled by the rich clients
● Support various APIs
○ RESTful APIs (JSON / XML payload)
○ SOAP-based web services for specific cases
● Back-end storage: RDBMS vs NoSQL
decision
○ NoSQL offers better scalability than a typical RDBMS
○ needed at least a minimal transactional core
○ dropped the initial hybrid approach in favor of a
full RDBMS design
18. Design decisions (cont'd)
● Use caching as much as possible, at multiple
levels to off-load the back-end storage
● Authentication / Credentials
○ Stateless mechanism - eliminated user / HTTP sessions
○ Support both
■ Shibboleth-based authentication for students
(identity federation)
■ form based login for other user classes
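A stateless credentials mechanism can be sketched with a signed token: the server keeps no session and only checks a signature and an expiry on every request. The token format, function names and one-hour lifetime below are assumptions for illustration; the slides do not describe the actual EUDOXUS mechanism, and Shibboleth federation is not modelled here.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def issue_token(user: str, secret: bytes, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Return 'payload.signature' where payload carries the user and an expiry."""
    issued = time.time() if now is None else now
    payload = f"{user}|{int(issued + ttl_seconds)}".encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "." +
            base64.urlsafe_b64encode(signature).decode())


def verify_token(token: str, secret: bytes,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the user name if the token is authentic and not expired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # wrong key or tampered payload
    user, _, expires = payload.decode().rpartition("|")
    current = time.time() if now is None else now
    return user if current < int(expires) else None


if __name__ == "__main__":
    token = issue_token("student42", b"hypothetical-shared-secret")
    print(verify_token(token, b"hypothetical-shared-secret"))
```

Any server in the cluster that knows the secret can validate the token, which is what removes the need for sticky sessions or session replication.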
22. The data store
● Single RDBMS (PostgreSQL) instance with
fallback
● “Plan B” alternatives
○ Cold replicas for reports
○ Sharding / hot replication (one writer / multiple
readers)
○ Moving parts of the data to NoSQL or even use Solr
as a NoSQL data store
● Currently handling the load without
sweating
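The "one writer / multiple readers" Plan B amounts to routing each statement to the right database node. A minimal sketch, assuming hypothetical node names and a deliberately naive rule (statements starting with SELECT go to a reader); it is not part of the EUDOXUS code base.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the single writer and spread reads over the replicas."""

    def __init__(self, writer: str, readers: Sequence[str]):
        self.writer = writer
        # Round-robin over replicas; fall back to the writer if there are none.
        self._readers = itertools.cycle(readers) if readers else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._readers is not None:
            return next(self._readers)
        return self.writer


if __name__ == "__main__":
    router = ReadWriteRouter("db-writer", ["db-reader-1", "db-reader-2"])
    print(router.route("SELECT * FROM books"))
    print(router.route("UPDATE books SET stock = 3 WHERE id = 1"))
```

Replicas may lag behind the writer, so reads that must see a just-committed write (and locking reads such as SELECT ... FOR UPDATE) should be sent to the writer explicitly.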
23. Findings
● If your user space has an upper bound, you
can make it with an RDBMS
● Stateless is the way to go
● Use caching at multiple levels and save on
resources