Google Devfest 2009 Argentina - Intro to Appengine

From Spark Plug to Drive Train:
Life of an App Engine Request
Patrick Chanezon @chanezon
Ignacio Blanco @blanconet
11/16/09
Post your questions for this talk on Twitter #devfest09 #appengine

Designing for Scale and Reliability

Agenda

Designing for Scale and Reliability

App Engine: Design Motivations

Life of a Request
Request for static content
Request for dynamic content
Requests that use APIs

App Engine: Design Motivations, Recap

App Engine: The Numbers

LiveJournal circa 2007
From Brad Fitzpatrick's USENIX '07 talk:
"LiveJournal: Behind the Scenes"

LiveJournal circa 2007
From Brad Fitzpatrick's USENIX '07 talk:
"LiveJournal: Behind the Scenes" Memcache
Application
Frontends Servers Storage

Static File Servers

Basic LAMP

Linux, Apache, MySQL,
Programming Language
Scalable?
Shared machine for database
and webserver
Reliable?
Single point of failure (SPOF)

Dedicated Database

Database running on a
separate server
Requirements
Another machine plus
additional management
Scalable?
Up to one web server
Reliable?
Two single points of failure

Multiple Web Servers

Benefits:
Grow traffic beyond the capacity of one
webserver
Requirements:
More machines
Set up load balancing

Load Balancing: DNS Round Robin

Register list of IPs with DNS
Statistical load balancing
DNS record is cached with Time To Live (TTL)
TTL may not be respected


Register list of IPs with DNS
Statistical load balancing
DNS record is cached with Time To Live (TTL)
TTL may not be respected
Now wait for DNS changes to propagate :-(


Scalable?
Add more webservers as necessary
Still I/O bound on one database
Reliable?
Cannot redirect traffic quickly
Database still SPOF

Reverse Proxy

Benefits:
Custom Routing
Specialization
Application-level load balancing
Requirements:
More machines
Configuration and code for reverse proxies

Reverse Proxy
Scalable?
Add more web servers
Specialization
Bound by
Routing capacity of reverse proxy
One database server
Reliable?
Agile application-level routing
Specialized components are more robust
Multiple reverse proxies requires network-level routing
DNS Round Robin (again)
Fancy network routing hardware
Database is still SPOF

Master-Slave Database

Benefits:
Better read throughput
Invisible to application
Requirements:
Even more machines
Changes to MySQL


Scalable?
Scales read rate with # of servers
But not writes

What happens eventually?


Reliable?
Master is SPOF for writes
Master may die before replication

Partitioned Database

Benefits: Requirements:
Increase in both read Even more machines
and write throughput Lots of management
Re-architect data model
Rewrite queries

App Engine:
Design Motivations

Design Motivations

Build on Existing Google Technology

Provide an Integrated Environment

Encourage Small Per-Request Footprints

Encourage Fast Requests

Maintain Isolation Between Applications

Encourage Statelessness and Specialization

Require Partitioned Data Model

Life of a Request:
Request for Static Content

Request for Static Content on Google Network

Routed to the nearest Google datacenter
Travels over Google's network
Same infrastructure other Google products use
Lots of advantages for free

Routing at the Front End

Google App Engine Front Ends
Load balancing
Routing

Frontends route static requests to specialized serving
infrastructure

Static Content Servers

Google Static Content Serving
Built on shared Google Infrastructure
Static files are physically separate from code files
How are static files defined?

What content is static?

Java Runtime: appengine-web.xml
...
<static>
<include path="/**.png" />
<exclude path="/data/**.png />
</static>
...

Python Runtime: app.yaml
...
- url: /images
static_dir: static/images
OR
- url: /images/(.*)
static_files: static/images/1
upload: static/images/(.*)
...

Request For Static Content
Response to the user

Back to the Front End and out to the user

Specialized infrastructure
App runtimes don't serve static content

Life of a Request:
Request for Dynamic Content

Request for Dynamic Content: New Components
App Servers and App Master

App Servers
Serve dynamic requests
Where your code runs

App Master
Schedules applications
Informs Front Ends

Request for Dynamic Content: Appservers
What do they do?

Many applications
Many concurrent requests
Smaller app footprint + fast requests = more apps
Enforce Isolation
Keeps apps safe from each other
Enforce statelessness
Allows for scheduling flexibility
Service API requests

Request For Dynamic Content
Routing at the Frontend

Front Ends route dynamic
requests to App Servers

Request for Dynamic Content
App Server

1. Checks for cached runtime
If it exists, no initialization
2. Execute request
3. Cache the runtime
System is designed to maximize caching

Slow first request, faster subsequent requests
Optimistically cache data in your runtime!

Life of a Request:
Requests accessing APIs

API Requests
App Server

1. App issues API call
2. App Server accepts
3. App Server blocks runtime
4. App Server issues call
5. Returns the response

Use APIs to do things you
don't want to do in your
runtime, such as...

Memcacheg
A more persistent in-memory cache

Distributed in-memory cache
memcacheg
Also written by Brad Fitzpatrick
adds: set_multi, get_multi, add_multi
Optimistically cache for optimization
Very stable, robust and specialized

The App Engine Datastore
Persistent storage

Bigtable in one slide
Scalable structured storage

Is not... Is...

database sharded, sorted array
sharded database
distributed hashtable

Datastore Under the Covers

Scalability does not come for free
Datastore built on top of Bigtable
Queries
Indexes

Bigtable

Sort

Machine #1

Machine #2

Machine #3

Bigtable
Operations

Read Write Delete

Single row transaction (atomic)

Operate

Bigtable
Operations

Scan: prefix b Scan: range b to d

Entities table

Primary table
Every entity in every app
Generic, schemaless
Row name -> entity key
Column (only 1) serialized entity

Entity keys
Based on parent entities: root entity, then child, child, ...

class Grandparent(db.Model): pass
class Parent(db.Model): pass
class Child(db.Model): pass

Felipe = Grandparent()
Carlos = Parent(parent=Felipe)
Ignacio = Child(parent=Carlos)

my family
/Grandparent:Felipe
/Grandparent:Felipe/Parent:Carlos
/Grandparent:Felipe/Parent:Carlos/Child:Ignacio

Queries
GQL < SQL

Restrict by kind
Filter by property values
Sort on properties
Ancestor

SELECT * FROM Person ...

WHERE name = 'John';

ORDER BY name DESC;

WHERE city = 'Sonoma' AND state = 'CA'
AND country = 'USA';

WHERE ANCESTOR IS :Felipe ORDER BY name;

Indexes

Dedicated Bigtable tables
Map property values to entities
Each row = index data + entity key

Convert queries to scans

No filtering in memory
No sorting in memory

Query planning
Convert query to dense index scan

1. Pick index
2. Compute prefix or range
3. Scan!

Kind index

SELECT * FROM Grandparent

1. Kind index
2. Prefix = Grandparent
3. Scan!

Row name Row name

Single-property index
Queries on single property (one ASC one DESC)

WHERE name = 'John'
ORDER BY name DESC
WHERE name >= 'B' AND name < 'C' ORDER BY name

Row name Key

Queries on single property (one ASC one DESC)

Row name Key


WHERE name = 'John'
1. Prefix = Parent name John (fully specified)
2. Scan!


ORDER BY name DESC
1. Prefix = Parent name
2. Scan!


WHERE name >= 'B' and name < 'C' ORDER BY name
1. RANGE = [Parent name B, Parent name C)
2. Scan!

The App Engine Datastore
Persistent storage

Your data is already partitioned
Use Entity Groups
Explicit Indexes make for fast reads
But slower writes
Replicated and fault tolerant
On commit: ≥3 machines
Geographically distributed
Bonus: Keep globally unique IDs for free

Other APIs

GMail Gadget API

Google Tasks Queue
Accounts

Picasaweb Google Talk

App Engine:
Design Motivations, Recap

Build on Existing Google Technology

creative commons licensed photograph: http://www.flickr.com/photos/cote/54408562/

Provide an Integrated Environment

Why?
Manage all apps together
What it means for you:
Follow best practices
Some restrictions
Use our tools
Benefits:
Use our tools
Admin Console
All of your logs in one place
No machines to configure or manage
Easy deployment

Encourage Small Per-Request Footprints

Why?
Better utilization of App Servers
Fairness
What it means for your app:
Less Memory Usage
Limited CPU
Benefits:
Better use of resources

Encourage Fast Requests

Why?
Better utilization of appservers
Fairness between applications
Routing and scheduling agility
Runtime caching
Request deadlines
Benefits:
Optimistically share state between requests
Better throughput
Fault tolerance

Maintain Isolation Between Apps

Why?
Safety
Predictability
Certain system calls unavailable
Benefits:
Security
Performance

Encourage Statelessness and Specialization

Why?
App Server performance
Load balancing
Fault tolerance
What this means for you app:
Use API calls
Benefits:
Automatic load balancing
Fault tolerance
Less code for you to write

Require Partitioned Data Model

Why?
The Datastore
What this means for your app:
Data model + Indexes
Reads are fast, writes are slower
Benefits:
Design your schema once
No need to re-architect for scalability
More efficient use of cpu and memory

Google App Engine:
The Numbers

Google App Engine

Currently, over 100K applications

Serving over 250M pageviews per day

Written by over 200K developers

Open For Questions
The White House's "Open For Questions" application accepted
100K questions and 3.6M votes in under 48 hours

18 months in review
Apr 2008 Python launch
May 2008 Memcache, Images API
Jul 2008 Logs export
Aug 2008 Batch write/delete
Oct 2008 HTTPS support
Dec 2008 Status dashboard, quota details
Feb 2009 Billing, larger files
Apr 2009 Java launch, DB import, cron support, SDC
May 2009 Key-only queries
Jun 2009 Task queues
Aug 2009 Kindless queries
Sep 2009 XMPP
Oct 2009 Incoming email

Roadmap

Storing/serving large files
Mapping operations across data
sets
Datastore cursors
Notification alerts for application
exceptions
Datastore dump and restore
facility

Always free to get started
~5M pageviews/month
6.5 CPU hrs/day
1 GB storage
650K URL Fetch calls
2,000 recipients emailed
1 GB/day bandwidth
N tasks

Purchase additional resources *

* free monthly quota of ~5 million page views still in full
effect

Thank you

Read more
http://code.google.com/appengine/

Questions?
Post your questions for this talk on Twitter #devfest09 #appengine

Thanks
To Alon Levi, Fred Sauer for most of these slides

Google Devfest 2009 Argentina - Intro to Appengine

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (16)

Semelhante a Google Devfest 2009 Argentina - Intro to Appengine

Semelhante a Google Devfest 2009 Argentina - Intro to Appengine (20)

Mais de Patrick Chanezon

Mais de Patrick Chanezon (20)

Último

Último (20)

Google Devfest 2009 Argentina - Intro to Appengine