This document summarizes a presentation on scaling Pinterest. It discusses how Pinterest scaled from using a single small web server and database in 2010 to using Amazon EC2, S3, multiple databases, caches and task queues with over 100 servers and 25 engineers by 2012. It also covers lessons learned around keeping infrastructure simple initially and the challenges of clustering versus sharding databases.
Pinterest arch summit august 2012 - scaling pinterest
1. Thank you for attending the ArchSummit Global Architect Summit!
Official conference website and download address for materials:
www.archsummit.com
2. Scaling
Marty Weiner Evrhet Milam
Krypton Batcave
Friday, August 10, 2012
TODO:
Title page names
Pass on page title consistency
Fill out numbers
Put "always test in production" pin in screenshot of website
3. Pinterest is . . .
An online pinboard to organize
and
share what inspires you.
Scaling Pinterest
7. Relationships
Marty Weiner
Grayskull, Eternia
Yashh Nelapati
Gotham City
8. Page Views / Day
· RackSpace
· 1 small Web Engine
· 1 small MySQL DB
· 1 Engineer
[Chart axis: Mar 2010, Jan 2011, Jan 2012, May 2012]
4-5 mts.
9. Page Views / Day
· Amazon EC2 + S3 + CloudFront
· 1 NGinX, 4 Web Engines
· 1 MySQL DB + 1 Read Slave
· 1 Task Queue + 2 Task Processors
· 1 MongoDB
· 2 Engineers
[Chart axis: Mar 2010, Jan 2011, Jan 2012, May 2012]
TODO: Show total somewhere
10. Page Views / Day
· Amazon EC2 + S3 + CloudFront
· 2 NGinX, 16 Web Engines + 2 API Engines
· 5 Functionally Sharded MySQL DB + 9 read slaves
· 4 Cassandra Nodes
· 15 Membase Nodes (3 separate clusters)
· 8 Memcache Nodes
· 10 Redis Nodes
· 3 Task Routers + 4 Task Processors
· 4 Elastic Search Nodes
· 3 Mongo Clusters
· 3 Engineers
[Chart axis: Mar 2010, Jan 2011, Jan 2012, May 2012]
11. Lesson Learned #1
It will fail. Keep it simple.
12. Page Views / Day
· Amazon EC2 + S3 + Akamai, ELB
· 90 Web Engines + 50 API Engines
· 66 MySQL DBs (m1.xlarge) + 1 slave each
· 59 Redis Instances
· 51 Memcache Instances
· 1 Redis Task Manager + 25 Task Processors
· Sharded Solr
· 6 Engineers
[Chart axis: Mar 2010, Jan 2011, Jan 2012, May 2012]
13. Page Views / Day
· Amazon EC2 + S3 + Edge Cast, ELB
· 135 Web Engines + 75 API Engines
· 80 MySQL DBs (m1.xlarge) + 1 slave each
· 110 Redis Instances
· 60 Memcache Instances
· 2 Redis Task Managers + 60 Task Processors
· Sharded Solr
· 25 Engineers
[Chart axis: Mar 2010, Jan 2011, Jan 2012, May 2012]
14. Why Amazon EC2/S3?
· Very good reliability, reporting, and support
· Very good peripherals, such as managed
cache, DB, load balancing, DNS, map
reduce, and more...
· New instances ready in seconds
· Con: Limited choice
· Pro: Limited choice
15. Why MySQL?
· Extremely mature
· Well known and well liked
· Rarely catastrophic loss of data
· Response time to request rate increases linearly
· Very good software support - XtraBackup, Innotop,
Maatkit
· Solid active community
· Very good support from Percona
· Free
TODO: Animate in a money bag
16. Why Memcache?
· Extremely mature
· Very good performance
· Well known and well liked
· Never crashes, and few failure modes
· Free
17. Why Redis?
· Variety of convenient data structures
· Has persistence and replication
· Well known and well liked
· Consistently good performance
· Few failure modes
· Free
Data structures -- list, set, sorted set, pubsub, string. All support atomic operations. All
support pipelining.
18. Clustering
vs
Sharding
clustered = automatically managed nodes with autoscaling, sharding = engineer assigns how
data is distributed, splits databases, and writes code to redistribute data (if that option is
supported)
19. Clustering
· Data distributed automatically
· Data can move
· Rebalances to distribute capacity
· Nodes communicate with each other
Sharding
clustered = automatically managed nodes with autoscaling, sharding = engineer assigns how
data is distributed, splits databases, and writes code to redistribute data (if that option is
supported). Clustered nodes gossip with each other to determine if rebalancing is needed
20. Clustering
Sharding
· Data distributed manually
· Data does not move
· Split data to distribute load
· Nodes are not aware of each other
21. Why Clustering?
· Examples: Cassandra, MemBase, HBase, Riak
· Automatically scale your datastore
· Easy to set up
· Spatially distribute and colocate your data
· High availability
· Load balancing
· No single point of failure
What could go wrong?
22. What could possibly go wrong?
source: thereifixedit.com
23. Why Not Clustering?
· Still fairly young
· Fundamentally complicated
· Less community support
· Fewer engineers with working knowledge
· Difficult and scary upgrade mechanisms
· And, yes, there is a single point of failure. A
BIG one.
I lied, there is a major single point of failure: the cluster management system. What if it fails? When will it fail? How will you, a non-DB developer, fix it? Clustering is a pipe dream. Maybe
someday it will work, on the same day there's a perfect makefile system, version control, and bug management
24. Clustering Single Point of Failure
Cluster
Management
Algorithm
joke (without names) about how when the cluster manager fails, load balancing may stop one
night and keep you up til 6am, or the cluster manager sprays bad data to all nodes and the
CEO of the cluster company contacts you telling you your data is gone and can he buy you a
pizza.
25. Cluster Manager
· Same complex code replicated over all nodes
· Failure modes:
· Data rebalance breaks
· Data corruption across all nodes
· Improper balancing that cannot be fixed
(easily)
· Data authority failure
What could go wrong?
26. Lesson Learned #2
Clustering is scary.
Swap speakers after this slide
27. Why Sharding?
· Can split your databases to add more capacity
· Spatially distribute and colocate your data
· High availability
· Load balancing
· Algorithm for placing data is very simple
· ID generation is simplistic
ID management is a single point of failure too, but much simpler and easier to test/debug.
28. When to shard?
· Sharding makes schema design harder
· Solidify site design and backend architecture
· Remove all joins and complex queries, add
cache
· Functionally shard as much as possible
· Still growing? Shard.
Maybe a pictograph here showing sharding early = faster transition, but unnecessary
complexity. later = slower transition
29. Our Transition
1 DB + Foreign Keys + Joins
1 DB + Denormalized + Cache
1 DB + Read slaves + Cache
Several functionally sharded DBs + Read slaves + Cache
ID sharded DBs + Backup slaves + Cache
Another possible splitting point
30. Watch out for...
· Cannot perform most JOINS
· No transaction capabilities
· Extra effort to maintain unique constraints
· Schema changes require more planning
· Reports require running same query on all
shards
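The last bullet can be sketched in a few lines: a report has to visit every shard and combine the answers in application code. This is only an illustration under stated assumptions. In-memory SQLite databases stand in for MySQL shards, and `make_shard`, `count_on_all_shards`, and `REPORT_TABLES` are hypothetical names, not anything from the talk.

```python
import sqlite3

# Hypothetical whitelist so table names never come from untrusted input.
REPORT_TABLES = {"pins", "boards", "users"}


def make_shard(pin_bodies):
    """Build an in-memory SQLite database standing in for one MySQL shard."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pins (id INTEGER PRIMARY KEY, body TEXT)")
    db.executemany("INSERT INTO pins (body) VALUES (?)",
                   [(body,) for body in pin_bodies])
    return db


def count_on_all_shards(shards, table):
    """Run the same COUNT query on every shard and add up the results."""
    if table not in REPORT_TABLES:
        raise ValueError("unknown table: %r" % table)
    total = 0
    for db in shards:
        (count,) = db.execute("SELECT COUNT(*) FROM %s" % table).fetchone()
        total += count
    return total


shards = [make_shard(["a", "b"]), make_shard(["c"]), make_shard([])]
total_pins = count_on_all_shards(shards, "pins")
```

The loop is sequential for clarity; with thousands of shards a real report would probably fan the query out in parallel.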
32. Sharded Server Topology
db00001 db00513 db03073 db03585
db00002 db00514 db03074 db03586
....... ....... ....... .......
db00512 db01024 db03584 db04096
Initially, 8 physical servers, each with 512 DBs
33. High Availability
db00001 db00513 db03073 db03585
db00002 db00514 db03074 db03586
....... ....... ....... .......
db00512 db01024 db03584 db04096
Multi Master replication
34. Increased load on DB?
Before: one server holding db00001, db00002, ......., db00512
After: original server holding db00001, db00002, ......., db00256; new replica holding db00257, db00258, ......., db00512
To increase capacity, a server is replicated and the new replica becomes responsible for some DBs
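One way to picture the split is as an update to a mapping from server name to an inclusive range of DB numbers, like the lookup structure shown later in the deck. The sketch below assumes that representation and an even split. The function name `split_server` and the server name `sharddb009a` are hypothetical, and the replication step itself is not modeled.

```python
def split_server(ranges, old_server, new_server):
    """Return a new mapping in which new_server takes the upper half of
    old_server's DB range. The data is assumed to be on new_server already
    (it started as a replica), so only the routing changes."""
    if new_server in ranges:
        raise ValueError("server already present: %s" % new_server)
    low, high = ranges[old_server]
    if high <= low:
        raise ValueError("range too small to split")
    mid = (low + high) // 2
    updated = dict(ranges)                   # leave the caller's mapping untouched
    updated[old_server] = (low, mid)         # e.g. db00001 .. db00256
    updated[new_server] = (mid + 1, high)    # e.g. db00257 .. db00512
    return updated


before = {"sharddb001a": (1, 512)}
after = split_server(before, "sharddb001a", "sharddb009a")
```

Because each DB keeps its number and its contents, no row changes its ID; only the server that answers for it changes.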
35. ID Structure
64 bits
Shard ID Type Local ID
· A lookup data structure has physical server to
shard ID range (cached by each app server
process)
· Shard ID denotes which shard
· Type denotes object type (e.g., pins)
· Local ID denotes position in table
Scaling Pinterest
12年8月10⽇日星期五
If you have to use a list, do this. But remember, the less stuff on a slide, the better
36. Lookup Structure
{"sharddb001a": ( 1, 512),
"sharddb002b": ( 513, 1024),
"sharddb003a": (1025, 1536),
...
"sharddb008b": (3585, 4096)}
sharddb003a > DB01025 > tables: users, user_has_boards, boards
users table rows (ID, data): 1 ser-data; 2 ser-data; 3 ser-data
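A minimal sketch of how an application process might use this structure to find the server and database for a shard. Only the four entries visible on the slide are included; the elided ones are deliberately left out rather than guessed. `server_for_shard` and `database_name` are hypothetical helper names, and the lowercase five-digit database name follows the db00001 style used on the topology slides.

```python
SHARD_RANGES = {
    "sharddb001a": (1, 512),
    "sharddb002b": (513, 1024),
    "sharddb003a": (1025, 1536),
    # entries elided on the slide are not reconstructed here
    "sharddb008b": (3585, 4096),
}


def server_for_shard(shard_id, ranges=SHARD_RANGES):
    """Return the physical server whose inclusive range contains shard_id."""
    for server, (low, high) in ranges.items():
        if low <= shard_id <= high:
            return server
    raise KeyError("no server configured for shard %d" % shard_id)


def database_name(shard_id):
    """Format the per-shard database name, e.g. shard 1025 -> 'db01025'."""
    return "db%05d" % shard_id


server = server_for_shard(1025)
database = database_name(1025)
```

A linear scan is enough for a handful of servers; with many more, a sorted list and `bisect` would be the usual choice.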
37. ID Structure
· New users are randomly distributed across
shards
· Boards, pins, etc. try to be colocated with user
· Local IDs are assigned by auto-increment
· Enough ID space for 65536 shards, but only
first 4096 opened initially. Can expand
horizontally.
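The placement rules above can be sketched as follows, assuming shards numbered 1 to 4096 as on the topology slides. `ShardCounters` is a hypothetical stand-in for the per-table auto-increment columns on each shard; in the real system MySQL would hand out the local IDs.

```python
import itertools
import random

OPEN_SHARDS = range(1, 4097)   # only the first 4096 shards are opened initially


class ShardCounters:
    """Stands in for the per-table AUTO_INCREMENT columns on one shard."""

    def __init__(self):
        self._counters = {}

    def next_local_id(self, table):
        counter = self._counters.setdefault(table, itertools.count(1))
        return next(counter)


def place_new_user(shards, rng):
    """Pick a random open shard for a new user and allocate a local ID there."""
    shard_id = rng.choice(OPEN_SHARDS)
    counters = shards.setdefault(shard_id, ShardCounters())
    return shard_id, counters.next_local_id("users")


def place_board(shards, owner_shard_id):
    """Put a board on its owner's shard so the two stay colocated."""
    counters = shards.setdefault(owner_shard_id, ShardCounters())
    return owner_shard_id, counters.next_local_id("boards")


shards = {}
user_shard, user_local_id = place_new_user(shards, random.Random(0))
first_board = place_board(shards, user_shard)
second_board = place_board(shards, user_shard)
```

Opening more shards later would only mean widening `OPEN_SHARDS`; existing objects keep their shard.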
38. Objects and Mappings
· Object tables (e.g., pin, board, user, comment)
· Local ID -> MySQL blob (JSON / Serialized thrift)
· Mapping tables (e.g., user has boards, pin has likes)
· Full ID -> Full ID (+ timestamp)
· Naming schema is noun_verb_noun
· Queries are PK or index lookups (no joins)
· Data DOES NOT MOVE
· All tables exist on all shards
· No schema changes required (index = new table)
39. Loading a Page
· Rendering user profile
SELECT body FROM users WHERE id=<local_user_id>
SELECT board_id FROM user_has_boards WHERE user_id=<user_id>
SELECT body FROM boards WHERE id IN (<board_ids>)
SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>
SELECT body FROM pins WHERE id IN (<pin_ids>)
· Most of these calls will be a cache hit
· Omitting offset/limits and mapping sequence
id sort
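The first three queries can be sketched as below, with a cache consulted before the database. This is a simplified illustration, not the site's code: an in-memory SQLite database stands in for one MySQL shard, a plain dict stands in for the cache, local IDs are used throughout instead of full 64-bit IDs, and all function names are hypothetical.

```python
import json
import sqlite3


def make_shard():
    """Create an in-memory SQLite database standing in for one MySQL shard."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE boards (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE user_has_boards (user_id INTEGER, board_id INTEGER);
        CREATE INDEX user_has_boards_user ON user_has_boards (user_id);
    """)
    return db


def cached(cache, key, loader):
    """Return cache[key], calling loader() only on a miss."""
    if key not in cache:
        cache[key] = loader()
    return cache[key]


def load_profile(db, cache, user_id):
    """Fetch a user and their boards with key/index lookups only (no joins)."""
    def load_user():
        row = db.execute("SELECT body FROM users WHERE id = ?",
                         (user_id,)).fetchone()
        if row is None:
            raise KeyError(user_id)
        return json.loads(row[0])

    def load_board_ids():
        rows = db.execute(
            "SELECT board_id FROM user_has_boards "
            "WHERE user_id = ? ORDER BY board_id", (user_id,))
        return [r[0] for r in rows]

    user = cached(cache, ("user", user_id), load_user)
    board_ids = cached(cache, ("user_has_boards", user_id), load_board_ids)
    boards = []
    if board_ids:
        marks = ",".join("?" * len(board_ids))
        rows = db.execute(
            "SELECT body FROM boards WHERE id IN (%s) ORDER BY id" % marks,
            board_ids)
        boards = [json.loads(r[0]) for r in rows]
    return {"user": user, "boards": boards}
```

Each step is a primary-key or index lookup, so every result can be cached under a simple key, which is what makes the "most of these calls will be a cache hit" claim plausible.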
40. Scripting
· Must get old data into your shiny new shard
· 500M pins, 1.6B follower rows, etc
· Build a scripting farm
· Spawn more workers and complete the task
faster
· Pyres - based on Github’s Resque queue
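The idea of a scripting farm can be sketched with the standard library alone: cut the ID space into batches and let a pool of workers copy them in parallel. This is not Pyres and does not use its API. A thread pool stands in for the queue-backed workers, plain dicts stand in for the old database and the new shard, and every name below is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(first_id, last_id, size):
    """Yield inclusive (start, end) ID ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + size - 1, last_id)
        yield (start, end)
        start = end + 1


def migrate_batch(id_range, source, target):
    """Copy every existing row in one ID range and return how many moved."""
    low, high = id_range
    moved = 0
    for row_id in range(low, high + 1):
        if row_id in source:
            target[row_id] = source[row_id]
            moved += 1
    return moved


def run_farm(source, target, first_id, last_id, batch_size=1000, workers=4):
    """Migrate all batches using a pool of workers; more workers, less time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: migrate_batch(r, source, target),
                          batches(first_id, last_id, batch_size))
        return sum(counts)


old_rows = {i: "pin-%d" % i for i in range(1, 2501)}
new_rows = {}
moved = run_farm(old_rows, new_rows, 1, 2500)
```

Batches never overlap, so workers do not contend for the same rows, and a failed batch can be re-queued on its own.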
41. Future
· Sharded MySQL is here to stay
· Auto-sharding on top of MySQL becoming
viable
· Clustering may become hardened in 5 to 10
years
42. In The Works
· Service Based Architecture
· Connection limits
· Isolation of functionality
· Isolation of access (security)
· Scaling the Team
Connection limits + Isolation of functionality = service oriented architecture
43. Lesson Learned #3
Keep it fun.
Swap speakers after this slide
44. We are Hiring!
jobs@pinterest.com
45. Questions?
marty@pinterest.com yashh@pinterest.com
evrhet@pinterest.com