Thank you for attending ArchSummit, the Global Architect Summit!
Official conference website and materials download:
www.archsummit.com
Scaling




                          Marty Weiner                                   Evrhet Milam
                          Krypton                                        Batcave




Friday, August 10, 2012
TODO:
Title page names
Pass on page title consistency
Fill out numbers
Put "always test in production" pin in screenshot of website
Pinterest is . . .
          An online pinboard to organize
                        and
             share what inspires you.



  Scaling Pinterest

Relationships




                      Marty Weiner
                      Grayskull, Eternia




                                           Yashh Nelapati
                                           Gotham City
Page Views / Day
[Chart: page views per day, Mar 2010 to May 2012]
             ·    RackSpace
             ·    1 small Web Engine
             ·    1 small MySQL DB
             ·    1 Engineer

4-5 mts.
Page Views / Day
[Chart: page views per day, Mar 2010 to May 2012]
             ·    Amazon EC2 + S3 + CloudFront
             ·    1 NGinX, 4 Web Engines
             ·    1 MySQL DB + 1 Read Slave
             ·    1 Task Queue + 2 Task Processors
             ·    1 MongoDB
             ·    2 Engineers

TODO: Show total somewhere
Page Views / Day
[Chart: page views per day, Mar 2010 to May 2012]
             ·    Amazon EC2 + S3 + CloudFront
             ·    2 NGinX, 16 Web Engines + 2 API Engines
             ·    5 Functionally Sharded MySQL DBs + 9 read slaves
             ·    4 Cassandra Nodes
             ·    15 Membase Nodes (3 separate clusters)
             ·    8 Memcache Nodes
             ·    10 Redis Nodes
             ·    3 Task Routers + 4 Task Processors
             ·    4 Elastic Search Nodes
             ·    3 Mongo Clusters
             ·    3 Engineers
Lesson Learned #1
                      It will fail. Keep it simple.




Page Views / Day
[Chart: page views per day, Mar 2010 to May 2012]
             ·    Amazon EC2 + S3 + Akamai, ELB
             ·    90 Web Engines + 50 API Engines
             ·    66 MySQL DBs (m1.xlarge) + 1 slave each
             ·    59 Redis Instances
             ·    51 Memcache Instances
             ·    1 Redis Task Manager + 25 Task Processors
             ·    Sharded Solr
             ·    6 Engineers
Page Views / Day
[Chart: page views per day, Mar 2010 to May 2012]
             ·    Amazon EC2 + S3 + Edge Cast, ELB
             ·    135 Web Engines + 75 API Engines
             ·    80 MySQL DBs (m1.xlarge) + 1 slave each
             ·    110 Redis Instances
             ·    60 Memcache Instances
             ·    2 Redis Task Managers + 60 Task Processors
             ·    Sharded Solr
             ·    25 Engineers
Why Amazon EC2/S3?
             · Very good reliability, reporting, and support
             · Very good peripherals, such as managed
                  cache, DB, load balancing, DNS, map
                  reduce, and more...
             · New instances ready in seconds
             · Con: Limited choice
             · Pro: Limited choice

Why MySQL?
             ·    Extremely mature
             ·    Well known and well liked
             ·    Rarely catastrophic loss of data
             ·    Response time to request rate increases linearly
             ·    Very good software support - XtraBackup, Innotop,
                  Maatkit
             · Solid active community
             · Very good support from Percona
             · Free


TODO: Animate in a money bag
Why Memcache?
             ·    Extremely mature
             ·    Very good performance
             ·    Well known and well liked
             ·    Never crashes, and few failure modes
             ·    Free




Why Redis?
             ·    Variety of convenient data structures
             ·    Has persistence and replication
             ·    Well known and well liked
             ·    Consistently good performance
             ·    Few failure modes
             ·    Free




Data structures -- list, set, sorted set, pubsub, string. All support atomic operations. All
support pipelining.
Clustering
                                     vs
                                  Sharding





clustered = automatically managed nodes with autoscaling, sharding = engineer assigns how
data is distributed, splits databases, and writes code to redistribute data (if that option is
supported)
Clustering

             ·    Data distributed automatically
             ·    Data can move
             ·    Rebalances to distribute capacity
             ·    Nodes communicate with each other

clustered = automatically managed nodes with autoscaling, sharding = engineer assigns how
data is distributed, splits databases, and writes code to redistribute data (if that option is
supported). Clustered nodes gossip with each other to determine if rebalancing is needed
Sharding

             ·    Data distributed manually
             ·    Data does not move
             ·    Split data to distribute load
             ·    Nodes are not aware of each other
Why Clustering?
             ·    Examples: Cassandra, MemBase, HBase, Riak
             ·    Automatically scale your datastore
             ·    Easy to set up
             ·    Spatially distribute and colocate your data
             ·    High availability
             ·    Load balancing
             ·    No single point of failure


What could go wrong?
What could possibly go wrong?




                                                          source: thereifixedit.com




Why Not Clustering?
                  ·     Still fairly young
                  ·     Fundamentally complicated
                  ·     Less community support
                  ·     Fewer engineers with working knowledge
                  ·     Difficult and scary upgrade mechanisms
                  ·     And, yes, there is a single point of failure. A
                        BIG one.


I lied, there is a major single point of failure: the cluster management system. What if it fails? When will it fail? How will you, a non-DB developer, fix it? Clustering is a pipe dream. Maybe
someday it will work, the same day there's a perfect makefile system, version control, and bug management.
Clustering Single Point of Failure


                                                               Cluster
                                                              Management
                                                              Algorithm





joke (without names) about how when the cluster manager fails, load balancing may stop one
night and keep you up til 6am, or the cluster manager sprays bad data to all nodes and the
CEO of the cluster company contacts you telling you your data is gone and can he buy you a
pizza.
Cluster Manager
             · Same complex code replicated over all nodes
             · Failure modes:
               · Data rebalance breaks
               · Data corruption across all nodes
               · Improper balancing that cannot be fixed
                      (easily)
                  · Data authority failure


What could go wrong?
Lesson Learned #2
                      Clustering is scary.





Swap speakers after this slide
Why Sharding?
             ·    Can split your databases to add more capacity
             ·    Spatially distribute and colocate your data
             ·    High availability
             ·    Load balancing
             ·    Algorithm for placing data is very simple
             ·    ID generation is simplistic



ID management is a single point of failure too, but much simpler and easy to test/debug.
When to shard?
             · Sharding makes schema design harder

             · Solidify site design and backend architecture
             · Remove all joins and complex queries, add
                  cache
             · Functionally shard as much as possible
             · Still growing? Shard.


Maybe a pictograph here showing sharding early = faster transition, but unnecessary
complexity. later = slower transition
Our Transition
                  1 DB + Foreign Keys + Joins
                  1 DB + Denormalized + Cache
                  1 DB + Read slaves + Cache
                  Several functionally sharded DBs + Read slaves + Cache
                  ID sharded DBs + Backup slaves + Cache

Another possible splitting point
Watch out for...
             ·    Cannot perform most JOINS
             ·    No transaction capabilities
             ·    Extra effort to maintain unique constraints
             ·    Schema changes require more planning
             ·    Reports require running same query on all
                  shards



Another possible splitting point
How we sharded




Sharded Server Topology




                      db00001        db00513               db03073              db03585
                      db00002        db00514               db03074              db03586
                        .......        .......               .......              .......
                      db00512        db01024               db03584              db04096


                      Initially, 8 physical servers, each with 512
                      DBs
High Availability




                      db00001          db00513            db03073              db03585
                      db00002          db00514            db03074              db03586
                        .......          .......            .......              .......
                      db00512          db01024            db03584              db04096


                                   Multi Master
                                   replication
Increased load on DB?

                                                                   db00001
                                                                   db00002
                                                                     .......
                                                                   db00256




                         db00001
                         db00002
                           .......                                 db00257
                         db00512                                   db00258
                                                                     .......
                                                                   db00512
                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
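The split on this slide can be read as an update to the map from server to shard range. The following is a minimal sketch under that reading, not Pinterest's actual tooling: the `split_server` helper and the new server name `sharddb009a` are hypothetical, and the sketch assumes the new server already holds a full replica, so only routing changes.

```python
def split_server(lookup, old_server, new_server):
    """Hand the upper half of old_server's shard range to new_server.

    Assumes new_server is already a full replica of old_server, so no rows
    are copied here; only the routing map changes.
    """
    low, high = lookup[old_server]
    if high <= low:
        raise ValueError("range too small to split")
    mid = (low + high) // 2
    updated = dict(lookup)          # leave the caller's map untouched
    updated[old_server] = (low, mid)
    updated[new_server] = (mid + 1, high)
    return updated


lookup = {"sharddb001a": (1, 512)}
lookup = split_server(lookup, "sharddb001a", "sharddb009a")
print(lookup)
```

With the range 1 to 512, the old server keeps shards 1 to 256 and the new one takes 257 to 512, matching the diagram.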
ID Structure
                                             64 bits


                      Shard ID        Type               Local ID
         · A lookup data structure maps each physical server to a
              shard ID range (cached by each app server
              process)
         · Shard ID denotes which shard
         · Type denotes object type (e.g., pins)
         · Local ID denotes position in table
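One way to pack the three fields into a 64-bit integer is sketched below. The 16-bit shard field follows from the talk's statement that the ID space allows 65536 shards; the widths chosen for the type and local ID fields are assumptions, since the slide does not give them, and the function names are hypothetical.

```python
SHARD_BITS = 16   # 2**16 = 65536 shards, as stated in the talk
TYPE_BITS = 10    # assumption: width not given on the slide
LOCAL_BITS = 36   # assumption: width not given on the slide


def make_id(shard_id, type_id, local_id):
    """Pack shard, type, and local ID into one integer that fits in 64 bits."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def split_id(full_id):
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = full_id & ((1 << LOCAL_BITS) - 1)
    type_id = (full_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (full_id >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id
```

Because the shard ID is embedded in every object ID, the application can route a request from the ID alone, without a per-object lookup.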
Lookup Structure

                                    {“sharddb001a”:   (   1, 512),
                                     “sharddb002b”:   ( 513, 1024),
                                     “sharddb003a”:   (1025, 1536),
                                      ...
                                     “sharddb008b”:   (3585, 4096)}




                      sharddb003a                      DB01025                   users


                                                        users                1   ser-data
                                                  user_has_boards            2   ser-data
                                                       boards                3   ser-data
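Resolving a shard ID to a server and database name could look like the sketch below. It includes only the four map entries visible on the slide (the elided ones stay elided). The helper names and the `db%05d` naming pattern are assumptions inferred from the topology slides.

```python
# Entries copied from the slide; the servers hidden behind "..." are omitted.
LOOKUP = {
    "sharddb001a": (1, 512),
    "sharddb002b": (513, 1024),
    "sharddb003a": (1025, 1536),
    "sharddb008b": (3585, 4096),
}


def server_for_shard(shard_id, lookup=LOOKUP):
    """Return the server whose inclusive range contains shard_id."""
    for server, (low, high) in lookup.items():
        if low <= shard_id <= high:
            return server
    raise KeyError(f"no server configured for shard {shard_id}")


def database_name(shard_id):
    """Assumed naming pattern: 'db' plus the zero-padded shard number."""
    return f"db{shard_id:05d}"


print(server_for_shard(1025), database_name(1025))
```

A linear scan is enough for a map of this size; each app server process would keep the map cached, as the ID Structure slide notes.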



ID Structure
         · New users are randomly distributed across
              shards
         · Boards, pins, etc. are colocated with their user where possible
         · Local IDs are assigned by auto-increment
         · Enough ID space for 65536 shards, but only
              first 4096 opened initially. Can expand
              horizontally.
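The placement rules above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the talk says objects "try" to be colocated, so a real system would need a fallback that this sketch leaves out.

```python
import random

# Only the first 4096 of the 65536 possible shards are opened initially.
OPEN_SHARDS = range(1, 4097)


def shard_for_new_user(rng=random):
    """Pick a shard uniformly at random for a newly registered user."""
    return rng.choice(OPEN_SHARDS)


def shard_for_owned_object(owner_shard_id):
    """Boards, pins, etc. go on the owning user's shard (colocation)."""
    return owner_shard_id


user_shard = shard_for_new_user()
board_shard = shard_for_owned_object(user_shard)
```

Opening more shards later means widening `OPEN_SHARDS`; existing IDs keep their shard, which is what lets the system expand without moving data.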


Objects and Mappings
             ·    Object tables (e.g., pin, board, user, comment)
                  ·    Local ID → MySQL blob (JSON / Serialized thrift)
             ·    Mapping tables (e.g., user has boards, pin has likes)
                  ·    Full ID → Full ID (+ timestamp)
                  ·    Naming schema is noun_verb_noun
             ·    Queries are PK or index lookups (no joins)
             ·    Data DOES NOT MOVE
             ·    All tables exist on all shards
             ·    No schema changes required (index = new table)
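The object-table and mapping-table layout can be tried out with a small sketch. Here `sqlite3` from the Python standard library stands in for one MySQL shard database. The column names, the `create_board` helper, and the use of plain integers where the talk uses full 64-bit IDs are all assumptions made for brevity.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for one MySQL shard database
conn.executescript("""
    -- Object table: local ID -> serialized blob
    CREATE TABLE boards (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        body BLOB
    );
    -- Mapping table, named noun_verb_noun: ID -> ID (+ timestamp)
    CREATE TABLE user_has_boards (
        user_id    INTEGER,
        board_id   INTEGER,
        created_at INTEGER,
        PRIMARY KEY (user_id, board_id)
    );
""")


def create_board(conn, user_id, fields, now):
    """Store the board as a JSON blob and record the user -> board mapping."""
    cur = conn.execute("INSERT INTO boards (body) VALUES (?)", (json.dumps(fields),))
    board_id = cur.lastrowid  # auto-increment local ID
    conn.execute(
        "INSERT INTO user_has_boards (user_id, board_id, created_at) VALUES (?, ?, ?)",
        (user_id, board_id, now),
    )
    return board_id
```

Adding a field to a board only changes the JSON inside `body`, and a new access pattern gets its own mapping table, which is why no schema changes are needed.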
Loading a Page
    · Rendering user profile
         SELECT body FROM users WHERE id=<local_user_id>
         SELECT board_id FROM user_has_boards WHERE user_id=<user_id>
         SELECT body FROM boards WHERE id IN (<board_ids>)
         SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>
         SELECT body FROM pins WHERE id IN (<pin_ids>)
    · Most of these calls will be a cache hit
    · Queries shown omit offset/limit clauses and the sort on mapping sequence id
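Since most of these lookups are expected to hit the cache, each `SELECT body ... WHERE id IN (...)` call is typically wrapped in a cache-first read. The sketch below shows that pattern with a plain dictionary standing in for memcache. The `get_objects` helper and the `fetch_from_db` callback are hypothetical and not taken from the talk.

```python
import json

cache = {}  # stand-in for memcache: "table:id" -> decoded object


def get_objects(table, ids, fetch_from_db):
    """Return {id: object}, reading the cache first and the DB only for misses.

    fetch_from_db(table, ids) is a hypothetical callback that runs
    "SELECT id, body FROM <table> WHERE id IN (...)" and returns {id: json_body}.
    """
    found = {}
    missing = []
    for obj_id in ids:
        key = f"{table}:{obj_id}"
        if key in cache:
            found[obj_id] = cache[key]
        else:
            missing.append(obj_id)
    if missing:
        for obj_id, body in fetch_from_db(table, missing).items():
            obj = json.loads(body)
            cache[f"{table}:{obj_id}"] = obj  # populate cache for next time
            found[obj_id] = obj
    return found
```

Only the IDs that miss the cache reach the database, so a warm cache turns the five queries into a handful of key lookups.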

Scripting
         · Must get old data into your shiny new shard
         · 500M pins, 1.6B follower rows, etc
         · Build a scripting farm
           · Spawn more workers and complete the task
                  faster
         · Pyres - based on Github’s Resque queue
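The idea of a scripting farm, where the ID space is cut into batches and handed to a pool of workers, can be sketched with the standard library. This is not Pyres: `concurrent.futures` is swapped in as a stand-in for the queue and workers, and `migrate_batch` is a hypothetical callback that copies one ID range and returns the number of rows it moved.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(first_id, last_id, size):
    """Yield inclusive (start, end) ID ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + size - 1, last_id)
        yield (start, end)
        start = end + 1


def run_migration(first_id, last_id, batch_size, migrate_batch, workers=4):
    """Run migrate_batch(start, end) over every batch and total the rows moved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: migrate_batch(*b), batches(first_id, last_id, batch_size))
        return sum(results)
```

Raising `workers` is the code-level equivalent of spawning more workers to finish the task faster; the batches are independent, so they can run in any order.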



Future
         · Sharded MySQL is here to stay
         · Auto-sharding on top of MySQL becoming
              viable
         · Clustering may become hardened in 5 to 10
              years




In The Works
                · Service Based Architecture
                  · Connection limits
                  · Isolation of functionality
                  · Isolation of access (security)
                · Scaling the Team




Connection limits + Isolation of functionality = service oriented architecture
Lesson Learned #3
                          Keep it fun.





Swap speakers after this slide
We are Hiring!
                                 jobs@pinterest.com




Questions?

                      marty@pinterest.com              yashh@pinterest.com

                                     evrhet@pinterest.com


Hangzhou · October 25-27, 2012
Official conference website: www.qconhangzhou.com

 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Último (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Pinterest arch summit august 2012 - scaling pinterest

  • 1. Thank you for attending the ArchSummit Global Architect Summit! Official conference website and materials download: www.archsummit.com
  • 2. Scaling. Marty Weiner (Krypton), Evrhet Milam (Batcave). Friday, August 10, 2012. Notes: TODO: Title page names; pass on page title consistency; fill out numbers; put "always test in production" pin in screenshot of website.
  • 3. Pinterest is . . . an online pinboard to organize and share what inspires you.
  • 4. (No slide text.)
  • 5. (No slide text.)
  • 6. (No slide text.)
  • 7. Relationships. Marty Weiner (Grayskull, Eternia), Yashh Nelapati (Gotham City).
  • 8. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · RackSpace · 1 small Web Engine · 1 small MySQL DB · 1 Engineer. Notes: 4-5 mts.
  • 9. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + CloudFront · 1 NGinX, 4 Web Engines · 1 MySQL DB + 1 Read Slave · 1 Task Queue + 2 Task Processors · 1 MongoDB · 2 Engineers. Notes: TODO: Show total somewhere.
  • 10. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + CloudFront · 2 NGinX, 16 Web Engines + 2 API Engines · 5 Functionally Sharded MySQL DBs + 9 read slaves · 4 Cassandra Nodes · 15 Membase Nodes (3 separate clusters) · 8 Memcache Nodes · 10 Redis Nodes · 3 Task Routers + 4 Task Processors · 4 Elastic Search Nodes · 3 Mongo Clusters · 3 Engineers
  • 11. Lesson Learned #1: It will fail. Keep it simple.
  • 12. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + Akamai, ELB · 90 Web Engines + 50 API Engines · 66 MySQL DBs (m1.xlarge) + 1 slave each · 59 Redis Instances · 51 Memcache Instances · 1 Redis Task Manager + 25 Task Processors · Sharded Solr · 6 Engineers
  • 13. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + Edge Cast, ELB · 135 Web Engines + 75 API Engines · 80 MySQL DBs (m1.xlarge) + 1 slave each · 110 Redis Instances · 60 Memcache Instances · 2 Redis Task Managers + 60 Task Processors · Sharded Solr · 25 Engineers
  • 14. Why Amazon EC2/S3? · Very good reliability, reporting, and support · Very good peripherals, such as managed cache, DB, load balancing, DNS, map reduce, and more... · New instances ready in seconds · Con: Limited choice · Pro: Limited choice
  • 15. Why MySQL? · Extremely mature · Well known and well liked · Rarely catastrophic loss of data · Response time to request rate increases linearly · Very good software support - XtraBackup, Innotop, Maatkit · Solid active community · Very good support from Percona · Free. Notes: TODO: Animate in a money bag.
  • 16. Why Memcache? · Extremely mature · Very good performance · Well known and well liked · Never crashes, and few failure modes · Free
  • 17. Why Redis? · Variety of convenient data structures · Has persistence and replication · Well known and well liked · Consistently good performance · Few failure modes · Free. Notes: Data structures: list, set, sorted set, pubsub, string. All support atomic operations. All support pipelining.
  • 18. Clustering vs Sharding. Notes: clustered = automatically managed nodes with autoscaling; sharding = engineer assigns how data is distributed, splits databases, and writes code to redistribute data (if that option is supported).
  • 19. Clustering (vs Sharding) · Data distributed automatically · Data can move · Rebalances to distribute capacity · Nodes communicate with each other. Notes: Clustered nodes gossip with each other to determine if rebalancing is needed.
  • 20. Sharding (vs Clustering) · Data distributed manually · Data does not move · Split data to distribute load · Nodes are not aware of each other
  • 21. Why Clustering? · Examples: Cassandra, MemBase, HBase, Riak · Automatically scale your datastore · Easy to set up · Spatially distribute and colocate your data · High availability · Load balancing · No single point of failure. Notes: What could go wrong?
  • 22. What could possibly go wrong? (source: thereifixedit.com)
  • 23. Why Not Clustering? · Still fairly young · Fundamentally complicated · Less community support · Fewer engineers with working knowledge · Difficult and scary upgrade mechanisms · And, yes, there is a single point of failure. A BIG one. Notes: I lied, there is a major single point of failure: the cluster management system. What if it fails? When will it fail? How will you, a non-DB developer, fix it? Clustering is a pipe dream. Maybe someday it will work, the same day there's a perfect makefile system, version control, and bug management.
  • 24. Clustering Single Point of Failure: Cluster Management Algorithm. Notes: Joke (without names) about how, when the cluster manager fails, load balancing may stop one night and keep you up until 6am, or the cluster manager sprays bad data to all nodes and the CEO of the cluster company contacts you to tell you your data is gone and to ask whether he can buy you a pizza.
  • 25. Cluster Manager · Same complex code replicated over all nodes · Failure modes: · Data rebalance breaks · Data corruption across all nodes · Improper balancing that cannot be fixed (easily) · Data authority failure. Notes: What could go wrong?
  • 26. Lesson Learned #2: Clustering is scary. Notes: Swap speakers after this slide.
  • 27. Why Sharding? · Can split your databases to add more capacity · Spatially distribute and colocate your data · High availability · Load balancing · Algorithm for placing data is very simple · ID generation is simplistic. Notes: ID management is a single point of failure too, but much simpler and easier to test/debug.
  • 28. When to shard? · Sharding makes schema design harder · Solidify site design and backend architecture · Remove all joins and complex queries, add cache · Functionally shard as much as possible · Still growing? Shard. Notes: Maybe a pictograph here showing sharding early = faster transition, but unnecessary complexity; later = slower transition.
  • 29. Our Transition: 1 DB + Foreign Keys + Joins -> 1 DB + Denormalized + Cache -> 1 DB + Read slaves + Cache -> Several functionally sharded DBs + Read slaves + Cache -> ID sharded DBs + Backup slaves + Cache. Notes: Another possible splitting point.
  • 30. Watch out for... · Cannot perform most JOINs · No transaction capabilities · Extra effort to maintain unique constraints · Schema changes require more planning · Reports require running the same query on all shards. Notes: Another possible splitting point.
  • 31. How we sharded
  • 32. Sharded Server Topology: db00001, db00002, ..., db00512 | db00513, db00514, ..., db01024 | db03072, db03073, ..., db03583 | db03584, db03585, ..., db04096. Initially, 8 physical servers, each with 512 DBs.
  • 33. High Availability: db00001, db00002, ..., db00512 | db00513, db00514, ..., db01024 | db03072, db03073, ..., db03583 | db03584, db03585, ..., db04096. Multi Master replication.
  • 34. Increased load on DB? One server (db00001, db00002, ..., db00512) becomes two: (db00001, db00002, ..., db00256) and (db00257, db00258, ..., db00512). To increase capacity, a server is replicated and the new replica becomes responsible for some DBs.
  • 35. ID Structure: 64 bits = Shard ID | Type | Local ID · A lookup data structure maps each physical server to a shard ID range (cached by each app server process) · Shard ID denotes which shard · Type denotes object type (e.g., pins) · Local ID denotes position in table
  • 36. Lookup Structure: {"sharddb001a": (1, 512), "sharddb002b": (513, 1024), "sharddb003a": (1025, 1536), ... "sharddb008b": (3585, 4096)}. Example: server sharddb003a holds DB01025 with tables users, user_has_boards, and boards; the users table maps local IDs 1, 2, 3 to serialized data (ser-data).
  • 37. ID Structure · New users are randomly distributed across shards · Boards, pins, etc. try to be colocated with the user · Local IDs are assigned by auto-increment · Enough ID space for 65536 shards, but only the first 4096 opened initially. Can expand horizontally.
  • 38. Objects and Mappings · Object tables (e.g., pin, board, user, comment) · Local ID -> MySQL blob (JSON / serialized Thrift) · Mapping tables (e.g., user has boards, pin has likes) · Full ID -> Full ID (+ timestamp) · Naming schema is noun_verb_noun · Queries are PK or index lookups (no joins) · Data DOES NOT MOVE · All tables exist on all shards · No schema changes required (index = new table)
  • 39. Loading a Page · Rendering user profile: SELECT body FROM users WHERE id=<local_user_id>; SELECT board_id FROM user_has_boards WHERE user_id=<user_id>; SELECT body FROM boards WHERE id IN (<board_ids>); SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>; SELECT body FROM pins WHERE id IN (<pin_ids>) · Most of these calls will be a cache hit · Omitting offset/limits and mapping sequence id sort
  • 40. Scripting · Must get old data into your shiny new shard · 500M pins, 1.6B follower rows, etc. · Build a scripting farm · Spawn more workers and complete the task faster · Pyres - based on GitHub's Resque queue
  • 41. Future · Sharded MySQL is here to stay · Auto-sharding on top of MySQL becoming viable · Clustering may become hardened in 5 to 10 years
  • 42. In The Works · Service Based Architecture · Connection limits · Isolation of functionality · Isolation of access (security) · Scaling the Team. Notes: Connection limits + Isolation of functionality = service oriented architecture.
  • 43. Lesson Learned #3: Keep it fun. Notes: Swap speakers after this slide.
  • 44. We are Hiring! jobs@pinterest.com
  • 45. Questions? marty@pinterest.com, yashh@pinterest.com, evrhet@pinterest.com
  • 46. Hangzhou, October 25-27, 2012. Conference website: www.qconhangzhou.com
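
The ID Structure and Lookup Structure slides (35 to 37) can be illustrated with a minimal Python sketch. It is not the speakers' code. The slides give only the 64-bit total and room for 65536 shards, which implies a 16-bit shard ID; the 10-bit type and 36-bit local ID widths, the `PIN_TYPE` value, and all function names below are assumptions made for illustration. The server names and ranges are the four entries shown on slide 36, and the entries the slide elides are left out.

```python
# Sketch of a 64-bit sharded ID, under stated assumptions.
SHARD_BITS = 16  # from the slides: enough ID space for 65536 shards
TYPE_BITS = 10   # assumption: width not given in the slides
LOCAL_BITS = 36  # assumption: width not given in the slides

PIN_TYPE = 1     # hypothetical object-type code

# Physical server -> inclusive shard ID range (entries shown on slide 36;
# the servers elided on the slide are omitted here as well).
SHARD_RANGES = {
    "sharddb001a": (1, 512),
    "sharddb002b": (513, 1024),
    "sharddb003a": (1025, 1536),
    "sharddb008b": (3585, 4096),
}


def make_id(shard_id, type_id, local_id):
    """Pack shard ID, type, and local ID into one integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def split_id(full_id):
    """Unpack a full ID into (shard_id, type_id, local_id)."""
    local_id = full_id & ((1 << LOCAL_BITS) - 1)
    type_id = (full_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (full_id >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id


def server_for_shard(shard_id, ranges=SHARD_RANGES):
    """Find the physical server whose range contains shard_id."""
    for host, (low, high) in ranges.items():
        if low <= shard_id <= high:
            return host
    raise KeyError("no server configured for shard %d" % shard_id)


def locate(full_id):
    """Return (server, database name, type_id, local_id) for a full ID."""
    shard_id, type_id, local_id = split_id(full_id)
    return server_for_shard(shard_id), "db%05d" % shard_id, type_id, local_id


if __name__ == "__main__":
    pin_id = make_id(1025, PIN_TYPE, 3)
    print(locate(pin_id))
```

Because the shard ID is embedded in every full ID, finding an object needs only the small cached range table and no central directory lookup, which fits the slides' point that data does not move between shards.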