SlideShare uma empresa Scribd logo
1 de 39
Baixar para ler offline
感谢您参加本次Ar h u
         c S mmi全球架构师峰会!
               t
大会官方网站与资料下载地址:
www. c um m i . om
   ar hs    tc
分享产品与技术 分享腾讯与互联网



更多精彩内容

   QQ基础数据库架构演变之路

   QQ空间技术架构之峥嵘岁月

   架构之美:开放环境下的网络架构

   查看更多
Sharding




               Marty Weiner              Evrhet Milam
               Metropolis                Batcave




12年8月10⽇日星期五
·   Amazon EC2 + S3 + CloudFront
                  ·   2 NGinX, 16 Web Engines + 2 API Engines
                                      Page Views / Day

                  ·   5 Functionally Sharded MySQL DB + 9 read slaves
                  ·   4 Cassandra Nodes
                  ·   15 Membase Nodes (3 separate clusters)
                  ·   8 Memcache Nodes
                  ·   10 Redis Nodes
          Mar 2010·
           Mar 2010
                      3 Task Routers + 4 Task Processors
                                    Jan 2011
                                               Jan 2011
                                                           Jan 2012
                                                                      Jan 2012
                                                                         May 2012

                  ·   4 Elastic Search Nodes
                  ·   3 Mongo Clusters
                  ·   3 Engineers
  Scaling Pinterest

12年8月10⽇日星期五
Clustering



                           ·   Data distributed automatically
                           ·   Data can move
                           ·   Rebalances to distribute capacity
                           ·   Nodes communicate with each othe



                Sharding
  Scaling Pinterest

12年8月10⽇日星期五
Clustering



                           ·   Data distributed manually
                           ·   Data does not move
                           ·   Split data to distribute load
                           ·   Nodes are not aware of each other



                Sharding
  Scaling Pinterest

12年8月10⽇日星期五
Our Transition
                          1 DB + Foreign Keys + Joins
                          1 DB + Denormalized +
                          Cache + Read slaves +
                           1 DB
                      Cache
         Several functionally sharded DBs + Read slaves +
         Cache ID sharded DBs + Backup slaves +

                      Cache


  Scaling Pinterest

12年8月10⽇日星期五
How we sharded




  Scaling Pinterest

12年8月10⽇日星期五
Sharded Server Topology




                      db00001     db00513      db03072       db03584
                      db00002     db00514      db03073       db03585
                        .......     .......      .......       .......
                      db00512     db01024      db03583       db04096


                      Initially, 8 physical servers, each with 512
                      DBs
  Scaling Pinterest

12年8月10⽇日星期五
High Availability




                      db00001        db00513     db03072     db03584
                      db00002        db00514     db03073     db03585
                        .......        .......     .......     .......
                      db00512        db01024     db03583     db04096

                         Multi Master replication for hot swapping
                         All writes/reads go to only 1 master
  Scaling Pinterest

12年8月10⽇日星期五
Configuration
         · Initially 8 masters, each with 512 DBs
         · Now at 80 masters, some 32 DBs and some 64
              DBs
              · Over 10 TB
         · All tables are InnoDB



  Scaling Pinterest

12年8月10⽇日星期五
Configuration
         · Ubuntu 11.10 on AWS managed by Puppet
         · m1.xlarge using 4 stripe-raided ephemeral
              drives
         · cc2.8xlarge using 4 stripe-raided ephemeral
              drives
         · Soon: hi1.4xlarge (SSD)



  Scaling Pinterest

12年8月10⽇日星期五
ID Structure
                                    64 bits


                      Shard ID   Type         Local ID
         · A lookup data structure has physical server to
              shard ID range (cached by each app server
              process)
         · Shard ID denotes which shard
         · Type denotes object type (e.g., pins)
         · Local ID denotes position in table
  Scaling Pinterest

12年8月10⽇日星期五
Lookup Structure

                                    {“sharddb001a”:   (   1, 512),
                                     “sharddb002b”:   ( 513, 1024),
                                     “sharddb003a”:   (1025, 1536),
                                      ...
                                     “sharddb008b”:   (3585, 4096)}




                      sharddb003a                      DB01025            users


                                                        users         1    json
                                                  user_has_boards     2    json
                                                       boards         3    json



  Scaling Pinterest

12年8月10⽇日星期五
ID Generation
         · New users are randomly distributed across
              shards
         · Boards, pins, etc. try to be collocated with user
         · Local ID’s are assigned by auto-increment
              mysql_query(“USE pbdata%d” % shard_id)
              local_id = mysql_query(“INSERT INTO users (body) VALUES (%s)”, json)
              full_id = shard_id << 46 | USER_TYPE << 10 | local_id


         · Enough ID space for 65536 shards, but only
              first 4096 opened initially. Can expand
              horizontally.
  Scaling Pinterest

12年8月10⽇日星期五
· Object tables (e.g., pin, board, user, comment)
                  Objects and Mappings
               · Local ID    MySQL blob (JSON / Serialized
                      thrift)
             · Mapping tables (e.g., user has boards, pin has
                  likes)
                  · Full ID     Full ID (+ sequence ID /
                      timestamp)
                  · Naming schema is noun_verb_noun
             ·    Queries are PK or index lookups (no joins)
             ·    Data DOES NOT MOVE
             ·    All tables exist on all shards
  Scaling Pinterest

12年8月10⽇日星期五
             ·    No schema changes required (index = new
Loading a Page
    · Rendering user profile
         SELECT       body FROM users WHERE id=<local_user_id>
         SELECT       board_id FROM user_has_boards WHERE user_id=<user_id>
         SELECT       body FROM boards WHERE id IN (<board_ids>)
         SELECT       pin_id FROM board_has_pins WHERE board_id=<board_id>
         SELECT       body FROM pins WHERE id IN (pin_ids)
    · Most of these calls will be a cache hit
    · Omitting offset/limits and mapping sequence
         id sort

  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 4),                  db00004
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),
           ...
          “sharddb008b”:   ( 29, 32)}




                                         sharddb001b




                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 4),                  db00004   sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),                                         db00001
           ...                                                                 db00002
          “sharddb008b”:   ( 29, 32)}                                          db00003
                                                                               db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 2),                  db00004   sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),                                         db00001
           ...                                                                 db00002
          “sharddb008b”:   ( 29, 32),                                          db00003
          “sharddb009a”:   ( 3, 4)}                                            db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002
                                                       db00003
         {“sharddb001a”:   (   1, 2),                  db00004   sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),                                         db00001
           ...                                                                 db00002
          “sharddb008b”:   ( 29, 32),                                          db00003
          “sharddb009a”:   ( 3, 4)}                                            db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Increased load on DB?
                                         sharddb001a

                                                       db00001
                                                       db00002

         {“sharddb001a”:   (   1, 2),                            sharddb009a
          “sharddb002b”:   (   5, 8),
          “sharddb003a”:   (   9, 12),
           ...
          “sharddb008b”:   ( 29, 32),                                          db00003
          “sharddb009a”:   ( 3, 4)}                                            db00004




                                         sharddb001b




                                                                 sharddb009b


                 To increase capacity, a server is replicated
                 and the new replica becomes responsible for
                 some DBs
  Scaling Pinterest

12年8月10⽇日星期五
Scripting
         · Must get old data into your shiny new shard
         · 500M pins, 1.6B follower rows, etc
         · Build a scripting farm with Pyres + Redis
           · Spawn more workers and complete the task
                  faster
              · Can push millions of jobs onto the queue


  Scaling Pinterest

12年8月10⽇日星期五
Sharding Alternatives




  Scaling Pinterest

12年8月10⽇日星期五
· ID generated by Generation
                      ID shard + type + auto-
              increment
              ·   Pinterest, Facebook
              ·   Pro: ID generation scales with size of shard
              ·   Pro: No routing services which can fail
              ·   Pro: No holes in ID space
              ·   Pro: Type verification built-in
              ·   Con: No global ordering (requires sequence
                  ID)
              · Con: Can’t move parts of databases around
  Scaling Pinterest

12年8月10⽇日星期五
ID Generation
         · ID contains timestamp + shard
           · Instagram
           · Pro: Global ordering
           · Pro: No routing services which can fail
           · Con: No type verification
           · Con: Can’t move parts of databases around
           · Con: Not much space for shard ID’s


  Scaling Pinterest

12年8月10⽇日星期五
ID Generation
         ·    ID generated by service
              ·   Twitter, Etsy
              ·   Requires: ID routing service
              ·   Pro: Possible global ordering
              ·   Pro: Can move individual data elements
              ·   Pro: Can perform type-checking
              ·   Con: ID routing service can fail
              ·   Con: ID routing service can be unwieldy and
                  complicated
  Scaling Pinterest

12年8月10⽇日星期五
Problems




  Scaling Pinterest

12年8月10⽇日星期五
Hotspots
        · Our board_followedby_users tables were
             unbalanced
            · Rearchitected to have
                 board_followedby_user_shard map boards to
                 shards containing subsets of the
                 board_followedby_users data.
            · Pro: Data evenly distributed
            · Con: May have to hit multiple shards

  Scaling Pinterest

12年8月10⽇日星期五
No Joins
          · Home is user_follows_board JOIN board_has_pins
            · Each user has a home feed redis list
            · New pin ID placed in each following user’s list

                      for user_id in board_followedby_users:
                         redisconn.lpush(“home:%s” % user_id, new_pin_id)




  Scaling Pinterest

12年8月10⽇日星期五
No Automatic Failover
             · Some fraction of users see timeouts if a DB fails
             · Manually change settings and re-deploy
             · Needs to be automated, especially with 128+
                  DBs
             · Moving to automation with Zookeeper




  Scaling Pinterest

12年8月10⽇日星期五
Caching




  Scaling Pinterest

12年8月10⽇日星期五
Caching
               · Redis lists for persistently caching homefeeds
                 · lpush homefeed:82363 7233494
                 · lrange homefeed:82363 0 49
               · Memcache for caching lists
                 · MessagePack to serialize data
                 · Append to add new data
                 · Never delete from middle
               · Memcache for caching objects
  Scaling Pinterest

12年8月10⽇日星期五
Caching
                 · Caches separated by pools
                   · Example: pin objects, board_has_pins, etc
                   · Better stats
                   · Easier to scale and manage




  Scaling Pinterest

12年8月10⽇日星期五
Caching with decorators
                @mc_objects(USER_MC_CONN,
                      format = “user:%d”,
                     version=1,
                     serialization=simplejson
                     expire_time=0)
                def user_get_many(self, user_ids):
                       cursor = get_mysql_conn().cursor
                       cursor.execute(“SELECT * FROM users WHERE id in (%s)”,
                       user_ids)
                       return cursor.fetchall()




  Scaling Pinterest

12年8月10⽇日星期五
Caching with decorators

                @paged_list(BOARD_MC_CONN,
                     format = “b:p:%d”,
                    version=1,
                    expire_time=24*60*60)
                def get_board_pins(self, board_id, offset, limit):
                       cursor = get_mysql_conn().cursor
                       cursor.execute(“SELECT pin_id FROM bp WHERE id =%s OFFSET
                                           %d LIMIT %d”, board_id, offset, limit)
                       return cursor.fetchall()




  Scaling Pinterest

12年8月10⽇日星期五
We are Hiring!
                       jobs@pinterest.com




  Scaling Pinterest

12年8月10⽇日星期五
Questions?

                      marty@pinterest.com   evrhet@pinterest.com




  Scaling Pinterest

12年8月10⽇日星期五
杭州站·2012年 10月 25日 ~27日
大会官网:www.c n a g h uc m
        q o h n z o .o

Mais conteúdo relacionado

Mais procurados

ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLHoracio Gonzalez
 
Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2won min jang
 
Simplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaSimplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaMongoDB
 
Morphia, Spring Data & Co.
Morphia, Spring Data & Co.Morphia, Spring Data & Co.
Morphia, Spring Data & Co.Tobias Trelle
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDBJeff Yemin
 
OrientDB the graph database
OrientDB the graph databaseOrientDB the graph database
OrientDB the graph databaseArtem Orobets
 
Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425guest4f2eea
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know Norberto Leite
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBMongoDB
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDBDavid Coallier
 
Scala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogueScala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogueKonrad Malawski
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDBYnon Perek
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaScott Hernandez
 
03 Custom Classes
03 Custom Classes03 Custom Classes
03 Custom ClassesMahmoud
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012robosonia Mar
 

Mais procurados (20)

ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQL
 
Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2Couchbase Korea User Group 2nd Meetup #2
Couchbase Korea User Group 2nd Meetup #2
 
Querydsl
QuerydslQuerydsl
Querydsl
 
hibernate
hibernatehibernate
hibernate
 
Jndi (1)
Jndi (1)Jndi (1)
Jndi (1)
 
Simplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with MorphiaSimplifying Persistence for Java and MongoDB with Morphia
Simplifying Persistence for Java and MongoDB with Morphia
 
Morphia, Spring Data & Co.
Morphia, Spring Data & Co.Morphia, Spring Data & Co.
Morphia, Spring Data & Co.
 
Morphia: Simplifying Persistence for Java and MongoDB
Morphia:  Simplifying Persistence for Java and MongoDBMorphia:  Simplifying Persistence for Java and MongoDB
Morphia: Simplifying Persistence for Java and MongoDB
 
OrientDB the graph database
OrientDB the graph databaseOrientDB the graph database
OrientDB the graph database
 
Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425Couchdbkit djangocong-20100425
Couchdbkit djangocong-20100425
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
Storekit diagram
Storekit diagramStorekit diagram
Storekit diagram
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
An introduction to CouchDB
An introduction to CouchDBAn introduction to CouchDB
An introduction to CouchDB
 
09 Data
09 Data09 Data
09 Data
 
Scala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogueScala dsls-dissecting-and-implementing-rogue
Scala dsls-dissecting-and-implementing-rogue
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
MongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with MorphiaMongoDB: Easy Java Persistence with Morphia
MongoDB: Easy Java Persistence with Morphia
 
03 Custom Classes
03 Custom Classes03 Custom Classes
03 Custom Classes
 
Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012Cassandra Intro -- TheEdge2012
Cassandra Intro -- TheEdge2012
 

Destaque

Electronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RMElectronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RMPlus Technologies
 
Printer Load Balancing Systems and Benefits
Printer Load Balancing Systems and BenefitsPrinter Load Balancing Systems and Benefits
Printer Load Balancing Systems and BenefitsPlus Technologies
 
Drawing and printmaking
Drawing and printmaking Drawing and printmaking
Drawing and printmaking Carmel Torres
 
Sarah and trevor hall
Sarah and trevor hallSarah and trevor hall
Sarah and trevor hall587554
 
Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1Magaly Caba
 
Florida-John
Florida-JohnFlorida-John
Florida-Johnklei8103
 
Tennessee-Alize and Tesnim
Tennessee-Alize and TesnimTennessee-Alize and Tesnim
Tennessee-Alize and Tesnimklei8103
 
Kentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and KassidyKentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and Kassidyklei8103
 
"Сокровищницы Прибыльных Идей"
 "Сокровищницы Прибыльных Идей" "Сокровищницы Прибыльных Идей"
"Сокровищницы Прибыльных Идей"Marina Kiselevich
 
The Industrial Revolution 20
The  Industrial  Revolution 20The  Industrial  Revolution 20
The Industrial Revolution 20Grace Pangan
 
Mississippi-Evan and Teddy
Mississippi-Evan and Teddy Mississippi-Evan and Teddy
Mississippi-Evan and Teddy klei8103
 
Rahul network resume copy
Rahul  network resume   copyRahul  network resume   copy
Rahul network resume copyRahul patil
 
7. susret 17.11.2011. konkretno lice boga oca
7. susret 17.11.2011.   konkretno lice boga oca7. susret 17.11.2011.   konkretno lice boga oca
7. susret 17.11.2011. konkretno lice boga ocaMeri-Lucijeta
 
Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1Sachin Singh
 

Destaque (20)

B2B Video 101
B2B Video 101B2B Video 101
B2B Video 101
 
Electronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RMElectronic Report Distribution with OM Plus RM
Electronic Report Distribution with OM Plus RM
 
Dockerの導入
Dockerの導入Dockerの導入
Dockerの導入
 
Printer Load Balancing Systems and Benefits
Printer Load Balancing Systems and BenefitsPrinter Load Balancing Systems and Benefits
Printer Load Balancing Systems and Benefits
 
Drawing and printmaking
Drawing and printmaking Drawing and printmaking
Drawing and printmaking
 
Sarah and trevor hall
Sarah and trevor hallSarah and trevor hall
Sarah and trevor hall
 
Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1Peritaje en la construcción de proyectos inmobiliarios 1
Peritaje en la construcción de proyectos inmobiliarios 1
 
Florida-John
Florida-JohnFlorida-John
Florida-John
 
Pinball1
Pinball1Pinball1
Pinball1
 
Gypsy Girl
Gypsy Girl Gypsy Girl
Gypsy Girl
 
Tennessee-Alize and Tesnim
Tennessee-Alize and TesnimTennessee-Alize and Tesnim
Tennessee-Alize and Tesnim
 
Bbfc ratings
Bbfc ratingsBbfc ratings
Bbfc ratings
 
Researchss
ResearchssResearchss
Researchss
 
Kentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and KassidyKentucky-Tahsiyn and Kassidy
Kentucky-Tahsiyn and Kassidy
 
"Сокровищницы Прибыльных Идей"
 "Сокровищницы Прибыльных Идей" "Сокровищницы Прибыльных Идей"
"Сокровищницы Прибыльных Идей"
 
The Industrial Revolution 20
The  Industrial  Revolution 20The  Industrial  Revolution 20
The Industrial Revolution 20
 
Mississippi-Evan and Teddy
Mississippi-Evan and Teddy Mississippi-Evan and Teddy
Mississippi-Evan and Teddy
 
Rahul network resume copy
Rahul  network resume   copyRahul  network resume   copy
Rahul network resume copy
 
7. susret 17.11.2011. konkretno lice boga oca
7. susret 17.11.2011.   konkretno lice boga oca7. susret 17.11.2011.   konkretno lice boga oca
7. susret 17.11.2011. konkretno lice boga oca
 
Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1Introduction to-csharp-1229579367461426-1
Introduction to-csharp-1229579367461426-1
 

Semelhante a Pinterest的数据库分片架构

Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterestMohit Jain
 
Pinterest arch summit august 2012 - scaling pinterest
Pinterest arch summit   august 2012 - scaling pinterestPinterest arch summit   august 2012 - scaling pinterest
Pinterest arch summit august 2012 - scaling pinterestdrewz lin
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced CassandraEric Evans
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
 
Morning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMorning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMongoDB
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavasrisatish ambati
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
Databases for Storage Engineers
Databases for Storage EngineersDatabases for Storage Engineers
Databases for Storage EngineersThomas Kejser
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureChristos Charmatzis
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performanceDaum DNA
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 

Semelhante a Pinterest的数据库分片架构 (20)

Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterest
 
Pinterest arch summit august 2012 - scaling pinterest
Pinterest arch summit   august 2012 - scaling pinterestPinterest arch summit   august 2012 - scaling pinterest
Pinterest arch summit august 2012 - scaling pinterest
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
MongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & AnalyticsMongoDB: Optimising for Performance, Scale & Analytics
MongoDB: Optimising for Performance, Scale & Analytics
 
Morning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMorning with MongoDB Paris 2012 - Accueil et Introductions
Morning with MongoDB Paris 2012 - Accueil et Introductions
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjava
 
NoSQL
NoSQLNoSQL
NoSQL
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
Everyday - mongodb
Everyday - mongodbEveryday - mongodb
Everyday - mongodb
 
No Sql
No SqlNo Sql
No Sql
 
Databases for Storage Engineers
Databases for Storage EngineersDatabases for Storage Engineers
Databases for Storage Engineers
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Brisk hadoop june2011
Brisk hadoop june2011Brisk hadoop june2011
Brisk hadoop june2011
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Pinterest的数据库分片架构

  • 1. 感谢您参加本次Ar h u c S mmi全球架构师峰会! t 大会官方网站与资料下载地址: www. c um m i . om ar hs tc
  • 2. 分享产品与技术 分享腾讯与互联网 更多精彩内容 QQ基础数据库架构演变之路 QQ空间技术架构之峥嵘岁月 架构之美:开放环境下的网络架构 查看更多
  • 3. Sharding Marty Weiner Evrhet Milam Metropolis Batcave 12年8月10⽇日星期五
  • 4. · Amazon EC2 + S3 + CloudFront · 2 NGinX, 16 Web Engines + 2 API Engines Page Views / Day · 5 Functionally Sharded MySQL DB + 9 read slaves · 4 Cassandra Nodes · 15 Membase Nodes (3 separate clusters) · 8 Memcache Nodes · 10 Redis Nodes Mar 2010· Mar 2010 3 Task Routers + 4 Task Processors Jan 2011 Jan 2011 Jan 2012 Jan 2012 May 2012 · 4 Elastic Search Nodes · 3 Mongo Clusters · 3 Engineers Scaling Pinterest 12年8月10⽇日星期五
  • 5. Clustering · Data distributed automatically · Data can move · Rebalances to distribute capacity · Nodes communicate with each othe Sharding Scaling Pinterest 12年8月10⽇日星期五
  • 6. Clustering · Data distributed manually · Data does not move · Split data to distribute load · Nodes are not aware of each other Sharding Scaling Pinterest 12年8月10⽇日星期五
  • 7. Our Transition 1 DB + Foreign Keys + Joins 1 DB + Denormalized + Cache + Read slaves + 1 DB Cache Several functionally sharded DBs + Read slaves + Cache ID sharded DBs + Backup slaves + Cache Scaling Pinterest 12年8月10⽇日星期五
  • 8. How we sharded Scaling Pinterest 12年8月10⽇日星期五
  • 9. Sharded Server Topology db00001 db00513 db03072 db03584 db00002 db00514 db03073 db03585 ....... ....... ....... ....... db00512 db01024 db03583 db04096 Initially, 8 physical servers, each with 512 DBs Scaling Pinterest 12年8月10⽇日星期五
  • 10. High Availability db00001 db00513 db03072 db03584 db00002 db00514 db03073 db03585 ....... ....... ....... ....... db00512 db01024 db03583 db04096 Multi Master replication for hot swapping All writes/reads go to only 1 master Scaling Pinterest 12年8月10⽇日星期五
  • 11. Configuration · Initially 8 masters, each with 512 DBs · Now at 80 masters, some 32 DBs and some 64 DBs · Over 10 TB · All tables are InnoDB Scaling Pinterest 12年8月10⽇日星期五
  • 12. Configuration · Ubuntu 11.10 on AWS managed by Puppet · m1.xlarge using 4 stripe-raided ephemeral drives · cc2.8xlarge using 4 stripe-raided ephemeral drives · Soon: hi1.4xlarge (SSD) Scaling Pinterest 12年8月10⽇日星期五
  • 13. ID Structure 64 bits Shard ID Type Local ID · A lookup data structure has physical server to shard ID range (cached by each app server process) · Shard ID denotes which shard · Type denotes object type (e.g., pins) · Local ID denotes position in table Scaling Pinterest 12年8月10⽇日星期五
  • 14. Lookup Structure {“sharddb001a”: ( 1, 512), “sharddb002b”: ( 513, 1024), “sharddb003a”: (1025, 1536), ... “sharddb008b”: (3585, 4096)} sharddb003a DB01025 users users 1 json user_has_boards 2 json boards 3 json Scaling Pinterest 12年8月10⽇日星期五
  • 15. ID Generation · New users are randomly distributed across shards · Boards, pins, etc. try to be collocated with user · Local ID’s are assigned by auto-increment mysql_query(“USE pbdata%d” % shard_id) local_id = mysql_query(“INSERT INTO users (body) VALUES (%s)”, json) full_id = shard_id << 46 | USER_TYPE << 10 | local_id · Enough ID space for 65536 shards, but only first 4096 opened initially. Can expand horizontally. Scaling Pinterest 12年8月10⽇日星期五
  • 16. · Object tables (e.g., pin, board, user, comment) Objects and Mappings · Local ID MySQL blob (JSON / Serialized thrift) · Mapping tables (e.g., user has boards, pin has likes) · Full ID Full ID (+ sequence ID / timestamp) · Naming schema is noun_verb_noun · Queries are PK or index lookups (no joins) · Data DOES NOT MOVE · All tables exist on all shards Scaling Pinterest 12年8月10⽇日星期五 · No schema changes required (index = new
  • 17. Loading a Page · Rendering user profile SELECT body FROM users WHERE id=<local_user_id> SELECT board_id FROM user_has_boards WHERE user_id=<user_id> SELECT body FROM boards WHERE id IN (<board_ids>) SELECT pin_id FROM board_has_pins WHERE board_id=<board_id> SELECT body FROM pins WHERE id IN (pin_ids) · Most of these calls will be a cache hit · Omitting offset/limits and mapping sequence id sort Scaling Pinterest 12年8月10⽇日星期五
  • 18. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 4), db00004 “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32)} sharddb001b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 19. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 4), db00004 sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), db00001 ... db00002 “sharddb008b”: ( 29, 32)} db00003 db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 20. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 2), db00004 sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), db00001 ... db00002 “sharddb008b”: ( 29, 32), db00003 “sharddb009a”: ( 3, 4)} db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 21. Increased load on DB? sharddb001a db00001 db00002 db00003 {“sharddb001a”: ( 1, 2), db00004 sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), db00001 ... db00002 “sharddb008b”: ( 29, 32), db00003 “sharddb009a”: ( 3, 4)} db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 22. Increased load on DB? sharddb001a db00001 db00002 {“sharddb001a”: ( 1, 2), sharddb009a “sharddb002b”: ( 5, 8), “sharddb003a”: ( 9, 12), ... “sharddb008b”: ( 29, 32), db00003 “sharddb009a”: ( 3, 4)} db00004 sharddb001b sharddb009b To increase capacity, a server is replicated and the new replica becomes responsible for some DBs Scaling Pinterest 12年8月10⽇日星期五
  • 23. Scripting · Must get old data into your shiny new shard · 500M pins, 1.6B follower rows, etc · Build a scripting farm with Pyres + Redis · Spawn more workers and complete the task faster · Can push millions of jobs onto the queue Scaling Pinterest 12年8月10⽇日星期五
  • 24. Sharding Alternatives Scaling Pinterest 12年8月10⽇日星期五
  • 25. · ID generated by Generation ID shard + type + auto- increment · Pinterest, Facebook · Pro: ID generation scales with size of shard · Pro: No routing services which can fail · Pro: No holes in ID space · Pro: Type verification built-in · Con: No global ordering (requires sequence ID) · Con: Can’t move parts of databases around Scaling Pinterest 12年8月10⽇日星期五
  • 26. ID Generation · ID contains timestamp + shard · Instagram · Pro: Global ordering · Pro: No routing services which can fail · Con: No type verification · Con: Can’t move parts of databases around · Con: Not much space for shard ID’s Scaling Pinterest 12年8月10⽇日星期五
  • 27. ID Generation · ID generated by service · Twitter, Etsy · Requires: ID routing service · Pro: Possible global ordering · Pro: Can move individual data elements · Pro: Can perform type-checking · Con: ID routing service can fail · Con: ID routing service can be unwieldy and complicated Scaling Pinterest 12年8月10⽇日星期五
  • 28. Problems Scaling Pinterest 12年8月10⽇日星期五
  • 29. Hotspots · Our board_followedby_users tables were unbalanced · Rearchitected to have board_followedby_user_shard map boards to shards containing subsets of the board_followedby_users data. · Pro: Data evenly distributed · Con: May have to hit multiple shards Scaling Pinterest 12年8月10⽇日星期五
  • 30. No Joins · Home is user_follows_board JOIN board_has_pins · Each user has a home feed redis list · New pin ID placed in each following user’s list for user_id in board_followedby_users: redisconn.lpush(“home:%s” % user_id, new_pin_id) Scaling Pinterest 12年8月10⽇日星期五
  • 31. No Automatic Failover · Some fraction of users see timeouts if a DB fails · Manually change settings and re-deploy · Needs to be automated, especially with 128+ DBs · Moving to automation with Zookeeper Scaling Pinterest 12年8月10⽇日星期五
  • 32. Caching Scaling Pinterest 12年8月10⽇日星期五
  • 33. Caching · Redis lists for persistently caching homefeeds · lpush homefeed:82363 7233494 · lrange homefeed:82363 0 49 · Memcache for caching lists · MessagePack to serialize data · Append to add new data · Never delete from middle · Memcache for caching objects Scaling Pinterest 12年8月10⽇日星期五
  • 34. Caching · Caches separated by pools · Example: pin objects, board_has_pins, etc · Better stats · Easier to scale and manage Scaling Pinterest 12年8月10⽇日星期五
  • 35. Caching with decorators @mc_objects(USER_MC_CONN, format = “user:%d”, version=1, serialization=simplejson expire_time=0) def user_get_many(self, user_ids): cursor = get_mysql_conn().cursor cursor.execute(“SELECT * FROM users WHERE id in (%s)”, user_ids) return cursor.fetchall() Scaling Pinterest 12年8月10⽇日星期五
  • 36. Caching with decorators @paged_list(BOARD_MC_CONN, format = “b:p:%d”, version=1, expire_time=24*60*60) def get_board_pins(self, board_id, offset, limit): cursor = get_mysql_conn().cursor cursor.execute(“SELECT pin_id FROM bp WHERE id =%s OFFSET %d LIMIT %d”, board_id, offset, limit) return cursor.fetchall() Scaling Pinterest 12年8月10⽇日星期五
  • 37. We are Hiring! jobs@pinterest.com Scaling Pinterest 12年8月10⽇日星期五
  • 38. Questions? marty@pinterest.com evrhet@pinterest.com Scaling Pinterest 12年8月10⽇日星期五
  • 39. 杭州站·2012年 10月 25日 ~27日 大会官网:www.c n a g h uc m q o h n z o .o