Scaling 100x in six months

by Eric Saxby & Konstantin Gredeskoul
April 2013

                                                                 Proprietary and
Thursday, April 18, 13                                           Confidential      1
What is Wanelo?
            ■            Wanelo (“Wah-nee-lo”, from Want, Need,
                         Love) is a global platform for shopping.
            ■            It’s marketing-free shopping across
                         hundreds of thousands of unique stores
Personal Activity Feed...
iOS + Android
Early Decisions

            ■ Optimize for iteration speed, not
                         performance


            ■ Keep scalability in mind, track metrics,
                         and fix as needed

            ■ Introduce many levels of caching early

Technology Timeline

            ■            2010 - 2011
                         Wanelo v1 stack is Java, JSP, MySQL, Hibernate
                         90K lines of code, 53+ DB tables, no tests



            ■            May 2012 - June 2012
                         Rewrite from scratch to RoR on PostgreSQL (v2)
                               ■   Ruby app is 10K LOC, full test coverage, 8
                                   database tables, fewer features



The “Big” Rewrite

                         More info here: http://building.wanelo.com/
Growth Timeline
           ■         06/2012 - RoR App Relaunches
                      ■ 2-3K requests per minute (RPM) peak

            ■            08/2012 - iOS App is launched
                          ■ 10-40K RPM peak

            ■            12/2012 - Android app launched
                          ■ 40-120K RPM peak

           ■         03/2013 - #24 in top free apps on iTunes
                      ■ 80-200K RPM peak
Requests Per Minute (RPM)




Current Numbers...

                         ■   4M active monthly users
                         ■   5M products saved 700M times
                         ■   8M products saved per day
                         ■   200k stores




Backend Stack & Key Vendors
                         ■   MRI Ruby 1.9.3 & Rails 3.2
                         ■   PostgreSQL 9.2.4, Solr 3.6
                         ■   Joyent Cloud, SmartOS
                             ZFS, ARC, raw IO performance, CPU bursting, DTrace

                         ■   Circonus, Chef + Opscode
                             Monitoring, graphing, alerting, automation

                         ■   Amazon S3 + Fastly CDN
                         ■   New Relic, statsd, Graphite, Nagios

Wanelo Web Architecture

            ■            6 x 2GB: nginx, haproxy
            ■            20 x 8GB: unicorn x 14, with haproxy, pgbouncer, twemproxy
            ■            4 x 8GB: sidekiq, with haproxy, pgbouncer, twemproxy
            ■            Backends: Solr, PostgreSQL, Redis, MemCached
This talk is about:

           1. How much traffic can your database handle?

           2. Special report on counters

           3. Scaling database reads


           4. Scaling database writes


1. How much traffic can your database handle?
PostgreSQL is Awesome!
            ■ Does a fantastic job of not corrupting
                         your data

            ■ Streaming replication in 9.2 is
                         extremely reliable

            ■ Won’t write to a read-only replica

            ■ But... No master/master replication (good!)
Is the database healthy?




What’s healthy?

            ■ Able to respond quickly to queries from
                         application (< 4ms disk seek time)

            ■ Has enough room to grow

            ■ How do we know when we’re
                         approaching a dangerous threshold?


Oops!




                         NewRelic Latency (yellow = database)

pg_stat_statements

            ■ Maybe your app is to blame for
                         performance...

                         select query, calls, total_time
                         from pg_stat_statements
                         order by total_time desc limit 12;

                     Similar to Percona Toolkit, but runs all the
                     time collecting stats.
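Total time alone does not say whether a statement is slow or merely frequent, so it can help to divide by the number of calls. The following is a minimal sketch in plain Ruby, assuming rows shaped like the query output above; the `StatementStat` struct, the `top_by_total_time` helper, and all sample numbers are hypothetical and only illustrate the arithmetic.

```ruby
# Hypothetical sample rows shaped like the pg_stat_statements query output
# (query, calls, total_time). The numbers are made up for illustration.
StatementStat = Struct.new(:query, :calls, :total_time) do
  # Average time per call, in the same unit as total_time.
  def mean_time
    calls.zero? ? 0.0 : total_time.to_f / calls
  end
end

# Most expensive statements first, mirroring ORDER BY total_time DESC LIMIT n.
def top_by_total_time(stats, limit = 12)
  stats.sort_by { |s| -s.total_time }.first(limit)
end

stats = [
  StatementStat.new('SELECT COUNT(*) FROM "stores" WHERE (state = ?)', 50_000, 900_000.0),
  StatementStat.new('SELECT "stores".* FROM "stores" WHERE (state = ?) LIMIT ? OFFSET ?', 50_000, 100_000.0),
  StatementStat.new('SELECT 1', 1_000_000, 20_000.0)
]

top_by_total_time(stats).each do |s|
  puts format('%10.1f total  %8d calls  %8.3f per call  %s',
              s.total_time, s.calls, s.mean_time, s.query)
end
```

In this made-up sample the count query and the page query run equally often, but the count accounts for nine times as much total time, which is the kind of imbalance worth graphing.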
pg_stat_user_indexes

            ■ Using indexes as much as you think
                         you are?




            ■ Using indexes at all?

pg_stat_user_tables
            ■ Full table scans? (seq_scan)
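
One way to answer "are we using indexes as much as we think?" is to compare the cumulative `seq_scan` and `idx_scan` counters that `pg_stat_user_tables` reports per table. This is a rough sketch in plain Ruby; the `index_usage_ratio` helper and the sample table numbers are assumptions for illustration, not figures from the talk.

```ruby
# Share of scans on a table that used an index, computed from the
# cumulative seq_scan and idx_scan counters.
# Returns nil when the table has not been scanned at all.
def index_usage_ratio(seq_scan, idx_scan)
  total = seq_scan + idx_scan
  return nil if total.zero?
  idx_scan.to_f / total
end

# Hypothetical rows: table name => [seq_scan, idx_scan].
tables = {
  'stores'   => [12, 98_765],
  'products' => [4_000, 1_000]
}

tables.each do |name, (seq, idx)|
  ratio = index_usage_ratio(seq, idx)
  puts format('%-10s %5.1f%% of scans used an index', name, ratio * 100)
end
```

A low ratio on a large table is a hint to look for missing indexes; on a tiny table a sequential scan may be perfectly fine.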




Throw that in a graph




                         Reads/second for one large table, daily

Non-linear changes




                         Suspicious spike!

Correlate different data




                         Deployments! Aha!

Utilization vs Saturation




                         # of Active PostgreSQL connections


Utilization vs Saturation




                         Red line: % of max connections established
                             Purple: % of connections in query

Disk reads/writes




                                                      green: reads, red: writes


                         Usage increases, but are the disks saturated?


Utilization vs Saturation

                         How much are you waiting on disk?
File system cache (ARC)




Watch the right things




                         Hit ratio of the file system cache (ARC)
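
Cache hit and miss counters are typically cumulative, so a ratio computed from raw totals hides recent changes; graphing the ratio over each sampling interval is more telling. The sketch below assumes two samples of such cumulative counters; the `hit_ratio` helper and the sample values are hypothetical.

```ruby
# Hit ratio over an interval, from two samples of cumulative hit/miss counters.
# Returns nil when there was no cache activity between the samples.
def hit_ratio(before, after)
  hits = after[:hits] - before[:hits]
  misses = after[:misses] - before[:misses]
  total = hits + misses
  return nil if total.zero?
  hits.to_f / total
end

earlier = { hits: 1_000_000, misses: 50_000 }
later   = { hits: 1_090_000, misses: 60_000 }
puts hit_ratio(earlier, later)
```

A falling interval ratio is an early sign that the working set no longer fits in memory, before disk latency makes it obvious.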

Room to grow...




                         Size (including indexes) of a key table



Working set in RAM?




                         Adding index increases the size


Collect all the data you can




                         Once we knew where to look, graphs added
                               later could explain behavior we could
                                                 only guess at earlier

2. Special report on Counters and Pagination
Problem #1: DB Latency Up...

              ■          iostat shows 100% disk busy

             device      r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
             sd1       384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
             sd1       368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
             sd1       330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
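
To graph these numbers rather than eyeball them, each iostat line can be split into named fields. The following is a small sketch in plain Ruby using the first sample line from the slide; the `parse_iostat_line` helper is hypothetical, and treating `Mr/s` and `Mw/s` as megabytes of 1024 KB is an assumption.

```ruby
# Column names taken from the iostat header on the slide.
IOSTAT_COLUMNS = %w[device r/s w/s Mr/s Mw/s wait actv svc_t %w %b].freeze

# Turn one whitespace-separated iostat line into a hash keyed by column name.
def parse_iostat_line(line)
  fields = line.split
  unless fields.size == IOSTAT_COLUMNS.size
    raise ArgumentError, "expected #{IOSTAT_COLUMNS.size} fields, got #{fields.size}"
  end
  device, *numbers = fields
  { 'device' => device }.merge(IOSTAT_COLUMNS.drop(1).zip(numbers.map(&:to_f)).to_h)
end

sample = parse_iostat_line('sd1 384.0 1157.5 48.0 116.8 0.0 8.8 5.7 2 100')

# Average size of one write in KB, derived from Mw/s and w/s
# (assumes 1 MB = 1024 KB).
avg_write_kb = sample['Mw/s'] * 1024 / sample['w/s']

puts "busy: #{sample['%b']}%  waiting: #{sample['wait']}  active: #{sample['actv']}"
puts format('average write size: %.1f KB', avg_write_kb)
```

The device reports 100% busy while the `wait` queue stays at 0.0, which is the utilization versus saturation distinction from earlier in the talk: the disk is fully utilized, and the queue columns are what show whether requests are piling up behind it.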
Problem #1: Diagnostics
            ■            Database is running very, very hot.
                         Initial investigation shows a large number of counts.

            ■            Turns out anytime you page with Kaminari, it
                         always does a count(*)!

                         SELECT "stores".* FROM "stores"
                                           WHERE (state = 'approved')
                                           LIMIT 20 OFFSET 0

                         SELECT COUNT(*) FROM "stores" WHERE (state = 'approved')
Problem #1: Pagination




            ■            Doing count(*) is pretty expensive, as DB
                         must scan many rows (either the actual table
                         or an index)


Problem #1: Pagination

            ■            We are paginating everything! Even infinite
                         scroll is a paged view behind the scenes.



            ■            But we really DON’T want to run count(*) for
                         every paged view.




Problem #1: Pagination

            ■            We are showing most popular stores
                         ■   Maybe it’s OK to hard-code the total number to,
                             say, 1000?



            ■            How do we tell Kaminari NOT to issue a
                         count query in this case?



Solution #1: Monkey Patch!!




Solution #1: Pass in the
                counter




                                           Proprietary and
Thursday, April 18, 13                     Confidential      45
Solution #1: Pass in the
                counter




                   SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
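
One way to picture "passing in the counter" is a page object that is handed a precomputed total, so the only SQL it needs is the LIMIT/OFFSET select shown above. This is a minimal plain-Ruby sketch, not Kaminari's API; `FixedTotalPage` and its arguments are hypothetical names chosen for illustration.

```ruby
# Hypothetical page object: the caller supplies the total, so no COUNT(*) is needed.
class FixedTotalPage
  attr_reader :page, :per_page, :total_count

  def initialize(page:, per_page:, total_count:)
    @page = [page.to_i, 1].max
    @per_page = per_page
    @total_count = total_count
  end

  # OFFSET for the SQL query: page 1 starts at 0, page 2 at per_page, and so on.
  def offset
    (page - 1) * per_page
  end

  def total_pages
    (total_count.to_f / per_page).ceil
  end

  def last_page?
    page >= total_pages
  end

  # The only query that has to run: a LIMIT/OFFSET select.
  def to_sql(table, where)
    %(SELECT "#{table}".* FROM "#{table}" WHERE (#{where}) LIMIT #{per_page} OFFSET #{offset})
  end
end

# Most popular stores, with the total hard-coded to 1000 as suggested above.
popular_stores = FixedTotalPage.new(page: 1, per_page: 20, total_count: 1000)
puts popular_stores.to_sql("stores", "state = 'approved'")
puts popular_stores.total_pages
```

The total can come from anywhere cheap: a constant, a counter cache column, or a cached value.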




Problem #2: Count Draculas
            ■            AKA: We still are doing too many counts!




              ■          Rails makes it so easy to do it the lazy way.


Problem #2: Too Many Counts!
            ■            But it just doesn’t scale well

            ■            Fortunately, Rails has just the feature for this...




Counter Caches
            ■            Unfortunately, they have one massive issue:

            ■            They cause database deadlocks at high volume

                            ■   Because many Ruby processes are creating child
                                records concurrently

                            ■   Each executes a callback that tries to update the
                                counter_cache column on the parent, requiring a
                                row-level lock

                            ■   Deadlocks ensue
Possible Solution:
            Use Background Jobs
          ■ It works like this:
                         ■   As the record is created, we enqueue a request
                             to recalculate counter_cache on the parent


                         ■   The job performs a complete recalculation of
                             the counter cache and is idempotent




Solution #2: Explained
               ■         Sidekiq with UniqueJob extension


              ■          Short wait for “buffering”


               ■         Serialize updates via small number of workers


               ■         Can temporarily stop workers (in an
                         emergency) to alleviate DB load
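
The combination of unique jobs and an idempotent full recalculation can be sketched in plain Ruby. This is an illustration under stated assumptions, not the code from the talk and not Sidekiq's API: `UniqueQueue`, `SAVES`, `PRODUCTS`, and `recalculate_saves_count` are hypothetical stand-ins for the queue and the database tables.

```ruby
require "set"

# In-memory stand-ins for the child and parent tables (hypothetical data).
SAVES = [{ product_id: 1 }, { product_id: 1 }, { product_id: 2 }]
PRODUCTS = { 1 => { saves_count: 0 }, 2 => { saves_count: 0 } }

# Queue that drops a job if an identical one is already waiting,
# mimicking what a unique-job extension provides.
class UniqueQueue
  def initialize
    @pending = Set.new
    @jobs = []
  end

  def enqueue(product_id)
    return false unless @pending.add?(product_id)
    @jobs << product_id
    true
  end

  def dequeue
    product_id = @jobs.shift
    @pending.delete(product_id) if product_id
    product_id
  end
end

# Full recalculation: running it twice yields the same result (idempotent).
def recalculate_saves_count(product_id)
  PRODUCTS[product_id][:saves_count] = SAVES.count { |save| save[:product_id] == product_id }
end

queue = UniqueQueue.new
SAVES.each { |save| queue.enqueue(save[:product_id]) } # three creates, two distinct parents

jobs_run = 0
while (product_id = queue.dequeue)
  recalculate_saves_count(product_id)
  jobs_run += 1
end
puts "jobs run: #{jobs_run}"
```

Three record creations collapse into two jobs here. The short "buffering" wait plays the same role at scale: the longer a job sits in the queue, the more creates it absorbs.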



Solution #2: Code

Things are better. BUT...
               Still too many fucking counts!

          ■         Even doing count(*) from workers is too
                    much on the databases

          ■         We need to stop doing count(*) in DB. But
                    keep counter_caches. How?

          ■         We could use Redis for this.

Solution #3: Count Deltas

      ■        Web request increments
               counter value in Redis

      ■        Enqueues request to
               update counter_cache

      ■        Background Job picks up
               a few minutes later, reads
               Redis delta value, and
               removes it.

      ■        Updates counter_cache
               column by incrementing it
               by delta.

      [Diagram: unicorn saves a product, then
         1. INCR product_id                            (Redis counters)
         2. ProductCountWorker.enqueue product_id      (Redis Sidekiq queue)
       sidekiq then
         3. Dequeue                                    (Redis Sidekiq queue)
         4. GET                                        (Redis counters)
         5. RESET                                      (Redis counters)
         6. SQL Update INCR by N                       (counter_cache column in PostgreSQL)]
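
The delta flow can be sketched without a Redis server. Everything below is a hypothetical in-memory stand-in (`FakeCounters`, `COUNTER_CACHE`, `QUEUE`), and it reads and resets the delta in a single step rather than as separate GET and RESET calls, which is an assumption made here so that increments arriving between the two are not lost.

```ruby
# Tiny in-memory stand-in for the Redis counters (hypothetical; not a Redis client).
class FakeCounters
  def initialize
    @values = Hash.new(0)
  end

  # Like Redis INCR: add one and return the new value.
  def incr(key)
    @values[key] += 1
  end

  # Read the accumulated delta and reset it in one step.
  def take(key)
    delta = @values[key]
    @values.delete(key)
    delta
  end
end

COUNTERS = FakeCounters.new
COUNTER_CACHE = { 42 => 10 } # stands in for the counter_cache column in PostgreSQL
QUEUE = []                   # stands in for the Sidekiq queue

# Web request path: a cheap increment plus an enqueue; no SQL count or update.
def record_save(product_id)
  COUNTERS.incr(product_id)
  QUEUE << product_id unless QUEUE.include?(product_id)
end

# Worker path: apply the whole delta with one relative update,
# the equivalent of "UPDATE ... SET saves_count = saves_count + N".
def flush_counts
  while (product_id = QUEUE.shift)
    delta = COUNTERS.take(product_id)
    COUNTER_CACHE[product_id] += delta unless delta.zero?
  end
end

5.times { record_save(42) }
flush_counts
puts COUNTER_CACHE[42]
```

Five saves cost five in-memory increments and a single database update, and no count(*) runs anywhere.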
Define counter_cache_on...




          ■ Internal gem, will open source soon!
Can now use counter caches
                in pagination!




3.
            Scaling reads




Multiple optimization cycles

            ■            Caching
                         action caching, fragment, CDN


            ■            Personalization via AJAX
                         Cache the entire page, then add
                         personalized details


            ■            25ms/req memcached time is cheaper than
                         12ms/req of database time

Cache optimization




                                     40% hit ratio! Woo!
                              Wait... is that even good?




Cache optimization




                         Increasing your hit ratio means fewer
                              queries against your database

Cache optimization




                           Caveat: even low hit ratio caches
                         can save your ass. You’re removing
                              load from the DB, remember?
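
A quick piece of arithmetic shows why even a modest hit ratio matters. The request volume below is a hypothetical number, and the sketch assumes one database query per cache miss.

```ruby
# Requests that still reach the database for a given cache hit ratio.
def db_queries(requests, hit_ratio)
  (requests * (1.0 - hit_ratio)).round
end

requests = 10_000 # hypothetical request volume
[0.0, 0.4, 0.9].each do |ratio|
  puts format("hit ratio %.0f%% -> %d database queries", ratio * 100, db_queries(requests, ratio))
end
```

At a 40% hit ratio, four out of every ten of those queries never reach the database.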

Cache saturation




                         How long before your caches start evicting data?

       [Graph legend] Blue: cache writes. Red: automatic evictions.


Ajax personalization

Nice!

            ■ Rails Action Caching
                         Runs before_filters, so A/B experiments can still run



            ■ Extremely fast pages
                         4ms application time for some of our
                         computationally heaviest pages



            ■ Could be served via CDN in the future
Sad trombone...

            ■ Are you actually logged in?
                         Pages don’t know until Ajax successfully runs



            ■ Selenium AND Jasmine tests!



Read/write splitting
            ■ Sometime in December 2012...
                         ■   Database reaching 100% saturation

                         ■   Latency starting to increase non-linearly

                         ■   We need to distribute database load

                         ■   We need to use read replicas!

DB adapters for read/write
            ■            Looked at several, including DbCharmer

            ■            Features / Configurability / Stability
                         ■   Thread safety? This may be Ruby, but some
                             people do actually use threads.

                         ■   If I tell you it’s a read-only replica, DON’T
                             ISSUE WRITES

                         ■   Failover on errors?
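
The three requirements above (thread safety, no writes to replicas, failover) can be illustrated with a toy router. This is a sketch under stated assumptions, not how Makara or DbCharmer work internally: `FakeConnection` and `ReadWriteRouter` are hypothetical, and the SELECT check is deliberately naive (it would misroute statements such as SELECT ... FOR UPDATE).

```ruby
# Hypothetical connection stand-in that records the SQL it receives.
class FakeConnection
  attr_reader :name, :log
  attr_accessor :down

  def initialize(name)
    @name = name
    @log = []
    @down = false
  end

  def execute(sql)
    raise IOError, "#{name} is unreachable" if down
    @log << sql
    name
  end
end

# Routes writes to the master and reads round-robin across replicas.
class ReadWriteRouter
  READ_PATTERN = /\A\s*select\b/i

  def initialize(master, replicas)
    @master = master
    @replicas = replicas
    @next = 0
    @lock = Mutex.new # guards the round-robin cursor when threads share the router
  end

  def execute(sql)
    # Anything that is not a plain SELECT goes to the master, never to a replica.
    return @master.execute(sql) unless sql.match?(READ_PATTERN)

    @replicas.size.times do
      replica = pick_replica
      begin
        return replica.execute(sql)
      rescue IOError
        next # simple failover: try the next replica
      end
    end
    @master.execute(sql) # every replica failed; fall back to the master
  end

  private

  def pick_replica
    @lock.synchronize do
      replica = @replicas[@next % @replicas.size]
      @next += 1
      replica
    end
  end
end

master = FakeConnection.new("master")
replicas = [FakeConnection.new("replica1"), FakeConnection.new("replica2")]
router = ReadWriteRouter.new(master, replicas)

replicas[0].down = true
read_target = router.execute("SELECT * FROM stores")
write_target = router.execute("UPDATE stores SET state = 'approved'")
puts "read went to #{read_target}, write went to #{write_target}"
```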


Chose Makara, by TaskRabbit
          ■        Used in production
          ■        We extended it to work with PostgreSQL
          ■        Works with Sidekiqs (thread-safe!)
          ■        Failover code is very simple. Simple is
                   sometimes better.

                   https://github.com/taskrabbit/makara


We rolled out Makara and...
            ■ 1 master, 3 read-only async replicas




                          Wait, what?
A note about graphs

                    ■    NewRelic is great!
                    ■    Not easy to predict when your
                         systems are about to fall over
                    ■    Use something else to visualize
                         Database and disk saturation




3 days later, in production

            ■            3 read replicas distributing load from master

            ■            app servers and sidekiqs create lots of
                         connections to DB backends




            ■            Mysterious spikes in errors at high traffic

Replication! Doh!




                                 Replication lag (yellow)
                         correlates with application errors (red)

Replication lag! Doh!
            ■ Track latency sending xlog to slaves
                         select client_addr,
                         pg_xlog_location_diff(sent_location, write_location)
                         from pg_stat_replication;



            ■ Track latency applying xlogs on slaves
                         select pg_xlog_location_diff(
                                  pg_last_xlog_receive_location(),
                                  pg_last_xlog_replay_location()),
                         extract(epoch from now()) -
                         extract(epoch from pg_last_xact_replay_timestamp());




Eventual Consistency
              ■ Some code paths should always go to
                         master for reads (e.g., after signup)

              ■ Application should be resilient to
                         getting RecordNotFound to tolerate
                         replication delays

             ■ Not enough to scale reads.
                         Writes become the bottleneck.
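
Tolerating replication delay can be as simple as retrying a missed read on the master. The sketch below uses hypothetical stand-ins: `Store` plays the role of a database connection, and the local `RecordNotFound` class stands in for the error Rails raises (ActiveRecord::RecordNotFound).

```ruby
# Hypothetical error; in Rails this would be ActiveRecord::RecordNotFound.
class RecordNotFound < StandardError; end

# Hypothetical key-value stand-in for a database connection.
class Store
  def initialize(rows = {})
    @rows = rows
  end

  def insert(id, row)
    @rows[id] = row
  end

  def find(id)
    @rows.fetch(id) { raise RecordNotFound, "no record with id=#{id}" }
  end
end

# Read from the replica first; if the row has not replicated yet, retry on the master.
def find_with_fallback(id, replica:, master:)
  replica.find(id)
rescue RecordNotFound
  master.find(id)
end

# Code paths that must see their own writes (such as right after signup) skip the replica.
def find_fresh(id, master:)
  master.find(id)
end

master = Store.new
replica = Store.new
master.insert(7, { name: "new user" }) # written to the master, not yet replicated

user = find_with_fallback(7, replica: replica, master: master)
puts user[:name]
```

The fallback costs one extra master read only when the replica misses, so the common case still stays off the master.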
Write load delays replication




                Replicas are busy trying to apply XLOGs
                      and serve heavy read traffic

4.
            Scaling database writes




First, No-Brainers:
            ■ Move stuff out of the DB. Easiest first.
                     ■   Tracking user activity is very easy to do
                         with a database table. But slow.

                     ■   2000 inserts/sec while also handling site
                         critical data? Not a good idea.

                     ■   Solution:
                         UDP packets to rsyslog, ASCII delimited files,
                         logrotate, analyze them later
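
Sending an activity event as a UDP datagram takes only the Ruby standard library. This is a minimal sketch: `UdpActivityLogger`, the pipe delimiter, and the field order are assumptions made for illustration, the message is not syslog-framed, and the local receiver merely stands in for rsyslog.

```ruby
require "socket"

# Fire-and-forget activity logger: one delimited line per event, sent as a UDP datagram.
class UdpActivityLogger
  DELIMITER = "|"

  def initialize(host, port)
    @host = host
    @port = port
    @socket = UDPSocket.new
  end

  # Joins timestamp, user, action, and target into one line and sends it
  # without waiting for a reply.
  def track(user_id, action, target_id, at: Time.now)
    line = [at.to_i, user_id, action, target_id].join(DELIMITER)
    @socket.send(line, 0, @host, @port)
    line
  rescue SystemCallError
    nil # losing an activity event is acceptable; failing the web request is not
  end
end

# Local receiver standing in for rsyslog, only to demonstrate the round trip.
receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = receiver.addr[1]

logger = UdpActivityLogger.new("127.0.0.1", port)
sent = logger.track(42, "save", 1001, at: Time.at(1_366_300_000))

# Wait up to two seconds for the datagram instead of blocking forever.
received = receiver.recvfrom(1024).first if IO.select([receiver], nil, nil, 2)
puts received
```

The web process never waits on a database insert; if the collector is down, events are dropped rather than slowing the site.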

Next: Async Commits

            ■ PostgreSQL supports delayed
                         (batched) commits

            ■ Delays fsync for some # of
                         microseconds


            ■ At high volume helps disk IO

PostgreSQL Async Commits

ZFS Block Size

               ■ Default ZFS block size is 128 KB
               ■ PostgreSQL block size is 8 KB
               ■ Small writes require lots of bandwidth

               device     r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
               sd1      384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
               sd1      368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
               sd1      330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100

ZFS Block Size (ctd.)




                                      Proprietary and
Thursday, April 18, 13                Confidential      80
ZFS Block Size (ctd.)



            ■ Solution: change ZFS block size to 8K:




                                                Proprietary and
Thursday, April 18, 13                          Confidential      80
ZFS Block Size (ctd.)
                   device	
  	
  	
  	
  	
  	
  r/s	
  	
  	
  	
  w/s	
  	
  	
  Mr/s	
  	
  	
  Mw/s	
  wait	
  actv	
  	
  svc_t	
  	
  %w	
  	
  %b	
  
                   sd1	
  	
  	
  	
  	
  	
  	
  384.0	
  1157.5	
  	
  	
  48.0	
  	
  116.8	
  	
  0.0	
  	
  8.8	
  	
  	
  	
  5.7	
  	
  	
  2	
  100	
  
                   sd1	
  	
  	
  	
  	
  	
  	
  368.0	
  1117.9	
  	
  	
  45.7	
  	
  106.3	
  	
  0.0	
  	
  8.0	
  	
  	
  	
  5.4	
  	
  	
  2	
  100	
  
                   sd1	
  	
  	
  	
  	
  	
  	
  330.3	
  1357.5	
  	
  	
  41.3	
  	
  139.1	
  	
  0.0	
  	
  9.5	
  	
  	
  	
  5.6	
  	
  	
  2	
  100	
  




            ■ Solution: change ZFS block size to 8K:




                                                                                                                                                     Proprietary and
Thursday, April 18, 13                                                                                                                               Confidential      80
ZFS Block Size (ctd.)

    device      r/s      w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
    sd1       384.0   1157.5   48.0  116.8   0.0   8.8    5.7   2  100
    sd1       368.0   1117.9   45.7  106.3   0.0   8.0    5.4   2  100
    sd1       330.3   1357.5   41.3  139.1   0.0   9.5    5.6   2  100

            ■ Solution: change ZFS block size to 8K:

    device      r/s      w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
    sd1       130.3    219.9    1.0    4.4   0.0   0.7    2.1   0   37
    sd1       329.3    384.1    2.6   11.3   0.0   1.9    2.6   1   78
    sd1       335.3    357.0    2.6    8.7   0.0   1.8    2.6   1   80
    sd1       354.0    200.3    2.8    4.9   0.0   1.6    3.0   0   84
    sd1       465.3    100.7    3.6    1.7   0.0   2.1    3.7   0   91
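A back-of-the-envelope check (not part of the original slides) helps explain why the block size matters. Dividing Mr/s by r/s gives the average size of one read. The sketch below assumes the iostat "M" columns are mebibytes; the helper name `avg_io_kb` is made up for illustration. Before the change each read comes out at roughly 128 KB, which matches the ZFS default recordsize of 128K; afterwards each read is roughly 8 KB, the PostgreSQL page size.

```ruby
# Average size of one I/O in KB, given operations per second and MB per second.
# Assumption: the iostat "M" columns are mebibytes, so 1 MB = 1024 KB.
def avg_io_kb(ops_per_sec, mb_per_sec)
  mb_per_sec * 1024.0 / ops_per_sec
end

# [r/s, Mr/s] pairs copied from the iostat samples on this slide.
before = [[384.0, 48.0], [368.0, 45.7], [330.3, 41.3]]
after  = [[130.3, 1.0], [329.3, 2.6], [335.3, 2.6], [354.0, 2.8], [465.3, 3.6]]

# Print the per-sample average read size for each set of samples.
puts "before: " + before.map { |ops, mb| avg_io_kb(ops, mb).round(1) }.join(", ") + " KB per read"
puts "after:  " + after.map { |ops, mb| avg_io_kb(ops, mb).round(1) }.join(", ") + " KB per read"
```

With a 128K record, fetching a single 8K database page appears to pull in about sixteen times more data than needed, which is consistent with the disk sitting at 100 %b in the first set of samples.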
Next: Vertical Sharding

            ■ Move the largest table out into its own
                         master database (150 inserts/sec)

            ■ Remove any SQL joins, do them in the
                         application, and drop foreign keys

            ■ Switch the model to establish_connection
                         to another DB. Fix many broken tests.
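Once a table lives in a different database, SQL can no longer join it to the rest of the schema, so the join moves into the application. The following is a minimal sketch of that idea in plain Ruby, not Wanelo's actual code: the `User` and `Save` structs, the sample rows, and the `recent_saves` / `users_by_ids` helpers are all hypothetical stand-ins for queries against two separate databases.

```ruby
# Hypothetical row types standing in for records from two separate databases.
User = Struct.new(:id, :name)
Save = Struct.new(:id, :user_id, :product_id)

# Stand-in for a query against the database that holds the moved-out table.
def recent_saves
  [Save.new(1, 10, 500), Save.new(2, 11, 501), Save.new(3, 10, 502)]
end

# Stand-in for "SELECT * FROM users WHERE id IN (...)" against the main database.
def users_by_ids(ids)
  all = [User.new(10, "alice"), User.new(11, "bob"), User.new(12, "carol")]
  all.select { |u| ids.include?(u.id) }
end

# Application-side join: one query per database, then match rows in memory.
def saves_with_users
  saves = recent_saves
  users = users_by_ids(saves.map(&:user_id).uniq)
  index = users.each_with_object({}) { |u, h| h[u.id] = u }
  saves.map { |s| [s, index[s.user_id]] }
end

saves_with_users.each do |save, user|
  puts "save #{save.id}: product #{save.product_id} by #{user.name}"
end
```

The cost is two round trips instead of one, plus the loss of database-enforced referential integrity, which is why the foreign keys have to be dropped at the same time.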
Vertical Sharding

[Diagram: unicorns; haproxy, pgbouncer, twemproxy; PostgreSQL main master with two PostgreSQL main replicas (streaming replication); separate PostgreSQL saves master]
Vertical Sharding: Results

            ■ Deploy All Things!




Future: Services Approach

[Diagram: unicorns; haproxy, pgbouncer, twemproxy; PostgreSQL main master with two PostgreSQL main replicas (streaming replication); http / json to a sinatra services app; Shard1, Shard2, Shard3]
In Conclusion. Tasty gems :)
                         https://github.com/wanelo/pause
                            ■ distributed rate limiting using redis

                         https://github.com/wanelo/spanx
                            ■ rate-limit-based IP blocker for nginx

                         https://github.com/wanelo/redis_with_failover
                            ■ attempt another redis server if available

                         https://github.com/kigster/ventable
                            ■ observable pattern with a twist

Thanks.
      Comments? Questions?
      https://github.com/wanelo
      https://github.com/wanelo-chef

                         @kig & @sax
                         @kig & @ecdysone
                         @kigster & @sax

Scaling Wanelo.com 100x in Six Months

  • 1. Scaling 100x in six months, by Eric Saxby & Konstantin Gredeskoul, April 2013
  • 2. What is Wanelo? ■ Wanelo (“Wah-nee-lo”, from Want, Need, Love) is a global platform for shopping.
  • 4. It’s marketing-free shopping across hundreds of thousands of unique stores
  • 5. Personal Activity Feed...
  • 7. iOS + Android
  • 12. Early Decisions ■ Optimize for iteration speed, not performance ■ Keep scalability in mind, track metrics, and fix as needed ■ Introduce many levels of caching early
  • 16. Technology Timeline ■ 2010 - 2011: Wanelo v1 stack is Java, JSP, MySQL, Hibernate; 90K lines of code, 53+ DB tables, no tests ■ May 2012 - June 2012: rewrite from scratch to RoR on PostgreSQL (v2) ■ Ruby app is 10K LOC, full test coverage, 8 database tables, fewer features
  • 20. The “Big” Rewrite. More info here: http://building.wanelo.com/
  • 29. Growth Timeline ■ 06/2012 - RoR app relaunches ■ 2-3K requests per minute (RPM) peak ■ 08/2012 - iOS app is launched ■ 10-40K RPM peak ■ 12/2012 - Android app launched ■ 40-120K RPM peak ■ 03/2013 - #24 top free apps iTunes ■ 80-200K RPM peak
  • 30. Requests Per Minute (RPM)
  • 31. Current Numbers... ■ 4M active monthly users ■ 5M products saved 700M times ■ 8M products saved per day ■ 200k stores
  • 32. Backend Stack & Key Vendors ■ MRI Ruby 1.9.3 & Rails 3.2 ■ PostgreSQL 9.2.4, Solr 3.6 ■ Joyent Cloud, SmartOS: ZFS, ARC, raw IO performance, CPU bursting, DTrace ■ Circonus, Chef + Opscode: monitoring, graphing, alerting, automation ■ Amazon S3 + Fastly CDN ■ NewRelic, statsd, Graphite, nagios
  • 33. Wanelo Web Architecture [diagram labels: nginx, 6 x 2GB; haproxy; unicorn x 14; sidekiq; 20 x 8GB; 4 x 8GB; haproxy, pgbouncer, twemproxy (twice); Solr, PostgreSQL, Redis, MemCached]
  • 34. This talk is about: Proprietary and Thursday, April 18, 13 Confidential 14
  • 35. This talk is about: 1. How much traffic can your database handle? Proprietary and Thursday, April 18, 13 Confidential 14
  • 36. This talk is about: 1. How much traffic can your database handle? 2. Special report on counters Proprietary and Thursday, April 18, 13 Confidential 14
  • 37. This talk is about: 1. How much traffic can your database handle? 2. Special report on counters 3. Scaling database reads Proprietary and Thursday, April 18, 13 Confidential 14
  • 38. This talk is about: 1. How much traffic can your database handle? 2. Special report on counters 3. Scaling database reads 4. Scaling database writes Proprietary and Thursday, April 18, 13 Confidential 14
  • 39. 1. How much traffic can your database handle?
  • 45. PostgreSQL is Awesome!
      ■ Does a fantastic job of not corrupting your data
      ■ Streaming replication in 9.2 is extremely reliable
      ■ Won’t write to a read-only replica
      ■ But... No master/master replication (good!)
  • 46. Is the database healthy?
  • 50. What’s healthy?
      ■ Able to respond quickly to queries from application (< 4ms disk seek time)
      ■ Has enough room to grow
      ■ How do we know when we’re approaching a dangerous threshold?
  • 52. Oops!
      [Graph: NewRelic latency (yellow = database)]
  • 54. pg_stat_statements
      ■ Maybe your app is to blame for performance...
          select query, calls, total_time
          from pg_stat_statements
          order by total_time desc limit 12;
      Similar to Percona Toolkit, but runs all the time collecting stats.
  • 55. pg_stat_statements
  • 57. pg_stat_user_indexes
      ■ Using indexes as much as you think you are?
      ■ Using indexes at all?
  • 59. pg_stat_user_tables
      ■ Full table scans? (seq_scan)
  • 60. Throw that in a graph
      [Graph: reads/second for one large table, daily]
  • 61. Non-linear changes
      [Graph annotation: Suspicious spike!]
  • 62. Correlate different data
      [Graph annotation: Deployments! Aha!]
  • 63. Utilization vs Saturation
      [Graph: # of active PostgreSQL connections]
  • 64. Utilization vs Saturation
      [Graph: red line = % of max connections established; purple = % of connections in query]
  • 66. Disk reads/writes
      [Graph: green = reads, red = writes]
      Usage increases, but are the disks saturated?
  • 68. Utilization vs Saturation
  • 69. Utilization vs Saturation
      How much are you waiting on disk?
  • 72. File system cache (ARC)
  • 74. Watch the right things
      [Graph: hit ratio of the file system cache (ARC)]
  • 75. Room to grow...
      [Graph: size (including indexes) of a key table]
  • 77. Working set in RAM?
      Adding an index increases the size
  • 79. Collect all the data you can
      Once we knew where to look, graphs added later could explain behavior we could only guess at earlier
  • 80. 2. Special report on Counters and Pagination
  • 85. Problem #1: DB Latency Up...
      ■ iostat shows 100% disk busy
          device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
          sd1     384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
          sd1     368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
          sd1     330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
  • 89. Problem #1: Diagnostics
      ■ Database is running very, very hot. Initial investigation shows a large number of counts.
      ■ Turns out any time you page with Kaminari, it always does a count(*)!
          SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
          SELECT COUNT(*) FROM "stores" WHERE (state = 'approved')
  • 91. Problem #1: Pagination
      ■ Doing count(*) is pretty expensive, as the DB must scan many rows (either the actual table or an index)
  • 94. Problem #1: Pagination
      ■ We are paginating everything! Even infinite scroll is a paged view behind the scenes.
      ■ But we really DON’T want to run count(*) for every paged view.
  • 96. Problem #1: Pagination
      ■ We are showing most popular stores
      ■ Maybe it’s OK to hard-code the total number to, say, 1000?
      ■ How do we tell Kaminari NOT to issue a count query in this case?
  • 97. Problem #1: Pagination (ctd)
  • 99. Solution #1: Monkey Patch!!
  • 101. Solution #1: Pass in the counter
          SELECT "stores".* FROM "stores" WHERE (state = 'approved') LIMIT 20 OFFSET 0
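One way to picture "pass in the counter": the paginator accepts a total from the caller and only falls back to counting when none is given. The sketch below is a minimal stand-in written for this transcript, not Kaminari's API and not the monkey patch shown in the talk; `Page`, `CountingSource`, and the `total_count:` keyword are hypothetical names.

```ruby
# Hypothetical paginator; illustrates skipping the count, not Kaminari itself.
class Page
  attr_reader :records, :page, :per_page, :total_count

  # source: any object responding to #count and #slice(offset, limit)
  # total_count: optional precomputed or capped total; when given, no count runs
  def initialize(source, page:, per_page: 20, total_count: nil)
    @page = page
    @per_page = per_page
    @total_count = total_count || source.count # the expensive COUNT(*) path
    @records = source.slice((page - 1) * per_page, per_page) || []
  end

  def total_pages
    (total_count.to_f / per_page).ceil
  end

  def last_page?
    page >= total_pages
  end
end

# In-memory source that records how often it is counted.
class CountingSource
  attr_reader :count_calls

  def initialize(rows)
    @rows = rows
    @count_calls = 0
  end

  def count
    @count_calls += 1
    @rows.size
  end

  def slice(offset, limit)
    @rows.slice(offset, limit)
  end
end

stores = CountingSource.new((1..5000).map { |i| "store-#{i}" })
capped = Page.new(stores, page: 1, total_count: 1000) # hard-coded total, as on the slide
puts "pages: #{capped.total_pages}, count queries: #{stores.count_calls}"
```

With the total supplied, only the LIMIT/OFFSET-style fetch touches the source, which matches the single SELECT on the slide.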
  • 104. Problem #2: Count Draculas
      ■ AKA: We are still doing too many counts!
      ■ Rails makes it so easy to do it the lazy way.
  • 107. Problem #2: Too Many Counts!
      ■ But it just doesn’t scale well
      ■ Fortunately, Rails has just the feature for this...
  • 112. Counter Caches
      ■ Unfortunately, it has one massive issue:
      ■ It causes database deadlocks at high volume
      ■ Because many Ruby processes are creating child records concurrently
      ■ Each is executing a callback, trying to update the counter_cache column on the parent, requiring a row-level lock
      ■ Deadlocks ensue
  • 116. Possible Solution: Use Background Jobs
      ■ It works like this:
      ■ As the record is created, we enqueue a request to recalculate counter_cache on the parent
      ■ The job performs a complete recalculation of the counter cache and is idempotent
  • 121. Solution #2: Explained
      ■ Sidekiq with UniqueJob extension
      ■ Short wait for “buffering”
      ■ Serialize updates via a small number of workers
      ■ Can temporarily stop workers (in an emergency) to alleviate DB load
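The combination of uniqueness, a short buffering delay, and serialized workers can be sketched with an in-memory queue. This is only an illustration of the idea under stated assumptions: `UniqueQueue`, its `enqueue`/`drain` methods, and the 60-second delay are hypothetical and are not Sidekiq's or the UniqueJob extension's API.

```ruby
require "set"

# In-memory stand-in for a delayed job queue that drops duplicate pending jobs.
class UniqueQueue
  def initialize
    @pending = Set.new # keys that are enqueued and have not run yet
    @jobs = []         # list of [run_at, key] pairs
  end

  # Schedule a recalculation for `key` after `delay` seconds.
  # Returns false when the same key is already waiting (the "buffering" effect).
  def enqueue(key, now:, delay: 60)
    return false if @pending.include?(key)
    @pending << key
    @jobs << [now + delay, key]
    true
  end

  # Run every job whose delay has elapsed, one at a time, like a single worker.
  # Returns the number of jobs that ran.
  def drain(now:)
    ready, @jobs = @jobs.partition { |run_at, _key| run_at <= now }
    ready.each do |_run_at, key|
      @pending.delete(key)
      yield key
    end
    ready.size
  end
end

queue = UniqueQueue.new
recalcs = Hash.new(0)
100.times { |i| queue.enqueue([:product, 42], now: i * 0.1) } # a burst of saves
queue.drain(now: 120) { |key| recalcs[key] += 1 }
puts "recalculations for product 42: #{recalcs[[:product, 42]]}"
```

A burst of child-record inserts collapses into one recalculation per parent, and because the job recomputes the full count it stays idempotent.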
  • 122. Solution #2: Code
  • 127. Things are better. BUT... Still too many fucking counts!
      ■ Even doing count(*) from workers is too much on the databases
      ■ We need to stop doing count(*) in the DB, but keep counter_caches. How?
      ■ We could use Redis for this.
  • 129. Solution #3: Counts Deltas
      ■ Web request increments counter value in Redis
      ■ Enqueues request to update counter_cache
      ■ Background job picks up a few minutes later, reads Redis delta value, and removes it.
      ■ Updates counter_cache column by incrementing it by delta.
      [Diagram: save product (product_id) arrives at unicorn, which does 1. INCR product_id in Redis (counters) and 2. ProductCountWorker.enqueue product_id in Redis (Sidekiq queue); sidekiq then does 3. Dequeue, 4. GET, 5. RESET, and 5. SQL update, INCR the counter_cache column in PostgreSQL by N]
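The delta flow on this slide can be sketched end to end with plain Ruby hashes standing in for Redis and PostgreSQL. Everything here is an assumption made for illustration: `DeltaCounter`, `record_save`, and `flush` are hypothetical names, and with real Redis the read-and-reset step would need to be atomic (for example via a transaction), which this single-threaded sketch does not model.

```ruby
# Hashes stand in for Redis counters, the Sidekiq queue, and the counter_cache column.
class DeltaCounter
  attr_reader :counter_cache, :sql_updates

  def initialize
    @deltas = Hash.new(0)        # role of the Redis counters, keyed by product_id
    @queue = []                  # role of the Sidekiq queue
    @counter_cache = Hash.new(0) # role of the counter_cache column
    @sql_updates = 0             # how many relative UPDATEs would have been issued
  end

  # Steps 1 and 2 (web request): bump the delta, enqueue a flush if none is waiting.
  def record_save(product_id)
    @deltas[product_id] += 1
    @queue << product_id unless @queue.include?(product_id)
  end

  # Steps 3 to 5 (worker): dequeue, read and reset the delta, apply one relative update.
  def flush
    until @queue.empty?
      product_id = @queue.shift
      delta = @deltas.delete(product_id) || 0 # read and reset in one step
      next if delta.zero?
      @counter_cache[product_id] += delta     # like: SET count = count + delta
      @sql_updates += 1
    end
  end
end

counter = DeltaCounter.new
250.times { counter.record_save(42) }
counter.flush
puts "count: #{counter.counter_cache[42]}, SQL updates: #{counter.sql_updates}"
```

The point of the design is that the database sees one relative update per flush instead of one count(*) or one row lock per save.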
  • 130. Define counter_cache_on...
      ■ Internal gem, will open source soon!
  • 131. Can now use counter caches in pagination!
  • 132. 3. Scaling reads
  • 133. Multiple optimization cycles
      ■ Caching: action caching, fragment, CDN
      ■ Personalization via AJAX: cache the entire page, then add personalized details
      ■ 25ms/req memcached time is cheaper than 12ms/req of database time
  • 134. Cache optimization
      40% hit ratio! Woo! Wait... is that even good?
  • 135. Cache optimization
      Increasing your hit ratio means fewer queries against your database
  • 136. Cache optimization
      Caveat: even low hit ratio caches can save your ass. You’re removing load from the DB, remember?
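The caveat about low hit ratios is easy to check with arithmetic: every hit is a lookup the database never sees. A small sketch, where the 100,000 lookups per minute figure is an invented example and `db_lookups` is a hypothetical helper:

```ruby
# Lookups that still reach the database for a given cache hit ratio (0.0..1.0).
def db_lookups(cache_lookups, hit_ratio)
  (cache_lookups * (1.0 - hit_ratio)).round
end

lookups_per_minute = 100_000 # made-up traffic figure for illustration
[0.0, 0.4, 0.9].each do |ratio|
  remaining = db_lookups(lookups_per_minute, ratio)
  puts "hit ratio #{(ratio * 100).round}%: #{remaining} lookups reach the DB"
end
```

Under these assumptions a 40% hit ratio already removes 40,000 lookups per minute from the database, which is why a modest ratio can still matter when the database is near saturation.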
  • 139. Cache saturation
      How long before your caches start evicting data?
      [Graph: blue = cache writes, red = automatic evictions]
  • 142. Ajax personalization
  • 143. Nice!
      ■ Rails Action Caching: runs before_filters, so A/B experiments can still run
      ■ Extremely fast pages: 4ms application time for some of our computationally heaviest pages
      ■ Could be served via CDN in the future
  • 144. Sad trombone...
      ■ Are you actually logged in? Pages don’t know until Ajax successfully runs
      ■ Selenium AND Jasmine tests!
  • 149. Read/write splitting
      ■ Sometime in December 2012...
      ■ Database reaching 100% saturation
      ■ Latency starting to increase non-linearly
      ■ We need to distribute database load
      ■ We need to use read replicas!
  • 151. DB adapters for read/write
      ■ Looked at several, including DbCharmer
      ■ Features / Configurability / Stability
      ■ Thread safety? This may be Ruby, but some people do actually use threads.
      ■ If I tell you it’s a read-only replica, DON’T ISSUE WRITES
      ■ Failover on errors?
  • 152. Chose Makara, by TaskRabbit
      ■ Used in production
      ■ We extended it to work with PostgreSQL
      ■ Works with Sidekiq (thread-safe!)
      ■ Failover code is very simple. Simple is sometimes better.
      https://github.com/taskrabbit/makara
  • 154. We rolled out Makara and...
      ■ 1 master, 3 read-only async replicas
      Wait, what?
  • 155. A note about graphs
      ■ NewRelic is great!
      ■ Not easy to predict when your systems are about to fall over
      ■ Use something else to visualize database and disk saturation
  • 158. 3 days later, in production
      ■ 3 read replicas distributing load from master
      ■ App servers and sidekiqs create lots of connections to DB backends
      ■ Mysterious spikes in errors at high traffic
  • 159. Replication! Doh!
      [Graph: replication lag (yellow) correlates with application errors (red)]
  • 160. Replication lag! Doh!
      ■ Track latency sending xlog to slaves:
          select client_addr,
                 pg_xlog_location_diff(sent_location, write_location)
          from pg_stat_replication;
      ■ Track latency applying xlogs on slaves:
          select pg_xlog_location_diff(
                   pg_last_xlog_receive_location(),
                   pg_last_xlog_replay_location()),
                 extract(epoch from now()) -
                   extract(epoch from pg_last_xact_replay_timestamp());
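Where `pg_xlog_location_diff` is not available, the same difference can be computed in the application from the two location strings. The sketch below assumes locations in the usual `hex/hex` form and assumes that on PostgreSQL 9.2 and earlier each logical log file holds 0xFF000000 bytes; treat that constant as an assumption to verify for your version (pass `2**32` to treat the location as a plain 64-bit number). `xlog_to_bytes` and `xlog_lag_bytes` are hypothetical helper names.

```ruby
# Convert an xlog location such as "16/B374D848" into a byte position.
# bytes_per_logfile is an assumption: 0xFF000000 for 9.2 and earlier.
def xlog_to_bytes(location, bytes_per_logfile: 0xFF000000)
  logfile, offset = location.split("/").map { |part| part.to_i(16) }
  logfile * bytes_per_logfile + offset
end

# Bytes by which `behind` trails `ahead` (for example replay vs. receive location).
def xlog_lag_bytes(ahead, behind, bytes_per_logfile: 0xFF000000)
  xlog_to_bytes(ahead, bytes_per_logfile: bytes_per_logfile) -
    xlog_to_bytes(behind, bytes_per_logfile: bytes_per_logfile)
end

# Example with made-up locations fetched separately from master and replica.
puts xlog_lag_bytes("16/B374D848", "16/B3740000")
```

Graphing this number per replica gives the same signal as the queries on the slide, at the cost of one extra connection per server.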
  • 164. Eventual Consistency
      ■ Some code paths should always go to master for reads (e.g., after signup)
      ■ Application should be resilient to getting RecordNotFound, to tolerate replication delays
      ■ Scaling reads is not enough. Writes become the bottleneck.
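Tolerating a replica that has not yet received a row can be as simple as retrying the lookup against the master. The sketch below uses in-memory stores in place of database connections; `Store`, `find_with_fallback`, and the local `RecordNotFound` class are hypothetical stand-ins, not ActiveRecord or Makara code.

```ruby
# Local stand-in for the exception a failed lookup would raise.
class RecordNotFound < StandardError; end

# In-memory store playing the role of one database connection.
class Store
  def initialize(rows = {})
    @rows = rows
  end

  def insert(id, row)
    @rows[id] = row
  end

  def find(id)
    @rows.fetch(id) { raise RecordNotFound, "id=#{id}" }
  end
end

# Read from the replica first; on a miss, retry on the master,
# where a row written moments ago is already visible.
def find_with_fallback(id, replica:, master:)
  replica.find(id)
rescue RecordNotFound
  master.find(id)
end

master = Store.new
replica = Store.new                # lagging: has not replayed the insert yet
master.insert(1, name: "new user") # e.g. the signup write
puts find_with_fallback(1, replica: replica, master: master).inspect
```

A missing row on both servers still raises, so genuine 404s keep their normal behavior.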
  • 165. Write load delays replication
      Replicas are busy trying to apply XLOGs and serve heavy read traffic
  • 166. 4. Scaling database writes
  • 170. First, No-Brainers:
      ■ Move stuff out of the DB. Easiest first.
      ■ Tracking user activity is very easy to do with a database table. But slow.
      ■ 2000 inserts/sec while also handling site-critical data? Not a good idea.
      ■ Solution: UDP packets to rsyslog, ASCII-delimited files, logrotate, analyze them later
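Sending activity events as UDP datagrams keeps the web request from ever waiting on the tracking store. A minimal sketch using Ruby's standard `socket` library follows; the host, port, tab delimiter, and field order are assumptions for illustration, `ActivityLogger` is a hypothetical class, and the rsyslog configuration that writes the lines to files is not shown.

```ruby
require "socket"

# Fire-and-forget activity logger over UDP.
# Host, port, delimiter, and field layout are illustrative assumptions.
class ActivityLogger
  def initialize(host: "127.0.0.1", port: 514, delimiter: "\t")
    @socket = UDPSocket.new
    @host = host
    @port = port
    @delimiter = delimiter
  end

  # Build one delimited line: epoch seconds, user, action, target.
  def line_for(user_id, action, target_id, at: Time.now)
    [at.to_i, user_id, action, target_id].join(@delimiter)
  end

  # Send the line as a single datagram; never let logging break the request.
  def track(user_id, action, target_id, at: Time.now)
    @socket.send(line_for(user_id, action, target_id, at: at), 0, @host, @port)
  rescue SystemCallError
    nil
  end
end

logger = ActivityLogger.new
puts logger.line_for(7, "save", 42, at: Time.at(1_366_300_000))
```

UDP delivery is best effort, so this trades a small chance of lost events for zero write load on the primary database.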
  • 174. Next: Async Commits
      ■ PostgreSQL supports delayed (batched) commits
      ■ Delays fsync for some # of microseconds
      ■ At high volume, helps disk IO
  • 175. PostgreSQL Async Commits
  • 180. ZFS Block Size
      ■ Default ZFS block size is 128KB
      ■ PostgreSQL block size is 8KB
      ■ Small writes require lots of bandwidth
          device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
          sd1     384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
          sd1     368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
          sd1     330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
  • 184. ZFS Block Size (ctd.)
          device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
          sd1     384.0  1157.5   48.0  116.8   0.0   8.8    5.7   2  100
          sd1     368.0  1117.9   45.7  106.3   0.0   8.0    5.4   2  100
          sd1     330.3  1357.5   41.3  139.1   0.0   9.5    5.6   2  100
      ■ Solution: change ZFS block size to 8K:
          device    r/s     w/s   Mr/s   Mw/s  wait  actv  svc_t  %w   %b
          sd1     130.3   219.9    1.0    4.4   0.0   0.7    2.1   0   37
          sd1     329.3   384.1    2.6   11.3   0.0   1.9    2.6   1   78
          sd1     335.3   357.0    2.6    8.7   0.0   1.8    2.6   1   80
          sd1     354.0   200.3    2.8    4.9   0.0   1.6    3.0   0   84
          sd1     465.3   100.7    3.6    1.7   0.0   2.1    3.7   0   91
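The effect of the block size change shows up in the iostat rows themselves: dividing Mw/s by w/s gives the average size of one write. A small sketch of that arithmetic, using the first row of each table above; `avg_write_kb` is a hypothetical helper and the 1024 KB-per-MB conversion is an assumption about iostat's units.

```ruby
# Average size of one write in KB, from iostat's w/s and Mw/s columns.
def avg_write_kb(writes_per_sec, mb_written_per_sec)
  mb_written_per_sec * 1024.0 / writes_per_sec
end

before = avg_write_kb(1157.5, 116.8) # first row with the 128K default
after  = avg_write_kb(219.9, 4.4)    # first row after switching to 8K

puts format("before: %.1f KB per write", before)
puts format("after:  %.1f KB per write", after)
puts "worst-case amplification at 128K blocks: #{128 / 8}x" # one 8K page rewrites a 128K block
```

Before the change each write averages on the order of 100 KB, close to the 128K block size; afterwards it drops to roughly a fifth of that, which is consistent with the lower %b column.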
  • 188. Next: Vertical Sharding
      ■ Move out largest table into its own master database (150 inserts/sec)
      ■ Remove any SQL joins, do them in application, drop foreign keys
      ■ Switch model to establish_connection to another DB. Fix many broken tests.
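Once a table lives in its own database, a SQL join against it becomes two queries stitched together in application code. The sketch below shows the shape of that with in-memory rows; the `Save` and `Product` structs, the `saves_with_products` helper, and the lookup lambda are hypothetical and stand in for models that would each use their own connection.

```ruby
# Hypothetical rows: saves live in their own database, products in the main one.
Save = Struct.new(:id, :user_id, :product_id)
Product = Struct.new(:id, :name)

# Replace "saves JOIN products" with two lookups combined in Ruby.
# fetch_products: callable that takes an array of ids and returns Product rows,
# e.g. a single "WHERE id IN (...)" query against the main database.
def saves_with_products(saves, fetch_products)
  ids = saves.map(&:product_id).uniq
  by_id = fetch_products.call(ids).each_with_object({}) { |p, h| h[p.id] = p }
  # With foreign keys dropped, a product may be missing; it comes back as nil.
  saves.map { |save| [save, by_id[save.product_id]] }
end

products = [Product.new(10, "lamp"), Product.new(11, "chair")]
lookup = ->(ids) { products.select { |p| ids.include?(p.id) } }
saves = [Save.new(1, 7, 10), Save.new(2, 7, 11), Save.new(3, 8, 10)]

saves_with_products(saves, lookup).each do |save, product|
  puts "save #{save.id}: #{product ? product.name : 'missing product'}"
end
```

Collecting the ids first keeps this at two queries per page rather than one query per row.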
  • 189. Vertical Sharding
      [Diagram: unicorns (haproxy, pgbouncer, twemproxy) connect to a PostgreSQL saves master and to a PostgreSQL main master, which feeds two PostgreSQL main replicas via streaming replication]
  • 191. Vertical Sharding: Results
      ■ Deploy All Things!
  • 192. Future: Services Approach
      [Diagram: unicorns (haproxy, pgbouncer, twemproxy) connect to a PostgreSQL main master with two main replicas (streaming replication), and over http / json to a sinatra services app backed by Shard1, Shard2, and Shard3]
  • 197. In Conclusion. Tasty gems :)
      https://github.com/wanelo/pause
      ■ distributed rate limiting using redis
      https://github.com/wanelo/spanx
      ■ rate-limit-based IP blocker for nginx
      https://github.com/wanelo/redis_with_failover
      ■ attempt another redis server if available
      https://github.com/kigster/ventable
      ■ observable pattern with a twist
  • 198. Thanks. Comments? Questions?
      https://github.com/wanelo
      https://github.com/wanelo-chef
      @kig & @sax @kig & @ecdysone @kigster & @sax

Editor's Notes

  1. Our mission is to democratize and transform the world's commerce by reorganizing shopping around people.
  2. Some of the stores have close to half a million followers. Some are big and known, and some aren’t at all, outside of Wanelo.
  3. Near real time updates to your feed, as people post products to stores you follow, or collections. Following a hashtag is very powerful.
  4. Rails backend API, simple JSON in/out, using RABL for rendering JSON back (slow!). JSON.generate() is so much faster than to_json.
  5. Included in /contrib in the Postgres source. Very easy to install. If a package does not come with pg_stat_statements, this is a reason to compile it yourself.
  6. This is why we like Postgres: visibility tools
  7. Sometimes you throw everything in a single graph, not knowing if it’s useful. Sometimes that graph saves your ass when you happen to see it out of the corner of your eye.
  8. Extremely useful to correlate different data points visually
  9. Why are you even waiting on disks? Postgres relies heavily on the file cache
  10. ARC stands for Adaptive Replacement Cache. This is why we like SmartOS/Illumos/Solaris: visibility tools
  11. Great thing about ARC: even when your query misses in-RAM db cache, you hit an in-RAM file cache
  12. Slowed down the site to the point where errors started happening
  13. purple is hit ratio of cache servers
  16. blue: writes; red: automatic eviction
  17. Hard to do this after the fact
  18. This is why you want to already be on Postgres. You can take risks knowing that PG will throw errors, not corrupt data.
  19. When you pull aside the curtain of a Ruby DB adapter, you can get a sense of... betrayal. Why is it written like this? Why method_missing? Why? ActiveRecord is a finely crafted pile of code defined after the fact. Unfortunately, the DB adapters that don’t use crazy metaprogramming do even worse things to avoid it. In Makara, error handling is a set of regexes, which is easy to extend, and requests after a write read from master.
  20. Putting everything into a class namespace per thread is not thread safety. Threaded code often spawns new threads.
  21. New Relic application graph for month of December
  22. Graphite / Circonus
  23. Postgres 9.2 specific. On 9.1 you basically have to connect to both master and replica and do binary math on the xlog locations.
  24. By the way, these error spikes are 10/minute.