Web scale MySQL at Facebook (Domas Mituzas)

Web scale MySQL @ Facebook
Domas Mituzas, 2011-10-03

Facebook
800M monthly active users
500M daily active users
350M mobile users
7M apps and websites integrated via Platform

Setup
Software: MySQL 5.1; custom Facebook patch (Launchpad: mysqlatfacebook); extra resiliency; reduced operations effort
Hardware: a variety of generations; many-core; local storage; some flash storage

UDB performance numbers (from Sep. 2011)
Query response time: 4 ms reads, 5 ms writes
Network bytes sent per second: 90 GB peak
Queries per second: 60M peak
Rows read per second: 1450M peak
Rows changed per second: 3.5M peak
InnoDB page I/O per second: 8.1M peak

Performance focus
Focus on reliable throughput in production
Avoid performance stalls
Make sure the hardware is actually used
Track the 99th percentile rather than the average or median
Worst-offender analysis: top-N lists and histograms instead of tier averages
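
To illustrate why the 99th percentile and top-N lists catch problems that averages hide, here is a small sketch on made-up latency samples. The host names, the numbers, and the nearest-rank percentile definition are assumptions chosen for the example, not Facebook's actual tooling.

```python
import math
from collections import Counter

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def top_offenders(events, n):
    """Sum latency per key (query fingerprint, host, ...) and return the n worst keys."""
    totals = Counter()
    for key, latency_ms in events:
        totals[key] += latency_ms
    return totals.most_common(n)

# Hypothetical samples: 98 fast queries and 2 very slow ones from one host.
events = [("db1", 4.0)] * 49 + [("db2", 5.0)] * 49 + [("db3", 900.0)] * 2
latencies = [ms for _, ms in events]

mean = sum(latencies) / len(latencies)   # 22.41 ms: looks like a mild, uniform slowdown
median = percentile(latencies, 50)       # 5.0 ms: the stall is invisible
p99 = percentile(latencies, 99)          # 900.0 ms: the stall shows up
worst = top_offenders(events, 1)         # [('db3', 1800.0)]: and points at its source
```

The median says nothing is wrong and the mean smears two stalls across every query; only the tail metric and the per-key ranking show both that something is stalling and where.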

Stalls
“Dogpiles”
Temporary slowdowns: even 0.1 s is huge

Stall tools
Dogpiled (in-house): snapshot aggregation of server state at the moment of distress, plus a “time machine” view into the logs before the event
Aspersa (stalk, collect)
Poor man’s profiler (.org); later iterations: apmp, hpmp, tpmp
GDB
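
The core idea of the poor man's profiler is to dump every thread's backtrace (for example with gdb's `thread apply all bt`) and count how many threads share an identical stack, so a pile-up on one mutex stands out immediately. Below is a minimal Python sketch of the aggregation step only; the sample text is made up in the shape of gdb output, and the function names in it are illustrative.

```python
import re
from collections import Counter

# Matches "#0  0x00007f... in func (...)" as well as "#1  func (...)" frames.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")

def aggregate_stacks(gdb_output):
    """Collapse each thread's backtrace into 'f0,f1,...' and count identical stacks."""
    stacks, current = Counter(), []
    for line in gdb_output.splitlines():
        line = line.strip()
        if line.startswith("Thread "):
            if current:
                stacks[",".join(current)] += 1
            current = []
            continue
        match = FRAME_RE.match(line)
        if match:
            current.append(match.group(1))
    if current:
        stacks[",".join(current)] += 1
    return stacks.most_common()

# Made-up excerpt in the shape of `thread apply all bt` output.
SAMPLE = """
Thread 3 (Thread 0x7f01 (LWP 101)):
#0  0x00007f0a in pthread_cond_wait ()
#1  0x0000000a in os_event_wait_low (event=0x1) at os/os0sync.c:1
#2  0x0000000b in mutex_spin_wait (mutex=0x2) at sync/sync0sync.c:1
Thread 2 (Thread 0x7f02 (LWP 102)):
#0  0x00007f0a in pthread_cond_wait ()
#1  0x0000000a in os_event_wait_low (event=0x1) at os/os0sync.c:1
#2  0x0000000b in mutex_spin_wait (mutex=0x2) at sync/sync0sync.c:1
Thread 1 (Thread 0x7f03 (LWP 103)):
#0  0x00007f0c in poll ()
#1  main () at mysqld.cc:1
"""

for stack, count in aggregate_stacks(SAMPLE):
    print(count, stack)
```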

Stalls found
Table extension: global I/O mutex held
DROP TABLE: both SQL-layer and InnoDB global mutexes held
Purge contention: unnecessary dictionary lock held
Binlog reads: no commits can happen while old events are being read
Kernel mutex: O(N) and O(N^2) operations in transaction creation, lock creation/removal, and deadlock detection
Background page flushing not really in the background
Many more

Efficiency
Increasing utilization of hardware
Memory-to-disk ratio
Finding bottlenecks: normally disk-bound; sometimes network; application or server software chokepoints; rarely CPU or memory bandwidth
Application design: the biggest wins are in optimizing the workload

Disk efficiency
Normally disk-IOPS-bound
Allowing higher queue lengths: can operate at more than 8 pending operations per disk
InnoDB page size: needs to be adjustable per table or index for real gain
XFS with the deadline I/O scheduler
Parallelism at the MySQL layer
>300 IOPS on 166 rps disks
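
Why deeper queues raise IOPS can be sketched with a little arithmetic and a toy elevator. This assumes "166 rps" means revolutions per second (roughly 10,000 rpm drives), so 300 IOPS means about 1.8 completed I/Os per revolution, which is only reachable when the scheduler (and the drive) can reorder several pending requests. The simple up-then-down sweep below is an illustration, not the Linux deadline scheduler itself, and the track positions are made up.

```python
# Assumption: "166 rps" means 166 revolutions per second, i.e. roughly 10,000 rpm drives.
REVS_PER_SEC = 10_000 / 60            # about 166.7
OPS_PER_REV = 300 / REVS_PER_SEC      # about 1.8 I/Os completed per platter revolution

def head_travel(start, order):
    """Total head movement (in arbitrary track units) to serve requests in this order."""
    pos, total = start, 0
    for target in order:
        total += abs(target - pos)
        pos = target
    return total

def elevator_order(start, pending):
    """Simplified elevator: sweep up from the head position, then back down."""
    up = sorted(p for p in pending if p >= start)
    down = sorted((p for p in pending if p < start), reverse=True)
    return up + down

# Eight queued requests at made-up track positions; head starts at track 500.
pending = [900, 100, 800, 200, 700, 300, 600, 400]
fifo_travel = head_travel(500, pending)                          # 3900
sorted_travel = head_travel(500, elevator_order(500, pending))   # 1200
```

With only one request outstanding there is nothing to reorder; with eight queued, the sorted sweep moves the head less than a third as far as arrival order does.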

Memory efficiency
Compact records: Thrift compaction for objects, etc.
Clustered and covering index planning
FORCE INDEX: avoid unnecessary I/O and cached pages
Historical data access is costly
Full table scans: ETL-type queries, mysqldump, …
Tune midpoint-insertion LRU for InnoDB
Incremental updating, incremental binary backups
O_DIRECT access for data and logs
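
Midpoint insertion is what keeps full scans (ETL queries, mysqldump) from flushing the hot working set out of the buffer pool. Below is a simplified Python model, loosely following InnoDB's design: new pages enter the "old" sublist and move to the "young" sublist only on a second access. The `old_pct` parameter plays the role of `innodb_old_blocks_pct` (default 37); everything else, including the absence of `innodb_old_blocks_time`, is a simplification.

```python
from collections import OrderedDict

class MidpointLRU:
    """Toy buffer pool list with midpoint insertion.

    Each OrderedDict is ordered tail -> head (last item = most recently used).
    New pages enter at the head of the "old" sublist; only a second access
    promotes them to the "young" sublist, so a one-off scan cannot push out
    the hot working set.
    """

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_target = max(1, capacity * old_pct // 100)  # share reserved for old pages
        self.young = OrderedDict()
        self.old = OrderedDict()

    def __contains__(self, page):
        return page in self.young or page in self.old

    def access(self, page):
        """Touch a page; return True on a buffer pool hit."""
        if page in self.young:
            self.young.move_to_end(page)
            return True
        if page in self.old:
            del self.old[page]
            self.young[page] = True          # second access: promote to young head
            self._rebalance()
            return True
        self.old[page] = True                # miss: insert at the midpoint
        self._evict()
        return False

    def _rebalance(self):
        # Keep young within its share by demoting its tail to the old head.
        while len(self.young) > self.capacity - self.old_target:
            page, _ = self.young.popitem(last=False)
            self.old[page] = True

    def _evict(self):
        # Evict from the tail of the old sublist first.
        while len(self.young) + len(self.old) > self.capacity:
            victim = self.old if self.old else self.young
            victim.popitem(last=False)

pool = MidpointLRU(capacity=8)
hot = ["h0", "h1", "h2", "h3"]
for _ in range(2):                       # hot pages are read twice, so they get promoted
    for page in hot:
        pool.access(page)
for i in range(20):                      # a full scan touches 20 pages once each
    pool.access(f"scan{i}")

print(all(page in pool for page in hot))  # True: the scan did not evict them
```

A plain LRU of the same size would have lost all four hot pages to the 20-page scan; here the scan only churns the old sublist.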

Pure flash (cheating)
Data stored directly on flash
Limited data size
Not utilizing the flash card fully
Still used in some cases

Flashcache
Flash in front of disks
Can use slower disks
Write-back cache
Much more data storage
Able to utilize much more of the flash card
Very long warmup time
Open source (github/facebook/flashcache)
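
The write-back behaviour can be sketched as a small LRU block cache in front of a slow store: writes are acknowledged once they reach the fast tier and only hit disk when a dirty block is evicted, so repeated writes to the same block cost one disk write. This is a toy model, not Flashcache's actual design; the dicts standing in for flash and disk are assumptions.

```python
from collections import OrderedDict

class WriteBackCache:
    """Toy write-back block cache: a small fast tier (flash) in front of a slow store (disk)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                     # block number -> data (stands in for the disk)
        self.blocks = OrderedDict()          # block -> (data, dirty); last item = MRU
        self.disk_writes = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return self.blocks[block][0]
        data = self.disk.get(block)          # miss: fetch from disk and cache it clean
        self._put(block, data, dirty=False)
        return data

    def write(self, block, data):
        self._put(block, data, dirty=True)   # acknowledged without touching the disk

    def _put(self, block, data, dirty):
        if block in self.blocks:
            dirty = dirty or self.blocks[block][1]
        self.blocks[block] = (data, dirty)
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            old, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:                    # flush dirty data only when it leaves the cache
                self.disk[old] = old_data
                self.disk_writes += 1

disk = {2: "x", 3: "y"}
cache = WriteBackCache(capacity=2, disk=disk)
for version in ("a", "b", "c"):
    cache.write(1, version)                  # three writes to block 1, absorbed by flash
cache.read(2)
cache.read(3)                                # evicts block 1, flushing only its latest version
print(cache.disk_writes, disk[1])
```

The cache starts empty and only fills as blocks are touched, which is one way to see why warmup on a large flash device takes so long.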

MySQL 2x
Flash allows for large loads
Large performance difference from pure-disk servers
Many older servers still in use
Solution? Run multiple MySQL instances per server, on ports 3307, 3308, 3309, etc.
Replication prevents direct consolidation
Many port assumptions in the code had to be redone
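
Running several instances on one host means each needs its own port, data directory and socket, and clients can no longer assume port 3306. A minimal sketch that renders per-instance option groups in the `[mysqldN]` style used by `mysqld_multi`; the `/data/mysql` layout and the choice of using the port as N are hypothetical.

```python
def instance_groups(count, base_port=3307, root="/data/mysql"):
    """Render one option group per mysqld instance, each with its own port, datadir and socket.

    Group names follow the [mysqldN] convention of mysqld_multi; the
    directory layout under `root` is hypothetical.
    """
    groups = []
    for port in range(base_port, base_port + count):
        groups.append(
            f"[mysqld{port}]\n"
            f"port = {port}\n"
            f"datadir = {root}/{port}\n"
            f"socket = {root}/{port}/mysql.sock\n"
        )
    return "\n".join(groups)

print(instance_groups(3))
```

Keying the data directory and socket by port keeps the mapping obvious when an operator only has a host:port pair to work with.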

Application caching
Old: memcached, with cache-invalidation stampedes, refetching the full dataset on refresh, and many copies
New: write-through caching, with incremental cache updates, cache hierarchies for datacenter-local copies, efficient operations for association sets, and a common API for all use cases
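
The difference between invalidation and write-through with incremental updates can be sketched with an association list: instead of deleting the cached list on every write (and refetching the whole set on the next read), the writer appends to the cached copy after the database write succeeds. `AssocStore` and the `("u1", "friend")` key are hypothetical stand-ins, not Facebook's API.

```python
class AssocStore:
    """Hypothetical database table of association lists keyed by (object id, assoc type)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def fetch(self, key):
        self.reads += 1
        return list(self.rows.get(key, []))

    def add(self, key, target):
        self.rows.setdefault(key, []).append(target)

class WriteThroughCache:
    def __init__(self, store):
        self.store = store
        self.cached = {}

    def get(self, key):
        if key not in self.cached:
            self.cached[key] = self.store.fetch(key)   # fill once on a miss
        return list(self.cached[key])

    def add(self, key, target):
        self.store.add(key, target)                    # write to the database first
        if key in self.cached:
            self.cached[key].append(target)            # then patch the cached list in place

store = AssocStore()
store.add(("u1", "friend"), "u2")
cache = WriteThroughCache(store)
cache.get(("u1", "friend"))                            # miss: one database read
cache.add(("u1", "friend"), "u3")
cache.add(("u1", "friend"), "u4")
print(cache.get(("u1", "friend")), store.reads)        # ['u2', 'u3', 'u4'] 1
```

Two writes cost no extra reads here, whereas an invalidate-on-write cache would refetch the full list after each change.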

Group commit
Some OLTP workloads are too busy even for modern RAID cards
High I/O pressure increases response times
Durability compromises increase operational overhead
Dead batteries are extremely painful otherwise
Now in 5.1.52-fb
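
The essence of group commit is that transactions arriving while one log flush is in progress wait and are made durable together by the next flush, so many commits share one fsync. A minimal Python sketch using a leader/follower pattern; the simulated 10 ms fsync and the sequence-number bookkeeping are assumptions, and no real log is written.

```python
import threading
import time

class GroupCommitLog:
    """Committers that arrive while a flush is in progress share the next fsync."""

    def __init__(self, fsync_seconds=0.01):
        self.cond = threading.Condition()
        self.next_lsn = 0          # last sequence number handed out
        self.flushed_lsn = 0       # everything <= this is durable
        self.flushing = False
        self.fsync_count = 0
        self.fsync_seconds = fsync_seconds

    def _fsync(self):
        time.sleep(self.fsync_seconds)   # stand-in for write() + fsync() of the log

    def commit(self):
        with self.cond:
            self.next_lsn += 1
            my_lsn = self.next_lsn
            while self.flushed_lsn < my_lsn:
                if self.flushing:
                    self.cond.wait()     # a leader is flushing; wait for it
                    continue
                self.flushing = True     # become leader for everything queued so far
                batch_end = self.next_lsn
                self.cond.release()      # let more committers queue up meanwhile
                try:
                    self._fsync()
                finally:
                    self.cond.acquire()
                self.fsync_count += 1
                self.flushed_lsn = batch_end
                self.flushing = False
                self.cond.notify_all()
        return my_lsn

log = GroupCommitLog()
start = threading.Barrier(50)

def client():
    start.wait()
    log.commit()

threads = [threading.Thread(target=client) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log.flushed_lsn, log.fsync_count)  # 50 commits, far fewer fsyncs
```

The busier the server, the larger each batch becomes, which is why group commit helps most exactly when the I/O subsystem is under pressure.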

Admission control
Server resources are limited
Per-account thread concurrency
Reduces the chance of an O(N^2) blowup
max_connections no longer impacts server load
Per-application resource throttling
Now in 5.1.52-fb
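
Per-account concurrency limits separate "connected" from "running": an application may hold many connections, but only a bounded number of its queries execute at once and the rest queue. A minimal sketch with one semaphore per account; the class, the account names and the limit of 2 are hypothetical, not the 5.1.52-fb implementation.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager

class AdmissionControl:
    """Limit concurrently running queries per account; extra ones wait their turn."""

    def __init__(self, max_running_per_account):
        self.limit = max_running_per_account
        self.lock = threading.Lock()
        self.slots = {}                        # account -> semaphore

    @contextmanager
    def admit(self, account):
        with self.lock:                        # create each account's semaphore exactly once
            if account not in self.slots:
                self.slots[account] = threading.BoundedSemaphore(self.limit)
            sem = self.slots[account]
        sem.acquire()                          # blocks while the account is at its limit
        try:
            yield
        finally:
            sem.release()

control = AdmissionControl(max_running_per_account=2)
running = defaultdict(int)
peak = defaultdict(int)
stats_lock = threading.Lock()

def run_query(account):
    with control.admit(account):
        with stats_lock:
            running[account] += 1
            peak[account] = max(peak[account], running[account])
        time.sleep(0.01)                       # pretend to execute the query
        with stats_lock:
            running[account] -= 1

workers = [threading.Thread(target=run_query, args=(acct,))
           for acct in ["app_a"] * 6 + ["app_b"] * 3]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(dict(peak))  # neither account ever exceeds 2 running queries
```

Because the limit is per account, one misbehaving application saturates only its own slots instead of the whole server's threads.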

Online Schema Change
External PHP script, open source
Uses triggers for change tracking
Used on 100G+ tables
Dump/reload plus fast index creation
Extendable class; may allow:
PK composition changes with conflict resolution
Indexing previously unindexed datasets
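
The trigger-based approach can be sketched end to end: install triggers that record which rows change, bulk-copy into a shadow table with the new schema, replay the recorded changes, then swap names. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL (where the swap would be an atomic `RENAME TABLE`), and replays by re-reading each touched row rather than logging full row images; the `users` table and the new `email` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Existing (hypothetical) table that needs a new `email` column.
cur.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');

-- 1. Change-capture table plus triggers that log every row touched in `users`.
CREATE TABLE users_deltas (seq INTEGER PRIMARY KEY, id INTEGER);
CREATE TRIGGER users_ins AFTER INSERT ON users
  BEGIN INSERT INTO users_deltas (id) VALUES (NEW.id); END;
CREATE TRIGGER users_upd AFTER UPDATE ON users
  BEGIN INSERT INTO users_deltas (id) VALUES (OLD.id);
        INSERT INTO users_deltas (id) VALUES (NEW.id); END;
CREATE TRIGGER users_del AFTER DELETE ON users
  BEGIN INSERT INTO users_deltas (id) VALUES (OLD.id); END;

-- 2. Shadow table with the new schema.
CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
""")

# 3. Bulk-copy a snapshot (the dump/reload step).
cur.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users")

# Writes that arrive while the copy is running (simulated after it here).
cur.execute("INSERT INTO users (id, name) VALUES (3, 'cat')")
cur.execute("UPDATE users SET name = 'ann2' WHERE id = 1")
cur.execute("DELETE FROM users WHERE id = 2")

# 4. Replay: re-sync every row the triggers saw change.
touched = [row[0] for row in cur.execute("SELECT DISTINCT id FROM users_deltas ORDER BY id")]
for row_id in touched:
    cur.execute("DELETE FROM users_new WHERE id = ?", (row_id,))
    cur.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users WHERE id = ?",
                (row_id,))

# 5. Swap the tables and clean up.
cur.executescript("""
DROP TRIGGER users_ins; DROP TRIGGER users_upd; DROP TRIGGER users_del;
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
DROP TABLE users_deltas;
""")
rows = cur.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'ann2', None), (3, 'cat', None)]
```

A production tool also has to briefly block writes during the final replay and swap so that no change slips in between the two steps.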

Tools
Table and user statistics
Shadows
Slocket
pmysql
Replication sampling
Client log aggregation
Query comments
Indigo (query monitor)

Future
MySQL is never a solved problem
Always investigating better/new solutions
New hardware types
New datacenters and topologies
New use cases and clients
New neighbors to share data with

Visibility
Never assume
Use metrics to measure
When metrics aren’t available, add them
Full stack: more InnoDB info, more application info

Replication
Lag used to be a big problem; it is still a bottleneck
Possible solutions:
“Better” slave prefetch: the Maatkit version has problems; our own version is used successfully on some tiers; may be possible with InnoDB cooperation
Continuent parallel slave
Oracle parallel slave in 5.6
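
Slave prefetch works around the single-threaded replication SQL thread: a helper reads ahead in the relay log and runs read-only versions of upcoming statements, so the pages they need are already in the buffer pool when the real statement is applied. Below is a deliberately naive sketch of the rewrite step only; the regexes are assumptions and do not handle string literals containing `WHERE`, multi-table statements, `ORDER BY`/`LIMIT`, or inserts.

```python
import re

# Naive patterns: no handling of quoted "WHERE", multi-table statements, ORDER BY/LIMIT, etc.
UPDATE_RE = re.compile(r"^\s*UPDATE\s+(\S+)\s+SET\s+.+?\s+WHERE\s+(.+?);?\s*$", re.I | re.S)
DELETE_RE = re.compile(r"^\s*DELETE\s+FROM\s+(\S+)\s+WHERE\s+(.+?);?\s*$", re.I | re.S)

def prefetch_query(statement):
    """Rewrite an UPDATE/DELETE into a SELECT that reads the same rows, or return None."""
    for pattern in (UPDATE_RE, DELETE_RE):
        match = pattern.match(statement)
        if match:
            table, where = match.groups()
            return f"SELECT * FROM {table} WHERE {where}"
    return None

print(prefetch_query("UPDATE users SET name = 'x' WHERE id = 42;"))
print(prefetch_query("DELETE FROM likes WHERE user_id = 7 AND obj_id = 9"))
print(prefetch_query("INSERT INTO users VALUES (1, 'a')"))
```

A SELECT on the same predicate warms the clustered-index pages the write will touch; secondary-index pages the write modifies may still miss, which is one reason better results may need InnoDB's cooperation.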

InnoDB Compression
Originally planned during the 5.1 upgrade
Problems: replication stream cost; increased log writes; performance in some cases; stability, monitoring, etc.
