This document summarizes Facebook's use of MySQL at web scale. Key points include:
- Facebook has over 800 million monthly active users generating huge volumes of queries and data.
- MySQL 5.1 runs with a custom Facebook patch that adds resiliency and reduces operational effort under these loads.
- Performance optimizations focus on avoiding stalls and fully utilizing hardware.
- In-house and open-source tools help identify stalls and bottlenecks such as table extension, purge contention, and I/O pressure.
- Memory and disk efficiency are improved through techniques like compact records and flash caching.
- Group commit and admission control were added to Facebook's MySQL build (5.1.52-fb) to further optimize high-concurrency workloads.
- Ongoing work looks to leverage new hardware, improve replication, and incorporate compression.

6. Setup
- Software
  - MySQL 5.1
  - Custom Facebook patch (Launchpad: mysqlatfacebook)
    - Extra resiliency
    - Reduced operations effort
- Hardware
  - Variety of generations
  - Many-core
  - Local storage
  - Some flash storage

7. UDB performance numbers (from Sep. 2011)
- Query response time: 4 ms reads, 5 ms writes
- Network bytes sent per second: 90 GB peak
- Queries per second: 60M peak
- Rows read per second: 1450M peak
- Rows changed per second: 3.5M peak
- InnoDB page I/O per second: 8.1M peak
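
The peak figures can be turned into rough per-query ratios. This is back-of-the-envelope arithmetic only, under an assumption the slide does not state: that all peaks coincide, and that "GB" means 10^9 bytes.

```python
# Rough per-query ratios from the Sep. 2011 UDB peak figures.
# Assumptions (not stated on the slide): the peaks coincide, and "GB" = 10**9 bytes.
PEAKS_PER_SECOND = {
    "queries": 60e6,
    "network_bytes_sent": 90e9,
    "rows_read": 1450e6,
    "rows_changed": 3.5e6,
    "innodb_page_io": 8.1e6,
}


def per_query(metric: str) -> float:
    """Average amount of `metric` per query, assuming all peaks line up."""
    return PEAKS_PER_SECOND[metric] / PEAKS_PER_SECOND["queries"]


if __name__ == "__main__":
    for name in ("network_bytes_sent", "rows_read", "rows_changed", "innodb_page_io"):
        print(f"{name:>20} per query: {per_query(name):,.3f}")
```

Under that assumption the peaks work out to roughly 1.5 KB sent and about 24 rows read per query, with around one InnoDB page I/O per seven or eight queries (8.1M / 60M = 0.135).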

8. Performance focus
- Focus on reliable throughput in production
  - Avoid performance stalls
  - Make sure the hardware is fully used
- 99th percentile rather than average or median
- Worst-offender analysis: top-N lists and histograms instead of tier averages
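
Averages hide exactly the stalls this slide is concerned with. A minimal sketch of a nearest-rank 99th percentile and a top-N offender count; the latencies and host names are illustrative, not Facebook data.

```python
import math
import statistics
from collections import Counter


def percentile(samples, p):
    """Nearest-rank percentile for 0 < p <= 100 over a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def top_offenders(slow_events, n=3):
    """Return the n most frequent keys (e.g. host names) among slow-query events."""
    return Counter(slow_events).most_common(n)


# Illustrative data: 98 fast queries and 2 stalls, in milliseconds.
latencies_ms = [4] * 98 + [500, 500]
print("mean  :", statistics.mean(latencies_ms))
print("median:", statistics.median(latencies_ms))
print("p99   :", percentile(latencies_ms, 99))
# Hypothetical host names attached to slow queries.
print(top_offenders(["db12", "db07", "db12", "db31", "db12", "db07"], n=2))
```

With two 500 ms stalls in 100 queries, the median stays at 4 ms and the mean only rises to about 14 ms, while the 99th percentile exposes the 500 ms stall.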

10. Stall tools
- Dogpiled (in-house)
  - Snapshot aggregation of server state when the server is in distress
  - "Time machine" view into logs from before the event, too
- Aspersa (stalk, collect)
- Poor man's profiler (.org)
  - Later iterations: apmp, hpmp, tpmp
- GDB
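
The poor man's profiler idea is to dump every thread's backtrace from a running mysqld with gdb (for example `gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p <pid>`), collapse each thread's stack into one line, and count identical stacks: the most common stack is usually where threads are piling up. The original is a shell/awk pipeline; this is a sketch of the aggregation step in Python, and the sample backtrace and its function names are invented.

```python
import re
from collections import Counter

# Matches gdb frames such as "#1  0x0000... in func (args) at file.c:10"
# and "#0  func (args) ...", capturing the function name.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def collapse_stacks(gdb_output: str) -> Counter:
    """Collapse each thread's backtrace to 'f0,f1,...' and count identical stacks."""
    stacks = Counter()
    frames = []
    for raw in gdb_output.splitlines():
        line = raw.strip()
        if line.startswith("Thread "):
            if frames:
                stacks[",".join(frames)] += 1
            frames = []
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    if frames:
        stacks[",".join(frames)] += 1
    return stacks


# Invented backtrace; function names other than libc's are hypothetical.
SAMPLE = """\
Thread 3 (Thread 0x7f00 (LWP 103)):
#0  0x00007f00aa in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000081 in wait_for_event (event=0x1) at sync.c:400
#2  0x0000000082 in enter_storage_engine (trx=0x2) at engine.c:1200
Thread 2 (Thread 0x7f01 (LWP 102)):
#0  0x00007f00aa in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000081 in wait_for_event (event=0x3) at sync.c:400
#2  0x0000000082 in enter_storage_engine (trx=0x4) at engine.c:1200
Thread 1 (Thread 0x7f02 (LWP 101)):
#0  poll () from /lib/libc.so.6
#1  0x0000000091 in accept_loop () at server.cc:5000
"""

for stack, count in collapse_stacks(SAMPLE).most_common():
    print(count, stack)
```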

11. Stalls found
- Tables extending: global I/O mutex held
- Drop table: both SQL-layer and InnoDB global mutexes held
- Purge contention: unnecessary dictionary lock held
- Binlog reads: no commits can happen while old events are being read
- Kernel mutex: O(N) and O(N^2) operations
  - Transaction creation
  - Lock creation/removal, deadlock detection
- Background page flushing not really in the background
- Many more
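
The slide lists deadlock detection among the kernel-mutex operations that scale as O(N) or O(N^2). To show where that cost comes from, here is a generic wait-for-graph cycle check; it is an illustrative sketch, not InnoDB's actual lock-queue traversal.

```python
from typing import Dict, Hashable, Iterable


def would_deadlock(wait_for: Dict[Hashable, Iterable[Hashable]], start: Hashable) -> bool:
    """Return True if the wait-for edges reachable from `start` lead back to `start`.

    wait_for maps each transaction id to the transaction ids it is waiting on.
    The search may visit every waiting transaction, so one check is O(N + E);
    repeating it for each of N new lock waits while holding one global mutex is
    how this kind of check drifts toward O(N^2) under contention.
    """
    stack = list(wait_for.get(start, ()))
    seen = set()
    while stack:
        trx = stack.pop()
        if trx == start:
            return True
        if trx in seen:
            continue
        seen.add(trx)
        stack.extend(wait_for.get(trx, ()))
    return False


print(would_deadlock({1: {2}, 2: {3}, 3: {1}}, 1))  # cycle 1 -> 2 -> 3 -> 1
print(would_deadlock({1: {2}, 2: {3}}, 1))          # chain, no cycle
```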

12. Efficiency
- Increasing utilization of hardware
  - Memory-to-disk ratio
- Finding bottlenecks
  - Disk-bound normally
  - Sometimes network
  - Application or server software chokepoints
  - Rarely CPU/memory bandwidth
- Application design
  - Biggest wins are in optimizing the workload

13. Disk efficiency
- Normally disk-IOPS bound
- Allowing higher queue lengths
  - Can operate at more than 8 pending operations per disk
- InnoDB page size
  - Needs to be adjustable per table or index for real gain
- XFS/deadline
- Parallelism at the MySQL layer
- More than 300 IOPS on 166 rps (roughly 10,000 RPM) disks
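
The last bullet is easier to appreciate with the arithmetic spelled out. With one request outstanding, each random read pays a seek plus, on average, half a revolution, so a drive of this class typically completes fewer than one random I/O per revolution; sustaining more than 300 IOPS means about 1.8 I/Os per revolution, which needs a deep queue that the drive and I/O scheduler can reorder by position. A small sketch of that arithmetic (helper names are illustrative):

```python
def rpm(rps: float) -> float:
    """Spindle speed in revolutions per minute."""
    return rps * 60


def avg_rotational_latency_ms(rps: float) -> float:
    """Average wait for a random sector to come under the head: half a revolution."""
    return 1000 / (2 * rps)


def ios_per_revolution(iops: float, rps: float) -> float:
    """How many I/Os must complete per platter revolution to reach `iops`."""
    return iops / rps


RPS = 166
print(f"spindle speed      : {rpm(RPS):.0f} RPM")
print(f"avg rotational wait: {avg_rotational_latency_ms(RPS):.2f} ms")
print(f"I/Os per revolution at 300 IOPS: {ios_per_revolution(300, RPS):.2f}")
```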

14. Memory efficiency
- Compact records: Thrift compaction for objects, etc.
- Clustered and covering index planning
- FORCE INDEX: avoid unnecessary I/O and cached pages
- Historical data access is costly
  - Full table scans
  - ETL-type queries, mysqldump, …
- Tune midpoint insertion LRU for InnoDB
- Incremental updating, incremental binary backups
- O_DIRECT access for data and logs
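
Midpoint insertion is what keeps the historical-data scans above from flushing the hot working set out of the buffer pool. A minimal sketch of the policy: the 37% old-segment default mirrors InnoDB's `innodb_old_blocks_pct`, while the class itself and its simplifications (no `innodb_old_blocks_time`-style delay before promotion, no actual page reads) are assumptions for illustration.

```python
from collections import OrderedDict


class MidpointLRU:
    """Page cache whose LRU list is split into a 'young' and an 'old' segment.

    New pages are inserted at the midpoint (the head of the old segment) and are
    promoted to the young segment only when touched again, so a one-pass scan
    cycles through the old segment without evicting the hot working set.
    """

    def __init__(self, capacity: int, old_pct: int = 37):
        self.capacity = capacity
        self.old_target = max(1, capacity * old_pct // 100)
        self.young = OrderedDict()  # last item = most recently used
        self.old = OrderedDict()

    def access(self, page) -> bool:
        """Touch `page`; return True on a cache hit, False on a miss."""
        if page in self.young:
            self.young.move_to_end(page)
            return True
        if page in self.old:
            del self.old[page]
            self.young[page] = None  # second touch: promote to the young segment
            self._rebalance()
            return True
        self.old[page] = None  # miss: insert at the midpoint
        self._rebalance()
        return False

    def _rebalance(self) -> None:
        # Demote the coldest young pages so the old segment keeps its share.
        while len(self.young) > self.capacity - self.old_target:
            page, _ = self.young.popitem(last=False)
            self.old[page] = None
        # Evict from the tail of the old segment (or young, if old is empty).
        while len(self.young) + len(self.old) > self.capacity:
            victim = self.old if self.old else self.young
            victim.popitem(last=False)


cache = MidpointLRU(capacity=8)
hot = ["h0", "h1", "h2", "h3"]
for page in hot * 2:  # two touches each: the hot set ends up in the young segment
    cache.access(page)
for n in range(100):  # a one-pass "full table scan"
    cache.access(f"scan{n}")
print(all(cache.access(page) for page in hot))  # hot pages survived the scan
```

A plain LRU of the same size would have evicted all four hot pages after the first eight scanned pages.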

15. Pure flash (cheating)
- Data stored directly on flash
- Limited data size
- Not utilizing the flash card fully
- Still used in some cases

16. Flashcache
- Flash in front of disks
  - Can use slower disks
- Write-back cache
- Much more data storage
- Able to utilize much more of the flash card
- Very long warm-up time
- Open source (github/facebook/flashcache)
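
Flashcache itself is a Linux device-mapper target with a set-associative layout, on-flash metadata, and crash handling; none of that is modelled here. This Python sketch only illustrates the write-back policy the slide mentions: writes are absorbed by the fast tier, repeated writes to a block coalesce, and the slow disk sees a write only on eviction or flush. The class name and block-level API are hypothetical.

```python
from collections import OrderedDict


class WriteBackCache:
    """LRU block cache in front of a slower store; dirty blocks reach it only on eviction or flush."""

    def __init__(self, backing: dict, capacity: int):
        self.backing = backing          # stands in for the slow disks
        self.capacity = capacity        # stands in for the flash device size, in blocks
        self.blocks = OrderedDict()     # block id -> (data, dirty); last = most recent
        self.disk_writes = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return self.blocks[block][0]
        data = self.backing.get(block)  # cache miss: go to disk (every read, while cold)
        self._install(block, data, dirty=False)
        return data

    def write(self, block, data) -> None:
        # Acknowledged once the cache holds it; repeated writes coalesce.
        self._install(block, data, dirty=True)

    def flush(self) -> None:
        for block, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self._write_back(block, data)
                self.blocks[block] = (data, False)

    def _install(self, block, data, dirty: bool) -> None:
        self.blocks[block] = (data, dirty)
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            old_block, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:
                self._write_back(old_block, old_data)

    def _write_back(self, block, data) -> None:
        self.backing[block] = data
        self.disk_writes += 1


disk = {}
cache = WriteBackCache(disk, capacity=2)
for version in range(3):
    cache.write("a", version)   # three writes, no disk I/O yet
cache.write("b", "x")
cache.write("c", "y")           # evicts "a": one disk write carries the last version
print(disk, cache.disk_writes)
```

The long warm-up on the slide corresponds to the cold-cache path in `read`: until the cache is populated, reads still go to the slow disks.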

17. MySQL 2x
- Flash allows for large loads
  - Large performance difference from pure-disk servers
  - Many older servers still being used
- Solution? Run multiple MySQL instances per server
  - Use ports 3307, 3308, 3309, etc.
- Replication prevents direct consolidation
- Redo a lot of port assumptions in code
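
Running several instances on one host means each needs its own port, socket, and data directory. A sketch of such a layout, assuming a hypothetical `/data/mysql` root and naming scheme; the rendered `[mysqldN]` option groups follow the format the stock `mysqld_multi` script reads, though the slide does not say that script was used.

```python
from dataclasses import dataclass
from typing import List

FIRST_PORT = 3307  # the slide's numbering; 3306 is MySQL's default port


@dataclass(frozen=True)
class Instance:
    port: int
    socket: str
    datadir: str


def instance_layout(count: int, root: str = "/data/mysql") -> List[Instance]:
    """Hypothetical per-instance settings for `count` mysqld processes on one host."""
    layout = []
    for i in range(count):
        port = FIRST_PORT + i
        layout.append(Instance(port=port,
                               socket=f"{root}/{port}/mysql.sock",
                               datadir=f"{root}/{port}/data"))
    return layout


def render_option_groups(instances: List[Instance]) -> str:
    """Render one [mysqldN] option group per instance, the layout mysqld_multi reads."""
    lines = []
    for n, inst in enumerate(instances, start=1):
        lines += [f"[mysqld{n}]",
                  f"port={inst.port}",
                  f"socket={inst.socket}",
                  f"datadir={inst.datadir}",
                  ""]
    return "\n".join(lines)


print(render_option_groups(instance_layout(3)))
```

Clients then have to address a database by host and port together rather than assuming 3306, which is presumably the kind of port assumption the slide says had to be redone in code.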

18. Application caching
- Old: memcached
  - Cache invalidation stampedes, refetching the full dataset on refresh, many copies
- New: write-through caching
  - Incremental cache updates
  - Cache hierarchies for datacenter-local copies
  - Efficient operations for association sets
  - Common API for all use cases
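
The contrast here is between invalidating a cached entry (which forces a refetch of the whole set, possibly by many clients at once) and patching the cached copy in the same code path that writes the database. A minimal single-process sketch; the `id1`/`assoc_type`/`id2` naming and the dict-backed "database" are hypothetical, and a real deployment also has to handle concurrent writers and cache failures.

```python
from collections import defaultdict


class AssocCache:
    """Write-through cache for association sets, with incremental updates."""

    def __init__(self):
        self.db = defaultdict(set)  # stands in for MySQL: (id1, assoc_type) -> {id2}
        self.cache = {}             # stands in for the cache tier
        self.db_reads = 0

    def add(self, id1, assoc_type, id2) -> None:
        key = (id1, assoc_type)
        self.db[key].add(id2)            # 1. write to the database
        if key in self.cache:
            self.cache[key].add(id2)     # 2. patch the cached set in place

    def remove(self, id1, assoc_type, id2) -> None:
        key = (id1, assoc_type)
        self.db[key].discard(id2)
        if key in self.cache:
            self.cache[key].discard(id2)

    def get(self, id1, assoc_type) -> frozenset:
        key = (id1, assoc_type)
        if key not in self.cache:        # only a cold key costs a database read
            self.db_reads += 1
            self.cache[key] = set(self.db.get(key, ()))
        return frozenset(self.cache[key])


store = AssocCache()
store.add(1, "friend", 2)
print(store.get(1, "friend"), store.db_reads)   # first read fills the cache
store.add(1, "friend", 3)                       # cached set updated, not invalidated
store.remove(1, "friend", 2)
print(store.get(1, "friend"), store.db_reads)   # served from cache, no refetch
```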

19. Group commit
- Some OLTP workloads are too busy even for modern RAID cards
- High I/O pressure increases response times
- Durability compromises increase operational overhead
- Dead batteries are extremely painful otherwise
- Now in 5.1.52-fb
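
The slide states the goal but not the mechanism. One common way to get there is leader/follower batching, shown here as a generic sketch rather than the 5.1.52-fb code: the first committer that finds no flush in progress becomes the leader, takes every queued commit record, and pays for a single fsync, while threads that arrived in the meantime wait and are released together. `time.sleep` stands in for the fsync, and the class name is hypothetical.

```python
import threading
import time


class GroupCommitLog:
    """Leader/follower group commit: one fsync makes a whole batch of commits durable."""

    def __init__(self, fsync_seconds: float = 0.005):
        self._cond = threading.Condition(threading.Lock())
        self._pending = []         # commit records not yet handed to a flush
        self._next_seq = 0         # sequence number for the next commit record
        self._durable_seq = -1     # highest sequence number known to be durable
        self._flushing = False     # True while a leader is writing and syncing
        self._fsync_seconds = fsync_seconds
        self.log = []              # stands in for the on-disk log file
        self.fsync_count = 0

    def commit(self, record) -> None:
        """Append `record` and return only once it is durable."""
        with self._cond:
            seq = self._next_seq
            self._next_seq += 1
            self._pending.append(record)
            while self._durable_seq < seq:
                if self._flushing:
                    self._cond.wait()  # a leader is busy; its batch or the next one covers us
                    continue
                # Become the leader and take every record queued so far.
                self._flushing = True
                batch, self._pending = self._pending, []
                batch_end = self._next_seq - 1
                self._cond.release()  # let more commits queue during the slow fsync
                try:
                    self._write_and_sync(batch)
                finally:
                    self._cond.acquire()
                self._durable_seq = batch_end
                self._flushing = False
                self._cond.notify_all()

    def _write_and_sync(self, batch) -> None:
        self.log.extend(batch)
        time.sleep(self._fsync_seconds)  # simulated fsync latency
        self.fsync_count += 1


log = GroupCommitLog(fsync_seconds=0.01)
workers = [threading.Thread(target=log.commit, args=(i,)) for i in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(log.log), "commits,", log.fsync_count, "fsyncs")
```

Under load the number of fsyncs grows with the number of batches rather than the number of transactions, which is what relieves the I/O pressure described on the slide.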

20. Admission control
- Server resources are limited
- Per-account thread concurrency
  - Reduces the chance of O(N^2) blowup
  - max_connections no longer drives server load
- Per-application resource throttling
- Now in 5.1.52-fb
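
A sketch of per-account concurrency limits, assuming each account gets a fixed number of execution slots; the class, the limit of 2, and the account name `app1` are hypothetical, and this is not the 5.1.52-fb implementation. Connections beyond the limit still exist, but they wait instead of running, which is why `max_connections` stops being a proxy for load.

```python
import threading
import time


class AdmissionControl:
    """Cap the number of concurrently running queries per account; extra callers block."""

    def __init__(self, per_account_limit: int):
        self._limit = per_account_limit
        self._guard = threading.Lock()
        self._slots = {}  # account -> BoundedSemaphore

    def _slot(self, account) -> threading.BoundedSemaphore:
        with self._guard:
            if account not in self._slots:
                self._slots[account] = threading.BoundedSemaphore(self._limit)
            return self._slots[account]

    def run(self, account, query_fn):
        """Run `query_fn` once one of `account`'s slots is free."""
        with self._slot(account):
            return query_fn()


gate = AdmissionControl(per_account_limit=2)
state = {"running": 0, "peak": 0}
state_lock = threading.Lock()


def fake_query():
    # Stand-in for query execution that records how many run at once.
    with state_lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.01)
    with state_lock:
        state["running"] -= 1


threads = [threading.Thread(target=gate.run, args=("app1", fake_query)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrency:", state["peak"])
```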

21. Online Schema Change
- External PHP script, open source
- Utilizes triggers for change tracking
- Used on 100G+ sized tables
- Dump/reload + fast index creation
- Extendable class; may allow:
  - PK composition changes with conflict resolution
  - Indexing previously unindexed datasets
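
The trigger-based approach (triggers record every change while the data is copied, then the recorded changes are replayed onto the copy) can be modelled in memory, with dicts standing in for tables and a list standing in for the change-tracking table. This is a simplified sketch under those assumptions, not the PHP tool's logic: it ignores chunking, index builds, and the locking around cut-over.

```python
def online_schema_change(table, transform, live_writes):
    """Simplified in-memory model of a trigger-based online schema change.

    table:       primary key -> row (the original table; mutated by live writes)
    transform:   maps an old-schema row to a new-schema row
    live_writes: (op, pk, row) tuples arriving during the copy; op is "upsert" or "delete"
    Returns the new table that would be swapped in at cut-over.
    """
    new_table = {}
    deltas = []  # change log that the triggers would fill

    def apply_write(op, pk, row):
        # Application write to the original table, captured by the "trigger".
        if op == "upsert":
            table[pk] = row
        else:
            table.pop(pk, None)
        deltas.append((op, pk, row))

    queued = list(live_writes)
    # Copy phase: the real tool dumps and reloads in chunks; writes keep arriving.
    for pk in list(table):
        if queued:
            apply_write(*queued.pop(0))
        row = table.get(pk)
        if row is not None:  # skip rows deleted since the copy started
            new_table[pk] = transform(row)
    for write in queued:
        apply_write(*write)

    # Replay phase: applying every captured change in order makes the copy current.
    # (In practice the final replay and the table rename happen under a short lock.)
    for op, pk, row in deltas:
        if op == "upsert":
            new_table[pk] = transform(row)
        else:
            new_table.pop(pk, None)
    return new_table


def add_active_flag(row):
    """Hypothetical schema change: add an 'active' column."""
    return {**row, "active": True}


users = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}
writes = [("upsert", 4, {"name": "d"}), ("delete", 2, None), ("upsert", 1, {"name": "a2"})]
result = online_schema_change(users, add_active_flag, writes)
print(result == {pk: add_active_flag(row) for pk, row in users.items()})
```

Replaying every captured change, even for rows the copy already saw, is what makes the result match the original regardless of how copy and writes interleave.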

22. Tools
- Table and user statistics
- Shadows
- Slocket
- pmysql
- Replication sampling
- Client log aggregation
- Query comments
- Indigo (query monitor)

24. Future
- MySQL is never a solved problem
- Always investigating better/new solutions
  - New hardware types
  - New datacenters and topologies
  - New use cases and clients
  - New neighbors to share data with

25. Visibility
- Never assume
- Use metrics to measure
- When metrics aren't available, add them
- Full stack
  - More InnoDB info
  - More application info

26. Replication
- Lag used to be a big problem; it is still a bottleneck
- Possible solutions:
  - "Better" slave prefetch
    - Maatkit version has problems
    - Our own version is being used successfully on some tiers
    - May be possible with InnoDB cooperation
  - Continuent parallel slave
  - Oracle parallel slave in 5.6
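
Slave prefetching tries to hide the single replication SQL thread's read latency: a helper reads ahead in the relay log and issues read-only versions of upcoming statements so the pages they need are already cached when the applier reaches them. Rewriting an UPDATE or DELETE into a SELECT with the same WHERE clause is the approach usually described for such tools; the sketch below illustrates only that rewrite (not the Maatkit tool or Facebook's version), and reading the relay log is left out.

```python
import re
from typing import Optional

# Naive patterns: single-table statements only; a literal containing " WHERE "
# or a multi-table UPDATE/DELETE will not be handled correctly.
UPDATE_RE = re.compile(r"^\s*UPDATE\s+(\S+)\s+SET\s+.+?\s+WHERE\s+(.+?);?\s*$", re.I | re.S)
DELETE_RE = re.compile(r"^\s*DELETE\s+FROM\s+(\S+)\s+WHERE\s+(.+?);?\s*$", re.I | re.S)


def to_prefetch_select(statement: str) -> Optional[str]:
    """Rewrite a simple UPDATE/DELETE into a SELECT that reads the same rows."""
    for pattern in (UPDATE_RE, DELETE_RE):
        match = pattern.match(statement)
        if match:
            table, where = match.groups()
            # A plain InnoDB SELECT is a non-locking read, but it still pulls the
            # index and row pages into the buffer pool.
            return f"SELECT * FROM {table} WHERE {where}"
    return None


for stmt in ("UPDATE users SET name = 'x' WHERE id = 42;",
             "DELETE FROM sessions WHERE expires < 100",
             "INSERT INTO t VALUES (1)"):
    print(to_prefetch_select(stmt))
```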

27. InnoDB Compression
- Was originally planned for the 5.1 upgrade
- Problems
  - Replication stream cost
  - Increased log writes
  - Performance in some cases
  - Stability, monitoring, etc.
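
InnoDB table compression stores each 16 KB page zlib-compressed in a smaller block sized by `KEY_BLOCK_SIZE`; when a modified page no longer fits in that block, InnoDB has to split the page, a compression failure that costs extra CPU and page splits. A simplified sketch of the fit test; the helper name and sample pages are hypothetical, and the real threshold is tighter because compressed pages also reserve space for metadata such as a modification log.

```python
import os
import zlib

PAGE_SIZE = 16 * 1024  # default uncompressed InnoDB page size


def fits_in_block(page: bytes, key_block_size: int = 8 * 1024, level: int = 6) -> bool:
    """Return True if the zlib-compressed page fits in `key_block_size` bytes.

    Simplification: ignores the per-page metadata InnoDB also keeps in the block.
    """
    return len(zlib.compress(page, level)) <= key_block_size


repetitive_page = (b"user_id=12345;status=active;" * 600)[:PAGE_SIZE]
random_page = os.urandom(PAGE_SIZE)  # incompressible, like already-compressed blobs
print(fits_in_block(repetitive_page), fits_in_block(random_page))
```

Row data that is already compressed or random-looking behaves like `random_page`, so such tables gain little from compression and risk frequent compression failures.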