SlideShare uma empresa Scribd logo
1 de 31
Baixar para ler offline
1
Scaling to Millions of
Simultaneous Connections
Rick Reed
WhatsApp
Erlang Factory SF
March 30, 2012
2
About ...
Joined WhatsApp in 2011
New to Erlang
Background in performance of C-based
systems on FreeBSD and Linux
Prior work at Yahoo!, SGI
3
Overview
The “good problem to have”
Performance Goals
Tools and Techniques
Results
General Findings
Specific Scalability Fixes
4
The Problem
A good problem, but a problem nonetheless
Growth, Earthquakes, and Soccer!
Msg rates for past four weeks
Mexican earthquake
goals
HT FT
5
The Problem
Initial server loading: ~200k connections
Discouraging prognosis for growth
Cluster brittle in the face of failures/overloads
6
Performance Goals
1 Million connections per server … !
Resilience against disruptions under load
Software failures
Hardware failures (servers, network gear)
World events (sports, earthquakes, etc.)
7
Performance Goals
Our standard configuration
Dual Westmere Hex-core (24 logical CPUs)
100GB RAM, SSD
Dual NIC (user-facing, back-end/distribution)
FreeBSD 8.3
OTP R14B03
8
Tools and Techniques
System activity monitoring (wsar)
OS-level
BEAM
9
Tools and Techniques
Processor hardware perf counters (pmcstat)
dtrace, kernel lock-counting, gprof
10
Tools and Techniques
fprof (w/ and w/o cpu_timestamp)
BEAM lock-counting (invaluable!!!)
11
Tools and Techniques
Synthetic workload
Good for subsystems with simple interfaces
Limited value for user-facing systems
12
Tools and Techniques
Tee'd workload
Where side-effects can be contained
Extremely useful for tuning
13
Tools and Techniques
Diverted workload
Add additional production load to server
DNS via extra IP aliases
TTL issues
IPFW forwarding
Ran into a few kernel panics at high conn counts
14
Results
Initial bottlenecks appeared around 425k
First round of fixes got us to 1M conns
Fruit was hanging pretty low
15
Results
Continued attacking similar bottlenecks
Achieved 2M conns about a month later
Put further optimizations on back burner
16
Results
Began optimizing app code after New Years
Unintentional record attempt in Feb
Peaked at 2.8M conns before we intervened
571k pkts/sec, >200k dist msgs/sec
17
Results
Still trying to obtain elusive 3M conns
St. Patrick's Day wasn't as lucky as hoped
18
General Findings
Erlang has awesome SMP scalability
>85% cpu utilization across 24 logical cpus
FreeBSD shines as well
19
General Findings
CPU% vs. # Conns
20
General Findings
Contention, contention, contention
From 200k to 2M were all contention fixes
Some issues are internal to BEAM
Some addressable with app changes
Most required BEAM patches
Some required app changes
Especially: partitioning workload correctly
Some common Erlang idioms come at a price
21
Specific Scalability Fixes
FreeBSD
Backported TSC-based kernel timecounter
gettimeofday(2) calls much less expensive
Backported igb network driver
Had issues with MSI-X queue stalls
sysctl tuning
Obvious limits (e.g., kern.ipc.maxsockets)
net.inet.tcp.tcphashsize=524288
22
Specific Scalability Fixes
BEAM metrics
Scheduler (%util, csw, waits, sleeps, …)
statistics(message_queues)
Msgs queued, #non-empty queues, longest queue
process_info(message_queue_stats)
Enq/deq/send count & rates (1s, 10s, 100s)
statistics(message_counts)
Aggregation of message_queue_stats
Enable fprof cpu_timestamp for FreeBSD
23
Specific Scalability Fixes
BEAM metrics (cont.)
Make lock-counting work for larger async
thread counts (e.g., +A 1024)
Add suspend, location, and port_locks options
to erts_debug:lock_counters
Enable/disable process/port lock counting at
runtime
Fix missing accounting for outbound dist bytes
24
Specific Scalability Fixes
BEAM tuning
+swt low
Avoid scheduler perma-sleep
+Mummc/mmmbc/mmsbc 99999
Prefer mseg over malloc
+Mut 24
Want allocator instance per scheduler
25
Specific Scalability Fixes
BEAM tuning
+Mulmbcs 32767 +Mumbcgs 1
+Musmbcs 2047
Want large 2M-aligned mseg allocations to
maximize superpage promotions
Run with real-time scheduling priority
+ssct 1 (via patch; scheduler spin count)
26
Specific Scalability Fixes
BEAM contention
timeofday lock (esp., timeofday delivery)
Reduced slot traversals on timer wheel
Widened bif timer hash table
Ended up moving bif timers to receive timeouts
Improved check_io allocation scalability
Added prim_file:write_file/3 & /4 (port reuse)
Disable mseg max check
27
Specific Scalability Fixes
BEAM contention (cont.)
Reduce setopts calls in prim_inet:accept
and in inet:tcp_controlling_process
28
Specific Scalability Fixes
OTP throughput
Add gc throttling when message queue is long
Increase default dist receive buffer from 4k to
256k (and make configurable)
Patch mnesia_tm to dispatch async_dirty txns
to separate per-table procs for concurrency
Add pg2 denormalized group member lists to
improve lookup throughput
Increase max configurable mseg cache size
29
Specific Scalability Fixes
Erlang usage
Prefer os:timestamp to erlang:now
Implement cross-node gen_server calls without
using monitors (reduces dist traffic and proc
link lock contention)
Partition ets and mnesia tables and localize
access to smaller number of processes
Small mnesia clusters
30
Specific Scalability Fixes
Operability fixes
Added [prepend] option to erlang:send
Added process_flag(flush_message_queue)
31
Questions? Comments?
rr@whatsapp.com

Mais conteúdo relacionado

Mais procurados

Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Jignesh Shah
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
Brendan Gregg
 
Linux internal
Linux internalLinux internal
Linux internal
mcganesh
 

Mais procurados (20)

[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
 
Linux cgroups and namespaces
Linux cgroups and namespacesLinux cgroups and namespaces
Linux cgroups and namespaces
 
Container Performance Analysis
Container Performance AnalysisContainer Performance Analysis
Container Performance Analysis
 
Presentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel ProgrammingPresentation on Shared Memory Parallel Programming
Presentation on Shared Memory Parallel Programming
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Iptables in linux
Iptables in linuxIptables in linux
Iptables in linux
 
gRPC Design and Implementation
gRPC Design and ImplementationgRPC Design and Implementation
gRPC Design and Implementation
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
 
Linux Interrupts
Linux InterruptsLinux Interrupts
Linux Interrupts
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/Core
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF SuperpowersLinux 4.x Tracing Tools: Using BPF Superpowers
Linux 4.x Tracing Tools: Using BPF Superpowers
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 Instances
 
Os Linux
Os LinuxOs Linux
Os Linux
 
Linux internal
Linux internalLinux internal
Linux internal
 
Dhcp server configuration
Dhcp server configurationDhcp server configuration
Dhcp server configuration
 

Destaque

Hike 29 Presentation
Hike 29 PresentationHike 29 Presentation
Hike 29 Presentation
52 Hikes
 
Scaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M UsersScaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M Users
MongoDB
 

Destaque (13)

Realtime communication in mobile
Realtime communication in mobileRealtime communication in mobile
Realtime communication in mobile
 
Erlang, the big switch in social games
Erlang, the big switch in social gamesErlang, the big switch in social games
Erlang, the big switch in social games
 
MQTT with Java - a protocol for IoT and M2M communication
MQTT with Java - a protocol for IoT and M2M communicationMQTT with Java - a protocol for IoT and M2M communication
MQTT with Java - a protocol for IoT and M2M communication
 
Getting started with MQTT - Virtual IoT Meetup presentation
Getting started with MQTT - Virtual IoT Meetup presentationGetting started with MQTT - Virtual IoT Meetup presentation
Getting started with MQTT - Virtual IoT Meetup presentation
 
Hike 29 Presentation
Hike 29 PresentationHike 29 Presentation
Hike 29 Presentation
 
Whatsapp Technical
Whatsapp Technical Whatsapp Technical
Whatsapp Technical
 
WhatsApp architecture
WhatsApp architectureWhatsApp architecture
WhatsApp architecture
 
Whatsapp's Architecture
Whatsapp's ArchitectureWhatsapp's Architecture
Whatsapp's Architecture
 
Scaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M UsersScaling Hike Messenger to 15M Users
Scaling Hike Messenger to 15M Users
 
Whatsapp
WhatsappWhatsapp
Whatsapp
 
Whatsapp project work
Whatsapp project workWhatsapp project work
Whatsapp project work
 
10 Amazing facts about WhatsApp
10 Amazing facts about WhatsApp10 Amazing facts about WhatsApp
10 Amazing facts about WhatsApp
 
whatsapp ppt
whatsapp pptwhatsapp ppt
whatsapp ppt
 

Semelhante a Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp

Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
Slide_N
 
Was liberty at scale
Was liberty at scaleWas liberty at scale
Was liberty at scale
sflynn073
 

Semelhante a Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp (20)

Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
Q2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP SchedulingQ2.12: Research Update on big.LITTLE MP Scheduling
Q2.12: Research Update on big.LITTLE MP Scheduling
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slides
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Handout3o
Handout3oHandout3o
Handout3o
 
PraveenBOUT++
PraveenBOUT++PraveenBOUT++
PraveenBOUT++
 
How We Test MongoDB: Evergreen
How We Test MongoDB: EvergreenHow We Test MongoDB: Evergreen
How We Test MongoDB: Evergreen
 
Gatehouse software genanvendelse
Gatehouse software genanvendelseGatehouse software genanvendelse
Gatehouse software genanvendelse
 
Data race
Data raceData race
Data race
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
 
Metrics towards enterprise readiness of unikernels
Metrics towards enterprise readiness of unikernelsMetrics towards enterprise readiness of unikernels
Metrics towards enterprise readiness of unikernels
 
State Zero: Middle Tennessee Electric Membership Corporation
State Zero: Middle Tennessee Electric Membership CorporationState Zero: Middle Tennessee Electric Membership Corporation
State Zero: Middle Tennessee Electric Membership Corporation
 
MTEMC’s State 0 Changes with 1700+ Versions Intact
MTEMC’s State 0 Changes with 1700+ Versions IntactMTEMC’s State 0 Changes with 1700+ Versions Intact
MTEMC’s State 0 Changes with 1700+ Versions Intact
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Top Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineTop Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your Pipeline
 
Cell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology GroupCell Today and Tomorrow - IBM Systems and Technology Group
Cell Today and Tomorrow - IBM Systems and Technology Group
 
Master's Thesis - climateprediction.net: A Cloudy Approach
Master's Thesis - climateprediction.net: A Cloudy ApproachMaster's Thesis - climateprediction.net: A Cloudy Approach
Master's Thesis - climateprediction.net: A Cloudy Approach
 
Hptf 2240 Final
Hptf 2240 FinalHptf 2240 Final
Hptf 2240 Final
 
Was liberty at scale
Was liberty at scaleWas liberty at scale
Was liberty at scale
 
Case Study with Answers.com on Scaling with Memcached and MySQL
Case Study with Answers.com on Scaling with Memcached and MySQLCase Study with Answers.com on Scaling with Memcached and MySQL
Case Study with Answers.com on Scaling with Memcached and MySQL
 

Mais de mustafa sarac

Mais de mustafa sarac (20)

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma son
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
Latka december digital
Latka december digitalLatka december digital
Latka december digital
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manual
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
 
Math for programmers
Math for programmersMath for programmers
Math for programmers
 
The book of Why
The book of WhyThe book of Why
The book of Why
 
BM sgk meslek kodu
BM sgk meslek koduBM sgk meslek kodu
BM sgk meslek kodu
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimiz
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mi
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Markets
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimi
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tsh
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guide
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dice
 

Último

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
Health
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
HenryBriggs2
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic Marks
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Bridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptxBridge Jacking Design Sample Calculation.pptx
Bridge Jacking Design Sample Calculation.pptx
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 

Scaling to Millions of Simultaneous Connections by Rick Reed from WhatsApp

  • 1. 1 Scaling to Millions of Simultaneous Connections Rick Reed WhatsApp Erlang Factory SF March 30, 2012
  • 2. 2 About ... Joined WhatsApp in 2011 New to Erlang Background in performance of C-based systems on FreeBSD and Linux Prior work at Yahoo!, SGI
  • 3. 3 Overview The “good problem to have” Performance Goals Tools and Techniques Results General Findings Specific Scalability Fixes
  • 4. 4 The Problem A good problem, but a problem nonetheless Growth, Earthquakes, and Soccer! Msg rates for past four weeks Mexican earthquake goals HT FT
  • 5. 5 The Problem Initial server loading: ~200k connections Discouraging prognosis for growth Cluster brittle in the face of failures/overloads
  • 6. 6 Performance Goals 1 Million connections per server … ! Resilience against disruptions under load Software failures Hardware failures (servers, network gear) World events (sports, earthquakes, etc.)
  • 7. 7 Performance Goals Our standard configuration Dual Westmere Hex-core (24 logical CPUs) 100GB RAM, SSD Dual NIC (user-facing, back-end/distribution) FreeBSD 8.3 OTP R14B03
  • 8. 8 Tools and Techniques System activity monitoring (wsar) OS-level BEAM
  • 9. 9 Tools and Techniques Processor hardware perf counters (pmcstat) dtrace, kernel lock-counting, gprof
  • 10. 10 Tools and Techniques fprof (w/ and w/o cpu_timestamp) BEAM lock-counting (invaluable!!!)
  • 11. 11 Tools and Techniques Synthetic workload Good for subsystems with simple interfaces Limited value for user-facing systems
  • 12. 12 Tools and Techniques Tee'd workload Where side-effects can be contained Extremely useful for tuning
  • 13. 13 Tools and Techniques Diverted workload Add additional production load to server DNS via extra IP aliases TTL issues IPFW forwarding Ran into a few kernel panics at high conn counts
  • 14. 14 Results Initial bottlenecks appeared around 425k First round of fixes got us to 1M conns Fruit was hanging pretty low
  • 15. 15 Results Continued attacking similar bottlenecks Achieved 2M conns about a month later Put further optimizations on back burner
  • 16. 16 Results Began optimizing app code after New Years Unintentional record attempt in Feb Peaked at 2.8M conns before we intervened 571k pkts/sec, >200k dist msgs/sec
  • 17. 17 Results Still trying to obtain elusive 3M conns St. Patrick's Day wasn't as lucky as hoped
  • 18. 18 General Findings Erlang has awesome SMP scalability >85% cpu utilization across 24 logical cpus FreeBSD shines as well
  • 20. 20 General Findings Contention, contention, contention From 200k to 2M were all contention fixes Some issues are internal to BEAM Some addressable with app changes Most required BEAM patches Some required app changes Especially: partitioning workload correctly Some common Erlang idioms come at a price
  • 21. 21 Specific Scalability Fixes FreeBSD Backported TSC-based kernel timecounter gettimeofday(2) calls much less expensive Backported igb network driver Had issues with MSI-X queue stalls sysctl tuning Obvious limits (e.g., kern.ipc.maxsockets) net.inet.tcp.tcphashsize=524288
  • 22. 22 Specific Scalability Fixes BEAM metrics Scheduler (%util, csw, waits, sleeps, …) statistics(message_queues) Msgs queued, #non-empty queues, longest queue process_info(message_queue_stats) Enq/deq/send count & rates (1s, 10s, 100s) statistics(message_counts) Aggregation of message_queue_stats Enable fprof cpu_timestamp for FreeBSD
  • 23. 23 Specific Scalability Fixes BEAM metrics (cont.) Make lock-counting work for larger async thread counts (e.g., +A 1024) Add suspend, location, and port_locks options to erts_debug:lock_counters Enable/disable process/port lock counting at runtime Fix missing accounting for outbound dist bytes
  • 24. 24 Specific Scalability Fixes BEAM tuning +swt low Avoid scheduler perma-sleep +Mummc/mmmbc/mmsbc 99999 Prefer mseg over malloc +Mut 24 Want allocator instance per scheduler
  • 25. 25 Specific Scalability Fixes BEAM tuning +Mulmbcs 32767 +Mumbcgs 1 +Musmbcs 2047 Want large 2M-aligned mseg allocations to maximize superpage promotions Run with real-time scheduling priority +ssct 1 (via patch; scheduler spin count)
  • 26. 26 Specific Scalability Fixes BEAM contention timeofday lock (esp., timeofday delivery) Reduced slot traversals on timer wheel Widened bif timer hash table Ended up moving bif timers to receive timeouts Improved check_io allocation scalability Added prim_file:write_file/3 & /4 (port reuse) Disable mseg max check
  • 27. 27 Specific Scalability Fixes BEAM contention (cont.) Reduce setopts calls in prim_inet:accept and in inet:tcp_controlling_process
  • 28. 28 Specific Scalability Fixes OTP throughput Add gc throttling when message queue is long Increase default dist receive buffer from 4k to 256k (and make configurable) Patch mnesia_tm to dispatch async_dirty txns to separate per-table procs for concurrency Add pg2 denormalized group member lists to improve lookup throughput Increase max configurable mseg cache size
  • 29. 29 Specific Scalability Fixes Erlang usage Prefer os:timestamp to erlang:now Implement cross-node gen_server calls without using monitors (reduces dist traffic and proc link lock contention) Partition ets and mnesia tables and localize access to smaller number of processes Small mnesia clusters
  • 30. 30 Specific Scalability Fixes Operability fixes Added [prepend] option to erlang:send Added process_flag(flush_message_queue)