SlideShare uma empresa Scribd logo
1 de 38
Baixar para ler offline
Brought to you by
Extreme HTTP Performance Tuning:
1.2M API req/s on a 4 vCPU EC2 Instance
Marc Richards
Chief Problem Solver at
Marc Richards
Chief Problem Solver at Talawah Solutions
Talawah Solutions
■ Based in Kingston Jamaica
■ Cloud Computing Consultant for almost a decade
■ Solutions Architect / DevOps Engineer / Performance Engineer
■ No low-level systems performance tuning experience before
this project!
Demystifying Systems Performance Tuning
■ You don't need to be a kernel developer or a wizard sysadmin.
■ FlameGraph and bpftrace have changed the game.
■ New ebpf based tools coming out will only make things easier!
Overview
■ I accidentally fell down this optimization rabbit hole.
■ Started with a simple, high-performance API server written in C.
■ Used FlameGraph and bpftrace to analyze and optimize the entire stack.
Overview
■ Cloud: AWS
■ Hardware: 4 vCPU c5n.xlarge** (server) / 16 vCPU c5n.4xlarge (client)
■ Benchmark: Techempower JSON Serialization test
■ Server: Techempower libreactor implementation
** In order to minimize inconsistencies at the platform level I did the ïŹnal benchmark run on a c5n.9xlarge that was
restricted to 4 vCPUS using the EC2 CPU Options feature.
Blog post with even more details
https://talawah.io/blog/extreme-http-performance-tuning-one
-point-two-million/
Optimizations
Optimization Gain Req/s
Ground Zero - 224k
Application Optimizations 55% 347k
Disabling Speculative Execution Mitigations 28% 446k
Disabling Syscall Auditing / Blocking 11% 495k
Disabling iptables / netïŹlter 22% 603k
Perfect Locality 38% 834k
Interrupt Optimizations 28% 1.06M
The Case of the Nosy Neighbor 6% 1.12M
The Battle Against the Spin Lock 2% 1.15M
This Goes to Twelve 4% 1.20M
Optimizations
Optimization Gain Req/s
Ground Zero - 224k
Application Optimizations 55% 347k
Disabling Speculative Execution Mitigations 28% 446k
Disabling Syscall Auditing / Blocking 11% 495k
Disabling iptables / netïŹlter 22% 603k
Perfect Locality 38% 834k
Interrupt Optimizations 28% 1.06M
The Case of the Nosy Neighbor 6% 1.12M
The Battle Against the Spin Lock 2% 1.15M
This Goes to Twelve 4% 1.20M
Ground Zero
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 1.14ms
90.00% 1.21ms
99.00% 1.26ms
99.99% 1.32ms
2243551 requests in 10.00s, 331.64MB read
Requests/sec: 224,353.73
* I modiïŹed nginx.conf to send back a hardcoded JSON response. This is not a part of the Techempower implementation.
Ground Zero
Application Optimizations
Application Optimizations
Application Optimizations
■ Run on all logical cores/vCPUs: ~25%
■ gcc -O3 and march=native: ~15%
■ send/recv instead of write/read: ~5%
■ Remove pthread overhead: ~3%
Application Optimizations
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 723.00us
90.00% 0.88ms
99.00% 0.94ms
99.99% 1.08ms
3470892 requests in 10.00s, 483.27MB read
Requests/sec: 347,087.15
Application Optimizations
Before After
Disabling...
Speculative Execution Mitigations
Syscall Auditing / Blocking
iptables/netïŹlter
Disabling...
■ Speculative Execution Mitigations: 28%
● nospectre_v1 nospectre_v2 pti=off mds=off tsx_async_abort=off
■ Syscall Auditing/Blocking: 11%
● auditctl -a never,task
● docker run -d --security-opt seccomp=unconïŹned libreactor
■ iptables/netïŹlter: 22%
● modprobe -rv ip_tables
● ExecStart=/usr/bin/dockerd ---bridge=none --iptables=false --ip-forward=false
Disabling...
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 419.00us
90.00% 479.00us
99.00% 517.00us
99.99% 575.00us
6031161 requests in 10.00s, 839.76MB read
Requests/sec: 603,112.18
Disabling...
Before After
Perfect Locality
+
Interrupt Optimizations
Perfect Locality + Interrupt Optimizations
■ Perfect Locality
● Pin processes to CPUs
● Pin network queues to CPUs (RSS + XPS)
● SO_REUSEPORT + SO_ATTACH_REUSEPORT_CBPF
■ Interrupt Moderation
● ethtool -C eth0 adaptive-rx on
■ Busy polling
● net.core.busy_poll=1
■ Perfect Locality + Interrupt Moderation + Busy Polling = 💯
Perfect Locality + Interrupt Optimizations
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 233.00us
90.00% 263.00us
99.00% 292.00us
99.99% 348.00us
10660410 requests in 10.00s, 1.45GB read
Requests/sec: 1,066,034.60
Perfect Locality + Interrupt Optimizations
Before After
The Case of the Nosy Neighbor
+
The Battle Against the Spin Lock
The Case of the Nosy Neighbor
Someone, somewhere was spying on all my packets (kinda)
■ dev_queue_xmit_nit() -> packet_rcv()
■ packet_rcv() implicates AF_PACKET
■ sudo ss --packet --processes -> (("dhclient",pid=3191,fd=5))
■ My (extreme) solution was to disable dhclient after boot
The Case of the Nosy Neighbor
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 218.00us
90.00% 254.00us
99.00% 285.00us
99.99% 341.00us
11279049 requests in 10.00s, 1.53GB read
Requests/sec: 1,127,894.86
The Case of the Nosy Neighbor
Before After
The Battle Against the Spin Lock
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 212.00us
90.00% 246.00us
99.00% 276.00us
99.99% 338.00us
11551707 requests in 10.00s, 1.57GB read
Requests/sec: 1,155,162.15
The Battle Against the Spin Lock
Before After
This Goes to Twelve
This Goes to Twelve
■ Disabling Generic Receive OïŹ„oad (GRO)
■ TCP Congestion Control: cubic -> reno
■ Static Interrupt Moderation
This Goes to Twelve
Running 10s test @ http://server.tfb:8080/json
16 threads and 256 connections
Latency Distribution
50.00% 203.00us
90.00% 236.00us
99.00% 265.00us
99.99% 317.00us
12031718 requests in 10.00s, 1.64GB read
Requests/sec: 1,203,164.22
Conclusion
436% increase in requests per second. 79% reduction in p99 latency.
■ Throughput: 224k req/s -> 1.2M req/s
■ p99 latency: 1.26ms -> 265.00us
■ p99.99 latency: 1.32ms -> 317.00us
All 11 implementations on a c5n.xlarge using the stock Amazon Linux 2 AMI without any OS/Networking optimizations
All 11 implementations on a c5n.xlarge with all OS/Networking optimizations applied
Next Steps
■ Next gen kernel: 5.10 LTS
■ Next gen technologies: io_uring
■ Next gen instances: ARM vs Intel vs AMD
■ Driving performance from the bottom-up using Rust, Java, etc
Brought to you by
Marc Richards
https://talawah.io/contact
@talawahtech

Mais conteĂșdo relacionado

Mais procurados

Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixBrendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
Service Function Chaining in Openstack Neutron
Service Function Chaining in Openstack NeutronService Function Chaining in Openstack Neutron
Service Function Chaining in Openstack NeutronMichelle Holley
 
Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBrendan Gregg
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 InstancesBrendan Gregg
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
BPF / XDP 8월 ì„žëŻžë‚˜ KossLab
BPF / XDP 8월 ì„žëŻžë‚˜ KossLabBPF / XDP 8월 ì„žëŻžë‚˜ KossLab
BPF / XDP 8월 ì„žëŻžë‚˜ KossLabTaeung Song
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelThomas Graf
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageChoosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageJelastic Multi-Cloud PaaS
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouseAltinity Ltd
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDBSage Weil
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveMarcelo Altmann
 
OpenStack DevStack Configuration localrc local.conf Tutorial
OpenStack DevStack Configuration localrc local.conf TutorialOpenStack DevStack Configuration localrc local.conf Tutorial
OpenStack DevStack Configuration localrc local.conf TutorialSaju Madhavan
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network StackAdrien Mahieux
 

Mais procurados (20)

Kernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at NetflixKernel Recipes 2017: Using Linux perf at Netflix
Kernel Recipes 2017: Using Linux perf at Netflix
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
Service Function Chaining in Openstack Neutron
Service Function Chaining in Openstack NeutronService Function Chaining in Openstack Neutron
Service Function Chaining in Openstack Neutron
 
Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame Graphs
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - Preview
 
Performance Tuning EC2 Instances
Performance Tuning EC2 InstancesPerformance Tuning EC2 Instances
Performance Tuning EC2 Instances
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
BPF / XDP 8월 ì„žëŻžë‚˜ KossLab
BPF / XDP 8월 ì„žëŻžë‚˜ KossLabBPF / XDP 8월 ì„žëŻžë‚˜ KossLab
BPF / XDP 8월 ì„žëŻžë‚˜ KossLab
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageChoosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouse
 
Scale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 servicesScale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 services
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer Perspective
 
OpenStack DevStack Configuration localrc local.conf Tutorial
OpenStack DevStack Configuration localrc local.conf TutorialOpenStack DevStack Configuration localrc local.conf Tutorial
OpenStack DevStack Configuration localrc local.conf Tutorial
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 

Semelhante a Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance BenchmarkingSantanu Dey
 
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingAnalyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingScyllaDB
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchYutaka Yasuda
 
(NET404) Making Every Packet Count
(NET404) Making Every Packet Count(NET404) Making Every Packet Count
(NET404) Making Every Packet CountAmazon Web Services
 
AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)Amazon Web Services
 
Web Server Deathmatch 2009 Erlang Factory Joe Williams
Web Server Deathmatch 2009 Erlang Factory Joe WilliamsWeb Server Deathmatch 2009 Erlang Factory Joe Williams
Web Server Deathmatch 2009 Erlang Factory Joe Williamslogicalstack
 
Mininet: Moving Forward
Mininet: Moving ForwardMininet: Moving Forward
Mininet: Moving ForwardON.Lab
 
Crushing Latency with Vert.x
Crushing Latency with Vert.xCrushing Latency with Vert.x
Crushing Latency with Vert.xPaulo Lopes
 
Handy Networking Tools and How to Use Them
Handy Networking Tools and How to Use ThemHandy Networking Tools and How to Use Them
Handy Networking Tools and How to Use ThemSneha Inguva
 
Set Up & Operate Tungsten Replicator
Set Up & Operate Tungsten ReplicatorSet Up & Operate Tungsten Replicator
Set Up & Operate Tungsten ReplicatorContinuent
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Ontico
 
Setup & Operate Tungsten Replicator
Setup & Operate Tungsten ReplicatorSetup & Operate Tungsten Replicator
Setup & Operate Tungsten ReplicatorContinuent
 
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecasesLF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecasesLF_OpenvSwitch
 
Introduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsIntroduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsPerrin Harkins
 
Training Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLITraining Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLIContinuent
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linuxPavel Klimiankou
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersSadique Puthen
 

Semelhante a Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance (20)

Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with TracingAnalyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
 
(NET404) Making Every Packet Count
(NET404) Making Every Packet Count(NET404) Making Every Packet Count
(NET404) Making Every Packet Count
 
AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)AWS re:Invent 2016: Making Every Packet Count (NET404)
AWS re:Invent 2016: Making Every Packet Count (NET404)
 
Web Server Deathmatch 2009 Erlang Factory Joe Williams
Web Server Deathmatch 2009 Erlang Factory Joe WilliamsWeb Server Deathmatch 2009 Erlang Factory Joe Williams
Web Server Deathmatch 2009 Erlang Factory Joe Williams
 
Mininet: Moving Forward
Mininet: Moving ForwardMininet: Moving Forward
Mininet: Moving Forward
 
Crushing Latency with Vert.x
Crushing Latency with Vert.xCrushing Latency with Vert.x
Crushing Latency with Vert.x
 
Handy Networking Tools and How to Use Them
Handy Networking Tools and How to Use ThemHandy Networking Tools and How to Use Them
Handy Networking Tools and How to Use Them
 
Set Up & Operate Tungsten Replicator
Set Up & Operate Tungsten ReplicatorSet Up & Operate Tungsten Replicator
Set Up & Operate Tungsten Replicator
 
Haproxy - zastosowania
Haproxy - zastosowaniaHaproxy - zastosowania
Haproxy - zastosowania
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
 
Setup & Operate Tungsten Replicator
Setup & Operate Tungsten ReplicatorSetup & Operate Tungsten Replicator
Setup & Operate Tungsten Replicator
 
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecasesLF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
LF_OVS_17_OVS/OVS-DPDK connection tracking for Mobile usecases
 
Introduction to performance tuning perl web applications
Introduction to performance tuning perl web applicationsIntroduction to performance tuning perl web applications
Introduction to performance tuning perl web applications
 
Training Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLITraining Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLI
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
Troubleshooting .net core on linux
Troubleshooting .net core on linuxTroubleshooting .net core on linux
Troubleshooting .net core on linux
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 

Mais de ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

Mais de ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Último

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...gurkirankumar98700
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Último (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍾 8923113531 🎰 Avail...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

  • 1. Brought to you by Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance Marc Richards Chief Problem Solver at
  • 2. Marc Richards Chief Problem Solver at Talawah Solutions Talawah Solutions ■ Based in Kingston Jamaica ■ Cloud Computing Consultant for almost a decade ■ Solutions Architect / DevOps Engineer / Performance Engineer ■ No low-level systems performance tuning experience before this project!
  • 3. Demystifying Systems Performance Tuning ■ You don't need to be a kernel developer or a wizard sysadmin. ■ FlameGraph and bpftrace have changed the game. ■ New ebpf based tools coming out will only make things easier!
  • 4. Overview ■ I accidentally fell down this optimization rabbit hole. ■ Started with a simple, high-performance API server written in C. ■ Used FlameGraph and bpftrace to analyze and optimize the entire stack.
  • 5. Overview ■ Cloud: AWS ■ Hardware: 4 vCPU c5n.xlarge** (server) / 16 vCPU c5n.4xlarge (client) ■ Benchmark: Techempower JSON Serialization test ■ Server: Techempower libreactor implementation ** In order to minimize inconsistencies at the platform level I did the ïŹnal benchmark run on a c5n.9xlarge that was restricted to 4 vCPUS using the EC2 CPU Options feature.
  • 6. Blog post with even more details https://talawah.io/blog/extreme-http-performance-tuning-one -point-two-million/
  • 7. Optimizations Optimization Gain Req/s Ground Zero - 224k Application Optimizations 55% 347k Disabling Speculative Execution Mitigations 28% 446k Disabling Syscall Auditing / Blocking 11% 495k Disabling iptables / netïŹlter 22% 603k Perfect Locality 38% 834k Interrupt Optimizations 28% 1.06M The Case of the Nosy Neighbor 6% 1.12M The Battle Against the Spin Lock 2% 1.15M This Goes to Twelve 4% 1.20M
  • 8. Optimizations Optimization Gain Req/s Ground Zero - 224k Application Optimizations 55% 347k Disabling Speculative Execution Mitigations 28% 446k Disabling Syscall Auditing / Blocking 11% 495k Disabling iptables / netïŹlter 22% 603k Perfect Locality 38% 834k Interrupt Optimizations 28% 1.06M The Case of the Nosy Neighbor 6% 1.12M The Battle Against the Spin Lock 2% 1.15M This Goes to Twelve 4% 1.20M
  • 9. Ground Zero Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 1.14ms 90.00% 1.21ms 99.00% 1.26ms 99.99% 1.32ms 2243551 requests in 10.00s, 331.64MB read Requests/sec: 224,353.73
  • 10. * I modiïŹed nginx.conf to send back a hardcoded JSON response. This is not a part of the Techempower implementation.
  • 14. Application Optimizations ■ Run on all logical cores/vCPUs: ~25% ■ gcc -O3 and march=native: ~15% ■ send/recv instead of write/read: ~5% ■ Remove pthread overhead: ~3%
  • 15. Application Optimizations Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 723.00us 90.00% 0.88ms 99.00% 0.94ms 99.99% 1.08ms 3470892 requests in 10.00s, 483.27MB read Requests/sec: 347,087.15
  • 17. Disabling... Speculative Execution Mitigations Syscall Auditing / Blocking iptables/netïŹlter
  • 18. Disabling... ■ Speculative Execution Mitigations: 28% ● nospectre_v1 nospectre_v2 pti=off mds=off tsx_async_abort=off ■ Syscall Auditing/Blocking: 11% ● auditctl -a never,task ● docker run -d --security-opt seccomp=unconïŹned libreactor ■ iptables/netïŹlter: 22% ● modprobe -rv ip_tables ● ExecStart=/usr/bin/dockerd ---bridge=none --iptables=false --ip-forward=false
  • 19. Disabling... Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 419.00us 90.00% 479.00us 99.00% 517.00us 99.99% 575.00us 6031161 requests in 10.00s, 839.76MB read Requests/sec: 603,112.18
  • 22. Perfect Locality + Interrupt Optimizations ■ Perfect Locality ● Pin processes to CPUs ● Pin network queues to CPUs (RSS + XPS) ● SO_REUSEPORT + SO_ATTACH_REUSEPORT_CBPF ■ Interrupt Moderation ● ethtool -C eth0 adaptive-rx on ■ Busy polling ● net.core.busy_poll=1 ■ Perfect Locality + Interrupt Moderation + Busy Polling = 💯
  • 23. Perfect Locality + Interrupt Optimizations Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 233.00us 90.00% 263.00us 99.00% 292.00us 99.99% 348.00us 10660410 requests in 10.00s, 1.45GB read Requests/sec: 1,066,034.60
  • 24. Perfect Locality + Interrupt Optimizations Before After
  • 25. The Case of the Nosy Neighbor + The Battle Against the Spin Lock
  • 26. The Case of the Nosy Neighbor Someone, somewhere was spying on all my packets (kinda) ■ dev_queue_xmit_nit() -> packet_rcv() ■ packet_rcv() implicates AF_PACKET ■ sudo ss --packet --processes -> (("dhclient",pid=3191,fd=5)) ■ My (extreme) solution was to disable dhclient after boot
  • 27. The Case of the Nosy Neighbor Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 218.00us 90.00% 254.00us 99.00% 285.00us 99.99% 341.00us 11279049 requests in 10.00s, 1.53GB read Requests/sec: 1,127,894.86
  • 28. The Case of the Nosy Neighbor Before After
  • 29. The Battle Against the Spin Lock Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 212.00us 90.00% 246.00us 99.00% 276.00us 99.99% 338.00us 11551707 requests in 10.00s, 1.57GB read Requests/sec: 1,155,162.15
  • 30. The Battle Against the Spin Lock Before After
  • 31. This Goes to Twelve
  • 32. This Goes to Twelve ■ Disabling Generic Receive OïŹ„oad (GRO) ■ TCP Congestion Control: cubic -> reno ■ Static Interrupt Moderation
  • 33. This Goes to Twelve Running 10s test @ http://server.tfb:8080/json 16 threads and 256 connections Latency Distribution 50.00% 203.00us 90.00% 236.00us 99.00% 265.00us 99.99% 317.00us 12031718 requests in 10.00s, 1.64GB read Requests/sec: 1,203,164.22
  • 34. Conclusion 436% increase in requests per second. 79% reduction in p99 latency. ■ Throughput: 224k req/s -> 1.2M req/s ■ p99 latency: 1.26ms -> 265.00us ■ p99.99 latency: 1.32ms -> 317.00us
  • 35. All 11 implementations on a c5n.xlarge using the stock Amazon Linux 2 AMI without any OS/Networking optimizations
  • 36. All 11 implementations on a c5n.xlarge with all OS/Networking optimizations applied
  • 37. Next Steps ■ Next gen kernel: 5.10 LTS ■ Next gen technologies: io_uring ■ Next gen instances: ARM vs Intel vs AMD ■ Driving performance from the bottom-up using Rust, Java, etc
  • 38. Brought to you by Marc Richards https://talawah.io/contact @talawahtech