Tuning the Kernel for Varnish Cache

Me!
• Per Buer
• CTO @ Varnish Software
• Programmer / sysadmin background
• Beer

Varnish Software
• Contributes a lot to the Varnish Cache project
• Not the Varnish Cache project
• Support and add-on software for Varnish Cache
• Media, e-commerce, API and CDN workloads

What is this Varnish?
Client -> Varnish -> Backend
TTFB from Varnish: 30 microseconds. TTFB from the backend: 150 milliseconds.

Varnish Cache: 30s primer
• High performance HTTP Caching reverse proxy
• 10 years old
• Policy-driven configuration language
• Massively threaded - event driven programming is a fad :-P
• Super easy to write modules (no event loop, see)

VCL Example
sub vcl_recv {
    # Refuse requests for /fun/ that carry a Referer from another site.
    if (req.http.host == "www.example.com" &&
        req.url ~ "^/fun/" &&
        (req.http.referer && req.http.referer !~ "^http://www.example.com/")) {
        return (synth(403, "No hotlinking please"));
    }
}

So? What is Varnish?
Client -> Varnish -> Backend
Run high-speed logic here (in Varnish).

Tuning Varnish for fun and profit

What to tune
• Linux IP stack & Netfilter
• Linux Ethernet - we’ll skip this for now. Most of you don’t have Ethernet interfaces anymore. :-)
• Varnish Cache

Dangerous advice:
http://www.linuxbrigade.com/reduce-time_wait-socket-connections/
(#2 in my Google results when searching for tcp_tw_recycle)

Setting up a lab
• Set up a three-node network (client - router - target)
• Use Traffic Control / Netem on virtual servers

Topology: client - router - target
• router eth1: 192.168.16.1/24 on intnet, peer at 192.168.16.2
• router eth2: 192.168.17.1/24 on intnet2, peer at 192.168.17.2

So we have a perfect network…

Real life networks
• Latency
• Jitter
• Packet reordering
• Packet loss
• Duplication
• Corruption

Traffic Control: netem
• Has shipped with the Linux kernel since 2.6
• Makes all sorts of network characteristics easy to emulate
• Reasonably simple to use (see next slide)

tc qdisc add dev eth1 root netem delay 100ms 10ms distribution normal reorder 2% 10% loss 1%
tc qdisc add dev eth2 root netem delay 1ms
(qdisc = queuing discipline)

Topology: client - router - target
• eth1 link: 100ms +/- 10ms delay, 1% loss, 2% reordering
• eth2 link: 1ms delay

A suitable backend
• https://github.com/espebra/dummy-api
• Perfect for ad hoc testing
• Object size, latencies (ttfb, ttb) are all dynamic (from URL)
• Really fast (100K+ RPS)
• http://target:1337/?header-delay=50&body-delay=100&predictable-content=10

Linux TCP buffer tuning
• Supposedly auto-tuning
• Defaults are OK
• Some improvements on 10G networks

Client <- 100ms latency -> Varnish
Need to retain data in buffers while waiting for ACK

Calculating BDP
• Max bandwidth per flow x delay
• 1000 Mbps x 0.1 seconds = 100 megabits = 12.5 megabytes
• Default: ~3.7 megabytes, which caps a flow at ~330 megabits per second @ 100ms latency
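The arithmetic above can be sketched in a few lines of shell. The helper names bdp_bytes and max_mbps are made up for this example. The 16777216-byte buffer is the ceiling recommended on the BDP tuning slide.

```shell
# Bandwidth-delay product in bytes: bandwidth (Mbit/s) x delay (ms).
# bdp_bytes and max_mbps are illustrative helper names, not standard tools.
bdp_bytes() {
    mbps=$1
    delay_ms=$2
    # Mbit/s -> bit/s (x 1000000), ms -> s (/ 1000), bits -> bytes (/ 8)
    echo $(( mbps * 1000000 * delay_ms / 1000 / 8 ))
}

# Highest throughput (Mbit/s) a single flow can reach with a given buffer.
max_mbps() {
    buffer_bytes=$1
    delay_ms=$2
    echo $(( buffer_bytes * 8 / (delay_ms * 1000) ))
}

bdp_bytes 1000 100      # 1000 Mbps at 100 ms -> 12500000
max_mbps 16777216 100   # 16 MiB buffer at 100 ms -> 1342
```

Integer arithmetic rounds down, which is good enough for sizing buffers.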

BDP Tuning
• Kernel autotunes the details - we just give it more room
• /proc/sys/net/core/(r|w)mem_max can be ignored
• /proc/sys/net/ipv4/tcp_(r|w)mem should be raised
• "10240 87380 16777216" (min, default, max in bytes) is the usual recommendation

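Three-way handshake
Client -> Server: SYN
Server -> Client: SYN, ACK
Client -> Server: ACK
Client -> Server: GET / …
Server -> Client: ACK
Server -> Client: RESP (first burst limited by initcwnd)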

Playing with initcwnd
• Initial congestion window is now 10 segments
• Increasing might break stuff
• Some CDNs increase initcwnd and show some improvement
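For anyone who wants to experiment in the lab, initcwnd is set per route with ip route. The sketch below only builds the command string and does not run it. The helper name initcwnd_cmd, the gateway address and the eth0 device are placeholders, not values from the slides.

```shell
# Build (but do not run) an `ip route` command that sets initcwnd on a route.
# initcwnd_cmd is an illustrative helper name; the route below is a
# placeholder, so substitute a line from `ip route show` on your own host.
initcwnd_cmd() {
    route=$1
    segments=${2:-10}
    echo "ip route change $route initcwnd $segments"
}

initcwnd_cmd "default via 192.168.17.1 dev eth0" 10
```

Printing the command first makes it easy to review before running it as root.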

accept()
• System call used by an application to accept a socket from the
kernel
• Multiple threads in Varnish issue accept() calls - one per thread pool

somaxconn
• Global limit on listen_depth
• Default is silly (128)
• When the queue is full the initial SYN gets discarded, which adds a 3s/1s delay to incoming connections
• Increase it to 1K - 16K

tcp_max_syn_backlog
• Threshold for SYN Flood detection
• Limits the number of TCP connections being established
• When exhausted - SYN Cookies are sent
• Do not rely on SYN Cookies

Local TCP ports
• Varnish will need local sockets in order to talk to backends
• Busy servers might run low on sockets
• Default: net.ipv4.ip_local_port_range = 32768 61000
• Can safely be increased to “2000 65500”
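A quick way to see what the wider range buys is to count the ports. This is a sketch with a made-up helper name (ports_in_range). The count limits concurrent connections to each backend address and port, not the server as a whole.

```shell
# Count the ports in a local port range (both ends inclusive).
# ports_in_range is an illustrative helper name.
ports_in_range() {
    low=$1
    high=$2
    echo $(( high - low + 1 ))
}

ports_in_range 32768 61000   # default range -> 28233
ports_in_range 2000 65500    # widened range -> 63501

# Show the range currently in effect, if this is a Linux host.
if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
    cat /proc/sys/net/ipv4/ip_local_port_range
fi
```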

TIME_WAIT
• Socket is kept around after it is closed
• Linux uses 2x the FIN timeout
• Default is 60 seconds (no packet should be older than 60s)
• I’ve never seen a packet older than 10s
• net.ipv4.tcp_fin_timeout can be set to 10

More TIME_WAIT
• tcp_tw_recycle is dangerous (unbuckles seat belt)
• tcp_tw_reuse can cause problems with users behind NAT - makes sense on a LAN without NAT
• tcp_max_tw_buckets can mitigate TIME_WAIT attacks by destroying
sockets in TIME_WAIT state
• Increase tcp_max_tw_buckets to 256K or more
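Before raising the limit it helps to know how many sockets sit in TIME_WAIT. A minimal sketch, assuming a Linux host: the kernel reports the count in the "tw" field of /proc/net/sockstat. The helper name tw_count is made up for this example.

```shell
# Extract the TIME_WAIT count ("tw" field) from /proc/net/sockstat-style input.
# tw_count is an illustrative helper name.
tw_count() {
    awk '$1 == "TCP:" { for (i = 2; i < NF; i++) if ($i == "tw") print $(i + 1) }'
}

# Compare the live count with the configured ceiling, if both are readable.
if [ -r /proc/net/sockstat ] && [ -r /proc/sys/net/ipv4/tcp_max_tw_buckets ]; then
    echo "TIME_WAIT sockets: $(tw_count < /proc/net/sockstat)"
    echo "tcp_max_tw_buckets: $(cat /proc/sys/net/ipv4/tcp_max_tw_buckets)"
fi
```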

Connection tracking
• Linux firewall tracks connections
• Loaded implicitly when using certain iptables rules
• [11864.342438] nf_conntrack version 0.5.0 (3917 buckets, 15668 max)
• New connections are rejected when conntrack is full
• Set parameters when loading module (options nf_conntrack
hashsize=XXXXX) and
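To see how close the table is to full, compare the live entry count with the maximum. A sketch, assuming nf_conntrack is loaded so that its counters appear under /proc/sys/net/netfilter. The helper name conntrack_pct is made up for this example.

```shell
# Percentage of the conntrack table in use.
# conntrack_pct is an illustrative helper name.
conntrack_pct() {
    count=$1
    max=$2
    echo $(( count * 100 / max ))
}

conntrack_pct 7834 15668   # half of the 15668 max from the log line -> 50

# On a host with nf_conntrack loaded, read the live counters.
dir=/proc/sys/net/netfilter
if [ -r "$dir/nf_conntrack_count" ] && [ -r "$dir/nf_conntrack_max" ]; then
    conntrack_pct "$(cat "$dir/nf_conntrack_count")" "$(cat "$dir/nf_conntrack_max")"
fi
```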

Linux tuning - summing up
• Leave most things as they are
• Increase somaxconn, tcp_max_syn_backlog
• Increase ip_local_port_range
• Decrease tcp_fin_timeout to ~10
• Increase tcp_max_tw_buckets to ~256K
• Increase BDP buffer limit
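Collected in one place, the settings above could look like the sysctl fragment printed below. This is a sketch, not a drop-in config. The helper name and the file name are made up, somaxconn is set to the top of the 1K - 16K range, and the tcp_max_syn_backlog value is an assumption because the slides give no number for it.

```shell
# Print a sysctl fragment with the values suggested in this deck.
# varnish_sysctl_conf is an illustrative helper name; the tcp_max_syn_backlog
# value is an assumption, since the slides give no number for it.
varnish_sysctl_conf() {
    cat <<'EOF'
net.core.somaxconn = 16384
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_tw_buckets = 262144
net.ipv4.tcp_rmem = 10240 87380 16777216
net.ipv4.tcp_wmem = 10240 87380 16777216
EOF
}

varnish_sysctl_conf
# To apply as root (the file name is an example):
#   varnish_sysctl_conf > /etc/sysctl.d/90-varnish.conf && sysctl --system
```

Test one setting at a time in the lab before rolling the whole set out.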

A short sidestep: TCP Acceleration

Varnish Cache threads
• Number of pools: always 2
• thread_pool_max
• thread_pool_min
• You need ~ 1 thread per RPS
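The rule of thumb can be turned into varnishd parameters. A sketch, with a made-up helper name (thread_params): thread_pool_max applies per pool, so the total is split across the pools.

```shell
# Turn an expected request rate into varnishd thread parameters.
# thread_params is an illustrative helper name. thread_pool_max is a
# per-pool limit, so the total is split across the pools.
thread_params() {
    rps=$1
    pools=${2:-2}
    per_pool=$(( (rps + pools - 1) / pools ))   # round up
    echo "-p thread_pools=$pools -p thread_pool_max=$per_pool"
}

thread_params 8000   # -> -p thread_pools=2 -p thread_pool_max=4000
```

The output can be appended to the varnishd command line. Treat the numbers as a starting point and watch the thread counters under real load.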

Workspace Tuning
• Varnish pre-allocates memory for the threads
• When it runs out of memory - it crashes

VSL Tuning
• /var/lib/varnish contains the VSL.
• Linux will try to sync the VSL to disk
• On busy servers: put VSL on RAMDISK
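One common way to get a RAM disk is tmpfs. The sketch below prints an /etc/fstab entry for it. The helper name vsl_fstab_line and the 256m default size are assumptions, not figures from the slides.

```shell
# Print an /etc/fstab entry that keeps the VSL directory on tmpfs.
# vsl_fstab_line is an illustrative helper name; the 256m default size
# is an assumption, not a figure from the slides.
vsl_fstab_line() {
    size=${1:-256m}
    echo "tmpfs /var/lib/varnish tmpfs defaults,noatime,size=$size 0 0"
}

vsl_fstab_line
# Or mount it once by hand, as root:
#   mount -t tmpfs -o size=256m tmpfs /var/lib/varnish
```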

Keepalives
• A three-way handshake over a high-latency link is expensive
• TLS handshake is worse
• idle_send_timeout (frontend) and backend_idle_timeout (backend)

Most efficient tuning
• Increase your cache hit rate
• 100ms vs 1ms per request
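To put numbers on this: with 1ms for a hit and 100ms for a miss, the average time per request follows directly from the hit rate. A small sketch, with a made-up helper name (avg_latency_us), working in microseconds to stay in integer arithmetic.

```shell
# Average time per request in microseconds, given a hit rate in percent,
# with the 1 ms (hit) and 100 ms (miss) figures from the slide.
# avg_latency_us is an illustrative helper name.
avg_latency_us() {
    hit_pct=$1
    hit_us=1000
    miss_us=100000
    echo $(( (hit_pct * hit_us + (100 - hit_pct) * miss_us) / 100 ))
}

avg_latency_us 80   # -> 20800
avg_latency_us 90   # -> 10900
avg_latency_us 99   # -> 1990
```

Going from 90% to 99% cuts the average by more than a factor of five, which no kernel setting will match.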

Increasing cache hit rate
• Prolong TTLs - invalidate on change
• Normalize request headers when using Vary

Summing up: Varnish Cache
• Threads are in pools (you need two)
• Make sure there are enough threads
• Make sure there is enough memory
• Try to tune your cache hit ratio

Preemptive answers
• TLS is not in Varnish Cache due to OpenSSL QA issues
• H/2 support is experimental in Varnish Cache 5.0
• Full H/2 support in Varnish Cache 5.1 (with Hitch)
