In this talk I will explain what a Web accelerator is, also known as a reverse proxy or a frontend. As the name suggests, it speeds up a site. But how does it do that? What kinds of accelerators exist? What can they do, and what can't they? What is specific to each solution? I will try to cover them in both depth and breadth.
I will also talk about one more open source Web accelerator, Tempesta FW. The project is unique in that it is a hybrid of a Web accelerator and a firewall, developed specifically for processing and filtering large volumes of HTTP traffic. Its main use cases are protection against application-layer DDoS and simply delivering large volumes of HTTP traffic at low hardware cost.
- What a Web accelerator is, why it was invented, and how to tell when you need one;
- Typical reverse proxy functionality and how it differs from a Web server;
- A brief mention of SSL accelerators;
- A deep look into HTTP: how it controls caching and proxying, what can be cached and what cannot;
- A comparison of the most popular accelerators (Nginx, Varnish, Apache Traffic Server, Apache HTTPD, Squid) by features and internals;
- Why yet another Web accelerator, Tempesta FW, is needed, and how it differs from other accelerators.
2. Who am I?
CEO & CTO at NatSys Lab & Tempesta Technologies
Tempesta Technologies (Seattle, WA)
● Subsidiary of NatSys Lab. developing Tempesta FW – the first and
only hybrid of HTTP accelerator and firewall for DDoS mitigation &
WAF
NatSys Lab (Moscow, Russia)
● Custom software development in:
• high performance network traffic processing
• databases
6. To Cache
static (e.g. video, images, CSS, HTML)
some dynamic
Negative results (e.g. 404)
Permanent redirects
Incomplete results (206, RFC 7233 Range Requests)
Methods: GET, POST, whatever
GET /script?action=delete – this is your responsibility
(but some servers don't cache URIs w/ arguments)
7. Not to Cache
Responses to Authenticated requests
Unsafe methods (RFC 7231 4.2.1)
(safe methods: GET, HEAD, OPTIONS, TRACE)
Explicit no-cache directive
Set-Cookie (?)
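The rules on slides 6 and 7 can be sketched as a single decision function. This is a simplified illustration, not the logic of any particular proxy: the function name and the `cache_set_cookie` switch are hypothetical, `no-cache` is treated as "do not store" (RFC 7234 actually allows storing with mandatory revalidation), and the RFC exceptions that permit caching authenticated responses are ignored.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}  # RFC 7231 4.2.1


def may_cache(method, request_headers, response_headers, cache_set_cookie=False):
    """Return True if a shared cache may store the response (simplified)."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}

    # Unsafe methods are not cached in this sketch.
    if method.upper() not in SAFE_METHODS:
        return False
    # Responses to authenticated requests are not cached.
    if "authorization" in req:
        return False
    # Explicit directives from the origin server.
    directives = {
        d.strip().split("=")[0].lower()
        for d in resp.get("cache-control", "").split(",")
        if d.strip()
    }
    if directives & {"no-store", "no-cache", "private"}:
        return False
    # Set-Cookie handling differs between proxies, so it is a switch here.
    if "set-cookie" in resp and not cache_set_cookie:
        return False
    return True
```

The `cache_set_cookie` flag stands in for the Set-Cookie question mark on the slide: some proxies skip such responses by default, others store them.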
8. Cache POST?
Idempotent POST (e.g. web search) – just like GET
Non-idempotent POST (e.g. blog comment) – cache response for
following GET
RFC 7234 4.4: URI must be invalidated
9. Cache Cookies?
Varnish, Nginx, ATS don't cache responses w/ Set-Cookie by default
mod_cache and Squid do cache responses w/ Set-Cookie by default
RFC 7234:
Note that the Set-Cookie response header field
[RFC6265] does not inhibit caching; a cacheable
response with a Set-Cookie header field can be (and
often is) used to satisfy subsequent requests to
caches. Servers who wish to control caching of these
responses are encouraged to emit appropriate Cache-
Control response header fields.
10. Cache Entries Freshness
RFC 7234: freshness_lifetime > current_age
Freshness calculation:
● Last-Modified – when a resource was modified at origin server
● Date – response generation timestamp
● Age – how long the object has been in the proxy cache
● Expires – when a cache entry expires
Revalidation:
Conditional requests (RFC 7232, e.g. If-Modified-Since)
Background activity or on-request job
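The freshness check above can be sketched in Python following the age calculation in RFC 7234 4.2.3. This is a minimal sketch under stated assumptions: header names are looked up exactly as written, dates are well-formed, only `max-age` is read from Cache-Control, and the 10% Last-Modified heuristic is the typical value the RFC suggests rather than a requirement. The function names are illustrative.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a response stays fresh: max-age, else Expires - Date, else heuristic."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    date = parsedate_to_datetime(headers["Date"])
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - date).total_seconds()))
    if "Last-Modified" in headers:
        modified = parsedate_to_datetime(headers["Last-Modified"])
        # Heuristic freshness: 10% of the time since the last modification.
        return max(0, int((date - modified).total_seconds()) // 10)
    return 0


def current_age(headers, request_time, response_time, now):
    """Age of a stored response in seconds; times are UNIX timestamps."""
    date = parsedate_to_datetime(headers["Date"]).timestamp()
    age_value = int(headers.get("Age", 0))
    apparent_age = max(0, response_time - date)
    corrected_age_value = age_value + (response_time - request_time)
    corrected_initial_age = max(apparent_age, corrected_age_value)
    return corrected_initial_age + (now - response_time)


def is_fresh(headers, request_time, response_time, now):
    """RFC 7234: a response is fresh while freshness_lifetime > current_age."""
    return freshness_lifetime(headers) > current_age(
        headers, request_time, response_time, now
    )
```

When `is_fresh` returns False, the cache would normally revalidate with a conditional request instead of discarding the entry outright.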
11. Stale Cache Entries
Sometimes it is OK, e.g. Nginx: proxy_cache_use_stale
Expired responses
Invalidated by unsafe methods
Error responses for the URI
Timeout
Etc.
12. Cache-Control
A cache MUST obey the requirements of the Cache-Control directives
Freshness and staleness control
Explicit cache/no-cache
Private (browser vs proxy) caching – not privacy!
Pragma: no-cache
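To make the directives concrete, here is a small sketch of parsing a Cache-Control value and of the Pragma fallback for requests. The function names are illustrative. The parser splits naively on commas, so it would mis-handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if not name:
            continue
        directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def forbids_cached_reply(request_headers):
    """True if the client asks not to be served from cache without revalidation.

    Pragma: no-cache is only consulted when Cache-Control is absent
    (RFC 7234 5.4).
    """
    cache_control = request_headers.get("Cache-Control")
    if cache_control is not None:
        return "no-cache" in parse_cache_control(cache_control)
    return "no-cache" in request_headers.get("Pragma", "").lower()
```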
13. Vary
(secondary keys say hello to databases)
Accept-Language – return localized version of page (no need
/en/index.html)
User-Agent – mobile vs desktop (bad!)
Accept-Encoding – don't send compressed page if browser doesn't
understand it
Request headers normalization is required!
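Why normalization matters: without it, "gzip, deflate, br" and "deflate,gzip;q=0.8" produce two different secondary keys for the same stored variant. The sketch below builds a Vary key from normalized request headers. It assumes the cache stores only gzip and identity variants and keys Accept-Language on the first listed language; both are illustrative simplifications, and all names are hypothetical.

```python
def normalize_accept_encoding(value):
    """Collapse Accept-Encoding to 'gzip' or 'identity'."""
    for coding in value.split(","):
        name, _, params = coding.strip().partition(";")
        q = 1.0
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if name.strip().lower() == "gzip" and q > 0:
            return "gzip"
    return "identity"


def normalize_accept_language(value):
    """Keep only the first listed language tag, lowercased (ignores q-values)."""
    return value.split(",")[0].split(";")[0].strip().lower()


def normalize_default(value):
    """Collapse runs of whitespace for headers without a dedicated normalizer."""
    return " ".join(value.split())


NORMALIZERS = {
    "accept-encoding": normalize_accept_encoding,
    "accept-language": normalize_accept_language,
}


def secondary_key(vary, request_headers):
    """Build a hashable secondary cache key from the response's Vary value."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    return tuple(
        (n, NORMALIZERS.get(n, normalize_default)(headers.get(n, "")))
        for n in names
    )
```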
14. Buffering vs Streaming
Buffering
● Seems to be everyone's default
● Performance degradation on large messages
● 200 means Ok, not incomplete response
Streaming
● Tengine (patched Nginx) w/
proxy_request_buffering & fastcgi_request_buffering
● More performance, but 200 doesn't mean full response
15. Cache Storage
Plain files (Nginx, Squid, Apache HTTPD)
● Meta-data in RAM
● Filesystem database
● Easy to manage
Database (Apache Traffic Server, Tempesta FW)
● Faster access
Persistency (experimental in Varnish, upcoming in Tempesta FW)
● no real consistency
16. Cache Storage: mmap(2)
Alistair Wooldrige, “BBC Digital Media
Distribution: How we improved
throughput by 4x”,
http://www.bbc.co.uk/blogs/internet/entries/17d22fb8-cea2-49d5-be14-86e7
48 CPUs, 512GB RAM, 8TB SSD
17. Cache Key
Primary key: URI path + Host
POST key: URI path + Host + body
Secondary (Vary) key: any headers
E.g. Nginx custom cache key:
proxy_cache_key "$request_uri|$request_body";
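The primary and POST keys above can be sketched as a hash over the key components. This is an illustration only: the function name, the NUL separator, and the choice of SHA-256 are assumptions, not how any of the listed proxies build their keys.

```python
import hashlib


def cache_key(method, host, uri, body=b""):
    """Primary key: Host + URI; for POST the request body is part of the key."""
    h = hashlib.sha256()
    h.update(host.lower().encode())  # host names are case-insensitive
    h.update(b"\0")                  # separator so "a" + "bc" != "ab" + "c"
    h.update(uri.encode())
    if method.upper() == "POST":
        h.update(b"\0")
        h.update(hashlib.sha256(body).digest())
    return h.hexdigest()
```

Two POST requests to the same URI with different bodies then map to different cache entries, while the body is ignored for GET.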
18. Cache Purging
$ curl -X PURGE <URL>
Not RFC-defined
Squid, Varnish, Nginx (by wildcard)
Use case:
1. Update some resource at upstream (POST can invalidate an entry)
2. Send PURGE & GET requests to the cache
3. Now cache is up to date
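The use case above can be modeled with a toy in-memory cache. `TinyCache` and its `fetch` callback are hypothetical names for illustration; a real proxy would receive PURGE as an HTTP request rather than a method call.

```python
class TinyCache:
    """Toy cache in front of an upstream, with explicit purging."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: url -> body from upstream
        self.store = {}

    def get(self, url):
        """Serve from cache, filling the entry from upstream on a miss."""
        if url not in self.store:
            self.store[url] = self.fetch(url)
        return self.store[url]

    def purge(self, url):
        """Drop the entry; return True if something was actually purged."""
        return self.store.pop(url, None) is not None


# Usage: the cache keeps serving the old body until the entry is purged.
upstream = {"/a.css": "v1"}
cache = TinyCache(upstream.__getitem__)
cache.get("/a.css")          # caches "v1"
upstream["/a.css"] = "v2"    # step 1: resource updated at upstream
cache.purge("/a.css")        # step 2: PURGE ...
cache.get("/a.css")          # ... then GET refills the entry with "v2"
```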
19. Cache Busting
No access to Web-accelerator or Web-server
E.g. force users to use a new version of CSS or Ad?
<?php $css_ver="1.1"; ?>
<link rel="stylesheet"
href="my.css?v=<?php echo $css_ver; ?>">
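A variation on the same idea, sketched in Python: derive the version from the file content instead of bumping it by hand, so the URL changes exactly when the file changes. The function name and the 8-character hash prefix are arbitrary choices for illustration.

```python
import hashlib


def busted_url(path, content):
    """Append a content-derived version so a changed file gets a new URL."""
    version = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={version}"
```

As noted on slide 6, some servers do not cache URIs with arguments, so the version may need to go into the file name instead of the query string.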
20. IO & multitasking
Bryan Call, “Choosing A Proxy Server”, ApacheCon 2014
               ATS   Nginx   Squid   Varnish        Apache HTTPD   Tempesta
Threads        X     -       -       per-session!   X              -
Events         X     X       X       partial        X              ~
Processes      -     X       X       -              X              -
CPU locality   -     -       -       -              -              X
21. Sessions vs Linux RFS
RFS: Receive Flow Steering,
linux/Documentation/networking/scaling.txt
24. Tempesta FW: Challenges
Normal Web-servers deliver content
There are a lot of bad guys on the modern Internet
There are also good guys filtering bad guys out
25. Good Guys: WAF
Technologies: XHTML, WSDL, Machine learning, Regexps
Platforms: Nginx, Apache Traffic Server etc.
29. Good Guys: Anti-DDoS CDN
Technologies: DPI or Firewall + Machine Learning
Platforms: Nginx
31. Application Layer DDoS
        Service from cache   Rate limit
Nginx   22us                 23us
Fail2Ban: write to the log, parse the log, write to the log, parse the
log… - really in the 21st century?!
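For contrast with the log-parsing loop, an in-process rate limiter decides per request from memory alone. Below is a generic token bucket sketch. It is not how Nginx or Tempesta FW implement rate limiting, and the class and parameter names are illustrative.

```python
class TokenBucket:
    """Per-client token bucket: `rate` tokens per second, up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.state = {}  # client -> (tokens, last_timestamp)

    def allow(self, client, now):
        """Return True and spend a token if the client is within its limit."""
        tokens, last = self.state.get(client, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.state[client] = (tokens, now)
            return False
        self.state[client] = (tokens - 1, now)
        return True
```

The decision costs one dictionary lookup and a few arithmetic operations, with no log writes or parsing on the request path.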
33. Web-accelerator Capabilities
Nginx, Varnish, Apache Traffic Server, Squid, Apache HTTPD etc.
● cache static Web-content
● load balancing
● rewrite URLs, ACL, Geo, filtering? etc.
● C10K – is it a problem for bot-net? SSL?
● what about tons of 'GET / HTTP/1.0\n\n'?
Kernel-mode Web-accelerators: TUX, kHTTPd
● basically the same sockets and threads
● zero-copy → sendfile() - not needed
34. Tempesta FW: Yet Another Web-accelerator
First and only hybrid of HTTP accelerator and FireWall
FireWall: layer 3 (IP) – layer 7 (HTTP) filter
FrameWork: high performance and flexible platform to build intelligent
DDoS mitigation systems and Web Application Firewalls (WAF)
Directly embedded into Linux TCP/IP stack
NUMA-aware x86-64 cache conscious Web-cache on huge pages
This is Open Source (GPLv2)
44. Prerequisites
SSE 4.2 (“sse4_2” in /proc/cpuinfo)
Huge pages (“pse” in /proc/cpuinfo)
Custom Linux kernel (KVM or dedicated server)
45. Build Kernel
$ git clone https://github.com/tempesta-tech/linux-4.1-tfw.git
$ cd linux-4.1-tfw
$ make && make modules && make modules_install && make install
$ reboot
46. Build & Run Tempesta
$ git clone https://github.com/natsys/tempesta.git
$ cd tempesta && make
$ cat > etc/tempesta_fw.conf
server 127.0.0.1:8080; # upstream
cache 1; # cache sharding
^D
$ ./scripts/tempesta.sh --start
47. Thanks!
Web site: http://tempesta-tech.com
(Powered by Tempesta since 06.06.16)
Availability: https://github.com/tempesta-tech/tempesta
Blog: http://natsys-lab.blogspot.com
Contact: ak@tempesta-tech.com
We are hiring!