This presentation explains how to set up rate limits, how to handle the 429 status code, and how rate limits are implemented in Kubernetes, CNI, load balancers, and so on.
2. Who am I
1. Distributed systems expert
2. Head of development hub – Xsolla
• Leading the leads of team leads
• In charge of 24/7 production
• Balancing between finance and architecture
3. Xsolla
1. Payments and payouts for game development: 800+ payment providers
2. BaaS for in-game shops
3. LiveOps
4. Antifraud
5. Customers from indie studios to enterprises
6. 800+ team members across the globe
5. What it’s not about
1. Promoting particular RL tools
2. What to rate-limit in a particular API
3. How to handle a 429 status
4. Zillions of lines of code
7. What it’s about: purpose
1. Rate limiting restricts the total request count by time or by queue-length criteria
2. Any request made after the limit is exceeded is declined
3. Various limits can be configured by time window and by queue
8. What it’s about: purpose
Rate limiting restricts the total request count by time or by queue-length criteria.
Any request made after the limit is exceeded is automatically rejected or declined.
Various limits can be configured by time window.
Throttling restricts the total request count by time or by queue-length criteria.
Any request made after the limit is exceeded is queued.
Various limits can be configured by time window.
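The difference between the two can be sketched in Python. This is a minimal illustration, assuming a fixed-window counter and caller-supplied timestamps in seconds; the class names are hypothetical and not taken from any RL tool. The limiter answers "no" (the caller would return 429), while the throttler keeps over-limit requests in a queue and serves them in a later window.

```python
from collections import deque


class FixedWindowLimiter:
    """Rate limiting: over-limit requests in the current window are rejected."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.window_start = 0.0
        self.count = 0

    def allow(self, now: float) -> bool:
        # Move to the window that contains `now` once the current one has expired.
        if now - self.window_start >= self.window_s:
            self.window_start = (now // self.window_s) * self.window_s
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # the caller would answer 429 Too Many Requests


class Throttler:
    """Throttling: over-limit requests wait in a queue and are served later."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limiter = FixedWindowLimiter(limit, window_s)
        self.queue: deque = deque()

    def submit(self, request_id: str, now: float) -> list:
        """Enqueue a request and return the request ids served at `now`."""
        self.queue.append(request_id)
        return self.drain(now)

    def drain(self, now: float) -> list:
        """Serve queued requests (FIFO) while the current window has capacity."""
        served = []
        while self.queue and self.limiter.allow(now):
            served.append(self.queue.popleft())
        return served


limiter = FixedWindowLimiter(limit=2, window_s=1.0)
print([limiter.allow(t) for t in (0.1, 0.2, 0.3, 1.1)])  # [True, True, False, True]
```

A production throttler also needs a bound on the queue length; otherwise it only moves the overload into memory.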
10. What it’s about: types
A rate limiter limits the number of requests the API receives within any given time window, e.g. short webshop cart calls.
A concurrency limiter limits the number of requests that are active or queued at any given time, e.g. a long-lived payment transaction registry call.
11. What it’s about: types
A rate limiter limits the number of requests the API receives within any given time window, e.g. short webshop cart calls.
150 requests to the cart per second allowed
A concurrency limiter limits the number of requests that are active or queued at any given time, e.g. a long-lived payment transaction registry call.
5 simultaneous transaction reports allowed
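A concurrency limiter counts requests in flight rather than arrivals per window. Below is a minimal Python sketch, assuming a single process and non-blocking rejection (the caller turns a refusal into a 429); with `max_in_flight=5` it mirrors the "5 simultaneous transaction reports" example. The class and exception names are hypothetical.

```python
import threading
from contextlib import contextmanager


class TooManyConcurrentRequests(Exception):
    """Raised when every slot is busy; a handler would map it to HTTP 429."""


class ConcurrencyLimiter:
    """Caps the number of requests that are active at the same moment."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # blocking=False: refuse immediately instead of waiting for a free slot.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()

    @contextmanager
    def slot(self):
        """Hold a slot for the duration of a long-lived call, e.g. a registry report."""
        if not self.try_acquire():
            raise TooManyConcurrentRequests()
        try:
            yield
        finally:
            self.release()


reports = ConcurrencyLimiter(max_in_flight=5)
held = [reports.try_acquire() for _ in range(6)]
print(held)  # [True, True, True, True, True, False]
```

Unlike a time-window limiter, a slot is freed only when the call finishes, so slow requests automatically lower the admitted rate.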
12. What it’s about: purpose
1. Eliminate unexpected traffic patterns: internal and external spikes
2. Get rid of unwanted traffic patterns: brute force
15. Rate limits are all about QoS!
The measurable end-to-end performance
properties of a network service, which can
be guaranteed in advance by a Service
Level Agreement between a user and a
service provider, so as to satisfy specific
customer application requirements.
Note: These properties may include
throughput (bandwidth), transit delay
(latency), error rates, priority, security,
packet loss, packet jitter, etc.
21. State issue: global vs. local
Global rate limiting: one limit shared by all instances behind the LB
Local rate limiting: a separate limit per service instance
Limit type   Requested RPS   Service count   Actual RPS
Local        26              3               26 × 3 = 78
Global       26              3               26
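The table's arithmetic can be reproduced with a small simulation. This Python sketch assumes a single 1-second window, round-robin balancing across instances, and a plain counter per limit; the function name is hypothetical.

```python
class WindowCounter:
    """Counts requests in a single 1-second window against a limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def accepted_in_one_second(total: int, instances: int, limit: int, global_state: bool) -> int:
    """Spread `total` requests round-robin over `instances` and count the accepted ones."""
    shared = WindowCounter(limit)                              # global: one counter for all
    local = [WindowCounter(limit) for _ in range(instances)]  # local: one counter per instance
    accepted = 0
    for i in range(total):
        counter = shared if global_state else local[i % instances]
        if counter.allow():
            accepted += 1
    return accepted


print(accepted_in_one_second(100, 3, 26, global_state=False))  # 78
print(accepted_in_one_second(100, 3, 26, global_state=True))   # 26
```

Dividing the limit by the instance count (26 // 3 = 8 per instance, 24 in total) only approximates a global limit: it drifts with uneven balancing and whenever the number of instances changes.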
42. My rate-limit experience
1. Databases: Oracle, Greenplum, BigQuery
2. API gateways: 80% of DataArt projects
3. Sberbank load balancers
4. Sberbank service mesh
5. A lot of RL in Java code, mostly Bucket4j and Resilience4j
6. Rate limits on an Envoy-based ingress: epic fail!
"My name is Alex, and I have used rate limits."
"Tell us about rate limits, Alex."
49. Pros & cons: code
+
1. Pure code is very fast
2. No infra dependencies
-
1. Multiple implementations across the company: language, library, …
2. Infra dependencies after all ;) Own Redis, Memcached, …
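Bucket4j is built on the token-bucket algorithm. Below is a minimal in-process Python sketch of that algorithm; it is not Bucket4j's API, and it assumes caller-supplied timestamps in seconds.

```python
class TokenBucket:
    """Token bucket: `capacity` allows bursts, `refill_per_s` sets the sustained rate."""

    def __init__(self, capacity: float, refill_per_s: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity
        self.last = now

    def try_consume(self, now: float, tokens: float = 1.0) -> bool:
        # Lazily refill from the elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_s=1.0)
print([bucket.try_consume(now=0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.try_consume(now=1.0))                       # True: one token refilled
```

The bucket state lives in process memory, which is why "no infra dependencies" holds only for local limits; sharing the state across instances brings back an external store such as Redis or Memcached.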
59. AWS WAF limitations
1. IP keys only
2. Fixed 5-minute time window
3. Minimum rate granularity: 100
4. Blocks up to 10,000 IPs; any more pass through.
63. Pros & cons: LB
-
1. Infra does all the magic
2. Vague visibility for end users
3. Vague visibility for infra: tailing logs…
4. Updating a limit is magic
5. Local limits only, without NGINX Plus
+
Infra does all the magic
Traffic doesn’t enter the internal network
In case of NGINX
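The NGINX `limit_req` module documents its limiting as the "leaky bucket" method. Below is a simplified Python model for reasoning about `rate` and `burst`. It is an approximation rather than NGINX's code, and it assumes a single NGINX instance (the state is local) and no `nodelay`.

```python
from typing import Optional, Tuple


class LeakyBucket:
    """Simplified leaky bucket with a burst allowance, in the spirit of NGINX limit_req."""

    def __init__(self, rate_per_s: float, burst: int) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.excess = 0.0                  # requests waiting above the steady rate
        self.last: Optional[float] = None  # time of the last accepted request

    def decide(self, now: float) -> Tuple[str, float]:
        """Return ("pass", delay_s) or ("reject", 0.0) for a request arriving at `now`."""
        if self.last is None:
            excess = 0.0
        else:
            # The bucket leaks at `rate` per second; the new request adds one unit.
            excess = max(0.0, self.excess - (now - self.last) * self.rate + 1)
        if excess > self.burst:
            return "reject", 0.0           # state is not updated for rejected requests
        self.excess, self.last = excess, now
        return "pass", excess / self.rate  # queued requests are delayed until they drain


bucket = LeakyBucket(rate_per_s=1.0, burst=2)
print([bucket.decide(0.0) for _ in range(4)])
# [('pass', 0.0), ('pass', 1.0), ('pass', 2.0), ('reject', 0.0)]
```

In NGINX itself, rejected requests get 503 by default; `limit_req_status` changes that to 429.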
66. API gateway paid features
Algorithm tuning
HA/DR
67. GraphQL rate limiting
1. Different from REST API limiting
2. A single GraphQL call contains many calls inside
3. Decision: calculate a query score, in the gateway or in code
Query depth may not be 100% relevant!
Only code-based RL is relevant!
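Two queries of the same depth can show why depth alone misleads. This Python sketch assumes the query is already parsed into nested dicts (field name to sub-selection, `None` for a leaf). A real implementation would walk the AST produced by a GraphQL parser instead. The query shapes are made up.

```python
from typing import Optional


def depth(selection: Optional[dict]) -> int:
    """Nesting depth of a selection set (a leaf field contributes nothing)."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())


def field_count(selection: Optional[dict]) -> int:
    """Total number of requested fields at every level."""
    if not selection:
        return 0
    return sum(1 + field_count(child) for child in selection.values())


narrow = {"user": {"profile": {"name": None}}}
wide = {
    "user": {
        "orders": {"id": None, "amount": None, "status": None, "items": None},
        "cart": {"total": None},
        "wallet": {"balance": None},
    }
}
print(depth(narrow), depth(wide))              # 3 3
print(field_count(narrow), field_count(wide))  # 3 10
```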
68. Code-based score calculation approaches
1. Query depth, annotations: https://github.com/4Catalyzer/graphql-validation-complexity
2. Field count, annotations: https://github.com/slicknode/graphql-query-complexity
3. Cost: hand-made via Apache Calcite, or https://github.com/pa-bru/graphql-cost-analysis
4. All of them in an RL library plus an external state server: https://github.com/ravangen/graphql-rate-limit
The catch: the best tools are JS-only …
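Cost-based scoring gives each field a weight and multiplies list fields by their page size. The sketch below is a hand-made Python version of that idea, not the API of the libraries listed above. The field weights, the page size of 50 and the per-client budget are hypothetical, and resetting the budget each window is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GqlField:
    name: str
    cost: int = 1        # hypothetical per-field weight (an annotation in real schemas)
    multiplier: int = 1  # e.g. the page size requested for a list field
    children: List["GqlField"] = field(default_factory=list)


def query_cost(f: GqlField) -> int:
    """Own cost plus the children's cost, repeated `multiplier` times for list fields."""
    return f.cost + f.multiplier * sum(query_cost(c) for c in f.children)


class CostBudget:
    """Score-based RL: every client spends query cost from a per-window budget."""

    def __init__(self, budget_per_window: int) -> None:
        self.budget = budget_per_window
        self.spent: Dict[str, int] = {}

    def allow(self, client_id: str, cost: int) -> bool:
        used = self.spent.get(client_id, 0)
        if used + cost > self.budget:
            return False
        self.spent[client_id] = used + cost
        return True

    def reset(self) -> None:
        """Call at the start of each window."""
        self.spent.clear()


query = GqlField("user", children=[
    GqlField("orders", multiplier=50, children=[GqlField("id"), GqlField("amount")]),
])
cost = query_cost(query)
print(cost)  # 102
budget = CostBudget(budget_per_window=1000)
print([budget.allow("client-a", cost) for _ in range(10)].count(True))  # 9
```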
78. Service mesh
20 ms delay
1. Blocking read: unary gRPC mode
2. A call on every request
3. No stickiness: no benefit from caching
79. Service mesh RL: further architecture
1. Bi-directional stream
2. Local counters with CRDT sync
3. Stickiness
4. Cache
5–10 ms delays. Exists only in the Lyft fork
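Local counters with CRDT sync can be illustrated with a grow-only counter (G-Counter), one of the simplest CRDTs. The slide does not say which CRDT the architecture uses, so this choice is an assumption. The minimal Python sketch below also assumes one counter per limit window and periodic pairwise syncs.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: each node increments only its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.slots: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        self.slots[self.node_id] = self.slots.get(self.node_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of sync order or repeated syncs.
        for node, count in other.slots.items():
            self.slots[node] = max(self.slots.get(node, 0), count)

    def value(self) -> int:
        return sum(self.slots.values())


def allow_locally(counter: GCounter, limit: int) -> bool:
    """Decide on the locally known (possibly stale) global count, with no RPC per request."""
    if counter.value() < limit:
        counter.increment()
        return True
    return False


a, b = GCounter("envoy-a"), GCounter("envoy-b")
print([allow_locally(a, 4) for _ in range(3)], [allow_locally(b, 4) for _ in range(3)])
a.merge(b)
b.merge(a)
print(a.value(), b.value(), allow_locally(a, 4))  # 6 6 False
```

Each sidecar decides without a blocking call per request. The price is the overshoot visible above (6 accepted against a limit of 4) until the replicas sync.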
80. Pros & cons: service mesh
+
Very granular
-
1. Hard to maintain: at least 3–6 configs per RL
2. Extra hardware: at least 500Mi per service, plus the rate limiter service
3. A new complicated moving part
87. Rate limiter operation features
1. Endpoints for information and management
2. Hot reload
3. Rate limits as code
4. Counters + metadata DBs
- high availability
- fallback (insurance) limiter
5. Multiple DBs for the stateful layer
6. Shadow mode
7. Monitoring
8. Logs
9. Audit
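Shadow mode can be as small as a wrapper around the real decision. It counts and logs what would have been blocked but lets the request through. A Python sketch, assuming the underlying limiter is any callable that returns `True` to allow; the names are hypothetical.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ratelimit")


class ShadowableLimiter:
    """Enforces a limiter decision, or only logs violations while `shadow` is on."""

    def __init__(self, decide: Callable[[str], bool], shadow: bool = True) -> None:
        self.decide = decide
        self.shadow = shadow
        self.would_block = 0  # a metric worth exporting to monitoring

    def allow(self, key: str) -> bool:
        if self.decide(key):
            return True
        self.would_block += 1
        if self.shadow:
            log.info("shadow: would block key=%s", key)
            return True
        return False


limiter = ShadowableLimiter(decide=lambda key: key != "bot-1")
print(limiter.allow("bot-1"), limiter.would_block)       # True 1
limiter.shadow = False  # switch to enforcement, e.g. via hot reload
print(limiter.allow("bot-1"), limiter.allow("user-7"))  # False True
```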
90. What it’s about: how
1. Discuss with product owners
2. Select resources to limit
3. Decide on environments
4. Calculate limit figures
5. Choose the identifier to limit on
6. Create a list of exceptions
91. Product side of rate limits
1. Just stay silent: black-hole limits
2. Let users know they are limited
3. Use CAPTCHA magic
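The three product options differ mainly in what the client gets back. This Python sketch assumes a framework-neutral `(status, headers)` reply. `Retry-After` is a standard HTTP header, while the `X-RateLimit-*` names are a widespread convention rather than a standard. The `/captcha` path is hypothetical.

```python
from http import HTTPStatus
from typing import Dict, Optional, Tuple

Reply = Optional[Tuple[int, Dict[str, str]]]


def limited_response(mode: str, retry_after_s: int, limit: int) -> Reply:
    """Reply to a request that hit its limit, according to the product policy.

    "silent":  black hole, no reply at all (returns None)
    "inform":  429 with headers that tell the client when to retry
    "captcha": redirect to a challenge page (hypothetical path)
    """
    if mode == "silent":
        return None
    if mode == "inform":
        return HTTPStatus.TOO_MANY_REQUESTS.value, {
            "Retry-After": str(retry_after_s),  # standard HTTP header, in seconds
            "X-RateLimit-Limit": str(limit),    # widespread convention, not a standard
            "X-RateLimit-Remaining": "0",
        }
    if mode == "captcha":
        return HTTPStatus.SEE_OTHER.value, {"Location": "/captcha"}
    raise ValueError(f"unknown mode: {mode}")


print(limited_response("inform", retry_after_s=30, limit=150))
```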
93. Calculate limit figures
Agile approach
1. Identify rate limits
2. Set rate limits
3. Get hate from customers via the support team
Smart agile approach
1. Identify rate limits
2. Set rate limits, but in shadow mode: logs only!
3. Analyze logs and identify rate limits
4. Analyze logs and identify burst settings
5. Set rate limits and burst
6. Keep monitoring
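Steps 3 to 5 of the smart approach can start from per-second counts taken from the shadow-mode logs. This Python sketch assumes the logs are reduced to request timestamps for one key. It uses a high percentile plus headroom as the rate and the observed peak above that as the burst. The 99th percentile and the 1.25 headroom factor are assumptions to tune, not recommendations from the talk.

```python
import math
from collections import Counter
from typing import List, Tuple


def per_second_counts(timestamps: List[float]) -> List[int]:
    """Requests per second, including idle seconds as zeros."""
    buckets = Counter(int(t) for t in timestamps)
    if not buckets:
        return []
    return [buckets.get(s, 0) for s in range(min(buckets), max(buckets) + 1)]


def percentile(values: List[int], p: float) -> int:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_limits(timestamps: List[float], p: float = 99.0,
                   headroom: float = 1.25) -> Tuple[int, int]:
    """Suggest (rate per second, burst) from shadow-mode log timestamps."""
    counts = per_second_counts(timestamps)
    rate = math.ceil(percentile(counts, p) * headroom)
    burst = max(0, max(counts) - rate)
    return rate, burst


# 100 seconds of steady traffic (10 rps) plus a 30-request spike at second 50.
log_ts = [s + i / 100 for s in range(100) for i in range(10)] + [50.5] * 30
print(suggest_limits(log_ts))  # (13, 27)
```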
95. Choose an identifier
1. Per IP: what about NAT/proxies?
2. Per user: what about anonymous users?
3. Per session: what is a session?
4. Per header: what about spoofing?
5. Per subject domain ID: what if different IDs need different limits?
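One way to combine these identifiers is a fallback chain from the most specific to the least specific key, with a separate limit per key kind. In this Python sketch the fallback order and the numbers in `LIMITS` are hypothetical. `client_ip` is assumed to come from the connection or a trusted proxy, not from a client-controlled header such as `X-Forwarded-For`.

```python
from typing import Optional

# Hypothetical per-identifier-kind limits (requests per second).
LIMITS = {"user": 100, "session": 50, "ip": 20}


def limit_key(route: str, client_ip: str, user_id: Optional[str] = None,
              session_id: Optional[str] = None) -> str:
    """Build the rate-limit key from the most specific identifier available."""
    if user_id:
        return f"{route}|user|{user_id}"
    if session_id:
        return f"{route}|session|{session_id}"
    # Anonymous traffic falls back to the IP, which NAT and proxies may share.
    return f"{route}|ip|{client_ip}"


def limit_for(key: str) -> int:
    """Look up the limit by identifier kind, so different IDs get different limits."""
    return LIMITS[key.split("|")[1]]


key = limit_key("cart", "10.0.0.1", session_id="s-42")
print(key, limit_for(key))  # cart|session|s-42 50
```

The `|` separator is used instead of `:` so that IPv6 addresses do not break key parsing.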
97. Hybrid limits
AddGoodsToCart(GoodsId int)
LB by method
• Lowest / highest RPS: 3000 / 3000
• Pros: 1. A single point for RL; 2. No customer or service code changes
• Cons: Always low RPS
LB by method, lower/upper limit + item type in an HTTP header
• Lowest / highest RPS: 3000 / 5000
• Pros: 1. No service code changes; 2. Relevant RPS
• Cons: 1. Customers are aware of implementation details; 2. Customer code changes; 3. Business logic in the LB
LB by method (upper limit) + service by GoodsId (lower limit)
• Lowest / highest RPS: 3000 / 5000
• Pros: 1. No customer code changes; 2. Relevant RPS
• Cons: 1. Service code changes; 2. All the cons of code RL; 3. Two places for RL
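The third approach can be sketched as two independent checks: the LB enforces the upper limit per method, and the service enforces the lower limit only for the goods that need it. This Python sketch assumes a single 1-second window (the reset is omitted) and a hypothetical set of GoodsIds that get the lower limit.

```python
from typing import FrozenSet


class HybridCartLimiter:
    """AddGoodsToCart: upper per-method limit at the LB, lower per-goods limit in the service."""

    def __init__(self, method_limit: int, limited_goods_limit: int,
                 limited_goods: FrozenSet[int]) -> None:
        self.method_limit = method_limit                # e.g. 5000 RPS at the LB
        self.limited_goods_limit = limited_goods_limit  # e.g. 3000 RPS in the service
        self.limited_goods = limited_goods              # hypothetical GoodsIds with the lower limit
        self.method_count = 0
        self.limited_count = 0

    def lb_allows(self) -> bool:
        if self.method_count >= self.method_limit:
            return False
        self.method_count += 1
        return True

    def service_allows(self, goods_id: int) -> bool:
        if goods_id not in self.limited_goods:
            return True
        if self.limited_count >= self.limited_goods_limit:
            return False
        self.limited_count += 1
        return True

    def add_goods_to_cart(self, goods_id: int) -> bool:
        # The LB check runs first; a request refused by the service still used LB quota.
        return self.lb_allows() and self.service_allows(goods_id)


cart = HybridCartLimiter(method_limit=5, limited_goods_limit=3, limited_goods=frozenset({1001}))
print([cart.add_goods_to_cart(1001) for _ in range(4)])  # [True, True, True, False]
print([cart.add_goods_to_cart(7) for _ in range(2)])     # [True, False]
```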
99. Attack options
1. Null, whitespace and control characters in request headers and parameters: %00, %0d%0a, %0d, %0a, %09, %0C, %20
2. Extra parameters and values in the path
3. Space characters in the payload
4. IP Rotate Burp extension (AWS API Gateway based)
5. …
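A common mitigation for the first two attack options is to build the rate-limit key from a normalized path, so cosmetic variants of one URL share a counter. A Python sketch using only the standard library. Lowercasing assumes the API's routes are case-insensitive, and the `/api/login` route is hypothetical.

```python
from urllib.parse import unquote, urlsplit

# Null, CR, LF, tab, form feed and space: the characters listed in the first attack option.
STRIPPED = set("\x00\r\n\t\x0c ")


def normalize_path(raw_url: str) -> str:
    """Canonical path for the rate-limit key."""
    path = urlsplit(raw_url).path  # drop the query: extra parameters do not create new keys
    path = unquote(path)           # decode %00, %0d%0a, %20, ...
    path = "".join(ch for ch in path if ch not in STRIPPED)
    return path.lower().rstrip("/") or "/"


variants = ["/api/login", "/api/login%00", "/api/login%0d%0a", "/api/Login/",
            "/api/login?retry=1", "/api/login%20", "/api/login%09"]
print({normalize_path(v) for v in variants})  # {'/api/login'}
```

The limiter must key on the same canonical form the backend routes on: any variant the backend accepts but the limiter treats as a new key is a bypass.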
100. Attack options
10% success!!!!!
107. Hometask
1. How to hack rate limiting vulnerabilities with tools:
• https://t.ly/Cg0Q
• https://t.ly/h-XP
• https://t.ly/QMSA
2. Investigate the IEEE doc
• https://t.ly/V9XF_
3. Assess the maturity of your team’s rate limiting
108. Conclusion
1. Check rate limit attack vectors
2. A rate limiter is a perfect test task
3. There are at least 10 places for rate limiting
4. There is no ideal rate limiter: choose the RL and algorithm based on the task
5. Rate limits are not only about requests
6. Rate-limit everything, even internal services
7. Care about debugging
8. Please do the hometask
109. Pain sharing time
https://t.ly/_jo3 – my presentations
https://t.ly/6JZx – my LinkedIn profile
@Shtock
Vote for the presentation!!!
In the simplest case, the documentation just states that rate limits exist.
At a more mature level, it describes the different rate limits per service and how to live with them.
For example, one of our competitors gives the most complete information on handling them and says explicitly that they have rate limits for short operations and rate limits for long operations, and that the two are completely different. This is what is called a rate limiter and a concurrency limiter.
Poll: how do you find out the rate limits without hitting them?
Note GraphQL: it seems like nothing unusual, but we will come back to it.
How many places for rate limits do you know?
What matters: headers in ANY case.
Clarify how these fit together.
Mention that this is undocumented and was stripped from the UI.
Note how we take the URL. We will need it later for the cybersecurity topic.
And here is the Yandex load balancer.
Here, about state in the architecture.
Ask why on the ingress and why on the service.
Examples + read the manual on GraphQL in Tyk: how it calculates.
Question: what is the problem? Also bring the nginx config and split it into two slides for the environment, with something like "Issue solved", but test how it actually works: how many Envoys there really are, and explain that it did not work for us.
Conflict-free replicated data types (CRDT)
What status code do you think it is?
Although I believe everything should be rate-limited.
Attack examples
We launch it 1000 times without waiting for a response, so it is really fast.