Rate limits and all about

All about rate limiting
Alexander Tokarev
Xsolla

Who am I
1. Distributed systems expert
2. Head of development hub – Xsolla
• Leading leads of team leads
• In charge of 24x7 production
• Balancing between finance and architecture

1. Payments and payouts for gamedev – 800+ payment providers
2. BaaS for in-game shops
3. LiveOps
4. Antifraud
5. From indie to enterprises
6. 800+ team members across the globe

Agenda
1. What for
2. Architecture
3. RL points
4. Documentation
5. Production
6. Malicious behavior
7. Pain sharing time

What’s not about
1. Promotion of particular RL tools
2. Subjects of rate limiting in a particular API
3. Approach to handling a 429 status
4. 100500 lines of code

What’s about
1. Background
2. History
3. Variety

What’s about purpose
1. Rate limiting caps the total request count per time window or by queue length
2. Any request made after the limit is exceeded is declined
3. Limits can be configured per time window and queue

Rate limiting caps the total request count per time window or by queue length.
Any request made after the limit is exceeded is automatically rejected or declined.
Limits can be configured per time window.
Throttling caps the total request count in the same way.
Any request made after the limit is exceeded is queued instead of rejected.
Limits can be configured per time window.
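A minimal sketch of the difference, assuming a single window with a fixed limit. The class `WindowPolicy` and its methods are hypothetical names for illustration, not part of any tool mentioned in the talk.

```python
from collections import deque


class WindowPolicy:
    """One time window with a request limit and an over-limit policy."""

    def __init__(self, limit: int, queue_excess: bool):
        self.limit = limit
        # False = rate limiting (decline), True = throttling (queue)
        self.queue_excess = queue_excess
        self.served = 0
        self.queue = deque()

    def submit(self, request: str) -> str:
        if self.served < self.limit:
            self.served += 1
            return "served"
        if self.queue_excess:
            self.queue.append(request)  # throttling: hold for a later window
            return "queued"
        return "rejected"  # rate limiting: decline immediately

    def next_window(self) -> list:
        """Start a new window and serve queued requests first, up to the limit."""
        self.served = 0
        released = []
        while self.queue and self.served < self.limit:
            released.append(self.queue.popleft())
            self.served += 1
        return released


limiter = WindowPolicy(limit=2, queue_excess=False)
throttle = WindowPolicy(limit=2, queue_excess=True)
limiter_results = [limiter.submit(f"r{i}") for i in range(3)]
throttle_results = [throttle.submit(f"r{i}") for i in range(3)]
```

The third request is declined by the rate limiter but held by the throttle until `next_window()` releases it.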

What’s about types
A rate limiter limits the number of requests the API receives
within any given time window. Example: short webshop cart calls,
150 requests to cart per second allowed.
A concurrency limiter limits the number of requests that are
active or queued at any given time. Example: a long-lived payments
transaction registry call,
5 simultaneous transaction reports allowed.
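The two limiter types can be sketched as follows. This is an illustrative, single-threaded sketch: the class names are hypothetical, the rate limiter uses a fixed window (one of several possible algorithms), and the caller passes the current time explicitly so the behavior is easy to follow.

```python
class FixedWindowRateLimiter:
    """Counts requests per time window; rejects once the limit is reached."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_id = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window_seconds)
        if window_id != self.window_id:  # a new window started: reset the counter
            self.window_id = window_id
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class ConcurrencyLimiter:
    """Counts requests in flight; rejects while too many are active."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_acquire(self) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Call when a request finishes, however long it took.
        if self.in_flight > 0:
            self.in_flight -= 1


cart = FixedWindowRateLimiter(limit=150, window_seconds=1.0)
reports = ConcurrencyLimiter(max_in_flight=5)
```

The rate limiter does not care how long a request runs, while the concurrency limiter does not care how many requests arrive per second, only how many are active at once.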

1. Eliminate unexpected traffic patterns - internal and external spikes
2. Get rid of unwanted traffic patterns - brute-force

Rate limits are all about QoS!
The measurable end-to-end performance
properties of a network service, which can
be guaranteed in advance by a Service
Level Agreement between a user and a
service provider, so as to satisfy specific
customer application requirements.
Note: These properties may include
throughput (bandwidth), transit delay
(latency), error rates, priority, security,
packet loss, packet jitter, etc.

What’s about resources to limit
Rate limit everything
- Alexander Tokarev

Wanna math?
1. Token bucket
2. Leaky bucket
3. Fixed window
4. Sliding window
5. Sliding log
6. Timing wheel – carousel
7. Generic cell rate
8. ……..
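As an illustration of the first algorithm in the list, here is a minimal token bucket sketch. The class and parameter names are assumptions made for this example, and the caller supplies the clock value.

```python
class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request spends one."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=2, capacity=5)
burst = [bucket.allow(0.0) for _ in range(6)]   # 5 allowed, 6th rejected
later = [bucket.allow(1.0) for _ in range(3)]   # 2 tokens refilled after 1 second
```

The capacity controls the burst size and the rate controls the sustained throughput, which is why this algorithm is a common choice when short spikes are acceptable.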

State issue: global with local
Global rate limiting - rate limiting for all instances behind LB
Local rate limiting - rate limiting per service instance
          Requested RPS   Service count   Actual RPS
Local     26              3               26 * 3 = 78
Global    26              3               26
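The figures above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: a round-robin load balancer, a single time window, and a plain dict standing in for the shared counter store. The names `Instance` and `accepted` are hypothetical.

```python
class Instance:
    """A service instance enforcing a limit against a local or shared counter."""

    def __init__(self, limit: int, shared_counter: dict = None):
        self.limit = limit
        # Without a shared counter, each instance keeps its own state.
        self.counter = shared_counter if shared_counter is not None else {"count": 0}

    def allow(self) -> bool:
        if self.counter["count"] >= self.limit:
            return False
        self.counter["count"] += 1
        return True


def accepted(instances: list, total_requests: int) -> int:
    """Send requests round-robin, as a load balancer would, and count accepted ones."""
    return sum(instances[i % len(instances)].allow() for i in range(total_requests))


local_instances = [Instance(26) for _ in range(3)]
shared = {"count": 0}
global_instances = [Instance(26, shared) for _ in range(3)]

local_accepted = accepted(local_instances, 100)    # each instance allows 26
global_accepted = accepted(global_instances, 100)  # all instances share one budget
```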

State issue: key sync required
Without sync
With sync

State issue: paid RL options
The same for HAProxy for instance…

State issue – hand-made sync
Extra moving parts + Lua performance!!!

What’s about users
1. Try to tell users that they are limited
2. Say when limitations will be removed
3. Save your support team's effort

What’s about users
X-RateLimit-UserLimit: 1231513
X-RateLimit-UserRemaining: 2342
X-custom-retry-after-ms: 10000
X-ratelimit-minute: 3
X-rate-limit-hour: 1
X-RateLimit-Retry-After: 11529485261
X-Rate-Limit-Reset: Wed, 21 Oct 2015 07:28:00 GMT

What’s about headers in NGINX
Forget about rate limit headers in NGINX!!!

Rate limiting info without reaching rate limit?
1. Return rate limit headers for 20X status 
2. Dedicated rate limit info service
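The first option can be sketched as a function that attaches limit information to every response, not only to rejections. `Retry-After` is a standard HTTP header; the `RateLimit-*` names below are illustrative assumptions, as is the function name itself.

```python
def rate_limit_headers(limit: int, used: int, reset_in_seconds: int) -> tuple:
    """Return (status, headers) for a request that is the `used`-th in the window."""
    remaining = max(0, limit - used)
    headers = {
        "RateLimit-Limit": str(limit),          # assumed header name
        "RateLimit-Remaining": str(remaining),  # assumed header name
        "RateLimit-Reset": str(reset_in_seconds),  # assumed header name
    }
    if used > limit:
        # Over the limit: reject and tell the client when to retry.
        headers["Retry-After"] = str(reset_in_seconds)
        return 429, headers
    # Under the limit: still report the remaining budget.
    return 200, headers


ok_status, ok_headers = rate_limit_headers(limit=100, used=40, reset_in_seconds=30)
limited_status, limited_headers = rate_limit_headers(limit=100, used=101, reset_in_seconds=30)
```

With this shape a client can slow down before it ever sees a 429, which is the point of returning the headers on successful responses.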

What’s about where
1. Network interface
2. Code
3. Database
4. Database proxy
5. Load balancer
6. WAF
7. CNI
8. API gateway
9. Ingress controller
10. Service mesh
11. ......

My rate-limit experience
1. Databases - all about Oracle, Greenplum, BigQuery
2. API gateways - 80% DataArt projects
3. Sberbank load balancers
4. Sberbank mesh
5. A lot of RL in Java code - Bucket4j, Resilience4j mostly
6. Rate limits on Envoy-based ingress - epic fail!
My name is Alex and I used rate limits
Tell us about rate limits, Alex

Network level rate limiting
Open vSwitch

Types of RL in code
1. Via framework only
2. Via framework + external service for state

Code + external service
Which pattern is here?
Stay quiet if you know!

Pros&cons Code
+
1. Pure code is very fast
2. No infra dependencies
-
1. Multiple implementations across the company: language, library, …
2. Infra dependencies %) Own Redis, Memcached…

Database MySQL rate limiting
RL settings

Database ProxySQL throttling

Database BigQuery quotas
Could be Usage Per User Per Day as well!!

Load balancer
WHY?????
How else would they sell WAF?

AWS WAF limitations
1. IP keys only
2. Fixed time window of 5 minutes
3. Minimum rate granularity 100
4. Blocks up to 10000 IPs. If more, they pass through.

Rate limits Yandex cloud
No rate limits in NLB, ALB, or AG

Pros&cons LB
-
1. Infra does all magic
2. Vague visibility for end users
3. Vague visibility for infra - tail logs…
4. Updating a limit is magic
5. Local only without NGINX Plus
+
Infra does all magic
Traffic doesn’t enter internal network
In case of NGINX

API gateway
Internal rate limiting
External rate limiting
Rate limit all!!!

API gateway paid features
Algorithm tuning
HA/DR

GraphQL rate limiting
1. Different from REST API limiting
2. Single GraphQL call - many calls inside
3. Decision - calculate score: gateway or code
Query depth may not be 100% relevant!
Only code-based RL is relevant!

Score code-based calculation approaches
1. Query depth, annotations - https://github.com/4Catalyzer/graphql-validation-complexity
2. Fields count, annotations - https://github.com/slicknode/graphql-query-complexity
3. Cost – hand-made via Apache Calcite or https://github.com/pa-bru/graphql-cost-analysis
4. All of them in RL library + external state server - https://github.com/ravangen/graphql-rate-limit
The best part for JS only …
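The depth, field-count and cost scores listed above can be sketched without any GraphQL library. As an assumption for this example, a parsed query is represented as a nested dict of field names, and the per-field cost table is hypothetical. None of this reflects the API of the linked projects.

```python
def query_depth(selection: dict) -> int:
    """Deepest nesting level of the selection set."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def field_count(selection: dict) -> int:
    """Total number of fields requested, at every level."""
    return sum(1 + field_count(child) for child in selection.values())


def query_cost(selection: dict, field_costs: dict, default_cost: int = 1) -> int:
    """Sum of per-field costs; expensive fields get a higher weight."""
    return sum(
        field_costs.get(name, default_cost) + query_cost(child, field_costs, default_cost)
        for name, child in selection.items()
    )


# Hypothetical parsed query: { cart { items { price name } total } }
query = {"cart": {"items": {"price": {}, "name": {}}, "total": {}}}

depth = query_depth(query)
fields = field_count(query)
cost = query_cost(query, field_costs={"items": 10})
```

The resulting score, rather than the request count, is what gets charged against the caller's budget. The cost variant shows why depth alone may mislead: a shallow query touching one expensive field can outweigh a deep but cheap one.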

Ingress controller
Burst + Nodelay doesn’t work!
Replicas without state mess!

Ingress pain
Dedicated ingress
per protected API!!!

Ingress controller
Per-route rate-limiting
DaemonSet

Pros&cons Ingress
+
More or less standard
-
1. Granular management is sophisticated
2. Granular management is non-standard + extra hardware

Service mesh
Nearly the same for local rate limiting…

Service mesh
20 ms
1. Blocking read - unary gRPC mode
2. Every request call
3. No stickiness - no profit from cache

Service mesh RL further architecture
1. Bi-directional stream
2. Local counters with CRDT sync
3. Stickiness
4. Cache
5-10 ms delays. Exists in Lyft fork only
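The idea of local counters with CRDT sync can be illustrated with a grow-only counter (G-counter). Which CRDT is actually used is not stated in the slides, so this choice is an assumption, as are all names in the sketch.

```python
class GCounter:
    """Grow-only counter CRDT: each node increments only its own slot."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {node_id: 0}

    def increment(self, n: int = 1) -> None:
        # Local, non-blocking update: no network call on the request path.
        self.counts[self.node_id] += n

    def merge(self, other: "GCounter") -> None:
        # Take the per-node maximum; merging is commutative and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def total(self) -> int:
        return sum(self.counts.values())


a = GCounter("sidecar-a")
b = GCounter("sidecar-b")
a.increment(3)
b.increment(2)
a.merge(b)  # background sync, in either direction and any order
b.merge(a)
```

Each instance decides locally from `total()` and syncs in the background, trading a slightly stale count for the removal of a blocking call on every request.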

Pros&cons Service mesh
+
Very granular
-
1. Hard to maintain – 3-6 configs per RL at least
2. Extra hardware - 500Mi per service at least + rate limiter service
3. New complicated moving part

CNI
eBPF based CNI
+
Carousel algorithm
Up to 4x faster!!!

What’s about large rate limits

Rate limiter operation features
1. Endpoints for information and management
2. Hot reload
3. Rate limits as code
4. Counters + metadata DBs
- high availability
- insurance limiter
5. Many DBs for stateful layer
6. Shadow mode
7. Monitoring
8. Logs
9. Audit

Rate limiter endpoints
Just information 
Management should be implemented…

What’s about how
1. Discuss with product owners
2. Select resources to limit
3. Decide about environments
4. Calculate limit figures
5. Choose the identifier to limit on
6. Create list of exceptions

Product part for rate-limits
1. Just be silent - black hole limits
2. Let users know they are limited
3. Use CAPTCHA-magic

Environment
Distinguish production live, production sandbox and dev rate limits!!!

Calculate limit figures
Agile approach
1. Identify rate limits
2. Set rate limits
3. Get hate from customers via the support team
Smart agile approach
1. Identify rate limits
2. Set rate limits but in shadow - logs only!
3. Analyze logs and identify rate limits
4. Analyze logs and identify burst settings
5. Set rate limits and burst
6. Keep monitoring
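Shadow mode from the smart approach can be sketched as a limiter that only logs what it would have blocked. The class name, logger name and counters are hypothetical, and a single window is assumed.

```python
import logging

logger = logging.getLogger("rate_limit.shadow")


class ShadowableLimiter:
    """Enforces a limit, or only logs violations when shadow mode is on."""

    def __init__(self, limit: int, shadow: bool):
        self.limit = limit
        self.shadow = shadow
        self.count = 0
        self.would_block = 0  # how many requests exceeded the limit

    def allow(self, key: str) -> bool:
        self.count += 1
        if self.count <= self.limit:
            return True
        self.would_block += 1
        if self.shadow:
            # Shadow mode: record the violation but let the request through.
            logger.warning(
                "shadow: would rate-limit key=%s count=%d limit=%d",
                key, self.count, self.limit,
            )
            return True
        return False


shadow_limiter = ShadowableLimiter(limit=2, shadow=True)
enforcing_limiter = ShadowableLimiter(limit=2, shadow=False)
shadow_results = [shadow_limiter.allow("user-1") for _ in range(3)]
enforced_results = [enforcing_limiter.allow("user-1") for _ in range(3)]
```

The logged violations are the input for steps 3 and 4: they show which limits and burst settings real traffic would hit before any customer is affected.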

Envoy shadow mode
Local ShadowMode
for all mesh
for a route

Choose identifier
1. Per IP - what about NAT/proxy?
2. Per user - what about anonymous users?
3. Per session - what is a session?
4. Per header - what about spoofing?
5. Per subject domain ID – what if different IDs need different limits?

Hybrid limits
AddGoodsToCart(GoodsId int)
Id    Tables to insert    RPS
1     5                   5000
10    12 + lock per id    3000
Depends on item type

Hybrid limits
AddGoodsToCart(GoodsId int)
RL approach: LB by method
  Lowest RPS: 3000, highest RPS: 3000
  Pros: 1. 1 point for RL 2. No customer + service code changes
  Cons: Always low RPS
RL approach: LB by method lower/upper + item type in HTTP header
  Lowest RPS: 3000, highest RPS: 5000
  Pros: 1. No service code changes 2. Relevant RPS
  Cons: 1. Customers are aware of implementation details 2. Customer code changes 3. Business logic in LB
RL approach: 1. LB by method upper 2. Service by GoodsId lower
  Lowest RPS: 3000, highest RPS: 5000
  Pros: 1. No customer code changes 2. Relevant RPS
  Cons: 1. Service code changes 2. All cons of code RL 3. 2 places for RL
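The third approach, an upper limit per method at the LB plus a lower limit per GoodsId in the service, can be sketched as a two-stage check. The class `HybridLimiter` and its methods are hypothetical names, both stages are collapsed into one object for brevity, and a one-second window is assumed.

```python
from collections import Counter


class HybridLimiter:
    """Stage 1: per-method limit (LB). Stage 2: per-GoodsId limit (service)."""

    def __init__(self, lb_limit: int, per_goods_limits: dict):
        self.lb_limit = lb_limit
        self.per_goods_limits = per_goods_limits
        self.lb_count = 0
        self.goods_counts = Counter()

    def add_goods_to_cart(self, goods_id: int) -> str:
        # Stage 1: the LB only knows the method, so it applies the upper limit.
        if self.lb_count >= self.lb_limit:
            return "rejected by LB"
        self.lb_count += 1
        # Stage 2: the service knows the item and applies its lower limit.
        limit = self.per_goods_limits.get(goods_id, self.lb_limit)
        if self.goods_counts[goods_id] >= limit:
            return "rejected by service"
        self.goods_counts[goods_id] += 1
        return "accepted"

    def next_second(self) -> None:
        """Reset both counters at the start of a new one-second window."""
        self.lb_count = 0
        self.goods_counts.clear()


# Values from the example: 5000 RPS for the method, 3000 RPS for GoodsId 10.
limiter = HybridLimiter(lb_limit=5000, per_goods_limits={10: 3000})
```

Requests rejected by the service still count against the LB limit in this sketch, because the LB has already let them through.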

Attack options
1. Null chars in request headers + parameters - %00, %0d%0a, %0d,
%0a, %09, %0C, %20
2. Extra parameters and values in patch
3. Space characters in payload
4. IP Rotate Burp extension
5. …
AWS API Gateway based
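One way to blunt the first and third tricks, sketched under the assumption that they work because the limiter keys on the raw value while the backend treats padded values as the same identity: canonicalize the identifier before counting. The function name is hypothetical, and stripping all whitespace is a simplification that may not suit every identifier.

```python
from urllib.parse import unquote


def canonical_key(raw: str) -> str:
    """Decode percent-escapes, then drop control and whitespace characters."""
    decoded = unquote(raw)
    # isprintable() is False for NUL, CR, LF, TAB and FF; isspace() catches spaces.
    return "".join(ch for ch in decoded if ch.isprintable() and not ch.isspace())


variants = ["alice", "alice%00", "alice%0d%0a", "alice%09%0C", " alice%20"]
keys = {canonical_key(v) for v in variants}
```

All five variants collapse into one rate limit key, so padding the value no longer buys the attacker a fresh counter.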

10% success!!!!!

Reason for failure
90% use it!

Bug bounty common approach

What’s about bad guys
Never expose RL
implementation
details in headers!

Homework
1. How to hack rate limiting vulnerabilities with tools:
• https://t.ly/Cg0Q
• https://t.ly/h-XP
• https://t.ly/QMSA
2. Investigate IEEE doc
• https://t.ly/V9XF_
3. Assess the maturity of your team's rate limiting

Conclusion
1. Check rate limit attack vectors
2. Rate-limiter - perfect test task
3. At least 10 places for rate limiting
4. No ideal rate limiter - choose RL + algorithm based on a task
5. Rate limits are not only about requests
6. Rate limit everything, even internal services
7. Care about debugging
8. Please do the homework

Pain sharing time
https://t.ly/_jo3 – my presentations
https://t.ly/6JZx - my LinkedIn profile
@Shtock
Vote for the presentation!!!