AWS, and serverless computing in particular, offers us out-of-the-box scaling of our systems. In most cases this suits us very well, but… What if we need to control with surgical precision how many Lambdas are currently in use? It turns out this is not such a simple matter, because AWS stubbornly does everything it can to scale our system out, whether we want it to or not. In this talk I will present possible ways of rate limiting our functions. Our example will be communication with a 3rd-party API, where in most cases we are limited in the number of requests we can make per unit of time, lest we get hit with a 429.
How to keep a herd of Lambdas in check
1. How to keep a herd of
Lambdas in check?
Or: how to effectively scrape an external API
without getting hit with 429 - Too Many Requests.
2. About me
● 📧 Bartosz Drozd <drodevbar@protonmail.com>
● 🏙 Live and work remotely from Gdańsk, Poland
● 💻 Node.js developer @ The Software House
● ☁ In a love-hate relationship with the Cloud for 1.5 years already
● 🏸 Amateur badminton player in my spare time
4. ● Most of us have consumed some external API at least once
● Most 3rd-party APIs we consume have hard rate limits
● This means we cannot make more requests than are allowed
● Otherwise, we will be throttled
General problem overview
5. ● What if some 3rd-party service is mission critical for us?
● We have to guarantee that we won’t exceed the limit
● Nice to have: a guarantee that all requests will be processed successfully and any failed requests will
be reprocessed
● Nice to have 2: requests that are unprocessable should be stored somewhere, so we can investigate
them
General problem overview
7. What’s Onfido
Software for document ID & facial biometrics verification
A tool that enables you to verify that your future app user is a
human being with their own, unique identity and not a goat (or
even worse, a terrorist)
Identity verification happens via upload and analysis of an ID
document and a photo/video of the customer
8. ● It’s a simple REST API
● Our main resource is an applicant - it represents a person
● Each applicant resource has a connection(s) with the below resources:
○ document - identity document; their number depends on the type of document
○ photo - image and its metadata of an applicant
○ video - footage and its metadata of an applicant
○ check - aggregation of reports of a specific type
■ report - different kinds: for documents, photos, etc.
API specification
9. ● In order to fetch applicant’s data, we have to call GET /applicants/<id>
● In order to fetch the list of an applicant’s photo ids, we have to call GET /applicants/<id>/photos
● In order to fetch a specific photo’s data, we have to call GET /applicants/<id>/photos/<id>
● And so on…
API specification
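The traversal above can be sketched as a few URL builders. A minimal sketch - the base URL and version path are assumptions for illustration, not taken from the talk:

```typescript
// Hypothetical base URL - the real Onfido API host/version may differ.
const BASE_URL = "https://api.onfido.com/v3";

// Build the endpoints described above; <id> placeholders become parameters.
const applicantUrl = (applicantId: string): string =>
  `${BASE_URL}/applicants/${applicantId}`;

const applicantPhotosUrl = (applicantId: string): string =>
  `${applicantUrl(applicantId)}/photos`;

const photoUrl = (applicantId: string, photoId: string): string =>
  `${applicantPhotosUrl(applicantId)}/${photoId}`;
```

Each of these would then be fetched with a GET request carrying the API token.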
10. ● Back up all applicants’ data along with all the resources they are linked to (documents, photos, etc.)
● Save the data somewhere (S3)
● Limit the number of requests to Onfido to at most 300 requests/min
Our task
12. The naive way
● Max Lambda duration is 15 minutes, so the
process will certainly run out of time unless the
amount of data is minimal
○ not a scalable solution!
● No way to rate limit requests
to an external API
13. The better way
● Decouple the process by introducing SQS
○ good practice: add DLQ with custom
redrive policy
● 2 Lambda functions:
○ one starts the process by fetching all
applicant ids and putting them as
messages on the SQS
○ the second one performs a single applicant’s
data backup, i.e. fetches all of its data and
stores it in an S3 bucket
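The producer function can be sketched as below. This is a minimal sketch: `fetchAllApplicantIds` is a hypothetical helper, and the pure batch-building part is shown separately so it works without AWS credentials; actual sending would go through `@aws-sdk/client-sqs`:

```typescript
// Shape of an SQS SendMessageBatch entry (only the fields we need here).
interface BatchEntry {
  Id: string;
  MessageBody: string;
}

// Pure helper: turn applicant ids into SQS batch entries.
// SendMessageBatch accepts at most 10 entries per call, hence the chunking.
function toBatches(applicantIds: string[], batchSize = 10): BatchEntry[][] {
  const batches: BatchEntry[][] = [];
  for (let i = 0; i < applicantIds.length; i += batchSize) {
    batches.push(
      applicantIds.slice(i, i + batchSize).map((id, j) => ({
        Id: String(i + j),
        MessageBody: JSON.stringify({ applicantId: id }),
      }))
    );
  }
  return batches;
}

// The producer Lambda would then fetch all applicant ids (hypothetical
// fetchAllApplicantIds()) and send each batch with @aws-sdk/client-sqs,
// e.g. sqsClient.send(new SendMessageBatchCommand({ QueueUrl, Entries })).
```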
14. ● By default, Lambda pulls up to 5 batches of events from SQS
○ single batch aggregates 1 or more messages
How to rate limit the number of requests with such an approach?
● “If messages are still available, Lambda increases the number of processes that are reading batches by
up to 60 more instances per minute. The maximum number of batches that an event source mapping can
process simultaneously is 1,000.”
○ ~ AWS Docs
● Not something we necessarily want - we’ll hit the API quota pretty quickly
● How to limit the number of Lambdas that can run concurrently?
● Answer: reserved concurrency?
15. ● It guarantees the maximum number of concurrent instances for the function
● The default concurrency limit per AWS Region is 1,000 concurrent executions at any given time
● If you set concurrency of function foo to X, this means that:
○ function foo can scale up to X instances running simultaneously at most
○ it’s guaranteed that function foo will always be able to scale up to X instances
○ other functions are left with 1000 - X concurrency
● Free
What’s reserved concurrency?
● (*) Note: do not mistake it with provisioned concurrency
○ which helps deal with the cold start issue - AWS always keeps N instances of our Lambda function
warmed up
○ paid
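Assuming the Serverless Framework is used for deployment (an assumption - the talk doesn't name a tool), the two concurrency settings look like this in `serverless.yml`:

```yaml
functions:
  backupResource:
    handler: src/backupResource.handler
    # Reserved concurrency: at most 2 instances run simultaneously (free)
    reservedConcurrency: 2
    # Provisioned concurrency would instead keep instances warm (paid):
    # provisionedConcurrency: 2
```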
17. ● Number of messages in the queue: 10
● Consumer Lambda overall processing time: 5s/message
● Consumer Lambda batch size set to 1 (i.e. process 1 message at a time)
● Redrive policy (maxReceiveCount) set to 2 (i.e. a message will be attempted at most 2 times)
● Message visibility timeout: 10s
● Consumer Lambda reserved concurrency set to 2
Limit Lambda concurrency using reserved concurrency
21. Conclusions
● Reserved concurrency doesn’t impact the SQS -> Lambda connection
● Regardless of the Lambda function’s reserved concurrency, SQS -> Lambda will still behave in the same
manner, i.e. Lambda will pull up to 5 batches from the queue at a time and will scale up to 1,000 batches
24. SQS FIFO
● So far we have been using standard SQS queue
● SQS offers two queue types: standard and FIFO
25. ● Message delivery policy: standard - at least once; FIFO - exactly once
● Ordering: standard tries to preserve the order but doesn’t guarantee it; FIFO guarantees the order of the messages
● Message groups: standard - nope; FIFO - yep! (via MessageGroupId)
SQS: standard vs FIFO
26. SQS FIFO: grouping messages
● Think of message groups as separate queues inside the main queue
27. SQS FIFO: grouping messages - Lambdas as consumers
● If Lambdas are our consumers, they scale up according to the number of message groups
28. SQS FIFO: How to assign messages to msg groups?
● For example, randomly
○ rand(1, nr_of_message_groups)
○ doesn’t guarantee that all messages will be equally distributed, but in most cases it’s good enough
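The random assignment above can be sketched in a few lines (the function name is illustrative; SQS only sees the resulting `MessageGroupId` string):

```typescript
// Randomly assign a message to one of N message groups.
// Same idea as rand(1, nr_of_message_groups) on the slide, but 0-based,
// so for 2 groups it yields "0" or "1".
function randomMessageGroupId(numberOfGroups: number): string {
  return String(Math.floor(Math.random() * numberOfGroups));
}

// When sending to a FIFO queue, the value goes into the MessageGroupId
// field of the SendMessage input, alongside MessageDeduplicationId.
```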
29. ● Number of messages in the queue: 10
● Consumer Lambda overall processing time: 5s/message
● Consumer Lambda batch size set to 1 (i.e. process 1 message at a time)
● Redrive policy (maxReceiveCount) set to 2 (i.e. a message will be attempted at most 2 times)
● Message visibility timeout: 10s
● Consumer Lambda reserved concurrency set to 2
● 2 message groups - “0” and “1” - assigned randomly across the messages
○ Lambda shouldn’t pull messages at a faster pace than 2 at a time
Limit Lambda concurrency using FIFO and msg groups
31. Recall: our latest data backup process
● So far, we have been talking about limiting our
Lambda functions, but…
● We want to rate limit the number of requests
(not Lambdas!) to at most 300 per minute, but…
● We don’t know how many resources each
applicant contains, i.e. how many API calls
each Lambda will make…
● How to address that?
32. Resource-specific processing strategies
● Instead of backing up a single applicant with all of its resources at once, let’s back up a single resource type at a time
○ applicant, document, etc.
● How to do that?
● Understand the data you are dealing with - analyze the data and the relations between resources
● Introduce resource-specific strategy for each resource you want to backup
34. ● applicant resource backup strategy
1. get applicant data (GET /applicants/<id> - 1 API call) and save its data on S3
2. get list of documents (GET /applicants/<id>/documents - 1 API call) and for each document id, post a message on SQS with
resource type document
3. repeat for photos, videos and checks resources
5 API calls in total
● document resource backup strategy
1. get the document metadata (1 API call) and save the data on S3
2. get the document PDF file (1 API call) and save it on S3
2 API calls in total
● repeat for all other resources…
● This way, we have a predictable number of API calls per strategy
● We can now update our backup process flow
Resource-oriented backup strategies
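One possible shape for these strategies is sketched below. The interface, the names, and the `apiCallCount` field are illustrative assumptions, not code from the talk; the point is that each strategy declares its API call count up front:

```typescript
// Each strategy declares how many API calls it makes, which is what
// makes per-invocation throughput predictable.
interface BackupStrategy {
  resourceType: string;
  apiCallCount: number;
  backup(resourceId: string): Promise<void>;
}

const applicantStrategy: BackupStrategy = {
  resourceType: "applicant",
  apiCallCount: 5, // 1 applicant fetch + 4 resource-list fetches
  async backup(_id: string) {
    // fetch applicant, save to S3, enqueue documents/photos/videos/checks...
  },
};

const documentStrategy: BackupStrategy = {
  resourceType: "document",
  apiCallCount: 2, // metadata + PDF file
  async backup(_id: string) {
    // fetch document metadata and PDF, save both to S3...
  },
};

// Dispatch by the resourceType carried in the SQS message body.
const strategies: Record<string, BackupStrategy> = {
  applicant: applicantStrategy,
  document: documentStrategy,
};
```

As the next slide notes, the same idea can also be split into multiple Lambdas with event filter patterns instead of one dispatch map.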
35. Backup process with resource-specific backup strategies
What’s changed?
● backup-single-applicant fx() became
backup-resource fx() and it makes a predictable
number of API calls thanks to
resource-specific strategies
○ Note: strategies can be implemented as
design patterns inside 1 Lambda or as
multiple Lambdas with different event filter
patterns
● backup-resource fx() posts messages on SQS
36. 1. We know that thanks to SQS FIFO, we can control Lambda concurrency
2. We figured out how to make a predictable number of API calls per single Lambda invocation - by using resource-specific strategies
What have we discovered so far?
A question that still remains:
1. How do we put it all together to control number of backup-resource Lambda invocations in order not to exceed 300 API calls per
minute? 🤔
38. ● We can calculate how long a single Lambda has to run at least before we can invoke another one
● We can combine it with MessageGroupIds to run the processing concurrently and still not exceed the limit
Equation for minimum processing time
39. Equation for minimum processing time - example
Let’s calculate minimum processing time for 2 resource-specific strategies:
● applicant (5 API calls in total)
● and document (2 API calls in total)
Max number of requests we can make per minute: 300
Concurrency (MessageGroupIds): 2
applicant: velocity = 300/2 = 150; minProcessingTime = 60_000 / 150 * 5 = 2000 [ms] = 2.0 [s]
document: velocity = 300/2 = 150; minProcessingTime = 60_000 / 150 * 2 = 800 [ms] = 0.8 [s]
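The calculation above can be written as a small helper (the function name is mine, the formula is the one from the slide):

```typescript
// minProcessingTime [ms] for one invocation of a given strategy:
// velocity = per-minute request budget split across concurrent groups;
// each invocation must then occupy at least (60_000 / velocity) * apiCalls ms.
function minProcessingTimeMs(
  maxRequestsPerMinute: number,
  concurrency: number,
  apiCallsPerInvocation: number
): number {
  const velocity = maxRequestsPerMinute / concurrency;
  return (60_000 / velocity) * apiCallsPerInvocation;
}
```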
40. Lambda code including minimum processing time
● If the function takes less time than minProcessingTime, we sleep for the difference: minProcessingTime - actualDuration
● If the function runs for minProcessingTime or longer, there’s nothing to do here - we are fine
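A minimal sketch of that wrapper (names are mine; the talk's actual handler code isn't reproduced here):

```typescript
// How long to sleep after the real work so the invocation
// occupies at least minProcessingTime in total.
function remainingSleepMs(elapsedMs: number, minMs: number): number {
  return Math.max(0, minMs - elapsedMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Wrap the actual backup work so the invocation never finishes
// faster than minProcessingTime.
async function withMinProcessingTime(
  work: () => Promise<void>,
  minMs: number
): Promise<void> {
  const start = Date.now();
  await work();
  await sleep(remainingSleepMs(Date.now() - start, minMs));
}
```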
43. How about the costs?
● For a Lambda function, we pay for 2 things:
○ number of requests ($0.20 per 1M requests)
○ duration ($0.0000166667 for every GB-second)
● It doesn’t matter whether the Lambda sleeps (does nothing) or does some heavy lifting - we pay for the overall execution time
● If our functions sleep for the majority of the processing time, our costs will increase
● Is it then a good idea to add the minimum processing time?
44. How about the costs?
It turns out that we can reduce Lambda sleep time to the bare minimum by manipulating concurrency.
● If Lambdas sleep for most of the time, reduce concurrency - fewer Lambdas, more requests to process per Lambda
● If overall processing is slower than expected, increase concurrency - more Lambdas, fewer requests to process per Lambda
= faster processing
45. Summary
Rate limiting requests to an external API via Lambda & FIFO queue
● By using FIFO instead of standard SQS, we can take advantage of message groups
○ Think of message groups as your concurrency limit
● Instead of naively processing resources, analyze the data and relations between them
○ Relations will help you create resource-oriented processing strategies
● Calculate the minimum processing time per Lambda using the formula
○ to reduce sleep time, tune concurrency to find the optimal value