Serverless is the new hotness. It works great for parallelized workflows and event-driven systems. But what happens when your boss comes along and needs you to build a serverless system that works with a SQL database? Or you need to scrape a website, but you need to rate limit your scraping? For that matter, how can you build a serverless state machine?
13. Working With SQL Databases
Most SQL databases are not serverless-friendly
They do not scale horizontally
14. Working With SQL Databases
Each DB connection is handled on an independent thread of execution
Each thread of execution requires memory set aside for it
Switching execution between threads adds overhead
Most SQL databases limit concurrent clients
15. Working With SQL Databases
import sys
import logging
import rds_config
import pymysql

#rds settings
rds_host = "rds-instance-endpoint"
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

logger = logging.getLogger()
logger.setLevel(logging.INFO)

try:
    conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
except pymysql.MySQLError:
    logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
    sys.exit()
logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")

def handler(event, context):
    """
    This function fetches content from the MySQL RDS instance
    """
    ...
https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds-deployment-pkg.html
This connection is never closed until the function instance is terminated
16. Working With SQL Databases
Each function has a set of idle instances with open connections
Five functions, each with instances holding 20/20 connections
Total Connections: 100
17. Working With SQL Databases
One function scales its pool up to 40 connections (40/40); the other four drop to 15/20 each
Total Connections: 120
18. Working With SQL Databases
A second function scales to 40/40 while the first idles at 15/40 and the rest sit at 15/20
Total Connections: 140!
20. Working With SQL Databases
Solution I: Wait for AWS, Azure, GCP to solve the problem
https://www.flickr.com/photos/mandj98/5079617428
Sir Aurora Serverless d’AWS has no limits
21. Working With SQL Databases
import sys
import logging
import rds_config
import pymysql

#rds settings
rds_host = "rds-instance-endpoint"
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    """
    This function fetches content from the MySQL RDS instance
    """
    try:
        conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
    except pymysql.MySQLError:
        logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
        sys.exit()
    logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")
    ...
Solution II: Open the connection inside the handler, once per invocation
26. Rate Limiting
Scenario:
• Marketing has syndicated content out
• Need to page scrape websites to find out where the content is ranking
• Need to rate limit web scraping
35. Rate Limiting
Fillable Queue
• Contains an amount of work to perform
• Retryable
• Providers
• Queuing services
• Databases
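As a rough illustration (an in-memory stand-in, not a real queuing service or database), a fillable queue holds a batch of work and re-enqueues items that fail; the `drain` helper and `flaky_scrape` callable below are hypothetical names, not part of any provider API:

```python
from collections import deque

MAX_ATTEMPTS = 3

def drain(queue: deque, work) -> list:
    """Process every item in the queue, re-enqueuing failures (retryable)."""
    failed = []
    while queue:
        item, attempts = queue.popleft()
        try:
            work(item)
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                queue.append((item, attempts + 1))  # retryable: put it back
            else:
                failed.append(item)                 # retries exhausted
    return failed

# Fill the queue with an amount of work to perform
q = deque((f"page-{i}", 0) for i in range(5))

calls = {}
def flaky_scrape(page):
    # hypothetical scraper that fails the first time for page-2
    calls[page] = calls.get(page, 0) + 1
    if page == "page-2" and calls[page] == 1:
        raise RuntimeError("timeout")

failed = drain(q, flaky_scrape)
print(failed)  # page-2 succeeds on retry → []
```

A real provider-backed queue would add visibility timeouts and persistence, but the retry contract is the same.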
36. Rate Limiting
Solution II: Use a partitioned queue (AWS Kinesis Streams or Azure Event Hubs)
• Provision queue with 50 partitions for concurrency
• Each minute, add 1,000 pages to queue
• 50 Scraper functions will be invoked with a batch of ~20 pages each
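The arithmetic above can be sketched locally. This is a minimal simulation of the partitioning (no real Kinesis or Event Hubs client), assuming pages are keyed by URL and distributed by a stable hash, the way a partition key would be:

```python
import hashlib

NUM_PARTITIONS = 50  # provisioned shard/partition count

def partition_for(key: str) -> int:
    # Stable hash of the partition key, like a queue partition key
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Each minute, 1,000 pages are added to the queue
pages = [f"https://example.com/page/{i}" for i in range(1000)]

batches = {p: [] for p in range(NUM_PARTITIONS)}
for page in pages:
    batches[partition_for(page)].append(page)

# One Scraper invocation per partition, each with a batch of ~20 pages
sizes = [len(b) for b in batches.values()]
print(f"{len(batches)} concurrent invocations, "
      f"avg batch size {sum(sizes) / len(sizes):.0f}")
```

The partition count caps concurrency (50 invocations), and the fill rate per minute caps throughput, which is what makes this a rate limiter.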
37. Rate Limiting
Solution II: Use a partitioned queue
Pros
• Retryable
• Configurable (batching & concurrency)
39. Rate Limiting
Solution III: Use a Key-Value store with a stream (AWS DynamoDB or Azure Cosmos DB)
• Each minute add 1,000 pages to KV store
• Page records are batched into 20-page messages that invoke the Scraper function
• Concurrency is possible, but not configurable
40. Rate Limiting
Solution III: Use a Key-Value store with a stream
Pros
• Retryable
• Somewhat configurable (batching)
• Queue visibility
41. Rate Limiting
Solution III: Use a Key-Value store with a stream
Cons
• Concurrency not configurable
• A little expensive (pay per KV store throughput)
42. Rate Limiting
What should you choose?
If you’re ok with errors, use a spawner function
If you need precise control over retries and concurrency, use a partitioned queue
If you are cost conscious, investigate a Key-Value store with a stream, or use a spawner function with extra error handling
47. State Management
Concurrency Coordination Example: Map / Reduce
• Add a record to a database with a count of the concurrent invocations
• Perform invocations
• At completion, each invocation atomically decrements the concurrent count
• If no concurrent invocations remain, transition to the next state
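The coordination steps above can be sketched locally. This in-memory simulation uses a lock-guarded counter as a stand-in for an atomic database record, and a simple flag as a stand-in for the state transition; `Coordinator` and `invocation` are illustrative names, not a real framework:

```python
import threading

class Coordinator:
    """Stands in for a database record tracking concurrent invocations."""
    def __init__(self, count):
        self.count = count           # the recorded concurrent-invocation count
        self.lock = threading.Lock()
        self.transitioned = False

    def invocation_done(self):
        # Atomically decrement the concurrent count
        with self.lock:
            self.count -= 1
            if self.count == 0:      # last one out transitions to the next state
                self.transitioned = True

def invocation(coord, results, i):
    results[i] = i * i               # the "map" work
    coord.invocation_done()          # the fan-in bookkeeping

N = 8
coord = Coordinator(N)               # step 1: record the count of invocations
results = [None] * N
threads = [threading.Thread(target=invocation, args=(coord, results, i))
           for i in range(N)]
for t in threads: t.start()          # step 2: perform invocations concurrently
for t in threads: t.join()

print(coord.transitioned, sum(results))  # True 140
```

In a real serverless setup the decrement must be a conditional/atomic update in the database itself, since the invocations share no memory.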
49. State Management
State Transitions Example: Custom Retry Logic
• Attempt to execute function
• On failure, check “retries” property of the input message
• If retries > MAX_RETRIES, then report failure
• Otherwise, increment “retries” property in message, then reinvoke self
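A minimal local sketch of that retry loop, with direct recursion standing in for an asynchronous self-reinvocation; the `attempt` callable and `log` list are hypothetical stand-ins for the real work and failure reporting:

```python
MAX_RETRIES = 3

def handler(message, attempt, log):
    try:
        return attempt(message)                        # attempt to execute
    except Exception:
        retries = message.get("retries", 0)            # check "retries" on the input
        if retries > MAX_RETRIES:
            log.append("failure reported")             # give up and report
            return None
        message = {**message, "retries": retries + 1}  # increment "retries"
        return handler(message, attempt, log)          # reinvoke self

def always_fails(message):
    raise RuntimeError("boom")

log = []
handler({"job": "scrape"}, always_fails, log)
print(log)  # ['failure reported']
```

The retry count rides along in the message itself, so no external state store is needed for this pattern.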