Serverless is the new hotness. It works great for parallelized workflows and event-driven systems. But what happens when your boss comes along and needs you to build a serverless system that works with a SQL database? Or you need to scrape a website, but you need to rate limit your scraping? For that matter, how can you build a serverless state machine?
13. Working With SQL Databases
Most SQL databases are not serverless-friendly
They do not scale horizontally
14. Working With SQL Databases
Each DB connection is handled on an independent thread of execution
Each thread of execution requires memory set aside for it
Switching execution between threads adds overhead
Most SQL databases limit concurrent clients
15. Working With SQL Databases
import sys
import logging
import rds_config
import pymysql

#rds settings
rds_host = "rds-instance-endpoint"
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

logger = logging.getLogger()
logger.setLevel(logging.INFO)

try:
    conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
except pymysql.MySQLError:
    logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
    sys.exit()
logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")

def handler(event, context):
    """
    This function fetches content from the MySQL RDS instance
    """
    ...
https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds-deployment-pkg.html
This connection is never closed until the function instance is terminated
16. Working With SQL Databases
Each function has a set of idle instances with open connections
Five functions, each with instances holding 20/20 connections
Total Connections: 100
17. Working With SQL Databases
One function scales its pool up to 40 connections (40/40); the other four drop to 15/20 each
Total Connections: 120
18. Working With SQL Databases
A second function scales to 40/40 while the first idles at 15/40 and the rest sit at 15/20
Total Connections: 140!
20. Working With SQL Databases
Solution I: Wait for AWS, Azure, GCP to solve the problem
https://www.flickr.com/photos/mandj98/5079617428
Sir Aurora Serverless d’AWS has no limits
21. Working With SQL Databases
import sys
import logging
import rds_config
import pymysql

#rds settings
rds_host = "rds-instance-endpoint"
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    """
    This function fetches content from the MySQL RDS instance
    """
    try:
        conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
    except pymysql.MySQLError:
        logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
        sys.exit()
    logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")
    ...
Solution II: Open the connection inside the handler, once per invocation
26. Rate Limiting
Scenario:
• Marketing has syndicated content out
• Need to page scrape websites to find out where the content is ranking
• Need to rate limit web scraping
35. Rate Limiting
Fillable Queue
• Contains an amount of work to perform
• Retryable
• Providers
• Queuing services
• Databases
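As a rough illustration (an in-memory stand-in, not a real queuing service or database), a fillable queue holds a batch of work and re-enqueues items that fail; the `drain` helper and `flaky_scrape` callable below are hypothetical names, not part of any provider API:

```python
from collections import deque

MAX_ATTEMPTS = 3

def drain(queue: deque, work) -> list:
    """Process every item in the queue, re-enqueuing failures (retryable)."""
    failed = []
    while queue:
        item, attempts = queue.popleft()
        try:
            work(item)
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                queue.append((item, attempts + 1))  # retryable: put it back
            else:
                failed.append(item)                 # retries exhausted
    return failed

# Fill the queue with an amount of work to perform
q = deque((f"page-{i}", 0) for i in range(5))

calls = {}
def flaky_scrape(page):
    # hypothetical scraper that fails the first time for page-2
    calls[page] = calls.get(page, 0) + 1
    if page == "page-2" and calls[page] == 1:
        raise RuntimeError("timeout")

failed = drain(q, flaky_scrape)
print(failed)  # page-2 succeeds on retry → []
```

A real provider-backed queue would add visibility timeouts and persistence, but the retry contract is the same.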
36. Rate Limiting
Solution II: Use a partitioned queue (AWS Kinesis Streams or Azure Event Hubs)
• Provision queue with 50 partitions for concurrency
• Each minute, add 1,000 pages to queue
• 50 Scraper functions will be invoked with a batch of ~20 pages each
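The arithmetic above can be sketched locally. This is a minimal simulation of the partitioning (no real Kinesis or Event Hubs client), assuming pages are keyed by URL and distributed by a stable hash, the way a partition key would be:

```python
import hashlib

NUM_PARTITIONS = 50  # provisioned shard/partition count

def partition_for(key: str) -> int:
    # Stable hash of the partition key, like a queue partition key
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Each minute, 1,000 pages are added to the queue
pages = [f"https://example.com/page/{i}" for i in range(1000)]

batches = {p: [] for p in range(NUM_PARTITIONS)}
for page in pages:
    batches[partition_for(page)].append(page)

# One Scraper invocation per partition, each with a batch of ~20 pages
sizes = [len(b) for b in batches.values()]
print(f"{len(batches)} concurrent invocations, "
      f"avg batch size {sum(sizes) / len(sizes):.0f}")
```

The partition count caps concurrency (50 invocations), and the fill rate per minute caps throughput, which is what makes this a rate limiter.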
37. Rate Limiting
Solution II: Use a partitioned queue
Pros
• Retryable
• Configurable (batching & concurrency)
39. Rate Limiting
Solution III: Use a Key-Value store with a stream (AWS DynamoDB or Azure Cosmos DB)
• Each minute add 1,000 pages to KV store
• Page records are batched into 20-page messages that invoke the Scraper function
• Concurrency is possible, but not configurable
40. Rate Limiting
Solution III: Use a Key-Value store with a stream
Pros
• Retryable
• Somewhat configurable (batching)
• Queue visibility
41. Rate Limiting
Solution III: Use a Key-Value store with a stream
Cons
• Concurrency not configurable
• A little expensive (pay per KV store throughput)
42. Rate Limiting
What should you choose?
If you’re ok with errors, use a spawner function
If you need precise control over retries and concurrency, use a partitioned queue
If you are cost conscious, investigate a Key-Value store with a stream, or use a spawner function with extra error handling
47. State Management
Concurrency Coordination Example: Map / Reduce
• Add a record to a database with a count of the concurrent invocations
• Perform invocations
• At completion, each invocation atomically decrements the concurrent count
• If no concurrent invocations remain, transition to the next state
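The coordination steps above can be sketched locally. This in-memory simulation uses a lock-guarded counter as a stand-in for an atomic database record, and a simple flag as a stand-in for the state transition; `Coordinator` and `invocation` are illustrative names, not a real framework:

```python
import threading

class Coordinator:
    """Stands in for a database record tracking concurrent invocations."""
    def __init__(self, count):
        self.count = count           # the recorded concurrent-invocation count
        self.lock = threading.Lock()
        self.transitioned = False

    def invocation_done(self):
        # Atomically decrement the concurrent count
        with self.lock:
            self.count -= 1
            if self.count == 0:      # last one out transitions to the next state
                self.transitioned = True

def invocation(coord, results, i):
    results[i] = i * i               # the "map" work
    coord.invocation_done()          # the fan-in bookkeeping

N = 8
coord = Coordinator(N)               # step 1: record the count of invocations
results = [None] * N
threads = [threading.Thread(target=invocation, args=(coord, results, i))
           for i in range(N)]
for t in threads: t.start()          # step 2: perform invocations concurrently
for t in threads: t.join()

print(coord.transitioned, sum(results))  # True 140
```

In a real serverless setup the decrement must be a conditional/atomic update in the database itself, since the invocations share no memory.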
49. State Management
State Transitions Example: Custom Retry Logic
• Attempt to execute function
• On failure, check “retries” property of the input message
• If retries > MAX_RETRIES, then report failure
• Otherwise, increment “retries” property in message, then reinvoke self
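A minimal local sketch of that retry loop, with direct recursion standing in for an asynchronous self-reinvocation; the `attempt` callable and `log` list are hypothetical stand-ins for the real work and failure reporting:

```python
MAX_RETRIES = 3

def handler(message, attempt, log):
    try:
        return attempt(message)                        # attempt to execute
    except Exception:
        retries = message.get("retries", 0)            # check "retries" on the input
        if retries > MAX_RETRIES:
            log.append("failure reported")             # give up and report
            return None
        message = {**message, "retries": retries + 1}  # increment "retries"
        return handler(message, attempt, log)          # reinvoke self

def always_fails(message):
    raise RuntimeError("boom")

log = []
handler({"job": "scrape"}, always_fails, log)
print(log)  # ['failure reported']
```

The retry count rides along in the message itself, so no external state store is needed for this pattern.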