AWS Lambda has changed the way we deploy and run software, but the serverless paradigm has brought new challenges to old problems: How do you test a cloud-hosted function locally? How do you monitor it? What about logging and config management? And how do you start migrating from existing architectures?
Yan Cui shares solutions to these challenges, drawing on his experience running Lambda in production and migrating from an existing monolithic architecture.
2. What’s in this talk?
! how to responsibly run a serverless architecture (aka. how to do
ops in serverless)
! testing, CI/CD
! logging, distributed tracing, monitoring
! config management, securing secrets
! coldstarts
! gotchas/limitations + workarounds/hacks
13. Before
! hidden complexities and dependencies
! low utilisation to leave headroom for large spikes
! EC2 scaling is slow, so scale earlier
! paying for lots of unused resources
! up to 30 mins to deploy
! deployments required downtime
14. “lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
70. “…We find that tests that mock external libraries
often need to be complex to get the code into the
right state for the functionality we need to exercise.
The mess in such tests is telling us that the design
isn’t right but, instead of fixing the problem by
improving the code, we have to carry the extra
complexity in both code and test…”
Don’t Mock Types You Can’t Change
71. “…The second risk is that we have to be sure
that the behaviour we stub or mock matches
what the external library will actually do…
Even if we get it right once, we have to make
sure that the tests remain valid when we
upgrade the libraries…”
Don’t Mock Types You Can’t Change
77. what unit tests will not tell you…
API Gateway, Lambda, DynamoDB
! is our request correct?
! is the request mapping set up?
! are the API resources configured correctly?
! are we assuming the correct schema?
! is Lambda proxy configured correctly?
! is the IAM policy set up correctly?
! is the table created?
79. observation
most Lambda functions are simple and have a
single purpose, so the risk of shipping broken
software has largely shifted to how they
integrate with external services
81. But it slows down
my feedback loop…
IT’S NOT
ABOUT YOU!
84. …if a service can’t provide you with
a relatively easy way to test the
interface in reality, then you should
consider using another one.
Paul Johnston
85. “…Wherever possible, an acceptance test
should exercise the system end-to-end without
directly calling its internal code.
An end-to-end test interacts with the system
only from the outside: through its interface…”
Testing End-to-End
96. “…We prefer to have the end-to-end tests
exercise both the system and the process
by which it’s built and deployed…
This sounds like a lot of effort (it is), but has
to be done anyway repeatedly during the
software’s lifetime…”
Testing End-to-End
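One way to act on the quotes above is to write each test case once and run it in two modes: calling the handler directly (integration) or calling the deployed HTTP interface from the outside (acceptance). The sketch below is only an illustration under assumptions: the `handler`, the `/hello/{name}` route, the `mode` argument and the `baseUrl` parameter are all hypothetical and not taken from the talk.

```javascript
const http = require("http");
const https = require("https");

// GET a URL and resolve with the status code and the parsed JSON body.
function getJson(url) {
  const client = url.startsWith("https:") ? https : http;
  return new Promise((resolve, reject) => {
    client
      .get(url, (res) => {
        let raw = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => { raw += chunk; });
        res.on("end", () => {
          try {
            resolve({ statusCode: res.statusCode, body: JSON.parse(raw) });
          } catch (err) {
            reject(err);
          }
        });
      })
      .on("error", reject);
  });
}

// Hypothetical function under test, written in Lambda proxy style.
const handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify({ message: `hello ${event.pathParameters.name}` }),
});

// The same "when" step serves both modes:
//   integration: invoke the handler code directly
//   acceptance:  go through the HTTP interface only, never the internal code
async function whenWeGetHello(name, mode, baseUrl) {
  if (mode === "acceptance") {
    return getJson(`${baseUrl}/hello/${name}`);
  }
  const res = await handler({ pathParameters: { name } });
  return { statusCode: res.statusCode, body: JSON.parse(res.body) };
}
```

In acceptance mode, `baseUrl` would point at the deployed API, so the same assertions also cover request mapping, permissions and deployment.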
143. logs
console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
metrics
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
format: MONITORING|timestamp|metric value|metric type|metric name
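A function that consumes these log lines has to tell metrics apart from ordinary logs. The sketch below shows one possible parser for the pipe-delimited format on this slide. The names `parseMetric` and `splitLogs` are hypothetical, and the timestamp is kept as a plain number without assuming its unit.

```javascript
// Parse a line of the form MONITORING|timestamp|value|type|name.
// Returns null for anything else so the caller can treat it as a normal log.
function parseMetric(line) {
  const parts = line.trim().split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") {
    return null;
  }
  const [, rawTimestamp, rawValue, type, name] = parts;
  const timestamp = Number(rawTimestamp);
  const value = Number(rawValue);
  if (rawTimestamp === "" || rawValue === "" ||
      Number.isNaN(timestamp) || Number.isNaN(value)) {
    return null;
  }
  return { timestamp, value, type, name };
}

// Split a batch of log lines into ordinary logs and parsed metrics.
function splitLogs(lines) {
  const logs = [];
  const metrics = [];
  for (const line of lines) {
    const metric = parseMetric(line);
    if (metric) {
      metrics.push(metric);
    } else {
      logs.push(line);
    }
  }
  return { logs, metrics };
}
```

Writing metrics to stdout keeps the function itself free of extra network calls; the parsing and forwarding can happen asynchronously wherever the logs are processed.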
165. Why not consul or etcd?
! multiple EC2 instances in multi-AZ for HA
! have to manage servers, patch OS, patch software, etc.
! learning curve for configuring the service
! learning curve for using the CLI tools
174. Requirements for client library
! standardise and encapsulate how you manage configs
! supports client-side caching (fetch & cache at coldstart)
! invalidate cache at interval
! invalidate cache explicitly when staleness is detected
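The requirements above could be met by a small wrapper along these lines. This is a minimal sketch, not the library from the talk: `createConfigClient`, `loadConfig`, `expiryMs` and the injectable `now` clock are all hypothetical names, and `loadConfig` stands in for whatever call fetches the config from its store.

```javascript
// loadConfig: hypothetical async function returning the latest config.
// expiryMs:   how long a cached value is considered fresh.
// now:        clock function, injectable so the expiry logic can be tested.
function createConfigClient(loadConfig, expiryMs, now = Date.now) {
  let cached = null; // last fetched config, null until the first fetch
  let fetchedAt = 0; // clock reading at the last successful fetch

  // Fetch at coldstart (first call), then serve from cache until it expires.
  async function get() {
    const stale = cached === null || now() - fetchedAt >= expiryMs;
    if (stale) {
      cached = await loadConfig();
      fetchedAt = now();
    }
    return cached;
  }

  // Explicit invalidation, e.g. when a call fails with a stale credential.
  function invalidate() {
    cached = null;
  }

  return { get, invalidate };
}
```

Creating the client outside the handler means the cache survives across invocations on the same container, so most invocations pay no fetch cost.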
198. AWS Lambda
docs
Take advantage of container re-use to improve the
performance of your function. Make sure any
externalized configuration or dependencies that your
code retrieves are stored and referenced locally after initial
execution. Limit the re-initialization of variables/objects on
every invocation. Instead use static initialization/
constructor, global/static variables and singletons. Keep
alive and reuse connections (HTTP, database, etc.) that
were established during a previous invocation.
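In Node.js, the advice quoted above roughly translates to doing expensive setup at module level, outside the handler. The sketch below is illustrative only: the handler body and the counters are hypothetical, while `https.Agent` with `keepAlive: true` is the standard Node.js way to reuse connections.

```javascript
const https = require("https");

// Module-level code runs once per container, at coldstart.
// Passing this agent to https.request({ agent, ... }) lets later
// invocations reuse sockets opened by earlier ones.
const agent = new https.Agent({ keepAlive: true });

// Module-level state survives between invocations on the same container.
let invocationCount = 0;

const handler = async () => {
  invocationCount += 1;
  return {
    coldStart: invocationCount === 1, // true only for the first invocation
    invocationCount,
  };
};

exports.handler = handler;
```

Nothing here is guaranteed to persist: a new container starts again from a fresh module scope, so the code must still work when every value has to be re-initialised.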
http://amzn.to/2jzLmkb
202. AWS Lambda
docs
AWS Lambda polls your stream and
invokes your Lambda function.
Therefore, if a Lambda function fails,
AWS Lambda attempts to process the
erring batch of records until the time
the data expires.
http://amzn.to/2vs2lIg
203. processing halts until failed
events are retried successfully/
expired from stream
vs
prioritize realtime-ness,
retry failed events with best effort,
then skip
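The second option (best-effort retries, then skip) might look like the sketch below. It is written under assumptions: `createHandler`, `processWithRetry`, `processRecord` and the limit of 3 attempts are hypothetical choices, while `event.Records[].kinesis.data` (base64) and `kinesis.sequenceNumber` follow the Kinesis event shape that Lambda delivers.

```javascript
const MAX_ATTEMPTS = 3; // assumption: tune to how much delay is acceptable

// Try one payload up to maxAttempts times; report whether it succeeded.
async function processWithRetry(payload, processRecord, maxAttempts = MAX_ATTEMPTS) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      await processRecord(payload);
      return true;
    } catch (err) {
      console.log(`attempt ${attempt}/${maxAttempts} failed: ${err.message}`);
    }
  }
  return false;
}

// processRecord: hypothetical async function holding the business logic.
function createHandler(processRecord) {
  return async (event) => {
    const skipped = [];
    let processed = 0;
    // Records are handled one at a time to preserve their order.
    for (const record of event.Records) {
      const payload = Buffer.from(record.kinesis.data, "base64").toString("utf8");
      const ok = await processWithRetry(payload, processRecord);
      if (ok) {
        processed += 1;
      } else {
        skipped.push(record.kinesis.sequenceNumber);
      }
    }
    // Returning normally instead of throwing means the batch is not retried,
    // so one bad record cannot halt processing of the shard.
    return { processed, skipped };
  };
}
```

Skipped records are lost unless they are recorded somewhere, so in practice the `skipped` list would be logged or sent on for later inspection.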
208. AWS Lambda
docs
Each shard can support up to
5 transactions per second for
reads, up to a maximum total data
read rate of 2 MB per second.
http://amzn.to/2ubyaot
209. AWS Lambda
docs
If your stream has 100 active
shards, there will be 100 Lambda
functions running concurrently.
Then, each Lambda function
processes events on a shard in
the order that they arrive.
http://amzn.to/2ubyaot