What does Serverless mean for DevOps, in practical terms? While Serverless does reduce the need for server-centric DevOps, it poses new challenges in many areas including security, app deployment and cloud resource provisioning, partly due to an explosion of "nanoservices". Based on a current project using AWS, we cover relevant tools, techniques and tips to deliver a smooth serverless experience for development through to production.
Delivered at Bristol DevOps meetup, 27 Jun 2018. To see detailed notes covering extra points not on the slides, click the Notes link just below (or download the PowerPoint).
Update: here's the correct link for the Gojko Adzic talk on the Backendless slide - https://www.youtube.com/watch?v=w7X4gAQTk2E
2. $ whoami
• DevOps engineer, working as contractor
• Serverless, Terraform, AWS
• Ansible, Docker, Linux, databases
• Startups and enterprises
• Currently engaged at Seccl Technology
• Fintech startup building innovative API-based platform
@rdonkin
linkedin.com/in/rdonkin
tempohq.net
3. • What is Serverless?
• What Changes for DevOps?
• DevOps != Server Management
• DevOps Areas for Serverless
• App Architectures
• Automation Tools
• Environments
• Monitoring and Observability
• Cold Start
• Security
Topics
11. DevOps != Server Management
• Agile for Infrastructure (Patrick Debois)
• "Better software, faster and more safely" (Helen Beal)
12. DevOps Areas for Serverless
Configuration management
Deployment
• Cloud resource provisioning
• App deployment
Management
• Monitoring, alerting
• Observability, logging
• Application performance
• Application Cost Management
Developer Environments
CI/CD
High availability
• Multi-region
Security
• Access controls
• Authentication inc. MFA
• Secret management
• Intrusion detection/prevention
• Auditing
Dependency management
• Software supply chain
All delivered using Infrastructure as Code
13. When Not To Use Serverless
• Long-running functions
• Max 5 min on AWS
• CaaS alternatives: ECS, EKS, Fargate, AWS Batch
• Server alternatives: EC2, SQS
• Low-latency functions
• Lambda "cold start" on AWS – delays of 100ms to 10 sec
• Workload is flat, or very high compute
• Spiky workloads are better fit
• Consider TCO of equivalent solution inc. HA, scaling
• Existing apps
• Move slices into serverless (background processing and operations tasks)
15. Serverless App Architectures
Option        | API                           | Functions   | Comment
Monolith      | POST /api                     | backend     | Single function for the whole app
Microservices | POST /custs (add customer)    | customers   | Function per resource, e.g. customer
              | GET /custs/:id (get customer) | customers   | (ditto)
Nanoservices  | POST /custs (add customer)    | create-cust | Function per endpoint, e.g. add customer
              | GET /custs/:id (get customer) | get-cust    |
Choice affects cold start, monitoring, separate deployment, …
Nanoservices model is very common – it leads to 100s to 1000s of cloud resources
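To make the table concrete, here is a minimal sketch of the same two endpoints in the two extreme models (handler names and event shapes are simplified from API Gateway's real format, purely for illustration):

```python
# Nanoservices model: one small handler deployed per endpoint.
def create_cust_handler(event, context):
    # POST /custs -> deployed as its own 'create-cust' function
    return {"statusCode": 201, "body": "created"}

def get_cust_handler(event, context):
    # GET /custs/:id -> deployed as its own 'get-cust' function
    cust_id = event["pathParameters"]["id"]
    return {"statusCode": 200, "body": "customer " + cust_id}

# Monolith model: a single deployed function routes every request internally.
def monolith_handler(event, context):
    route = (event["httpMethod"], event["resource"])
    if route == ("POST", "/custs"):
        return create_cust_handler(event, context)
    if route == ("GET", "/custs/{id}"):
        return get_cust_handler(event, context)
    return {"statusCode": 404, "body": "not found"}
```

In the nanoservices model each small handler gets its own cold-start profile, its own logs and metrics, and its own deployment unit – which is exactly why the number of cloud resources explodes.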
16. Automation Tools
• Cloud Resource Provisioning
• Important resources
• IAM access controls
• Serverless "frameworks"
• Deploy functions
• Provision resources bound to functions
• Easy 'build, deploy, test' for developers
• Configuration management
• Parameter model + templating
• Developer laptops, Jenkins, ELK
• Pure templating tools also relevant
AWS SAM
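As an illustration of the "framework" role – deploying functions and provisioning the resources bound to them – here is a minimal Serverless Framework `serverless.yml` sketch (service, handler and path names are hypothetical):

```yaml
# Hypothetical serverless.yml sketch: deploys one function and
# provisions the API Gateway endpoint bound to it, per stage.
service: custs-api

provider:
  name: aws
  runtime: nodejs8.10
  stage: ${opt:stage, 'dev'}    # e.g. serverless deploy --stage devfrisby

functions:
  get-cust:
    handler: handler.getCust
    events:
      - http:
          path: custs/{id}
          method: get
# The deployed Lambda name defaults to custs-api-<stage>-get-cust,
# so every stage environment gets its own isolated copy.
```

One `serverless deploy --stage <name>` then gives a developer the full build/deploy/test loop against a personal stage environment.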
17. Environments and Naming
• Stage environment
• All resources include stage env name
• Examples: devfrisby, staging, prod
• Serverless Framework – uses 'stage' in all resources
• Very quick and cheap to create and destroy
• Personal environment in cloud, limited isolation
• Core environment:
• AWS account
• Limit blast radius
• Supports various stage envs
• Examples: dev, test, prod
• IAM roles - assume-role from master account
• Define naming rules for everything
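The "naming rules for everything" point can be encoded once as a tiny helper that every tool shares (the convention shown here is hypothetical):

```python
def resource_name(project, stage, resource):
    """Build a resource name as <project>-<stage>-<resource>.

    Illustrative convention: lowercase, hyphen-separated, and
    length-bounded so the same name is valid for S3 buckets, IAM
    roles, Lambda functions, etc.
    """
    name = "-".join([project, stage, resource])
    if name != name.lower() or len(name) > 64:
        raise ValueError("invalid resource name: " + name)
    return name
```

Because the stage env name is baked into every resource, `resource_name("custs-api", "devfrisby", "events")` and its prod equivalent can never collide, and a resource's owner is obvious at a glance.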
18. Monitoring, Logging, Observability
• No server or network monitoring needed!
• No servers to install agents on …
• May need 'middleware hook' on every function
• Monitor app health & performance
• IOPipe, Dashbird, Epsagon, Thundra, …
• Some direct, some via CloudWatch
• Logging – CloudWatch Logs, ELK/EFK
• Debugging
• Distributed tracing – AWS X-Ray
• Observability – Honeycomb
• Charity Majors talk
Monitoring = unit tests for ops
Observability = debugging tool for ops
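With no servers to install agents on, most monitoring data flows through logs. A common pattern is one structured JSON object per log line, which CloudWatch Logs metric filters or ELK/EFK can then query on any field – a minimal sketch, assuming you control the log format:

```python
import json
import time

def log_event(level, message, **fields):
    # Emit one JSON object per line; in Lambda, stdout goes straight
    # to CloudWatch Logs, where filters can match on any field.
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    print(json.dumps(record))
    return record

# e.g. inside a handler:
# log_event("info", "customer created", cust_id="42", duration_ms=17)
```

Adding rich fields (IDs, durations, sizes) to every event is also the first step towards the observability style of debugging described by Charity Majors.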
19. Cold Start on AWS Lambda (1)
• Cold start delays on first run
• 100 ms for Node/Python "hello world"
• Several sec for Java/C# with low RAM
• Warm start – no extra delay
• Can cache data per function instance
• Reduce cold start time
• Allocate more memory/CPU
• Strip out unused code
• Reduce scope of functions
• Reduce cold start frequency
• Increase scope of functions
• Function warming (concurrent)
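The "concurrent" caveat on function warming matters: if traffic needs N warm instances, the N warming pings must overlap in time, otherwise AWS will route them all through one warm instance. A sketch of the idea (the `warming` event key is hypothetical; in practice `invoke_fn` would wrap a real boto3 Lambda invoke):

```python
from concurrent.futures import ThreadPoolExecutor

def warm(invoke_fn, function_name, n):
    # Fire n warming pings concurrently so that n separate function
    # instances are forced to exist (and stay warm), not just one.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(invoke_fn, function_name, {"warming": True})
                   for _ in range(n)]
        return [f.result() for f in futures]
```

The function itself should short-circuit on `warming` events so pings don't trigger real work.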
20. Cold Start on AWS Lambda (2)
• More cold start delays
• DBs: creating a connection (MySQL, MongoDB, etc.) – 100s of ms
• Can also overwhelm RDS with too many connections
• VPCs: creating ENI (approx. 7-10 sec latency)
• Possible solutions
• DB connection caching + pooling using Lambda:
• Cache connection in function's warmed state
• Lambda concurrency limit to avoid overwhelming RDS
• Avoid VPCs for interactive functions
• DBs with connectionless APIs – e.g. DynamoDB
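The connection-caching pattern relies on Lambda reusing the execution context between warm invocations. A minimal sketch (here `connect_fn` stands in for e.g. a real `pymysql.connect` call):

```python
# Module-level state survives across warm invocations of the same
# function instance, so the connection cost is paid on cold start only.
_conn = None

def get_connection(connect_fn):
    global _conn
    if _conn is None:          # cold start: create the connection once
        _conn = connect_fn()
    return _conn               # warm start: reuse the cached connection
```

Combined with a Lambda concurrency limit of N, this bounds the function's open connections to at most N, which is what protects RDS from being overwhelmed.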
21. Security and Secrets
• No servers to secure & patch!
• No servers to run security agents …
• May need 'middleware hook'
• Tools:
• AWS CloudTrail - auditing AWS operations
• CloudTracker - analyze CloudTrail -> least-privilege IAM policies
• AWS GuardDuty - intrusion detection
• AWS Macie – check for sensitive info in S3
• AWS Config – check configs
• Puresec – learn app behavior and block attacks
• Secret management
• AWS Parameter Store or Secrets Manager
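Secrets follow the same warmed-state caching pattern as DB connections: fetch once per container, never bake into the deployment package. A sketch, with `fetch_fn` standing in for a real call such as boto3's `ssm.get_parameter(Name=..., WithDecryption=True)`:

```python
# Cache fetched secrets in module-level state so Parameter Store /
# Secrets Manager is hit once per container, not once per invocation.
_secrets = {}

def get_secret(name, fetch_fn):
    if name not in _secrets:
        _secrets[name] = fetch_fn(name)
    return _secrets[name]
```

For secrets that rotate, a production version would also expire cache entries after some TTL.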
25. "Backendless" web apps
Goal: no backend code – not even FaaS
• Frontend-centric
• Origin: mobile backend services - e.g. Firebase and Parse
• Smart frontend app + BaaS only
• Background processing may still use FaaS
• Much lower costs
• 400K users for $100/month – Mindmup app, Gojko Adzic
• Leverage client hardware
• Gojko Adzic talk: https://www.youtube.com/watch?v=Xi_WrinvTnM
• Forrest Brazeal blog post: https://read.acloud.guru/why-do-you-care-so-much-about-your-backend-when-your-customers-dont-81f4e6433593
26. DevOps – Key Elements
• whole team collaborating
• deploy very frequently
• automate repeated work
• MTTR >> MTBF
• business-focused monitoring
• continuous learning and improvement
27. Ops != Server Management
Operations is the constellation of your org's technical skills, practices,
and cultural values around designing, building, scaling and maintaining
systems.
-- Charity Majors, @mipsytipsy
Ops is the process of
delivering value to users
Editor's Notes
DevOps engineer, working on serverless, container-based and server-centric projects.
Talking about serverless and its impact on DevOps tools and techniques, based on a current fintech project.
So what is serverless? Well, a vast amount of security, scalability and managing failover is someone else's problem – Amazon and others will take care of this with serverless, rather than giving you a virtual server to manage.
This includes choosing the IaaS instance type, OS image, installing dependencies, creating filesystems, configuring databases, and ensuring security updates – all handled by the cloud provider.
Serverless is like PaaS, only more so – you just deploy app code as stateless functions. The difference is auto scaling and high availability – the cloud provider runs as many servers as you need.
FaaS = run single stateless function, as many copies as needed - no problem with Wall of Traffic - very good for spiky workloads
Triggered by events such as an API Gateway request, or a file being uploaded to S3.
The other part of serverless is BaaS = object storage, databases, authentication, … all run by the cloud provider so you don't have to.
Most servers are mostly idle, like this guy. A key part of serverless is that you don't pay for idle time. Just pay as you go, for the compute and storage you use.
As with a well-managed Kubernetes cluster, the servers run by Amazon or Microsoft "run hot" at 90% plus utilization, reducing the price you pay for functions.
Ref: https://www.slideshare.net/NCore1/unite-2017-going-serverless-gertjan-vanthienen slide 2 - servers generally run at only around 30% utilization, while on AWS it's over 90%.
So serverless has many benefits for scaling, security and so on. The question is: does this mean Less Ops or No Ops?
There are a few Serverless people who say that "DevOps is the new legacy" due to serverless – this seems to be due to equating DevOps with server management.
For a good review of what DevOps is, see https://puppet.com/blog/what-is-devops
These are areas for a pure serverless environment, without any containers, servers, VPNs or VPCs (virtual networks).
Configuration management – this particularly focuses on managing a tree of complex parameters, defining them in a structure that is DRY yet still allows local variations for specific projects or tasks (e.g. upgrading the Node.js version in one dev environment before promoting it to the test environment). These parameters are used by almost every part of the DevOps solution, including tools such as Terraform, Ansible, the Serverless Framework, etc.
Cloud resource provisioning – some data-centric resources that aren't owned by a single serverless function should be owned by an infra provisioning process – e.g. databases, critical storage buckets, event stores, and message queues. This provides clear ownership and avoids accidental deletion if a function that owns the resource is deleted.
Application Cost Management – given the PAYG model for serverless, it's possible to have big surprises in costs, depending on traffic volume and how the app is coded, but it's also possible to drill into a large amount of granular cost data to optimize this. Tooling in this area is still immature, and there is not even a good term for it.
All of this should of course use infrastructure as code, for the normal reasons, and be usable in multiple environments so that new DevOps code can be tested safely without impacting the main dev/test/prod environments.
Convergence of CaaS and serverless is one way that long-running functions and lower-latency can be supported
Background processing for apps, or ops tasks such as backups and disk space pruning, are often an easy way into serverless.
I believe that DevOps is not disappearing, but evolving in response to some of the new challenges as you move into the serverless world, to make sure your app *keeps working* in production.
With the nanoservices model, each serverless function is an independent unit, acting as a nanoservice, not just a microservice. That's great for scaling, but you now have more to monitor and configure than before.
Serverless deployment frameworks are crucial to manage the large number of cloud resources if your serverless app is adopting the Nanoservices model, and arguably also if you use the Microservices model, due to proliferation of development environments (covered later).
Pure templating tools such as j2cli (based on Jinja2) might also be relevant instead of configuration management (CM) tools, if you don't have a very large and complex set of parameters to manage. However, any use of servers will require a CM tool alongside cloud resource provisioning tools.
Blast radius – Lambda concurrency limits, security breaches, etc have a limited impact if there is less in each AWS account, i.e. one per core environment. Hence we would use assume-role from a master AWS account, allowing a single set of AWS credentials to access the required target account (core environment).
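The assume-role setup can be expressed entirely in AWS CLI profile configuration – a sketch, with all account IDs, role names and the region hypothetical:

```ini
# Hypothetical ~/.aws/config fragment: one set of master-account
# credentials, plus a profile per core environment via assume-role.
[profile master]
region = eu-west-1

[profile dev]
role_arn = arn:aws:iam::111111111111:role/DevOpsRole
source_profile = master
mfa_serial = arn:aws:iam::222222222222:mfa/rdonkin
```

Running e.g. `aws s3 ls --profile dev` then transparently assumes the role in the dev core environment's account, keeping the blast radius of any one credential set small.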
Why deploy to a stage env even when developing?
Access to full range of BaaS services for integration testing during development (using S3, DBs, message queues, etc), without writing mocks
Ensures that everyone deploys frequently, to the cloud, which makes development closer to production – hence some key operability characteristics are more likely to be thought about earlier in the software lifecycle (DevOps style)
You still need to detect your app isn't working, or is too slow - so monitoring and logging remain crucial. Some people run a separate ELK stack with serverless on AWS, to provide richer searching and analysis than CloudWatch logs allow.
The complexity of serverless architectures also drives the need for observability, which lets you drill down into really hard bugs in production, by capturing rich data that's easily analysed.
With function warming, it's important that it's concurrent – if the traffic to a certain function requires N concurrent instances of the function, you must ensure that you invoke it (in warming mode) concurrently in a small enough window that you get N instances. Some discussions of function warming don't address this point.
Function consolidation means putting otherwise separate requests into a single larger function so that it's more likely to be "naturally warm" – this can work, but can mean slower cold starts when they do occur.
ENI = elastic network interface. Lambda functions can optionally be placed into a VPC to access its servers, or linked VPCs (on-premise servers, or DBaaS such as Amazon RDS or MongoDB Atlas)
DB connection reaping – various approaches, including https://www.jeremydaly.com/manage-rds-connections-aws-lambda/ – or just use the connection caching approach on the slide, which ensures no more than N connections with a function concurrency limit of N.
You also need to consider what happens when AWS kills a warm function (after some hours) – this may leave an open DB connection on the DB server, so connection reaping may still be required.
Serverless has real benefits for developers and the business – running those invisible servers is "someone else's problem", and you get a lot of security, scalability and high availability for free.
"Don't Pay for Idle" is a key point, giving huge cost savings and letting you manage application costs at a granular level.
Far from disappearing, DevOps is already evolving to meet the challenges of serverless.
This has been a very quick tour of how Serverless is changing DevOps. Thank you!
deploy very frequently (e.g. daily or faster)
small batch sizes – make small changes to software
automate repeated work
automated testing
automated deployment (servers, cloud resources and apps)
whole team collaborating
not just tools
focus on time to repair more than time between failures
rapid recovery
business-focused monitoring
monitor highest value first
Continuous learning and improvement – rapid iteration and feedback to improve DevOps metrics