AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)

Amazon Web Services
Dec 13, 2016

  1. Development Workflows with Docker and Amazon ECS (CON302). Jon Todd, Chief Architect, Okta; Tim Secor, Manager of Developer Productivity, Okta; Danielle Greshock, Manager, Solutions Architecture, AWS. December 1, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. What to Expect from the Session • Review the CI/CD Pipeline • How would you use containers with CI/CD? • Okta Engineering: How they work and ship code • CI with Docker and ECS
  3. The Continuous Everything… Nirvana Goal. Stages: Design, Develop, Deploy, Test, Run and monitor. Practices: Continuous integration, Continuous delivery, Continuous deployment, Continuous feedback
  4. Virtual machine Container
  5. Why Use Containers for Continuous Delivery? • Roll out features as quickly as possible • Predictable and reproducible environment • They are immutable! They will run the same in every environment • Fast feedback
  6. The Lifecycle: Stage 1 – Source
  7. Docker and Docker Toolbox • Docker (Linux kernel 3.10 or later) • Docker Toolbox or Docker Beta (OS X, Windows) • Define app environment with Dockerfile
  8. Dockerfile:
       FROM ruby:2.2.2
       RUN apt-get update -qq && apt-get install -y build-essential libpq-dev
       RUN mkdir -p /opt/web
       WORKDIR /tmp
       ADD Gemfile /tmp/
       ADD Gemfile.lock /tmp/
       RUN bundle install
       ADD . /opt/web
       WORKDIR /opt/web
  9. Docker Compose. Define and run multi-container applications: (1) Define app environment with Dockerfile; (2) Define services that make up your app in docker-compose.yml; (3) Run docker-compose up to start and run entire app
  10. The Lifecycle: Stage 2 – Build
  11. Containers as Build Execution Environment
  12. Containers as Build Artifacts
  13. Amazon EC2 Container Registry • Security • IAM resource-based policies • CloudTrail audit logs • Images encrypted in transit and at rest • Easily manage & deploy images • Tight integration with ECS • Integration with Docker toolset • AWS Management Console & AWS CLI • Reliability & performance • S3-backed
  14. The Lifecycle: Stage 3 – Test
  15. Running Tests Inside a Container • Usual Docker commands available within your test environment • Run the container with the commands necessary to execute your tests, e.g.: docker run web bundle exec rake test
  16. Running Tests Against a Container • Start a container running in detached mode with an exposed port serving your app • Run browser tests or other black box tests against the container, e.g., headless browser tests
  17. The Lifecycle: Stage 4 – Deploy
  18. Amazon EC2 Container Service • Highly scalable container management service • Easily manage clusters for any scale • Flexible container placement • Integrated with other AWS services • Extensible • ECS concepts • Cluster and container instances • Task definition and task
  19. AWS Elastic Beanstalk • Deploy and manage applications without worrying about the infrastructure • Elastic Beanstalk manages your database, Elastic Load Balancing, ECS cluster, monitoring, and logging • Docker support • Single container (on EC2) • Multi container (on ECS)
  20. Amazon ECS CLI • Easily create ECS clusters & supporting resources such as EC2 instances • Run Docker Compose configuration files on ECS • Available today – http://amzn.to/1jBf45a
  21. Continuous Delivery Workflows
  22. Continuous Delivery To ECS with Jenkins: 1. Code push triggers build 2. Build image from sources 3. Run test on image 4. Push image to Docker registry 5. Update service 6. Pull image
  23. Continuous Delivery To ECS with Jenkins • Easy deployment: developers merge into master, done! • Jenkins build steps: trigger via webhooks, monitoring, Lambda; build Docker image via Build and Publish plugin; push Docker image into registry; register updated job with ECS API
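The Jenkins steps above can be sketched as an ordered list of shell commands. This is only an illustration, not the presenters' job configuration: it uses plain `docker` and `aws` CLI calls in place of the Jenkins plugin the slide mentions, and the image name, cluster, service, task family, and `taskdef.json` file are all hypothetical placeholders.

```python
from typing import List


def pipeline_commands(image: str, cluster: str, service: str,
                      task_family: str) -> List[List[str]]:
    """Return the commands for build, test, push, and deploy, in order.

    All names passed in are hypothetical; nothing is executed here.
    """
    return [
        # Build image from sources (Dockerfile in the current directory).
        ["docker", "build", "-t", image, "."],
        # Run the tests inside the freshly built image.
        ["docker", "run", "--rm", image, "bundle", "exec", "rake", "test"],
        # Push the tested image to the registry.
        ["docker", "push", image],
        # Register a new task definition revision (taskdef.json is assumed
        # to reference the image that was just pushed).
        ["aws", "ecs", "register-task-definition",
         "--cli-input-json", "file://taskdef.json"],
        # Point the service at the task family; ECS then pulls the image.
        ["aws", "ecs", "update-service", "--cluster", cluster,
         "--service", service, "--task-definition", task_family],
    ]


if __name__ == "__main__":
    for cmd in pipeline_commands("registry.example.com/web:42",
                                 "ci-cluster", "web", "web-task"):
        print(" ".join(cmd))
```

Keeping the steps as data makes the ordering explicit: the push only happens after the test command, so a CI job running these in sequence and stopping on the first failure never publishes an untested image.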
  24. Continuous Delivery To ECS with CodePipeline 1. Code push triggers pipeline 2. Lambda function creates EC2 instance 3. Image is built and pushed to ECR 4. Lambda function terminates EC2 instance 5. Lambda function deploys new task revision to ECS
  25. Continuous Delivery To ECS with CodePipeline • Lambda custom actions • Create and terminate EC2 instance • Update ECS service • EC2 instance uses user data to build an image and push it to ECR
  26. Continuous Delivery To ECS with Shippable
  27. About Okta
  28. Millions of People Use Okta Every Day
  29. An identity platform for developers 1. Connect to any data source
  30. © Okta and/or its affiliates. All rights reserved. An identity platform for developers 2. Customizable login w/ MFA
  31. An identity platform for developers 3. Support all application types w/ modern identity standards
  32. An identity platform for developers Learn more at: developer.okta.com
  33. The case for ECS & Docker
  34. The problem (inspired by: http://dev2ops.org/2010/02/what-is-devops/). Dev: "I want change". Ops: "I want stability". Between them: wall of turmoil. Diagram labels: Domain boundary, Container frameworks, Cluster scheduler, Continuous integration
  35. Options • Container frameworks (e.g., LXC) • Cluster schedulers (e.g., Amazon ECS)
  36. Okta’s CI with ECS
  37. Okta Engineering
  38. Okta Engineering: How Do We Work, How Do We Ship Our Code? • 200 engineers, split into teams with embedded specialists • 1-week sprints, with weekly deploys to production • Capability to do more than one hotfix per day at customers’ request or for bugs found in CI or pre-prod • Every merge to master is a potential release candidate
  39. Okta Engineering: How Do We Test Our Code? • Every topic branch goes through the same rigor in testing as release candidates. • Passing automated tests is enforced at commit time. • Largest repo: 33K tests, takes 60 minutes (22 parallel runs) • Smallest repo: 100 tests, 5 minutes • The Developer Productivity team is responsible for supporting engineering.
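To put the largest-repo figures in perspective, here is a small worked calculation. It assumes the 60 minutes is wall-clock time and that the 33K tests are split evenly across the 22 parallel runs; neither is stated on the slide.

```python
def per_run_stats(total_tests: int, wall_minutes: float, parallel_runs: int):
    """Derive per-run load from suite size, wall time, and parallelism."""
    tests_per_run = total_tests / parallel_runs
    seconds_per_test = wall_minutes * 60 / tests_per_run
    # Total machine time if the same work ran on a single worker.
    serial_hours = wall_minutes * parallel_runs / 60
    return tests_per_run, seconds_per_test, serial_hours


if __name__ == "__main__":
    tests, secs, hours = per_run_stats(33_000, 60, 22)
    print(f"{tests:.0f} tests per run, {secs:.1f} s per test, "
          f"{hours:.0f} machine-hours per full run")
```

Under those assumptions each run handles about 1,500 tests at roughly 2.4 seconds apiece, and one full pass costs about 22 machine-hours, which is why worker cost and scaling feature so heavily in the rest of the talk.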
  40. Challenge of Developer Productivity Team • Developer experience • Quality • Cost • Cloud first
  41. Challenge of Developer Productivity Team: Developer experience. Developers expect fast turnaround time and reliable results
  42. Challenge of Developer Productivity Team: Quality. We need to run all the tests required to guarantee quality
  43. Challenge of Developer Productivity Team: Cost. We need to run an infrastructure which is as cost-effective as possible
  44. Challenge of Developer Productivity Team: Cloud first. We aim to use cloud services first, wherever possible
  45. Problems
  46. CI Using Open Source, Monolithic Applications
  47. Vision
  48. Vision • Clean testing environments • Dynamic worker scaling • Spot Instances for cost • Versioned testing • Improved queuing system • Less infrastructure flakiness • The correct privileges, to maintain security
  49. Vision: Clean testing environments. Isolate test environments from others, for both parallel and serial runs
  50. Vision: Dynamic worker scaling. Workers should survive the loss of their build server; the worker pool should scale quickly; the number of workers should not affect the memory footprint of the build server
  51. Vision: Spot Instances for cost. Run our services at cheaper rates, as we have many short-lived tasks and can handle a few failures
  52. Vision: Versioned testing. Enable testing of infrastructure changes in topic branches
  53. Vision: Improved queuing system. Should survive build server reboots; shouldn’t be tied to specific workers or build servers; centralized; should have good visibility; re-queuing of lost tasks
  54. Vision: Less infrastructure flakiness. Push testing and creation of test machines to developers
  55. Vision: The correct privileges, to maintain security. Launch tasks in secure environments
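The queuing goals above (not tied to a specific worker, re-queuing of lost tasks) can be illustrated with a minimal in-memory sketch. It is similar in spirit to a visibility timeout as offered by queue services such as SQS, but it is not Okta's implementation; the class name, method names, and timeout value are all hypothetical.

```python
from collections import deque
from typing import Dict, Optional


class RequeueingQueue:
    """Tasks that are received but never acknowledged become visible again."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self.pending: deque = deque()
        self.in_flight: Dict[str, float] = {}  # task -> deadline

    def put(self, task: str) -> None:
        self.pending.append(task)

    def receive(self, now: float) -> Optional[str]:
        """Hand out the next task, first re-queuing any whose worker vanished."""
        self._requeue_expired(now)
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[task] = now + self.visibility_timeout
        return task

    def ack(self, task: str) -> None:
        """Called by a worker when the task finished; it will not be retried."""
        self.in_flight.pop(task, None)

    def _requeue_expired(self, now: float) -> None:
        for task, deadline in list(self.in_flight.items()):
            if deadline <= now:
                del self.in_flight[task]
                self.pending.append(task)


if __name__ == "__main__":
    q = RequeueingQueue(visibility_timeout=30)
    q.put("build-1")
    print(q.receive(now=0))    # a worker takes the task, then is lost
    print(q.receive(now=10))   # still in flight, nothing to hand out
    print(q.receive(now=30))   # timeout elapsed, task is handed out again
```

Because the queue, not the worker, owns the retry decision, losing a worker (for example to a Spot termination) only delays a task by the timeout instead of dropping it.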
  56. Solutions
  57. Custom Reporting
  58. ECS and Docker • AWS + Java app tailored to Okta process • Immutable and disposable build workers—created for one-time use, destroyed when job is done • Near ZERO cost on weekends, scales with load • ECS allows us to maximize usage of EC2 instances • Same containers for multiple types and numbers of builds • Same AMI can run multiple Docker images
  59. Amazon ECS • IAM separation per service: either one service per cluster, or use the new IAM for ECS functionality • Sharing the Docker daemon to allow running Docker within Docker • Pre-fetching large data blobs and making them available on the hosts is an option • Multiple containers: mysql, redis, kinesalite
  60. Docker Update • Update Dockerfile and our CI system builds the new image, uploading it to our repository • Update task definition for cluster updates
  61. Docker Conventions • Dockerfiles live with project code, versioned together • docker-compose used for development, so a clone plus build will have a full service running locally • Single repo for library and third-party service definitions • Secrets or any form of config NEVER baked in containers • Start from minimal, audited base OS • Strict rules around “FROM” clause • Build owns creating immutable version and publishing
  62. Docker Build Process
  63. Task Definitions { "taskDefinitionArn": "arn:aws:ecs:us-east-1:262205085595:task-definition/base-container-box-task:1", "containerDefinitions": [ { "memory": 15000, "essential": true, "mountPoints": [ { "containerPath": "/usr/bin/docker", "sourceVolume": "docker_daemon", "readOnly": null }, { "containerPath": "/var/run/docker.sock", "sourceVolume": "docker_socket", "readOnly": null }
  64. Task Definitions ], } ], "volumes": [ { "host": { "sourcePath": "/var/run/docker.sock" }, "name": "docker_socket" }, { "host": { "sourcePath": "/usr/bin/docker" }, "name": "docker_daemon" } ], "family": "base-container-box-task"
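The two slides above show only fragments of the task definition. As a rough sketch, the same volumes and mount points can be assembled in Python and checked for consistency before registering. The container `name` and `image` values below are hypothetical (the slides elide them), and the consistency check is an illustration rather than part of any AWS tooling.

```python
import json
from typing import Any, Dict, List


def build_task_definition(family: str, name: str, image: str,
                          memory: int) -> Dict[str, Any]:
    """Assemble a task definition that shares the host's Docker with the container."""
    volumes = [
        {"name": "docker_socket", "host": {"sourcePath": "/var/run/docker.sock"}},
        {"name": "docker_daemon", "host": {"sourcePath": "/usr/bin/docker"}},
    ]
    # Mount each host path at the same path inside the container.
    mount_points = [
        {"sourceVolume": v["name"], "containerPath": v["host"]["sourcePath"]}
        for v in volumes
    ]
    return {
        "family": family,
        "containerDefinitions": [{
            "name": name,        # hypothetical, not shown on the slide
            "image": image,      # hypothetical, not shown on the slide
            "memory": memory,
            "essential": True,
            "mountPoints": mount_points,
        }],
        "volumes": volumes,
    }


def undefined_volumes(task_def: Dict[str, Any]) -> List[str]:
    """Return mount point volume names that have no matching volume entry."""
    defined = {v["name"] for v in task_def.get("volumes", [])}
    used = {mp["sourceVolume"]
            for c in task_def["containerDefinitions"]
            for mp in c.get("mountPoints", [])}
    return sorted(used - defined)


if __name__ == "__main__":
    td = build_task_definition("base-container-box-task", "build-worker",
                               "registry.example.com/build-worker:1", 15000)
    assert undefined_volumes(td) == []
    print(json.dumps(td, indent=2))
```

Mounting `/var/run/docker.sock` is what lets a build container start sibling containers on the host's Docker daemon, which matches the "running Docker within Docker" note earlier in the deck.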
  65. Clean Testing Environments • Docker images • Nearly instant machine refresh • Easy for users to create and upload images that have been tested to work locally • Efficient machine use • ECS with ECR and private repository back end
  66. Dynamic Worker Scaling (diagram labels): SQS, Lambda, SNS, Lambda, Scaling, Bin packing, ECS
  67. Dynamic Worker Scaling • Lambda allocates jobs using bin packing • This is one of the changes we had to make in order to use ECS for long-running tasks, rather than services spread across many stateless instances • Disconnects unneeded nodes from the cluster, allowing them to self-terminate when they are idle
  72. Dynamic Worker Scaling
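The slides say jobs are allocated with bin packing but do not name the algorithm. The sketch below uses first-fit decreasing, a common heuristic, purely as an assumption; job names, memory sizes, and the instance capacity are hypothetical.

```python
from typing import Dict, List


def bin_pack(jobs: Dict[str, int], capacity: int) -> List[List[str]]:
    """First-fit decreasing: place each job on the first instance with room.

    jobs maps a job name to its memory requirement; capacity is the memory
    available per instance. Returns the job names grouped per instance.
    """
    instances: List[List[str]] = []
    free: List[int] = []
    # Largest jobs first; ties broken by name so the result is deterministic.
    for name, need in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        if need > capacity:
            raise ValueError(f"job {name} does not fit on any instance")
        for i, room in enumerate(free):
            if need <= room:
                instances[i].append(name)
                free[i] -= need
                break
        else:
            # No existing instance has room, so another one is needed.
            instances.append([name])
            free.append(capacity - need)
    return instances


if __name__ == "__main__":
    jobs = {"a": 15000, "b": 15000, "c": 8000, "d": 7000, "e": 7000}
    print(bin_pack(jobs, capacity=30000))
    # → [['a', 'b'], ['c', 'd', 'e']]
```

Packing jobs densely, instead of spreading them, leaves whole instances empty as load drops; those are the nodes that can be disconnected from the cluster and allowed to self-terminate.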
  73. Spot Instances • We use Spot Instances across all Availability Zones • Manually switch between On-Demand and Spot Instances 3 times per week during Spot price spikes • We are planning on moving to Spot Fleet soon • Bid is set at the On-Demand price, so we lose build slaves whenever the Spot price goes above the On-Demand price • 4000-6000 instance hours per day, about 1500 Spot losses per week
  74. Spot Instances
  75. Spot Instances
  76. Spot Instances
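The Spot figures quoted above can be turned into a rough loss rate. This is back-of-the-envelope arithmetic that assumes the daily instance hours hold for all seven days of the week, which the slide does not state.

```python
def spot_loss_rate(instance_hours_per_day: float, losses_per_week: float) -> float:
    """Return Spot losses per instance hour, assuming a uniform 7-day week."""
    weekly_hours = instance_hours_per_day * 7
    return losses_per_week / weekly_hours


if __name__ == "__main__":
    for hours_per_day in (4000, 6000):
        rate = spot_loss_rate(hours_per_day, 1500)
        print(f"{hours_per_day} h/day: {rate:.1%} of instance hours end in a loss, "
              f"about one loss every {1 / rate:.0f} instance hours")
```

Under that assumption the rate is roughly 4 to 5 percent, or one loss every 19 to 28 instance hours, which is tolerable only because build tasks are short and can be re-queued.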
  77. Versioned Jobs • Scripts checked into repositories • Makes a transition to Docker jobs easy
  78. Versioned Jobs with ECS • Versioned build and test scripts can now be run in versioned Docker containers, using versioned task definitions • Creates extreme flexibility • CloudFormation allows us to stand up whole new clusters with all different versions in a matter of minutes for long term testing
  79. ECS + Docker Problems • Docker containers not launching • ECS agent failing • Docker containers stopping • Incompatibility with certain services • Docker OS availability • Cleanup - AWS has made this configurable • Image size
  80. Amazon Web Services: EC2, SQS, Lambda, ECS, S3, RDS, Amazon Kinesis, Spot Instances, ECR, CloudFormation, SNS, CloudWatch, CloudTrail
  81. Building CI with Amazon Web Services
  82. Future
  83. Expand Use • Use ECS for more services • Allow developers to control their test suites and Docker images more directly • Developer environments • Use Docker for local long running services • Use a VM running the same version OS • Remote updates to keep it in line with CD system • Aim to enable running CD containers right out of the box
  84. ECS Services In Production
  85. Requirements • Support for our multi-AZ & multi-region architecture • Compliance – SOC2 type 2, HIPAA, ISO 27001, FedRAMP • Least-privilege principle - independent IAM roles per service • Host to host encryption • Deployment support for: • Rollback • Canary • Blue-green • 0-downtime deployments
  86. 0-Downtime Testing https://github.com/jontodd/aries
  87. Test Assumptions • ECS config: agent version 1.11.0, Docker version 1.11.2 • Cluster config: 8 instances backed by ASG • ASG config: 8 instances across 3 AZs, default termination policy, 5 min health check grace period • ELB: timeout 4s, interval 5s, unhealthy threshold 2, healthy threshold 10, connection draining enabled with 300s timeout • Load generation: 16 threads • Throughput: interactive ➔ 490 r/s, 10s long poll ➔ 1.5 r/s
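The ELB health check settings above imply how long a state change takes to be noticed. The sketch below uses the simple approximation of interval times threshold; it ignores the 4 s check timeout and where in the interval a failure starts, so treat the results as rough figures only.

```python
def detection_seconds(interval_s: float, threshold: int) -> float:
    """Approximate time for consecutive checks to flip a target's state."""
    return interval_s * threshold


if __name__ == "__main__":
    print("marked unhealthy after about", detection_seconds(5, 2), "seconds")
    print("marked healthy after about", detection_seconds(5, 10), "seconds")
```

With these settings a failing target keeps receiving traffic for roughly 10 seconds, while a new or recovered target waits roughly 50 seconds before it is put in service.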
  88. Errors observed per operation

      | Operation | Interactive errors (~70ms latency, 490rps) | Long poll errors (~10s latency, 1.5rps) |
      |---|---|---|
      | Upsize ECS service 4 → 8 | 0 | 0 |
      | Downsize ECS service 8 → 4 | 0 | 0 |
      | Deploy ECS service – 50% min healthy | 0 | 0 |
      | Stop task* | 0 | 0 |
      | Downsize Auto Scaling group | 0 | 0 |
      | Terminate EC2 instance | 0 | 0 |
      | Stop Docker daemon (service docker stop)* | 0 | 0 |
      | Stop EC2 instance** | 0 | 0 |
      | Kill Docker container (docker kill <containerId>)* | 2 | 2 |
      | Fail health check | 450 | 5 |

      * No intention of running operation in practice
      ** Caused inconsistent state
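The "50% min healthy" deploy in the results above refers to the ECS service deployment setting `minimumHealthyPercent`. As a sketch, the bounds it places on running tasks during a rolling deploy can be computed as follows. ECS rounds the minimum up and the maximum down; the `maximum_percent` value of 200 is an assumption, since the slide does not give it.

```python
def rolling_deploy_bounds(desired_count: int, minimum_healthy_percent: int,
                          maximum_percent: int = 200):
    """Return (min_running, max_running) task counts during a deployment."""
    # Lower bound is rounded up to the nearest whole task.
    min_running = (desired_count * minimum_healthy_percent + 99) // 100
    # Upper bound is rounded down to the nearest whole task.
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running


if __name__ == "__main__":
    print(rolling_deploy_bounds(8, 50))
    # → (4, 16)
```

For the 8-task service in the test setup, ECS may therefore stop up to 4 tasks at a time, provided the remaining 4 can carry the load.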
  89. Workflow (diagram labels): Auto Scaling group; Launch config; EC2; ECS cluster; ECS service; ECS canary service; Application YAML; Docker Registry (Artifactory); ELB; Images pulled when tasks start; Conductor (Bastion ECS controller); CI Pipeline; Git repo; Promoted artifacts; Dockerfile; docker_compose.yml; Test / Preview / Production; Dev; Deploy new version
  90. Application definition • Developers define YAML for their application • Deploy time configuration is supplied to the ECS task definition • Secrets are pulled by the application at startup
  91. Demo
  92. Feature requests • Dynamic port mapping (Application load balancing) • Service autoscaling • Per container IAM roles • Per-container security groups • Bin-packing scheduler
  93. Lessons learned • /etc/ecs/ecs.config • ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr) • ECS_LOGLEVEL=debug • Tune ELB health check • Docker 1.10 for security enhancements • Canary & blue/green separate service attached to same ELB • ECS is incredibly easy to get up and running • The ecosystem is changing quickly
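Running the canary as a separate service behind the same ELB means its share of traffic follows from the task counts. The sketch below assumes the load balancer spreads requests evenly across all registered targets, which is a simplification; the counts are illustrative only.

```python
def canary_share(main_tasks: int, canary_tasks: int) -> float:
    """Fraction of requests reaching the canary under even distribution."""
    total = main_tasks + canary_tasks
    if total == 0:
        raise ValueError("no tasks registered")
    return canary_tasks / total


if __name__ == "__main__":
    # Hypothetical: 8 tasks on the current version, 1 canary task.
    print(f"{canary_share(8, 1):.1%} of requests hit the canary")
```

Scaling the canary service up while scaling the main service down moves the same setup from a canary toward a blue/green cutover.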
  94. Thank you! Jon Todd – @JonToddDotCom Tim Secor - @TimSecor Danielle Greshock – greshock@amazon.com
  95. Remember to complete your evaluations!