© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Frank Chen, Coursera
Brennan Saeta, Coursera
October 2015
CMP406
Amazon ECS at Coursera
Powering a general-purpose near-line execution microservice, while defending against untrusted code
What to Expect from the Session
• Techniques for a unified near-line, batch, and scheduled microservice powered by Amazon ECS
• Security vulnerabilities and countermeasures when running untrusted code in Docker with Amazon ECS
• Reasons to modify the Amazon ECS agent
Session Outline
• Introduction to Coursera
• Near-line, batch, and scheduled job execution framework
  • Motivations and background
  • Amazon ECS benefits and limitations
  • Iguazú and its architecture
• Evaluating programming assignments
  • System requirements
  • Security threat model
  • Attacks and defenses
Education at Scale
15 million learners worldwide
2.5 million course completions
1,300+ courses
125+ partners
A unified execution framework
Batch Processing Enables…
Reporting
Instructor Reports
• Grade exports
• Learner demographics
• Course progress statistics
Internal Reports
• Business metrics
• Payments reconciliation
Scheduled Processing Enables…
Marketing
• Recommendation emails
• Targeted marketing / reactivation emails
Nearline Processing Enables…
Pedagogical Innovations
• Peer-review matching & analysis
• Auto-graded programming assignments
The early days…
January 2012
Bad Old Days of Batch Processing @ Coursera
Cascade
• PHP-based job runner
• Originally ran in screen sessions
• Polled APIs for new jobs
• Forced restarts on a regular basis due to unidentified memory leaks
• Fragile and unreliable
The early days…
Bad Old Days of Batch Processing @ Coursera
Saturn
• Scala scheduled batch job runner
• Powered by Quartz Scheduler library
• Better than Cascade, but…
• All jobs ran on the same JVM, causing interference
The not-so-early days?
Looking for something better…
What We Wanted
• Reliable
• Easy Development
• Easy Deployment
• High Efficiency
• Low Ops Load
• Cost Effective
What Else Did We Look At?
Home-grown Tech
• Tried, but proved to be unreliable
• Difficult to handle coordination and synchronization

• Powerful, but hard to productionize
• Needs developers with experience

• Designed for GCE first
• Not a managed service, higher Ops load
Amazon ECS to the Rescue
Amazon re:Invent 2014 – Dr. Werner Vogels introducing Amazon ECS
Screenshot from https://www.youtube.com/watch?v=LE5uBqNp2Ds by Amazon Web Services
Amazon ECS to the Rescue
• Little maintenance
• Integrated with the rest of AWS
• Easy to develop for
However…
Amazon ECS is a great building block, but we still need to build tools around it for our purposes.
What We Built: Iguazú
Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0
• Batch job scheduler for Amazon ECS; jobs can run:
  • Immediately
  • Deferred (run once at X time)
  • Scheduled recurring (cron-like)
• Programmatically accessible internally via our standard APIs and clients
• Named for Iguazú Falls
  • World’s largest waterfall by volume
  • We hope Iguazú handles a similar volume of jobs
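The three run modes listed above (immediate, deferred, recurring) can be illustrated with a small in-memory queue. This is only a sketch in Python under stated assumptions: `ToyScheduler`, `submit`, and `pop_due` are hypothetical names, and nothing here reflects how Iguazú itself stores or dispatches jobs.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class ScheduledRun:
    run_at: float                                   # time at which the job becomes due
    seq: int                                        # tie-breaker so equal times stay FIFO
    job_name: str = field(compare=False)
    interval: Optional[float] = field(default=None, compare=False)  # None = run once


class ToyScheduler:
    """Hypothetical in-memory stand-in for a job scheduler (not Iguazú's code)."""

    def __init__(self) -> None:
        self._heap: List[ScheduledRun] = []
        self._seq = itertools.count()

    def submit(self, job_name: str, now: float, delay: float = 0.0,
               interval: Optional[float] = None) -> None:
        # delay == 0 -> immediate; delay > 0 -> deferred; interval set -> recurring
        if interval is not None and interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(
            self._heap,
            ScheduledRun(now + delay, next(self._seq), job_name, interval),
        )

    def pop_due(self, now: float) -> List[str]:
        """Return the names of jobs due at `now`; recurring jobs are re-queued."""
        due: List[str] = []
        while self._heap and self._heap[0].run_at <= now:
            run = heapq.heappop(self._heap)
            due.append(run.job_name)
            if run.interval is not None:
                # Next occurrence is measured from `now`, so a late scheduler
                # does not replay missed runs.
                heapq.heappush(
                    self._heap,
                    ScheduledRun(now + run.interval, next(self._seq),
                                 run.job_name, run.interval),
                )
        return due
```

A real scheduler would persist this state and hand due jobs to a queue; the sketch only shows how one ordering structure can serve all three modes.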
Iguazú: Architecture
[Architecture diagram. Components shown: Devs, Users, Services, Iguazú Frontend, Iguazú Scheduler, Iguazú Backend, Iguazú Admin, Cassandra, SQS, ECS API, ECS Workers]
Developing Iguazú Jobs
class Job extends AbstractJob with StrictLogging {
  override val reservedCpu = 1024    // 1 CPU core
  override val reservedMemory = 1024 // 1 GB RAM

  def run(parameters: JsValue) = {
    logger.info("I am running my job! ")
    expensiveComputationHere()
  }
}
Running Jobs from Other Services
// Invoking a job with one function call
// from another service via the Naptime RPC/REST framework
val invocationId = IguazuJobInvocationClient
  .create(IguazuJobInvocationRequest(
    jobName = "exportQuizGrades",
    parameters = quizParams))
Iguazú: Developer / Ops User Interface
Deploying Jobs
Easy Deployment
1. Developers: merge into master. Done!
Jenkins Build Steps:
1. Builds zip package from master
2. Prepares Docker image with zip file
3. Pushes image into Docker registry
4. Registers updated jobs with the Amazon ECS API
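The final build step, registering a job with the Amazon ECS API, amounts to submitting a task definition. The sketch below builds the keyword arguments that boto3's `register_task_definition` accepts; the function name `build_task_definition`, the image URL, and the command are hypothetical, and the deck does not say which client Coursera's Jenkins job actually used.

```python
from typing import Any, Dict, List


def build_task_definition(job_name: str, image: str, cpu_units: int,
                          memory_mb: int, command: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for the ECS RegisterTaskDefinition call.

    cpu_units follows the ECS convention of 1024 units per CPU core,
    matching reservedCpu = 1024 in the job example.
    """
    if cpu_units <= 0 or memory_mb <= 0:
        raise ValueError("cpu_units and memory_mb must be positive")
    return {
        "family": job_name,                 # one task definition family per job
        "containerDefinitions": [
            {
                "name": job_name,
                "image": image,             # image pushed in the previous build step
                "cpu": cpu_units,
                "memory": memory_mb,        # hard memory limit in MiB
                "essential": True,
                "command": command,
            }
        ],
    }


# Assumed usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   boto3.client("ecs").register_task_definition(**build_task_definition(
#       "exportQuizGrades", "registry.example.com/jobs:abc123", 1024, 1024,
#       ["run-job", "exportQuizGrades"]))
```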
Logs
• Logs are in /var/lib/docker/containers/*
• Uploaded to a log analysis service (Sumo Logic)
• Wrapper prints out job name and job ID at the start for easy searching
• Good enough for now
Metrics
• Using third-party metrics collector (Datadog)
• Metrics for both jobs and container instances
• So long as the worker machines can talk to the Internet, things will work out pretty well
Since April 2015…
65 jobs in production
>1000 runs per day
44 different scheduled jobs
Evaluating Programming Assignments
Programming Assignments at Coursera
The Security Challenge
Compiling and running untrusted, arbitrary code in Amazon EC2
Would you like to compile and run C code from random people on the Internet on your servers?
1st Generation System
[Diagram components: Learners, Coursera Servers, Queue Service, and three kinds of graders]
• Class graders in a separate AWS account
• Custom grader systems on cloud providers
• Course grader under the instructor’s desk
1st Generation System: Weaknesses
• No Auto Scaling
• No standard security
• Graders crashed
Design Goals
• Cost Savings
• No Maintenance
• Near Real-time
• Secure Infrastructure
Threat Model
Prevent submitted code from:
• Impacting the evaluation of other submissions
• Disrupting the grading environment (e.g., DoS)
• Affecting the rest of the Coursera learning platform
Additional goals:
• Minimize exfiltration of information
  • Test cases, solutions, etc.
• Minimize risk of submissions changing their own scores
• Avoid grading machines being turned into Bitcoin miners or part of a botnet
Threat Model - Assumptions
• Run arbitrary binaries
• Instructor grading scripts may have vulnerabilities
  • ∴ Grading code is untrusted
• Unknown vulnerabilities in Docker and Linux namespacing and/or container implementation
Attack / Vulnerability Classes
Divided into 2 main categories:
• Assuming basic containers are secure, prevent any negative impacts from running arbitrary code.
• Assuming basic container technology is vulnerable, mitigate negative impacts as much as possible.
What We Built: GrID
Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0
• Service + architecture for grading programming assignments
• Builds on Amazon ECS and Iguazú
• Named for Tron’s “digital frontier”
• Backronym: Grading Inside Docker
High-level GrID Architecture
[Architecture diagram. Components shown: Learners, GrID, Iguazú, S3 Bucket, ECS APIs, Grading Machines, VPC Firewalls, spanning the Coursera Production Account and the Coursera GrID Grading Account]
Attacks: Resource Exhaustion
Defenses:
• Docker / CGroups:
• CPU quotas
• Memory limits
• Swap limits
• Hard timeouts for container execution
• btrfs limits
• file system storage quotas
• IOPS throttling
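As a rough illustration of the cgroup-backed limits above, the sketch below assembles `docker run` flags for a CPU quota, a memory limit, and a swap limit. The helper name `resource_limit_flags` and the chosen values are assumptions; the deck does not show the exact flags or numbers used in GrID, and storage quotas and IOPS throttling are left out.

```python
from typing import List

CPU_PERIOD_US = 100_000  # CFS scheduling period in microseconds (Docker's default)


def resource_limit_flags(cpu_cores: float, memory_mb: int, swap_mb: int = 0) -> List[str]:
    """Return `docker run` flags that cap CPU, memory, and swap for one container."""
    if cpu_cores <= 0 or memory_mb <= 0 or swap_mb < 0:
        raise ValueError("limits must be positive (swap may be zero)")
    quota_us = int(cpu_cores * CPU_PERIOD_US)  # CPU time allowed per period
    return [
        f"--cpu-period={CPU_PERIOD_US}",
        f"--cpu-quota={quota_us}",
        f"--memory={memory_mb}m",
        # --memory-swap is the combined memory + swap total, so setting it
        # equal to --memory leaves the container no swap at all.
        f"--memory-swap={memory_mb + swap_mb}m",
    ]
```

These flags would be spliced into a command such as `["docker", "run", *flags, image]`; a hard wall-clock timeout still has to be enforced from outside the container.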
Attacks: Kernel Resource Exhaustion
Defenses:
• Open file limits per container (nofile)
• Process limits per container (nproc)
• Limit kernel memory per cgroup
• Limit execution time
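The per-container limits and the execution deadline could be sketched as follows. `ulimit_flags` and `run_with_deadline` are hypothetical helper names; `--ulimit nofile=` and `--ulimit nproc=` are standard `docker run` options, but the values GrID used are not given in the deck.

```python
import subprocess
from typing import List, Sequence, Tuple


def ulimit_flags(max_open_files: int, max_processes: int) -> List[str]:
    """Return `docker run` flags capping open files and processes (soft:hard)."""
    return [
        "--ulimit", f"nofile={max_open_files}:{max_open_files}",
        "--ulimit", f"nproc={max_processes}:{max_processes}",
    ]


def run_with_deadline(cmd: Sequence[str], seconds: float) -> Tuple[bool, int]:
    """Run cmd and kill it if it exceeds the deadline.

    Returns (timed_out, return_code); return_code is -1 on timeout.
    """
    try:
        done = subprocess.run(
            list(cmd),
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return False, done.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return True, -1
```

One caveat under this approach: killing a `docker run` client process does not necessarily stop the container itself, so a real supervisor would also need to stop the container by name or ID.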
Attacks: Network attacks
Attacks:
• Bitcoin mining
• DoS attacks on third-party systems
• Access Amazon S3 and other AWS
APIs
Defense:
• Deny network access
Modifying the ECS Agent: Network Modes
• NetworkDisabled too restrictive
  • Some graders require local loopback
  • Feature also deprecated
• --net=none + deny net_admin + audit network
  • Isolation via Docker creating an independent network stack for each container
• github.com/coursera/amazon-ecs-agent
Attacks: Namespace / Container Vulnerabilities
• AppArmor & Mandatory Access Control
  • Required modifying the Amazon ECS Agent
  • Allows auditing or denying access to a variety of subsystems
• Drop capabilities
  • No need for NET_BIND_SERVICE, CAP_FOWNER
• No root within container
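Taken together, the network isolation and container hardening measures map onto a handful of `docker run` options. The sketch below is an assumption about how they might be combined: `hardening_flags`, the profile name, and the UID are hypothetical, and Coursera applied these settings through its modified ECS agent rather than by calling the Docker CLI directly.

```python
from typing import List, Sequence


def hardening_flags(apparmor_profile: str,
                    drop_caps: Sequence[str] = ("NET_BIND_SERVICE", "FOWNER"),
                    uid: int = 1000) -> List[str]:
    """Return `docker run` flags for an isolated, unprivileged grader container."""
    if uid == 0:
        raise ValueError("refusing to run the grader as root")
    flags = ["--net=none"]  # own network stack with loopback only
    flags += [f"--cap-drop={cap}" for cap in drop_caps]
    # Older Docker releases spelled this option "apparmor:<profile>".
    flags += ["--security-opt", f"apparmor={apparmor_profile}"]
    flags.append(f"--user={uid}")  # no root inside the container
    return flags
```

Because `--net=none` still provides a loopback interface, graders that need to talk to localhost keep working while all external traffic is cut off.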
Attacks: Root escalations within the container
• We modify instructor grader images before allowing them to be run
  • Clears setuid
  • Inserts C wrapper to drop privileges from root and redirect stdin/stdout/stderr
• Required Amazon ECS Agent modification
  • Grant root privileges
  • Map Docker socket into Docker containers to run Docker in Docker!
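The "clears setuid" step can be pictured as a walk over an unpacked image filesystem that strips the setuid and setgid bits from regular files. This is a minimal sketch assuming the image has already been extracted to a directory; `clear_setuid_bits` is a hypothetical name and the deck does not describe Coursera's actual tooling.

```python
import os
import stat
from typing import List

SPECIAL_BITS = stat.S_ISUID | stat.S_ISGID


def clear_setuid_bits(root: str) -> List[str]:
    """Strip setuid/setgid bits from regular files under root.

    Returns the sorted list of paths that were changed.
    """
    cleared: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)              # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue                     # skip symlinks, devices, sockets
            if st.st_mode & SPECIAL_BITS:
                os.chmod(path, stat.S_IMODE(st.st_mode) & ~SPECIAL_BITS)
                cleared.append(path)
    return sorted(cleared)
```

Removing these bits means a binary inside the grader image cannot regain root after the wrapper has dropped privileges.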
Attacks: If all else fails…
• Utilizes VPC security measures to further restrict network access
  • No public internet access
  • Security group to restrict inbound/outbound access
  • Network flow logs for auditing
• Separate AWS account
• Run in an Auto Scaling group
  • Regularly terminate all grading EC2 instances
Other Security Measures
• Utilize AWS CloudTrail for audit logs
• Third-party security monitoring (Threat Stack)
  • No one should log in, so any TTY is an alert
• Penetration testing by third-party red team (Synack)
Technique: Co-process
• Environment has no network, but has to get submissions in and results out
• Python co-process watches Amazon ECS / Docker
• Python co-process then:
  • Mounts a shared folder containing submission
  • Reads back the grade from the shared folder after container exits
  • Monitors and cleans up
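The co-process hand-off described above can be sketched as a function that stages the submission in a shared folder, waits for the container, and reads the grade back. The file names `submission.bin` and `grade.json`, the `run_container` callback, and the fallback result are all assumptions made for illustration; the deck does not specify the real layout.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def grade_submission(submission: bytes,
                     run_container: Callable[[Path], None]) -> Dict[str, Any]:
    """Stage a submission, run the grader, and read the grade back.

    run_container is a hypothetical callback that mounts the shared folder
    into the grader container and blocks until the container exits.
    """
    shared = Path(tempfile.mkdtemp(prefix="grid-shared-"))
    try:
        (shared / "submission.bin").write_bytes(submission)
        run_container(shared)
        result_file = shared / "grade.json"
        if not result_file.exists():
            return {"score": 0.0, "error": "grader produced no result"}
        try:
            # Grader output is untrusted, so malformed JSON is expected input.
            return json.loads(result_file.read_text())
        except ValueError:
            return {"score": 0.0, "error": "grader result was not valid JSON"}
    finally:
        shutil.rmtree(shared, ignore_errors=True)  # always clean up
```

Keeping all input and output in one mounted folder is what lets the grader container run with no network access at all.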
Future Improvements
• Priority queues for different grading priorities
  • Re-grades vs on-demand grades
• Better instructor tooling
  • Automated “unit-testing” for new graders
  • Better simulation of production environment on instructor machines
• Support scheduling GPUs
Lessons Learned
• Run the latest kernels
  • Latest security patches
  • btrfs wedging on older kernels
  • Default Ubuntu 14.04 kernel not new enough!
• Carefully monitor disk usage
  • Docker-in-Docker can’t clean up after itself (yet).
• Reliable deploy tooling pays for itself
Related Sessions
Also from Coursera:
• BDT404 - Building and Managing Large-Scale ETL Data Flows with AWS Data Pipeline and Dataduct - Friday
Containers and Amazon ECS:
• CMP302 - Amazon EC2 Container Service: Distributed Applications at Scale - Next timeslot in Venetian H
Thank you!
Questions?
Also, we are hiring!
www.coursera.org/jobs
tech.coursera.org
Brennan Saeta
github/saeta
@bsaeta
saeta@coursera.org
Frank Chen
github/frankchn
@frankchn
frankchn@coursera.org
Remember to complete your evaluations!

Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...kalichargn70th171
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 

Último (20)

JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
The Ultimate Guide to Performance Testing in Low-Code, No-Code Environments (...
 
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 

Amazon ECS at Coursera: A unified execution framework while defending against untrusted code

Editor's Notes

  1. This is a 400-level session…
  2. This talk is organized as follows…
  3. Introducing Coursera. - Mission - Founding. Transition: we partner…
  4. “World class institutions from around the globe. We have partners on 6 continents (most of them)! You might recognize a name or two up here. These institutions take their best programs and their best instructors, and their best courses, and put them on the Coursera learning platform.”
  5. Courses span a range from highly technical… to the humanities, including… Listing these courses: - Machine Learning - Data Science - Learning How To Learn - Social Psychology - Irrational Behavior Show of hands: who has taken a Coursera course?
  6. … Our scale brings a number of challenges and opportunities. For example, the export gradebook function is relatively trivial for a class of 20 or even 200 students. Our largest classes have over 200,000 individuals. In order to power our global learning platform, we’ve had to develop techniques and build systems that meet our needs. And now, I’d like to invite Frank Chen, a founding engineer at Coursera, to the stage to discuss our near-line job scheduler and execution framework.
  7. Three things: Batch Processing, Scheduled Processing, Nearline Processing
  8. Instructor reports: as you know, our instructors put classes on our platform for free, and in return they want data about their learners! We also want to give them this data because they can use it for further studies and to improve their courses. 20 learners is easy; 200k learners is hard. Of course, like any other company, we have a lot of internal reports as well: finance, bizdev, and marketing want up-to-date, accurate data. Payment reconciliation makes sure we are getting the right product to the right people, etc…
  9. Our main use case of scheduled processing would be for marketing and recommendation emails. How many of you have received a recommendation email from Coursera? That is from a batch job that is run every week. We also do smaller scale things – reactivation and targeted marketing emails.
  10. What is peer review? In a lot of our classes (e.g. Fiction Writing, Modern Poetry), instructors cannot accurately assess student performance using multiple-choice questions alone; students need to write short answers and essays or submit drawings or recordings, and human beings need to evaluate these. In an ideal world, you could hire 1,000 TAs to grade these submissions, but we can’t afford that. You can do the next best thing: getting other students to evaluate you. As part of this system, we need to assign reviewers to your assignments in a fair and efficient manner according to complex criteria, and near-line processing enables us to do this.
  11. Before I jump into what we did with ECS, I want to talk about the early days of batch processing at Coursera.
  12. Looking for something BETTER in our next system.
  13. Saturn / Cascade were flaky. Developers became frustrated with jobs not running properly.
  14. Developing and testing locally was difficult in the old system, so developers picked up the bad habit of pushing code to prod without testing it. The new system should require no boilerplate code and little environment setup.
  15. Deployment was difficult and often interfered with running jobs. “Other services have one-click tools, why can’t your service have that too?”
  16. Low startup and shutdown time – responsive enough to start within 30 seconds of the job being requested if there are enough resources.
  17. Only one dev-ops engineer, who can’t manage everything. Developers own their services. Developers shouldn’t have to actively monitor services.
  18. We are a startup and we are cost-conscious. Most jobs complete in under 20 minutes, but EC2 rounds costs up to a full hour.
  19. Amazon does the hard work of coordination and synchronization in a distributed system, and even provides an agent to run jobs itself. Our DevOps engineer is very happy – one AMI with one agent + Docker on it, simple and easy to use.
  20. Tightly integrated with rest of AWS APIs. For instance, we can use Amazon IAM roles and users to restrict access to the ECS API, and this makes Brennan, who is also our security engineer, very happy because we get security without him doing a lot of extra work.
  21. Amazon traditionally has a very good set of APIs, documentation and SDKs and ECS is no different. We found it very easy to grasp the key concepts and get started with using Amazon ECS.
  22. No scheduled tasks. No fine-grained monitoring of tasks. No retries or delays when the cluster runs out of resources, and no prioritization. Does not integrate well with our existing infrastructure (e.g. Scala APIs and tooling).
  23. User submits a request to a frontend service – e.g. to our quiz service for an export of all the grades of the students in the class for a specific quiz.
  24. The online service sends a batch job request to the Iguazu frontend.
  25. The Iguazu frontend persists pertinent job information to Cassandra, our database.
  26. The frontend then submits the job request to an SQS queue.
  27. The Iguazu backend reads pending jobs off the SQS queue and processes them.
  28. In this case, it talks to the ECS APIs to get a list of all container instances and selects one to run the job. A special note here: in our original design, we handed scheduling off to ECS itself by calling RunTask, which would randomly choose an instance with enough CPU and RAM to run the job. However, we found that this was not flexible enough for our purposes. Specifically, as we were integrating the Iguazu backend with the EC2 Auto Scaling lifecycle APIs in order to autoscale our ECS container instances, we found that RunTask would sometimes schedule jobs on instances that were in the process of terminating. By switching to the StartTask API and writing our own scheduler in the backend, we eliminated that problem by simply not scheduling jobs on instances undergoing termination. In addition, Iguazu receives notifications from the EC2 Auto Scaling lifecycle APIs when an instance is about to be terminated, and it blocks the termination until all jobs already running on that instance have completed. This greatly improved reliability, especially for our long-running jobs. If a job cannot be run, it goes back into the SQS queue for retry after 10 minutes. If the job still cannot be scheduled after an hour, it is deleted and an exception is logged.
  29. If we have successfully identified an instance with enough resources, then we will call the StartTask API and ask ECS to run the task on that specific container instance. The Iguazu backend will periodically monitor the job and update the status of the job in Cassandra. Each online service can then query the Iguazu frontend for job updates.
  30. Similarly, developers can use an admin interface to schedule recurring jobs. This goes into an alternate Iguazu frontend (aka Scheduler) that wakes up every second and sends all the tasks that it has to run to the backend via the same SQS queue.
  31. Our backend is written entirely in Scala, so naturally Iguazu jobs are written in Scala too. Almost no boilerplate and easy to get started.
  32. Running jobs from other services (e.g. the quiz service) is even easier. We have an internal REST RPC framework called Naptime that abstracts the details of inter-service API calls away from each individual developer. Developers can just take the Iguazu Job Invocation Client that Naptime provides and call the create method with the parameters of the job he or she wants to run. In this case, the developer is running the exportQuizGrades job.
  33. Our interface is very simple. Engineers can just click the red + button on the lower right to add a new scheduled job, and they can also click to edit existing ones. Of course, all changes are logged for auditing purposes.
  34. Developers are definitely happier with this system than with the previous ones, but we are continually improving it.
  35. Now, I’d like to talk about a special application of Iguazu, Docker, and Amazon ECS: evaluating programming assignments. First Brennan Slide (Round 2)
  36. … Which of you would like to run untrusted C code from random people on the internet on your servers? Raise your hands! …. Ah you must be from Amazon Web Services. But see for us, we don’t even require a credit card!
  37. … And some professors set up their graders on machines under their desks. Naturally, the power cord always became unplugged hours before the submission deadline…
  38. Procrastinators! And then we’d forget about the instance for a couple months after the course finished. 
  39. Securing untrusted code is hard!
  40. Now, Coursera recently underwent a complete revamp of the entire course platform, as we shifted from running sessions 2-3 times a year, to every 2-6 weeks. This gave us an opportunity to take another run at programming assignment infrastructure. Coursera is a startup, and we provide generous financial aid, so we are very cost conscious. When we analyzed our cloud spend, we found that we spent a disproportionate amount of money on the graders for programming assignments. Since it is a given that we need to support more courses, with more sophisticated assignments, we needed a system that would provide an order of magnitude or more in cost savings. Cost savings implied autoscaling and a shared pool of resources.
  41. We have only one DevOps engineer. No maintenance implied immutable infrastructure and a highly automated environment.
  42. For pedagogical reasons, we would like to provide feedback as quickly as possible. Ideally, we are able to execute fast graders and turn around their scores within 30 seconds at the 90th percentile. Near-real-time feedback means we cannot boot a new EC2 instance. In combination with the other requirements, this implied containers and Docker.
  43. We wanted to bake security into the infrastructure to automatically relieve instructors from worrying about the vast majority of vulnerabilities. This also has the added benefit of making the system more robust to more innocent occurrences. ... But, what does “Secure Infrastructure” even mean?
  44. … And with that, I’d like to delve into some attacks, and defenses.
  45. Now, some of you may have noticed, we have a little problem…
  46. First talk about success of the system: - Thousands of assignments evaluated daily. - Over a hundred assignments on the platform - Used by dozens of courses Despite the success, there are a number of future improvements to be made… - Provide a number of sample submissions, and their expected scores… - Now, a number of our security measures cannot be replicated outside of AWS, but some, like AppArmor, could be. As we have time, we may look into open sourcing our AppArmor profile and our list of kernel capabilities. - Finally, we would like to support mapping GPUs into our containers for that CUDA class… While this would require another modification to the Amazon ECS agent, we think that it won’t be too bad.
  47. … Finally, building a platform for code execution is much harder than building an API in front of a database. Thank you to all of the engineers at AWS who are building the secure and reliable systems that all of us here have come to rely on.
  48. If any of what we’ve talked about today sounds interesting to you, please know that Coursera is always looking for talented engineers, managers, and designers to join the team. If you are interested, please don’t hesitate to reach out. Thank you all very much for attending. <Pause> Questions?
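
Notes 28 and 29 describe the backend's scheduling decision: skip container instances that are being terminated, pick one with enough CPU and RAM, retry after 10 minutes if none fits, and give up after an hour. The following is a minimal sketch of that decision in Python, not Coursera's implementation (the notes say the real backend is written in Scala). The `Instance` and `Job` types, the function names, and the tightest-fit tie-break are all hypothetical, and the actual calls to the ECS and SQS APIs are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Timings taken from note 28; the constant names are illustrative.
RETRY_DELAY_SECONDS = 10 * 60   # re-queue an unschedulable job for 10 minutes
MAX_PENDING_SECONDS = 60 * 60   # delete the job if still unscheduled after an hour


@dataclass(frozen=True)
class Instance:
    """Hypothetical view of one ECS container instance."""
    instance_id: str
    free_cpu: int            # remaining CPU units
    free_memory_mb: int      # remaining memory in MiB
    terminating: bool = False  # set when a termination notice has arrived


@dataclass(frozen=True)
class Job:
    """Hypothetical job record as it might come off the queue."""
    name: str
    cpu: int
    memory_mb: int
    submitted_at: float      # seconds since some epoch


def pick_instance(job: Job, instances: Sequence[Instance]) -> Optional[Instance]:
    """Return an instance that can host the job, never one that is terminating."""
    candidates = [
        i for i in instances
        if not i.terminating
        and i.free_cpu >= job.cpu
        and i.free_memory_mb >= job.memory_mb
    ]
    if not candidates:
        return None
    # Assumption: prefer the tightest fit. The talk does not say how ties are broken.
    return min(candidates, key=lambda i: (i.free_cpu - job.cpu,
                                          i.free_memory_mb - job.memory_mb))


def decide(job: Job, instances: Sequence[Instance], now: float) -> Tuple[str, Optional[str]]:
    """Map one scheduling attempt to an action.

    ("start", instance_id): the caller would start the task on that instance.
    ("retry", None): the caller would re-queue the job with RETRY_DELAY_SECONDS.
    ("drop", None): the job has been pending too long; delete it and log an error.
    """
    target = pick_instance(job, instances)
    if target is not None:
        return ("start", target.instance_id)
    if now - job.submitted_at >= MAX_PENDING_SECONDS:
        return ("drop", None)
    return ("retry", None)
```

A caller would turn `"start"` into a StartTask request for the chosen container instance and `"retry"` into a delayed re-queue. Keeping the decision as a pure function makes the termination filter easy to test without touching AWS.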