Serverless Optical Character Recognition
in Support of NASA Astronaut Safety
Chris Shenton
CTO V! Studios
AWS DC Meetup
2017-12-12
Talk Overview
● The Problem
● The Challenge
● Architectures: server, cloud, serverless
● Lambda: FaaS, Events, Benefits, Limitations
● NASA EVA OCR Architecture
● Security, FedRAMP, ATO
● Serverless Framework
● Gotchas!
● Happy Customer
● Future Challenges and Opportunities
Problem: Life-Threatening Spacesuit Failure
On July 16, 2013, water filled the helmet of Italian astronaut Luca Parmitano, creating a life-threatening scenario that forced NASA to abort his spacewalk.
The Challenge
● Designs on paper or scanned without OCR ability
● Current reporting processes and procedures cannot be changed
● About 60 Discrepancy Reports (20 pages) and 190 Task Performance Sheet reports (500 pages) per month
● Started OCR in 2015, stopped due to server load
● Overwhelmed the EVA Data Integration pipeline
100,000 pages/month
Architecture evolution: server to cloud to serverless
● Datacenter: no scaling
● Cloud servers: scaling
● Cloud Containers: scaling
● Serverless: fast, painless scaling
Architecture [1a]: Datacenter, no scaling
[Diagram: PDF doc → Server running OCR process → TXT doc]
Architecture [1b]: Datacenter, no scaling
[Diagram: many PDF docs → a single Server running OCR process → TXT doc]
Load overwhelms OCR server
Architecture [2]: Cloud with scaling
[Diagram: PDF docs → SQS Queue → Autoscaling group of Servers, each running OCR → TXT docs]
Pro:
● Scaling handles load spikes
Con:
● Complicated to set up
● Scale out takes a few minutes per server
● Still have to manage OS, security
Architecture [3]: Cloud Containers with scaling
[Diagram: PDF docs → SQS Queue → two Container Servers, each running several OCR Containers, with automatic scaling → TXT docs]
Pro:
● Scaling handles load spikes
● Can deploy immutable instances
Con:
● Have to manage scaling
● Have to manage placement, orchestration
Architecture [4a]: Serverless Cloud with built-in scaling
[Diagram: PDF docs → Lambda OCR functions with automatic scaling → TXT docs]
Pro:
● Scaling is automatic, nearly instant
● No patching, open ports
Con:
● Some limits on size, lifetime
Architecture [4b]: Serverless Cloud with built-in scaling
[Diagram: PDF docs → Lambda "split doc" functions → PDF pages → Lambda OCR functions with automatic scaling → TXT docs]
With instant, automatic scaling, we can split PDF docs into PDF pages
and OCR each page to text in parallel, with minimal extra effort.
Exploiting parallelism gives us our results much faster at no extra cost.
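The split-then-OCR fan-out can be sketched locally. This is only an illustration under stated assumptions: `split_pages` and `ocr_page` are hypothetical stand-ins (no real PDF or OCR library is called), and a thread pool stands in for the parallel Lambda invocations.

```python
from concurrent.futures import ThreadPoolExecutor


def split_pages(doc):
    """Hypothetical splitter: a 'document' here is just a list of page strings."""
    return list(enumerate(doc, start=1))


def ocr_page(numbered_page):
    """Hypothetical OCR step: upper-casing stands in for real text recognition."""
    number, content = numbered_page
    return number, content.upper()


def ocr_document(doc, max_workers=8):
    """Fan out one task per page, then fan in and reassemble in page order."""
    pages = split_pages(doc)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(ocr_page, pages))
    # pool.map preserves input order, but sorting makes the fan-in explicit.
    return "\n".join(text for _, text in sorted(results))
```

Because each page is independent, the total wall-clock time approaches that of the slowest single page rather than the sum of all pages.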
Lambda, FaaS, Event-Driven Computing
Say What?
AWS Lambda is a “Function as a Service” (FaaS)
Function as a service (FaaS) is a category of cloud
computing services that provides a platform
allowing customers to develop, run, and manage
application functionalities without the complexity
of building and maintaining the infrastructure
typically associated with developing and launching
an app. Building an application following this
model is one way of achieving a “serverless”
architecture, and is typically used when building
microservices applications.
Wikipedia
FaaS Products
● AWS Lambda
● Google Cloud Functions
● Microsoft Azure Functions
● IBM OpenWhisk
Event-Driven Computing: trigger Functions based on events
● S3: ObjectCreated
● DynamoDB: Row Changed
● API Gateway: GET, PUT, POST, DELETE
→ Lambda Function processes event
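As a rough sketch of what "processes event" might look like for the S3 case, the Python handler below pulls the bucket and key out of an S3 notification. The function names are illustrative, not taken from the talk; the `Records` / `s3` / `bucket` / `object` layout is the standard S3 event notification shape.

```python
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded in the notification (spaces become '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def handler(event, context):
    """Lambda entry point: report which objects triggered this invocation."""
    return [{"bucket": b, "key": k} for b, k in s3_objects(event)]
```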
Event-Driven Computing: Lambdas can trigger Lambdas
● S3: ObjectCreated
● DynamoDB: Row Changed
● API Gateway: GET, PUT, POST, DELETE
→ Lambda Function processes event
→ Lambda: invoke Sync or Async
Event-Driven Computing: allows interesting architectures
[Diagram: upload flow]
1. Client sends {method: GET, url: /newUpload, data: none}; a Lambda returns a new S3 location: {uploadUrl: s3://bucket/newKey}
2. Client sends PUT /newKey with the data
● S3 ObjectCreated event {bucket: b, key: newKey} triggers a Lambda to store info to DynamoDB
● DynamoDB Row Changed event triggers a Lambda to send metadata to the application via HTTP
● Application: Search Engine
Event-Driven Computing: AWS events
● Amazon S3
● Amazon DynamoDB
● Amazon Kinesis Streams
● Amazon Simple Notification Service
● Amazon Simple Email Service
● Amazon Cognito
● AWS CloudFormation
● Amazon CloudWatch Logs
● Amazon CloudWatch Events
● AWS CodeCommit
● Scheduled Events
● AWS Config
● Amazon Alexa
● Amazon Lex
● Amazon API Gateway
● AWS IoT Button
● Amazon CloudFront
● Amazon Kinesis Firehose
● Invoking a Lambda Function On Demand
Lambda Benefits
Example application: process 1000 2-second requests/day
● Server: $16.84/month (AWS t2.small, 24x7)
● Lambda: $1.50/month
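The Lambda figure can be roughly reproduced with a short calculation. This is a sketch under assumptions not stated on the slide: a 30-day month, 1.5 GB of configured memory, no free tier, and the 2017 list prices of $0.00001667 per GB-second and $0.20 per million requests.

```python
# List prices as published in 2017 (assumed here; check current pricing).
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_MILLION_REQUESTS = 0.20


def lambda_monthly_cost(requests_per_day, seconds_per_request, memory_gb, days=30):
    """Estimate a month's Lambda bill, ignoring the free tier."""
    requests = requests_per_day * days
    compute = requests * seconds_per_request * memory_gb * PRICE_PER_GB_SECOND
    request_fee = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + request_fee


# 1000 two-second requests per day at an assumed 1.5 GB of memory.
estimate = lambda_monthly_cost(1000, 2, 1.5)
print(f"${estimate:.2f}/month")
```

Under these assumptions the estimate lands within a cent or so of the slide's figure; a smaller memory setting would lower it proportionally.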
● No Servers to Manage: no patching, open ports or logins
● Subsecond Metering: no idle capacity
● Continuous Scaling: high availability
Lambda Limitations
● Languages: NodeJS, Python, Java, C# (Golang soon)
● Maximum execution duration: 300 seconds
● Memory: 128 MB - 3 GB
● Ephemeral disk: 512 MB
● Invoke request payload: 6 MB (sync), 128 KB (event/async)
● File descriptors: 1024
● Processes and threads: 1024
Lambda Limitations: some work-arounds
Maximum execution duration: 300 seconds
Memory: 128 MB - 3 GB
● Decompose function into smaller microservices
● Chain Lambdas with events like SNS or Lambda async invocation
Ephemeral disk capacity: 512 MB
● Read/write from/to external storage: S3, DynamoDB, etc
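Chaining by asynchronous invocation might look like the sketch below. The function and parameter names are hypothetical; the call itself is boto3's Lambda `invoke` with `InvocationType="Event"`. The `client` argument is an assumption added so the sketch can be exercised without AWS credentials.

```python
import json

ASYNC_PAYLOAD_LIMIT = 128 * 1024  # async invoke limit quoted on the slide


def invoke_next(function_name, payload, client=None):
    """Hand work to the next Lambda asynchronously and return immediately."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > ASYNC_PAYLOAD_LIMIT:
        # Too big for an event payload: pass an S3 location instead of the data.
        raise ValueError("payload exceeds async limit; pass an S3 key instead")
    if client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the result
        Payload=body,
    )
    return response["StatusCode"]  # 202 means the event was accepted
```

Passing an S3 key rather than the data itself sidesteps both the payload limit and the ephemeral disk limit.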
EVA OCR Architecture
[Diagram: OCR for EVA. In the EVA Data Integration (EDI) systems, the EDI App sends a PDF doc to AWS S3 Storage. AWS Autoscaling Lambdas then Split the PDF doc into PDF pages, OCR each PDF page into a TXT page, Combine the TXT pages into a text document, and Output a JSON doc to the EDI Search API. S3 holds PDF Docs, PDF Pages, Text Pages and Text Documents.]
EVA OCR Architecture: Big Wins
● Architecture designed for lowest operational cost possible:
○ S3 files removed after 24 hours: minimal data charges, better security
○ no database cost
● Architectural patterns we used instead of database:
○ track progress with directory prefixes
○ propagate information using S3 object metadata
● Lambda autoscaling, fast scaling, pay only for active use
● Serverless Framework simplified deployment
● but see the Gotchas in a few slides...
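Tracking progress with prefixes instead of a database could be sketched as follows. The prefixes `page_pdf/` and `page_txt/` come from the bucket layout in this talk; the rest of the key scheme (a `docid` folder with zero-padded page numbers, and no "/" inside `docid`) is an assumption made for illustration.

```python
def page_pdf_key(docid, page):
    """Key for one split page, e.g. page_pdf/<docid>/0001.pdf (assumed layout)."""
    return f"page_pdf/{docid}/{page:04d}.pdf"


def page_txt_key(page_pdf):
    """Map a page PDF key to the key its OCR text should be written to."""
    _, docid, filename = page_pdf.split("/")
    return f"page_txt/{docid}/{filename[:-len('.pdf')]}.txt"


def missing_pages(page_pdf_keys, page_txt_keys):
    """Pages whose text has not appeared yet; empty means ready to combine."""
    done = set(page_txt_keys)
    return [k for k in sorted(page_pdf_keys) if page_txt_key(k) not in done]
```

A combine step would list both prefixes for a document and proceed only when `missing_pages` returns an empty list, so the bucket listing itself is the progress record.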
EVA OCR Architecture: securely connect with cloud policies
● EDI App (IAM Role: eva-app-role): EVA code uploads PDF to /doc_pdf/
○ Policies on eva-app-role: ocreva-s3-write-doc_pdf, other EDI policies...
○ IAM Policy ocreva-s3-write-doc_pdf: allow write arn:aws:s3:::eva-ocr-dev/doc_pdf/
● EVA OCR S3 bucket eva-ocr-dev: /doc_pdf/, /page_pdf/, /page_txt/, /doc_txt/
● EVA OCR Lambda Functions (IAM Role: eva-ocr-dev-us-east-1-lambda; Security Group sg-001: ocreva-lambda-output)
○ Lambdas read/write pdf and txt in various folders
○ Lambdas send HTTP POST {docid: ‘mydocid’, page: 42, text: ‘ocr text…’}
● EVA Search API on 3x EC2: HTTPS API on port 5333
○ Security Group sg-002: eva-search: allow from sg-001 to port 5333
No auth servers were harmed in the making of this service
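A policy like ocreva-s3-write-doc_pdf could be expressed as the JSON document built below. This is a sketch, not the project's actual policy: the choice of the `s3:PutObject` action and the trailing `*` on the resource are assumptions about what "allow write" to that prefix means.

```python
import json


def s3_write_policy(bucket, prefix):
    """Build an IAM policy document allowing uploads under one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Trailing wildcard is an assumption: it matches every key under the prefix.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


print(json.dumps(s3_write_policy("eva-ocr-dev", "doc_pdf/"), indent=2))
```

Scoping the role to a single prefix means the uploading application can add documents but cannot read or overwrite the OCR output folders.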
EVA OCR Security Controls
Even though Lambda is currently undergoing FedRAMP certification, the cloud security group provided an ATO based on the following controls:
● GovCloud for sensitive data
● IAM policies, roles and Security Groups restrict access
● Separate VPC for Lambdas
● No VPC network egress for Lambdas
● Security Group allows output of final Lambda to EDI Search API
● Encrypted data in transit and at rest
● Static Code Analysis
[Diagram: EVA OCR Autoscaling Lambdas run in a Lambda VPC (private IP space, /16 = 65,536 IPs) and reach EVA OCR S3 Storage through an S3 VPC Endpoint. A Security Group allows traffic to port 5333 of the EVA Data Integration systems in the EDI VPC (NASA IP space, limited IPs).]
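The address-space difference can be checked with Python's standard `ipaddress` module. The CIDR blocks below are illustrative assumptions, not the project's real ranges, and the sketch treats each block as a single subnet; AWS reserves five addresses in every subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address


def usable_ips(cidr):
    """Addresses left for network interfaces if the CIDR is a single subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


for cidr in ("10.0.0.0/16", "10.0.0.0/24"):
    print(cidr, usable_ips(cidr))
```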
Serverless Framework: from the horse’s mouth
Serverless is your toolkit for
deploying and operating
serverless architectures.
Focus on your application,
not your infrastructure.
serverless.com
npm install serverless -g
serverless create --template hello-world
serverless deploy
curl http://xyz.amazonaws.com/hello-world
Serverless Framework: overview
● Controlled by file serverless.yml
● Defines Resources: S3, Lambda, …
● Defines Events: ObjectCreated, …
● Wires Events to trigger Lambdas
● Defines Security
● Uses CloudFormation underneath
● Simple CLI to deploy to cloud
service: myservicename
provider:
  name: aws
  runtime: python3.6
functions:
  s3created:
    handler: myservice.s3created
    events:
      - s3:
          bucket: myservice-dev
          event: s3:ObjectCreated:*
          rules:
            - prefix: doc_pdf/
  formget:
    handler: myservice.formget
    events:
      - http:
          path: form
          method: get
  formpost:
    handler: myservice.formpost
    events:
      - http:
          path: form
          method: post
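The `handler:` entries point at functions in a `myservice.py` module. A minimal sketch of the two HTTP handlers is shown here; the response bodies are placeholders invented for illustration, the response shape follows API Gateway's Lambda proxy integration, and `s3created` is omitted.

```python
import json


def _response(status, payload):
    """Shape a reply the way API Gateway's Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def formget(event, context):
    """GET /form: return a placeholder description of the form."""
    return _response(200, {"fields": ["docid"]})


def formpost(event, context):
    """POST /form: echo back the JSON body, or report a client error."""
    try:
        data = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body is not valid JSON"})
    return _response(200, {"received": data})
```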
Gotchas!
● Will get duplicate events if Lambda exits unsuccessfully
○ this is a good thing
● May get duplicate events
○ detect and possibly ignore them (idempotent)
● Timeouts if job takes longer than 300 seconds
○ may have to chain Lambdas
● Overloading destinations is likely due to scale
○ detect, back-off
○ may require handling like Timeouts
● Fast scaling can exhaust limited IP addresses in a VPC
○ use separate VPC for Lambda with large private IP space, e.g., /16 with 65,536 IPs
● S3 eventual consistency
○ use UUIDs in S3 keys to force read consistency
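Two of these gotchas, duplicate events and overloaded destinations, lend themselves to small helpers. The sketch below is generic Python rather than the project's code: `send` and `work` are caller-supplied callables, and the in-memory `already_done` set is a stand-in for checking whether the output object already exists.

```python
import random
import time


def call_with_backoff(send, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry send() with exponential back-off and jitter; re-raise on final failure."""
    for attempt in range(attempts):
        try:
            return send()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out parallel retries.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


def process_once(key, already_done, work):
    """Idempotent wrapper: skip keys whose output already exists."""
    if key in already_done:
        return False  # duplicate event, nothing to do
    work(key)
    already_done.add(key)
    return True
```

Jitter matters here because many Lambdas tend to fail at the same moment when a destination is overloaded, and identical retry delays would make them all return at once.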
Happy Customer
“The work you’ve accomplished
is a big step proving out this
new technology for NASA”
Cuong Q Nguyen, JSC/NASA EVA Office
Future Challenges and Opportunities
NASA’s Cuong Nguyen has told us he needs to track assembly, subassembly and part
hierarchies. Can we extract structured text?
He also needs to identify inspector and approval “stamps”. This is not OCR but hard
image processing.
Future Work: structured content extraction
Future Work: “stamp” detection and identification
Questions?
Reach out to us!
chris@v-studios.com
@shentonfreude

 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Sonam Pathan
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 

Último (20)

定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
NSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentationNSX-T and Service Interfaces presentation
NSX-T and Service Interfaces presentation
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleans
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170Call Girls Near The Suryaa Hotel New Delhi 9873777170
Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 

Serverless OCR for NASA EVA: AWS Meetup DC 2017-12-12

  • 1. Serverless Optical Character Recognition in Support of NASA Astronaut Safety Chris Shenton CTO V! Studios AWS DC Meetup 2017-12-12
  • 2. Talk Overview ● The Problem ● The Challenge ● Architectures: server, cloud, serverless ● Lambda: FaaS, Events, Benefits, Limitations ● NASA EVA OCR Architecture ● Security, FedRAMP, ATO ● Serverless Framework ● Gotchas! ● Happy Customer ● Future Challenges and Opportunities
  • 3. Problem: Life-Threatening Spacesuit Failure On July 16, 2013, water filled the helmet of Italian astronaut Luca Parmitano, creating a life-threatening scenario that forced NASA to abort his spacewalk.
  • 4. The Challenge ● Designs on paper or scanned without OCR ability ● Current reporting processes and procedures cannot be changed ● About 60 Discrepancy Reports (20 pages) and 190 Task Performance Sheet reports (500 pages) per month ● Started OCR in 2015, stopped due to server load ● Overwhelmed the EVA Data Integration pipeline [Callout: 100,000 pages/month]
  • 5. Architecture evolution: server to cloud to serverless ● Datacenter: no scaling ● Cloud servers: scaling ● Cloud Containers: scaling ● Serverless: fast, painless scaling
  • 6. Architecture [1a]: Datacenter, no scaling [Diagram: PDF doc -> Server OCR process -> TXT doc]
  • 7. Architecture [1b]: Datacenter, no scaling [Diagram: many PDF docs -> single Server OCR process -> TXT doc] Load overwhelms OCR server
  • 8. Architecture [2]: Cloud with scaling Pro: ● Scaling handles load spikes Con: ● Complicated to set up ● Scale out takes a few minutes per server ● Still have to manage OS, security [Diagram: PDF docs -> SQS Queue -> Autoscaling group of OCR servers -> TXT docs]
  • 9. Architecture [3]: Cloud Containers with scaling Pro: ● Scaling handles load spikes ● Can deploy immutable instances Con: ● Have to manage scaling ● Have to manage placement, orchestration [Diagram: PDF docs -> SQS Queue -> OCR containers running on container servers -> TXT docs]
  • 10. Architecture [4a]: Serverless Cloud with built-in scaling Pro: ● Scaling is automatic, nearly instant ● No patching, open ports Con: ● Some limits on size, lifetime [Diagram: PDF docs -> automatically scaled Lambda OCR functions -> TXT docs]
  • 11. Architecture [4b]: Serverless Cloud with built-in scaling [Diagram: PDF docs -> automatically scaled "Lambda split doc" functions -> PDF pages -> automatically scaled Lambda OCR functions -> TXT docs] With instant, automatic scaling, we can split PDF docs into PDF pages and OCR each page to text in parallel, with minimal extra effort. Exploiting parallelism gives us our results much faster at no extra cost.
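The split-then-OCR-in-parallel idea on slide 11 can be sketched locally. This is only an illustration under stated assumptions: a thread pool stands in for concurrently running Lambdas, and `split_doc` and `ocr_page` are hypothetical stubs (a real worker would call an OCR engine), not the talk's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_doc(doc_pages):
    """Stand-in for the 'split' step: number each page starting at 1."""
    return list(enumerate(doc_pages, start=1))


def ocr_page(numbered_page):
    """Hypothetical OCR stub: upper-casing stands in for real OCR work."""
    number, page = numbered_page
    return number, page.upper()


def ocr_document(doc_pages, max_workers=8):
    """OCR every page concurrently, then reassemble text in page order."""
    pages = split_doc(doc_pages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(ocr_page, pages))
    # Sort by page number so the combined text does not depend on
    # the order in which workers finished.
    return "\n".join(text for _, text in sorted(results))
```

Because each page is independent, the wall-clock time is bounded by the slowest page rather than the sum of all pages.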
  • 12. Lambda, FaaS, Event-Driven Computing Say What?
  • 13. AWS Lambda is a “Function as a Service” (FaaS) “Function as a service (FaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage application functionalities without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. Building an application following this model is one way of achieving a ‘serverless’ architecture, and is typically used when building microservices applications.” (Wikipedia) FaaS Products: ● AWS Lambda ● Google Cloud Functions ● Microsoft Azure Functions ● IBM OpenWhisk
  • 14. Event-Driven Computing: trigger Functions based on events [Diagram: event sources S3 (ObjectCreated), DynamoDB (Row Changed), and API Gateway (GET, PUT, POST, DELETE) -> Lambda Function processes event]
  • 15. Event-Driven Computing: Lambdas can trigger Lambdas [Diagram: S3 (ObjectCreated), DynamoDB (Row Changed), and API Gateway (GET, PUT, POST, DELETE) -> Lambda Function processes event -> Lambda invoke, Sync or Async]
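The "Sync or Async" choice on slide 15 maps onto the `InvocationType` argument of the Lambda `invoke` call. A minimal sketch, assuming a boto3 Lambda client is passed in by the caller; the helper name `invoke_lambda` is hypothetical.

```python
import json


def invoke_lambda(client, function_name, payload, wait=False):
    """Invoke another Lambda through a boto3-style Lambda client.

    wait=True  uses 'RequestResponse' (synchronous, caller blocks for the result).
    wait=False uses 'Event' (asynchronous, fire and forget).
    """
    return client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse" if wait else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
```

Usage would look like `invoke_lambda(boto3.client("lambda"), "ocr-page", {"key": "page_pdf/x.pdf"})`, where the function name and payload are made up for illustration.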
  • 16. Event-Driven Computing: allows interesting architectures [Diagram labels, with step markers 1 and 2: Application: GET /newUpload; request {method: GET, url: /newUpload, data: none}; Lambda: return new S3 location; response {uploadUrl: s3://bucket/newKey}; PUT /newKey ...data…; S3 ObjectCreated {bucket: b, key: newKey}; store info to DynamoDB; DynamoDB Row Changed; Send metadata to application via HTTP; Search Engine]
  • 17. Event-Driven Computing: AWS events ● Amazon S3 ● Amazon DynamoDB ● Amazon Kinesis Streams ● Amazon Simple Notification Service ● Amazon Simple Email Service ● Amazon Cognito ● AWS CloudFormation ● Amazon CloudWatch Logs ● Amazon CloudWatch Events ● AWS CodeCommit ● Scheduled Events ● AWS Config ● Amazon Alexa ● Amazon Lex ● Amazon API Gateway ● AWS IoT Button ● Amazon CloudFront ● Amazon Kinesis Firehose ● Invoking a Lambda Function On Demand
  • 18. Lambda Benefits Example application: process 1000 2-second requests/day ● Server: $16.84/month (AWS t2.small, 24x7) ● Lambda: $1.50/month No Servers to Manage: no patching, open ports or logins Subsecond Metering: no idle capacity Continuous Scaling: high availability
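The slide does not show the inputs behind its two prices. The sketch below reproduces them approximately under assumptions that are not on the slide: a 1.5 GB function, $0.00001667 per GB-second, $0.20 per million requests, a 30-day month, $0.023 per hour for a t2.small, and 732 hours per month. The free tier is ignored.

```python
REQUESTS_PER_DAY = 1000       # from the slide
SECONDS_PER_REQUEST = 2       # from the slide
DAYS_PER_MONTH = 30           # assumption

# Assumed rates and sizing (not stated on the slide).
MEMORY_GB = 1.5
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_REQUEST = 0.20 / 1_000_000
T2_SMALL_PER_HOUR = 0.023
HOURS_PER_MONTH = 732


def lambda_monthly_cost():
    """Compute charge (GB-seconds) plus per-request charge for one month."""
    requests = REQUESTS_PER_DAY * DAYS_PER_MONTH
    gb_seconds = requests * SECONDS_PER_REQUEST * MEMORY_GB
    return gb_seconds * PRICE_PER_GB_SECOND + requests * PRICE_PER_REQUEST


def server_monthly_cost():
    """An always-on instance is billed for every hour, busy or idle."""
    return T2_SMALL_PER_HOUR * HOURS_PER_MONTH
```

Under these assumptions the function is busy for 60,000 seconds a month, about 2% of the hours the server is billed for, which is where the gap comes from.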
  • 19. Lambda Limitations ● Languages: NodeJS, Python, Java, C# (Golang soon) ● Maximum execution duration: 300 seconds ● Memory: 128 MB - 3 GB ● Ephemeral disk: 512 MB ● Invoke request payload: 6 MB (sync), 128 KB (event/async) ● File descriptors: 1024 ● Processes and threads: 1024
  • 20. Lambda Limitations: some work-arounds Maximum execution duration: 300 seconds Memory: 128 MB - 3 GB ● Decompose function into smaller microservices ● Chain Lambdas with events like SNS or Lambda async invocation Ephemeral disk capacity: 512 MB ● Read/write from/to external storage: S3, DynamoDB, etc
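One way to apply the "chain Lambdas" work-around is to stop before the time limit and hand the unfinished work to a fresh invocation. A minimal sketch, assuming the standard Lambda context method `get_remaining_time_in_millis()`; the helper name, the `handle` callback, and the 10-second safety margin are hypothetical.

```python
def process_batch(items, context, handle, safety_ms=10_000):
    """Process items until time runs low; return the unprocessed remainder.

    The caller can pass a non-empty remainder to a new asynchronous
    invocation so the chain continues where this one stopped.
    """
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < safety_ms:
            return items[index:]
        handle(item)
    return []
```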
  • 21. EVA OCR Architecture [Diagram: EVA Data Integration systems (EDI App, EDI Search API); AWS S3 Storage (PDF Docs, PDF Pages, Text Pages, Text Documents); AWS Autoscaling Lambdas, "OCR for EVA" (Split, OCR, Combine, Output). Flow labels: PDF doc, PDF pages, TXT pages, JSON doc]
  • 22. EVA OCR Architecture: Big Wins ● Architecture designed for lowest operational cost possible: ○ S3 files removed after 24 hours: minimal data charges, better security ○ no database cost ● Architectural patterns we used instead of database: ○ track progress with directory prefixes ○ propagate information using S3 object metadata ● Lambda autoscaling, fast scaling, pay only for active use ● Serverless Framework simplified deployment ● but see the Gotchas in a few slides...
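Tracking progress with directory prefixes instead of a database can be sketched as pure key arithmetic. The four prefixes come from the next slide; the per-document subfolder and zero-padded page numbers are assumptions made for illustration, as are the helper names.

```python
import posixpath


def page_pdf_key(doc_key, page_number):
    """Map 'doc_pdf/<doc>.pdf' and a page number to that page's PDF key."""
    doc_id = posixpath.splitext(posixpath.basename(doc_key))[0]
    return f"page_pdf/{doc_id}/{page_number:04d}.pdf"


def page_txt_key(pdf_key):
    """Map 'page_pdf/<doc>/<page>.pdf' to the key its OCR text is written to."""
    _prefix, doc_id, filename = pdf_key.split("/")
    stem = posixpath.splitext(filename)[0]
    return f"page_txt/{doc_id}/{stem}.txt"


def doc_is_complete(page_pdf_keys, page_txt_keys):
    """A document is done when every page PDF has a matching text page."""
    expected = {page_txt_key(key) for key in page_pdf_keys}
    return expected <= set(page_txt_keys)
```

With this layout, listing two prefixes is enough to decide whether the combine step can run, so no progress table is needed.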
  • 23. EVA OCR Architecture: securely connect with cloud policies ● EDI App: IAM Role: eva-app-role; Policies: ocreva-s3-write-doc_pdf, other EDI policies... ● IAM Policy: ocreva-s3-write-doc_pdf: allow write arn:aws:s3:::eva-ocr-dev/doc_pdf/ ● EVA OCR S3 bucket eva-ocr-dev: /doc_pdf/, /page_pdf/, /page_txt/, /doc_txt/ ● EVA OCR Lambda Functions: IAM Role: eva-ocr-dev-us-east-1-lambda; Security Group: sg-001: ocreva-lambda-output ● EVA Search API on 3x EC2: HTTPS API on port 5333; Security Group: sg-002: eva-search, allow from sg-001 to port 5333 ● Flows: EVA code uploads PDF to /doc_pdf/; Lambdas read/write pdf and txt in various folders; HTTP POST {docid: ‘mydocid’, page: 42, text: ‘ocr text…’} ● No auth servers were harmed in the making of this service
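The "allow write" policy on slide 23 could look roughly like the document built below. This is a sketch, not the project's actual policy: the slide names only the bucket and prefix, so the `s3:PutObject` action and the trailing `*` wildcard are assumptions, and the helper name is hypothetical.

```python
def s3_write_policy(bucket, prefix):
    """Build an IAM policy document allowing object writes under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],  # assumed meaning of "allow write"
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
```

Calling `s3_write_policy("eva-ocr-dev", "doc_pdf/")` scopes the application's role to uploads under `/doc_pdf/` only, which matches the slide's point that access is granted by policy rather than by a separate auth server.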
  • 24. EVA OCR Security Controls Even though Lambda is currently undergoing FedRAMP certification, cloud security group provided ATO based on the following controls: ● GovCloud for sensitive data ● IAM policies, roles and Security Groups restrict access ● Separate VPC for Lambdas ● No VPC network egress for Lambdas ● Security Group allows output of final Lambda to EDI Search API ● Encrypted data in transit and at rest ● Static Code Analysis [Diagram: Lambda VPC (EVA OCR Autoscaling Lambdas; private IP space, /16 = 65,536 addresses) with S3 VPC Endpoint to EVA OCR S3 Storage; EDI VPC (EVA Data Integration systems; NASA IP space, limited IPs); SG allows to port 5333]
  • 25. Serverless Framework: from the horse’s mouth “Serverless is your toolkit for deploying and operating serverless architectures. Focus on your application, not your infrastructure.” (serverless.com) Commands:
      npm install serverless -g
      serverless create --template hello-world
      serverless deploy
      curl http://xyz.amazonaws.com/hello-world
  • 26. Serverless Framework: overview ● Controlled by file serverless.yml ● Defines Resources: S3, Lambda, … ● Defines Events: ObjectCreated, … ● Wires Events to trigger Lambdas ● Defines Security ● Uses CloudFormation underneath ● Simple CLI to deploy to cloud Example serverless.yml:
      service: myservicename
      provider:
        name: aws
        runtime: python3.6
      functions:
        s3created:
          handler: myservice.s3created
          events:
            - s3:
                bucket: myservice-dev
                event: s3:ObjectCreated:*
                rules:
                  - prefix: doc_pdf/
        formget:
          handler: myservice.formget
          events:
            - http:
                path: form
                method: get
        formpost:
          handler: myservice.formpost
          events:
            - http:
                path: form
                method: post
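The `handler: myservice.s3created` entry points at a Python function in `myservice.py`. The slides do not show that function, so the following is a hypothetical sketch of what such a handler might do with an S3 ObjectCreated notification: pull out each bucket and key so later steps can act on them. Keys arrive URL-encoded in S3 notifications, hence the decoding step.

```python
from urllib.parse import unquote_plus


def s3created(event, context):
    """Collect (bucket, key) pairs from an S3 ObjectCreated notification."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Spaces and special characters in keys are URL-encoded in the event.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append((bucket, key))
    return uploaded
```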
  • 27. Gotchas! ● Will get duplicate events if Lambda exits unsuccessfully ○ this is a good thing ● May get duplicate events ○ detect and possibly ignore them (idempotent) ● Timeouts if job takes longer than 300 seconds ○ may have to chain Lambdas ● Overloading destinations is likely due to scale ○ detect, back-off ○ may require handling like Timeouts ● Fast scaling can exhaust limited IP addresses in a VPC ○ use separate VPC for Lambda with large private IP space, e.g., /16 with 65,536 addresses ● S3 eventual consistency ○ use UUIDs in S3 keys to force read consistency
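The "detect, back-off" advice for overloaded destinations can be sketched as a retry loop with exponential backoff and random jitter. This is a generic pattern, not the talk's code; the helper name, retry count, and delay bounds are assumptions.

```python
import random
import time


def call_with_backoff(send, attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call send() until it succeeds, waiting longer after each failure.

    The wait before retry n is drawn uniformly from
    [0, min(cap, base * 2**n)] so that many concurrent Lambdas
    do not all retry at the same instant.
    """
    for attempt in range(attempts):
        try:
            return send()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let Lambda mark the invocation failed
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Re-raising on the last attempt matters here: as the first gotcha notes, an unsuccessful exit causes the event to be delivered again, so the work is not silently lost.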
  • 28. Happy Customer “The work you’ve accomplished is a big step proving out this new technology for NASA” Cuong Q Nguyen, JSC/NASA EVA Office
  • 29. Future Challenges and Opportunities NASA’s Cuong Nguyen has told us he needs to track assembly, subassembly and part hierarchies. Can we extract structured text? He also needs to identify inspector and approval “stamps”. This is not OCR but hard image processing.
  • 30. Future Work: structured content extraction
  • 31. Future Work: “stamp” detection and identification
  • 32. Questions? Reach out to us! chris@v-studios.com @shentonfreude