© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Petabytes of Data & No Servers: Corteva Scales DNA Analysis to Meet Increasing Business Demand
Ryan Smith
Software Development Leader – Bioinformatics
Corteva Agriscience
ENT218-S
Scott Warren
Cloud Architect
Sogeti USA
DNA Sequencing Technology
• Lab uses Illumina sequencing machines
• Data generated for analysis
Sequence Alignment
Where We Started
• Every 6 hours, Corteva produces as much genetic data as existed in the entire public sphere in 2008
• On-premises compute and storage demands were becoming unsustainable
• 35-node Hadoop cluster with 2 PB of storage
• Significant increase in future demand
Why AWS?
• Understood research needs
• Amazon service offerings mirrored the on-premises Hadoop system
• Amazon EMR (Amazon Elastic MapReduce)
• Amazon Simple Storage Service (Amazon S3)
• Cost efficiency
Data Uses
• Genome-wide variation screening
• Transformation assay
• Quality control
• Whole genome assembly
Our Applications
SNPFinder
• Whole genome alignment of short reads
• Looking for single nucleotide polymorphisms (SNPs)
• Input data size: 50 to 500+ GB
Vector Quality Control (VQC)
• Synthesize a DNA fragment to create a transgenic event
• Synthesis needs to be quality controlled
• Regulatory requirements
• Input data size: under 10 MB
Project Theseus
User Interaction
SNPFinder
• Pipeline transforms data into a queryable state
• Analysis is done ad hoc through a user interface or API layer
VQC
• All processing is completed when data enters the application
• Users view these results to inform decision-making
Guiding Principles
User Patterns
• Time Sensitive Workloads
• Small User Base
Technical
• Serverless
• Immutable Infrastructure
• Automate Everything
Difference in Design
• Both applications use the same input data
• The type of processing, outputs, and technical requirements are very different
SNP Calling Pipeline
• Align short reads
• Decide whether each difference is a SNP or a sequencing error
• Transform into a queryable format (Parquet)
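The "SNP or sequencing error" step can be illustrated with a minimal sketch. The slides do not describe the actual calling algorithm, so this is only a naive threshold rule on the bases of the reads aligned over one reference position. The function name `call_site` and the parameters `min_coverage` and `min_alt_fraction` are hypothetical, as are their default values.

```python
from collections import Counter


def call_site(ref_base, read_bases, min_coverage=10, min_alt_fraction=0.2):
    """Classify one reference position from the read bases aligned over it.

    Hypothetical rule (not from the slides): a non-reference base seen in at
    least `min_alt_fraction` of the reads is treated as a SNP; a rarer one is
    treated as a sequencing error.
    """
    coverage = len(read_bases)
    if coverage < min_coverage:
        # Too few reads to make any decision at this position.
        return {"call": "low_coverage", "coverage": coverage}

    counts = Counter(read_bases)
    alt_counts = {base: n for base, n in counts.items() if base != ref_base}
    if not alt_counts:
        return {"call": "reference", "coverage": coverage}

    # Most frequent non-reference base and the fraction of reads carrying it.
    alt_base, alt_n = max(alt_counts.items(), key=lambda item: item[1])
    alt_fraction = alt_n / coverage
    call = "snp" if alt_fraction >= min_alt_fraction else "sequencing_error"
    return {
        "call": call,
        "coverage": coverage,
        "alt": alt_base,
        "alt_fraction": alt_fraction,
    }


if __name__ == "__main__":
    # 19 reads agree with the reference, 1 read disagrees.
    print(call_site("A", "A" * 19 + "G"))
    # Half of the reads carry a different base.
    print(call_site("A", "A" * 10 + "G" * 10))
```

Production variant callers use base qualities and statistical models rather than a fixed fraction; the sketch only shows the shape of the decision.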
SNPFinder Pipeline
SNPFinder Queries
• Position
• Purity/Coverage
• Neighborhood
• Other SNPs
• GC%
• Repetitive Sequence
• Annotations
AAATTGAGTACGCGAGCTAGCGAGCTAGAGCGATG
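Two of the query dimensions above, neighborhood and GC%, can be sketched in a few lines. This is an illustration only, not the SNPFinder implementation: the helper names `neighborhood` and `gc_percent`, the 0-based position, and the flank size are all assumptions.

```python
def neighborhood(sequence, position, flank):
    """Return the bases within `flank` positions of a 0-based `position`.

    The window is clipped at the ends of the sequence.
    """
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    return sequence[start:end]


def gc_percent(sequence):
    """Return the percentage of G and C bases in `sequence` (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


if __name__ == "__main__":
    # Example sequence from the slide; position and flank are arbitrary.
    slide_sequence = "AAATTGAGTACGCGAGCTAGCGAGCTAGAGCGATG"
    window = neighborhood(slide_sequence, position=12, flank=5)
    print(window, round(gc_percent(window), 1))
```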
SNPFinder Query
SNPFinder User Interface
VQC Architecture
VQC Architecture – Data Ingestion
VQC Architecture – Query
VQC User Interface
Architecture Comparison
Many small jobs - VQC
A few big jobs - SNPFinder
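The contrast between the two workloads can be made concrete with a small sketch that picks an execution style from the input size. The slides do not describe any such routing logic. The function `execution_model`, its labels, and the 10 MB cutoff are hypothetical, chosen only because the deck lists VQC inputs as under 10 MB and SNPFinder inputs as 50 to 500+ GB.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def execution_model(input_bytes, small_job_limit=10 * MB):
    """Return a hypothetical execution style for a job of the given input size.

    Inputs below `small_job_limit` are treated as "many small jobs" work that
    suits one short-lived function per file; anything larger is treated as
    "a few big jobs" work that suits a batch cluster.
    """
    if input_bytes < 0:
        raise ValueError("input size cannot be negative")
    if input_bytes < small_job_limit:
        return "per-file function"
    return "batch cluster"


if __name__ == "__main__":
    print(execution_model(5 * MB))   # a VQC-sized input
    print(execution_model(50 * GB))  # a SNPFinder-sized input
```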
AWS Glue
• Introduced later in the project
• Used for data cleanup
• Moves data without having to fully reprocess it
Results – Autoscaling
Results – Data Import
Results – Business Impact
• Eliminate resource contention
• Disaster recovery
• Our data is now stored in many different physical locations
• Lab growth enabled
• Data storage is no longer an issue
Thank you!
Ryan Smith
ryand.smith@pioneer.com
Corteva Agriscience
Scott Warren
scott.warren@us.sogeti.com
Sogeti USA

Petabytes of Data & No Servers: Corteva Scales DNA Analysis to Meet Increasing Business Demand (ENT218-S) - AWS re:Invent 2018
