AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Petabytes of Data & No Servers: Corteva
Scales DNA Analysis to Meet Increasing
Business Demand
Ryan Smith
Software Development Leader –
Bioinformatics
Corteva Agriscience
ENT218-S
Scott Warren
Cloud Architect
Sogeti USA
DNA Sequencing Technology
• Lab uses Illumina sequencing
machines
• Data generated for analysis
Sequence Alignment
Where We Started
• Every 6 hours, Corteva produces as much genetic data as
existed in the entire public sphere in 2008
• On-premises compute and storage demands were
becoming unsustainable
• 35-node Hadoop cluster with 2 PB of storage
• Significant increase in future demand
Why AWS?
• Understood research needs
• Amazon service offerings mirrored the
on-premises Hadoop system
• Amazon EMR (Elastic MapReduce)
• Amazon Simple Storage Service
(Amazon S3)
• Cost efficiency
Data Uses
• Genome-wide variation screening
• Transformation assay
• Quality control
• Whole genome assembly
Our Applications
SNPFinder
• Whole genome alignment of short reads
• Looking for single nucleotide
polymorphisms (SNPs)
• Input data size: 50-500+ GB
Vector Quality Control (VQC)
• Synthesize a DNA fragment to create
a transgenic event
• Synthesis needs to be quality
controlled
• Regulatory requirements
• Input data size: <10 MB
Project Theseus
User Interaction
SNPFinder:
• Pipeline transforms data into queryable
state
• Analysis is done ad-hoc through a user
interface or API layer
VQC:
• All processing is completed when data
enters the application
• Users view these results to
inform decision-making
Guiding Principles
User Patterns
• Time Sensitive Workloads
• Small User Base
Technical
• Serverless
• Immutable Infrastructure
• Automate Everything
Difference in Design
• Both applications use the same input data
• Types of processing, outputs, and technical
requirements are very different
SNP Calling Pipeline
• Align short reads
• Decide if SNP or sequencing error
• Transform into queryable format
(Parquet)
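The deck does not say how the pipeline separates true SNPs from sequencing errors. As a minimal illustrative sketch only, assuming a simple per-position rule based on coverage and alternate-allele fraction, the decision step could look something like the following. The function name call_site and both thresholds are hypothetical and are not taken from SNPFinder.

```python
from collections import Counter


def call_site(ref_base, observed_bases, min_coverage=10, min_alt_fraction=0.2):
    """Classify one reference position from the bases that aligned reads show there.

    ref_base         -- base in the reference genome at this position
    observed_bases   -- string (or list) of bases seen in reads covering the position
    min_coverage     -- hypothetical minimum read depth needed to make any call
    min_alt_fraction -- hypothetical minimum share of reads that must carry the
                        alternate base before it is treated as a SNP
    """
    coverage = len(observed_bases)
    if coverage < min_coverage:
        return "low_coverage"

    counts = Counter(observed_bases)
    alt_counts = {base: n for base, n in counts.items() if base != ref_base}
    if not alt_counts:
        return "reference"

    # Take the most frequent non-reference base as the candidate allele.
    alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
    if alt_n / coverage >= min_alt_fraction:
        return f"snp:{ref_base}>{alt_base}"
    # A rare mismatch at a well-covered site is treated as a sequencing error.
    return "likely_error"


if __name__ == "__main__":
    print(call_site("A", "A" * 19 + "G"))       # one mismatching read out of 20
    print(call_site("A", "A" * 10 + "G" * 10))  # half of the reads disagree
```

Real variant callers also weigh base quality, mapping quality, and ploidy; the fixed thresholds here only show the shape of the decision.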
SNPFinder Pipeline
SNPFinder Queries
• Position
• Purity/Coverage
• Neighborhood
• Other SNPs
• GC%
• Repetitive Sequence
• Annotations
AAATTGAGTACGCGAGCTAGCGAGCTAGAGCGATG
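As an illustration of the GC% query, the sketch below computes GC content for the example sequence shown on the slide. The function name gc_percent is hypothetical; the deck does not describe how SNPFinder actually computes or stores this value.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of bases in the sequence that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


# Example sequence taken from the slide.
example = "AAATTGAGTACGCGAGCTAGCGAGCTAGAGCGATG"

if __name__ == "__main__":
    print(f"length = {len(example)}, GC% = {gc_percent(example):.1f}")
```

A neighborhood query could apply the same function to a window of bases on either side of a candidate SNP position.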
SNPFinder Query
SNPFinder User Interface
VQC Architecture
VQC Architecture – Data Ingestion
VQC Architecture – Query
VQC User Interface
Architecture Comparison
Many small jobs - VQC
A few big jobs - SNPFinder
AWS Glue
• Introduced later in the project
• Used for data cleanup
• Moves data without having to fully
reprocess it
Results – Autoscaling
Results – Data Import
Results – Business Impact
• Eliminate resource contention
• Disaster recovery
• Our data is now stored in many different physical locations
• Lab growth enabled
• Data storage is no longer an issue
Thank you!
Ryan Smith
ryand.smith@pioneer.com
Corteva Agriscience
Scott Warren
scott.warren@us.sogeti.com
Sogeti USA