Puneet Suri, Thermo Fisher Scientific 
Shakila Pothini, Thermo Fisher Scientific 
Sami Zuhuruddin, Amazon Web Services 
November 12, 2014 | Las Vegas, NV 
HLS402 
Getting into Your Genes: The Definitive Guide to Using Amazon EMR, Amazon ElastiCache, and Amazon S3 to Deliver High-Performance Scientific Applications
About me 
Puneet Suri 
Senior Director, Software Engineering 
Life Sciences Group, Thermo Fisher Scientific 
follow at: @psuriconnect at: puneet.suri@thermofisher.com 
Envisionedanddeveloped the life sciences cloud platform for Thermo Fisher Scientific
This is why we are here…
Having an impact… 
A person was set free after 35 years in prison because of a DNA test 
Freeing the innocent 
Surviving Cancer 
A person survived pancreatic cancer thanks to a genetic approach that allowed an oncologist to focus on a specific cancer cell 
Ebola
H1N1: Pandemic declared in April 2009
Need to enable this at larger scale & impact more lives
Customer needs… 
store & manage large scientific data sets
A few years back
Our offerings 
desktop applications 
challenges with upgrade cycle, versions etc. 
limited storage and compute capacity 
to analyze complex & large data sets 
no sharing & collaboration 
no backup, archive & security
A better way… is to provide
STORAGE 
COMPUTE 
SCALABILITY 
MEMORY
Our vision
A deep dive into our story
A day with the scientist
Get Insights
a project
Insights… 
• what is causing cancer
• what drugs will work
• is therapy working
Customer pain points 
• existing solutions cannot address the complexities
• Excel is used to analyze data manually, which is painful
• multiple tools are needed to get the final insight
• it takes days to analyze the data
• some analysis workflows are not possible
Dimensions of complexity… 
millions of records
thousands of users, projects
real time analysis of large datasets 
2-3 seconds response time 
project 
storage 
compute 
performance 
scalability
Our journey enabling complex customer workflows
Our iterative journey & challenges
0. start with reference architecture
1. identify scalable storage solution
2. identify scalable storage solution for large data items
3. identify solutions for real time response & queries
4. identify solutions for real time analysis of data
Reference web architecture
[Diagram: user client, Internet, DNS routing (Route 53), CDN (CloudFront), load balancers, auto-scaling web servers, auto-scaling app servers, Amazon RDS master with synchronous replication to Amazon RDS standby; components shown under labels A and B]
Why relational DB was not considered 
• with the projected data and user growth over the years (hundreds of TBs), the required real-time query performance would be very hard to achieve
• needed managed scalability without sharding/re-sharding overhead and disruptions
•needed a loose schema to seamlessly enable new and cross domain workflows
Our iterative journey & challenges
0. start with reference architecture
1. identify scalable storage solution
2. identify scalable storage solution for large data items
3. identify solutions for real time response & queries
4. identify solutions for real time analysis of data
NoSQL was the way to go 
•managed scalability 
•near zero administration overhead 
• query performance not impacted by table size; can add billions of rows
• simple and flexible schema: new domains can be supported
•extremely fast read/write performance
Architecture with DynamoDB
[Diagram: user client, Internet, DNS routing (Route 53), CDN (CloudFront), load balancers, auto-scaling web servers, auto-scaling app servers, Amazon DynamoDB; components shown under labels A and B]
What worked well with DynamoDB 
Managed Service with flexible schema 
Managed Scalability 
Extremely fast READ/WRITE access, on the order of milliseconds
Iteration 1
[Diagram: Get Insights for a project. Data entities: Instrument Run (1000s), Patient Samples (1000s), Genes (1000s), Raw Signals (millions), Analysis Results (millions); sizes range from MBs to GBs]
Storage Query
Performance ✔ ✔
Cost ✔ ✔
What were the gaps
our item attribute (e.g. Instrument Run) sizes can exceed 400 KB
(item size limit raised from 64 KB to 400 KB)
hot hash key & batch size limitations
• adding thousands of related records (e.g. Raw Signals) with a common hash key (e.g. Instrument Run) can be slow (tens of seconds)
• a large project can have ~1 million records (e.g. Raw Signals) that need to be read & written
for a large project, high read/write capacity (1000s) was needed
(increased cost due to high READ/WRITE capacity needs)
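The batch-size and capacity gaps can be made concrete with a little arithmetic. The sketch below is not from the talk. It assumes the DynamoDB `BatchWriteItem` limit of 25 put requests per call and the rule that one write capacity unit covers one write per second for an item of up to 1 KB. The 1 KB item size, the 60-second target and all function names are illustrative assumptions.

```python
import math
from typing import Iterator, List, Sequence

BATCH_WRITE_LIMIT = 25   # max put/delete requests per BatchWriteItem call
WCU_ITEM_BYTES = 1024    # one write capacity unit covers a write of up to 1 KB


def chunk(records: Sequence[dict], size: int = BATCH_WRITE_LIMIT) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def batch_calls_needed(n_records: int, size: int = BATCH_WRITE_LIMIT) -> int:
    """Number of batch calls needed to write `n_records` items."""
    return math.ceil(n_records / size)


def write_capacity_needed(n_records: int, item_bytes: int, seconds: float) -> int:
    """Write capacity units needed to write all items within `seconds`."""
    units_per_item = math.ceil(item_bytes / WCU_ITEM_BYTES)
    return math.ceil(n_records * units_per_item / seconds)


if __name__ == "__main__":
    n = 1_000_000  # roughly the size of a large project, per the slide
    print(batch_calls_needed(n))                # 40000
    print(write_capacity_needed(n, 1024, 60))   # 16667
```

Under these assumptions, a project of about one million small records needs tens of thousands of batch calls, and writing it within a minute needs provisioned write capacity well into the thousands, which is where the cost pressure comes from.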
What we needed 
A solution that
• can store a huge number of related objects
• is cost effective for reading/writing large data sets
• has no limitations on batch size or item size
• can query into the large number of records
Our iterative journey & challenges
0. start with reference architecture
1. identify scalable storage solution: DynamoDB
2. identify scalable storage solution for large data items
3. identify solutions for real time response & queries
4. identify solutions for real time analysis of data
Architecture with DynamoDB & S3
[Diagram: user client, Internet, DNS routing (Route 53), CDN (CloudFront), load balancers, auto-scaling web servers, auto-scaling app servers, DynamoDB, Amazon S3; components shown under labels A and B]
Iteration 2
[Diagram: Get Insights. Data entities: Instrument Run (1000s), Patient Samples (1000s), Genes (1000s), Raw Signals (millions), Analysis Results (millions); sizes range from MBs to GBs]
Storage Query
Performance ✔ ✔
Cost ✔ ✔
Architecture with DynamoDB & S3 
• DynamoDB was used to store small unrelated objects (KB)
• these will grow to a large number (e.g. Data Files)
• Amazon S3 was used to store related larger objects (GB), e.g. Raw Signals & Analysis Results
• stored as a single Amazon S3 object, serialized using Google Protocol Buffers (protobuf)
• Amazon S3 was cost effective for storing huge objects
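The split described above, small items in a key-value table and related bulk records as one serialized object, can be sketched as follows. This is an illustration, not the speakers' code: plain dictionaries stand in for DynamoDB and Amazon S3, `json` plus `zlib` stands in for protobuf serialization, and the key layout and function names are hypothetical.

```python
import json
import zlib
from typing import Dict, List

# In-memory stand-ins (assumptions): a real system would call DynamoDB and S3.
metadata_table: Dict[str, dict] = {}   # plays the role of a DynamoDB table
object_store: Dict[str, bytes] = {}    # plays the role of an S3 bucket


def save_run(run_id: str, attributes: dict, raw_signals: List[dict]) -> str:
    """Store small attributes as an item and all related signals as one blob."""
    # json + zlib stands in for protobuf serialization here.
    blob = zlib.compress(json.dumps(raw_signals).encode("utf-8"))
    key = f"runs/{run_id}/raw_signals.bin"   # hypothetical key layout
    object_store[key] = blob
    # The small item only points at the blob, so it stays well under item size limits.
    metadata_table[run_id] = {
        **attributes,
        "signals_key": key,
        "signal_count": len(raw_signals),
    }
    return key


def load_signals(run_id: str) -> List[dict]:
    """Follow the pointer in the small item and deserialize the blob."""
    key = metadata_table[run_id]["signals_key"]
    return json.loads(zlib.decompress(object_store[key]).decode("utf-8"))
```

One object per group of related records means a single write and a single read per group, instead of thousands of item-level requests against one hash key.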
Real time queries for complex visualizations
What we needed 
• complex visualizations require gigabytes of data to be queried in 2-3 secs and presented to the user
• visualizations are very interactive and require constant updates of data, so reads & writes must be quick
• support concurrent access without any degradation in query performance
Our iterative journey & challenges
0. start with reference architecture
1. identify scalable storage solution: DynamoDB
2. identify scalable storage solution for large data items: DynamoDB + Amazon S3
3. identify solutions for fast real time response & queries
4. identify solutions for real time analysis of data
Distributed in-memory storage was the way to go 
reads/writes have to be quick to enable fast response times; reading & writing from Amazon S3 was not ideal
• ElastiCache was used as IN-MEMORY storage on top of DynamoDB & Amazon S3
• all related serialized objects in Amazon S3 that customers access are maintained in ElastiCache as individual records
• indexes are created in DynamoDB based on the query pattern so that data can be easily retrieved from ElastiCache
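A minimal sketch of the layering described above: an object from Amazon S3 is exploded into individual cache records, and an index keyed by the query pattern points at the cache keys. Dictionaries stand in for Amazon S3, ElastiCache and the DynamoDB index. The "query by gene" pattern, the record fields and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

s3_blobs: Dict[str, List[dict]] = {}                          # stand-in for deserialized S3 objects
cache: Dict[str, dict] = {}                                   # stand-in for ElastiCache
index: Dict[Tuple[str, str], List[str]] = defaultdict(list)   # stand-in for a DynamoDB index
warmed_projects: Set[str] = set()


def warm_cache(project_id: str) -> int:
    """Load a project's records into the cache and index them by gene."""
    loaded = 0
    for record in s3_blobs[project_id]:
        cache_key = f"{project_id}:{record['id']}"
        if cache_key not in cache:
            cache[cache_key] = record
            index[(project_id, record["gene"])].append(cache_key)
            loaded += 1
    warmed_projects.add(project_id)
    return loaded


def query_by_gene(project_id: str, gene: str) -> List[dict]:
    """Resolve cache keys through the index, warming the cache on first access."""
    if project_id not in warmed_projects:
        warm_cache(project_id)
    return [cache[key] for key in index.get((project_id, gene), [])]
```

After the first access, a query touches only the index and the in-memory records, so the large S3 object is not re-read or re-deserialized for each interaction.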
Architecture with DynamoDB, Amazon S3 & ElastiCache
[Diagram: user client, Internet, DNS routing (Route 53), CDN (CloudFront), load balancers, auto-scaling web servers, auto-scaling app servers, DynamoDB, Amazon S3, ElastiCache; components shown under labels A and B]
Iteration 3
[Diagram: Get Insights. Data entities: Instrument Run (1000s), Patient Samples (1000s), Genes (1000s), Raw Signals (millions), Analysis Results (millions); sizes range from MBs to GBs; indexes]
Storage Query
Performance ✔ ✔
Cost ✔ ✔
Need for real time data analysis 
•analyze huge projects containing thousands of patient samples in minutes instead of days 
•a scalable solution is required to support analysis requests from thousands of users 
•existing desktop algorithms used for this analysis not optimized for extracting parallelism in data
Analysis solutions in desktop
[Chart: analysis time in minutes vs. # of records]
# of records: 90000, 180000, 270000, 360000, 450000, 675000, 900000
desktop (minutes): 8, 20, 40, 80, 120, 200, 320
desktop crashes
Excel nightmare
Our iterative journey & challenges
0. start with reference architecture
1. identify scalable storage solution: DynamoDB
2. identify scalable storage solution for large data items: DynamoDB + Amazon S3
3. identify solutions for fast real time response & queries: DynamoDB + Amazon S3 + ElastiCache
4. identify solutions for real time analysis of data
Amazon EMR was the way to go 
1. EMR was used to perform real time analysis of huge data sets: results in minutes instead of days
2. all small jobs are analyzed in-memory while big ones are sent to Amazon EMR
3. existing algorithms were overhauled to derive massive parallelism using the Hadoop MapReduce framework
4. as large datasets were already in Amazon S3, Amazon S3 was used for input and output instead of HDFS; only intermediate map-reduce data lives in HDFS
5. the Amazon EMR cluster is created on demand and shut down when done
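The shape of a map-reduce rewrite, and the split between small and big jobs, can be sketched in a single process. This is an illustration only: the per-gene mean is a stand-in for the real analysis algorithms, which the talk does not show, and the record fields, the `IN_MEMORY_LIMIT` cutoff and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

IN_MEMORY_LIMIT = 100_000  # hypothetical cutoff between "small" and "big" jobs


def map_record(record: dict) -> Iterator[Tuple[str, float]]:
    """Map step: emit a (gene, signal) pair for one record."""
    yield record["gene"], record["signal"]


def reduce_gene(gene: str, signals: Iterable[float]) -> Tuple[str, float]:
    """Reduce step: average all signals seen for one gene."""
    return gene, mean(signals)


def run_map_reduce(records: Iterable[dict]) -> Dict[str, float]:
    """Single-process imitation of the map, shuffle and reduce phases."""
    shuffled: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for gene, signal in map_record(record):
            shuffled[gene].append(signal)   # shuffle: group values by key
    return dict(reduce_gene(gene, signals) for gene, signals in shuffled.items())


def choose_engine(n_records: int) -> str:
    """Decide where a job runs: in the app server's memory or on a cluster."""
    return "in-memory" if n_records <= IN_MEMORY_LIMIT else "emr"
```

Because each map call sees one record and each reduce call sees one key, the same two functions could run unchanged across many workers, which is the property a Hadoop cluster exploits.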
Architecture with EMR for real time analysis
[Diagram: user client, Internet, DNS routing (Route 53), CDN (CloudFront), load balancers, auto-scaling web servers, auto-scaling app servers, DynamoDB, Amazon S3, ElastiCache, EMR; components shown under labels A and B]
Iteration 4
[Diagram: Get Insights. Data entities: Instrument Run (1000s), Patient Samples (1000s), Genes (1000s), Raw Signals (millions), Analysis Results (millions); sizes range from MBs to GBs]
Storage Query Analysis
Performance ✔ ✔ ✔
Cost ✔ ✔ ✔
Performance for a project
[Chart: analysis time in minutes vs. # of records, cloud vs. desktop]
# of records: 90000, 180000, 270000, 360000, 450000, 675000, 900000
cloud (minutes): 2, 4, 7, 11, 13, 20, 30
cloud vs. desktop: >10x
desktop: crashes
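The ">10x" label can be checked against the numbers on the two charts. The sketch below assumes the extracted values line up in order with the record counts: the desktop minutes come from the earlier "Analysis solutions in desktop" chart and the cloud minutes from this one. That pairing is an assumption about how the chart labels were extracted, not something the slides state.

```python
from typing import List, Sequence

# Values read from the two charts; the pairing by position is an assumption.
records = [90_000, 180_000, 270_000, 360_000, 450_000, 675_000, 900_000]
desktop_minutes = [8, 20, 40, 80, 120, 200, 320]
cloud_minutes = [2, 4, 7, 11, 13, 20, 30]


def speedups(desktop: Sequence[float], cloud: Sequence[float]) -> List[float]:
    """Desktop time divided by cloud time for each project size."""
    return [round(d / c, 1) for d, c in zip(desktop, cloud)]


if __name__ == "__main__":
    for n, s in zip(records, speedups(desktop_minutes, cloud_minutes)):
        print(f"{n:>7} records: {s}x")
```

Under this reading the ratio grows with project size, from about 4x at 90,000 records to a little over 10x at 900,000.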
Journey
0. start with reference architecture ✓
1. identify scalable storage solution: DynamoDB ✓
2. identify scalable storage solution for large data items: DynamoDB + Amazon S3 ✓
3. identify solutions for fast real time response & queries: DynamoDB + Amazon S3 + ElastiCache ✓
4. identify solutions for real time analysis of data: Amazon EMR ✓
Learnings 
About me : Shakila Pothini 
Senior Manager, Cloud Apps
Life Sciences Group, 
Thermo Fisher Scientific 
Hiking is my ONLY stress buster 
Entertain to Educate. 
Cofounder of performing arts group (swaram.org) 
Mostly left brained with 
occasional sense of creativity 
How to get into your genes?
sequence the entire human transcriptome (30,000 genes)
identify significant genes (100+ genes)
validate & reconfirm (20+ genes)
do it on more samples & different populations
find the way the genes interplay in the pathway
understand cancer diversity.
types of therapy.
druggable genes.
Demo
Demo summary 
non cancerous sample 
cancerous sample 
difference in expression of genes
Customer feedback 
“My initial SymphoniSuite evaluation experience was good, GUI/ controls are intuitive and data upload/ analysis was fast and user friendly” 
UPENN 
“I enjoy processing hundreds of open array plates with ease.”, “I appreciate the rapid access of the large number of amplification curves ” 
Sanofi 
“I wanted to let you know that Symphoni has been working well for me. I have done analysis using as high as 500 files.”
ASU 
“This I see value in... utilizing these features. I appreciate the speed of data processing and visuals.” 
LUMC
Yearly checkup today 
165 / 105 
120 
50 / 90 
104
Is this really going to detect early stages of cancer?
A few years from now : every person 
ATGCATGCTATCAATTGCCC 
Sequence 
melanoma 
health risks
drug response 
powered by AWS 
lifecloud
Yearly check-ups a few years from now 
ATGCATGC ATTGCCC 
ATGCATGC ATTGCCC 
TATCA 
GCATG 
lifecloud 
ATGCATGCTATCAATTGCCC 
Sequence
Yearly check-ups a few years from now (cont’d) 
cancer 
any clinical trial?
health risks
drug response 
ATGCATGCTATCAATTGCCC 
Sequence 
lifecloud 
prescribe the right drug
Puneet Suri
Senior Director, Software engineering
puneet.suri@thermofisher.com
T: 650.266.5857 @psuri
Shakila Pothini
Senior Manager, Cloud Applications
shakila.pothini@thermofisher.com
Salil Kumar
Cloud Architect
T: 650.740.1646 @salilkum
[Diagram: Data to Answers]
Collect / Ingest: Kinesis
Store: Glacier, S3, DynamoDB, RDS
Process / Analyze: EMR, EC2, Redshift, Data Pipeline
Visualize / Report
[Diagram: Experiment 1, Experiment 2, Experiment 3; each shows Data Access and Compute Time, each marked ✔]
[Diagram: EMR Cluster, EC2 Instance, Data Temperature]
Please give us your feedback on this session. 
Complete session evaluations and earn re:Invent swag. 
http://bit.ly/awsevals 
Puneet Suri 
Senior Director, Software engineering 
puneet.suri@thermofisher.com 
T: 650.266.5857 @psuri 
Shakila Pothini 
Senior Manager, Cloud Applications 
shakila.pothini@thermofisher.com 
T: 650.554.2190 
Salil Kumar 
Cloud Architect 
T: 650.740.1646 @salilkum

Mais conteúdo relacionado

Mais procurados

Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSAmazon Web Services
 
Rackspace Best Practices for DevOps on AWS
Rackspace Best Practices for DevOps on AWSRackspace Best Practices for DevOps on AWS
Rackspace Best Practices for DevOps on AWSAmazon Web Services
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksAmazon Web Services
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaAmazon Web Services
 
Getting started with Amazon Dynamo BD
Getting started with Amazon Dynamo BDGetting started with Amazon Dynamo BD
Getting started with Amazon Dynamo BDAmazon Web Services
 
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and AnalyticsAmazon Web Services
 
Real-Time Processing Using AWS Lambda
Real-Time Processing Using AWS LambdaReal-Time Processing Using AWS Lambda
Real-Time Processing Using AWS LambdaAmazon Web Services
 
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...Amazon Web Services
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...Amazon Web Services
 
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)Amazon Web Services
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivAmazon Web Services
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBAmazon Web Services
 
serverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdfserverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdfAmazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveAmazon Web Services
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudAmazon Web Services
 
SEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOpsSEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOpsAmazon Web Services
 
SMC302 Building Serverless Web Applications
SMC302 Building Serverless Web ApplicationsSMC302 Building Serverless Web Applications
SMC302 Building Serverless Web ApplicationsAmazon Web Services
 
Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
 Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T... Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...Amazon Web Services
 

Mais procurados (20)

Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Building and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECSBuilding and scaling your containerized microservices on Amazon ECS
Building and scaling your containerized microservices on Amazon ECS
 
Rackspace Best Practices for DevOps on AWS
Rackspace Best Practices for DevOps on AWSRackspace Best Practices for DevOps on AWS
Rackspace Best Practices for DevOps on AWS
 
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech TalksSelecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
Selecting the Right AWS Database Solution - AWS 2017 Online Tech Talks
 
Real-time Data Processing using AWS Lambda
Real-time Data Processing using AWS LambdaReal-time Data Processing using AWS Lambda
Real-time Data Processing using AWS Lambda
 
Getting started with Amazon Dynamo BD
Getting started with Amazon Dynamo BDGetting started with Amazon Dynamo BD
Getting started with Amazon Dynamo BD
 
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
(BDT307) Zero Infrastructure, Real-Time Data Collection, and Analytics
 
Real-Time Processing Using AWS Lambda
Real-Time Processing Using AWS LambdaReal-Time Processing Using AWS Lambda
Real-Time Processing Using AWS Lambda
 
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
AWS re:Invent 2016: Getting Started with the Hybrid Cloud: Enterprise Backup ...
 
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
AWS re:Invent 2016: Building Big Data Applications with the AWS Big Data Plat...
 
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
AWS re:Invent 2016: Getting Started with Amazon Aurora (DAT203)
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
 
SRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDBSRV404 Deep Dive on Amazon DynamoDB
SRV404 Deep Dive on Amazon DynamoDB
 
serverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdfserverless_architecture_patterns_london_loft.pdf
serverless_architecture_patterns_london_loft.pdf
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
ENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the CloudENT306 Migrating Large Scale Data Sets to the Cloud
ENT306 Migrating Large Scale Data Sets to the Cloud
 
SEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOpsSEC303 Automating Security in Cloud Workloads with DevSecOps
SEC303 Automating Security in Cloud Workloads with DevSecOps
 
SMC302 Building Serverless Web Applications
SMC302 Building Serverless Web ApplicationsSMC302 Building Serverless Web Applications
SMC302 Building Serverless Web Applications
 
Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
 Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T... Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
Getting Started with AWS Lambda and the Serverless Cloud - AWS Summit Cape T...
 

Semelhante a (HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, Amazon ElastiCache, and Amazon S3 to Deliver High-Performance, Scientific Applications | AWS re:Invent 2014

Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreBig Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreAmazon Web Services
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big DataAmazon Web Services
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
re:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scalere:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any ScaleAdrian Hornsby
 
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석Amazon Web Services Korea
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSAmazon Web Services
 
Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud. Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud. Amazon Web Services
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big DataAmazon Web Services
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 
Strategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud StorageStrategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud StorageAmazon Web Services
 

Semelhante a (HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, Amazon ElastiCache, and Amazon S3 to Deliver High-Performance, Scientific Applications | AWS re:Invent 2014 (20)

Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreBig Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
 
(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data(BDT313) Amazon DynamoDB For Big Data
(BDT313) Amazon DynamoDB For Big Data
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
re:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scalere:Invent re:Cap - Big Data & IoT at Any Scale
re:Invent re:Cap - Big Data & IoT at Any Scale
 
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
 
AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWSGetting Started with Managed Database Services on AWS
Getting Started with Managed Database Services on AWS
 
Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud. Choosing the right data storage in the Cloud.
Choosing the right data storage in the Cloud.
 
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOTAWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
AWS APAC Webinar Week - Big Data on AWS. RedShift, EMR, & IOT
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
(BDT322) How Redfin & Twitter Leverage Amazon S3 For Big Data
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
Strategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud StorageStrategic Uses for Cost Efficient Long-Term Cloud Storage
Strategic Uses for Cost Efficient Long-Term Cloud Storage
 

(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, Amazon ElastiCache, and Amazon S3 to Deliver High-Performance, Scientific Applications | AWS re:Invent 2014

  • 1. Puneet Suri, Thermo Fisher Scientific Shakila Pothini, Thermo Fisher Scientific Sami Zuhuruddin, Amazon Web Services November 12, 2014 | Las Vegas, NV HLS402 Getting into Your Genes: The definitive guide to using Amazon EMR, Amazon ElastiCache, and Amazon S3 to Deliver High-Performance Scientific Applications
  • 2. About me Puneet Suri Senior Director, Software Engineering Life Sciences Group, Thermo Fisher Scientific follow at: @psuri connect at: puneet.suri@thermofisher.com Envisioned and developed the life sciences cloud platform for Thermo Fisher Scientific
  • 3. This is why we are here…
  • 4. Having an impact… A person was set free after 35 years in prison because of a DNA test Freeing the innocent Surviving Cancer A person survived pancreatic cancer thanks to a genetic approach that allowed an oncologist to focus on a specific cancer cell Ebola
  • 5. H1N1: Pandemic declared in April 2009
  • 6. Need to enable this at larger scale & impact more lives
  • 7. Customer needs… store & manage large scientific data sets
  • 8. A few years back
  • 9. Our offerings desktop applications challenges with upgrade cycle, versions etc. limited storage and compute capacity to analyze complex & large data sets no sharing & collaboration no backup, archive & security
  • 10. A better way… is to provide STORAGE COMPUTE SCALABILITY MEMORY
  • 12. A deep dive into our story
  • 13. A day with the scientist Get Insights a project
  • 14. Insights… •what is causing cancer •what drugs will work •is therapy working
  • 15. Customer pain points •existing solutions cannot address the complexities •Excel is used painfully to manually analyze data •multiple tools are used to get the final insight •it takes days to analyze the data •some of the analysis workflows are not possible
  • 16. Dimensions of complexity… millions of records thousands of users, projects real time analysis of large datasets 2-3 seconds response time project storage compute performance scalability
  • 17. Our journey enabling complex customer workflows
  • 18. Our iterative journey & challenges 0 start with reference architecture 1 identify scalable storage solution 2 identify scalable storage solution for large data items 3 identify solutions for real time response & queries 4 identify solutions for real time analysis of data
  • 19. Reference web architecture A B User Client Internet DNS Routing : Route 53 AUTO SCALING WEB SERVERS AUTO SCALING APP SERVERS Amazon RDS MASTER Amazon RDS STANDBY Synchronous Replication Load Balancers Load Balancers WEB SERVERS CDN: CloudFront APP SERVERS
  • 20. Why relational DB was not considered •based on projected data and user growth over the years (hundreds of TBs), the required real-time query performance would be very hard to achieve •needed managed scalability without sharding/re-sharding overhead and disruptions •needed a loose schema to seamlessly enable new and cross-domain workflows
  • 21. Our iterative journey & challenges 0 start with reference architecture 1 identify scalable storage solution 2 identify scalable storage solution for large data items 3 identify solutions for real time response & queries 4 identify solutions for real time analysis of data
  • 22. NoSQL was the way to go •managed scalability •near zero administration overhead •query performance not impacted by table size; can add billions of rows •simple and flexible schema – new domains can be supported •extremely fast read/write performance
  • 23. Architecture with DynamoDB A B User Client Internet DNS Routing : Route 53 AUTO SCALING WEB SERVERS AUTO SCALING APP SERVERS Load Balancers Load Balancers WEB SERVERS CDN: CloudFront APP SERVERS Auto Scaling Amazon DynamoDB
  • 24. What worked well with DynamoDB •managed service with flexible schema •managed scalability •extremely fast READ/WRITE access, on the order of milliseconds
  • 25. Iteration 1 GBs GBs MBs MBs Instrument Run (1000s) Patient Samples (1000s) Genes (1000s) Raw Signals (millions) Analysis Results (millions) Storage Query Performance ✔ ✔ Cost ✔ ✔ Get Insights     project
  • 26. What were the gaps •our item attribute (e.g. Instrument Run) sizes range above 400KB (item attribute size limitation of 64KB → 400KB) •hot hash key & batch size limitations •adding thousands of related records (e.g. Raw Signals) with a common hash key (e.g. Instrument Run) can be slow (10s of seconds) •a large project can have ~1 million records (e.g. Raw Signals) that need to be read & written •for a large project, high read/write capacity (1000s) was needed (increased cost due to high READ/WRITE capacity needs)
  • 27. What we needed A solution that •can store a huge number of related objects •is cost effective to read/write large data sets •has no limitations on batch size or item size •can query into the large number of records
  • 28. Our iterative journey & challenges 0 start with reference architecture 1 identify scalable storage solution : DynamoDB 2 identify scalable storage solution for large data items 3 identify solutions for real time response & queries 4 identify solutions for real time analysis of data
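
The batch size gap on slide 26 can be made concrete with a small sketch. It assumes the documented cap of 25 items per DynamoDB BatchWriteItem request (a figure from the DynamoDB documentation, not from the slides) and uses hypothetical helper names; it only counts and slices batches and does not call AWS.

```python
from typing import Iterator, List, Sequence

# Assumption: 25 is the documented per-request cap for DynamoDB BatchWriteItem.
# The slide itself only says "batch size limitations".
MAX_BATCH_ITEMS = 25


def chunk_records(records: Sequence[dict], size: int = MAX_BATCH_ITEMS) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def batches_needed(record_count: int, size: int = MAX_BATCH_ITEMS) -> int:
    """Number of batch requests needed to write `record_count` records."""
    return -(-record_count // size)  # ceiling division
```

Under that assumption, a project with about 1 million Raw Signals records needs `batches_needed(1_000_000)` = 40,000 batch requests, which is consistent with the slide's point that such writes were slow and needed high provisioned capacity.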
  • 29. Architecture with DynamoDB & S3 A B User Client Internet DNS Routing : Route 53 AUTO SCALING WEB SERVERS AUTO SCALING APP SERVERS Load Balancers Load Balancers WEB SERVERS CDN: CloudFront APP SERVERS Auto Scaling DynamoDB Amazon S3
  • 30. MBs MBs GBs Iteration 2 GBs Instrument Run (1000s) Patient Samples (1000s) Genes (1000s) Raw Signals (millions) Analysis Results (millions) Storage Query Performance ✔ ✔ Cost ✔ ✔ Get Insights    
  • 31. Architecture with DynamoDB & S3 •DynamoDB was used to store small unrelated objects (KB) •these will grow to a large number (e.g. Data Files) •Amazon S3 was used to store related larger objects (e.g. Raw Signals & Analysis Results (GB)) •stored as a single Amazon S3 object serialized using Google protobuf •Amazon S3 was cost effective for storing huge objects
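
A minimal sketch of the split described on slide 31, under stated assumptions: the talk divides data by kind (small unrelated objects vs. large related ones), whereas this toy version routes purely by payload size against DynamoDB's 400 KB item limit. `HybridStore` and its two dicts are hypothetical stand-ins for a DynamoDB table and an S3 bucket; nothing here calls AWS.

```python
from dataclasses import dataclass, field
from typing import Dict

# DynamoDB's documented maximum item size is 400 KB (slide 26 cites the same figure).
DYNAMODB_ITEM_LIMIT = 400 * 1024


@dataclass
class HybridStore:
    """Toy stand-in: two dicts play the roles of a DynamoDB table and an S3 bucket."""
    table: Dict[str, bytes] = field(default_factory=dict)   # hypothetical DynamoDB table
    bucket: Dict[str, bytes] = field(default_factory=dict)  # hypothetical S3 bucket

    def put(self, key: str, payload: bytes) -> str:
        """Store the payload and return the name of the backend that received it."""
        # Strictly below the limit, because real items also count attribute names.
        if len(payload) < DYNAMODB_ITEM_LIMIT:
            self.table[key] = payload
            return "dynamodb"
        self.bucket[key] = payload
        return "s3"

    def get(self, key: str) -> bytes:
        """Look in the small-object table first, then fall back to the bucket."""
        if key in self.table:
            return self.table[key]
        return self.bucket[key]
```

For example, `HybridStore().put("run-1", b"x" * 1024)` returns `"dynamodb"`, while a 500 KB payload is routed to the `"s3"` stand-in.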
  • 32. Real time queries for complex visualizations
  • 33. What we needed •complex visualizations require gigabytes of data to be queried in 2-3 secs and presented to the user •visualizations are very interactive and require constant updates of data; need quick reads & writes •support concurrent access without any degradation in query performance
  • 34. Our iterative journey & challenges 0 start with reference architecture 1 identify scalable storage solution : DynamoDB 2 identify scalable storage solution for large data items : DynamoDB + Amazon S3 3 identify solutions for fast real time response & queries 4 identify solutions for real time analysis of data
  • 35. Distributed in-memory storage was the way to go •reads/writes have to be quick to enable fast response times; reading & writing from Amazon S3 was not ideal •ElastiCache was used as IN-MEMORY storage on top of DynamoDB & Amazon S3 •all related serialized objects in Amazon S3 accessed by customers are maintained in ElastiCache as individual records •indexes were created in DynamoDB based on the query pattern so that data can be easily retrieved from ElastiCache
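
The read path on slide 35 resembles a cache-aside pattern, which can be sketched as follows. This is an illustration under assumptions, not the presenters' implementation: plain dicts stand in for S3, ElastiCache and the DynamoDB index, JSON stands in for protobuf, and `SignalReader` and the record shape are hypothetical.

```python
import json
from typing import Dict, List


class SignalReader:
    """Cache-aside reader over hypothetical in-memory stand-ins for the three services."""

    def __init__(self, s3_objects: Dict[str, bytes]):
        self.s3 = s3_objects                    # stand-in for S3: object key -> serialized blob
        self.cache: Dict[str, dict] = {}        # stand-in for ElastiCache: record key -> record
        self.index: Dict[str, List[str]] = {}   # stand-in for a DynamoDB index: object key -> record keys
        self.s3_reads = 0                       # counts slow-path loads, for illustration only

    def _load(self, object_key: str) -> None:
        """Slow path: fetch the blob, deserialize it, cache each record individually."""
        self.s3_reads += 1
        records = json.loads(self.s3[object_key])  # JSON here; the talk used protobuf
        keys = []
        for record in records:
            record_key = f"{object_key}/{record['id']}"
            self.cache[record_key] = record
            keys.append(record_key)
        self.index[object_key] = keys

    def records_for(self, object_key: str) -> List[dict]:
        """Fast path: resolve record keys via the index and read them from the cache."""
        keys = self.index.get(object_key)
        if keys is None or any(k not in self.cache for k in keys):
            self._load(object_key)
            keys = self.index[object_key]
        return [self.cache[k] for k in keys]
```

After the first call for an object, repeated queries are served from the in-memory stand-in, so `s3_reads` stays at 1 until a cached record goes missing.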
  • 36. Architecture with DynamoDB, Amazon S3 & ElastiCache A B User Client Internet DNS Routing : Route 53 AUTO SCALING WEB SERVERS AUTO SCALING APP SERVERS Load Balancers Load Balancers WEB SERVERS CDN: CloudFront APP SERVERS Auto Scaling DynamoDB Amazon S3 ElastiCache
  • 37. Iteration 3 MBs MBs GBs GBs Instrument Run (1000s) Patient Samples (1000s) Genes (1000s) Raw Signals (millions) Analysis Results (millions) Storage Query Performance ✔ ✔ Cost ✔ ✔ indexes Get Insights    
  • 38. Need for real time data analysis •analyze huge projects containing thousands of patient samples in minutes instead of days •a scalable solution is required to support analysis requests from thousands of users •existing desktop algorithms used for this analysis are not optimized for extracting parallelism in data
  • 39. Analysis solutions in desktop [chart: analysis time in minutes vs. # of records (90000, 180000, 270000, 360000, 450000, 675000, 900000); desktop data labels: 8, 20, 40, 80, 120, 200, 320; annotation: desktop crashes]
  • 41. Our iterative journey & challenges 0 start with reference architecture 1 identify scalable storage solution : DynamoDB 2 identify scalable storage solution for large data items : DynamoDB + Amazon S3 3 identify solutions for fast real time response & queries : DynamoDB + Amazon S3 + ElastiCache 4 identify solutions for real time analysis of data
  • 42. Amazon EMR was the way to go 1. EMR was used to perform real time analysis of huge data sets – results in minutes instead of days 2. all small jobs are analyzed in-memory while big ones are sent to Amazon EMR 3. existing algorithms were overhauled to derive massive parallelism using the Hadoop map-reduce framework 4. as large datasets were already in Amazon S3, Amazon S3 was used for input and output instead of HDFS – only intermediate map-reduce data in HDFS 5. the Amazon EMR cluster is created On-Demand and shut down when done
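
Point 3 on slide 42 mentions reworking algorithms for the map-reduce model. The slides do not show the algorithms, so the following is only a sketch of the general pattern in plain Python: a map step that emits key/value pairs per record, a shuffle that groups by key, and a reduce per key. The record shape, the gene names used in the usage note, and the choice of a per-gene mean are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: {"gene": str, "sample": str, "value": float}
Record = Dict[str, object]


def map_record(record: Record) -> Iterator[Tuple[str, float]]:
    """Map step: emit (gene, value); each record is processed independently."""
    yield str(record["gene"]), float(record["value"])


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_gene(gene: str, values: List[float]) -> Tuple[str, float]:
    """Reduce step: one aggregate per gene (a mean, chosen only as an example)."""
    return gene, mean(values)


def run_job(records: Iterable[Record]) -> Dict[str, float]:
    """Run map, shuffle and reduce in-process, the way a small job might be handled."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_gene(gene, values) for gene, values in shuffle(pairs).items())
```

Because the map and reduce steps share no state across keys, the same functions could in principle be distributed across a cluster, which is the property the slide refers to as massive parallelism. For instance, two `TP53` records with values 1.0 and 3.0 reduce to a mean of 2.0.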
  • 43. Architecture with EMR for real time analysis A B User Client Internet DNS Routing : Route 53 AUTO SCALING WEB SERVERS AUTO SCALING APP SERVERS Load Balancers Load Balancers WEB SERVERS CDN: CloudFront APP SERVERS Auto Scaling DynamoDB Amazon S3 ElastiCache EMR
  • 44. Iteration 4 MBs MBs GBs GBs Instrument Run (1000s) Patient Samples (1000s) Genes (1000s) Raw Signals (millions) Analysis Results (millions) Storage Query Analysis Performance ✔ ✔ ✔ Cost ✔ ✔ ✔ Get Insights    
  • 45. Performance for a project [chart: analysis time in minutes vs. # of records (90000, 180000, 270000, 360000, 450000, 675000, 900000); cloud data labels: 2, 4, 7, 11, 13, 20, 30; desktop series shown for comparison; annotations: >10x, crashes]
  • 46. Journey 0 start with reference architecture 1 identify scalable storage solution : DynamoDB 2 identify scalable storage solution for large data items : DynamoDB + Amazon S3 3 identify solutions for fast real time response & queries : DynamoDB + Amazon S3 + ElastiCache 4 identify solutions for real time analysis of data : Amazon EMR ✓ ✓ ✓ ✓ ✓
  • 47. Learnings
  • 48. About me : Shakila Pothini Senior Manager, Cloud Apps Life Sciences Group, Thermo Fisher Scientific Hiking is my ONLY stress buster Entertain to Educate. Cofounder of performing arts group (swaram.org) Mostly left brained with occasional sense of creativity
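
The ">10x" annotation on slide 45 can be checked with simple arithmetic. This sketch assumes the seven data labels on slide 39 (desktop) and slide 45 (cloud) pair up in order with the seven record counts on the x-axis; that pairing is inferred from the flattened chart text, not stated in the deck.

```python
# Assumption: labels are listed in x-axis order, so position i in each list
# refers to the same record count.
record_counts = [90_000, 180_000, 270_000, 360_000, 450_000, 675_000, 900_000]
desktop_minutes = [8, 20, 40, 80, 120, 200, 320]  # data labels from slide 39
cloud_minutes = [2, 4, 7, 11, 13, 20, 30]         # data labels from slide 45

# Speedup of cloud over desktop at each project size, rounded to one decimal.
speedups = [round(d / c, 1) for d, c in zip(desktop_minutes, cloud_minutes)]

for count, speedup in zip(record_counts, speedups):
    print(f"{count:>7} records: {speedup}x")
```

Under that pairing the ratio grows with project size and reaches 10.0x at 675,000 records and about 10.7x at 900,000 records, which matches the chart's annotation for the largest projects.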
  • 49. How to get into your gene? •sequence the entire human transcriptome (30,000 genes) •identify significant genes (100+ genes) •validate & reconfirm (20+ genes) •do it on more samples & different populations •find the way the genes interplay in the pathway •understand cancer diversity, types of therapy, druggable genes
  • 50. Demo
  • 51. Demo summary non cancerous sample cancerous sample difference in expression of genes
  • 52. Customer feedback “My initial SymphoniSuite evaluation experience was good, GUI/controls are intuitive and data upload/analysis was fast and user friendly” UPENN “I enjoy processing hundreds of open array plates with ease.”, “I appreciate the rapid access of the large number of amplification curves” Sanofi “I wanted to let you know that Symphoni has been working well for me. I have done analysis using as high as 500 files.” ASU “This I see value in... utilizing these features. I appreciate the speed of data processing and visuals.” LUMC
  • 53. Yearly checkup today 165 / 105 120 50 / 90 104
  • 54. Is this really going to detect early stages of cancer?
  • 55. A few years from now : every person ATGCATGCTATCAATTGCCC Sequence melanoma healthrisks drug response powered by AWS lifecloud
  • 56. Yearly check-ups a few years from now ATGCATGC ATTGCCC ATGCATGC ATTGCCC TATCA GCATG lifecloud ATGCATGCTATCAATTGCCC Sequence
  • 57. Yearly check-ups a few years from now (cont’d) cancer any clinical trial? healthrisks drug response ATGCATGCTATCAATTGCCC Sequence lifecloud prescribe the right drug
  • 58. Puneet Suri Senior Director, Software engineering puneet.suri@thermofisher.com T: 650.266.5857 @psuri Shakila Pothini Senior Manager, Cloud Applications shakila.pothini@thermofisher.com Salil Kumar Cloud Architect T: 650.740.1646 @salilkum
  • 61. Collect / Ingest Kinesis Process / Analyze EMR EC2 Redshift Data Pipeline Visualize / Report Glacier S3 DynamoDB Store RDS Data Answers
  • 62. Experiment 1 Data Access Compute Time Experiment 2 Experiment 3 Data Access Compute Time Data Access Compute Time ✔ ✔ ✔
  • 63. EMR Cluster EC2 Instance Data Temperature
  • 64. Please give us your feedback on this session. Complete session evaluations and earn re:Invent swag. http://bit.ly/awsevals Puneet Suri Senior Director, Software engineering puneet.suri@thermofisher.com T: 650.266.5857 @psuri Shakila Pothini Senior Manager, Cloud Applications shakila.pothini@thermofisher.com T: 650.554.2190 Salil Kumar Cloud Architect T: 650.740.1646 @salilkum