Lukas Habegger, Associate Director Bioinformatics
Regeneron Genetics Center (RGC)
Insights from Building the
Future of Drug Discovery with
Apache Spark
#EntSAIS14
Outline
• Current state of drug discovery and development
• Benefits of leveraging human genetics data
• Overview of the Regeneron Genetics Center (RGC)
• Challenges on the road to delivering on the promises of big data and genomics in drug discovery
• Overview of how the RGC leverages Databricks’ Unified Analytics Platform and Apache Spark
• Discussion of key engineering innovations
• Conclusions & lessons learned
Current state of drug discovery and development:
Maximizing chances of success with human genetics
• 95% of experimental medicines fail in development; costs exceed $2B per approved drug
• Higher probability of success for drugs with strong human genetics evidence
• >$100B spent on worldwide R&D by the biopharma industry → only 10–20 new drugs per year
• Target bottleneck: <1,000 genes (<5% of all genes) account for the targets of all drugs currently in development
Herper M. Forbes.com. The Truly Staggering Cost of Inventing New Drugs. https://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/#355471a54a94. Feb. 10, 2012.
Herper M. Forbes.com. How the Staggering Cost of Inventing New Drugs Is Shaping the Future of Medicine. https://www.forbes.com/sites/matthewherper/2013/08/11/how-the-staggering-cost-of-inventing-new-drugs-is-shaping-the-future-of-medicine/#30f1a95113c3. Aug. 11, 2013.
Booth B. Forbes.com. A Billion Here, A Billion There: The Cost of Making a Drug Revisited. https://www.forbes.com/sites/brucebooth/2014/11/21/a-billion-here-a-billion-there-the-cost-of-making-a-drug-revisited/#6034e7f226a8. Nov. 21, 2014.
Nat Genet. 2015 Aug;47(8):856-60. doi: 10.1038/ng.3314. Nat Rev Drug Discov. 2013 Aug;12(8):581-94. doi: 10.1038/nrd4051. Nat Rev Drug Discov. 2017 Jan;16(1):19-34. doi: 10.1038/nrd.2016.230.
You cannot pursue modern drug discovery and development without incorporating human genetics
Why is human genetics such a powerful tool for drug
discovery?
[Diagram: a DNA mutation (example: A → T) and its impact on the gene product (loss-of-function, neutral, or gain-of-function) and on disease (protective, neutral, or damaging)]
Why is human genetics such a powerful tool for drug
discovery?
[Diagram: the same mutation-impact figure, now with a "Drug" element added]
PCSK9: A success story where human genetics
evidence played a key role in drug development
[Diagram: the mutation-impact figure (example: A → T) with a "Drug" element]
• Loss-of-function mutations in PCSK9 protect against heart disease
• Regeneron developed a drug to block PCSK9, which has been shown to be effective in preventing heart disease
The goal of the RGC is to build one of the world’s largest
genotype-phenotype resources
• Regeneron has a long history of commitment to genetics-based science, and a track record of
integrating human genetics into development programs, delivering new medicines to patients
• Regeneron established the Regeneron Genetics Center (RGC) in 2014
• Goal: build one of the world’s most comprehensive genetics databases to supplement our state-of-the-art drug development pipeline
• To date, the RGC has sequenced DNA from more than 300,000 individuals
Breadth of human genetics resources: RGC network of
60+ collaborators representing over 1 million samples
Founder populations
Phenotype specific cohorts
Family studies
General population
RGC collaboration with UK Biobank: RGC will sequence
~500K participants over 3-5 years
Automation is key to enable large-scale data production
and analysis
Automated biobank
(1.4M samples)
Library preparation
(>300,000 samples / year)
Sequencing instruments
(>300,000 samples / year)
100% cloud-based
informatics & analysis
A scalable informatics platform is needed to analyze this data and make it accessible to a broad set of users
How do we analyze our data to gain novel insights?
Approach and desired goal
• Approach:
1. Sequence a large number of individuals to
identify their mutations
2. Obtain paired clinical data (traits derived from
de-identified electronic medical records)
3. Test for correlations/associations between all
mutations and traits
4. Mine association results in various ways to
gain insights
[Diagram, desired goal: a Mutation Matrix (MM: individuals × mutations) and a Trait Matrix (TM: individuals × traits) feed an analytical engine that produces Association Results (AR: mutation : trait)]
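As a rough illustration of step 3 of the approach, the sketch below tests every mutation against every trait on toy data. The identifiers (`m1`, `m2`) and all values are hypothetical, and a plain Pearson correlation stands in for whatever statistical test the RGC's analytical engine actually runs, which the slides do not specify.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one value per individual, same order in both matrices.
mutation_matrix = {          # mutation id -> copies of the mutation (0, 1, 2)
    "m1": [0, 1, 2, 0, 1],
    "m2": [0, 0, 1, 0, 0],
}
trait_matrix = {             # trait name -> measured value
    "ALT": [40.0, 30.0, 20.0, 40.0, 30.0],
    "AST": [25.0, 25.0, 26.0, 24.0, 25.0],
}


def associate(mm, tm):
    """Test every mutation against every trait (the MM x TM cross join)."""
    return {(m, t): pearson(mm[m], tm[t]) for m, t in product(mm, tm)}


association_results = associate(mutation_matrix, trait_matrix)
```

Each key of `association_results` is one mutation : trait pair, so the output grows with the product of the two input sizes.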
How do we analyze our data to gain novel insights?
It’s more complicated – lack of data unification
[Diagram: desired goal (unified MM, TM, and AR) versus reality (matrices and association results spread across pVCF files, txt files, and results files)]
• Data is decentralized and stored in different formats
• Data is organized in different ways (e.g., not squared off, transposed, custom representations and indexing schemes)
• Asking simple questions requires many time-consuming data wrangling steps
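One way to picture the wrangling problem is the minimal sketch below. It reads two tab-separated trait tables, one with individuals as rows and one transposed, and converts both into the same long-form records. The file layouts, identifiers, and values are invented for illustration and are not the RGC's actual formats.

```python
import csv
import io


def to_long(text, transposed=False):
    """Parse a tab-separated matrix into (individual, feature, value) records.

    Default layout: the header lists features and each row is one individual.
    With transposed=True: the header lists individuals and each row is one feature.
    Empty cells (a matrix that is not squared off) are skipped.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0][1:], rows[1:]
    records = []
    for row in body:
        label, values = row[0], row[1:]
        for column, value in zip(header, values):
            if value == "":
                continue
            individual, feature = (column, label) if transposed else (label, column)
            records.append((individual, feature, float(value)))
    return records


# Hypothetical inputs: the same kind of data in two different layouts.
traits_by_individual = "id\tALT\tAST\nI1\t40\t25\nI2\t30\t\n"
traits_by_feature = "trait\tI3\tI4\nALT\t20\t35\nAST\t26\t24\n"

unified = to_long(traits_by_individual) + to_long(traits_by_feature, transposed=True)
```

Once every source is in one record shape, a simple question such as "all ALT values" becomes a single filter instead of a per-file parsing step.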
How do we analyze our data to gain novel insights?
It’s more complicated – data from multiple cohorts
[Diagram: desired goal (one MM, TM, and AR) versus reality (separate mutation and trait matrices and results files per cohort, in pVCF and txt formats)]
• The RGC has data from multiple collaborators
• Data is not always consistent
• Limited functionality to unify / aggregate matrices from multiple cohorts
How do we analyze our data to gain novel insights?
It’s more complicated – scalability issues
[Diagram: desired goal versus reality at scale; size labels on the matrices: 10s of millions, 10s of thousands, 100s of billions]
• Large inputs (MM & TM)
• MM x TM cross join
• Massive outputs (AR)
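The scale of the cross join follows from simple arithmetic. The sketch below assumes that "10s of millions" describes mutations and "10s of thousands" describes traits, which the slide labels suggest but do not state outright; the specific counts are hypothetical.

```python
def cross_join_size(n_mutations, n_traits):
    """Number of mutation : trait pairs produced by testing all against all."""
    return n_mutations * n_traits


# Hypothetical order-of-magnitude inputs matching the slide labels.
n_mutations = 20_000_000   # "10s of millions" (assumed to mean mutations)
n_traits = 10_000          # "10s of thousands" (assumed to mean traits)

n_results = cross_join_size(n_mutations, n_traits)
print(f"{n_results:,} association results")
```

With these inputs the product is 200 billion rows, which is in line with the "100s of billions" label for the association results.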
How do we find out what these mutations do?
The Databricks solution
• The RGC established a major partnership with
Databricks in 2017
• RGC is leveraging the Databricks Unified Analytics
Platform to create a unified data & compute
infrastructure:
1. Developed efficient and unified data
representations
2. Implemented scalable production workflows
optimized for analyzing billions of rows
3. Created a unified codebase to enable all
levels of users to perform computation
The RGC has developed easy-to-use web applications
to make the data accessible to a broad set of users
Architecture of RGC web applications
[Diagram: Web Application, Queries Library, and Databricks Cluster, connected by Query and Results flows]
Goal: to enable everyone in the drug development process to
easily access, analyze, and extract insights from the RGC’s data
The RGC Results Browser enables users to query
billions of association results
• Goal: Efficiently search billions of association
results across multiple cohorts
• The data set is updated when association results
from a new cohort become available
• Size of the current data set: >67 billion association
results (>200 billion results for the next update)
Optimizations to the ETL workflow have significantly
reduced the time to ingest the association results
• Association results are ingested and merged
from multiple cohorts
• Spark-based solution scales linearly with
cluster size
– Several optimizations have made the
process more efficient
– Migration of other QC processes into
this workflow enables an end-to-end
Spark solution
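The ingest-and-merge step can be pictured with the plain-Python sketch below. It is not the RGC's Spark workflow: the field names, the cohort names, and the p-value check standing in for "QC" are all assumptions made for illustration.

```python
def merge_cohorts(results_by_cohort):
    """Tag each row with its cohort, drop rows failing QC, and combine them."""
    merged = []
    for cohort, rows in results_by_cohort.items():
        for row in rows:
            p_value = row["p_value"]
            if not (0.0 < p_value <= 1.0):   # hypothetical QC rule
                continue
            merged.append({"cohort": cohort, **row})
    # Order by genomic location so results for one site sit together.
    merged.sort(key=lambda r: (r["chrom"], r["pos"], r["trait"], r["cohort"]))
    return merged


# Hypothetical per-cohort inputs.
results_by_cohort = {
    "cohort_a": [
        {"chrom": "4", "pos": 100, "trait": "ALT", "p_value": 1e-9},
        {"chrom": "1", "pos": 50, "trait": "AST", "p_value": 0.2},
    ],
    "cohort_b": [
        {"chrom": "4", "pos": 100, "trait": "ALT", "p_value": 3e-4},
        {"chrom": "2", "pos": 10, "trait": "ALT", "p_value": 1.5},  # fails QC
    ],
}

merged = merge_cohorts(results_by_cohort)
```

Running QC in the same pass as the merge mirrors the idea of an end-to-end workflow: each row is read once, checked, tagged, and written.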
Optimizing the partitioning scheme has significantly
reduced the query response time
• The input data is naturally organized by cohort; not query optimized
[Figure: association results (AR) along chromosomal location, with gene density, results density, and partition density tracks; the data is range partitioned within chromosomal boundaries, with variable range width and count]
• Optimizations reduced the query response time from >30 minutes to <3 seconds
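A minimal sketch of variable-width range partitioning on one chromosome follows. The slides only say that ranges have "variable range width & count", so the equal-row-count rule used here is an assumption, and the positions are toy values.

```python
from bisect import bisect_right


def build_boundaries(positions, rows_per_partition):
    """Start position of each partition, chosen so partitions hold similar row counts.

    Dense regions therefore get narrow ranges and sparse regions get wide ones.
    """
    ordered = sorted(positions)
    starts = [ordered[i] for i in range(0, len(ordered), rows_per_partition)]
    return sorted(set(starts))


def partition_of(boundaries, pos):
    """Index of the partition whose range contains pos."""
    return max(bisect_right(boundaries, pos) - 1, 0)


def partitions_for_query(boundaries, start, end):
    """Partitions that must be read to answer a query over [start, end]."""
    return list(range(partition_of(boundaries, start), partition_of(boundaries, end) + 1))


# Hypothetical positions: a dense cluster followed by a sparse tail.
positions = [100, 105, 110, 115, 120, 125, 5000, 9000]
boundaries = build_boundaries(positions, rows_per_partition=2)

narrow_query = partitions_for_query(boundaries, 112, 118)
wide_query = partitions_for_query(boundaries, 105, 121)
```

A query over a small genomic window touches only the partitions covering that window, which is the general reason location-based partitioning can cut response time compared with scanning data organized by cohort.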
Demo notebook: mining association results and
extracting key insights
The RGC has recently identified a new potential drug
target for treating liver disease
Source: https://endpts.com/the-pcsk9-of-nash-regeneron-and-alnylam-join-forces-to-tackle-a-promising-target-for-severe-liver-diseases/
Liver disease can be detected based on enzyme levels
in the blood
• Two enzymes are typically analyzed to evaluate liver damage:
– AST (Aspartate transaminase)
– ALT (Alanine transaminase)
• Elevated levels of AST and ALT are indicative of liver damage
– Necessary but not sufficient
• Goal: identify loss-of-function mutations that are associated with lower AST and ALT levels
(protective effect)
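The stated goal can be written as a filter over association results, as in the sketch below. The record fields, the mutation identifiers, and the sign convention (negative effect means lower enzyme level) are hypothetical. The 5e-8 cutoff is the conventional genome-wide significance threshold, used here as an assumed default rather than the RGC's actual setting.

```python
def protective_lof_candidates(results, p_threshold=5e-8):
    """Loss-of-function mutations significantly associated with lower AST and ALT."""
    traits_by_mutation = {}
    for r in results:
        if r["consequence"] != "loss_of_function":
            continue
        if r["trait"] not in ("AST", "ALT"):
            continue
        # Negative effect is assumed to mean a lower enzyme level.
        if r["p_value"] < p_threshold and r["effect"] < 0:
            traits_by_mutation.setdefault(r["mutation"], set()).add(r["trait"])
    return sorted(m for m, traits in traits_by_mutation.items() if traits == {"AST", "ALT"})


# Hypothetical association results.
results = [
    {"mutation": "mA", "consequence": "loss_of_function", "trait": "AST", "effect": -0.1, "p_value": 1e-12},
    {"mutation": "mA", "consequence": "loss_of_function", "trait": "ALT", "effect": -0.2, "p_value": 1e-10},
    {"mutation": "mB", "consequence": "loss_of_function", "trait": "AST", "effect": -0.1, "p_value": 1e-9},
    {"mutation": "mB", "consequence": "loss_of_function", "trait": "ALT", "effect": 0.1, "p_value": 1e-9},
    {"mutation": "mC", "consequence": "missense", "trait": "AST", "effect": -0.3, "p_value": 1e-20},
    {"mutation": "mC", "consequence": "missense", "trait": "ALT", "effect": -0.3, "p_value": 1e-20},
    {"mutation": "mD", "consequence": "loss_of_function", "trait": "AST", "effect": -0.1, "p_value": 0.01},
    {"mutation": "mD", "consequence": "loss_of_function", "trait": "ALT", "effect": -0.1, "p_value": 1e-9},
]

candidates = protective_lof_candidates(results)
```

Requiring both enzymes to move in the protective direction narrows the list to mutations with a consistent signal, here only `mA`.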
Manhattan plot for AST: Several mutations in the
genome are associated with this liver trait
What peak / mutation is the
most interesting?
HSD17B13
• The mutation of interest is associated with a broad spectrum of liver disease traits
• All of these associations confer protection from liver disease
Conclusions & lessons learned
• At Regeneron our goal is to bring the power of science to medicine and develop new medicines for
patients in need
• Incorporating human genetics evidence is critical for pursuing modern drug discovery; the RGC is
building one of the world’s largest genetics databases to identify new potential drug targets
• Our strategic partnership with Databricks has enabled us to build a state-of-the-art data science
platform from scratch by:
– Developing efficient and unified data representations
– Building out scalable workflows to mine billions of rows and addressing key bottlenecks (e.g.,
reducing the ETL time from weeks to hours and optimizing the query response time to <3s)
– Creating a unified codebase to enable all levels of users to perform computation
• Most importantly, the Databricks Unified Analytics Platform brings our data, tools, and people together
to accelerate innovation
Acknowledgements
• RGC-LT
– Alan Shuldiner
– Aris Baras
– Aris Economides
– Jeffrey Reid
– John Overton
• RGC-GI
– Alicia Hawes
– Ashish Yadav
– Claire Chai
– Evan Maxwell
– Gisu Eom
– Jeff Staples
– John Penn
– Leland Barnard
– Shareef Khalid
– Sheldon Bai
– Suganthi Balasubramanian
– Young Hahn
• RGC
– Alexander Li
– Alexander Lopez
– Amy Damask
– Charlie Paulding
– Claudia Schurmann
– Colm O’Dushlaine
– Cristopher Van Hout
– Dylan Sun
– Jan Freudenberg
– Kavita Praveen
– Kia Manoochehri
– Lauren Gurski
– Manasi Pradhan
– Mike Norsen
– Nehal Gosalia
– Nila Banerjee
– Rick Ulloa
– Shane McCarthy
– Tanya Teslovich Dostal
– Tony Marcketta
• Databricks
– Ali Ghodsi
– Ali Hodroj
– Allan Marcos
– Ambareesh Kulkarni
– Bavesh Patel
– Christopher Hoshino-Fish
– David Weaver
– Francis Gerace
– Hossein Falaki
– Ion Stoica
– Juliusz Sompolski
– Li Yu
– Navid Bazzazzadeh
– Paris Georgallis
– Ram Sriharsha
– Ronak Shah
– Shiva Bhattacharjee
– Vida Ha
– Yongsheng Huang
• REGN-IT
– Abdul Shaik
– Allen Chiang
– Brandon Fetch
– Christopher McCabe
– Dale Cochran
– David Glosser
– Long Le
– Michael Phillips
– Mohammad Saeed
– Pat Leblanc
– Sal Mineo
– Shaw Nawaz
– Shiva Ravi
– Stephen Huvane
– Vin Dahake
– Weylin Preodor
Questions?
https://tinyurl.com/yaqwl2bt
We are hiring!

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 

Insights from Building the Future of Drug Discovery with Apache Spark with Lukas Habegger

  • 1. Lukas Habegger, Associate Director Bioinformatics, Regeneron Genetics Center (RGC): Insights from Building the Future of Drug Discovery with Apache Spark. #EntSAIS14
  • 2. Outline • Current state of drug discovery and development • Benefits of leveraging human genetics data • Overview of the Regeneron Genetics Center (RGC) • Challenges on the road to delivering on the promises of big data and genomics in drug discovery • Overview of how the RGC leverages Databricks’ Unified Analytics Platform and Apache Spark • Discussion of key engineering innovations • Conclusions & lessons learned
  • 3. Current state of drug discovery and development: Maximizing chances of success with human genetics • 95% of experimental medicines fail in development; costs exceed $2B per approved drug • >$100B spent on worldwide R&D by the biopharma industry → only 10–20 new drugs per year • Target bottleneck: <1,000 genes (<5% of all genes) account for the targets of all drugs currently in development • Higher probability of success for drugs with strong human genetics evidence • You cannot pursue modern drug discovery and development without incorporating human genetics • Sources: Herper M. Forbes.com. The Truly Staggering Cost of Inventing New Drugs. https://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/#355471a54a94. Feb. 10, 2012. Herper M. Forbes.com. How the Staggering Cost of Inventing New Drugs Is Shaping the Future of Medicine. https://www.forbes.com/sites/matthewherper/2013/08/11/how-the-staggering-cost-of-inventing-new-drugs-is-shaping-the-future-of-medicine/#30f1a95113c3. Aug. 11, 2013. Booth B. Forbes.com. A Billion Here, A Billion There: The Cost of Making a Drug Revisited. https://www.forbes.com/sites/brucebooth/2014/11/21/a-billion-here-a-billion-there-the-cost-of-making-a-drug-revisited/#6034e7f226a8. Nov. 21, 2014. Nat Genet. 2015 Aug;47(8):856-60. doi: 10.1038/ng.3314. Nat Rev Drug Discov. 2013 Aug;12(8):581-94. doi: 10.1038/nrd4051. Nat Rev Drug Discov. 2017 Jan;16(1):19-34. doi: 10.1038/nrd.2016.230.
  • 4. Why is human genetics such a powerful tool for drug discovery? [Diagram labels: DNA mutation (example: A → T); impact on gene product: loss-of-function, neutral, gain-of-function; impact on disease: protective, neutral, damaging]
  • 7. Why is human genetics such a powerful tool for drug discovery? [Diagram labels: DNA mutation (example: A → T); drug; impact on gene product: loss-of-function, neutral, gain-of-function; impact on disease: protective, neutral, damaging]
  • 8. PCSK9: A success story where human genetics evidence played a key role in drug development • Loss-of-function mutations in PCSK9 protect against heart disease • Regeneron developed a drug to block PCSK9, which has been shown to be effective in preventing heart disease [Diagram labels: DNA mutation (example: A → T); drug; impact on gene product: loss-of-function, neutral, gain-of-function; impact on disease: protective, neutral, damaging]
  • 9. The goal of the RGC is to build one of the world’s largest genotype-phenotype resources • Regeneron has a long history of commitment to genetics-based science, and a track record of integrating human genetics into development programs, delivering new medicines to patients • Regeneron established the Regeneron Genetics Center (RGC) in 2014 • Goal: build one of the world’s most comprehensive genetics databases to supplement our state-of-the-art drug development pipeline • To date, the RGC has sequenced DNA from more than 300,000 individuals
  • 10. Breadth of human genetics resources: RGC network of 60+ collaborators representing over 1 million samples • Founder populations • Phenotype-specific cohorts • Family studies • General population
  • 12. RGC collaboration with UK Biobank: RGC will sequence ~500K participants over 3–5 years
  • 13. Automation is key to enabling large-scale data production and analysis • Automated biobank (1.4M samples) • Library preparation (>300,000 samples / year) • Sequencing instruments (>300,000 samples / year) • 100% cloud-based informatics & analysis • A scalable informatics platform is needed to analyze this data and make it accessible to a broad set of users
  • 14. How do we analyze our data to gain novel insights? Approach and desired goal • Approach: 1. Sequence a large number of individuals to identify their mutations 2. Obtain paired clinical data (traits derived from de-identified electronic medical records) 3. Test for correlations/associations between all mutations and traits 4. Mine association results in various ways to gain insights [Diagram, desired goal: a Mutation Matrix (MM; individuals, mutations) and a Trait Matrix (TM; individuals, traits) feed an analytical engine that produces Association Results (AR; mutation : trait)]
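Step 3 of the approach on slide 14 (testing every mutation against every trait) can be illustrated with a minimal sketch. This is not the RGC's analytical engine: it assumes toy data, hypothetical names (`mutation_matrix`, `trait_matrix`, `associate`), and a plain Pearson correlation as a stand-in for the statistical test, which the slides do not specify.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one list entry per individual, same order in both matrices.
mutation_matrix = {
    "mut_A": [0, 1, 2, 0, 1],  # allele counts (assumed encoding)
    "mut_B": [0, 0, 0, 1, 0],
}
trait_matrix = {
    "trait_X": [1.0, 2.0, 3.0, 1.0, 2.0],
    "trait_Y": [5.0, 5.5, 4.0, 4.5, 5.0],
}


def associate(mm, tm):
    """Return one result per (mutation, trait) pair: the MM x TM cross join."""
    return {(m, t): pearson(mm[m], tm[t]) for m, t in product(mm, tm)}


association_results = associate(mutation_matrix, trait_matrix)
```

The output has one entry per mutation : trait pair, which is why the size of the association results grows with the product of the two input dimensions.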
  • 15. How do we analyze our data to gain novel insights? It’s more complicated – lack of data unification • Data is decentralized and stored in different formats • Data is organized in different ways (e.g., not squared off, transposed, custom representations and indexing schemes) • Asking simple questions requires many time-consuming data wrangling steps [Diagram: desired goal (MM and TM feeding an analytical engine that produces AR) versus reality (MM, TM, and AR held as separate pVCF, txt, and results files)]
  • 16. How do we analyze our data to gain novel insights? It’s more complicated – data from multiple cohorts • The RGC has data from multiple collaborators • Data is not always consistent • Limited functionality to unify / aggregate matrices from multiple cohorts [Diagram: desired goal (MM and TM feeding an analytical engine that produces AR) versus reality (several GT, MM, and TM matrices and AR, held as pVCF, txt, and results files)]
  • 17. How do we analyze our data to gain novel insights? It’s more complicated – scalability issues • Large inputs (MM & TM) • MM x TM cross join • Massive outputs (AR) [Diagram: desired goal versus reality, with scale labels of 10s of millions, 100s of billions, and 10s of thousands]
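The scale labels on slide 17 are consistent with simple cross-join arithmetic. The sketch below assumes that "10s of millions" refers to mutations and "10s of thousands" to traits (the flattened slide text does not say which label belongs to which matrix), and the specific counts are illustrative, not figures from the talk.

```python
def cross_join_size(n_mutations: int, n_traits: int) -> int:
    """Number of mutation : trait pairs produced by an MM x TM cross join."""
    return n_mutations * n_traits


# Assumed, illustrative magnitudes only.
ASSUMED_MUTATIONS = 20_000_000  # "10s of millions"
ASSUMED_TRAITS = 10_000         # "10s of thousands"

total_results = cross_join_size(ASSUMED_MUTATIONS, ASSUMED_TRAITS)
print(f"{total_results:,} association results")
```

Under these assumptions the output lands in the "100s of billions" range, matching the third label on the slide.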
  • 18. How do we find out what these mutations do? The Databricks solution • RGC established a major partnership with Databricks in 2017 • RGC is leveraging the Databricks Unified Analytics Platform to create a unified data & compute infrastructure: 1. Developed efficient and unified data representations 2. Implemented scalable production workflows optimized for analyzing billions of rows 3. Created a unified codebase to enable all levels of users to perform computation [Diagram: MM and TM feeding an analytical engine that produces AR]
  • 19. The RGC has developed easy-to-use web applications to make the data accessible to a broad set of users • Goal: to enable everyone in the drug development process to easily access, analyze, and extract insights from the RGC’s data [Diagram, architecture of RGC web applications; labels: Web Application, Queries, Databricks Cluster, Query Results, Library, MM, TM, analytical engine, AR]
  • 20. The RGC Results Browser enables users to query billions of association results • Goal: Efficiently search billions of association results across multiple cohorts • The data set is updated when association results from a new cohort become available • Size of the current data set: >67 billion association results (>200 billion results for the next update)
  • 21. Optimizations to the ETL workflow have significantly reduced the time to ingest the association results • Association results are ingested and merged from multiple cohorts • Spark-based solution scales linearly with cluster size – Several optimizations have made the process more efficient – Migration of other QC processes into this workflow enables an end-to-end Spark solution
  • 22. Optimizing the partitioning scheme has significantly reduced the query response time • The input data is naturally organized by cohort; not query optimized • Optimizations reduced the query response time from >30 minutes to <3 seconds [Diagram labels: AR by chromosomal location, gene density, results density; range-partitioned AR with chromosomal boundaries, partition density, variable range width & count]
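The range-partitioning idea on slide 22 (partitions that respect chromosomal boundaries and vary in width) can be sketched in plain Python. This is an illustration under assumptions, not the RGC's implementation or a Spark API: it assumes the goal is a roughly equal number of rows per partition, and the names `build_partition_starts` and `find_partition` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_partition_starts(results, target_rows):
    """Map each chromosome to the sorted start positions of its partitions.

    `results` is an iterable of (chromosome, position) pairs. Partitions never
    cross a chromosome boundary, and a new partition starts only at a new
    position, so rows sharing a position always stay together.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in results:
        by_chrom[chrom].append(pos)

    starts = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        chrom_starts = [positions[0]]
        rows_in_current = 0
        previous = None
        for pos in positions:
            if rows_in_current >= target_rows and pos != previous:
                chrom_starts.append(pos)
                rows_in_current = 0
            rows_in_current += 1
            previous = pos
        starts[chrom] = chrom_starts
    return starts


def find_partition(starts, chrom, pos):
    """Return (chromosome, partition index) for a query position."""
    idx = bisect_right(starts[chrom], pos) - 1
    return chrom, max(idx, 0)


# Hypothetical toy data: dense results near the start of chromosome "1", sparse later.
toy_results = [("1", p) for p in (100, 110, 120, 130, 5000, 9000)]
toy_results += [("2", 50), ("2", 60)]
partition_starts = build_partition_starts(toy_results, target_rows=2)
```

A query for a single chromosomal location then touches one partition rather than every cohort's files, which is the property that makes this layout query friendly. Dense regions get narrow ranges and sparse regions get wide ones.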
  • 23. Demo notebook: mining association results and extracting key insights
  • 24. The RGC has recently identified a new potential drug target for treating liver disease. Source: https://endpts.com/the-pcsk9-of-nash-regeneron-and-alnylam-join-forces-to-tackle-a-promising-target-for-severe-liver-diseases/
  • 25. Liver disease can be detected based on enzyme levels in the blood • Two enzymes are typically analyzed to evaluate liver damage: – AST (Aspartate transaminase) – ALT (Alanine transaminase) • Elevated levels of AST and ALT are indicative of liver damage – Necessary but not sufficient • Goal: identify loss-of-function mutations that are associated with lower AST and ALT levels (protective effect)
  • 26. Manhattan plot for AST: Several mutations in the genome are associated with this liver trait • Which peak / mutation is the most interesting?
  • 27. Manhattan plot for AST: Several mutations in the genome are associated with this liver trait • Which peak / mutation is the most interesting? HSD17B13
  • 29. The mutation of interest is associated with a broad spectrum of liver disease traits • All of these associations confer protection from liver disease
  • 30. Conclusions & lessons learned • At Regeneron our goal is to bring the power of science to medicine and develop new medicines for patients in need • Incorporating human genetics evidence is critical for pursuing modern drug discovery; the RGC is building one of the world’s largest genetics databases to identify new potential drug targets • Our strategic partnership with Databricks has enabled us to build a state-of-the-art data science platform from scratch by: – Developing efficient and unified data representations – Building out scalable workflows to mine billions of rows and addressing key bottlenecks (e.g., reducing the ETL time from weeks to hours and optimizing the query response time to <3s) – Creating a unified codebase to enable all levels of users to perform computation • Most importantly, the Databricks Unified Analytics Platform brings our data, tools, and people together to accelerate innovation
  • 31. Acknowledgements • RGC-LT: Alan Shuldiner, Aris Baras, Aris Economides, Jeffrey Reid, John Overton • RGC-GI: Alicia Hawes, Ashish Yadav, Claire Chai, Evan Maxwell, Gisu Eom, Jeff Staples, John Penn, Leland Barnard, Shareef Khalid, Sheldon Bai, Suganthi Balasubramanian, Young Hahn • RGC: Alexander Li, Alexander Lopez, Amy Damask, Charlie Paulding, Claudia Schurmann, Colm O’Dushlaine, Cristopher Van Hout, Dylan Sun, Jan Freudenberg, Kavita Praveen, Kia Manoochehri, Lauren Gurski, Manasi Pradhan, Mike Norsen, Nehal Gosalia, Nila Banerjee, Rick Ulloa, Shane McCarthy, Tanya Teslovich Dostal, Tony Marcketta • Databricks: Ali Ghodsi, Ali Hodroj, Allan Marcos, Ambareesh Kulkarni, Bavesh Patel, Christopher Hoshino-Fish, David Weaver, Francis Gerace, Hossein Falaki, Ion Stocia, Juliusz Sompolsk, Li Yu, Navid Bazzazzadeh, Paris Georgallis, Ram Sriharsha, Ronak Shah, Shiva Bhattacharjee, Vida Ha, Yongsheng Huang • REGN-IT: Abdul Shaik, Allen Chiang, Brandon Fetch, Christopher McCabe, Dale Cochran, David Glosser, Long Le, Michael Phillips, Mohammad Saeed, Pat Leblanc, Sal Mineo, Shaw Nawaz, Shiva Ravi, Stephen Huvane, Vin Dahake, Weylin Preodor