Using Bioinformatics Data to inform Therapeutics discovery and development

From data to insights and action: Strategies to take your bioinformatics to the next level
Eleanor Howe, Diamond Age Data Science
Huseyin Mehmet, Zafgen, Inc.
December 7, 2018

What is this talk about?
• Who are we? What is computational biology?
• Lessons learned from working with our customers
• Our ongoing relationship with Zafgen
• Q&A

Eleanor Howe, PhD
Background in molecular biology, statistics, programming and computational biology/bioinformatics
eleanor@diamondage.com

Diamond Age Data Science
www.diamondage.com
Bioinformatics/computational biology consulting
Project-based analysis
Staff augmentation
Pipeline development
“Drop-in” bioinformatics department
The Diamond Age: or, A Young Lady’s Illustrated Primer by Neal Stephenson

Team
Chris Friedline: Sequencing, software engineering
Somdutta Saha: Computational chemistry and proteomics
Bruce Romano: Mathematics and data science
Nicholas Crawford: Human genetics and GWAS
Mike DeRan: Cancer and diabetes therapeutics, scRNA-seq
Max Marin: RNA splicing
Zarko Boskovic: Medicinal chemistry and metabolomics
Chris Dwan: IT and data security

Computational Biology
Computational biology is data science for biology.
Bioinformatics is sometimes a synonym for computational biology.
Other times, bioinformatics refers to software engineering for biology.

Drug discovery requires evaluation of diverse, complex data
• Sequence analysis is very different from proteomics
• Knowing the landscape of available datasets is key
• Individual bioinformaticians tend to specialize in one sub-field or another

Public datasets are a gold mine
• Cancer Cell Line Encyclopedia (CCLE)
• The Cancer Genome Atlas (TCGA)
• Gene Expression Omnibus (GEO)
• Cancer Dependency Map (DepMap)
• UK Biobank
• DrugBank
• VarSome
• GTEx

But the real gems come from your own experiments
It’s not possible to validate a drug target using public datasets alone.
The public datasets are general, and cover only the most common diseases or disease subtypes.
The most useful results come from combining custom-generated data with public data.

CROs do the basics well
• Ocean Ridge, Novogene ($200 transcriptome!)
• Good for the basics - RNA-seq, DNA-seq, proteomics, metabolomics
• Reasonable standardized analysis pipelines
• Challenges:
  - combining multiple datasets across experiments or across CROs
  - more involved analysis (e.g. splicing)
• Do a thorough cost comparison when considering an academic collaborator
  - Also ask them when their student is graduating.

What additional expertise do you need?
Early-stage “traditional” therapeutics companies don’t need a full-time computational biologist. Part time can work fine.
When the company expands, hire either a computational biologist with substantial experience, or an analyst who has an advisor available.

Know what you need
Computational biologist: Experience/training in all three areas (biology, programming, statistics)
Analyst: Biology + programming, with an advisor to help with the statistics
Methods developer: Wants to build new analytical tools

What expertise do you need?
For Teams:
• Cross-discipline expertise
-biology, chemistry, computer science, statistics
• Communication skills
• Lateral thinking

Expertise gets you fast answers
The problem:
Get a terabyte of data from a USB
hard drive to the cloud in time to
analyze a dataset for a conference

The solution:
Bicycle across the Charles
3 Gb/s bicycle (latency of 1.2M ms)
Datacenter internet connection
Markley Data Center

Deep Learning / Artificial Intelligence
Another danger zone

Deep Learning / Artificial Intelligence
Deep learning is “new” in that it’s a more complex version of older technology: a neural network
Modern compute power allows for powerful classifiers trained on very large datasets

The basics of machine learning (and DL)
Deep Learning works in a similar way to other types of machine learning.
The algorithms use larger datasets and are more complex. But the overall workflow is the same.
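
As a minimal sketch of that shared workflow (split the data, fit on the training set, evaluate on held-out samples), the example below uses a deliberately simple nearest-centroid classifier on synthetic two-class data. The data, the classifier choice, and all function names are assumptions made for illustration and are not from the talk; a deep learning model would replace only the fit and predict steps.

```python
import random


def make_toy_data(n_per_class, rng):
    """Synthetic 2-feature samples for two classes (illustrative only)."""
    data = []
    for label, center in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            data.append((features, label))
    return data


def train_test_split(data, test_fraction, rng):
    """Shuffle a copy of the data and hold out a fraction for evaluation."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """'Training': compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, test):
    """Fraction of held-out samples classified correctly."""
    correct = sum(predict(centroids, features) == label for features, label in test)
    return correct / len(test)


rng = random.Random(0)  # fixed seed so the split is reproducible
data = make_toy_data(100, rng)
train, test = train_test_split(data, 0.3, rng)
model = fit_centroids(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

Evaluating only on samples the model never saw during fitting is the step that carries over unchanged to deep learning.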

Should you use deep learning?
Is your training data:
• Large? 100,000+ to 1M+ samples
• Well-annotated? Gene expression data usually isn’t.
• Representative of the questions you want to answer?
In discovery biology, the data is usually not there. Hence “discovery”.

Good use-cases for deep learning
• Image processing
  - Diagnostics from histology, radiology
  - High-content screening
• Biochemical structure/sequence
  - Epitope prediction
  - Protein folding (DeepMind)
• Single-cell RNA-seq (potentially)

Should you use deep learning? (cont)
• Do you need an interpretable model?
  - Deep learning is a black box
• Have you tried everything else?
  - Linear models, random forests, other ML techniques
  - These tools are often faster, cheaper, and easier to understand and implement

Huseyin Mehmet, PhD
Vice President and Head of Discovery Research
Zafgen, Inc.

Zafgen, Inc.
• Publicly traded biopharmaceutical company
• Founded 12 years ago (IPO in 2014)
• Virtual company
• Bringing MetAP2 inhibitors to market
• Areas of interest: Metabolic disease

Zafgen and Diamond Age
Diamond Age acts as a virtual bioinformatics department for Zafgen
• Data Analysis
• Data Management
• Hypothesis generation
• Technology recommendations

What Diamond Age has done for Zafgen
• Transcriptional profiling
• Proteomics/phosphoproteomics
• Metabolomics
• Clinical outcomes
• Custom apps for client needs

The benefits
What can Zafgen do now that it couldn’t before?
• Iterative data generation
• Cross-dataset analyses
• Confidence in analysis results from CROs
• Link between pre-clinical and clinical data
• Cost efficiencies / value for money
