tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART’s Application to Clinical Biomarker Discovery Studies in Sanofi
Sherry Cao, Sanofi
This presentation discusses the challenges we are encountering in clinical biomarker discovery
studies and how we are using tranSMART to help address them.
3. Clinical Biomarker Discovery Process
Clinical Sample Procurement
Clinical Information
• Patients
• Diseases
• Clinical Phenotypes
• Lab tests
• Pathology reports
• Drugs
Data Capture
Molecular Information
• DNA
• RNA
• Protein
• Lipid
• Metabolites
Discovery & Interpretation
Biomarkers
• Diagnostic
• Prognostic
• Efficacy
Signatures
• Molecular classifications
• Patient stratifications
Target ID/Credentialing
• Molecular targets
• Pathways
• Clinical phenotypes
Clinical Sample Validation
Sample Sources
• In house
• Public
Type
• In silico
• Experimental
4. Challenges for Clinical Biomarker Discovery
Data Management
● High-throughput biological measurements generate an unprecedented amount of data for each biological sample
  ● Chip-based profiling technologies
  ● Exome, transcriptome & genomic sequencing technologies
● The complexity of disease biology requires large sample numbers to reach statistical significance
  ● GWAS studies for complex traits
  ● Molecular signature development for patient stratification
Integration & Analysis
● Heterogeneous data types & data sources
  ● Research & clinical
  ● Structured & non-structured data
● Data curation is a critical & time-consuming process
● Complex analyses & visualizations are needed to transform data into knowledge
5. Interdisciplinary team for Clinical Biomarker Research
CBR Team
● Clinical Statisticians
● Clinicians
● Research Scientists
● Clinical Informaticians
● Research Informaticians
6. Two Distinctive User Groups
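Main Role
● Clinicians, Research Scientists: Hypothesis generation, Mechanistic Interpretation
● Informatic Scientists & Statisticians: Data analysis
Statistical Analysis Type
● Clinicians, Research Scientists: Single variable, correlative analysis
● Informatic Scientists & Statisticians: Multi-variable complex analysis
Statistical Tool Access
● Clinicians, Research Scientists: Very limited
● Informatic Scientists & Statisticians: SAS, JMP, R
User Interface
● Clinicians, Research Scientists: Drag & Drop GUI
● Informatic Scientists & Statisticians: API
Major Complaints
● Clinicians, Research Scientists: Data acquisition, Data analysis turnaround time
● Informatic Scientists & Statisticians: Data acquisition, Data curation & reformatting, Not enough time to do real analysis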
7. Informatics Systems Mapped onto Research Flow
Data Capture
Discovery
Interpretation
Clinical Sample Validation
Platform Specific System
Data Management & Integration
11. Role of tranSMART within Sanofi
● Translational data hub: one-stop shop for all data related to a biomarker discovery project
● Data management & integration
  ● Clinical & research data
  ● Structured & non-structured data
  ● Fully curated data for integrated analysis & not-fully curated data
● Deliver critically needed statistical/informatics analysis tools to clinicians & research scientists
  ● Univariate analysis
  ● Simple clustering analysis & heatmap generation
● Help informatics scientists generate custom analysis data sets based on distinctive cohort definitions
12. Clinical Biomarker Discovery Use Case 1
● Business unit with an established & active biomarker discovery process
● Samples are routinely sent out for profiling on different platforms
● Data are generated routinely by both CROs & internal groups
  ● High-throughput profiling data
  ● Low-throughput imaging & assay data (IHC, ELISA, qPCR, etc.)
● Situation
  ● Biomarker team reps are overwhelmed by data management questions, with little time to do actual analysis
● Critical need
  ● How to organize data effectively?
  ● How to manage the low-throughput data systematically together with clinical & high-throughput data?
  ● How to search & find the relevant data quickly?
13. tranSMART in Sanofi – Data Management
Global view of all the data available, from level 1 data (uncurated/raw files) to levels 3-4 data (analysis results, findings)
● Navigate within Programs > Studies > Assays, Analysis and File Folders (see next slide)
● Search data using dictionaries
● Create new Programs > Studies > Assays and File Folders, and annotate (tag) them
● Export files
● Visualize gene expression analysis results
Run analysis on subject-level data (former Dataset Explorer)
● Browse level 2 (processed) data, incl. clinical / preclinical / molecular data, etc.
● Search subject-level data
● Select data subsets (cohorts)
● Run basic statistical and genomic analyses on those subsets (standard features from tranSMART v1.0)
● Export data subsets
14. Data organization
● Data is organized in a hierarchical structure:
Program
Study
File Folder*
Assay
Analysis
* A file folder can be created at any level: program, study, assay…
Each object (Program, Study, Assay, etc.) is tagged with metadata:
– Provides information on the object
– Enables queries using search
Predefined annotation templates
– Most fields use a controlled vocabulary (CV) with pick-list or autocomplete functionality. Examples of dictionaries used: MeSH, WHO-DD, some branches of the NextBio Ontology.
– The Description field captures free text
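As a rough illustration of the organization described on this slide, the sketch below models tagged objects in a tree and looks them up by tag. It is not tranSMART code: the `Node` class, the `find_by_tag` helper, and all tag names and values are hypothetical, chosen only to show the idea of metadata-driven search.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One object in the hierarchy (Program, Study, Assay, Analysis, File Folder)."""
    kind: str
    name: str
    tags: dict = field(default_factory=dict)   # controlled-vocabulary metadata (hypothetical keys)
    description: str = ""                       # free-text field
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child object and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_by_tag(root, key, value):
    """Return every object under root whose metadata tag equals value."""
    return [n for n in root.walk() if n.tags.get(key) == value]


# Illustrative data only; names and tag values are made up.
program = Node("Program", "Example program", tags={"therapeutic_area": "Oncology"})
study = program.add(Node("Study", "Example study", tags={"disease": "Melanoma"}))
assay = study.add(Node("Assay", "Expression profiling", tags={"platform": "RNAseq"}))
study.add(Node("File Folder", "Raw files"))  # a file folder can sit at any level

hits = find_by_tag(program, "platform", "RNAseq")
print([(n.kind, n.name) for n in hits])
```

Keeping the tags in a plain dictionary per object is what makes a generic search possible: the same lookup works for a Program, a Study, or a File Folder.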
16. Integrated search
New search function at the top of the screen. Any data (levels 1-4) can be searched.
● Dropdown with a list of dictionaries + free-text search
● Autocomplete feature for values in dictionaries
● Analyze view: the system points you to level 2 data
● Browse view: the search returns Programs, Studies, Assays and/or Files that match your query
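The autocomplete behaviour mentioned above can be sketched as a prefix lookup over a sorted list of dictionary terms. This is only a minimal illustration, assuming an in-memory term list; the `autocomplete` function and the example terms are hypothetical and say nothing about how tranSMART implements the feature.

```python
from bisect import bisect_left


def autocomplete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix` (case-insensitive).

    `sorted_terms` must already be lowercase and sorted.
    """
    prefix = prefix.lower()
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


# Illustrative dictionary values; a real dictionary would be far larger.
disease_terms = sorted(
    t.lower()
    for t in ["Diabetes Mellitus", "Diabetic Nephropathy", "Melanoma", "Gaucher Disease"]
)
print(autocomplete(disease_terms, "diab"))
```

Because the terms are sorted, all candidates for a prefix sit next to each other, so the lookup stops at the first term that no longer matches.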
17. Filter
A new Filter option can also be used for selections based on fields with a small set of possible values.
The search returns Programs, Studies, Assays and/or Files that match your query.
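Filtering on fields with a small set of possible values is essentially faceted selection. The sketch below shows one way to express it, under the assumption that each object is a flat record; the field names (`species`, `data_type`) and both helper functions are hypothetical.

```python
def facet_values(records, field_name):
    """Collect the distinct values of one field, e.g. to populate a filter panel."""
    return sorted({r[field_name] for r in records if field_name in r})


def apply_filters(records, selections):
    """Keep records whose value for every filtered field is among the selected values."""
    return [
        r for r in records
        if all(r.get(f) in allowed for f, allowed in selections.items())
    ]


# Illustrative records only.
studies = [
    {"name": "Study A", "species": "Human", "data_type": "Gene expression"},
    {"name": "Study B", "species": "Mouse", "data_type": "Gene expression"},
    {"name": "Study C", "species": "Human", "data_type": "Proteomics"},
]

print(facet_values(studies, "species"))
selected = apply_filters(
    studies, {"species": {"Human"}, "data_type": {"Gene expression"}}
)
print([s["name"] for s in selected])
```

Selections on different fields are combined with AND, while several values selected within one field act as OR.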
18. Search & filter in Analyze
Synchronized search & filter function in Analyze
19. Visualization of gene expression analysis
Creation of a template for loading and displaying gene expression analysis results.
20. File export – Shopping Cart function
New concept of Shopping Cart for exporting files.
Note: If users give positive feedback on the Shopping Cart concept, we may extend this feature to subject-level data in RC-2.
21. Clinical Biomarker Discovery Use Case 2
● Business unit with a focused biomarker discovery program
● Goal is to identify better disease progression biomarkers than the current clinical functional test
● Situation at hand
  ● Researchers don't have appropriate analytical tools for correlative analysis
  ● A variety of profiling experiments are being planned
    • RNAseq, Proteomics, RBM, miRNA, Metabolomics
  ● Patient data are collected at multiple time points
● Critical need
  ● How to integrate all the data?
  ● How to enable clinical researchers to analyze and visualize data?
  ● How to analyze time series data more effectively?
22. tranSMART in Sanofi – Data Integration
Current state
● Within-study clinical & gene expression profiling data
Figure labels: End Point, Gene expression
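To make the idea of within-study integration concrete, the sketch below joins a clinical end point and a gene expression value on a shared subject ID and computes a Pearson correlation. It is an assumption-laden illustration rather than the tranSMART workflow: the subject IDs, the values, and the `pearson` helper are all made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-subject data: clinical end point and expression of one gene.
end_point = {"S1": 10.0, "S2": 20.0, "S3": 30.0, "S4": 40.0}
expression = {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S5": 9.0}

# Integration step: only subjects present in both data sets can be analyzed.
shared = sorted(end_point.keys() & expression.keys())
r = pearson([end_point[s] for s in shared], [expression[s] for s in shared])
print(shared, round(r, 3))
```

The join on subject ID is the part that depends on curation: subjects measured in only one data set (S4 and S5 here) drop out of the correlative analysis.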
23. tranSMART in Sanofi – Data Integration
In the pipeline
● Multi-modal profiling data support
Data types to be addressed
● RNAseq
● miRNA profiling (qPCR + seq)
● Metabolomics
● Proteomics
● RBM
Figure labels: Protein Level, Gene expression
24. tranSMART in Sanofi – Providing Analysis Tools to Research Scientists
General Summary Statistics on Patient Cohorts
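A minimal sketch of what summary statistics on patient cohorts involve: define cohorts with a predicate, then summarize one variable per cohort. The patient records, the field names (`arm`, `marker`), and both helpers are hypothetical and use only the Python standard library.

```python
from statistics import mean, median, stdev


def select_cohort(patients, predicate):
    """Return the patients for which the cohort definition holds."""
    return [p for p in patients if predicate(p)]


def summarize(patients, variable):
    """Basic summary statistics for one variable, ignoring missing values."""
    values = [p[variable] for p in patients if p.get(variable) is not None]
    return {
        "n": len(values),
        "mean": mean(values) if values else None,
        "median": median(values) if values else None,
        "sd": stdev(values) if len(values) > 1 else None,  # needs >= 2 values
    }


# Illustrative records only.
patients = [
    {"id": "P1", "arm": "treated", "marker": 2.0},
    {"id": "P2", "arm": "treated", "marker": 4.0},
    {"id": "P3", "arm": "control", "marker": 1.0},
    {"id": "P4", "arm": "control", "marker": None},
]

treated_summary = summarize(select_cohort(patients, lambda p: p["arm"] == "treated"), "marker")
control_summary = summarize(select_cohort(patients, lambda p: p["arm"] == "control"), "marker")
print(treated_summary)
print(control_summary)
```

Reporting `n` next to each statistic matters here because missing values shrink a cohort silently.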
27. Clinical Biomarker Discovery Use Case 3
● Efficacy biomarker discovery for a complex disease with 15,000 patients
● Situation at hand
  ● A number of profiling experiments are being planned
    • RNAseq, RBM, Metabolomics
  ● Patients often manifest other disease symptoms
● Critical issue
  ● How to load such a large dataset?
  ● How to analyze such large sample numbers with multiple high-dimensional data types?
  ● How to analyze comorbidities?
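The slide leaves the comorbidity question open. One simple starting point, offered here only as an illustration and not as anything the presentation proposes, is to count how often pairs of conditions co-occur across patients. The diagnoses below and the `comorbidity_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations


def comorbidity_pairs(diagnoses_by_patient):
    """Count, for every pair of conditions, the number of patients who have both."""
    counts = Counter()
    for diagnoses in diagnoses_by_patient.values():
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(diagnoses)), 2):
            counts[pair] += 1
    return counts


# Made-up example data.
diagnoses = {
    "P1": ["diabetes", "hypertension"],
    "P2": ["diabetes", "hypertension", "obesity"],
    "P3": ["obesity"],
}

pair_counts = comorbidity_pairs(diagnoses)
print(pair_counts.most_common(1))
```

Raw pair counts ignore how common each condition is on its own, so they are a first look rather than a test of association.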
28. Conclusions
● tranSMART can provide critical solutions for clinical biomarker discovery needs
  ● Data management, integration & analysis
● Two distinctive user groups for tranSMART: through the user interface and through the API
● Different business units have different requirements for tranSMART
● Sanofi developed critical user interface and functionality improvements to meet Sanofi and general clinical biomarker discovery needs
30. Acknowledgement
● Genzyme
  ● Jike Cui, Adam Palermo, Rena Baek, Petra Olivova, Leslie Jost, Rob Pomponio, Allison McVie-Wylie, Steve Madden, Clarence Wang
● Diabetes
  ● Juergen Kammerer, Manfred Hendlich, Dan Crowther
● Oncology
  ● Mary Penniston, Jack Pollard
● Sanofi tranSMART development team
  ● Claire Virenque, Annick Peraux
  ● Angelo Decristofano, Lars Greiffenberg, Christophe Gibault, David Peyruc
31. Dream Analysis Process
Define question
Identify patient cohort
Obtain relevant profile & clinical data
Run analysis
Satisfied
Format!
Export & publish results