IRIDA: Canada’s federated platform
for genomic epidemiology
William Hsiao, Ph.D.
William.hsiao@bccdc.ca
@wlhsiao
BC Public Health Microbiology and Reference Laboratory
and University of British Columbia
ABPHM 2015
Genome Canada Bioinformatics Competition: Large-Scale Project
“A Federated Bioinformatics Platform for
Public Health Microbial Genomics”
Our Goal
The IRIDA platform
(Integrated Rapid Infectious Disease Analysis)
An open source, standards compliant, high quality genomic epidemiology
analysis platform to support real-time (food-borne) disease outbreak
investigations
www.IRIDA.ca
Each year, one in eight Canadians (or
four million people)
get sick with a domestically acquired
food-borne illness.
Partnership among public health agencies and academic institutes to bridge the gaps
between advancements in genomic epidemiology and application to real-life and
real-time use cases in public health agencies
- Project Team has direct access to state of the art research in academia
- Project Team is directly embedded in user organization
National
Public Health Agency
Provincial
Public Health Agency
Academic/Public
Interviews with key personnel to identify
barriers to implementing genomic epidemiology in
public health agencies
GAP 1: PUBLIC HEALTH PERSONNEL
LACK TRAINING IN GENOMICS
Solution 1a: Build a User Friendly, high quality
analysis platform to process genomics data
• Carefully designed and engineered software platform is
just the starting point…
[Architecture diagram components: User Interface; Security; File system; Metadata; Storage; Application logic; REST API; Workflow Execution Manager; Continuous Integration; Documentation]
• Easy to use interface hiding the technical details
Solution 1b: Build Portable and Transparent
Pipelines
• Use Galaxy as workflow engine – large
community support
• Retooled to address usability, security, and
other limitations
• Version Controlled Pipeline Templates
• Input files, parameters, and workflow are
sent to IRIDA-specific Galaxy for execution
• Results and provenance information are
copied from Galaxy
1. Input files sent to Galaxy
2. Tools executed on Galaxy workers
3. Results downloaded from Galaxy
[Diagram components: IRIDA UI/DB; REST API; Galaxy (Assembly Tools, Variant Calling Tools, …); Shared File System; Workers]
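The three-step flow on this slide (send inputs, execute tools, copy back results and provenance) can be sketched roughly as follows. This is a minimal illustration only: `PipelineTemplate`, `FakeGalaxy`, and `run_pipeline` are hypothetical names, and the in-memory `FakeGalaxy` class merely stands in for a workflow engine. It is not the IRIDA or Galaxy API.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass(frozen=True)
class PipelineTemplate:
    """A version-controlled pipeline description (hypothetical structure)."""
    name: str
    version: str
    steps: tuple  # ordered tool names


@dataclass
class FakeGalaxy:
    """In-memory stand-in for the workflow engine; not the Galaxy API."""
    histories: dict = field(default_factory=dict)

    def upload(self, files):
        # Step 1: input files are sent to the engine.
        history_id = f"h{len(self.histories) + 1}"
        self.histories[history_id] = {"inputs": dict(files)}
        return history_id

    def execute(self, history_id, template, params):
        # Step 2: tools would run on workers; here we only record what ran.
        history = self.histories[history_id]
        history["results"] = {
            name: f"{template.steps[-1]} output for {name}"
            for name in history["inputs"]
        }
        history["provenance"] = {
            "pipeline": template.name,
            "version": template.version,
            "steps": list(template.steps),
            "params": dict(params),
            "input_checksums": {
                name: hashlib.sha256(data).hexdigest()
                for name, data in history["inputs"].items()
            },
        }

    def download(self, history_id):
        # Step 3: results and provenance are copied back to the caller.
        history = self.histories[history_id]
        return history["results"], history["provenance"]


def run_pipeline(engine, template, files, params):
    """Run the three steps in order and return (results, provenance)."""
    history_id = engine.upload(files)
    engine.execute(history_id, template, params)
    return engine.download(history_id)


template = PipelineTemplate("assembly", "0.1", ("trim", "assemble"))
results, provenance = run_pipeline(
    FakeGalaxy(), template, {"sample1.fastq": b"ACGT"}, {"min_quality": 20}
)
```

Recording the template version, parameters, and input checksums alongside the results is one way to keep a run reproducible; the actual provenance fields IRIDA stores are not stated in the slides.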
Solution 1c: Start the training NOW!
• Canada’s National Microbiology Laboratory has hosted
genomic workshops for partners and collaborators
• IRIDA Project has dedicated funding for hosting workshops in
4Q of 2015 and 2016
• We would like to hear about other training initiatives and
share experience and training material
GAP 2: INFORMATION SHARING IS
INEFFICIENT AND AD-HOC
Many Players in surveillance and outbreak –
ineffective information sharing
[Diagram labels: Cases; Physicians; Frontline lab; Local public health dept.; Provincial laboratory; Provincial public health dept.; National laboratory. Other labels: Information; Bioinformatics and Analytical Capacities]
Source: M. Taylor, BCCDC
Many Systems used in Reporting Diseases –
require data re-entry and re-coding
[Diagram labels: Cases; Physicians; Local laboratory; Local public health dept.; Provincial laboratory; Provincial public health dept.; National laboratory; National Ministry of Health. Reporting channels: Fax/Electronic; Fax; Phone/Fax; Electronic/Paper; Electronic/Fax/Phone; Mailing of Samples/Fax/Electronic]
Source: M. Taylor, BCCDC
IRIDA is designed with these dilemmas in mind
• Solutions:
– 2a: Localized Instance of federated databases
– 2b: Permission Control – authentication/authorization for
information sharing
– 2c: User role-based display of information
Solution 2a: Local/Cloud Instances and Data
Federation
• Data processing capacity pushed to data generating
labs
• Allow data sharing securely for enhanced analysis
• Eventually cultivating a culture of openness of data
sharing and collaborative development of tools
Authorization
Solution 2b: Security
• Local authorization per instance.
• Method-level authorization.
• Object-level authorization.
• Allow secure, fine grained and
flexible information sharing
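A rough sketch of how method-level and object-level checks can be layered is shown below. All names (`User`, `Project`, `requires_role`, `can_read`, `read_project`) are hypothetical and chosen for illustration; the slides do not describe how IRIDA implements these checks.

```python
from dataclasses import dataclass, field
from functools import wraps


class AccessDenied(Exception):
    """Raised when either authorization layer rejects a request."""


@dataclass
class User:
    name: str
    roles: set


@dataclass
class Project:
    title: str
    members: set = field(default_factory=set)  # user names allowed to read


def requires_role(role):
    """Method-level check: the caller must hold the given role."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise AccessDenied(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


def can_read(user, project):
    """Object-level check: the user must be a member of this project."""
    return user.name in project.members


@requires_role("analyst")
def read_project(user, project):
    # The decorator has already checked the role; now check this object.
    if not can_read(user, project):
        raise AccessDenied(f"{user.name} cannot read {project.title!r}")
    return f"{project.title}: visible to {user.name}"


alice = User("alice", {"analyst"})
bob = User("bob", {"analyst"})
carol = User("carol", set())
outbreak = Project("Outbreak A", members={"alice"})
```

Here `alice` passes both layers, `bob` holds the role but is rejected for this particular project, and `carol` is rejected before the project is even inspected. Keeping the user and membership tables inside each instance corresponds to the "local authorization per instance" bullet.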
Solution 2c: Role-based Dynamic Display driven
by Ontology
• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an
ontology
[Figure: IFM Interface View Permissions, contrasting a Detailed View with a Restricted View]
E.g. User role permissions control visibility and editing of content
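The idea of role permissions controlling visibility and editing can be illustrated with a small sketch. The roles, field names, and record values below are invented for the example, and `VIEW_PERMISSIONS` and `render_view` are hypothetical; in the approach described on the slide such rules would come from the Interface Model Ontology rather than a hard-coded dictionary.

```python
# Hypothetical mapping from role to the fields that role may see or edit.
VIEW_PERMISSIONS = {
    "epidemiologist": {
        "visible": {"sample_id", "organism", "patient_age", "city"},
        "editable": {"patient_age", "city"},
    },
    "collaborator": {
        "visible": {"sample_id", "organism"},
        "editable": set(),
    },
}


def render_view(record, role):
    """Return only the fields the role may see, flagged as editable or not."""
    perms = VIEW_PERMISSIONS.get(role, {"visible": set(), "editable": set()})
    return {
        key: {"value": value, "editable": key in perms["editable"]}
        for key, value in record.items()
        if key in perms["visible"]
    }


# Made-up record used only to exercise the function.
record = {
    "sample_id": "S-001",
    "organism": "Salmonella enterica",
    "patient_age": 42,
    "city": "Vancouver",
}
detailed = render_view(record, "epidemiologist")   # detailed view
restricted = render_view(record, "collaborator")   # restricted view
```

An unknown role falls through to an empty view, so the default is to show nothing rather than everything.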
GAP 3: INFORMATION
REPRESENTATION IS INCONSISTENT
Solution 3a: Use Ontology
• Ontology: a way to describe types of entities
and relations between them
• Why use an ontology?
– An ontology is flexible and expandable
– Lower levels of expressivity (e.g. controlled vocabulary,
data dictionary) are heavy-handed and show low levels of
compliance and adoption
– Free text is used as an alternative, but it is not
computer friendly
– Ontology and semantic web technologies may be a
solution
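To make "types of entities and relations between them" concrete, here is a toy sketch that stores an ontology fragment as subject–relation–object triples and queries it. Every term, the `TRIPLES` set, and both functions are made up for illustration; real ontologies would typically be expressed in a format such as OWL and queried with dedicated tooling.

```python
# Each triple is (subject, relation, object); all terms are made-up examples.
TRIPLES = {
    ("poultry product", "is_a", "food product"),
    ("chicken nugget", "is_a", "poultry product"),
    ("case 17", "exposed_to", "chicken nugget"),
}


def ancestors(term, triples):
    """Follow is_a links transitively to collect every broader term."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def exposures_of_type(broad_term, triples):
    """Find cases exposed to the broad term or to anything narrower."""
    return {
        subject
        for subject, relation, obj in triples
        if relation == "exposed_to"
        and (obj == broad_term or broad_term in ancestors(obj, triples))
    }
```

Because the hierarchy is explicit, a query for the broad term "food product" also finds the case recorded against "chicken nugget", which free text or a flat vocabulary would not support without extra mapping.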
Many Domains of Knowledge are needed to describe
an outbreak investigation
Build On, Work With:
OBI
TypON
NGSOnto
NIAID-GSC-BRC core metadata
MIxS Ontology
NCBI Biosample etc
TRANS – Pathogen Transmission
EPO
Exposure Ontology
Infectious Disease Ontology
CARD, ARO for AMR
USDA Nutrient DB
EFSA Comp. Food Consump. DB
Example gaps to be filled:
Expand food ontology; expand CARD
AMR data with others.
Lab Checklist/Ontology
• Currently finishing a lab/genomics checklist and
starting an epidemiology checklist
• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process /QC
– Sequencing Run /QC
– Assembly Process / QC
– Others overlapping with Epi: Demographic / Geographic /
etc.
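A checklist organised by metadata domain lends itself to automated validation. The sketch below assumes a hypothetical checklist layout (field name mapped to domain, required flag, and allowed values); the field names and allowed values are invented and are not taken from the IRIDA checklist.

```python
# Hypothetical checklist: field name -> (domain, required, allowed values or None).
CHECKLIST = {
    "collection_date": ("Sample Collection", True, None),
    "sample_source": ("Sample Source", True, {"clinical", "food", "environmental"}),
    "sequencing_platform": ("Sequencing Process", False, None),
}


def validate(record, checklist=CHECKLIST):
    """Return a list of human-readable problems; empty means the record passes."""
    problems = []
    for name, (domain, required, allowed) in checklist.items():
        value = record.get(name)
        if value in (None, ""):
            # Missing values only matter for required fields.
            if required:
                problems.append(f"{domain}: missing required field '{name}'")
            continue
        if allowed is not None and value not in allowed:
            problems.append(f"{domain}: '{value}' is not an allowed {name}")
    return problems
```

For example, `validate({"sample_source": "soup"})` reports both the missing collection date and the unrecognised source, grouped by the domain each field belongs to.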
GAP 4: GENOMIC DATA
INTERPRETATION IS COMPLEX AND
TECHNOLOGY IS EVOLVING
Solution 4a: Use of QA/QC in IRIDA
• Software Engineering
– High quality software that meets regulatory guidelines
– Open Source product to ensure “white box” testing
– Ontology driven software development
– Follow proper software development cycle
• Data Quality
– Built-in modules to check for input data quality
– Warnings and feedback to laboratory technologists during pipeline execution
– Use of Ontology to check metadata (non-genomic) data quality
• Analytic Tool Quality
– Utilize validation datasets
– Use of abstract pipeline description – with version control
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
Solution 4b: Generation of validation datasets
To Participate, Contact
Rene Hendriksen
rshe@food.dtu.dk
Or
Errol Strain
Errol.Strain@fda.hhs.gov
http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
Solution 4c: Exploratory tools can access certain
data via REST API securely
http://pathogenomics.sfu.ca/islandviewer
IslandViewer
Dhillon and Laird et al. 2015, Nucleic Acids Research
http://kiwi.cs.dal.ca/GenGIS
Parks et al. 2013, PLoS One
Availability
• Jun 1 2015: IRIDA 1.0 beta Internal Release
– Release to collaborators for installation and full test
• Jul 1 2015: IRIDA 1.0 beta1
– Announce Beta release, download, documentation
available on website – www.irida.ca
• Aug 1 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available
Acknowledgements
Project Leaders
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olson
Tarah Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Morag Graham
Chrystal Berry
Lorelee Tschetter
Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot
Emma Griffiths
Geoff Winsor
Julie Shay
Matthew Laird
Bhav Dhillon
Raymond Lo
BC Public Health Microbiology &
Reference Laboratory (PHMRL) and BC
Centre for Disease Control (BCCDC)
Judy Isaac-Renton
Patrick Tang
Natalie Prystajecky
Jennifer Gardy
Damion Dooley
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Cletus D’Souza
Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais
Catherine Carrillo
Dominic Lambert
Dalhousie University
Rob Beiko
Alex Keddy
McMaster University
Andrew McArthur
Daim Sardar
European Nucleotide Archive
Guy Cochrane
Petra ten Hoopen
Clara Amid
European Food Safety Authority
Leibana Criado Ernesto
Vernazza Francesco
Rizzi Valentina
IRIDA Annual General Meeting
Winnipeg, April 8-9, 2015
Contacts:
William.hsiao@bccdc.ca
@wlhsiao
www.IRIDA.ca

Botany 4th semester series (krishna).pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 

IRIDA: Canada’s federated platform for genomic epidemiology

  • 1. IRIDA: Canada’s federated platform for genomic epidemiology William Hsiao, Ph.D. William.hsiao@bccdc.ca @wlhsiao BC Public Health Microbiology and Reference Laboratory and University of British Columbia ABPHM 2015
  • 2. Genome Canada Bioinformatics Competition: Large-Scale Project, “A Federated Bioinformatics Platform for Public Health Microbial Genomics”. Our goal: the IRIDA platform (Integrated Rapid Infectious Disease Analysis), an open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations. www.IRIDA.ca
  • 3. Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness.
  • 4. Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies. The project team has direct access to state-of-the-art research in academia and is directly embedded in user organizations. [Diagram labels: National Public Health Agency; Provincial Public Health Agency; Academic/Public]
  • 5. Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies
  • 6. GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
  • 7. Solution 1a: Build a user-friendly, high-quality analysis platform to process genomics data. A carefully designed and engineered software platform is just the starting point… [Diagram labels: User Interface; Security; File system; Metadata Storage; Application logic; REST API; Workflow Execution Manager; Continuous Integration; Documentation]
  • 8. Solution 1a: Build a user-friendly, high-quality analysis platform to process genomics data. An easy-to-use interface hides the technical details.
  • 9. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
  • 10. Solution 1b: Build Portable and Transparent Pipelines
      – Use Galaxy as workflow engine: large community support
      – Retools to address usability, security, and other limitations
      – Version Controlled Pipeline Templates
      – Input files, parameters, and workflow are sent to IRIDA-specific Galaxy for execution
      – Results and provenance information are copied from Galaxy
      – [Diagram: IRIDA UI/DB and Galaxy (Assembly Tools, Variant Calling Tools, …) linked by a REST API and a Shared File System, with Galaxy workers. 1. Input files sent to Galaxy; 2. Tools executed on Galaxy workers; 3. Results downloaded from Galaxy]
  • 11. Solution 1c: Start the training NOW! • Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators • IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016 • We would like to hear about other training initiatives and share experience and training material
  • 12. GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC
  • 13. Many players in surveillance and outbreak investigation: ineffective information sharing. [Diagram labels: Provincial public health dept.; National laboratory; Local public health dept.; Provincial laboratory; Cases; Physicians; Frontline lab. Axes: Information; Bioinformatics and Analytical Capacities] Source: M. Taylor, BCCDC
  • 14. Many systems used in reporting diseases require data re-entry and re-coding. [Diagram labels: National Ministry of Health; Provincial public health dept.; National laboratory; Local public health dept.; Provincial laboratory; Cases; Physicians; Local laboratory. Channels: Fax/Electronic Fax; Phone/Fax; Electronic/Paper; Electronic/Fax/Phone; Mailing of Samples/Fax/Electronic] Source: M. Taylor, BCCDC
  • 15. IRIDA is designed with these dilemmas in mind. Solutions:
      – 2a: Localized instances of federated databases
      – 2b: Permission control: authentication/authorization for information sharing
      – 2c: User role-based display of information
  • 16. Solution 2a: Local/Cloud Instances and Data Federation
      – Data processing capacity pushed to data-generating labs
      – Allow data to be shared securely for enhanced analysis
      – Eventually cultivate a culture of openness in data sharing and collaborative development of tools
  • 17. Solution 2b: Security (Authorization)
      – Local authorization per instance
      – Method-level authorization
      – Object-level authorization
      – Allow secure, fine-grained and flexible information sharing
  • 18. Solution 2c: Role-based Dynamic Display driven by Ontology • Ontologies often lack a content management system (CMS) • An Interface Model Ontology (IFM) can define a CMS for an ontology
  • 19. IFM Interface View Permissions. [Diagram labels: Detailed View; Restricted View] E.g. user role permissions control visibility and editing of content
  • 21. Solution 3a: Use Ontology
      – Ontology: a way to describe types of entities and relations between them
      – Why use ontology:
          · Ontology is flexible and expandable
          · Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy handed and show low levels of compliance and adoption
          · Free text is used as an alternative but is not computing friendly
          · Ontology and semantic web technologies may be a solution
  • 22. Many domains of knowledge are needed to describe an outbreak investigation. Build on, work with: OBI; TypON; NGSOnto; NIAID-GSC-BRC core metadata; MIxS Ontology; NCBI Biosample, etc.; TRANS – Pathogen Transmission; EPO Exposure Ontology; Infectious Disease Ontology; CARD, ARO for AMR; USDA Nutrient DB; EFSA Comp. Food Consump. DB. Example gaps to be filled: expand food ontology; expand CARD AMR data with others.
  • 23. Lab Checklist/Ontology
      – Currently finishing a lab/genomics checklist and starting an epidemiology checklist
      – Metadata domains: Sample Collection; Sample Source; Environmental; Lab Analytics; Sequencing Process/QC; Sequencing Run/QC; Assembly Process/QC; others overlapping with Epi (Demographic, Geographic, etc.)
  • 24. GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
  • 25. Solution 4a: Use of QA/QC in IRIDA
      – Software Engineering
          · High-quality software that meets regulatory guidelines
          · Open source product to ensure “white box” testing
          · Ontology-driven software development
          · Follow a proper software development cycle
      – Data Quality
          · Built-in modules to check input data quality
          · Warnings and feedback to laboratory technologists during pipeline execution
          · Use of ontology to check metadata (non-genomic data) quality
      – Analytic Tool Quality
          · Utilize validation datasets
          · Use of abstract pipeline descriptions, with version control
          · Periodic analysis of exceptions and boundary cases to assess tool accuracy
  • 26. Solution 4b: Generation of validation datasets. To participate, contact Rene Hendriksen (rshe@food.dtu.dk) or Errol Strain (Errol.Strain@fda.hhs.gov). http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
  • 27. Solution 4c: Exploratory tools can access certain data via REST API securely. IslandViewer: http://pathogenomics.sfu.ca/islandviewer (Dhillon and Laird et al. 2015, Nucleic Acids Research). GenGIS: http://kiwi.cs.dal.ca/GenGIS (Parks et al. 2013, PLoS One)
  • 28. Availability
      – Jun 1 2015: IRIDA 1.0 beta internal release. Release to collaborators for installation and full test
      – Jul 1 2015: IRIDA 1.0 beta1. Announce beta release; download and documentation available on website (www.irida.ca)
      – Aug 1 2015: IRIDA 1.0 beta2. Cloud installer, with documentation; additional pipelines as available; visualization as available
  • 29. Acknowledgements. Project Leaders: Fiona Brinkman (SFU), Will Hsiao (PHMRL), Gary Van Domselaar (NML). University of Lisbon: João Carriço. National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer. Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall. Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo. BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella. University of Maryland: Lynn Schriml. Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert. Dalhousie University: Rob Beiko, Alex Keddy. McMaster University: Andrew McArthur, Daim Sardar. European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid. European Food Safety Agency: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina.
  • 30. IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015
  • 31. The IRIDA platform (Integrated Rapid Infectious Disease Analysis): an open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations. Contacts: William.hsiao@bccdc.ca, @wlhsiao, www.IRIDA.ca
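
The three-step round trip on slide 10 (input files sent to Galaxy, tools executed on Galaxy workers, results and provenance copied back) can be sketched roughly as follows. This is a minimal illustration only, not IRIDA's or Galaxy's actual API: `StubGalaxy`, `PipelineTemplate`, `AnalysisResult`, `run_pipeline` and the toy "tools" are hypothetical names invented for this sketch, and the tools are plain Python functions standing in for real assembly or variant calling software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PipelineTemplate:
    """Hypothetical version-controlled pipeline description."""
    name: str
    version: str
    tools: Tuple[str, ...]  # tool names, in execution order


@dataclass
class AnalysisResult:
    """What the platform copies back: outputs plus provenance."""
    outputs: Dict[str, str]
    provenance: List[dict]


class StubGalaxy:
    """In-memory stand-in for a workflow engine (assumption, not Galaxy)."""

    def __init__(self, tools: Dict[str, Callable[[str, dict], str]]):
        self._tools = tools
        self._datasets: Dict[str, str] = {}
        self._provenance: List[dict] = []

    def upload(self, files: Dict[str, str]) -> None:
        # Step 1: input files are sent to the engine.
        self._datasets = dict(files)

    def execute(self, template: PipelineTemplate, params: dict) -> None:
        # Step 2: each tool in the template runs over every dataset.
        for tool_name in template.tools:
            tool = self._tools[tool_name]
            self._datasets = {
                name: tool(data, params)
                for name, data in self._datasets.items()
            }
            # Record which tool, pipeline version and parameters were used.
            self._provenance.append({
                "tool": tool_name,
                "pipeline": template.name,
                "version": template.version,
                "params": dict(params),
            })

    def download(self) -> AnalysisResult:
        # Step 3: results and provenance are copied back to the caller.
        return AnalysisResult(dict(self._datasets), list(self._provenance))


def run_pipeline(engine: StubGalaxy, template: PipelineTemplate,
                 input_files: Dict[str, str], params: dict) -> AnalysisResult:
    """Send inputs, execute the workflow, and fetch results with provenance."""
    engine.upload(input_files)
    engine.execute(template, params)
    return engine.download()


# Toy tools: placeholders for real assembly / variant calling software.
toy_tools = {
    "assembly": lambda data, params: data.upper(),
    "variant_calling": lambda data, params: f"{data}|min_cov={params['min_coverage']}",
}
template = PipelineTemplate("snv", "0.1", ("assembly", "variant_calling"))
result = run_pipeline(StubGalaxy(toy_tools), template,
                      {"s1.fastq": "acgt"}, {"min_coverage": 10})
```

Keeping the template immutable and recording its version with every tool run is one way to make a result traceable to the exact pipeline that produced it, which is the point of the "version controlled pipeline templates" and provenance bullets.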

Editor's Notes

  1. Today, I’d like to tell you a bit about some of Canada’s efforts to build a genomic epidemiology analysis platform.
  2. IRIDA was conceived about two years ago through a Genome Canada Bioinformatics Grant. It is an effort to build an open source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time disease outbreak investigations, initially focused on food-borne illnesses.
  3. Despite our high standards in food safety, each year one in eight Canadians gets food poisoning, costing the economy $4 billion. It is important to track the source and spread of the disease to prevent further sickness.
  4. IRIDA is a partnership among provincial public health agencies, national public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and real-life, real-time use cases in public health agencies. The project team has direct access to state-of-the-art research in academia and is directly embedded in user organizations.
  5. Since we have access to the end users, we conducted interviews with these subject experts to identify the barriers to the uptake of genomic epidemiology in public health agencies. We interviewed epidemiologists, lab scientists and technologists, medical microbiologists and lab administrators. For the rest of the presentation, I’ll talk about some of the gaps we identified and how IRIDA can meet the requirements.
  6. The first gap, which should not be a surprise to this audience, is that public health workers are mostly unfamiliar with genomics and the bioinformatics analysis needed to process and interpret genomic data.
  7. We do believe that in the long run adequate training in genomics is needed to bridge this gap, and that in the short term experts such as yourselves are needed as stopgaps. Still, having a high-quality analysis platform that automates data processing and applies consistent analysis protocols will help to ease the transition. However, a carefully designed and engineered software platform is just the starting point, and there will no doubt be many similar platforms to choose from. So I will touch on some of the more interesting design philosophies we have for IRIDA.
  8. We found that in the diagnostic testing world, complex procedures with lots of options lead to more human errors and more non-compliance. So one design solution that we stress is a simple user interface that hides the technical details. This solution of course can’t stand on its own, and I’ll describe measures to ensure that flexibility and scientific rigor can be maintained.
  9. We think a user interface should be like a joke… If you have to explain it, then it’s not good. That said, we do have extensive documentation for the administrators and accreditation auditors who don’t like jokes :P
  10. The next solution is to leverage Galaxy, which has large community support and a large user base, as our pipeline engine. We had to retool Galaxy extensively to address usability, security and other limitations. To achieve this we built the IRIDA platform on top of the Galaxy engine, where input files, parameters and workflows are sent to Galaxy for execution, and results and pipeline provenance information are copied back into the IRIDA database for
  11. To address the knowledge gaps in genomics, we have started training our public health lab workers on genomic analysis. We would like to hear about other training initiatives and will be happy to share our experience and training material
  12. The second gap that we identified is that sharing of information within and between organizations is highly inefficient and often involves sharing Excel files with deleted columns to hide sensitive information.
  13. There are many players involved in infectious disease surveillance and outbreak investigation. However, concerns with privacy and confidentiality (both founded and unfounded) mean that information tends to be aggregated and lost as we move from the frontline labs to public health and reference labs. Yet the bioinformatics and analytical capacities are most abundant in central labs and academia.
  14. Moreover, different institutions have different software, and often data is exported, printed, faxed, then re-imported into a new system by re-typing! This is a huge waste of time and a source of errors.
  15. IRIDA has a few designs to deal with these issues, and I’ll highlight 3 here.
  16. First we propose that we should push the data processing capacity to the periphery where data is the richest by encouraging local or private cloud instances of the IRIDA platform. This way our partners would not be obligated to give up their data. The different instances are connected via a federated database schema. Data can then be shared securely and easily to allow enhanced analysis to be done by genomic experts located centrally. The more we share successfully, the more likely people will realize the benefit in sharing and this can lead to a new culture of openness
  17. Second we have built-in mechanisms for authentication and authorization at different levels to allow secure and fine grained information sharing. This would allow parties to customize the data they share per material and data transfer agreements
  18. Third, we realized we need to have a flexible user interface to present the data. Therefore, we are in the process of developing an interface model ontology which defines a content management system.
  19. As an example, based on the user’s role, they will be able to see the content of the database displayed differently.
  20. The third gap we identified is that information representation is inconsistent across organizations
  21. Given the richness and complexity of genomic epidemiological data, we opt to use and develop ontologies compliant with OBO Foundry to describe the data. Currently, lower levels of expressivity such as controlled vocabularies and data dictionaries are used, but they tend to be heavy handed and show low levels of compliance and adoption. We believe ontology and semantic web technologies can make data sharing across heterogeneous systems and platforms more tractable.
  22. There are many domains of knowledge needed to describe an outbreak investigation and we strive to re-use existing standards as much as possible
  23. Currently we are finishing a lab/genomic checklist and will be starting an epidemiology checklist soon
  24. Lastly, as Jon and others mentioned yesterday, genomic data interpretation is complex and the technology is still evolving, yet in the world of diagnostic labs, accreditation means standardized protocols need to be developed.
  25. So we focus quite a bit of our energy on developing high-quality software with built-in QA and QC components to assess data quality and analytic tool performance.
  26. I also want to highlight GMI’s WG4’s effort in developing proficiency tests for wet lab and analysis pipelines. To participate you can contact Rene or Errol.
  27. To facilitate tool improvement and to allow exploratory analysis not part of IRIDA pipelines to be done, we would also allow pre-authorized tools to connect to IRIDA via a REST API securely. Currently we have two external tools for genomic island detection and phylogeography analysis.
  28. The software will be released to a few international collaborators for full testing by June 1. Then on July 1, we plan to release the beta version publicly so people can try it out. Of course the software will be free, and we would love to collaborate with people on both the software and the ontology development.
  29. Large Group of People who contributed to this work
  30. We also have a wonderful group of advisors
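
Notes 17 and 19 describe object-level authorization combined with a role-dependent display of the same record. A minimal sketch of how those two checks might compose is given below. The role names, field names, and the `Sample`, `User`, `can_read` and `view` helpers are all hypothetical and invented for illustration; they are not taken from IRIDA or its interface model ontology.

```python
from dataclasses import dataclass, asdict
from typing import Dict, FrozenSet

# Hypothetical mapping from user role to the fields that role may see.
VISIBLE_FIELDS: Dict[str, FrozenSet[str]] = {
    "epidemiologist": frozenset(
        {"sample_id", "project", "collection_date", "region", "patient_age"}
    ),
    "external_collaborator": frozenset({"sample_id", "project", "region"}),
}


@dataclass(frozen=True)
class Sample:
    sample_id: str
    project: str
    collection_date: str
    region: str
    patient_age: int


@dataclass(frozen=True)
class User:
    name: str
    role: str
    projects: FrozenSet[str]  # projects this user has been granted


def can_read(user: User, sample: Sample) -> bool:
    """Object-level check: access is decided per sample, via its project."""
    return sample.project in user.projects


def view(user: User, sample: Sample) -> dict:
    """Return only the fields the user's role may see (restricted vs detailed)."""
    if not can_read(user, sample):
        raise PermissionError(f"{user.name} may not read {sample.sample_id}")
    allowed = VISIBLE_FIELDS.get(user.role, frozenset())  # unknown role: nothing
    return {k: v for k, v in asdict(sample).items() if k in allowed}


sample = Sample("S-001", "outbreak-A", "2015-03-02", "BC", 42)
epi = User("epi", "epidemiologist", frozenset({"outbreak-A"}))
partner = User("partner", "external_collaborator", frozenset({"outbreak-A"}))
outsider = User("outsider", "epidemiologist", frozenset({"outbreak-B"}))

detailed = view(epi, sample)        # all five fields
restricted = view(partner, sample)  # sample_id, project, region only
```

Separating the access decision (`can_read`) from the display decision (`view`) mirrors the split in the slides: authorization determines whether a record can be shared at all, while the role determines how much of it is shown.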