data - science - life
Considerations & Challenges in Building an End-to-End Microbiome Workflow
Presenters

Dr Craig McAnulla
Senior Consultant Bioinformatics at Eagle Genomics
Craig earned his PhD in Microbiology at the University of Warwick, then completed his postdoctoral research at the John Innes Centre before joining EBI and developing their bioinformatics capability. Dr McAnulla has extensive experience in developing large-scale pipelines for analyzing microbiome datasets. His skills cover both software and biology, and he often provides deep insights to industry experts on these topics.

Dr Raminderpal Singh
Vice President Business Development at Eagle Genomics
Raminderpal earned his PhD in the semiconductor industry, where he spent a decade building advanced technologies for semiconductor modelling and data mining. In 2012, Dr Singh took the helm as the Business Development Executive for IBM Research’s genomics program, where he developed and led the execution of the go-to-market and business plan for the IBM Watson Genomics program. In addition to many published papers, Dr Singh has two books and twelve patents.

Please tweet your questions to @eaglegen or enter them into GoTo Webinar.
Agenda
1. Considerations
2. Challenges
3. Industrialized Microbiome Workflow & Collaboration Environment
4. Deployment of a Collaboration System
1. Considerations
Considerations for successful microbiome data management and analysis:
1. Intrinsic factors
2. The relatively early maturity of the field
3. Calculation of business value
1. Considerations - Intrinsic
• Microbiomes are complex and diverse biological systems
• Many components: organisms, genes, pathways, metabolites, etc.
• More DNA/genes than the human genome
• Taxonomic diversity
• Many sources of error
• Appropriate study design and metadata collection are crucial
1. Considerations - Maturity
• Changing sequencing technologies
• Move from amplicon-based to shotgun
• Rapidly increasing data volumes
• Limited reference data
• Fast-moving software development
• Future data re-analysis with improved methods must be considered
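
One way to keep future re-analysis tractable is to store, next to each result, a small record of what produced it. The sketch below is illustrative only: the `run_record` and `needs_reanalysis` functions, the record fields, and the example tool name and versions are hypothetical and do not come from the presentation.

```python
import hashlib
import json


def run_record(input_bytes: bytes, tool: str, version: str, params: dict) -> dict:
    """Build a provenance record for one analysis run (hypothetical schema)."""
    return {
        # The checksum identifies the exact input, so a later re-run can
        # confirm that it starts from unchanged data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "tool": tool,
        "version": version,
        "params": params,
    }


def needs_reanalysis(record: dict, current_version: str) -> bool:
    """Flag results produced by a tool version other than the current one."""
    return record["version"] != current_version


# Made-up input and tool name, for illustration only.
record = run_record(b"ACGT\n", tool="example-classifier", version="1.0",
                    params={"threads": 4})
print(json.dumps(record, indent=2))
print(needs_reanalysis(record, current_version="2.0"))
```

Comparing the stored version with the currently deployed one is the simplest possible trigger; a real system would probably also compare reference database releases and parameters.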
1. Considerations - Business value
• Commercial exploitation of the microbiome is in its infancy and still at an exploratory phase.
• Many research findings remain controversial, and some early claims about the gut microbiome have not been replicated.
• The true potential of microbiome data is still to be tapped, and novel experimental designs are being developed to maximize the value generated.
2. Challenges
The analysis at scale of microbiomics data faces several challenges:
• Metadata management
• Compute resources necessary for storage and analysis
• Statistical techniques
Metadata Management
• Crucial to analysis
• Often done poorly
• Needs to encompass all relevant information
• Check for batch effects
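
The metadata points above can be made concrete with two small checks: one for incomplete records, and one for the simplest kind of batch effect, where each sequencing batch contains only one study group. This is a minimal sketch under stated assumptions; the field names (`sample_id`, `group`, `batch`), the function names, and the sample values are all hypothetical.

```python
from collections import Counter

# Hypothetical minimum schema; a real study would need many more fields.
REQUIRED_FIELDS = ("sample_id", "group", "batch")


def missing_fields(samples):
    """Return {sample index: [missing or empty field names]}."""
    problems = {}
    for i, sample in enumerate(samples):
        missing = [f for f in REQUIRED_FIELDS if not sample.get(f)]
        if missing:
            problems[i] = missing
    return problems


def batch_confounded(samples):
    """True if there are several groups but every batch holds only one.

    In that situation a difference between groups cannot be told apart
    from a difference between batches.
    """
    groups_per_batch = {}
    for s in samples:
        groups_per_batch.setdefault(s["batch"], set()).add(s["group"])
    all_groups = {s["group"] for s in samples}
    return len(all_groups) > 1 and all(
        len(groups) == 1 for groups in groups_per_batch.values()
    )


# Made-up records for illustration.
samples = [
    {"sample_id": "S1", "group": "case", "batch": "run1"},
    {"sample_id": "S2", "group": "case", "batch": "run1"},
    {"sample_id": "S3", "group": "control", "batch": "run2"},
    {"sample_id": "S4", "group": "control", "batch": ""},
]

problems = missing_fields(samples)
complete = [s for i, s in enumerate(samples) if i not in problems]

print(problems)
print(Counter((s["batch"], s["group"]) for s in complete))
print(batch_confounded(complete))
```

The cross-tabulation printed by `Counter` is often enough on its own: a table with one non-zero cell per batch is a warning sign before any statistics are run.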
2. Challenges
Data Storage and Compute Resources
• Shotgun metagenomics example study: 1.7 billion paired-end Illumina reads, 266.5 Gb
• Bigger data volume than the human genome at 30x coverage
• Study size is increasing
How to store this?
• Analysis selection is critical
• Analyses MUST work with these data volumes, e.g. Kraken, HUMAnN2
• Compute resources, e.g. Kraken: multicore machine with > 100 GB of memory
• Solution: the cloud (AWS, Azure)
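
The comparison with a 30x human genome can be checked with a few lines of arithmetic. This sketch makes two assumptions that are not stated on the slide: that "266.5 Gb" means gigabases, and that the human genome is roughly 3.1 billion bases long.

```python
# Figures quoted on the slide.
READS = 1.7e9           # paired-end Illumina reads
STUDY_BASES = 266.5e9   # "266.5 Gb", read here as gigabases (assumption)

# Assumptions not stated on the slide.
HUMAN_GENOME_BASES = 3.1e9  # approximate haploid human genome size
COVERAGE = 30

human_30x_bases = HUMAN_GENOME_BASES * COVERAGE
ratio = STUDY_BASES / human_30x_bases
mean_read_length = STUDY_BASES / READS

print(f"Human genome at 30x: {human_30x_bases / 1e9:.0f} Gb")
print(f"Study / human 30x:   {ratio:.1f}x")
print(f"Mean bases per read: {mean_read_length:.0f}")
```

Under these assumptions the example study is a little under three times the size of a single 30x human genome, which is consistent with the claim on the slide.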
2. Challenges
Statistical Techniques
• Data normalization: rarefaction, normalization, log transformation?
• Statistical techniques: parametric, nonparametric?
• Confusion: no correct answer
• Depends on the data: function vs taxonomy, shotgun vs amplicon
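
To show how the normalization options listed above differ, here is a minimal NumPy sketch applying each one to the same made-up vector of taxon counts. The function names, the rarefaction depth, and the pseudocount are illustrative choices, not recommendations from the presentation.

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Subsample one sample's counts to a fixed depth without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return rng.multivariate_hypergeometric(counts, depth)


def relative_abundance(counts):
    """Scale counts so that the sample sums to 1 (total-sum scaling)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def log_transform(counts, pseudocount=1.0):
    """log10 transform with a pseudocount so that zero counts stay defined."""
    return np.log10(np.asarray(counts, dtype=float) + pseudocount)


rng = np.random.default_rng(0)
sample = [120, 30, 0, 850]  # made-up taxon counts for one sample

print(rarefy(sample, 500, rng))
print(relative_abundance(sample))
print(log_transform(sample))
```

Rarefaction discards reads and is random, total-sum scaling keeps every read but yields compositional data, and the log transform depends on the pseudocount; which trade-off is acceptable depends on the data, as the slide notes.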
3. Industrialized Microbiome Workflow & Collaboration Environment
• Eagle Genomics leads the industry in configuring and deploying best-in-class microbiome workflow solutions.
• These run large-scale parallelized pipelines cost-efficiently, with seamless connectivity between the source data and the scientists.
www.eaglegenomics.com
“Unilever’s digital data program now processes genetic sequences twenty times faster - without incurring higher compute costs. In addition, its robust architecture supports ten times as many scientists, all working simultaneously.”
Pete Keeley, eScience Technical Lead - R&D IT at Unilever
3. Industrialized Microbiome Workflow & Collaboration Environment
[Figure: Data collaboration architecture, "Solving the metadata problem". Labelled components: microbiome data files (amplicon or shotgun); metadata: eaglecatalog (implements security model); query/filtering: eaglecatalog; data sharing; CROs deposit data; ELN experiment metadata; integration with omics (RNASeq, proteomics); federated data & results integration; taxonomy/function analysis reports (R, QIIME); population cohorts visualization; cloud storage; eaglecatalog; collaborative environment for partners; feedback to experimental design; cost of experiment.]
4. Deployment of a Collaboration System
Eagle Genomics Proposal
Accelerating metadata adoption and scale-up
• Offering a new rapid deployment for microbiome R&D organizations looking to cost-efficiently build world-class integrated workflows
• Setup of an initial collaboration system for your team, in 1 to 4 weeks
• Enabling you to self-load your data into the catalog, leading to a standardized and streamlined workflow that is ready for future scale-out
• Managing internal data, partner data, or public data
BENEFITS

Current situation: Data in silos, not reused
Desired state: Easily searchable, discoverable, accessible and intuitive portal for researchers to all data and results
Negative consequences of non-action: Significant researcher time spent in “data wrangling”. Breakage pending, based on projected bottleneck
Positive business impact, using Eagle software: Improved time and ability to insight through radically improved productivity of researchers

Current situation: No correlation between experiments
Desired state: Quick and easy exploratory analyses across experiments and data
Negative consequences of non-action: Missed opportunities for insights. Failure to leverage data as an asset
Positive business impact, using Eagle software: Faster identification and generation of successful product innovation and insight, leading to reduced time to market

Current situation: No unified view of data
Desired state: Data and analysis results stored in a unified catalog, with standardized & repeatable access using best-in-class technology
Negative consequences of non-action: Data (internal, external) and the associated learning not available or exploited as necessary
Positive business impact, using Eagle software: Competitive advantage through cutting-edge research, driving innovation in the product pipeline

Current situation: Lack of governance and data provenance
Desired state: Secure authenticated access with relevant data provenance of data sets; simple data sharing with vendors/collaborators
Negative consequences of non-action: Costly, inefficient and limited data sharing with vendors; risk of inappropriate data sharing
Positive business impact, using Eagle software: Efficient, compliant, documented, evidenced, and enforced data sharing process; effective (time & cost) collaboration between vendors and internal teams

Current situation: Data is in effect archived
Desired state: Catalog integrated in the scientific workflow, used continuously and keeping pace with innovation in the industry
Negative consequences of non-action: Missed insights; siloed research activities, with high barriers to collaboration
Positive business impact, using Eagle software: Systematic data sharing and re-use, driving faster unique cross-functional insights and reducing waste in spend
In Summary
Considerations & Challenges in Building an End-to-End Microbiome Workflow
For more info, contact:
Raminderpal Singh
raminderpal.singh@eaglegenomics.com
data - science - life

Mais conteúdo relacionado

Mais procurados

Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance
 
BigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALBigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALJohn Koch
 
X team 2 - presentation
X team 2 - presentationX team 2 - presentation
X team 2 - presentationRayna Harris
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the futurePistoia Alliance
 
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew LittHealthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew LittD3 Consutling
 
Practical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial IntelligencePractical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial IntelligenceAl Dossetter
 
Pharma data analytics
Pharma data analyticsPharma data analytics
Pharma data analyticsAxon Lawyers
 
Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015Dana Caulder
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...Christopher Hart
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical DataPaul Agapow
 
EY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returnsEY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returnsThomas Wilckens
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutJosef Scheiber
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
Pattern diagnostics 2015
Pattern diagnostics 2015Pattern diagnostics 2015
Pattern diagnostics 2015Thomas Wilckens
 
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank NothaftThe Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank NothaftDatabricks
 
Using the CDD Vault for MM4TB
Using the CDD Vault for MM4TBUsing the CDD Vault for MM4TB
Using the CDD Vault for MM4TBSean Ekins
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategyAnton Yuryev
 
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use CasesFrom Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use CasesNeo4j
 
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...Covance
 

Mais procurados (20)

Pistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier DatathonPistoia Alliance-Elsevier Datathon
Pistoia Alliance-Elsevier Datathon
 
BigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALBigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINAL
 
X team 2 - presentation
X team 2 - presentationX team 2 - presentation
X team 2 - presentation
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew LittHealthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
Healthcare Conference 2013 : Genes, Clouds and Cancer - dr. Andrew Litt
 
Fair by design
Fair by designFair by design
Fair by design
 
Practical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial IntelligencePractical Drug Discovery using Explainable Artificial Intelligence
Practical Drug Discovery using Explainable Artificial Intelligence
 
Pharma data analytics
Pharma data analyticsPharma data analytics
Pharma data analytics
 
Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015Caulder - DIVOS BioITWorld 2015
Caulder - DIVOS BioITWorld 2015
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
EY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returnsEY Drug R&D: Big DATA for big returns
EY Drug R&D: Big DATA for big returns
 
Conference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in NoordwijkerhoutConference presentation from #iccs2014 in Noordwijkerhout
Conference presentation from #iccs2014 in Noordwijkerhout
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 
Pattern diagnostics 2015
Pattern diagnostics 2015Pattern diagnostics 2015
Pattern diagnostics 2015
 
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank NothaftThe Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
The Future of Healthcare with Big Data and AI with Ion Stoica and Frank Nothaft
 
Using the CDD Vault for MM4TB
Using the CDD Vault for MM4TBUsing the CDD Vault for MM4TB
Using the CDD Vault for MM4TB
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategy
 
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use CasesFrom Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
 
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
Accelerate Your Scientific Discovery with GlobalCODE® - A Unique Data Managem...
 

Semelhante a Considerations and challenges in building an end to-end microbiome workflow

Expert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with UnileverExpert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with UnileverEagle Genomics
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forumChris Dwan
 
FocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences researchFocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences researchFOCALCXM
 
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraIlkay Altintas, Ph.D.
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theoryC. Tobin Magle
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingDenodo
 
A Survey on Big Data Analytics
A Survey on Big Data AnalyticsA Survey on Big Data Analytics
A Survey on Big Data AnalyticsBHARATH KUMAR
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)Matt Barnes
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseVaticle
 
How Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data ManagementHow Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data ManagementAgaram Technologies
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00Pistoia Alliance
 
Healthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealth Catalyst
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical ScienceAri Berman
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...Bonnie Hurwitz
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Amazon Web Services
 
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...PerkinElmer Informatics
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 

Semelhante a Considerations and challenges in building an end to-end microbiome workflow (20)

Expert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with UnileverExpert panel on industrialising microbiomics - with Unilever
Expert panel on industrialising microbiomics - with Unilever
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
FocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences researchFocalCxm presentation on improving productivity in life sciences research
FocalCxm presentation on improving productivity in life sciences research
 
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data EraWorkflow-Driven Geoinformatics Applications and Training in the Big Data Era
Workflow-Driven Geoinformatics Applications and Training in the Big Data Era
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
 
A Survey on Big Data Analytics
A Survey on Big Data AnalyticsA Survey on Big Data Analytics
A Survey on Big Data Analytics
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
Automating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge BaseAutomating Data Science over a Human Genomics Knowledge Base
Automating Data Science over a Human Genomics Knowledge Base
 
TOUG Big Data Challenge and Impact
TOUG Big Data Challenge and ImpactTOUG Big Data Challenge and Impact
TOUG Big Data Challenge and Impact
 
How Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data ManagementHow Logilab ELN helps Organizations in Research Data Management
How Logilab ELN helps Organizations in Research Data Management
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00
 
Healthcare Analytics Adoption Model
Healthcare Analytics Adoption ModelHealthcare Analytics Adoption Model
Healthcare Analytics Adoption Model
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Big data
Big dataBig data
Big data
 
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
(Em)Powering Science: High-Performance Infrastructure in Biomedical Science
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
 
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
Democratizing Data Science: Balancing Flexibility and Usability for Scientifi...
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 

Último

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 

Último (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
Considerations and challenges in building an end-to-end microbiome workflow

  • 1. data - science - life Considerations & Challenges in Building an End-to-End Microbiome Workflow
  • 2. Presenters
      • Dr Craig McAnulla, Senior Consultant Bioinformatics at Eagle Genomics. Craig earned his PhD in Microbiology at the University of Warwick, then completed his postdoctoral research at the John Innes Centre before joining EBI and developing their bioinformatics capability. Dr McAnulla has extensive experience in developing large-scale pipelines for analyzing microbiome datasets. His skills cover both software and biology, and he often provides deep insights to industry experts on these topics.
      • Dr Raminderpal Singh, Vice President Business Development at Eagle Genomics. Raminderpal earned his PhD in the semiconductor industry, where he spent a decade building advanced technologies for semiconductor modelling and data mining. In 2012, Dr Singh took the helm as the Business Development Executive for IBM Research’s genomics program, where he developed and led the execution of the go-to-market and business plan for the IBM Watson Genomics program. In addition to many published papers, Dr Singh has two books and twelve patents.
      • Please Tweet your questions @eaglegen or enter them into GoTo Webinar
  • 3. Agenda
      1. Considerations
      2. Challenges
      3. Industrialized Microbiome Workflow & Collaboration Environment
      4. Deployment of a Collaboration System
  • 4. 1. Considerations
      Considerations for successful microbiome data management and analysis:
      1. Intrinsic factors
      2. The relatively early maturity of the field
      3. Calculation of business value
  • 5. 1. Considerations - Intrinsic
      • Microbiomes are complex and diverse biological systems
      • Many components: organisms, genes, pathways, metabolites, etc.
      • More DNA/genes than the human genome
      • Taxonomic diversity
      • Many sources of error
      • Appropriate study design and metadata collection are crucial.
  • 6. 1. Considerations - Maturity
      • Changing sequencing technologies
      • Move from amplicon-based to shotgun sequencing
      • Rapidly increasing data volumes
      • Limited reference data
      • Fast-moving software development
      • Future data re-analysis with improved methods must be considered.
  • 7. 1. Considerations - Business value
      • Commercial exploitation of the microbiome is in its infancy and still at an exploratory phase.
      • Many research findings remain controversial, and some early claims about the gut microbiome have not been replicated.
      • The true potential of microbiome data is still to be tapped, and novel experimental designs are being developed to generate maximum value from it.
  • 8. 2. Challenges
      The analysis-at-scale of microbiomics data faces several challenges:
      • Metadata management
      • Compute resources necessary for storage and analysis
      • Statistical techniques
      Metadata Management
      • Crucial to analysis
      • Often done poorly
      • Needs to encompass all relevant information
      • Check for batch effects
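The two metadata points on this slide (completeness and batch effects) can be illustrated with a minimal sketch. It is not the presenters' method: the field names `sample_id`, `condition` and `sequencing_batch` are hypothetical, and a batch is flagged only in the simplest case, where every sample in it shares one condition, so batch and condition cannot be separated.

```python
from collections import defaultdict

# Hypothetical minimum set of metadata fields; a real study would need more.
REQUIRED_FIELDS = ("sample_id", "condition", "sequencing_batch")


def missing_fields(record):
    """Return the required fields that are absent or empty in one sample record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def confounded_batches(records):
    """Return the batches in which every sample has the same condition."""
    conditions_by_batch = defaultdict(set)
    for record in records:
        conditions_by_batch[record["sequencing_batch"]].add(record["condition"])
    return sorted(
        batch for batch, conditions in conditions_by_batch.items()
        if len(conditions) == 1
    )


# Illustrative records only; these are not data from the webinar.
samples = [
    {"sample_id": "S1", "condition": "control", "sequencing_batch": "run_A"},
    {"sample_id": "S2", "condition": "treated", "sequencing_batch": "run_A"},
    {"sample_id": "S3", "condition": "treated", "sequencing_batch": "run_B"},
    {"sample_id": "S4", "condition": "treated", "sequencing_batch": "run_B"},
]

if __name__ == "__main__":
    print(confounded_batches(samples))
    print(missing_fields({"sample_id": "S5", "condition": ""}))
```

A check like this only catches complete confounding; partial imbalance between batches and conditions would need a statistical test, which is outside this sketch.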
  • 9. 2. Challenges
      Data Storage and Compute Resources
      • Shotgun metagenomics example study: 1.7 billion paired-end Illumina reads, 266.5 Gb.
      • Bigger data volume than the human genome at 30x coverage.
      • Study size is increasing. How to store this?
      • Analysis selection is critical
          • Analyses MUST work with these data volumes
          • e.g. Kraken, HUMAnN2
      • Compute resources
          • e.g. Kraken, multicore machine, > 100 GB
      • Solution: the cloud
          • AWS, Azure
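The comparison with a 30x human genome can be checked with simple arithmetic. The sketch below makes two assumptions that the slide does not state: that "Gb" means gigabases of sequence, and that the haploid human genome is roughly 3.1 gigabases.

```python
# Assumption: approximate haploid human genome size, in gigabases.
HUMAN_GENOME_GB = 3.1

# Figures quoted on the slide.
STUDY_GB = 266.5      # assumed to mean gigabases of sequence
STUDY_READS = 1.7e9   # number of Illumina reads


def human_wgs_gigabases(coverage, genome_gb=HUMAN_GENOME_GB):
    """Sequence volume, in gigabases, of one human genome at the given coverage."""
    return coverage * genome_gb


def fold_over_human(study_gb, coverage=30):
    """How many times larger the study is than one human genome at `coverage`."""
    return study_gb / human_wgs_gigabases(coverage)


def mean_read_length(study_gb, n_reads):
    """Average number of bases per read implied by the totals."""
    return study_gb * 1e9 / n_reads


if __name__ == "__main__":
    print(f"30x human genome: {human_wgs_gigabases(30):.1f} Gb")
    print(f"Study vs 30x human: {fold_over_human(STUDY_GB):.2f}x")
    print(f"Implied mean read length: {mean_read_length(STUDY_GB, STUDY_READS):.0f} bases")
```

Under these assumptions the example study holds a little under three times the sequence of one 30x human genome, which is consistent with the slide's claim.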
  • 10. 2. Challenges
      Statistical Techniques
      • Data normalization
          • rarefaction, normalization, log transformation?
      • Statistical techniques
          • parametric, nonparametric?
      • Confusion
          • No correct answer
          • Depends on the data
              • function vs taxonomy
              • shotgun vs amplicon
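To make the normalization options on this slide concrete, here is a minimal standard-library sketch of three of them applied to one sample's count vector. It is illustrative only: the function names, the fixed random seed and the pseudocount of 1 are assumptions, and real analyses would normally use an established package rather than this code.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's counts to `depth` reads, without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    # One entry per read, labelled with the index of the taxon it belongs to.
    pool = [taxon for taxon, count in enumerate(counts) for _ in range(count)]
    rarefied = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def relative_abundance(counts):
    """Scale counts so that they sum to 1 (total-sum normalization)."""
    total = sum(counts)
    return [count / total for count in counts]


def log_transform(counts, pseudocount=1.0):
    """log10 of the counts, with a pseudocount so that zeros stay finite."""
    return [math.log10(count + pseudocount) for count in counts]


if __name__ == "__main__":
    sample = [50, 30, 20, 0]  # illustrative counts for four taxa
    print(rarefy(sample, depth=10))
    print(relative_abundance(sample))
    print(log_transform(sample))
```

Each choice discards or distorts something different (rarefaction throws away reads, proportions are compositional, the log transform depends on the pseudocount), which is one reason the slide says there is no single correct answer.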
  • 11. 3. Industrialized Microbiome Workflow & Collaboration Environment
      • Eagle Genomics leads the industry in configuring and deploying best-in-class microbiome workflow solutions.
      • These run large-scale parallelized pipelines cost-efficiently, with seamless connectivity between the source data and the scientists.
      “Unilever’s digital data program now processes genetic sequences twenty times faster - without incurring higher compute costs. In addition, its robust architecture supports ten times as many scientists, all working simultaneously.” Pete Keeley, eScience Technical Lead - R&D IT at Unilever
      www.eaglegenomics.com
  • 12. 3. Industrialized Microbiome Workflow & Collaboration Environment (diagram; recoverable label: organization)
  • 13. DATA COLLABORATION ARCHITECTURE: Solving the metadata problem (diagram; recoverable labels below)
      • MICROBIOME DATA FILES (amplicon or shotgun)
      • METADATA: eaglecatalog (implements security model)
      • QUERY/FILTERING: eaglecatalog
      • DATA SHARING
      • CROs DEPOSIT DATA
      • ELN EXPERIMENT METADATA
      • INTEGRATION WITH OMICS (RNASeq, proteomics)
      • FEDERATED DATA & RESULTS INTEGRATION
      • TAXONOMY/FUNCTION ANALYSIS
      • REPORTS
      • R, QIIME
      • POPULATION COHORTS
      • VISUALIZATION
      • CLOUD STORAGE
      • eaglecatalog
      • COLLABORATIVE ENVIRONMENT FOR PARTNERS
      • FEEDBACK TO EXPERIMENTAL DESIGN
      • COST OF EXPERIMENT
  • 14. 4. Deployment of a Collaboration System
      Eagle Genomics Proposal: Accelerating metadata adoption and scale-up
      • Offering a new rapid deployment for microbiome R&D organizations looking to cost-efficiently build world-class integrated workflows.
      • Setup of an initial collaboration system for your team.
          • 1 to 4 weeks
      • Enabling you to self-load your data into the catalog, leading to a standardized and streamlined workflow that is ready for future scale-out
      • Managing internal data, partner data, or public data
  • 15. BENEFITS
      • Current situation: Data in silos, not reused
          • Desired state: Easily searchable, discoverable, accessible and intuitive portal by researchers to all data and results
          • Negative consequences of non-action: Significant researcher time spent in “data wrangling”. Breakage pending, based on projected bottleneck
          • Positive business impact, using Eagle software: Improved time and ability to insight through radically improved productivity of researchers
      • Current situation: No correlation between experiments
          • Desired state: Quick and easy exploratory analyses across experiments and data
          • Negative consequences of non-action: Missed opportunities for insights. Failure to leverage data as an asset
          • Positive business impact, using Eagle software: Faster identification and generation of successful product innovation and insight, leading to reduced time to market
      • Current situation: No unified view of data
          • Desired state: Data and analysis results stored in a unified catalog, with standardized & repeatable access using best-in-class technology
          • Negative consequences of non-action: Data (internal, external) and the associated learning not available or exploited as necessary
          • Positive business impact, using Eagle software: Competitive advantage through cutting edge research, driving innovation in the product pipeline
      • Current situation: Lack of governance and data provenance
          • Desired state: Secure authenticated access with relevant data provenance of data sets; simple data sharing with vendors/collaborators
          • Negative consequences of non-action: Costly, inefficient and limited data sharing with vendors; risk of inappropriate data sharing
          • Positive business impact, using Eagle software: Efficient, compliant, documented, evidenced, and enforced data sharing process; effective (time & cost) collaboration between vendors and internal teams
      • Current situation: Data is in effect archived
          • Desired state: Catalog integrated in the scientific workflow, used continuously and keeping pace with innovation in the industry
          • Negative consequences of non-action: Missed insights; siloed research activities, with high barriers to collaboration
          • Positive business impact, using Eagle software: Systematic data sharing and re-use, driving faster unique cross-functional insights and reducing waste in spend
  • 16. In Summary: Considerations & Challenges in Building an End-to-End Microbiome Workflow
      For more info, contact: Raminderpal Singh, raminderpal.singh@eaglegenomics.com