Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research Genomics
September 21, 2016
Gabe Rudy
VP Product & Engineering
Agenda
1. Overview of Golden Helix
2. Big Data in Genomics
3. Big Data at Golden Helix
4. Use Cases and Questions
Golden Helix – Who We Are
Golden Helix is a global bioinformatics
company founded in 1998.
Filtering and Annotation
Clinical Reports
Pipeline: Run Workflows
GWAS
Genomic Prediction
Large-N-Population Studies
RNA-Seq
CNV-Analysis
Variant Warehouse
Centralized Annotations
Hosted Reports
Sharing and Integration
Over 300 customers globally
Cited in over 1000 peer-reviewed publications
Golden Helix – Who We Are
When you choose a Golden Helix solution, you get more than just software:
• REPUTATION
• TRUST
• EXPERIENCE
• INDUSTRY FOCUS
• THOUGHT LEADERSHIP
• COMMUNITY
• TRAINING
• SUPPORT
• RESPONSIVENESS
• TRANSPARENCY
• INNOVATION and SPEED
• CUSTOMIZATIONS
Good BM, Ainscough BJ, McMichael JF, Su AI†, Griffith OL†. 2014. Genome Biology. 15(8):438.
Path of Data to the Clinic
FASTQ Files: ~100 GB per sample
BAM Files: ~100 GB per sample
gVCF Files: ~2-5 GB per sample
Annotated Variants: ~100 MB per sample
Clinical Assessments: ~10 MB per sample
Clinical Reports: ~1 MB per sample
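The per-sample sizes above lend themselves to a quick storage estimate. The following is a minimal sketch, not part of the original slides: the cohort size of 1,000 samples, the 3.5 GB midpoint for gVCF files, and the function name `cohort_storage_tb` are all assumptions made for illustration.

```python
from typing import Iterable

# Approximate per-sample sizes from the slide, in megabytes (decimal units).
# The gVCF value is an assumed midpoint of the ~2-5 GB range.
PER_SAMPLE_MB = {
    "FASTQ": 100_000,
    "BAM": 100_000,
    "gVCF": 3_500,
    "Annotated variants": 100,
    "Clinical assessments": 10,
    "Clinical report": 1,
}


def cohort_storage_tb(n_samples: int, stages: Iterable[str]) -> float:
    """Return total storage in terabytes for the selected pipeline stages."""
    total_mb = sum(PER_SAMPLE_MB[stage] for stage in stages) * n_samples
    return total_mb / 1_000_000  # 1 TB = 1,000,000 MB in decimal units


if __name__ == "__main__":
    n = 1_000  # hypothetical cohort size
    raw = cohort_storage_tb(n, ["FASTQ", "BAM"])
    variant_level = cohort_storage_tb(
        n, ["gVCF", "Annotated variants", "Clinical assessments", "Clinical report"]
    )
    print(f"Raw reads and alignments: {raw:.1f} TB")
    print(f"Variant-level data:       {variant_level:.3f} TB")
```

Under these assumptions the raw reads and alignments dominate by roughly two orders of magnitude, which is why the variant-level stages are the practical target for a shared warehouse.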
NGS Driving Our Catalog of Small Variations
Project        Samples   Variants
1KG Phase 1      1,094       39M
1KG Phase 3      2,504       84M
NHLBI ESP        6,500        2M
BROAD ExAC      61,486       10M
Big Data at Golden Helix
Usage Examples on Big Data:
- Whole Genome Sequencing
- Agrigenomics Prediction
Whole Genomes Provide a Better Exome
• Whole Genome, PCR-Free
  - Less sensitive to GC content
• Not limited to target design
• Uniform coverage allows for CNV detection
• Downsides (other than cost):
  - Often lower coverage (~50X vs 150X)
  - Doesn’t make sense for tumors where you need very high read depth
  - Only losing ~0.5% of variants[1]
  - Can your tools handle WGS?
[1] Meienberg J, Zerjavic K, Keller I, et al. New insights into the performance of human whole-exome capture platforms. Nucleic Acids Res. 2015;43:e76. doi: 10.1093/nar/gkv216
Means of 6 Samples Run on 6 Exon Kits
Variants Outside Exome Target Capture Too!
• Intronic variants can be of clinical significance
  - ClinVar has ~10K intronic variants (387 P or LP)
• Many PGX and other clinical trait association variants are intergenic
• Need annotation sources outside genes:
  - CADD
  - Conservation scores
  - Splice site predictions
Kellis M, Wold B, Snyder MP, et al. Defining functional DNA elements in the human genome. Proceedings of the National Academy of Sciences of the United States of America. 2014;111(17):6131-6138. doi:10.1073/pnas.1318948111.
The Amazing 17!
• 17 Supercentenarians
• Sequenced using CGI WGS
• 1 Male, 16 Females
• Found DSC2 Pathogenic Mutation
• Found weak TSHZ3 rare variant burden over controls
• Let's do some similar analysis!
  - Filter to ClinVar Pathogenic variants for LoF variants
  - Count per gene presence of rare, functional Homozygous variants
http://dx.doi.org/10.1371/journal.pone.0112430
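The two analysis steps named on this slide can be sketched in plain Python. This is an illustrative reading of the bullets, not the VarSeq workflow itself. The record fields (`clinvar`, `effect`, `pop_af`, `zygosity`), the placeholder gene and sample names, the 1% rarity threshold, and the choice to treat LoF and missense as "functional" are all assumptions.

```python
from collections import defaultdict

# Hypothetical, hand-made records; field names and values are illustrative only.
VARIANTS = [
    {"sample": "S1", "gene": "GENE_C", "clinvar": "Pathogenic", "effect": "LoF", "pop_af": 0.0001, "zygosity": "het"},
    {"sample": "S1", "gene": "GENE_A", "clinvar": None, "effect": "missense", "pop_af": 0.002, "zygosity": "hom"},
    {"sample": "S2", "gene": "GENE_A", "clinvar": None, "effect": "missense", "pop_af": 0.002, "zygosity": "hom"},
    {"sample": "S2", "gene": "GENE_B", "clinvar": "Benign", "effect": "synonymous", "pop_af": 0.30, "zygosity": "hom"},
    {"sample": "S3", "gene": "GENE_B", "clinvar": None, "effect": "LoF", "pop_af": 0.05, "zygosity": "hom"},
]

# Assumption: "functional" means loss-of-function or missense.
FUNCTIONAL_EFFECTS = {"LoF", "missense"}


def pathogenic_lof(variants):
    """Step 1: keep variants that are ClinVar Pathogenic and loss-of-function."""
    return [v for v in variants if v["clinvar"] == "Pathogenic" and v["effect"] == "LoF"]


def rare_functional_hom_by_gene(variants, max_af=0.01):
    """Step 2: per gene, count samples carrying a rare, functional homozygous variant."""
    samples_by_gene = defaultdict(set)
    for v in variants:
        is_rare = v["pop_af"] < max_af
        is_functional = v["effect"] in FUNCTIONAL_EFFECTS
        if is_rare and is_functional and v["zygosity"] == "hom":
            samples_by_gene[v["gene"]].add(v["sample"])
    return {gene: len(samples) for gene, samples in samples_by_gene.items()}


if __name__ == "__main__":
    print(pathogenic_lof(VARIANTS))
    print(rare_functional_hom_by_gene(VARIANTS))
```

Counting distinct samples per gene, rather than variants, keeps a single sample with several qualifying variants in one gene from inflating that gene's count.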
VarSeq Demonstration
VSWarehouse: Treasure your Samples
• Scalable Infrastructure of VarSeq as a Multi-User Growing Repository of your NGS Samples
  - Query variants and annotations
  - Exports to Text, Excel, VCF
• Aggregate Samples:
  - Targeted Gene Panels
  - Exomes
  - Genomes
• Cancer and Germline Workflows
• Deep integration with VarSeq for annotation, reporting and hands-on assessment/classification
New in VSWarehouse 1.2
• Assessment catalogs
  - Store your variant curations, assessments, classifications
  - Can also be used for cataloging common false positives, other custom annotations
  - Updated by users with version history
  - Can “bulk import” existing knowledgebase
• Optimized imports, especially adding samples to existing warehouse projects
• Improved query speeds
VarSeq + VSWarehouse Demonstration
• Cancer Gene Panel Demo Data
• Let's Connect it to a VSWarehouse Installation
  - Report
  - Catalog
  - Annotations
• Let's Extract our Samples from VSWarehouse
  - Query rare, functional variants in our cohort
  - Export to Excel
SNP & Variation Suite
• Mature Research Analysis Platform
• Designed for Large Datasets, both in Samples and Markers/Variants
• Agrigenomics is a growing market, pushing the sample limit “N”
• Certain matrix operations are computed on N×N matrices
Kinship Matrices & GBLUP
• The GBLUP method computes a genomic relationship matrix and from that computes the “Genomic Best Linear Unbiased Predictor” (GBLUP) of additive genetic merits by sample and of allele substitution effects (ASE) by marker.
• GBLUP can be used to predict Estimated Breeding Values (EBV) for all samples in a dataset, which allows for the identification of samples with the highest EBV to carry forward in breeding programs.
• It can also be used to identify influential loci for the phenotype of interest that can then be used for a targeted assay for diagnostic purposes.
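To make the first step concrete, here is a minimal sketch of one common way to build a genomic relationship matrix from 0/1/2 genotype calls: center each marker by twice its allele frequency and scale by 2·Σp(1−p), a construction often attributed to VanRaden (2008). The slides do not say which formula SVS uses, so treat this as an assumption. The function name and the toy genotypes are hypothetical, and the mixed-model solve that produces the GBLUP values themselves is not shown.

```python
def genomic_relationship_matrix(genotypes):
    """Return the N x N relationship matrix G = Z Z^T / (2 * sum(p * (1 - p))).

    genotypes: list of N samples, each a list of M allele counts coded 0, 1 or 2.
    """
    n = len(genotypes)
    m = len(genotypes[0])

    # Allele frequency per marker, estimated from the samples themselves.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]

    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic; G is undefined")

    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]

    # N x N result: this is the matrix whose size grows with the sample count.
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


if __name__ == "__main__":
    toy = [  # 3 samples x 3 markers, made-up values
        [0, 1, 2],
        [1, 1, 0],
        [2, 0, 1],
    ]
    for row in genomic_relationship_matrix(toy):
        print([round(x, 3) for x in row])
```

The result is N×N regardless of the number of markers, so memory for this matrix grows quadratically with the number of samples.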
Beating the System
• We use:
  - Out-of-memory scratch buffers
  - Piecewise large data matrix operations
  - Adaptation of method[1] for approximating matrix decompositions of large matrices using random numbers.
[1] Halko et al. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061 [math.NA]
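The "piecewise" idea can be illustrated with a small sketch: an N×N product Z·Zᵀ is accumulated from column blocks of Z, so only one block of markers needs to be in memory at a time. This shows the general technique only and makes no claim about how SVS implements it. The function names are hypothetical, the in-memory generator stands in for reading chunks from disk, and the randomized decomposition of Halko et al. is not shown.

```python
def column_blocks(matrix, block_size):
    """Yield successive column slices; a stand-in for reading marker chunks from disk."""
    n_cols = len(matrix[0])
    for start in range(0, n_cols, block_size):
        yield [row[start:start + block_size] for row in matrix]


def gram_in_blocks(marker_blocks, n_samples):
    """Accumulate K = Z Z^T one column block at a time.

    Each block has n_samples rows and any number of marker columns.
    """
    k = [[0.0] * n_samples for _ in range(n_samples)]
    for block in marker_blocks:
        for a in range(n_samples):
            row_a = block[a]
            # Only the upper triangle is computed; K is symmetric.
            for b in range(a, n_samples):
                dot = sum(x * y for x, y in zip(row_a, block[b]))
                k[a][b] += dot
                if a != b:
                    k[b][a] += dot
    return k


if __name__ == "__main__":
    z = [  # 3 samples x 4 markers, made-up values
        [1, 2, 3, 4],
        [0, 1, 0, 1],
        [2, 0, 2, 0],
    ]
    all_at_once = gram_in_blocks([z], n_samples=3)
    two_markers_at_a_time = gram_in_blocks(column_blocks(z, 2), n_samples=3)
    print(all_at_once == two_markers_at_a_time)
```

Because the sum over markers is associative, the block size changes peak memory but not the result (up to floating-point rounding on real-valued data).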
SVS Demonstration
[Demonstration]
Use Cases for VSWarehouse and VarSeq
• Researchers: Store and aggregate research samples. Re-run queries against new annotations and research findings over time. Share and collaborate.
• Small Labs: Build up an in-house knowledge base of variant assessments. Have a warehouse of clinical samples to draw on for quality and frequency filtering. Have versioned snapshots of the warehouse to reference from archived reports.
• Core Labs: Provide institution-wide population frequencies. Access-controlled projects for individual groups of users. Customized workflows for different use cases.
• Consortiums: Secure cloud-based repository for users to submit new samples. Versioned history, customized annotations and per-cohort statistics. Allow users to query and extract the subsets necessary for analysis.
Questions during the presentation: Use the Questions pane in your GoToWebinar window
Questions or more info:
• Email info@goldenhelix.com
• Request an evaluation of the software at www.goldenhelix.com
Questions?
