Enhanced structural variant and breakpoint
   detection using SVMerge by integration of multiple
   detection methods and local assembly

   Kim Wong/Thomas Keane
   Vertebrate Resequencing Informatics

   http://svmerge.sourceforge.net




Vertebrate Resequencing Informatics   22nd March, 2011
Genomic Structural Variation

 Large DNA rearrangements (>100bp)
 Frequent causes of disease
   Referred to as genomic disorders
   Mendelian diseases or complex traits such as behaviors
        E.g. increase in gene dosage due to increase in copy number
   Prevalent in cancer genomes
 Many types of genomic structural variation (SV)
   Insertions, deletions, copy number changes, inversions,
     translocations & complex events
 Comparative genomic hybridization (CGH) traditionally used for copy
 number discovery
   CNVs of 1–50 kb in size have been under-ascertained
 Next-gen sequencing revolutionised the field of SV discovery
   Parallel sequencing of ends of large numbers of DNA fragments
    Examine alignment distance of reads to discover presence of
     genomic rearrangements
   Resolution down to ~100bp
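
The alignment-distance idea above can be sketched in a few lines. This is a minimal illustration, not SVMerge or any caller's actual logic: the function name `classify_insert_size` and the 3-standard-deviation cutoff are assumptions chosen for the example, and the library mean and standard deviation are taken as given.

```python
def classify_insert_size(observed, mean, sd, n_sd=3.0):
    """Label a read pair by comparing its aligned insert size to the library.

    observed: aligned distance between the two ends of the pair (bp)
    mean, sd: expected insert size and its standard deviation for the library
    n_sd:     hypothetical cutoff, in standard deviations, for calling a pair anomalous
    """
    if observed > mean + n_sd * sd:
        # Ends map further apart than expected: sequence is missing in the sample
        return "deletion_candidate"
    if observed < mean - n_sd * sd:
        # Ends map closer than expected: the sample carries extra sequence
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    for size in (150, 400, 1500):
        print(size, classify_insert_size(size, mean=400, sd=40))
```

A single anomalous pair is weak evidence; callers typically require a cluster of pairs supporting the same event.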


Simple types of Structural Variation




Deletion

 SV Visualisation
   LookSeq viewer
   Read pairs displayed
   Y axis is aligned insert size
 Deletions are easily spotted
   Read pairs are mapped
     further apart than expected
   Coverage is zero across
     the deletion sequence
 Deletion in NOD/ShiLtJ
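
The zero-coverage signature of a deletion described above could be looked for as follows. This is a sketch under stated assumptions: `depth` is a hypothetical per-base coverage list for one region, and the 100 bp minimum length simply mirrors the >100bp SV definition used earlier in the deck.

```python
def zero_coverage_runs(depth, min_len=100):
    """Return half-open (start, end) intervals where coverage is zero.

    depth:   list of per-base read depths for a region (assumed input)
    min_len: shortest run to report, in bp (assumed threshold)
    """
    runs = []
    start = None
    for i, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = i  # a zero-coverage run begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Handle a run that extends to the end of the region
    if start is not None and len(depth) - start >= min_len:
        runs.append((start, len(depth)))
    return runs


if __name__ == "__main__":
    depth = [5] * 10 + [0] * 150 + [4] * 10
    print(zero_coverage_runs(depth))
```

Zero coverage alone can also reflect unmappable sequence, which is why it is read together with the stretched read pairs.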




Inversion



                                Mate pairs align in the same orientation




                                       Coverage zero at breakpoints


Insertion




One-end mapped reads



                                          Coverage zero at breakpoint



Complex SV Events




                                          Inversion




                                  Insertion              Insertion
Human Examples




                                       Stankiewicz and Lupski (2010) Ann. Rev. Med.

Example 2: Transposable element insertion in mice




SVMerge

 Initially developed for mouse genomes project
     Several software packages currently available to discover SVs
 Various approaches using information from anomalously mapped read pairs OR
 read depth analysis
 No single SV caller is able to detect the full range of structural variants
     Paired-end mapping information, for example, cannot detect SVs where the
       read pairs do not flank the SV breakpoints
     Insertion calls made using the split-mapping approach are also size-limited
       because the whole insertion breakpoint must be contained within a read
     Read-depth approaches can identify copy number changes without the need
       for read-pair support, but cannot find copy number neutral events
 SVMerge is a meta SV-calling pipeline that makes SV predictions with a
 collection of SV callers
      Input is a BAM file per sample
      Callers are run individually and their outputs sanitized into standard BED format
      SV calls merged, and computationally validated using local de novo assembly
      Primarily an SV discovery/calling + validation tool
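
The merge step can be pictured with a small sketch. It assumes each caller's output has already been reduced to BED-style tuples, and it merges calls of the same type on the same chromosome whenever they overlap; the tuple layout, the function name `merge_calls` and the plain-overlap rule are assumptions for illustration, not SVMerge's documented merging criteria.

```python
from collections import defaultdict


def merge_calls(calls):
    """Merge overlapping calls of the same SV type from several callers.

    calls: iterable of (chrom, start, end, sv_type, caller) using BED-style
           0-based, half-open coordinates (assumed layout).
    Returns merged (chrom, start, end, sv_type, [callers]) tuples.
    """
    grouped = defaultdict(list)
    for chrom, start, end, sv_type, caller in calls:
        grouped[(chrom, sv_type)].append((start, end, caller))

    merged = []
    for (chrom, sv_type), items in sorted(grouped.items()):
        items.sort()
        cur_start, cur_end, callers = items[0][0], items[0][1], {items[0][2]}
        for start, end, caller in items[1:]:
            if start < cur_end:  # half-open intervals overlap
                cur_end = max(cur_end, end)
                callers.add(caller)
            else:
                merged.append((chrom, cur_start, cur_end, sv_type, sorted(callers)))
                cur_start, cur_end, callers = start, end, {caller}
        merged.append((chrom, cur_start, cur_end, sv_type, sorted(callers)))
    return merged


if __name__ == "__main__":
    example = [
        ("chr1", 100, 500, "DEL", "BreakDancerMax"),
        ("chr1", 450, 900, "DEL", "Pindel"),
        ("chr1", 2000, 2500, "INS", "Pindel"),
    ]
    for call in merge_calls(example):
        print(call)
```

Keeping the list of supporting callers on each merged call makes it easy to ask later which events were found by more than one method.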



SVMerge Workflow




                                                         Wong et al (2010)

SV Callers




                                                         Wong et al (2010)




Local Assembly Validation

 Key to the approach is the computational
 validation step
   Local assembly and breakpoint refinement
   All SV calls (except those lacking read
      pair support e.g. CNG/CNL)
 Algorithm
    Gather mapped reads, and any
      unmapped mate-pairs (<1kb of an insertion
      breakpoint, <2kb of all other SV types)
   Run local velvet assembly
   Realign the contigs produced with
      exonerate
   Detect contig breaks proximal to the
      breakpoint(s)
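
The read-gathering distances in the algorithm above translate into per-breakpoint windows. The sketch below only computes those windows; it assumes an insertion is represented by a single breakpoint and every other SV type by two, and the names `breakpoint_windows`, `FLANK_BP` and the "INS" type label are hypothetical. Running Velvet and exonerate on the collected reads is not shown.

```python
# Flank sizes taken from the slide: 1 kb for insertions, 2 kb for everything else
FLANK_BP = {"INS": 1000}
DEFAULT_FLANK_BP = 2000


def breakpoint_windows(start, end, sv_type, chrom_length):
    """Return the (lo, hi) regions from which reads would be gathered.

    start, end:   predicted SV coordinates
    sv_type:      SV type label, "INS" for insertions (assumed naming)
    chrom_length: used to clip windows at the chromosome end
    """
    flank = FLANK_BP.get(sv_type, DEFAULT_FLANK_BP)
    # Assumption: an insertion has one breakpoint, other types have two
    breakpoints = [start] if sv_type == "INS" else [start, end]
    return [(max(0, bp - flank), min(chrom_length, bp + flank)) for bp in breakpoints]


if __name__ == "__main__":
    print(breakpoint_windows(5000, 5000, "INS", 1_000_000))
    print(breakpoint_windows(500, 9000, "DEL", 10_000))
```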

Breakpoint Improvement (simulated)




Breakpoint Improvement (Real data)




                                                         Yalcin and Wong et al, in prep



Application to HapMap trio dataset

 High-depth HapMap trio (NA18506, NA18507, NA18508)
   42x, 42x and 40x
 Reads processed through Vert. Reseq. Pipeline
   Aligned to the GRCh37 human reference using BWA
   Single BAM file for each individual
 BreakDancerMax, Pindel, RDXplorer, SECluster, and RetroSeq
 Exclude calls
    Within 600 bp of a reference sequence gap
    Within 1 Mb of a centromere or telomere
 Computational validation of raw candidate calls
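
The exclusion rules above amount to a distance filter. The following sketch assumes the rules mean "within 600 bp of a gap" and "within 1 Mb of a centromere or telomere", and that gaps and centromere/telomere regions are supplied as (chrom, start, end) tuples; the function names and input layout are hypothetical.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Distance in bp between two intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def keep_call(call, gaps, centro_telo, gap_margin=600, ct_margin=1_000_000):
    """Return False if the call lies too close to a gap, centromere or telomere.

    call:        (chrom, start, end)
    gaps:        list of (chrom, start, end) reference sequence gaps
    centro_telo: list of (chrom, start, end) centromere/telomere regions
    """
    chrom, start, end = call
    for regions, margin in ((gaps, gap_margin), (centro_telo, ct_margin)):
        for r_chrom, r_start, r_end in regions:
            if r_chrom == chrom and interval_distance(start, end, r_start, r_end) <= margin:
                return False
    return True


if __name__ == "__main__":
    gaps = [("chr1", 10_000, 20_000)]
    centromeres = [("chr1", 5_000_000, 6_000_000)]
    for call in [("chr1", 20_500, 21_000), ("chr1", 30_000, 31_000), ("chr1", 6_500_000, 6_500_500)]:
        print(call, keep_call(call, gaps, centromeres))
```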




NA18506 Results




Do multiple callers discover more SVs?




How do the calls measure up?

 Compared the overlap of the deletion, gain, and inversion calls
 against the curated Database of Genomic Variants
   Overlapped with calls in DGV at a rate significantly higher than
      expected by random chance
   Deletions in DGV: 71% (NA18506), 81% (NA18507), and 71%
      (NA18508)
   Copy number gains in DGV: 29% (NA18506), 32% (NA18507),
      and 36% (NA18508)
   Inversions in DGV: 47% (NA18506), 69% (NA18507), and 51%
      (NA18508)
 Child calls not in DGV also called in the parents
   Further 18% deletions, 32% inversions, 54% duplications
   Estimated max. false positive rate of 11%, 21%, and 17%
 All child-only SV calls comprise 11% of the child's final SV call set
    Considerable improvement from 'merged raw' (50% unique)
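
Percentages like the DGV overlap rates above come from counting calls that intersect a known variant. A minimal sketch follows; it counts any positional overlap on the same chromosome as a hit, which is an assumption, since the slide does not state the overlap criterion that was used. The function name and the toy coordinates are illustrative only.

```python
def fraction_overlapping(calls, known):
    """Fraction of calls that overlap at least one known variant.

    calls, known: lists of (chrom, start, end) in half-open coordinates.
    Any overlap counts as a hit (assumed criterion).
    """
    if not calls:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in calls
        if any(chrom == k_chrom and start < k_end and k_start < end
               for k_chrom, k_start, k_end in known)
    )
    return hits / len(calls)


if __name__ == "__main__":
    known = [("chr1", 1_000, 2_000), ("chr2", 500, 900)]
    calls = [
        ("chr1", 1_500, 1_800),
        ("chr1", 1_900, 2_500),
        ("chr2", 600, 700),
        ("chr3", 100, 200),
    ]
    print(f"{fraction_overlapping(calls, known):.0%} of calls overlap a known variant")
```

A stricter criterion, such as reciprocal overlap, would give lower rates on the same data, so the criterion should always be reported alongside the percentage.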


Complex SV Types




                                                         Yalcin and Wong et al, in prep

Future Work

 SVMerge primarily a discovery and validation tool
    Extensible pipeline so that calls from any method can be easily
     incorporated
 Developed primarily for mouse genomes project
   Successfully applied to human trio dataset
    Computational validation approach reduces false positives
 Complex SVs
   Cataloging repeating combinations of multiple SV events in small
    loci
 2011 development
   Low coverage cross-population SV discovery
   Genotyping existing SVs in new samples
   Better support for heterozygous calls
   Integration of SVMerge into Vert. Reseq. pipeline for UK10K

Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly

  • 1. Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly
       Kim Wong/Thomas Keane
       Vertebrate Resequencing Informatics
       http://svmerge.sourceforge.net
       Vertebrate Resequencing Informatics 22nd March, 2011
  • 2. Genomic Structural Variation
       Large DNA rearrangements (>100bp)
       Frequent causes of disease
         Referred to as genomic disorders
         Mendelian diseases or complex traits such as behaviors
           E.g. increase in gene dosage due to increase in copy number
         Prevalent in cancer genomes
       Many types of genomic structural variation (SV)
         Insertions, deletions, copy number changes, inversions, translocations & complex events
       Comparative genomic hybridization (CGH) traditionally used for copy number discovery
         CNVs of 1–50 kb in size have been under-ascertained
       Next-gen sequencing revolutionised the field of SV discovery
         Parallel sequencing of ends of large numbers of DNA fragments
         Examine alignment distance of reads to discover presence of genomic rearrangements
         Resolution down to ~100bp
  • 3. Simple types of Structural Variation
  • 4. Deletion
       SV Visualisation
         LookSeq viewer
         Read pairs displayed
         Y axis is aligned insert size
       Deletions are easily spotted
         Read pairs are mapped further apart than expected
         Coverage is zero across the deletion sequence
       Deletion in NOD/ShiLtJ
  • 5. Inversion
       Mate pairs align in the same orientation
       Coverage zero at breakpoints
  • 6. Insertion
       One end mapped reads
       Coverage zero at breakpoint
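The three read-pair signatures on the slides above (pairs mapped too far apart, mates in the same orientation, one-end-mapped reads) can be sketched as a simple classifier. This is only an illustration and not the logic of any particular SV caller: the `ReadPair` fields, the category names, and the cutoff of three standard deviations above the mean insert size are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    insert_size: int        # aligned distance between the two ends (hypothetical field)
    same_orientation: bool  # True if both mates align to the same strand
    mate_mapped: bool       # False for "one end mapped" pairs


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Label a read pair with the SV signature it is consistent with.

    The n_sd cutoff is an assumed illustrative threshold, not a value from the slides.
    """
    if not pair.mate_mapped:
        # Only one end maps: the mate may lie inside inserted sequence.
        return "insertion_candidate"
    if pair.same_orientation:
        # Mates on the same strand: consistent with an inversion breakpoint.
        return "inversion_candidate"
    if pair.insert_size > mean_insert + n_sd * sd_insert:
        # Mates map further apart than the library allows: consistent with a deletion.
        return "deletion_candidate"
    return "concordant"


if __name__ == "__main__":
    print(classify_pair(ReadPair(5000, False, True), mean_insert=300, sd_insert=30))
```

A real caller would cluster many such pairs before making a call; a single anomalous pair is weak evidence on its own.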
  • 7. Complex SV Events Inversion Insertion Insertion
  • 8. Human Examples Stankiewicz and Lupski (2010) Ann. Rev. Med.
  • 9. Example 2: Transposable element insertion in mice
  • 10. SVMerge
       Initially developed for mouse genomes project
       Several software packages currently available to discover SVs
       Various approaches using information from anomalously mapped read pairs OR read depth analysis
       No single SV caller is able to detect the full range of structural variants
         Paired-end mapping information, for example, cannot detect SVs where the read pairs do not flank the SV breakpoints
         Insertion calls made using the split-mapping approach are also size-limited because the whole insertion breakpoint must be contained within a read
         Read-depth approaches can identify copy number changes without the need for read-pair support, but cannot find copy number neutral events
       SVMerge, a meta SV calling pipeline, which makes SV predictions with a collection of SV callers
         Input is a BAM file per sample
         Run callers individually + outputs sanitized into standard BED format
         SV calls merged, and computationally validated using local de novo assembly
         Primarily a SV discovery/calling + validation tool
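To make the "outputs sanitized into standard BED format, SV calls merged" step concrete, here is a minimal sketch of merging overlapping calls of the same type from several callers. It is not SVMerge's actual merging code: the `Call` tuple, the type labels, and the rule "merge any overlapping calls of the same type on the same chromosome" are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int    # BED-style, 0-based start
    end: int      # BED-style, exclusive end
    sv_type: str  # hypothetical labels such as "DEL" or "INV"
    caller: str   # which tool produced the call


def merge_calls(calls):
    """Merge overlapping calls of the same SV type and record the supporting callers."""
    groups = defaultdict(list)
    for c in calls:
        groups[(c.chrom, c.sv_type)].append(c)

    merged = []
    for (chrom, sv_type), group in sorted(groups.items()):
        group.sort(key=lambda c: c.start)
        cur_start, cur_end = group[0].start, group[0].end
        cur_callers = {group[0].caller}
        for c in group[1:]:
            if c.start < cur_end:
                # Overlaps the current merged interval: extend it.
                cur_end = max(cur_end, c.end)
                cur_callers.add(c.caller)
            else:
                merged.append((chrom, cur_start, cur_end, sv_type, sorted(cur_callers)))
                cur_start, cur_end, cur_callers = c.start, c.end, {c.caller}
        merged.append((chrom, cur_start, cur_end, sv_type, sorted(cur_callers)))
    return merged


if __name__ == "__main__":
    example = [
        Call("chr1", 100, 500, "DEL", "BreakDancerMax"),
        Call("chr1", 450, 700, "DEL", "Pindel"),
        Call("chr1", 1000, 1200, "DEL", "Pindel"),
    ]
    for row in merge_calls(example):
        print(row)
```

Keeping the list of supporting callers on each merged interval makes it easy to see afterwards which calls were found by only one method.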
  • 11. SVMerge Workflow Wong et al (2010)
  • 12. SV Callers Wong et al (2010)
  • 13. Local Assembly Validation
       Key to the approach is the computational validation step
         Local assembly and breakpoint refinement
         All SV calls (except those lacking read pair support e.g. CNG/CNL)
       Algorithm
         Gather mapped reads, and any unmapped mate-pairs (<1kb of an insertion breakpoint, <2kb of all other SV types)
         Run local velvet assembly
         Realign the contigs produced with exonerate
         Detect contig breaks proximal to the breakpoint(s)
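The read-gathering step of the algorithm above can be sketched as computing a window around each breakpoint, using the 1 kb flank for insertions and 2 kb for other SV types given on the slide. The function name, the type labels, and the choice to fuse overlapping windows are assumptions for this example; the Velvet and exonerate steps are not shown.

```python
INS_FLANK = 1_000    # reads within 1 kb of an insertion breakpoint
OTHER_FLANK = 2_000  # reads within 2 kb of breakpoints for all other SV types


def assembly_windows(sv_type: str, start: int, end: int):
    """Return the genomic windows from which reads would be gathered for local assembly.

    Insertions are treated as having one breakpoint (start); other SV types have
    two (start and end). Overlapping windows are fused into one (an assumed
    simplification for this sketch).
    """
    flank = INS_FLANK if sv_type == "INS" else OTHER_FLANK
    breakpoints = [start] if sv_type == "INS" else [start, end]

    windows = []
    for bp in sorted(breakpoints):
        lo, hi = max(0, bp - flank), bp + flank
        if windows and lo <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], hi)
        else:
            windows.append((lo, hi))
    return windows


if __name__ == "__main__":
    print(assembly_windows("INS", 5_000, 5_001))
    print(assembly_windows("DEL", 10_000, 20_000))
```

For a large deletion this yields two separate windows, so only reads near each breakpoint are assembled rather than everything across the deleted region.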
  • 14. Breakpoint Improvement (simulated)
  • 15. Breakpoint Improvement (Real data) Yalcin and Wong et al, in prep
  • 16. Application to HapMap trio dataset
       High-depth HapMap trio (NA18506, NA18507, NA18508)
         42x, 42x and 40x
       Reads processed through Vert. Reseq. Pipeline
         Aligned to the GRCh37 human reference using BWA
         Single BAM file for each individual
       BreakDancerMax, Pindel, RDXplorer, SECluster, and RetroSeq
       Exclude calls
         Within 600 bp of a reference sequence gap
         Within 1 Mb of a centromere or telomere
       Computational validation of raw candidate calls
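The exclusion rule above (600 bp from a reference gap, 1 Mb from a centromere or telomere) can be sketched as an interval-proximity filter. The data layout (dicts of per-chromosome interval lists), the function names, and the coordinates in the usage example are hypothetical and serve only to show the idea.

```python
GAP_MARGIN = 600            # bp, from the slide
CEN_TEL_MARGIN = 1_000_000  # bp, from the slide


def near(start: int, end: int, region, margin: int) -> bool:
    """True if [start, end) lies within `margin` bp of the half-open region."""
    r_start, r_end = region
    return start < r_end + margin and end > r_start - margin


def keep_call(call, gaps, cen_tel) -> bool:
    """Return False for calls too close to a gap, centromere, or telomere.

    `call` is (chrom, start, end); `gaps` and `cen_tel` map a chromosome name to
    a list of (start, end) regions (an assumed layout for this sketch).
    """
    chrom, start, end = call
    if any(near(start, end, r, GAP_MARGIN) for r in gaps.get(chrom, [])):
        return False
    if any(near(start, end, r, CEN_TEL_MARGIN) for r in cen_tel.get(chrom, [])):
        return False
    return True


if __name__ == "__main__":
    # Illustrative coordinates only, not real annotation.
    gaps = {"chr1": [(10_000, 20_000)]}
    cen_tel = {"chr1": [(121_000_000, 125_000_000)]}
    calls = [("chr1", 20_500, 21_000), ("chr1", 30_000, 31_000)]
    print([c for c in calls if keep_call(c, gaps, cen_tel)])
```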
  • 17. NA18506 Results
  • 18. Do multiple callers discover more SVs?
  • 19. How do the calls measure up?
       Compared the overlap of the deletion, gain, and inversion calls against the curated Database of Genomic Variants
         Overlapped with calls in DGV at a rate significantly higher than expected by random chance
         Deletions in DGV: 71% (NA18506), 81% (NA18507), and 71% (NA18508)
         Copy number gains in DGV: 29% (NA18506), 32% (NA18507), and 36% (NA18508)
         Inversions in DGV: 47% (NA18506), 69% (NA18507), and 51% (NA18508)
       Child calls not in DGV also called in the parents
         Further 18% deletions, 32% inversions, 54% duplications
         Estimated max. false positive rate of 11%, 21%, and 17%
       All child-only SV calls comprise 11% of the child's final SV call set
         Considerable improvement from 'merged raw' (50% unique)
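The slide does not spell out how the maximum false positive rates were derived. One reading that reproduces the quoted figures is to treat any child call found neither in DGV nor in a parent as a potential false positive. The sketch below follows that reading, and assumes that NA18506 is the child and that "duplications" correspond to the copy number gain calls.

```python
# Percentages of child (assumed NA18506) calls, taken from the slide.
in_dgv = {"deletion": 71, "inversion": 47, "gain": 29}
# Further calls not in DGV but also called in a parent.
in_parents_only = {"deletion": 18, "inversion": 32, "gain": 54}

# Upper bound on false positives: calls with neither DGV nor parental support.
max_false_positive = {
    sv_type: 100 - in_dgv[sv_type] - in_parents_only[sv_type]
    for sv_type in in_dgv
}

print(max_false_positive)  # {'deletion': 11, 'inversion': 21, 'gain': 17}
```

These are upper bounds because an unsupported child call may still be a genuine variant that is absent from DGV and was missed in both parents.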
  • 20. Complex SV Types Yalcin and Wong et al, in prep
  • 21. Future Work
       SVMerge primarily a discovery and validation tool
         Extensible pipeline so that calls from any method can be easily incorporated
       Developed primarily for mouse genomes project
         Successfully applied to human trio dataset
         Computational validation approach reduces false positives
       Complex SVs
         Cataloging repeating combinations of multiple SV events in small loci
       2011 development
         Low coverage cross-population SV discovery
         Genotyping existing SVs in new samples
         Better support for heterozygous calls
         Integration of SVMerge into Vert. Reseq. pipeline for UK10K