Effect of Repeats on the
Characterization of Structural Variation
Nancy F. Hansen, Ph.D.
September 15, 2016
Outline of my talk
• Description of “PBRefine” callset
• Refinement of regions by alignment of PacBio assemblies to the human reference (Build37) with nucmer (MUMmer3.23, Kurtz et al., Genome Biology (2004))
• Characterization of SVs using mummerplot dot plots
• Role of repeats in curation of structural variation
• Ambiguities in the positions of insertions and deletions due to repeats
• “Correct” answer can be dependent on alignment algorithm
• Evidence from different technological platforms can point to different breakpoints
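The positional ambiguity of deletions inside repeats can be illustrated on a toy sequence. The sketch below is not taken from the talk or from PBRefine: the function name `deletion_placement_range` and the example sequence are hypothetical, and coordinates are assumed to be 0-based. It finds every start position at which removing a segment of the same length yields the identical resulting sequence.

```python
def deletion_placement_range(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` gives the same sequence as deleting at `start`."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deleted segment must lie inside the reference")

    # Shifting the deletion one base to the left leaves the result unchanged
    # when the base entering the deletion equals the base leaving it.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Same argument for shifting one base to the right.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Remove ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


if __name__ == "__main__":
    reference = "GATTACACACAGT"  # toy sequence with a short CA/AC repeat
    lo, hi = deletion_placement_range(reference, start=7, length=2)
    print(lo, hi)
    # Every start in [lo, hi] produces the same post-deletion sequence.
    print({apply_deletion(reference, s, 2) for s in range(lo, hi + 1)})
```

In this toy case a 2-base deletion reported at position 7 could equally be reported anywhere from position 4 to position 9, which is why two callers or two aligners can disagree on a breakpoint while describing the same allele.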
The PBRefine Pipeline
1. Extract the reference sequence surrounding each variant prediction
2. Align the reference sequence to the PB assembly* with MUMmer
3. Count end-to-end alignments
• More than 2: discard region as repetitive
• 2 or fewer: align assembly region back to reference with MUMmer, then characterize variants
* CA and hybrid Falcon assemblies for all three trio members
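The branching step of the pipeline can be sketched as follows. This is an illustration under stated assumptions, not the PBRefine code: the `SpanningHit` record, the `slack` tolerance used to decide what counts as "end-to-end", and the separate outcome for zero hits are all hypothetical. Only the "more than 2" versus "2 or fewer" rule comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanningHit:
    """One contig's alignment to the extracted reference region.
    Coordinates are 1-based and inclusive, on the extracted region."""
    contig: str
    region_start: int
    region_end: int


def count_end_to_end(hits, region_length: int, slack: int = 50) -> int:
    """Count hits that cover the region from end to end.
    `slack` (an assumption) allows a few unaligned bases at either end."""
    return sum(
        1
        for h in hits
        if h.region_start <= 1 + slack and h.region_end >= region_length - slack
    )


def triage_region(hits, region_length: int, slack: int = 50) -> str:
    """Apply the slide's rule: more than 2 end-to-end alignments means the
    region is treated as repetitive; otherwise it is realigned and characterized."""
    n = count_end_to_end(hits, region_length, slack)
    if n > 2:
        return "discard_repetitive"
    if n == 0:
        # Assumption: with no spanning contig there is nothing to realign.
        return "no_end_to_end_alignment"
    return "realign_and_characterize"


if __name__ == "__main__":
    hits = [
        SpanningHit("contig_a", 1, 10_000),
        SpanningHit("contig_b", 20, 9_990),
        SpanningHit("contig_c", 4_000, 10_000),  # partial, not end-to-end
    ]
    print(triage_region(hits, region_length=10_000))
```

A limit of two is consistent with a diploid sample, where at most two haplotype contigs are expected to span a unique region.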
Why long read assemblies for structural variant prediction?
• Continuity
• Consensus accuracy
Why not long read assemblies?
• Often assemblers will miss the second haplotype for diploid organisms
Accurate positions, accurate consensus for novel inserted sequences
Inaccurate genotypes for heterozygotes, which are labeled as homozygotes
How often are variants confirmed?
1. Consider only SVs for which there are one or two contigs found in the assembly
2. Require consistent position and variant type
Variant type  Total calls  Assembler         Confirmed in HG002  Confirmed in HG003  Confirmed in HG004
Overall       6,784        Mt. Sinai/Falcon  1,851 (27.3%)       1,729 (25.5%)       1,708 (25.2%)
Overall       6,784        NHGRI/CA          1,808 (26.7%)       1,565 (23.1%)       1,545 (22.8%)
Insertions    743          Mt. Sinai/Falcon  171 (23.0%)         157 (21.1%)         156 (21.0%)
Insertions    743          NHGRI/CA          155 (20.9%)         134 (18.0%)         130 (17.5%)
Deletions     6,041        Mt. Sinai/Falcon  1,680 (27.8%)       1,572 (26.0%)       1,552 (25.7%)
Deletions     6,041        NHGRI/CA          1,653               1,431               1,415
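The percentages in the table appear to be confirmed calls divided by the total calls for that variant type. The short sketch below recomputes them from the counts on the slide as a consistency check; the dictionary layout and the function names `confirmation_rate` and `rates_table` are hypothetical, and rounding to one decimal place is an assumption based on how the slide prints its values.

```python
# Counts copied from the table above, in the order (HG002, HG003, HG004).
TOTAL_CALLS = {"Overall": 6784, "Insertions": 743, "Deletions": 6041}

CONFIRMED = {
    ("Overall", "Mt. Sinai/Falcon"): (1851, 1729, 1708),
    ("Overall", "NHGRI/CA"): (1808, 1565, 1545),
    ("Insertions", "Mt. Sinai/Falcon"): (171, 157, 156),
    ("Insertions", "NHGRI/CA"): (155, 134, 130),
    ("Deletions", "Mt. Sinai/Falcon"): (1680, 1572, 1552),
    ("Deletions", "NHGRI/CA"): (1653, 1431, 1415),
}


def confirmation_rate(confirmed: int, total: int) -> float:
    """Percentage of calls confirmed, rounded to one decimal place."""
    return round(100.0 * confirmed / total, 1)


def rates_table() -> dict:
    """Map (variant type, assembler) to the three per-sample percentages."""
    return {
        key: tuple(confirmation_rate(c, TOTAL_CALLS[key[0]]) for c in counts)
        for key, counts in CONFIRMED.items()
    }


if __name__ == "__main__":
    # Insertions and deletions should add up to the overall call count.
    assert TOTAL_CALLS["Insertions"] + TOTAL_CALLS["Deletions"] == TOTAL_CALLS["Overall"]
    for (variant_type, assembler), rates in rates_table().items():
        print(f"{variant_type:<11}{assembler:<18}{rates}")
```

The same function applies to the NHGRI/CA deletion row, for which the slide lists counts without percentages.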
(Mummerplot, Adam Phillippy)
Simple deletion
[Figure: dot plot, Reference vs. Assembly, with the reference gap Dr marked]
Size of deletion = Dr
Deletion flanked by repeated sequence
[Figure: dot plot, Reference vs. Assembly, with Dr and Dc marked]
Size of deletion = Dr - Dc
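The relation "Size of deletion = Dr - Dc" can be worked through with two alignment blocks on either side of a breakpoint. The sketch below is an interpretation, not the talk's implementation: it assumes Dr and Dc are signed gaps between consecutive blocks on the reference and on the contig, so that an overlap of the two blocks on the contig (as produced by a flanking repeat) gives a negative Dc. The `Block` record, the function names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One forward-strand alignment block; coordinates are 1-based, inclusive."""
    ref_start: int
    ref_end: int
    ctg_start: int
    ctg_end: int


def signed_gap(end_of_first: int, start_of_second: int) -> int:
    """Bases strictly between two blocks; negative when the blocks overlap."""
    return start_of_second - end_of_first - 1


def deletion_size(upstream: Block, downstream: Block) -> int:
    """Size of deletion = Dr - Dc, using signed gaps (an assumed convention)."""
    dr = signed_gap(upstream.ref_end, downstream.ref_start)  # gap on the reference
    dc = signed_gap(upstream.ctg_end, downstream.ctg_start)  # gap on the contig
    return dr - dc


if __name__ == "__main__":
    # Simple deletion: blocks abut on the contig, so Dc = 0 and size = Dr.
    simple = deletion_size(Block(1, 1000, 1, 1000), Block(1501, 2500, 1001, 2000))

    # Repeat-flanked deletion: a 100 bp repeat aligns in both blocks, so the
    # blocks overlap by 100 bp on the contig (Dc = -100) while Dr = 400.
    flanked = deletion_size(Block(1, 1100, 1, 1100), Block(1501, 2600, 1001, 2100))

    print(simple, flanked)
```

Under this convention both toy cases remove 500 bases from the reference, even though the apparent gap on the reference differs (500 versus 400), which is the kind of discrepancy a flanking repeat introduces.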
Simple insertion with duplication of flanking sequence
[Figure: dot plot, Reference vs. Assembly: simple insertion]
Insertion of an additional copy of a tandem repeat
[Figure: dot plot: tandem insertion]
Inversions
[Figure: dot plot: inversion]
Deletion of one copy of a tandem inverted repeat
[Figure: dot plot: tandem inverted repeat deletion]
• Thank you!
• Jim Mullikin
• Adam Phillippy
• Sergey Koren
• Brian Walenz
• Ali Bashir
