Potato SNPs


Dan Bolser and David Martin

  Next Gen Bug, Dundee
       01/18/2010



Aims of the work
1) Learn about handling RNASeq
     - Create a SNP calling pipeline

2) Select SNPs for genetic mapping
     - Using Illumina's GoldenGate SNP chip (OPA)

Creating a SNP calling pipeline




Align (using BWA)
1) Index the potato genome assembly
bwa index [-a bwtsw|div|is] [-c] <in.fasta>
2) Perform the alignment
bwa aln [options] <in.fasta> <in.fq> > <in.sai>
3) Output results in SAM format (single end)
bwa samse <in.fasta> <in.sai> <in.fq> > <out.sam>

Align (using Bowtie)
1) Index the potato genome assembly
bowtie-build [options] <in.fasta> <ebwt>
2) Perform the alignment and output results (-S for SAM output)
bowtie [options] -S <ebwt> <in.fq> <out.sam>

Convert (using SAMtools)
1) Convert SAM to BAM for sorting
samtools view -S -b <in.sam> > <in.bam>
2) Sort BAM for SNP calling
samtools sort <in.bam> <out.bam.s>

  Alignments are both compressed for long-term
storage and sorted for variant discovery.

Coverage profiles / Depth vectors

SAMtools...

    Dump a coverage profile
samtools mpileup -f <in.fasta> <my.bam.s>
    P1   244526   A   10   ...,.,,,..      BBQa`aaaa[
    P1   244527   A   10   ...,.,,,..      BBZ_`^a_a[
    P1   244528   C   10   .$.$.,.,,,..    >>RaZ`aaaa
    P1   244529   C    8   .,.,,,..        NaXaaaa`
    P1   244530   T    8   .,.,,,..        Xa_aaa`
    P1   244531   C    8   .,.,,,..        Rbabbaa
    P1   244532   T    9   .,.,,,..^~.     EE^^^^^^A
    P1   244533   T    9   .,.,,,...       BBB
    P1   244534   T    9   .$,$.,,,...     @@^^^^^^E

SAMtools Bio::DB::Sam (BioPerl)
Dump a coverage profile 2

P41630
Matches : 9
0233333333333345555555555
 666778888888899999999999
 999999999999999999999999
 999976666666666665444444
 44443332211111111000
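
One plausible reading of the digit string above is a depth vector for the sequence P41630: one character per base, with the depth capped at 9 so it fits in a single digit. The slide does not say this, so treat it as an assumption. Under that assumption, a small Python sketch of building such a vector from alignment intervals (function names are hypothetical, and this is not the Bio::DB::Sam code) could look like this:

```python
def depth_vector(length, alignments):
    """Count how many alignments cover each base.

    alignments: iterable of (start, end) pairs, 0-based, end exclusive.
    """
    depth = [0] * length
    for start, end in alignments:
        # Clip each interval to the sequence before counting.
        for i in range(max(start, 0), min(end, length)):
            depth[i] += 1
    return depth

def as_digits(depth, cap=9):
    """Render a depth vector as one digit per base, capping values at `cap`."""
    return "".join(str(min(d, cap)) for d in depth)

# Three toy reads on a 6-base sequence.
print(as_digits(depth_vector(6, [(0, 3), (1, 5), (1, 4)])))
```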

mpileup

    samtools mpileup collects summary
    information in the input BAMs, computes the
    likelihood of data given each possible
    genotype and stores the likelihoods in the
    BCF format.

    bcftools view applies the prior and does the
    actual calling.

    Finally, we filter.
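
To make the split between "likelihood" and "prior" concrete, here is a deliberately simplified Python sketch. It assumes a diploid, biallelic site, independent base errors taken from Phred qualities, and made-up priors; it is not the model SAMtools or bcftools actually implements, and the function names are hypothetical.

```python
import math

def genotype_log_likelihoods(bases, quals, ref, alt):
    """log10 P(observed bases | genotype) for genotypes RR, RA and AA."""
    result = {}
    for name, alt_copies in (("RR", 0), ("RA", 1), ("AA", 2)):
        alt_fraction = alt_copies / 2
        total = 0.0
        for base, qual in zip(bases, quals):
            err = 10 ** (-qual / 10)  # Phred quality to error probability
            p_if_ref = (1 - err) if base == ref else err / 3
            p_if_alt = (1 - err) if base == alt else err / 3
            total += math.log10((1 - alt_fraction) * p_if_ref
                                + alt_fraction * p_if_alt)
        result[name] = total
    return result

def call_genotype(log_likelihoods, priors):
    """Add log10 priors to the log10 likelihoods and return the best genotype."""
    posterior = {g: log_likelihoods[g] + math.log10(priors[g])
                 for g in log_likelihoods}
    return max(posterior, key=posterior.get)

# Illustrative priors only: most sites are assumed to match the reference.
priors = {"RR": 0.998, "RA": 0.001, "AA": 0.001}
one_odd_read = genotype_log_likelihoods("AAAAAAAG", [30] * 8, "A", "G")
print(max(one_odd_read, key=one_odd_read.get))  # best by likelihood alone
print(call_genotype(one_odd_read, priors))      # best after applying the prior
```

With a single non-reference read out of eight, the heterozygote has the highest likelihood, but the prior tips the call back to the reference genotype. That is the sense in which the calling happens in the second step.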
SNP call
1) Index the potato genome assembly (again!)
samtools faidx in.fasta
2) Run 'mpileup' to generate BCF (the binary form of VCF)
samtools mpileup -ug -f in.fasta my1.bam.s my2.bam.s > my.raw.bcf

    No SNPs have been called at this point: mpileup only turns the
    BAM alignments into per-site genotype likelihoods, stored as BCF.

VCF format




A standard format for sequence variation:
  SNPs, indels and structural variants.
Can be compressed and indexed.
Developed for the 1000 Genomes Project.
VCFtools is to VCF what SAMtools is to SAM.
Specification and tools available from
 http://vcftools.sourceforge.net
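
For orientation, each VCF data line is tab-separated and starts with eight fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The Python sketch below parses one such line; the example record is invented for illustration (it is not output from this pipeline), and the function name is hypothetical.

```python
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are allowed
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    info = {}
    for item in record["INFO"].split(";"):
        if item == ".":
            continue  # "." means the column is empty
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # keys without "=" are flags
    record["INFO"] = info
    return record

# Invented example record.
rec = parse_vcf_line("P1\t244530\t.\tT\tC\t45.0\t.\tDP=8;MQ=40")
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"]["DP"])
```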
SNP call and filter
1) Call SNPs
bcftools view -bvcg my.raw.bcf > my.var.bcf
2) Filter SNPs
bcftools view my.var.bcf | vcfutils.pl varFilter > my.var.bcf.filt
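
To show the kind of thing a filtering step does, here is a small Python sketch that keeps VCF records by call quality and read depth. It is not a reimplementation of vcfutils.pl varFilter; the thresholds and function names are placeholders chosen for illustration.

```python
def passes_filter(vcf_line, min_qual=20.0, min_depth=3, max_depth=100):
    """Return True if a VCF data line meets the QUAL and DP thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    if qual == "." or float(qual) < min_qual:
        return False
    depth = None
    for item in fields[7].split(";"):
        if item.startswith("DP="):
            depth = int(item[3:])
    if depth is None:
        return False  # no depth recorded, so the record cannot be checked
    return min_depth <= depth <= max_depth

def filter_vcf(lines, **thresholds):
    """Keep header lines (starting with '#') and data lines that pass."""
    return [line for line in lines
            if line.startswith("#") or passes_filter(line, **thresholds)]

# Invented records: one good, one low quality, one too shallow.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "P1\t244530\t.\tT\tC\t45.0\t.\tDP=8",
    "P1\t244600\t.\tA\tG\t5.0\t.\tDP=12",
    "P1\t244700\t.\tG\tA\t60.0\t.\tDP=2",
]
for line in filter_vcf(records):
    print(line)
```

With RNASeq data, depth follows expression level, so any upper depth limit would need to be chosen with care.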


Aims of the work
1) Learn about handling RNASeq
     - Create a SNP calling pipeline

2) Select SNPs for genetic mapping
     - Using Illumina's GoldenGate SNP chip (OPA)

Select SNPs for genetic mapping
 Using Illumina's GoldenGate SNP chip (OPA)




SNP chip (OPA) construction

    A set of DM SNP positions was provided by
    the SolCAP project (RNASeq derived).

    A subset was selected for developing OPAs
    (Illumina’s SNP chip technology).

    OPAs were run, and results have now been
    compared to RNASeq.
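
A comparison of this kind can be summarised as a concordance rate over the positions genotyped by both methods. The Python sketch below assumes each call set is a dictionary from (sequence, position) to a genotype string such as "AG"; that data layout, the toy values and the function name are assumptions for illustration, not the format used in this work.

```python
def concordance(chip_calls, rnaseq_calls):
    """Compare genotype calls at positions present in both call sets."""
    def norm(genotype):
        # Treat "AG" and "GA" as the same unordered genotype.
        return "".join(sorted(genotype))

    shared = sorted(set(chip_calls) & set(rnaseq_calls))
    discordant = [key for key in shared
                  if norm(chip_calls[key]) != norm(rnaseq_calls[key])]
    agree = len(shared) - len(discordant)
    return {
        "shared": len(shared),
        "concordant": agree,
        "rate": agree / len(shared) if shared else None,
        "discordant": discordant,
    }

# Toy data only.
chip = {("P1", 100): "AG", ("P1", 200): "CC", ("P2", 50): "TT"}
rnaseq = {("P1", 100): "GA", ("P1", 200): "CT", ("P3", 5): "AA"}
print(concordance(chip, rnaseq))
```

The "discordant" list is the natural starting point for looking more closely at individual SNPs.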


Comparison (using an early SAMtools)

Comparison (using new SAMtools)

Looking into the RNASeq data…




[Diagram: potato genome assembly and two RNASeq read libraries]

A lot more questions to answer…

    Track down more ‘strange’ SNPs based on
    the expected AFS of the two samples.

    Go beyond biallelic SNPs

    Check the OPA base...
    −   Was the right base probed by the chip?




Thank you for your patience!




OPAs in 5 steps...

    The DNA sample is activated for binding to paramagnetic particles.

    Three oligos are designed for each SNP locus: two Allele-Specific
    Oligos (ASOs), one for each allele of the SNP site, and one
    Locus-Specific Oligo (LSO).

    Several wash steps remove excess and mis-hybridized oligos.

    Extension of the appropriate ASO and ligation to the LSO joins
    information about the genotype to the address sequence on the LSO.

    The single-stranded, dye-labeled DNAs are hybridized to their
    complement bead type through their unique address sequences.

    Key to the assay:
    Scalable, multiplexing sample preparation (one tube reaction).
    Highly parallel array-based read-out.
    High-quality data: average call rates above 99% accuracy.


Editor's Notes

  1. All three oligo sequences contain regions of genomic complementarity and universal PCR primer sites; the LSO also contains a unique address sequence that targets a particular bead type. Up to 1,536 SNPs may be interrogated simultaneously in this manner. During the primer hybridization process, the assay oligos hybridize to the genomic DNA sample bound to paramagnetic particles. Because hybridization occurs prior to any amplification steps, no amplification bias can be introduced into the assay.
  2. Extension of the appropriate ASO and ligation of the extended product to the LSO joins information about the genotype present at the SNP site to the address sequence on the LSO. Allele-specific primer extension (ASPE): this step is used to preferentially extend the correctly matched ASO (at the 3' end) up to the 5' end of the LSO primer.
  3. There is a one-to-one mapping between an address sequence on the array and the locus being scored. As a result of this labeling scheme, the PCR product consists of double-stranded DNA of which one strand, containing the complement to the Illumicode, is labeled with either Cy3 or Cy5 in an allele-specific manner, and a complementary strand labeled with biotin. The biotinylated strand is removed and the single, fluorescently labeled strand is hybridized to the BeadArray.