[I0D51A] Bioinformatics: High-Throughput Analysis
 Next-generation sequencing. Part 1: Technologies
Prof Jan Aerts
Faculty of Engineering - ESAT/SCD
jan.aerts@esat.kuleuven.be

TA: Alejandro Sifrim (alejandro.sifrim@esat.kuleuven.be)




                                                           1
Announcements

May 27th (9am-noon): evaluation


open book




                                  2
Note to self...

Upload s_1_sequence.txt and s_2_sequence.txt to Galaxy first...




                                                                 3
Overview

• linux refresher (6/5)


• next-generation sequencing technologies and applications (6/5)


• sequence mapping (13/5)


• variant calling - SNPs (20/5)


• variant calling - structural variation (20/5)




                                                                   4
Linux Refresher...




                     5
Next-generation sequencing technologies




                                          6
General principle




                    7
Big data...




              8
First vs second generation sequencing
Sanger sequencing (1st gen)   2nd/next gen sequencing




                                                 Shendure & Ji, 2008




                                                                       9
Paired-end sequencing




                        Korbel et al, 2007




                                             10
General approaches

• 2nd generation: clonally amplified single molecules


  • Roche 454 pyrosequencing


  • Illumina Genome Analyzer -> HiSeq: reversible terminator technology


  • ABI SOLiD: ligation-based extension


• Next-next-generation/3rd generation: true single molecule


  • Helicos: HeliScope


  • Pacific Biosciences: SMRT
                                                                          11
Mardis, 2011

               12
Steps


        genome enrichment




                    template preparation



                              sequencing and imaging



                                           data analysis




                                                           13
A. Genome enrichment




                       14
Sequencing costs




                   15
What?

Only sequence relevant parts of the genome instead of whole genome, e.g.:


• specific Mb-scale regions known to be involved in particular disease (e.g.
  based on GWAS)


• specific candidate genes belonging to disease pathway


• exome (= all exons)


 => how to isolate these from non-target sequence? “pulldown”




                                                                              16
Pulldown: on-array




                     Turner et al, 2009




                                          17
Pulldown: in-solution




                        Turner et al, 2009




                                             18
Performance metrics

• fold-enrichment: ratio of abundance of target sequences post-enrichment vs
  pre-enrichment


• capture specificity: fraction of sequence reads that map to target


• uniformity: relative abundance of individual targets after enrichment


• completeness: fraction of target bases detectably captured
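
The four metrics above can be written down as small functions. This is only a sketch: the function names, the read counts and the 30 Mb target in a 3,000 Mb genome are made-up values for illustration, and it assumes that the fraction of reads on target is a fair stand-in for the abundance of target sequence after enrichment.

```python
def fold_enrichment(target_fraction_post, target_fraction_pre):
    """Ratio of target abundance after enrichment to target abundance before."""
    return target_fraction_post / target_fraction_pre


def capture_specificity(reads_on_target, total_mapped_reads):
    """Fraction of sequence reads that map to the target."""
    return reads_on_target / total_mapped_reads


def uniformity(per_target_depth):
    """Relative abundance of each target: its depth divided by the mean depth."""
    mean_depth = sum(per_target_depth) / len(per_target_depth)
    return [depth / mean_depth for depth in per_target_depth]


def completeness(per_base_depth, min_depth=1):
    """Fraction of target bases covered by at least min_depth reads."""
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return covered / len(per_base_depth)


# Hypothetical numbers, for illustration only.
spec = capture_specificity(reads_on_target=600_000, total_mapped_reads=1_000_000)
# A 30 Mb target in a 3,000 Mb genome is 1% of the input before enrichment.
fold = fold_enrichment(target_fraction_post=spec, target_fraction_pre=30 / 3000)
comp = completeness([0, 3, 12, 7, 0, 25, 9, 14])
rel = uniformity([10, 20, 30])
print(f"specificity={spec:.2f} fold-enrichment={fold:.0f} completeness={comp:.2f}")
```

What counts as "detectably captured" is a choice: min_depth=1 is the most lenient reading, and a variant-calling project would probably set it higher.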




                                                                           19
B. Template preparation




                          20
Problem: most imaging systems not designed to detect single fluorescent event
=> need amplified templates


Aim: to produce a representative, non-biased source of nucleic acid material
from the genome under investigation => population of identical templates


Steps:


   1. shear DNA


   2. amplify templates


 Options: emulsion PCR (emPCR) or solid phase amplification

                                                                               21
Amplification by emulsion PCR

emulsion = mixture of two or more immiscible (unblendable) liquids; e.g.
mayonnaise, vinaigrette


emPCR: thousands of microreactors/micro-eppendorfs


one bead + one DNA molecule per microreactor => PCR to 1000s of copies




                                                                           22
Williams et al, 2006




 Metzker, 2010


                       23
Solid-phase amplification




                                             http://bit.ly/6JYIUz




http://www.youtube.com/watch?v=77r5p8IBwJk&NR=1
                                                                    Metzker, 2010
                                                                                       24
C. Sequencing and imaging




                            25
Sequencing and imaging

Technologies:


1. cyclic reversible termination


2. sequencing by ligation


3. pyrosequencing


4. real-time sequencing




                                   26
Cyclic reversible termination

DNA synthesis is terminated after adding single nucleotide


start/stop/start/stop/start/stop/...

                            Illumina: 4-colour



sequencing result
                      sequencing steps




                               Metzker, 2010
                                                             27
Helicos: 1-colour




         sequencing steps




sequencing result




                                      Metzker, 2010




          Metzker, 2010



                                                            28
Sequencing by ligation




   http://bit.ly/fPh22X




sequencing steps




                          29
sequencing result




http://bit.ly/fPh22X




                       30
Pyrosequencing




                                  Metzker, 2010




            Metzker, 2010                         31
Real-time sequencing




                    “ZMW” zero-mode waveguide
   DNA polymerase

                                        “strobe sequencing”


                                                              32
Platform     Run time   Gb/run

Roche 454    8.5 hr     0.45
Illumina     9 days     35
SOLiD        14 days    50
Helicos      8 days     37
PacBio       ?          ?


                                33
Accuracy - base calling error

• base quality drops along read


        Sanger > SOLiD > Illumina > 454 > Helicos


        (“dephasing” within clusters)




• base calling errors




                                                    34
Accuracy - homopolymer runs

 Issue for Roche 454:


   39% of errors are homopolymers


      A5 motifs: 3.3% error rate


      A8 motifs: 50% error rate


   Reason: use signal intensity as a measure for homopolymer length
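
Why signal intensity is a weak measure for long homopolymers can be shown with a toy base caller. This is a sketch, not the Roche software: the function name, the TACG flow order and the normalised signal values (1.0 meaning roughly one incorporated base) are assumptions for illustration.

```python
def call_flowgram(flow_order, intensities):
    """Turn per-flow signal intensities into bases by rounding to the nearest integer."""
    bases = []
    for nucleotide, signal in zip(flow_order, intensities):
        run_length = int(signal + 0.5)  # nearest whole number of incorporated bases
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical normalised signals, for illustration only.
flows = "TACG" * 2
signals = [1.02, 0.05, 0.97, 2.10, 0.03, 4.45, 0.00, 1.08]
print(call_flowgram(flows, signals))      # prints TCGGAAAAG
print(call_flowgram("A", [4.55]))         # prints AAAAA
```

A signal of 4.45 is called as four A's and 4.55 as five. The gap between neighbouring run lengths stays at one unit while the signal itself grows, so a small amount of noise is enough to flip the call for long runs.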




                                                                      35
36
Ronaghi, Genome Res 11:3-11 (2001)




                                     37
http://mammoth.psu.edu/labPhotos/imageOfFlowgram.jpg




                                                       38
Is it 4? Is it 5? Is it 4?




      http://mammoth.psu.edu/labPhotos/imageOfFlowgram.jpg




                                                             39
Consensus accuracy

Increase accuracy for SNP calling by increasing coverage:


   Illumina: 20X


   SOLiD: 12X


   454: 7.4X


   Sanger: 3X


Factors: raw accuracy + read length


How deep do you have to sequence? => Poisson distribution: “If you sequence at
an average of 10X, how much of the genome will be covered at least 5X?”
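
The Poisson question can be answered in a few lines. The sketch below assumes that the depth at each base is Poisson-distributed with the average coverage as its mean, which ignores GC bias and mapping problems; the function name is made up for illustration. It computes P(depth >= k) = 1 - sum over i < k of e^(-m) m^i / i!.

```python
import math


def fraction_covered_at_least(mean_coverage, min_depth):
    """P(depth >= min_depth) when per-base depth is Poisson with the given mean."""
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - p_below


print(f"{fraction_covered_at_least(10, 5):.3f}")  # prints 0.971
```

Under this model about 97% of the genome is covered at least 5X when sequencing at an average of 10X.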

                                                                                 40
Bentley et al, Nature 456:53-59 (2008)




                                         41
FASTQ file format
Each fastq entry has four lines:

   1. “@” + identifier

   2. sequence

   3. “+” + identifier (optional)

   4. phred-based quality scores

(figures: example fasta entries (n=2); example fastq entries (n=2); phred quality score encoding)

                                                                Wikipedia
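
The four-line entry layout and the phred encoding can be sketched as a small reader. The function names and the example record are made up for illustration. The offsets are the usual ones: 33 for Sanger-style files and 64 for Illumina 1.3+ files.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality_string) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # "+" line; repeating the identifier is optional
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string: score = ASCII code of the character - offset."""
    return [ord(char) - offset for char in quality]


def error_probability(score):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


# Made-up record, for illustration only.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual))  # prints read1 ACGT [40, 40, 20, 0]
```

A score of 20 corresponds to a 1-in-100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.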

                                                                                 42
Sequence quality control

Is this good sequence? (essential!)


E.g.: using FastQC tool (Babraham Institute, UK;
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)




                                                           43
Sequence quality control
              per base sequence quality
                    good         bad




                                          44
Sequence quality control
              per sequence quality scores
                    good         bad




                                            45
Sequence quality control
              per base sequence content
                   good         bad




                                          46
Sequence quality control
                per base GC content
                  good         bad




                                      47
Sequence quality control
               per sequence GC content
                   good        bad




                                         48
Sequence quality control
                   k-mer content
                  good       bad




                                   49
Intermezzo: Galaxy




                     50
Online genome analysis

http://galaxy.psu.edu/


“Galaxy allows you to do analyses you cannot do anywhere else without the
need to install or download anything. You can analyze multiple alignments,
compare genomic annotations, profile metagenomic samples and much much
more...”




                                                                             51
52
53
Applications of next-generation sequencing




                                             54
Kahvejian et al, 2008


                        55
DNA-seq, ChIP-seq, RNA-seq

                        Kahvejian et al, 2008

                                                56
Example uses: identify sequence variations, identify pathogens

DNA-seq, ChIP-seq, RNA-seq

                                    Kahvejian et al, 2008

                                                                         57
Exercises




            58
Try to log in to the server mentioned on Toledo with the username and password
provided there.



There are 2 FASTQ files in /mnt/homes/jaerts/: s_1_sequence.txt and
s_2_sequence.txt (= paired ends)



  • How many sequences are in s_1_sequence.txt?


  • What encoding was used for the quality score? Illumina? Sanger?


  • What are the numerical quality scores for the first sequence in
    s_1_sequence.txt (i.e. 7172283/1)?
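
One possible way to approach the first two questions, as a sketch only: it assumes exactly four lines per record and guesses the encoding from the lowest quality character seen. The function names are made up, and the tiny temporary file stands in for s_1_sequence.txt.

```python
import os
import tempfile


def count_fastq_records(path):
    """Number of sequences, assuming exactly four lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def guess_quality_offset(path, max_records=1000):
    """Guess the ASCII offset from the lowest quality character seen.

    Characters below ';' (ASCII 59) only occur with offset 33 (Sanger).
    If none are seen, offset 64 (Illumina) is the likelier guess.
    """
    lowest = None
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 4 * max_records:
                break
            if line_number % 4 == 3:  # fourth line of each record holds the qualities
                line = line.rstrip("\n")
                if line:
                    low = min(line)
                    lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        raise ValueError("no quality lines found")
    return 33 if ord(lowest) < 59 else 64


# Tiny made-up file, for illustration only.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("@r1/1\nACGT\n+\nhhhB\n@r2/1\nGGCA\n+\nhgfB\n")
demo_path = tmp.name
n_records = count_fastq_records(demo_path)
offset = guess_quality_offset(demo_path)
os.remove(demo_path)
print(n_records, offset)  # prints 2 64
```

The guess is only a heuristic: a Sanger-encoded file that happens to contain only high scores would be mistaken for offset 64, so it is worth checking more than a handful of reads.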




                                                                            59
• Create an account on the Galaxy server



• Download s_1_sequence.txt and s_2_sequence.txt from Toledo and upload
  them into Galaxy. These files are also available on the linux server



• Have a look at the contents of s_1_sequence.txt.



• Convert quality scores to numeric values for s_1_sequence.txt (“FASTQ
  Groomer”)



• Draw the quality score boxplot for s_1_sequence.txt



• Draw the nucleotide distribution chart for s_1_sequence.txt
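
The numbers behind the last two charts can also be computed by hand. This is a sketch and not what Galaxy runs: the function names and the three reads are made up, and the quality strings are assumed to use offset 33, as after conversion to Sanger encoding.

```python
from collections import Counter
from statistics import median


def per_position_base_counts(sequences):
    """Count each base at each read position (input for a nucleotide distribution chart)."""
    length = max(len(seq) for seq in sequences)
    counts = [Counter() for _ in range(length)]
    for seq in sequences:
        for position, base in enumerate(seq):
            counts[position][base] += 1
    return counts


def per_position_median_quality(quality_strings, offset=33):
    """Median phred score at each read position (the centre line of a quality boxplot)."""
    length = max(len(q) for q in quality_strings)
    columns = [[] for _ in range(length)]
    for quality in quality_strings:
        for position, char in enumerate(quality):
            columns[position].append(ord(char) - offset)
    return [median(column) for column in columns]


# Made-up reads, for illustration only.
reads = ["ACGT", "ACGA", "TCGA"]
quals = ["IIII", "II5!", "I5!!"]
base_counts = per_position_base_counts(reads)
median_quals = per_position_median_quality(quals)
print(median_quals)  # prints [40, 40, 20, 0]
```

In this toy data the median quality falls towards the end of the read, the same pattern as base quality dropping along the read.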

                                                                          60
References

Bentley DR et al. Accurate whole human genome sequencing using reversible
terminator chemistry. Nature 456: 53-59 (2008)
Kahvejian A, Quackenbush J & Thompson JF. What would you do if you could
sequence everything? Nature Biotechnology 26: 1125-1133 (2008)
Korbel JO et al. Paired-end mapping reveals extensive structural variation in the
human genome. Science 318: 420-426 (2007)
Mardis ER. A decade’s perspective on DNA sequencing technology. Nature
470: 198-203 (2011)
Metzker ML. Sequencing technologies - the next generation. Nature Reviews
Genetics 11:31-46 (2010)
Shendure J & Ji H. Next-generation DNA sequencing. Nature Biotechnology
26:1135-1145 (2008)
Turner EH et al. Methods for genomic partitioning. Annual Review of Genomics
and Human Genetics 10 (2009)

                                                                                61

P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...Jan Aerts
 
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...Jan Aerts
 
S Cheng - eagle-i: development and expansion of a scientific resource discove...
S Cheng - eagle-i: development and expansion of a scientific resource discove...S Cheng - eagle-i: development and expansion of a scientific resource discove...
S Cheng - eagle-i: development and expansion of a scientific resource discove...Jan Aerts
 
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...Jan Aerts
 
A Kalderimis - InterMine: Embeddable datamining components
A Kalderimis - InterMine: Embeddable datamining componentsA Kalderimis - InterMine: Embeddable datamining components
A Kalderimis - InterMine: Embeddable datamining componentsJan Aerts
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesJan Aerts
 

Mais de Jan Aerts (20)

VIZBI 2014 - Visualizing Genomic Variation
VIZBI 2014 - Visualizing Genomic VariationVIZBI 2014 - Visualizing Genomic Variation
VIZBI 2014 - Visualizing Genomic Variation
 
Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?Visual Analytics in Omics - why, what, how?
Visual Analytics in Omics - why, what, how?
 
Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?Visual Analytics in Omics: why, what, how?
Visual Analytics in Omics: why, what, how?
 
Visual Analytics talk at ISMB2013
Visual Analytics talk at ISMB2013Visual Analytics talk at ISMB2013
Visual Analytics talk at ISMB2013
 
Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)Visualizing the Structural Variome (VMLS-Eurovis 2013)
Visualizing the Structural Variome (VMLS-Eurovis 2013)
 
Humanizing Data Analysis
Humanizing Data AnalysisHumanizing Data Analysis
Humanizing Data Analysis
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
 
L Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformaticsL Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformatics
 
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module...
 
S Cain - GMOD in the cloud
S Cain - GMOD in the cloudS Cain - GMOD in the cloud
S Cain - GMOD in the cloud
 
B Temperton - The Bioinformatics Testing Consortium
B Temperton - The Bioinformatics Testing ConsortiumB Temperton - The Bioinformatics Testing Consortium
B Temperton - The Bioinformatics Testing Consortium
 
J Goecks - The Galaxy Visual Analysis Framework
J Goecks - The Galaxy Visual Analysis FrameworkJ Goecks - The Galaxy Visual Analysis Framework
J Goecks - The Galaxy Visual Analysis Framework
 
S Cain - GMOD in the cloud
S Cain - GMOD in the cloudS Cain - GMOD in the cloud
S Cain - GMOD in the cloud
 
B Chapman - Toolkit for variation comparison and analysis
B Chapman - Toolkit for variation comparison and analysisB Chapman - Toolkit for variation comparison and analysis
B Chapman - Toolkit for variation comparison and analysis
 
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...
P Rocca-Serra - The open source ISA metadata tracking framework: from data cu...
 
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
 
S Cheng - eagle-i: development and expansion of a scientific resource discove...
S Cheng - eagle-i: development and expansion of a scientific resource discove...S Cheng - eagle-i: development and expansion of a scientific resource discove...
S Cheng - eagle-i: development and expansion of a scientific resource discove...
 
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
A Kanterakis - PyPedia: a python crowdsourcing development environment for bi...
 
A Kalderimis - InterMine: Embeddable datamining components
A Kalderimis - InterMine: Embeddable datamining componentsA Kalderimis - InterMine: Embeddable datamining components
A Kalderimis - InterMine: Embeddable datamining components
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutes
 

Último

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 

Último (20)

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

Next-generation sequencing course, part 1: technologies

  • 1. [I0D51A] Bioinformatics: High-Throughput Analysis. Next-generation sequencing. Part 1: Technologies. Prof Jan Aerts, Faculty of Engineering - ESAT/SCD, jan.aerts@esat.kuleuven.be. TA: Alejandro Sifrim (alejandro.sifrim@esat.kuleuven.be)
  • 2. Announcements: May 27th (9am-noon): evaluation, open book
  • 3. Note to self... Upload s_1_sequence.txt and s_2_sequence.txt to Galaxy first...
  • 4. Overview • Linux refresher (6/5) • next-generation sequencing technologies and applications (6/5) • sequence mapping (13/5) • variant calling - SNPs (20/5) • variant calling - structural variation (20/5)
  • 9. First vs second generation sequencing: Sanger sequencing (1st gen) vs 2nd/next gen sequencing (Shendure & Ji, 2008)
  • 10. Paired-end sequencing (Korbel et al, 2007)
  • 11. General approaches • 2nd generation (clonally amplified single molecules): Roche 454: pyrosequencing; Illumina Genome Analyzer -> HiSeq: reversible terminator technology; ABI SOLiD: ligation-based extension • Next-next-generation/3rd generation (true single molecule): Helicos: HeliScope; Pacific Biosciences: SMRT
  • 13. Steps: genome enrichment, template preparation, sequencing and imaging, data analysis
  • 16. What? Only sequence relevant parts of the genome instead of the whole genome, e.g.: • specific Mb-scale regions known to be involved in a particular disease (e.g. based on GWAS) • specific candidate genes belonging to a disease pathway • exome (= all exons) => How to isolate these from non-target sequence? “Pulldown”
  • 17. Pulldown: on-array (Turner et al, 2009)
  • 18. Pulldown: in-solution (Turner et al, 2009)
  • 19. Performance metrics • fold-enrichment: ratio of abundance of target sequences post-enrichment vs pre-enrichment • capture specificity: fraction of sequence reads that map to target • uniformity: relative abundance of individual targets after enrichment • completeness: fraction of target bases detectably captured
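The four performance metrics can be made concrete with a small calculation. The following is a minimal sketch, not a standard tool: the function name, its parameters and the example numbers are all invented for illustration. It assumes that the pre-enrichment abundance of target sequence can be approximated by target size divided by genome size, and it reports uniformity simply as each target's depth relative to the mean depth.

```python
from statistics import mean


def enrichment_metrics(on_target_reads, total_reads, target_bp, genome_bp,
                       per_target_depth, per_base_depth, min_depth=1):
    """Summarise a capture experiment (hypothetical helper, illustrative only)."""
    # Capture specificity: fraction of sequence reads that map to target.
    specificity = on_target_reads / total_reads

    # Assumption: before enrichment, the target makes up target_bp / genome_bp
    # of the sample, so this is the expected on-target fraction without pulldown.
    pre_enrichment_fraction = target_bp / genome_bp
    fold_enrichment = specificity / pre_enrichment_fraction

    # Uniformity: depth of each individual target relative to the mean depth.
    average_depth = mean(per_target_depth)
    relative_abundance = [depth / average_depth for depth in per_target_depth]

    # Completeness: fraction of target bases covered by at least min_depth reads.
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    completeness = covered / len(per_base_depth)

    return {
        "specificity": specificity,
        "fold_enrichment": fold_enrichment,
        "relative_abundance": relative_abundance,
        "completeness": completeness,
    }


if __name__ == "__main__":
    # Made-up numbers: a 30 Mb target in a 3 Gb genome.
    metrics = enrichment_metrics(
        on_target_reads=800, total_reads=1000,
        target_bp=30_000_000, genome_bp=3_000_000_000,
        per_target_depth=[10, 20, 30],
        per_base_depth=[0, 5, 10, 0],
    )
    for name, value in metrics.items():
        print(name, value)
```

In practice the per-target and per-base depths would come from a mapping step; here they are passed in as plain lists to keep the sketch self-contained.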
  • 21. Problem: most imaging systems are not designed to detect single fluorescent events => need amplified templates. Aim: to produce a representative, non-biased source of nucleic acid material from the genome under investigation => population of identical templates. Steps: 1. shear DNA 2. amplify templates. Options: emulsion PCR (emPCR) or solid-phase amplification
  • 22. Amplification by emulsion PCR. Emulsion = mixture of two or more immiscible (unblendable) liquids, e.g. mayonnaise, vinaigrette. emPCR: thousands of microreactors/micro-eppendorfs; one bead + one DNA molecule per microreactor => PCR to 1000s of copies
  • 23. Williams et al, 2006; Metzker, 2010
  • 24. Solid-phase amplification. http://bit.ly/6JYIUz http://www.youtube.com/watch?v=77r5p8IBwJk&NR=1 (Metzker, 2010)
  • 25. C. Sequencing and imaging
  • 26. Sequencing and imaging. Technologies: 1. cyclic reversible termination 2. sequencing by ligation 3. pyrosequencing 4. real-time sequencing
  • 27. Cyclic reversible termination: DNA synthesis is terminated after adding a single nucleotide (start/stop/start/stop/start/stop/...). Illumina: 4-colour. Figure: sequencing steps and sequencing result (Metzker, 2010)
  • 28. Helicos: 1-colour. Figure: sequencing steps and sequencing result (Metzker, 2010)
  • 29. Sequencing by ligation. http://bit.ly/fPh22X Figure: sequencing steps
  • 31. Pyrosequencing (Metzker, 2010)
  • 32. Real-time sequencing: zero-mode waveguide (“ZMW”), DNA polymerase, “strobe sequencing”
  • 33. Platform: run time, Gb/run • Roche 454: 8.5 hr, 0.45 • Illumina: 9 days, 35 • SOLiD: 14 days, 50 • Helicos: 8 days, 37 • PacBio: ?, ?
  • 34. Accuracy - base calling error • base quality drops along the read (“dephasing” within clusters) • base calling errors; accuracy ranking: Sanger > SOLiD > Illumina > 454 > Helicos
  • 35. Accuracy - homopolymer runs. Issue for Roche 454: 39% of errors are homopolymers • A5 motifs: 3.3% error rate • A8 motifs: 50% error rate. Reason: signal intensity is used as a measure of homopolymer length
  • 37. Ronaghi, Genome Res 11:3-11 (2001)
  • 39. Is it 4? Is it 5? Is it 4? http://mammoth.psu.edu/labPhotos/imageOfFlowgram.jpg
  • 40. Consensus accuracy. Increase accuracy for SNP calling by increasing coverage: Illumina: 20X; SOLiD: 12X; 454: 7.4X; Sanger: 3X. Factors: raw accuracy + read length. How deep do you have to sequence? => Poisson distribution: “If you sequence at an average of 10X, how much of the genome will be covered at least 5X?”
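The Poisson question on the slide can be worked out directly. The sketch below assumes that per-base coverage follows a Poisson distribution whose mean equals the average sequencing depth, which is an idealisation (real coverage tends to be more dispersed); the function names are made up for this example.

```python
import math


def poisson_cdf(k, mean_depth):
    """P(X <= k) for X ~ Poisson(mean_depth)."""
    return sum(
        math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
        for i in range(k + 1)
    )


def fraction_covered_at_least(min_depth, mean_depth):
    """Expected fraction of the genome covered at least min_depth times."""
    # P(X >= min_depth) = 1 - P(X <= min_depth - 1)
    return 1.0 - poisson_cdf(min_depth - 1, mean_depth)


if __name__ == "__main__":
    # "If you sequence at an average of 10X, how much of the genome
    # will be covered at least 5X?"
    print(round(fraction_covered_at_least(5, 10), 4))  # about 0.97
```

Under this model, roughly 97% of the genome is expected to be covered at least 5X at an average depth of 10X.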
  • 41. Bentley et al, Nature 456:53-59 (2008)
  • 42. FASTQ file format. Figure: example FASTA entries (n=2) and example FASTQ entries (n=2). Each FASTQ entry has four lines: “@” + identifier; sequence; “+” + identifier (optional); phred-based quality scores. Figure: phred quality score encoding (Wikipedia)
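To make the FASTQ layout concrete, here is a minimal sketch of a reader for the simple four-lines-per-record form, plus decoding of the quality string into numeric phred scores. The function names and the example record are invented for illustration. The offset of 33 corresponds to the Sanger encoding; older Illumina files use an offset of 64, so the right value depends on how the file was produced.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip stray blank lines
            continue
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()       # the "+" line, optionally repeating the identifier
        quality = handle.readline().rstrip("\r\n")
        yield header.strip()[1:], sequence, quality


def phred_scores(quality_string, offset=33):
    """Convert quality characters to numeric phred scores (Q = ASCII code - offset)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Invented example record, not taken from the course data.
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for identifier, sequence, quality in read_fastq(example):
        scores = phred_scores(quality)
        print(identifier, sequence, scores)
        print([error_probability(q) for q in scores])
```

For real files, a maintained parser is usually a better choice than a hand-written one, since FASTQ variants with wrapped sequence lines exist.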
  • 43. Sequence quality control. Is this good sequence? (essential!) E.g.: using the FastQC tool (Babraham Institute, UK; http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)
  • 44. Sequence quality control: per base sequence quality (good vs bad)
  • 45. Sequence quality control: per sequence quality scores (good vs bad)
  • 46. Sequence quality control: per base sequence content (good vs bad)
  • 47. Sequence quality control: per base GC content (good vs bad)
  • 48. Sequence quality control: per sequence GC content (good vs bad)
  • 49. Sequence quality control: k-mer content (good vs bad)
  • 51. Online genome analysis: http://galaxy.psu.edu/ “Galaxy allows you to do analyses you cannot do anywhere else without the need to install or download anything. You can analyze multiple alignments, compare genomic annotations, profile metagenomic samples and much much more...”
  • 55. Kahvejian et al, 2008
  • 56. DNA-seq, ChIP-seq, RNA-seq (Kahvejian et al, 2008)
  • 57. Identify sequence variations; DNA-seq, ChIP-seq, RNA-seq; identify pathogens (Kahvejian et al, 2008)
  • 58. Exercises
  • 59. Try to log in to the server mentioned on Toledo with the username and password provided there. There are 2 FASTQ files in /mnt/homes/jaerts/: s_1_sequence.txt and s_2_sequence.txt (= paired ends) • How many sequences are in s_1_sequence.txt? • What encoding was used for the quality score? Illumina? Sanger? • What are the numerical quality scores for the first sequence in s_1_sequence.txt (i.e. 7172283/1)?
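One possible way to approach the first two questions is sketched below. It assumes the simple four-lines-per-record FASTQ layout and uses a common rule of thumb for the encoding: quality characters with ASCII codes below 59 only occur with an offset of 33 (Sanger), codes 59 to 63 only occur in the old Solexa scale, and files whose lowest code is 64 or higher are probably Phred+64 (Illumina 1.3 to 1.7). This is a heuristic, not a proof, and the function name is invented for this example.

```python
def summarize_fastq(lines):
    """Return (number_of_records, encoding_guess) for 4-line-per-record FASTQ lines."""
    n_records = 0
    lowest = None  # lowest ASCII code seen in any quality line
    for index, line in enumerate(lines):
        if index % 4 == 0 and line.strip():
            n_records += 1          # "@" identifier line starts a record
        elif index % 4 == 3:
            codes = [ord(char) for char in line.rstrip("\r\n")]
            if codes:
                low = min(codes)
                lowest = low if lowest is None else min(lowest, low)

    if lowest is None:
        guess = "unknown (no quality lines found)"
    elif lowest < 59:
        guess = "Sanger (Phred+33)"
    elif lowest < 64:
        guess = "Solexa (Solexa+64)"
    else:
        guess = "Illumina 1.3-1.7 (Phred+64), unless only very high Phred+33 scores occur"
    return n_records, guess


if __name__ == "__main__":
    # On the course server one would pass the open file instead, e.g.
    #   with open("/mnt/homes/jaerts/s_1_sequence.txt") as handle:
    #       print(summarize_fastq(handle))
    # Invented two-record example so the sketch runs anywhere:
    demo = ["@r1\n", "ACGT\n", "+\n", "hhhh\n", "@r2\n", "ACGT\n", "+\n", "ggBB\n"]
    print(summarize_fastq(demo))
```

On the command line, counting the lines of the file and dividing by four gives the same record count under the same four-line assumption.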
  • 60. • Create an account on the Galaxy server • Download s_1_sequence.txt and s_2_sequence.txt from Toledo and upload them into Galaxy. These files are also available on the Linux server • Have a look at the contents of s_1_sequence.txt • Convert quality scores to numeric values for s_1_sequence.txt (“FASTQ Groomer”) • Draw the quality score boxplot for s_1_sequence.txt • Draw the nucleotide distribution chart for s_1_sequence.txt
  • 61. References
      Bentley DR et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53-59 (2008)
      Kahvejian A, Quackenbush J & Thompson JF. What would you do if you could sequence everything? Nature Biotechnology 26:1125-1133 (2008)
      Korbel JO et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318:420-426 (2007)
      Mardis ER. A decade’s perspective on DNA sequencing technology. Nature 470:198-203 (2011)
      Metzker ML. Sequencing technologies - the next generation. Nature Reviews Genetics 11:31-46 (2010)
      Shendure J & Ji H. Next-generation DNA sequencing. Nature Biotechnology 26:1135-1145 (2008)
      Turner EH et al. Methods for genomic partitioning. Annual Review of Genomics and Human Genetics 10 (2009)