An introduction to second-generation sequencing will be given, with a focus on basic production informatics: the approach to raw data conversion and quality control will be discussed.
2. Production Informatics and Bioinformatics. June 23, 2011. Basic Production Informatics: produce raw sequence reads. Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs). Bioinformatics Research: analyze the data; uncover the biological meaning. Per one-flowcell project.
4. What steps are involved in sequencing? Sequencing by synthesis (SBS) technology: fragmentation, library generation, amplification, sequencing, analysis. Illumina marketing: “3h 10 minutes wet-lab 30 minutes dry lab”
7. Output: 1.5 terabytes of data. Inspired by anzska information booklet
8. Sequencer Output Conversion: Production Informatics. 1.5 TB data: 6 billion clusters with 100 bp reads = 600 billion data points. HiSeq CASAVA … × read length. For HiSeq: images are converted to flat files (*.bcl or *.cif). visualpharm.com Maysoft
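The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and treating 1 TB as 10**12 bytes is an assumption that the slide does not state.

```python
# Figures quoted on the slide.
clusters = 6_000_000_000   # 6 billion clusters per run
read_length = 100          # bases per read
raw_bytes = 1.5e12         # 1.5 TB, assuming 1 TB = 10**12 bytes

# One base call ("data point") per cluster per sequencing cycle.
data_points = clusters * read_length

# Average raw storage spent on each base call.
bytes_per_base = raw_bytes / data_points

print(f"{data_points:,} base calls")
print(f"{bytes_per_base:.1f} bytes per base call")
```

Under these assumptions the run stores a few bytes per base call, which gives a rough feel for why the raw output has to be converted before it is useful.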
9. Multiplexing. 6 billion reads: 750 million reads per lane. Currently 12-plex (soon 96-plex): one run. Oliver Twardowski
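As a rough illustration of what multiplexing means for sequencing depth, the sketch below divides the per-lane read count from the slide by the plex level. The lane count is derived from the two figures on the slide, and the even split between samples is an assumption; real runs distribute reads unevenly across barcodes.

```python
# Figures quoted on the slide.
total_reads = 6_000_000_000
reads_per_lane = 750_000_000

# Number of lanes implied by the two figures above (not stated on the slide).
lanes = total_reads // reads_per_lane


def reads_per_sample(plex):
    """Reads per sample if a lane is shared evenly by `plex` samples (assumption)."""
    return reads_per_lane / plex


for plex in (12, 96):
    print(f"{plex}-plex: {reads_per_sample(plex):,.0f} reads per sample, "
          f"{lanes * plex} samples per run")
```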
14. Fastq: Quality control. Base-pair quality score; adapter contamination; uneven amplification.
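A minimal sketch of how these three checks could be computed from a FASTQ file with plain Python. Several things here are assumptions rather than anything the slides specify: the function names are hypothetical, the quality offset of 33 may need to be 64 for older Illumina pipelines, the adapter string is a commonly seen Illumina adapter prefix used only for illustration, and counting exact duplicate reads is a crude stand-in for detecting uneven amplification.

```python
from collections import Counter
from io import StringIO


def parse_fastq(handle):
    """Yield (name, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:          # end of file (or a blank line)
            break
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def qc_summary(records, adapter, offset=33):
    """Mean quality per position, adapter fraction, and duplicate fraction."""
    qual_sum, qual_count = [], []
    seq_counts = Counter()
    adapter_hits = 0
    n_reads = 0
    for _, seq, qual in records:
        n_reads += 1
        seq_counts[seq] += 1
        if adapter in seq:
            adapter_hits += 1
        for i, ch in enumerate(qual):
            if i == len(qual_sum):
                qual_sum.append(0)
                qual_count.append(0)
            qual_sum[i] += ord(ch) - offset   # decode Phred score
            qual_count[i] += 1
    return {
        "reads": n_reads,
        "mean_quality": [s / c for s, c in zip(qual_sum, qual_count)],
        "adapter_fraction": adapter_hits / n_reads,
        "duplicate_fraction": (n_reads - len(seq_counts)) / n_reads,
    }


# Tiny made-up example; real input would be an opened FASTQ file.
example = StringIO(
    "@r1\nACGTAGATCGGAAGAGC\n+\nIIIIIIIIIIIIIIIII\n"
    "@r2\nACGTACGTACGTACGTA\n+\nIIII5555IIII5555I\n"
    "@r3\nACGTACGTACGTACGTA\n+\nIIIIIIIIIIIIIIIII\n"
)
summary = qc_summary(parse_fastq(example), adapter="AGATCGGAAGAGC")
print(summary)
```

Dedicated QC tools report the same kinds of metrics in far more detail; the point of the sketch is only that a basic run QC needs nothing beyond the FASTQ file itself.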
15. Three things to remember: Don’t be fooled by marketing. FASTQ files are not directly usable. Basic run QC can be made from the FASTQ file. “All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production” Ewan Birney, European Bioinformatics Institute, Wellcome Trust. David S. Roos, Bioinformatics--Trying to Swim in a Sea of Data; Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261, DOI: 10.1126/science.291.5507.1260
16. Next Week: Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.
19. Helicos true Single Molecule Sequencing (tSMS)™ technology. Sequencing by synthesis, but sensitive enough that no amplification is needed.
20. Life Technologies - Ion Torrent. A hydrogen ion is released when a nucleotide is incorporated, and this is measured by a semiconductor. The base is identified by which nucleotide wash cycle the signal coincides with.
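The idea of reading the base from the wash cycle can be sketched as follows. This is a toy illustration, not the vendor's base caller: the function name, the flow order "TACG", and the signal values are hypothetical, and the code assumes (beyond what the slide says) that the signal is roughly proportional to the number of identical bases incorporated in one flow.

```python
def decode_flows(signals, flow_order="TACG"):
    """Turn per-flow signal strengths into a base sequence.

    Each flow washes one nucleotide over the chip. A signal near 0 means
    nothing was incorporated; a signal near n is read as n identical bases.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        n_incorporated = max(0, round(signal))
        bases.append(nucleotide * n_incorporated)
    return "".join(bases)


# Made-up signals for two rounds of the T, A, C, G wash cycle.
signals = [1.03, 0.02, 0.97, 2.08, 0.05, 1.10, 0.00, 0.94]
print(decode_flows(signals))
```

Rounding the signal is the weak point of this scheme: long runs of the same base give signals that are harder to tell apart, which is one reason real base callers are considerably more involved.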
21. PacBio. A polymerase is immobilized at the bottom of a well. Fluorescent nucleotides float around, and when one is incorporated it is held still for tens of milliseconds, which is the signal that is recorded. No upper limit on the read length. http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4
22. Nanopore. The molecule is pulled through a pore, and the change in the current across the membrane caused by the different nucleotides is recorded. http://www.nanoporetech.com/sections/index/82
A PCR-like reaction in which a labeled nucleotide that terminates the reaction is incorporated at random. The resulting fragments of different lengths are then separated on a gel, and the sequence can be read manually from the labeled end nucleotides.
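Reading the sequence off the gel amounts to sorting the terminated fragments by length and noting the labeled end base of each. A toy sketch, with a hypothetical function name and made-up fragment data:

```python
def read_sequence(fragments):
    """Reconstruct a sequence from (length, labeled_end_base) fragments.

    The gel separates fragments by length, so sorting by length and
    reading each fragment's end label gives the sequence in order.
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)


# Made-up fragments, listed in the random order in which they were produced.
fragments = [(3, "G"), (1, "A"), (4, "T"), (2, "C")]
print(read_sequence(fragments))
```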
Some of you have already done some library prep, so you have a feel for how realistic 3 h 10 min is for this. This seminar goes through the analysis steps that are required to answer the question the data was generated for. So by the end of this seminar series you’ll also have a feel for how realistic 30 minutes is for the data analysis.