SlideShare uma empresa Scribd logo
1 de 22
[MIT] Introduction to 2GS data analysis Drink faster ! June 23, 2011
Production Informatics and Bioinformatics June 23, 2011 Produce raw sequence reads Basic Production Informatics Map to genome and generate raw genomic features (e.g. SNPs) Advanced  Production Inform. Analyze the data; Uncover the biological meaning Bioinformatics Research Per one-flowcell project
First Generation: Sanger sequencing ,[object Object],Third Generation: single molecule sequencing Brief history of sequencing  June 23, 2011 * * * Discussion about category
What steps are involved in sequencing ? June 23, 2011 sequencing by synthesis (SBS) technology Fragmentation Library generation Amplification Sequencing Analysis Illumina Marketing:  “3h 10 minutes wet-lab 30 minutes dry lab”
Illumina sequencing: Library + Amplification June 23, 2011 “Illumina Sequencing Technology” booklet
Illumina Sequencing: Synthesis + Imaging June 23, 2011 “Illumina Sequencing Technology” booklet
Output: 1.5 Terabyte of data June 23, 2011 Inspired by anzska information booklet
Sequencer Output Conversion: Production Informatics 1.5 TB data : 6 billion clusters with 100 bp reads  	= 600 billion data points  June 23, 2011 HiSeq CASAVA … × read length For HiSeq: images are converted to flat files (*.bcl or *.cif)  visualpharm.com Maysoft
Multiplexing 6 billion reads: 750 million reads per lane Currently 12-plex (soon 96-plex): One run   June 23, 2011 Oliver Twardowski
Demultiplexing June 23, 2011 CASAVA … … × samples × read length visualpharm.com
CASAVA1.8.0 program call June 23, 2011 configureBclToFastq.pl br />	--input-dir Data/Intensities/BaseCalls/ br />    -output-dir Data/Unaligned br />	--sample-sheet SampleSheet.csv  	--use-bases-mask y100,I6nn,Y100 >file.log 2>&1 cd Data/Unaligned qsub -pe make 16 -jy -v $MYPATH –oqsub.out -cwd –N fastq -by br />    make -j 16 Runtime: ~ 6h
Fastq files June 23, 2011 @HWI-ST301_0112:1:1:1169:2044#0/1 CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT +HWI-ST301_0112:1:1:1169:2044#0/1 dddcd^dd`acacdacd`ecdedabdcdddcc`bTabr />36 36 36 35 28 … ASCII       @ .. ~ DEC        64 .. 126 PHRED     0 .. 62 Phred scores are estimates only !  Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. PMID:20015970
Fastq – PHRED quality Pathological June 23, 2011
Fastq: Quality control Base-pair quality score  Adapter contamination Uneven Amplification  June 23, 2011
Three things to remember Don’t be fooled by marketing Fastqfiles are not directly usable Basic-run QC can be made from fastq file June 23, 2011 “All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production” 							Ewan Birney 		      European Bioinformatics Institute Wellcome Trust  David S. Roos  Bioinformatics--Trying to Swim in a Sea of Data;Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261 DOI: 10.1126/science.291.5507.1260
Next Week: June 23, 2011 Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realigment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.
Walk-in-clinic June 23, 2011
First Generation: Sanger sequencing ,[object Object],Third Generation: single molecule sequencing Brief history of sequencing  June 23, 2011 * * * Discussion about category
Helicos true Single Molecule Sequencing(tSMS)™ technology Sequencing by synthesis but much more sensitive so no amplification June 23, 2011
Life Technology - Ion Torrent Hydrogen Ion is released by the incorporation of a nucleotide, which is measured by a semiconductor Depending on which nucleotide wash cycle the signal coincides June 23, 2011
PacBio Immobilized polymerase at the bottom of a well Fluorescent nucleotides float around and if they are incorporated they are held still for tens of milliseconds, which is the signal that is recorded No upper limit on the length   June 23, 2011 http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4
Nanopore Molecule is sucked through a poor and the change in the membrane charge due to the different nucleotides is recorded. June 23, 2011 http://www.nanoporetech.com/sections/index/82

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

RNA-Seq
RNA-SeqRNA-Seq
RNA-Seq
 
Next generation sequencing methods
Next generation sequencing methods Next generation sequencing methods
Next generation sequencing methods
 
Whole genome sequencing
Whole genome sequencingWhole genome sequencing
Whole genome sequencing
 
Introduction to Next Generation Sequencing
Introduction to Next Generation SequencingIntroduction to Next Generation Sequencing
Introduction to Next Generation Sequencing
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
DNA Sequencing
DNA Sequencing DNA Sequencing
DNA Sequencing
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
Probe labelling and hybridization
Probe labelling and hybridizationProbe labelling and hybridization
Probe labelling and hybridization
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
 
Conventional and next generation sequencing ppt
Conventional and next generation sequencing pptConventional and next generation sequencing ppt
Conventional and next generation sequencing ppt
 
Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)Next Generation Sequencing (NGS)
Next Generation Sequencing (NGS)
 
Transcriptome analysis
Transcriptome analysisTranscriptome analysis
Transcriptome analysis
 
Shotgun and clone contig method
Shotgun and clone contig methodShotgun and clone contig method
Shotgun and clone contig method
 
Genome Mapping
Genome MappingGenome Mapping
Genome Mapping
 
Genome annotation 2013
Genome annotation 2013Genome annotation 2013
Genome annotation 2013
 
dna sequencing methods
 dna sequencing methods dna sequencing methods
dna sequencing methods
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
 
Dna sequencing ppt
Dna sequencing pptDna sequencing ppt
Dna sequencing ppt
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Ion torrent sequencing
Ion torrent sequencingIon torrent sequencing
Ion torrent sequencing
 

Destaque

New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewPaolo Dametto
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiomejukais
 
2013 july 25 systems biology rna seq v2
2013 july 25 systems biology rna seq v22013 july 25 systems biology rna seq v2
2013 july 25 systems biology rna seq v2Anne Deslattes Mays
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Denis C. Bauer
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Denis C. Bauer
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variantsDenis C. Bauer
 
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...Gruter
 
How to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeHow to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeLex Nederbragt
 
Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013mcmahonUW
 
Esa 2014 qiime
Esa 2014 qiimeEsa 2014 qiime
Esa 2014 qiimeZech Xu
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to BioinformaticsDenis C. Bauer
 
Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...
Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...
Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...Jonathan Eisen
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...QBiC_Tue
 
Part 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalPart 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalJoachim Jacob
 

Destaque (20)

Illumina Sequencing
Illumina SequencingIllumina Sequencing
Illumina Sequencing
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overview
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
 
2013 july 25 systems biology rna seq v2
2013 july 25 systems biology rna seq v22013 july 25 systems biology rna seq v2
2013 july 25 systems biology rna seq v2
 
Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2Variant (SNPs/Indels) calling in DNA sequences, Part 2
Variant (SNPs/Indels) calling in DNA sequences, Part 2
 
Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1 Variant (SNPs/Indels) calling in DNA sequences, Part 1
Variant (SNPs/Indels) calling in DNA sequences, Part 1
 
Functionally annotate genomic variants
Functionally annotate genomic variantsFunctionally annotate genomic variants
Functionally annotate genomic variants
 
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: Bioinformatics Data를 위한 Hadoop기반...
 
How to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeHow to sequence a large eukaryotic genome
How to sequence a large eukaryotic genome
 
Bridge Amplification Part 1
Bridge Amplification Part 1Bridge Amplification Part 1
Bridge Amplification Part 1
 
Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013Amplicon sequencing slides - Trina McMahon - MEWE 2013
Amplicon sequencing slides - Trina McMahon - MEWE 2013
 
Esa 2014 qiime
Esa 2014 qiimeEsa 2014 qiime
Esa 2014 qiime
 
Histology Portfolio
Histology Portfolio Histology Portfolio
Histology Portfolio
 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
Genome
GenomeGenome
Genome
 
Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...
Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...
Evolution of DNA Sequencing - talk by Jonathan Eisen for the Bodega Workshop ...
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...
 
Part 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalPart 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goal
 
Feulgen stain
Feulgen stainFeulgen stain
Feulgen stain
 

Semelhante a Introduction to 2GS data analysis and sequencing technologies

Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Thomas Keane
 
Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Monica Munoz-Torres
 
DNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implicationsDNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implicationsJeffrey Funk
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesMartin Hartmann
 
03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdf03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdfKristen DeAngelis
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.keyYannick Wurm
 
The Transformation of Systems Biology Into A Large Data Science
The Transformation of Systems Biology Into A Large Data ScienceThe Transformation of Systems Biology Into A Large Data Science
The Transformation of Systems Biology Into A Large Data ScienceRobert Grossman
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisYaoyu Wang
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...EMC
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyQIAGEN
 
Avila et al 2010 wnt 3
Avila et al 2010 wnt 3Avila et al 2010 wnt 3
Avila et al 2010 wnt 3Jorge Parodi
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingStephen Turner
 
Microarray biotechnologg ppy dna microarrays
Microarray biotechnologg ppy dna microarraysMicroarray biotechnologg ppy dna microarrays
Microarray biotechnologg ppy dna microarraysayeshasattarsandhu
 
GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016ExternalEvents
 
Sequence based Markers
Sequence based MarkersSequence based Markers
Sequence based Markerssukruthaa
 
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA SequencingEVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA SequencingJonathan Eisen
 

Semelhante a Introduction to 2GS data analysis and sequencing technologies (20)

Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010Mouse Genomes Project Summary June 2010
Mouse Genomes Project Summary June 2010
 
Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing Apollo Collaborative genome annotation editing
Apollo Collaborative genome annotation editing
 
Genome Assembly
Genome AssemblyGenome Assembly
Genome Assembly
 
DNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implicationsDNA sequencing: rapid improvements and their implications
DNA sequencing: rapid improvements and their implications
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
 
03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdf03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdf
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key
 
The Transformation of Systems Biology Into A Large Data Science
The Transformation of Systems Biology Into A Large Data ScienceThe Transformation of Systems Biology Into A Large Data Science
The Transformation of Systems Biology Into A Large Data Science
 
Comparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression AnalysisComparison between RNASeq and Microarray for Gene Expression Analysis
Comparison between RNASeq and Microarray for Gene Expression Analysis
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
White Paper: Next-Generation Genome Sequencing Using EMC Isilon Scale-Out NAS...
 
Introduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) TechnologyIntroduction to Next-Generation Sequencing (NGS) Technology
Introduction to Next-Generation Sequencing (NGS) Technology
 
Avila et al 2010 wnt 3
Avila et al 2010 wnt 3Avila et al 2010 wnt 3
Avila et al 2010 wnt 3
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 
A Journey Through The History Of DNA Sequencing
A Journey Through The History Of DNA Sequencing A Journey Through The History Of DNA Sequencing
A Journey Through The History Of DNA Sequencing
 
BioSB meeting 2015
BioSB meeting 2015BioSB meeting 2015
BioSB meeting 2015
 
Microarray biotechnologg ppy dna microarrays
Microarray biotechnologg ppy dna microarraysMicroarray biotechnologg ppy dna microarrays
Microarray biotechnologg ppy dna microarrays
 
GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016
 
Sequence based Markers
Sequence based MarkersSequence based Markers
Sequence based Markers
 
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA SequencingEVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
 

Mais de Denis C. Bauer

Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Denis C. Bauer
 
Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteDenis C. Bauer
 
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataDenis C. Bauer
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisDenis C. Bauer
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingDenis C. Bauer
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysisDenis C. Bauer
 
Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)Denis C. Bauer
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expressionDenis C. Bauer
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseqDenis C. Bauer
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runsDenis C. Bauer
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDenis C. Bauer
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site predictionDenis C. Bauer
 
SUMOylation site prediction
SUMOylation site predictionSUMOylation site prediction
SUMOylation site predictionDenis C. Bauer
 

Mais de Denis C. Bauer (18)

Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research Cloud-native machine learning - Transforming bioinformatics research
Cloud-native machine learning - Transforming bioinformatics research
 
Translating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynoteTranslating genomics into clinical practice - 2018 AWS summit keynote
Translating genomics into clinical practice - 2018 AWS summit keynote
 
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of DataGoing Server-less for Web-Services that need to Crunch Large Volumes of Data
Going Server-less for Web-Services that need to Crunch Large Volumes of Data
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Population-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysisPopulation-scale high-throughput sequencing data analysis
Population-scale high-throughput sequencing data analysis
 
Trip Report Seattle
Trip Report SeattleTrip Report Seattle
Trip Report Seattle
 
Allelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome SequencingAllelic Imbalance for Pre-capture Whole Exome Sequencing
Allelic Imbalance for Pre-capture Whole Exome Sequencing
 
Centralizing sequence analysis
Centralizing sequence analysisCentralizing sequence analysis
Centralizing sequence analysis
 
Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)Qbi Centre for Brain genomics (Informatics side)
Qbi Centre for Brain genomics (Informatics side)
 
Differential gene expression
Differential gene expressionDifferential gene expression
Differential gene expression
 
Transcript detection in RNAseq
Transcript detection in RNAseqTranscript detection in RNAseq
Transcript detection in RNAseq
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runs
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genome
 
ReliF
ReliFReliF
ReliF
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site prediction
 
SUMOylation site prediction
SUMOylation site predictionSUMOylation site prediction
SUMOylation site prediction
 

Último

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Último (20)

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Introduction to 2GS data analysis and sequencing technologies

  • 1. [MIT] Introduction to 2GS data analysis Drink faster ! June 23, 2011
  • 2. Production Informatics and Bioinformatics June 23, 2011 Produce raw sequence reads Basic Production Informatics Map to genome and generate raw genomic features (e.g. SNPs) Advanced Production Inform. Analyze the data; Uncover the biological meaning Bioinformatics Research Per one-flowcell project
  • 3.
  • 4. What steps are involved in sequencing ? June 23, 2011 sequencing by synthesis (SBS) technology Fragmentation Library generation Amplification Sequencing Analysis Illumina Marketing: “3h 10 minutes wet-lab 30 minutes dry lab”
  • 5. Illumina sequencing: Library + Amplification June 23, 2011 “Illumina Sequencing Technology” booklet
  • 6. Illumina Sequencing: Synthesis + Imaging June 23, 2011 “Illumina Sequencing Technology” booklet
  • 7. Output: 1.5 Terabyte of data June 23, 2011 Inspired by anzska information booklet
  • 8. Sequencer Output Conversion: Production Informatics 1.5 TB data : 6 billion clusters with 100 bp reads = 600 billion data points June 23, 2011 HiSeq CASAVA … × read length For HiSeq: images are converted to flat files (*.bcl or *.cif) visualpharm.com Maysoft
  • 9. Multiplexing 6 billion reads: 750 million reads per lane Currently 12-plex (soon 96-plex): One run June 23, 2011 Oliver Twardowski
  • 10. Demultiplexing June 23, 2011 CASAVA … … × samples × read length visualpharm.com
  • 11. CASAVA1.8.0 program call June 23, 2011 configureBclToFastq.pl br /> --input-dir Data/Intensities/BaseCalls/ br /> -output-dir Data/Unaligned br /> --sample-sheet SampleSheet.csv --use-bases-mask y100,I6nn,Y100 >file.log 2>&1 cd Data/Unaligned qsub -pe make 16 -jy -v $MYPATH –oqsub.out -cwd –N fastq -by br /> make -j 16 Runtime: ~ 6h
  • 12. Fastq files June 23, 2011 @HWI-ST301_0112:1:1:1169:2044#0/1 CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT +HWI-ST301_0112:1:1:1169:2044#0/1 dddcd^dd`acacdacd`ecdedabdcdddcc`bTabr />36 36 36 35 28 … ASCII @ .. ~ DEC 64 .. 126 PHRED 0 .. 62 Phred scores are estimates only ! Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. PMID:20015970
  • 13. Fastq – PHRED quality Pathological June 23, 2011
  • 14. Fastq: Quality control Base-pair quality score Adapter contamination Uneven Amplification June 23, 2011
  • 15. Three things to remember Don’t be fooled by marketing Fastqfiles are not directly usable Basic-run QC can be made from fastq file June 23, 2011 “All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production” Ewan Birney European Bioinformatics Institute Wellcome Trust David S. Roos Bioinformatics--Trying to Swim in a Sea of Data;Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261 DOI: 10.1126/science.291.5507.1260
  • 16. Next Week: June 23, 2011 Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realigment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.
  • 18.
  • 19. Helicos true Single Molecule Sequencing(tSMS)™ technology Sequencing by synthesis but much more sensitive so no amplification June 23, 2011
  • 20. Life Technology - Ion Torrent Hydrogen Ion is released by the incorporation of a nucleotide, which is measured by a semiconductor Depending on which nucleotide wash cycle the signal coincides June 23, 2011
  • 21. PacBio Immobilized polymerase at the bottom of a well Fluorescent nucleotides float around and if they are incorporated they are held still for tens of milliseconds, which is the signal that is recorded No upper limit on the length June 23, 2011 http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4
  • 22. Nanopore Molecule is sucked through a poor and the change in the membrane charge due to the different nucleotides is recorded. June 23, 2011 http://www.nanoporetech.com/sections/index/82

Notas do Editor

  1. http://2.bp.blogspot.com/_BPr6hpMG0tg/TSZdkYDcRvI/AAAAAAAAAjY/ReScIkWNySg/s1600/drink.jpg
  2. PCR where a labeled nucleotide is incorporated at random that terminates the PCR reaction. These fragments of different length are then separated on a gel and the sequence can be manually read from the labeled end nucleotides.
  3. Some of you have done some library prep already so you have a feel for how realistic 3h10 min are for this. This seminar goes through the analysis steps that are required to answer the question the data was generated for. So by the end of this seminar series you’ll have also a feel for how realistic 30 minutes is for the data analysis.
  4. PCR where a labeled nucleotide is incorporated at random that terminates the PCR reaction. These fragments of different length are then separated on a gel and the sequence can be manually read from the labeled end nucleotides.
  5. http://www.helicosbio.com/Technology/TrueSingleMoleculeSequencing/tabid/64/Default.aspx
  6. http://www.nanoporetech.com/sections/index/82