Surya Saha ss2489@cornell.edu
BTI PGRP Summer Internship Program 2014
Slides: https://bitly.com/BioinfoInternEx2014
Quality Control of NGS Data
1. Evaluation
2. Preprocessing
7/8/2014 BTI PGRP Summer Internship Program 2014 2
Slide credit: Aureliano Bombarely
Evaluation
Goal:
Learn to use read evaluation programs, paying attention to
relevant parameters such as quality score distribution,
read length distribution, and read duplication.
Data:
(Illumina data for two tomato ripening stages)
/home/bioinfo/Data/ch4_demo_dataset.tar.gz
Tools:
tar -zxvf (command line, untar and unzip the files)
head (command line, take a quick look at the files)
mv (command line, rename the files)
grep (command line, find/count patterns in files)
FASTX toolkit (command line, process fasta/fastq)
FastQC (gui, to calculate several stats for each file)
Evaluation
Exercise 1:
1. Untar and unzip the file:
/home/bioinfo/Data/ch4_demo_dataset.tar.gz
2. The raw data will be found in two directories: breaker and
immature_fruit. Print the first 10 lines of the files
SRR404331_ch4.fq, SRR404333_ch4.fq,
SRR404334_ch4.fq and SRR404336_ch4.fq.
Question 1.1: Are these files in fastq format?
3. Change the extension of the .fq files to .fastq.
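Steps 1 to 3 of Exercise 1 could be sketched on the command line as shown below. This is one possible approach, not the official solution. It assumes the archive unpacks into the breaker and immature_fruit directories under the current working directory, and the helper name rename_fq_to_fastq is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any toolkit): rename every *.fq file
# in the given directories to *.fastq.
rename_fq_to_fastq() {
    for dir in "$@"; do
        for f in "$dir"/*.fq; do
            [ -e "$f" ] || continue      # skip when the glob matched nothing
            mv "$f" "${f%.fq}.fastq"     # strip .fq, then append .fastq
        done
    done
}

# Usage, with the archive path given on the slide:
#   tar -zxvf /home/bioinfo/Data/ch4_demo_dataset.tar.gz
#   head -n 10 breaker/*.fq immature_fruit/*.fq
#   rename_fq_to_fastq breaker immature_fruit
```

When checking the head output for Question 1.1, recall that a fastq record normally spans four lines: a header starting with '@', the sequence, a separator line starting with '+', and a quality string of the same length as the sequence.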
Evaluation
Exercise 1 (continued):
4. Count the number of sequences in each fastq file using
the commands you learned earlier.
5. Convert the fastq files to fasta.
6. Explore the other tools in the FASTX toolkit.
7. Now count the number of sequences in each fasta file and
check whether the number of sequences has changed.
Tip: Use ‘grep’
Tip: Use ‘fastq_to_fasta -h’ to see help
Use Google if you are stuck
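The counting and conversion steps above could be sketched with standard tools as follows. This is an illustration under stated assumptions rather than the official solution: it assumes each fastq record occupies exactly four lines, the function names are hypothetical, and the awk converter is a simple stand-in for fastq_to_fasta, not the FASTX tool itself.

```shell
#!/bin/sh
# Count fastq records, assuming the common 4-lines-per-record layout.
# Counting lines that start with '@' (grep -c '^@') can overcount,
# because a quality string may also begin with '@'.
count_fastq() {
    awk 'END { print NR / 4 }' "$1"
}

# Count fasta records: one '>' header line per sequence.
count_fasta() {
    grep -c '^>' "$1"
}

# Stand-in for fastq_to_fasta: keep the header (with '@' swapped for '>')
# and the sequence line of each 4-line record.
fastq_to_fasta_awk() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Usage (file names are placeholders):
#   count_fastq reads.fastq
#   fastq_to_fasta_awk reads.fastq > reads.fasta
#   count_fasta reads.fasta
```

The stand-in keeps every read, so its counts will not change between fastq and fasta. The FASTX fastq_to_fasta tool, by default, discards reads containing unknown (N) bases unless -n is given, which is one reason the counts in step 7 may differ; check ‘fastq_to_fasta -h’ for the options in your installed version.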
Evaluation: Sequence Quality
[Figures: example plots for a good Illumina dataset, a poor Illumina dataset, 454, and Pacific Biosciences]
Evaluation: Sequence Content
[Figures: example plots for a good Illumina dataset and a poor Illumina dataset]
Evaluation: Duplication
[Figures: example plots for a good Illumina dataset and a poor Illumina dataset]
Evaluation: Overrepresented Sequences
[Figures: example plots for a good Illumina dataset and a poor Illumina dataset]
Evaluation: Kmer content
[Figures: example plots for a good Illumina dataset, a poor Illumina dataset, 454, and Pacific Biosciences]
Evaluation
Exercise 2:
1. Type ‘fastqc’ to start the FastQC program. Load the four
fastq sequence files in the program.
Question 2.2: How many sequences are there per file in FastQC?
Question 2.3: What is the length range of these reads?
Question 2.4: What is the quality score range of these reads? Which
file looks best quality-wise?
Question 2.5: Do these datasets show read overrepresentation?
Question 2.6: Looking at the kmer content, do you think the samples
contain an adaptor?
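As a cross-check on the read length range that FastQC reports (Question 2.3), the range can also be computed directly from a fastq file. The sketch below is an optional extra and not part of the original exercise; it assumes four lines per record, and the function name fastq_length_range is hypothetical.

```shell
#!/bin/sh
# Print "min max" read length for a fastq file, assuming 4 lines per
# record (the sequence is the 2nd line of each record).
fastq_length_range() {
    awk 'NR % 4 == 2 {
             n = length($0)
             if (min == "" || n < min) min = n
             if (n > max) max = n
         }
         END { print min, max }' "$1"
}

# Usage (directory names follow Exercise 1; adjust to your layout):
#   for f in breaker/*.fastq immature_fruit/*.fastq; do
#       echo "$f: $(fastq_length_range "$f")"
#   done
#
# FastQC can also be run without the GUI by passing files on the
# command line, for example:
#   fastqc breaker/*.fastq immature_fruit/*.fastq
```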
Preprocessing
Goal:
Trim the low-quality ends of the reads and remove
the short reads.
Data:
(Illumina data for two tomato ripening stages)
ch4_demo_dataset.tar.gz
Tools:
fastq-mcf (command line tool to process reads)
FastQC (gui, to calculate several stats for each file)
Preprocessing
Exercise 3:
• Download the file adapters1.fa from
ftp://ftp.solgenomics.net/user_requests/aubombarely/courses/RNAseqCorpoica/adapters1.fa
• Run the read processing program over each of the datasets
using
  • Min. qscore of 30
  • Min. length of 40 bp
• Type ‘fastqc’ to start the FastQC program. Load the four
new fastq sequence files. Compare the results with the
previous datasets.
Tip: Use ‘fastqc -h’ to see help
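The trimming step of Exercise 3 could be sketched as follows, assuming fastq-mcf (from ea-utils) is the read processing program, as listed under Tools. The wrapper name trim_reads and the output naming scheme are made up for illustration; confirm the options against ‘fastq-mcf -h’ for your installed version.

```shell
#!/bin/sh
# Hypothetical wrapper: trim one fastq file with fastq-mcf.
#   -q 30  quality threshold below which bases at the read ends are removed
#   -l 40  minimum read length kept after trimming
#   -o     output file
# The output name (<name>.q30l40.fastq) is an assumption for illustration.
trim_reads() {
    adapters=$1
    reads=$2
    out="${reads%.fastq}.q30l40.fastq"
    fastq-mcf -q 30 -l 40 -o "$out" "$adapters" "$reads"
}

# Usage, after downloading adapters1.fa and renaming the reads to .fastq:
#   for f in breaker/*.fastq immature_fruit/*.fastq; do
#       trim_reads adapters1.fa "$f"
#   done
```

The trimmed files can then be loaded into FastQC alongside the originals to compare quality and length distributions before and after preprocessing.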
Need Help??
Solutions: https://bitly.com/BioinfoInternExSol2014

Mais conteúdo relacionado

Mais procurados

Genotyping by sequencing
Genotyping by sequencingGenotyping by sequencing
Genotyping by sequencingBhavya Sree
 
Genotyping by Sequencing
Genotyping by SequencingGenotyping by Sequencing
Genotyping by SequencingSenthil Natesan
 
Whole Genome Selection
Whole Genome SelectionWhole Genome Selection
Whole Genome SelectionRaghav N.R
 
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...Baptiste Mayjonade
 
Genomic Selection & Precision Phenotyping
Genomic Selection & Precision PhenotypingGenomic Selection & Precision Phenotyping
Genomic Selection & Precision PhenotypingCIMMYT
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its toolsGaurav Diwakar
 
MOLECULAR MAPPING Of AROMATIC GENES IN RICE
MOLECULAR MAPPING Of AROMATIC GENES IN RICEMOLECULAR MAPPING Of AROMATIC GENES IN RICE
MOLECULAR MAPPING Of AROMATIC GENES IN RICEPratikChhari1
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)WarNik Chow
 
Material Transfer Agreement
Material Transfer AgreementMaterial Transfer Agreement
Material Transfer AgreementRobert Harrison
 
Uni prot presentation
Uni prot presentationUni prot presentation
Uni prot presentationRida Khalid
 
Cryopreservation
CryopreservationCryopreservation
Cryopreservationaachal jain
 
Genomic selection for crop improvement
Genomic selection for crop improvementGenomic selection for crop improvement
Genomic selection for crop improvementnagamani gorantla
 
AI in Bioinformatics
AI in BioinformaticsAI in Bioinformatics
AI in BioinformaticsAli Kishk
 

Mais procurados (20)

Gene Expression Omnibus (GEO)
Gene Expression Omnibus (GEO)Gene Expression Omnibus (GEO)
Gene Expression Omnibus (GEO)
 
Genotyping by sequencing
Genotyping by sequencingGenotyping by sequencing
Genotyping by sequencing
 
Genotyping by Sequencing
Genotyping by SequencingGenotyping by Sequencing
Genotyping by Sequencing
 
Ionomics
IonomicsIonomics
Ionomics
 
Pubchem
PubchemPubchem
Pubchem
 
NGS File formats
NGS File formatsNGS File formats
NGS File formats
 
Ml in genomics
Ml in genomicsMl in genomics
Ml in genomics
 
Whole Genome Selection
Whole Genome SelectionWhole Genome Selection
Whole Genome Selection
 
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
 
Genomic Selection & Precision Phenotyping
Genomic Selection & Precision PhenotypingGenomic Selection & Precision Phenotyping
Genomic Selection & Precision Phenotyping
 
System biology and its tools
System biology and its toolsSystem biology and its tools
System biology and its tools
 
Genome wide association mapping
Genome wide association mappingGenome wide association mapping
Genome wide association mapping
 
MOLECULAR MAPPING Of AROMATIC GENES IN RICE
MOLECULAR MAPPING Of AROMATIC GENES IN RICEMOLECULAR MAPPING Of AROMATIC GENES IN RICE
MOLECULAR MAPPING Of AROMATIC GENES IN RICE
 
1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)1909 BERT: why-and-how (CODE SEMINAR)
1909 BERT: why-and-how (CODE SEMINAR)
 
Material Transfer Agreement
Material Transfer AgreementMaterial Transfer Agreement
Material Transfer Agreement
 
Uni prot presentation
Uni prot presentationUni prot presentation
Uni prot presentation
 
Cryopreservation
CryopreservationCryopreservation
Cryopreservation
 
Genomic selection for crop improvement
Genomic selection for crop improvementGenomic selection for crop improvement
Genomic selection for crop improvement
 
Swiss prot
Swiss protSwiss prot
Swiss prot
 
AI in Bioinformatics
AI in BioinformaticsAI in Bioinformatics
AI in Bioinformatics
 

Semelhante a Quality Control of NGS Data

Quality Control of NGS Data Solutions
Quality Control of NGS Data  SolutionsQuality Control of NGS Data  Solutions
Quality Control of NGS Data SolutionsSurya Saha
 
Quality Control of Sequencing Data
Quality Control of Sequencing Data Quality Control of Sequencing Data
Quality Control of Sequencing Data Surya Saha
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityTanu Malik
 
Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...Ivo Jimenez
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSTulipp. Eu
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanBoris Glavic
 
Auditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software PackagesAuditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software PackagesTanu Malik
 
information management Project.docx
information management Project.docxinformation management Project.docx
information management Project.docxssuser9a7e24
 
From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File FormatsFrom Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File FormatsCharles Fracchia
 
Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Diego Zuluaga
 
Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web
Scalable Hadoop-Based Pooled Time Series of Big Video Data  from the Deep WebScalable Hadoop-Based Pooled Time Series of Big Video Data  from the Deep Web
Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep WebMadhav Sharan
 
Qtp-training A presentation for beginers
Qtp-training  A presentation for beginersQtp-training  A presentation for beginers
Qtp-training A presentation for beginersDhavamani Prakash
 
Fedora Iptables
Fedora IptablesFedora Iptables
Fedora Iptableszubin71
 
Getting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark clusterGetting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark clusterDaesu Chung
 
KineMatik November 2010
KineMatik November 2010KineMatik November 2010
KineMatik November 2010Michael Price
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 

Semelhante a Quality Control of NGS Data (20)

Quality Control of NGS Data Solutions
Quality Control of NGS Data  SolutionsQuality Control of NGS Data  Solutions
Quality Control of NGS Data Solutions
 
Quality Control of Sequencing Data
Quality Control of Sequencing Data Quality Control of Sequencing Data
Quality Control of Sequencing Data
 
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for Repeatability
 
Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...Reproducible, Automated and Portable Computational and Data Science Experimen...
Reproducible, Automated and Portable Computational and Data Science Experimen...
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Gnocchi batching
Gnocchi batchingGnocchi batching
Gnocchi batching
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, Ian
 
Auditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software PackagesAuditing and Maintaining Provenance in Software Packages
Auditing and Maintaining Provenance in Software Packages
 
information management Project.docx
information management Project.docxinformation management Project.docx
information management Project.docx
 
From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File FormatsFrom Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
From Buffer-Overflowing Genomic Tools to Securing Biomedical File Formats
 
Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0
 
Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web
Scalable Hadoop-Based Pooled Time Series of Big Video Data  from the Deep WebScalable Hadoop-Based Pooled Time Series of Big Video Data  from the Deep Web
Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web
 
Qtp-training A presentation for beginers
Qtp-training  A presentation for beginersQtp-training  A presentation for beginers
Qtp-training A presentation for beginers
 
Fedora Iptables
Fedora IptablesFedora Iptables
Fedora Iptables
 
Getting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark clusterGetting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
 
KineMatik November 2010
KineMatik November 2010KineMatik November 2010
KineMatik November 2010
 
Sequencing
SequencingSequencing
Sequencing
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 

Mais de Surya Saha

An open access resource portal for arthropod vectors and agricultural pathosy...
An open access resource portal for arthropod vectors and agricultural pathosy...An open access resource portal for arthropod vectors and agricultural pathosy...
An open access resource portal for arthropod vectors and agricultural pathosy...Surya Saha
 
Functional annotation of invertebrate genomes
Functional annotation of invertebrate genomesFunctional annotation of invertebrate genomes
Functional annotation of invertebrate genomesSurya Saha
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Surya Saha
 
Updates on Citrusgreening.org database from USDA NIFA project meeting
Updates on Citrusgreening.org database from USDA NIFA project meetingUpdates on Citrusgreening.org database from USDA NIFA project meeting
Updates on Citrusgreening.org database from USDA NIFA project meetingSurya Saha
 
Updates on the ACP v3 genome and annotation from USDA NIFA project meeting
Updates on the ACP v3 genome and annotation from USDA NIFA project meetingUpdates on the ACP v3 genome and annotation from USDA NIFA project meeting
Updates on the ACP v3 genome and annotation from USDA NIFA project meetingSurya Saha
 
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant DiseasesAgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant DiseasesSurya Saha
 
Visualization of insect vector-plant pathogen interactions in the citrus gree...
Visualization of insect vector-plant pathogen interactions in the citrus gree...Visualization of insect vector-plant pathogen interactions in the citrus gree...
Visualization of insect vector-plant pathogen interactions in the citrus gree...Surya Saha
 
Deciphering the genome of Diaphorina citri to develop solutions for the citru...
Deciphering the genome of Diaphorina citri to develop solutions for the citru...Deciphering the genome of Diaphorina citri to develop solutions for the citru...
Deciphering the genome of Diaphorina citri to develop solutions for the citru...Surya Saha
 
Sequencing 2017
Sequencing 2017Sequencing 2017
Sequencing 2017Surya Saha
 
Community resources for all y’all Omics
Community resources for all y’all OmicsCommunity resources for all y’all Omics
Community resources for all y’all OmicsSurya Saha
 
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...
 CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis... CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...Surya Saha
 
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...Surya Saha
 
Sequencing 2016
Sequencing 2016Sequencing 2016
Sequencing 2016Surya Saha
 
Tomato Genome Build SL3.0
Tomato Genome Build SL3.0Tomato Genome Build SL3.0
Tomato Genome Build SL3.0Surya Saha
 
Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015Surya Saha
 
Quality Control of Sequencing Data
Quality Control of Sequencing DataQuality Control of Sequencing Data
Quality Control of Sequencing DataSurya Saha
 
Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Surya Saha
 
Tomato Genome SL2.50 and Beyond…
Tomato Genome SL2.50 and Beyond…Tomato Genome SL2.50 and Beyond…
Tomato Genome SL2.50 and Beyond…Surya Saha
 
Sequencing, Genome Assembly and the SGN Platform
Sequencing, Genome Assembly and the SGN PlatformSequencing, Genome Assembly and the SGN Platform
Sequencing, Genome Assembly and the SGN PlatformSurya Saha
 
ICAR Soybean Indore 2014
ICAR Soybean Indore 2014ICAR Soybean Indore 2014
ICAR Soybean Indore 2014Surya Saha
 

Mais de Surya Saha (20)

An open access resource portal for arthropod vectors and agricultural pathosy...
An open access resource portal for arthropod vectors and agricultural pathosy...An open access resource portal for arthropod vectors and agricultural pathosy...
An open access resource portal for arthropod vectors and agricultural pathosy...
 
Functional annotation of invertebrate genomes
Functional annotation of invertebrate genomesFunctional annotation of invertebrate genomes
Functional annotation of invertebrate genomes
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
 
Updates on Citrusgreening.org database from USDA NIFA project meeting
Updates on Citrusgreening.org database from USDA NIFA project meetingUpdates on Citrusgreening.org database from USDA NIFA project meeting
Updates on Citrusgreening.org database from USDA NIFA project meeting
 
Updates on the ACP v3 genome and annotation from USDA NIFA project meeting
Updates on the ACP v3 genome and annotation from USDA NIFA project meetingUpdates on the ACP v3 genome and annotation from USDA NIFA project meeting
Updates on the ACP v3 genome and annotation from USDA NIFA project meeting
 
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant DiseasesAgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases
AgriVectors: A Data and Systems Resource for Arthropod Vectors of Plant Diseases
 
Visualization of insect vector-plant pathogen interactions in the citrus gree...
Visualization of insect vector-plant pathogen interactions in the citrus gree...Visualization of insect vector-plant pathogen interactions in the citrus gree...
Visualization of insect vector-plant pathogen interactions in the citrus gree...
 
Deciphering the genome of Diaphorina citri to develop solutions for the citru...
Deciphering the genome of Diaphorina citri to develop solutions for the citru...Deciphering the genome of Diaphorina citri to develop solutions for the citru...
Deciphering the genome of Diaphorina citri to develop solutions for the citru...
 
Sequencing 2017
Sequencing 2017Sequencing 2017
Sequencing 2017
 
Community resources for all y’all Omics
Community resources for all y’all OmicsCommunity resources for all y’all Omics
Community resources for all y’all Omics
 
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...
 CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis... CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...
CitrusCyc: Metabolic Pathway Databases for the C. clementina and C. sinensis...
 
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...
Using Long Reads, Optical Maps and Long-Range Scaffolding to improve the Diap...
 
Sequencing 2016
Sequencing 2016Sequencing 2016
Sequencing 2016
 
Tomato Genome Build SL3.0
Tomato Genome Build SL3.0Tomato Genome Build SL3.0
Tomato Genome Build SL3.0
 
Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015Sequencing and Bioinformatics PGRP Summer 2015
Sequencing and Bioinformatics PGRP Summer 2015
 
Quality Control of Sequencing Data
Quality Control of Sequencing DataQuality Control of Sequencing Data
Quality Control of Sequencing Data
 
Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015
 
Tomato Genome SL2.50 and Beyond…
Tomato Genome SL2.50 and Beyond…Tomato Genome SL2.50 and Beyond…
Tomato Genome SL2.50 and Beyond…
 
Sequencing, Genome Assembly and the SGN Platform
Sequencing, Genome Assembly and the SGN PlatformSequencing, Genome Assembly and the SGN Platform
Sequencing, Genome Assembly and the SGN Platform
 
ICAR Soybean Indore 2014
ICAR Soybean Indore 2014ICAR Soybean Indore 2014
ICAR Soybean Indore 2014
 

Último

mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 

Último (20)

mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
Quality Control of NGS Data

  • 1. Quality Control of NGS Data. Surya Saha (ss2489@cornell.edu), BTI PGRP Summer Internship Program 2014. Slides: https://bitly.com/BioinfoInternEx2014
  • 2. Quality Control of NGS Data: 1. Evaluation, 2. Preprocessing. 7/8/2014, BTI PGRP Summer Internship Program 2014. Slide credit: Aureliano Bombarely
  • 3. Evaluation. Slide credit: Aureliano Bombarely
      Goal: Learn to use read evaluation programs, paying attention to relevant parameters such as quality score distribution, length distribution and read duplication.
      Data: Illumina data for two tomato ripening stages, /home/bioinfo/Data/ch4_demo_dataset.tar.gz
      Tools:
      • tar -zxvf (command line; untar and unzip the files)
      • head (command line; take a quick look at the files)
      • mv (command line; rename the files)
      • grep (command line; find or count patterns in files)
      • FASTX toolkit (command line; process fasta/fastq files)
      • FastQC (GUI; calculates several statistics for each file)
  • 4. Evaluation, Exercise 1. Slide credit: Aureliano Bombarely
      1. Untar and unzip the file /home/bioinfo/Data/ch4_demo_dataset.tar.gz
      2. The raw data will be found in two directories: breaker and immature_fruit. Print the first 10 lines of the files SRR404331_ch4.fq, SRR404333_ch4.fq, SRR404334_ch4.fq and SRR404336_ch4.fq.
         Question 1.1: Are these files in fastq format?
      3. Change the extension of the .fq files to .fastq
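Steps 1 to 3 of Exercise 1 could be sketched as the shell functions below. This is only one possible approach, not the official solution. The function names are made up for illustration, and because the slides do not say how the archive is laid out internally, the sketch searches for .fq files with find instead of assuming fixed paths.

```shell
# Sketch of Exercise 1, steps 1-3. Function names are hypothetical.

# Step 1: untar and unzip the archive into a destination directory.
unpack_dataset() {
    archive="$1"
    dest="${2:-.}"
    mkdir -p "$dest"
    tar -zxvf "$archive" -C "$dest"
}

# Step 2: print the first 10 lines of every .fq file under a directory.
peek_fq() {
    find "$1" -type f -name '*.fq' | sort | while IFS= read -r f; do
        echo "== $f =="
        head -n 10 "$f"
    done
}

# Step 3: change the extension of every .fq file to .fastq.
rename_fq_to_fastq() {
    find "$1" -type f -name '*.fq' | while IFS= read -r f; do
        mv -- "$f" "${f%.fq}.fastq"
    done
}

# Example with the paths from the slides:
#   unpack_dataset /home/bioinfo/Data/ch4_demo_dataset.tar.gz .
#   peek_fq .
#   rename_fq_to_fastq .
```

For Question 1.1, a fastq record is four lines: a header starting with @, the sequence, a separator line starting with +, and a quality string of the same length as the sequence.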
  • 5. Evaluation, Exercise 1 (continued). Slide credit: Aureliano Bombarely
      4. Count the number of sequences in each fastq file using the commands you learnt earlier.
      5. Convert the fastq files to fasta.
      6. Explore other tools in the FASTX toolkit.
      7. Now count the number of sequences in each fasta file and see whether the number of sequences has changed.
      Tip: Use ‘grep’
      Tip: Use ‘fastq_to_fasta -h’ to see help
      Use Google if you are stuck
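A minimal sketch of steps 4, 5 and 7, assuming four-line fastq records (no wrapped sequences). The function names are hypothetical. The read-ID prefix passed to the grep-based counter is an assumption: SRA reads usually carry the run accession (for example SRR404331) in the header, but check this with head first. Counting with a bare '^@' pattern can overcount, because a quality line may also begin with @. The -Q33 option assumes Phred+33 quality encoding; drop it if the data use Phred+64.

```shell
# Sketch of Exercise 1, steps 4, 5 and 7. Function names are hypothetical.

# Step 4a: count fastq records as (number of lines) / 4.
count_fastq() {
    n=$(wc -l < "$1")
    echo $(( n / 4 ))
}

# Step 4b: count fastq headers with grep, anchored on a read-ID prefix
# (for example SRR) so quality lines starting with '@' are not counted.
count_fastq_grep() {
    grep -c "^@$2" "$1"
}

# Step 5: convert fastq to fasta with the FASTX toolkit.
# -Q33 assumes Phred+33 qualities; see 'fastq_to_fasta -h' for options.
to_fasta() {
    fastq_to_fasta -Q33 -i "$1" -o "${1%.fastq}.fasta"
}

# Step 7: count fasta records by their '>' header lines.
count_fasta() {
    grep -c '^>' "$1"
}

# Example:
#   count_fastq breaker/SRR404331_ch4.fastq
#   to_fasta breaker/SRR404331_ch4.fastq
#   count_fasta breaker/SRR404331_ch4.fasta
```

If the fasta count is lower than the fastq count, one likely cause is that fastq_to_fasta discards reads containing N by default; its -n option keeps them.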
  • 6. Evaluation: Sequence Quality (figure: good Illumina dataset)
  • 7. Evaluation: Sequence Quality (figure panels: good Illumina dataset, poor Illumina dataset)
  • 8. Evaluation: Sequence Quality (figure panels: 454, Pacific Biosciences)
  • 9. Evaluation: Sequence Content (figure: good Illumina dataset)
  • 10. Evaluation: Sequence Content (figure panels: good Illumina dataset, poor Illumina dataset)
  • 11. Evaluation: Duplication (figure: good Illumina dataset)
  • 12. Evaluation: Duplication (figure panels: good Illumina dataset, poor Illumina dataset)
  • 13. Evaluation: Overrepresented Sequences (figure: good Illumina dataset)
  • 14. Evaluation: Overrepresented Sequences (figure panels: good Illumina dataset, poor Illumina dataset)
  • 15. Evaluation: Kmer content (figure: good Illumina dataset)
  • 16. Evaluation: Kmer content (figure panels: good Illumina dataset, poor Illumina dataset)
  • 17. Evaluation: Kmer content (figure panels: 454, Pacific Biosciences)
  • 18. Evaluation, Exercise 2
      1. Type ‘fastqc’ to start the FastQC program. Load the four fastq sequence files in the program.
      Question 2.2: How many sequences are there per file according to FastQC?
      Question 2.3: What is the length range of these reads?
      Question 2.4: What is the quality score range of these reads? Which file looks best quality-wise?
      Question 2.5: Do these datasets show read overrepresentation?
      Question 2.6: Looking at the kmer content, do you think that the samples contain an adaptor?
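The slides use the FastQC GUI for Exercise 2. As a cross-check on Questions 2.3 and 2.4, the length and quality ranges can also be estimated directly from a fastq file. The sketch below is not part of the original exercise. It assumes four-line records and Phred+33 encoding, and the function names are hypothetical. FastQC can also run without the GUI by passing files on the command line with -o for the output directory.

```shell
# Command-line cross-check for Exercise 2. Function names are hypothetical.

# Run FastQC non-interactively: first argument is the output directory,
# the remaining arguments are fastq files.
run_fastqc() {
    out="$1"
    shift
    mkdir -p "$out"
    fastqc -o "$out" "$@"
}

# Question 2.3: shortest and longest read, printed as "min-max".
length_range() {
    awk 'NR % 4 == 2 {
             n = length($0)
             if (min == "" || n < min) min = n
             if (n > max) max = n
         }
         END { print min "-" max }' "$1"
}

# Question 2.4: lowest and highest quality score, printed as "min-max".
# Assumes Phred+33, so the score is the ASCII code of the character minus 33.
# Splitting every character onto its own line is slow on large files.
qual_range() {
    chars=$(awk 'NR % 4 == 0' "$1" | fold -w 1 | LC_ALL=C sort -u)
    lo=$(printf '%s\n' "$chars" | head -n 1)
    hi=$(printf '%s\n' "$chars" | tail -n 1)
    echo "$(( $(printf '%d' "'$lo") - 33 ))-$(( $(printf '%d' "'$hi") - 33 ))"
}

# Example:
#   run_fastqc fastqc_raw breaker/*.fastq immature_fruit/*.fastq
#   length_range breaker/SRR404331_ch4.fastq
#   qual_range breaker/SRR404331_ch4.fastq
```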
  • 19. Preprocessing
      Goal: Trim the low-quality ends of the reads and remove the short reads.
      Data: Illumina data for two tomato ripening stages, ch4_demo_dataset.tar.gz
      Tools:
      • fastq-mcf (command line tool to process reads)
      • FastQC (GUI; calculates several statistics for each file)
  • 20. Preprocessing, Exercise 3
      • Download the file adapters1.fa from ftp://ftp.solgenomics.net/user_requests/aubombarely/courses/RNAseqCorpoica/adapters1.fa
      • Run the read processing program over each of the datasets using:
          • Min. qscore of 30
          • Min. length of 40 bp
      • Type ‘fastqc’ to start the FastQC program. Load the four new fastq sequence files. Compare the results with the previous datasets.
      Tip: Use ‘fastqc -h’ to see help
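One way Exercise 3 might be run is sketched below, assuming the read processing program is fastq-mcf from ea-utils (the tool named on the Preprocessing slide) and that its -q, -l and -o options set the quality threshold, the minimum remaining length and the output file. The output naming scheme (.q30l40.fastq) and the function names are made up for illustration. The adapter URL is the one from the slide with the line-break space removed, which is an assumption about the original address.

```shell
# Sketch of Exercise 3. Function names and output naming are hypothetical.

# Adapter file location from the slide (line-wrap space removed: assumption).
ADAPTERS_URL="ftp://ftp.solgenomics.net/user_requests/aubombarely/courses/RNAseqCorpoica/adapters1.fa"

# Build the output name for a trimmed file, e.g. x.fastq -> x.q30l40.fastq
trimmed_name() {
    echo "${1%.fastq}.q30l40.fastq"
}

# Trim every raw .fastq file under a directory with fastq-mcf:
#   -q 30  quality threshold for trimming read ends
#   -l 40  minimum read length kept after trimming
trim_all() {
    adapters="$1"
    dir="$2"
    find "$dir" -type f -name '*.fastq' ! -name '*.q30l40.fastq' | sort |
    while IFS= read -r f; do
        fastq-mcf -q 30 -l 40 -o "$(trimmed_name "$f")" "$adapters" "$f"
    done
}

# Example:
#   wget "$ADAPTERS_URL"
#   trim_all adapters1.fa .
#   fastqc        # then load the four *.q30l40.fastq files
```

Check the option names against the help output of the installed fastq-mcf version before relying on them.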
  • 21. Need help? Solutions: https://bitly.com/BioinfoInternExSol2014