SlideShare uma empresa Scribd logo
1 de 4
Baixar para ler offline
APPLICATION NOTE
Paired-End Sequencing




Ion	semiconductor	sequencing	uniquely	enables		
both	accurate	long	reads	and	paired-end	sequencing


                                          PANEL A
Summary
•	 Ion	Torrent	recently	launched	a	
   long	read	kit	for	the	Ion	PGM™	
   sequencer	with	modal	high-quality	
   read	lengths	of	225	bases.

•	 Read	lengths	greater	than	
   500	bases	are	feasible,	as	
   demonstrated	by	the	generation		                                PANEL B

   of	a	perfect	525	bases	read.

•	 A	novel	paired-end	sequencing	
   (PES)	protocol	for	the	Ion	PGM™	
   system	has	been	demonstrated.	
   Ion	PES	further	enhances	
   accuracy.	The	number	of	indel	
   errors	was	reduced	5-fold,	and	the	
   total	number	of	consensus	errors	
   was	reduced	10-fold.



                                         Figure 1. Ion semiconductor sequencing utilizes natural nucleotides and simple sequenc-
                                         ing chemistry, producing accurate reads that are at least twice as long as reads generated
                                         with fluorescent sequencing-by-synthesis (SBS) chemistry. Sequencing accuracy generally
                                         decreases as read length increases, but this slope is relatively flat for the Ion PGM™ sequencer
                                         when compared with fluorescent SBS chemistry. The Ion PGM™ sequencer demonstrates modal
                                         read lengths of 225 bases (panel B). Measured per-base accuracy (Phred score) is calculated after
                                         aligning reads to the DH10B reference genome and is then plotted as a function of position in read,
                                         (panel A). A Phred score of 20 is equivalent to 99% accuracy, or 1 error per 100 bases.



                                         Next-generation sequencing — based applications and methodologies are
                                         extensively used to investigate genome biology. The lengths of next-generation
                                         sequencing reads are limited by accuracy—as read length increases, accuracy
                                         decreases. This fundamental inverse relationship between read length and
                                         accuracy governs sequencing-based applications.
Single-Well Ionogram for (2406, 1991)


                                                                 CGCTAAGTAATATTCGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGA
                                                                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                                 CGCTAAGTAATATTCGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGA
                                                                 1                                                                                                100

                                                                 GGAGCCAAATTCTAAAAATTCGCTTTTTTAGCGCAATGTCACTGACCTTAGTTGAACATTGTTTTTTAACGGATAGCGGGTTTTTAACATCTTAAGCGCC
                                                                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                                 GGAGCCAAATTCTAAAAATTCGCTTTTTTAGCGCAATGTCACTGACCTTAGTTGAACATTGTTTTTTAACGGATAGCGGGTTTTTAACATCTTAAGCGCC
                                                                 101                                                                                              200

                                                                 CTCGACCTTTATGGTTGAGGGCGTTTTGCTATGAACGCCATCACCATTTTCCCCTCGATTATAAAACTTGAGTTATTCAGTAGTCTCCCCTCTTGCAACT
                                                                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                                 CTCGACCTTTATGGTTGAGGGCGTTTTGCTATGAACGCCATCACCATTTTCCCCTCGATTATAAAACTTGAGTTATTCAGTAGTCTCCCCTCTTGCAACT
                                                                 201                                                                                              300

                                                                 CACACCCAAAACTGCCTAACGAAAAGTTATTAATTTTCAATCATATTGCTATCAGTATTTACATTTTTTCGCTGTGCTAGAAAGGGCGCATTTATGTTAG
                                                                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                                 CACACCCAAAACTGCCTAACGAAAAGTTATTAATTTTCAATCATATTGCTATCAGTATTTACATTTTTTCGCTGTGCTAGAAAGGGCGCATTTATGTTAG
                                                                 301                                                                                              400

                                                                 CTCGTTCAGGGAAGGTAAGCATGGCTACGAAGAAGAGAAGTGGAGAAGAAATAAATGACCGACAAATATTATGCGGGATGGGAATTAAACTACGCCGCTT
                                                                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
                                                                 CTCGTTCAGGGAAGGTAAGCATGGCTACGAAGAAGAGAAGTGGAGAAGAAATAAATGACCGACAAATATTATGCGGGATGGGAATTAAACTACGCCGCTT
                                                                 401                                                                                              500

                                                                 AACTGCGGGTATCTGTCTGATAACT
                                                                 |||||||||||||||||||||||||
                                                                 AACTGCGGGTATCTGTCTGATAACT
                                                                 501                                                                                              600




Figure 2. Demonstration of a 525 base perfect read on the Ion PGM™ Sequencer. The “ionogram” (left) displays the number of each base (A, T, C,
or G) called at each nucleotide flow during an Ion PGM™ sequencing run. The sequence alignment (right) displays the read generated by the
Ion PGM™ sequencer in the top strand and the reference genome sequence in the bottom strand.


In October 2011, Ion Torrent                                      across the length of the read is 99%,                            Reads longer than 500 bases are
launched a long read kit for the PGM™                             (Q20) and consensus accuracy is                                  achievable, as demonstrated by the
sequencer, which can provide modal                                99.999%, (Q50). This data is available                           generation of a perfect read spanning
high-quality read lengths of 225                                  for download at ioncommunity.                                    525 contiguous bases (Figure 2).
bases (Figure 1). Mean raw accuracy                               iontorrent.com — run name STO-409.
                                                                                                                                   Paired-end sequencing (PES) was
                                                                                                                                   initially developed for short-read
        Step 1: Generate “forward”
        sequence reads
                                                     Step 2: Prepare template
                                                     for “reverse” sequence reads
                                                                                                      Step 3: Generate “reverse”
                                                                                                      sequence reads
                                                                                                                                   next-generation sequencing tech-
    C
    H                                                                                                                              nologies to increase both coverage of
    E
    M
                                                                                                                                   the human genome and the fraction
    I                                                                                                                              of generated reads aligning to the
    C
    A                                                                                                                              genome. Fortuitously, PES has
    L
                                                                                                                                   become useful for applications that
                                                                                                                                   demand ever longer reads. Increasing
                       Torrent Analysis Suite
                            base calling
                                                      Step 4: Merge “forward”
                                                       and “reverse” FASTQ
                                                                                              Torrent Analysis Suite
                                                                                                   base calling                    the length of accurately called bases
                  FASTQ                                        FASTQ                                              FASTQ            can obviate the need for PES in many
                  forward                                      Merged                                             reverse
                                                                                                                                   instances, allowing more of the
    D
    I
                                                            Bidirectional
                                                           “paired” reads                                                          genome to be accurately sequenced,
    G
    I                                                                             Fully                                            and for some especially demanding
    T                                                                           Overlapping
    A                                                                                                                              applications there is no substitute for
    L                                                                            Partially
                                                                                Overlapping                                        long reads. Given the utility of both
                                                                                                                                   accurate long reads and PES, Ion
                                                                                   Non-
                                                                                Overlapping                                        Torrent has recently developed
                                                                                                                                   solutions for both methods.
Figure 3. Only Ion Torrent Personal Genome Machine™ (PGM™) semiconductor sequencing
technology delivers the unique combination of long reads and PES. This figure depicts the
chemical and digital workflow of the novel PES method developed by Ion Torrent. Forward and
reverse reads are denoted by green and orange colors, respectively. The degree of paired-read
overlap is shown aligned to a reference genome.
Using the Ion sequencing kit              Each microwell on the Ion Chip, and        PES methodology can produce fully
launched in July 2011, which sup-         thus, each template is in perfect reg-     overlapping forward and reverse
ports read lengths of 100 bases, a        ister with each transistor, ensuring       reads with either a 2 x 100 or 2 x 200
novel PES method was developed for        that the linkage between forward and       Ion semiconductor sequencing run.
the Ion PGM™ sequencer that pro-          reverse reads is unbreakable even          Fully overlapping reads are useful
duces paired reads, each 100 bases        when the Ion Chip is removed from          for further enhancing accuracy. In
in length (2 x 100). We anticipate that   the Ion PGM™ sequencer between             counting-based applications such
applying recent improvements in           the forward and reverse sequencing         as RNA-Seq or ChIP-Seq, counting
read length to the Ion PES protocol       runs. Furthermore, because the Ion         accuracy is improved by partially
should produce paired reads of 200        Chip directly translates biochemi-         overlapping reads (Figure 3), which
bases in length (2 x 200), with the       cal signals into digital signals, via      increase the number of uniquely
possibility of paired reads up to 400     perfectly arrayed transistors, only        mapped reads and identify redundant
bases in length (2 x 400).                a particle-immobilized template,           reads by demarcating the start
                                          primer and polymerase are required         and end of the sequence reads.
Until recently, no sequencing technol-    in each Ion Chip microwell. This           Non-overlapping reads (Figure 3)
ogy could simultaneously offer the        simplicity facilitates the occurrence of   are useful for detecting genomic
benefits conferred by both long reads     nucleic acid biochemistry and enzy-        rearrangements or de novo assembly
and PES. The Ion Torrent Personal         mology directly in each microwell,         because the distance between the
Genome Machine™ (PGM) semiconduc-         enabling PES as well as other novel        forward and reverse reads can be
tor sequencing technology is now the      sequencing methods.                        longer than the length of a single
only platform that provides this unique                                              read and the distance is known and
combination of long reads and PES.                                                   uniformly distributed.
                                          Ion PES data analysis
                                                                                     To compare the accuracy of fully
Ion PES workflow                          Forward and reverse sequencing             overlapping paired reads to individual
                                          produces two FASTQ-formatted files.        reads, the proper pairs were either
The simplicity of semiconductor           Merging the two FASTQ files produces       aligned as individual reads or as
sequencing allows a template to           both unidirectional (singleton) and        a single consolidated read to the
be sequenced in both directions           bidirectional reads. Unidirectional        DH10B reference genome using
using a single Ion Chip. The first        reads are defined as Ion Chip              the TMAP alignment software in
read results from the polymerase          microwell positions that do not            the Torrent Analysis Suite. A total
extending the primer towards the          intersect in the forward and reverse       of 977,458 reads were paired and
Ion Sphere™ particle, in the “forward”    FASTQ files, whereas bidirectional
direction (Figure 3, step 1), the         reads are defined as the intersection
standard operating procedure for          of the Ion Chip microwell position                                       TGACGATTGC
the Ion PGM™ Sequencer. Then, off         in the forward and reverse FASTQ                                         TGACG-TTGC
instrument, directly within the Ion       files. After alignment to a reference                                    TGACGATTGC
                                                                                                                     Deletion
Chip, the template is prepared for        genome, the bidirectional or “paired”
the second read (Figure 3, step 2)        reads can be further subclassified                                       CAGCT-CAGA
via a series of simple enzymatic          into “improper pairs”, those with                                        CAGCTGCAGA
                                                                                                                   CAGCTCAGA
steps. The forward primer is fully        improper orientation or improper
                                                                                                                     Insertion
extended to the Ion Sphere™ particle,     distance between forward and reverse
and then the original template is         reads, and “proper pairs”, those                                         CCAGGTACAC
nicked and degraded to produce            with proper orientation and distance                                     CCAGGCACAC
                                                                                                                   CCAGGTACAC
a primer for the second read. The         between forward and reverse reads
                                                                                                                    Substitution
same Ion Chip is reintroduced to the      (Figure 3).
Ion PGM™ Sequencer, and the second                                                   Figure 4. Depiction of a paired-read error-
read is generated by the polymerase       PES generates differing degrees of         correction schema. Green and orange bars
extending the primer away from the        overlap between the forward and            represent forward and reverse reads derived
                                                                                     from fully overlapping paired reads, aligned
Ion Sphere™ particle, in the “reverse”    reverse reads, depending on the size
                                                                                     to a reference genome. Information from
direction (Figure 3, step 3).             of the library fragments and read          paired strands at the same nucleotide as well
                                          lengths. With a library insert size        as per-position quality scores can be used to
                                          of 100–200 bases in length, the Ion        develop improved bioinformatics methods for
                                                                                     error correction
aligned to the DH10B reference                                Conclusion                                              The difference between “paired-
genome using the Ion 314™ Chip,                                                                                       end sequencing” (PES) and
and of these 466,654 reads were                               The combination of long reads and                       “mate-pair sequencing” (MPS)
considered proper pairs, to produce                           paired-end sequencing delivers a
                                                                                                                      PES	and	MPS	are	similar	in	that	
53 Mb of consolidated data, giving                            unique and powerful capability that                     both	methods	produce	reads	from	
greater than 10x coverage on E. coli.                         is only available by employing Ion                      the	ends	of	the	templates,	yet	differ	
To consolidate forward and reverse                            semiconductor sequencing.                               in	that	MPS	produces	a	single	read	
reads, errors were handled via a                              Perfect reads exceeding 500 bases                       of	the	template	ends	that	were	
simple schema. Specifically, if one of                        have now been demonstrated using                        paired	during	library	construc-
the reads from the pair did not exhibit                       the Ion PGM™ system. When these                         tion—whereas	PES	produces	two	
                                                                                                                      sequence	reads,	one	from	each	
the same substitution, insertion,                             increasingly longer reads are coupled
                                                                                                                      end	of	the	same	template.	MPS	is	
or deletion at the same position,                             with paired-end sequencing the
                                                                                                                      generally	used	for	longer	inserts	
then the substitution, insertion, or                          outcome is enhanced accuracy.                           (2–10	kp),	and	PES	is	employed	for	
deletion was removed (Figure 4). Via                                                                                  shorter	inserts	(100–800	bases).	
this methodology, raw insertion and                                                                                   Ion	Torrent	recently	released	a	long	
deletion errors were reduced ~5-fold                                                                                  MPS	protocol	on	the	Ion	Community	
to 0.19%. Consensus accuracy was                                                                                      (ioncommunity.iontorrent.com)	
reduced over 10x, to only 17 total                                                                                    for	sequencing	the	ends	of	10	kb	
                                                                                                                      inserts,	which	have	important	appli-
errors with pairing.
                                                                                                                      cations	in	cancer	rearrangements	
                                                                                                                      and	de	novo	genome	assembly.	

                                                                                                                      The	non-overlapping	reads	produced	
                                                                                                                      by	MPS	and	PES	enable	applications	
                                                                                                                      such	as	interrogation	of	genomic	
                                                                                                                      rearrangement	or	de	novo	genome	
Bioinformatics methods                                                                                                assembly	because	the	distance	
                                                                                                                      between	the	forward	and	reverse	
Raw voltage data were processed using Torrent Suite v1.5 to generate FASTQ                                            reads	can	be	longer	than	the	length	
files. FASTQ files were aligned against E. coli DH10B using TMAP v0.1.3-1                                             of	a	single	read	and	the	distance	is	
to produce BAM files. Variants for STO-409 were called using SAMTOOLS                                                 known	and	uniformly	distributed.	
mpileup using default parameters as wrapped with Torrent Suite v1.5 in the                                            Beyond	assembly	and	interrogation	
                                                                                                                      of	genome	structural	variation,	PES	
Germ Line Variant Caller Plug-In. Variants for PES analysis were called from
                                                                                                                      can	improve	the	number	of	uniquely	
BAM files using both SAMTOOLS mpileup (-Q 7 -h 50 -o 10 -e 17 -m4) and                                                mapped	reads,	the	ability	to	identify	
GATK v1.0.5777 with the following commands.                                                                           unique	(different	start	and	end	points)	
##- Making indel calls                                                                                                and	redundant	reads	(same	start	and	
/usr/java/default/bin/java -jar /home/sunya/tool/GATK/GenomeAnalysisTK.                                               end	point),	and,	as	we	demonstrate	
jar                                                                                                                  here,	accuracy.
-T IndelGenotyperV2 
-R /data/output/pgm/pe/reference/dh10b.fa 
-I /data/output/pgm/pe/consolid/output.bam 
-bed /data/output/pgm/pe/consolid/norecal/consolid_indel.bed 
-o /data/output/pgm/pe/consolid/norecal/consolid_indel.vcf 
--minFraction 0.5 
--minIndelCount 5 
--window_size 300

##- Making SNP calls
/usr/java/default/bin/java -jar /home/sunya/tool/GATK/GenomeAnalysisTK.
jar 
 -T UnifiedGenotyper -nt 6 
 -R /data/output/pgm/pe/reference/dh10b.fa 
 -I /data/output/pgm/pe/consolid/output.bam 
 -o /data/output/pgm/pe/consolid/norecal/consolid_snp.vcf 
 -stand_call_conf 30.0 -stand_emit_conf 10.0



For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use.
© 2011 Life Technologies Corporation. All rights reserved. The trademarks mentioned herein are the property of Life
Technologies Corporation or their respective owners.
Printed in the USA. CO24084 1011

lifetechnologies.com

Mais conteúdo relacionado

Mais de Thermo Fisher Scientific

TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...
TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...
TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...
Thermo Fisher Scientific
 

Mais de Thermo Fisher Scientific (20)

Liquid biopsy quality control – the importance of plasma quality, sample prep...
Liquid biopsy quality control – the importance of plasma quality, sample prep...Liquid biopsy quality control – the importance of plasma quality, sample prep...
Liquid biopsy quality control – the importance of plasma quality, sample prep...
 
Streamlined next generation sequencing assay development using a highly multi...
Streamlined next generation sequencing assay development using a highly multi...Streamlined next generation sequencing assay development using a highly multi...
Streamlined next generation sequencing assay development using a highly multi...
 
Targeted T-cell receptor beta immune repertoire sequencing in several FFPE ti...
Targeted T-cell receptor beta immune repertoire sequencing in several FFPE ti...Targeted T-cell receptor beta immune repertoire sequencing in several FFPE ti...
Targeted T-cell receptor beta immune repertoire sequencing in several FFPE ti...
 
Development of Quality Control Materials for Characterization of Comprehensiv...
Development of Quality Control Materials for Characterization of Comprehensiv...Development of Quality Control Materials for Characterization of Comprehensiv...
Development of Quality Control Materials for Characterization of Comprehensiv...
 
A High Throughput System for Profiling Respiratory Tract Microbiota
A High Throughput System for Profiling Respiratory Tract MicrobiotaA High Throughput System for Profiling Respiratory Tract Microbiota
A High Throughput System for Profiling Respiratory Tract Microbiota
 
A high-throughput approach for multi-omic testing for prostate cancer research
A high-throughput approach for multi-omic testing for prostate cancer researchA high-throughput approach for multi-omic testing for prostate cancer research
A high-throughput approach for multi-omic testing for prostate cancer research
 
Why is selecting the right thermal cycler important?
Why is selecting the right thermal cycler important?Why is selecting the right thermal cycler important?
Why is selecting the right thermal cycler important?
 
A rapid library preparation method with custom assay designs for detection of...
A rapid library preparation method with custom assay designs for detection of...A rapid library preparation method with custom assay designs for detection of...
A rapid library preparation method with custom assay designs for detection of...
 
Generation of Clonal CRISPR/Cas9-edited Human iPSC Derived Cellular Models an...
Generation of Clonal CRISPR/Cas9-edited Human iPSC Derived Cellular Models an...Generation of Clonal CRISPR/Cas9-edited Human iPSC Derived Cellular Models an...
Generation of Clonal CRISPR/Cas9-edited Human iPSC Derived Cellular Models an...
 
TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...
TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...
TaqMan®Advanced miRNA cDNA synthesis kit to simultaneously study expression o...
 
Identifying novel and druggable targets in a triple negative breast cancer ce...
Identifying novel and druggable targets in a triple negative breast cancer ce...Identifying novel and druggable targets in a triple negative breast cancer ce...
Identifying novel and druggable targets in a triple negative breast cancer ce...
 
Evidence for antigen-driven TCRβ chain convergence in the melanoma-infiltrati...
Evidence for antigen-driven TCRβ chain convergence in the melanoma-infiltrati...Evidence for antigen-driven TCRβ chain convergence in the melanoma-infiltrati...
Evidence for antigen-driven TCRβ chain convergence in the melanoma-infiltrati...
 
Analytical performance of a novel next generation sequencing assay for Myeloi...
Analytical performance of a novel next generation sequencing assay for Myeloi...Analytical performance of a novel next generation sequencing assay for Myeloi...
Analytical performance of a novel next generation sequencing assay for Myeloi...
 
Estimating Mutation Load from Tumor Research Samples using a Targeted Next-Ge...
Estimating Mutation Load from Tumor Research Samples using a Targeted Next-Ge...Estimating Mutation Load from Tumor Research Samples using a Targeted Next-Ge...
Estimating Mutation Load from Tumor Research Samples using a Targeted Next-Ge...
 
Development of a next-generation (NGS) assay for pediatric, childhood, and yo...
Development of a next-generation (NGS) assay for pediatric, childhood, and yo...Development of a next-generation (NGS) assay for pediatric, childhood, and yo...
Development of a next-generation (NGS) assay for pediatric, childhood, and yo...
 
High content screening in MCF7 and MDA-MB231 cells show differential response...
High content screening in MCF7 and MDA-MB231 cells show differential response...High content screening in MCF7 and MDA-MB231 cells show differential response...
High content screening in MCF7 and MDA-MB231 cells show differential response...
 
A next generation sequencing based sample-to-result pharmacogenomics research...
A next generation sequencing based sample-to-result pharmacogenomics research...A next generation sequencing based sample-to-result pharmacogenomics research...
A next generation sequencing based sample-to-result pharmacogenomics research...
 
A Comprehensive Childhood Cancer Research Gene Panel
A Comprehensive Childhood Cancer Research Gene PanelA Comprehensive Childhood Cancer Research Gene Panel
A Comprehensive Childhood Cancer Research Gene Panel
 
New Software for Detecting Somatic Mutations at Low Level in Sanger Sequencin...
New Software for Detecting Somatic Mutations at Low Level in Sanger Sequencin...New Software for Detecting Somatic Mutations at Low Level in Sanger Sequencin...
New Software for Detecting Somatic Mutations at Low Level in Sanger Sequencin...
 
Clinical research results for a NGS based kit for targeted detection of relev...
Clinical research results for a NGS based kit for targeted detection of relev...Clinical research results for a NGS based kit for targeted detection of relev...
Clinical research results for a NGS based kit for targeted detection of relev...
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Ion semiconductor sequencing uniquely enables both accurate long reads and paired-end sequencing

  • 1. APPLICATION NOTE Paired-End Sequencing Ion semiconductor sequencing uniquely enables both accurate long reads and paired-end sequencing PANEL A Summary • Ion Torrent recently launched a long read kit for the Ion PGM™ sequencer with modal high-quality read lengths of 225 bases. • Read lengths greater than 500 bases are feasible, as demonstrated by the generation PANEL B of a perfect 525 bases read. • A novel paired-end sequencing (PES) protocol for the Ion PGM™ system has been demonstrated. Ion PES further enhances accuracy. The number of indel errors was reduced 5-fold, and the total number of consensus errors was reduced 10-fold. Figure 1. Ion semiconductor sequencing utilizes natural nucleotides and simple sequenc- ing chemistry, producing accurate reads that are at least twice as long as reads generated with fluorescent sequencing-by-synthesis (SBS) chemistry. Sequencing accuracy generally decreases as read length increases, but this slope is relatively flat for the Ion PGM™ sequencer when compared with fluorescent SBS chemistry. The Ion PGM™ sequencer demonstrates modal read lengths of 225 bases (panel B). Measured per-base accuracy (Phred score) is calculated after aligning reads to the DH10B reference genome and is then plotted as a function of position in read, (panel A). A Phred score of 20 is equivalent to 99% accuracy, or 1 error per 100 bases. Next-generation sequencing — based applications and methodologies are extensively used to investigate genome biology. The lengths of next-generation sequencing reads are limited by accuracy—as read length increases, accuracy decreases. This fundamental inverse relationship between read length and accuracy governs sequencing-based applications.
  • 2. Single-Well Ionogram for (2406, 1991) CGCTAAGTAATATTCGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGA |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CGCTAAGTAATATTCGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGA 1 100 GGAGCCAAATTCTAAAAATTCGCTTTTTTAGCGCAATGTCACTGACCTTAGTTGAACATTGTTTTTTAACGGATAGCGGGTTTTTAACATCTTAAGCGCC |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| GGAGCCAAATTCTAAAAATTCGCTTTTTTAGCGCAATGTCACTGACCTTAGTTGAACATTGTTTTTTAACGGATAGCGGGTTTTTAACATCTTAAGCGCC 101 200 CTCGACCTTTATGGTTGAGGGCGTTTTGCTATGAACGCCATCACCATTTTCCCCTCGATTATAAAACTTGAGTTATTCAGTAGTCTCCCCTCTTGCAACT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTCGACCTTTATGGTTGAGGGCGTTTTGCTATGAACGCCATCACCATTTTCCCCTCGATTATAAAACTTGAGTTATTCAGTAGTCTCCCCTCTTGCAACT 201 300 CACACCCAAAACTGCCTAACGAAAAGTTATTAATTTTCAATCATATTGCTATCAGTATTTACATTTTTTCGCTGTGCTAGAAAGGGCGCATTTATGTTAG |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CACACCCAAAACTGCCTAACGAAAAGTTATTAATTTTCAATCATATTGCTATCAGTATTTACATTTTTTCGCTGTGCTAGAAAGGGCGCATTTATGTTAG 301 400 CTCGTTCAGGGAAGGTAAGCATGGCTACGAAGAAGAGAAGTGGAGAAGAAATAAATGACCGACAAATATTATGCGGGATGGGAATTAAACTACGCCGCTT |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| CTCGTTCAGGGAAGGTAAGCATGGCTACGAAGAAGAGAAGTGGAGAAGAAATAAATGACCGACAAATATTATGCGGGATGGGAATTAAACTACGCCGCTT 401 500 AACTGCGGGTATCTGTCTGATAACT ||||||||||||||||||||||||| AACTGCGGGTATCTGTCTGATAACT 501 600 Figure 2. Demonstration of a 525 base perfect read on the Ion PGM™ Sequencer. The “ionogram” (left) displays the number of each base (A, T, C, or G) called at each nucleotide flow during an Ion PGM™ sequencing run. The sequence alignment (right) displays the read generated by the Ion PGM™ sequencer in the top strand and the reference genome sequence in the bottom strand. In October 2011, Ion Torrent across the length of the read is 99%, Reads longer than 500 bases are launched a long read kit for the PGM™ (Q20) and consensus accuracy is achievable, as demonstrated by the sequencer, which can provide modal 99.999%, (Q50). This data is available generation of a perfect read spanning high-quality read lengths of 225 for download at ioncommunity. 525 contiguous bases (Figure 2). bases (Figure 1). Mean raw accuracy iontorrent.com — run name STO-409. Paired-end sequencing (PES) was initially developed for short-read Step 1: Generate “forward” sequence reads Step 2: Prepare template for “reverse” sequence reads Step 3: Generate “reverse” sequence reads next-generation sequencing tech- C H nologies to increase both coverage of E M the human genome and the fraction I of generated reads aligning to the C A genome. Fortuitously, PES has L become useful for applications that demand ever longer reads. Increasing Torrent Analysis Suite base calling Step 4: Merge “forward” and “reverse” FASTQ Torrent Analysis Suite base calling the length of accurately called bases FASTQ FASTQ FASTQ can obviate the need for PES in many forward Merged reverse instances, allowing more of the D I Bidirectional “paired” reads genome to be accurately sequenced, G I Fully and for some especially demanding T Overlapping A applications there is no substitute for L Partially Overlapping long reads. Given the utility of both accurate long reads and PES, Ion Non- Overlapping Torrent has recently developed solutions for both methods. Figure 3. Only Ion Torrent Personal Genome Machine™ (PGM™) semiconductor sequencing technology delivers the unique combination of long reads and PES. This figure depicts the chemical and digital workflow of the novel PES method developed by Ion Torrent. Forward and reverse reads are denoted by green and orange colors, respectively. The degree of paired-read overlap is shown aligned to a reference genome.
  • 3. Using the Ion sequencing kit Each microwell on the Ion Chip, and PES methodology can produce fully launched in July 2011, which sup- thus, each template is in perfect reg- overlapping forward and reverse ports read lengths of 100 bases, a ister with each transistor, ensuring reads with either a 2 x 100 or 2 x 200 novel PES method was developed for that the linkage between forward and Ion semiconductor sequencing run. the Ion PGM™ sequencer that pro- reverse reads is unbreakable even Fully overlapping reads are useful duces paired reads, each 100 bases when the Ion Chip is removed from for further enhancing accuracy. In in length (2 x 100). We anticipate that the Ion PGM™ sequencer between counting-based applications such applying recent improvements in the forward and reverse sequencing as RNA-Seq or ChIP-Seq, counting read length to the Ion PES protocol runs. Furthermore, because the Ion accuracy is improved by partially should produce paired reads of 200 Chip directly translates biochemi- overlapping reads (Figure 3), which bases in length (2 x 200), with the cal signals into digital signals, via increase the number of uniquely possibility of paired reads up to 400 perfectly arrayed transistors, only mapped reads and identify redundant bases in length (2 x 400). a particle-immobilized template, reads by demarcating the start primer and polymerase are required and end of the sequence reads. Until recently, no sequencing technol- in each Ion Chip microwell. This Non-overlapping reads (Figure 3) ogy could simultaneously offer the simplicity facilitates the occurrence of are useful for detecting genomic benefits conferred by both long reads nucleic acid biochemistry and enzy- rearrangements or de novo assembly and PES. The Ion Torrent Personal mology directly in each microwell, because the distance between the Genome Machine™ (PGM) semiconduc- enabling PES as well as other novel forward and reverse reads can be tor sequencing technology is now the sequencing methods. longer than the length of a single only platform that provides this unique read and the distance is known and combination of long reads and PES. uniformly distributed. Ion PES data analysis To compare the accuracy of fully Ion PES workflow Forward and reverse sequencing overlapping paired reads to individual produces two FASTQ-formatted files. reads, the proper pairs were either The simplicity of semiconductor Merging the two FASTQ files produces aligned as individual reads or as sequencing allows a template to both unidirectional (singleton) and a single consolidated read to the be sequenced in both directions bidirectional reads. Unidirectional DH10B reference genome using using a single Ion Chip. The first reads are defined as Ion Chip the TMAP alignment software in read results from the polymerase microwell positions that do not the Torrent Analysis Suite. A total extending the primer towards the intersect in the forward and reverse of 977,458 reads were paired and Ion Sphere™ particle, in the “forward” FASTQ files, whereas bidirectional direction (Figure 3, step 1), the reads are defined as the intersection standard operating procedure for of the Ion Chip microwell position TGACGATTGC the Ion PGM™ Sequencer. Then, off in the forward and reverse FASTQ TGACG-TTGC instrument, directly within the Ion files. After alignment to a reference TGACGATTGC Deletion Chip, the template is prepared for genome, the bidirectional or “paired” the second read (Figure 3, step 2) reads can be further subclassified CAGCT-CAGA via a series of simple enzymatic into “improper pairs”, those with CAGCTGCAGA CAGCTCAGA steps. The forward primer is fully improper orientation or improper Insertion extended to the Ion Sphere™ particle, distance between forward and reverse and then the original template is reads, and “proper pairs”, those CCAGGTACAC nicked and degraded to produce with proper orientation and distance CCAGGCACAC CCAGGTACAC a primer for the second read. The between forward and reverse reads Substitution same Ion Chip is reintroduced to the (Figure 3). Ion PGM™ Sequencer, and the second Figure 4. Depiction of a paired-read error- read is generated by the polymerase PES generates differing degrees of correction schema. Green and orange bars extending the primer away from the overlap between the forward and represent forward and reverse reads derived from fully overlapping paired reads, aligned Ion Sphere™ particle, in the “reverse” reverse reads, depending on the size to a reference genome. Information from direction (Figure 3, step 3). of the library fragments and read paired strands at the same nucleotide as well lengths. With a library insert size as per-position quality scores can be used to of 100–200 bases in length, the Ion develop improved bioinformatics methods for error correction
  • 4. aligned to the DH10B reference Conclusion The difference between “paired- genome using the Ion 314™ Chip, end sequencing” (PES) and and of these 466,654 reads were The combination of long reads and “mate-pair sequencing” (MPS) considered proper pairs, to produce paired-end sequencing delivers a PES and MPS are similar in that 53 Mb of consolidated data, giving unique and powerful capability that both methods produce reads from greater than 10x coverage on E. coli. is only available by employing Ion the ends of the templates, yet differ To consolidate forward and reverse semiconductor sequencing. in that MPS produces a single read reads, errors were handled via a Perfect reads exceeding 500 bases of the template ends that were simple schema. Specifically, if one of have now been demonstrated using paired during library construc- the reads from the pair did not exhibit the Ion PGM™ system. When these tion—whereas PES produces two sequence reads, one from each the same substitution, insertion, increasingly longer reads are coupled end of the same template. MPS is or deletion at the same position, with paired-end sequencing the generally used for longer inserts then the substitution, insertion, or outcome is enhanced accuracy. (2–10 kp), and PES is employed for deletion was removed (Figure 4). Via shorter inserts (100–800 bases). this methodology, raw insertion and Ion Torrent recently released a long deletion errors were reduced ~5-fold MPS protocol on the Ion Community to 0.19%. Consensus accuracy was (ioncommunity.iontorrent.com) reduced over 10x, to only 17 total for sequencing the ends of 10 kb inserts, which have important appli- errors with pairing. cations in cancer rearrangements and de novo genome assembly. The non-overlapping reads produced by MPS and PES enable applications such as interrogation of genomic rearrangement or de novo genome Bioinformatics methods assembly because the distance between the forward and reverse Raw voltage data were processed using Torrent Suite v1.5 to generate FASTQ reads can be longer than the length files. FASTQ files were aligned against E. coli DH10B using TMAP v0.1.3-1 of a single read and the distance is to produce BAM files. Variants for STO-409 were called using SAMTOOLS known and uniformly distributed. mpileup using default parameters as wrapped with Torrent Suite v1.5 in the Beyond assembly and interrogation of genome structural variation, PES Germ Line Variant Caller Plug-In. Variants for PES analysis were called from can improve the number of uniquely BAM files using both SAMTOOLS mpileup (-Q 7 -h 50 -o 10 -e 17 -m4) and mapped reads, the ability to identify GATK v1.0.5777 with the following commands. unique (different start and end points) ##- Making indel calls and redundant reads (same start and /usr/java/default/bin/java -jar /home/sunya/tool/GATK/GenomeAnalysisTK. end point), and, as we demonstrate jar here, accuracy. -T IndelGenotyperV2 -R /data/output/pgm/pe/reference/dh10b.fa -I /data/output/pgm/pe/consolid/output.bam -bed /data/output/pgm/pe/consolid/norecal/consolid_indel.bed -o /data/output/pgm/pe/consolid/norecal/consolid_indel.vcf --minFraction 0.5 --minIndelCount 5 --window_size 300 ##- Making SNP calls /usr/java/default/bin/java -jar /home/sunya/tool/GATK/GenomeAnalysisTK. jar -T UnifiedGenotyper -nt 6 -R /data/output/pgm/pe/reference/dh10b.fa -I /data/output/pgm/pe/consolid/output.bam -o /data/output/pgm/pe/consolid/norecal/consolid_snp.vcf -stand_call_conf 30.0 -stand_emit_conf 10.0 For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use. © 2011 Life Technologies Corporation. All rights reserved. The trademarks mentioned herein are the property of Life Technologies Corporation or their respective owners. Printed in the USA. CO24084 1011 lifetechnologies.com