The document describes an RNA-seq analysis workflow called RNACocktail. It evaluates tools for various RNA-seq analysis tasks like alignment, assembly, quantification, and more. The study finds that performance can vary significantly depending on the specific tool and data set. It then proposes a comprehensive RNACocktail protocol and computational pipeline that achieves high accuracy across different sample types and analysis goals. Validation on multiple samples shows this broad analysis approach can help researchers extract more biologically relevant insights from transcriptomic data.
3. Next-generation sequencing is rapidly becoming the method of choice for
transcriptional profiling experiments.
In contrast to microarray technology, high-throughput sequencing allows
identification of novel transcripts, does not require a sequenced genome
and circumvents background noise associated with fluorescence
quantification.
Furthermore, unlike hybridization-based detection, RNA-seq allows genome-
wide analysis of transcription at single nucleotide resolution, including
identification of alternative splicing events and post-transcriptional RNA
editing events.
Introduction
5. A typical RNA-seq experiment
Preparation of total RNA.
Depending on the class of RNA to be sequenced (e.g. mRNA, lincRNA, microRNA),
enrichment is performed. Good-quality total RNA is critical, although alternative
protocols for degraded RNA exist.
Library preparation.
Library preparation consists of:
RNA fragmentation. Unlike short RNAs, mRNAs are typically fragmented into smaller
pieces to enable sequencing.
Reverse transcription. First-strand cDNA is reverse transcribed from fragmented RNA
using random hexamers or oligo(dT) primers, and second-strand cDNA is then synthesized.
Adapter ligation. The 5’ and/or 3’ ends of cDNA are repaired and adapters (containing
sequences to allow hybridization to a flow cell) are ligated.
Library cleanup and amplification. Libraries are enriched for correctly ligated cDNA
fragments and amplified by PCR to add any remaining sequencing primer sequences.
Library quantification, quality control and sequencing. Library concentration is
assessed using qRT-PCR and/or a Bioanalyzer, after which the library is ready for sequencing.
Data analysis.
Downstream data analysis begins with quality control, such as trimming of sequencing
adapters and removal of reads with poor quality scores, followed by read mapping,
analysis of differential expression, identification of novel transcripts and pathway
analysis.
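The quality-control step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the adapter sequence, the Phred+33 quality encoding, the thresholds, and the function names (`trim_adapter`, `clean_reads`) are all assumptions chosen for the example, and real pipelines normally rely on dedicated trimming software that also handles partial and mismatched adapters.

```python
from statistics import mean

# Illustrative adapter sequence (assumption); the correct one depends on the library kit.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_adapter(seq, qual, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if present."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def clean_reads(records, min_mean_quality=20, min_length=20):
    """Yield (name, seq, qual) tuples that survive trimming and filtering.

    `records` is any iterable of (name, sequence, quality string) tuples.
    """
    for name, seq, qual in records:
        seq, qual = trim_adapter(seq, qual)
        if len(seq) < min_length:
            continue  # too short after trimming to map reliably
        if mean(phred_scores(qual)) < min_mean_quality:
            continue  # mean base-call quality below the threshold
        yield name, seq, qual
```

Reads that pass this filter would then go on to the mapping step; the thresholds of 20 are placeholders and would need tuning for a given data set.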
6. RNA-seq technologies
The three most widely used NGS platforms for RNA-seq are SOLiD and Ion Torrent,
both marketed by ThermoFisher, and Illumina’s HiSeq. All three platforms have similar
sample input requirements and sequence millions of cDNA fragments per run. Below,
sample preparation and pertinent application-specific advantages and disadvantages
are discussed.
8. Ion Torrent and SOLiD libraries are prepared using similar protocols.
9. Non-coding RNA-seq
MicroRNAs are sequenced by ligating RNA adapters to each end of the mature
microRNA, followed by reverse transcription and PCR (RT-PCR).
12. The popularity of high-throughput next-generation sequencing
(NGS) ushered in a new era in transcriptome analysis with RNA-seq.
Widespread application of RNA-seq requires workflows tuned to
the sequencing technologies involved, the sample types, the desired
analysis, and the availability of genomic and computational
resources.
Depending on the workflow used, the accuracy, speed, and cost
of analysis can vary significantly.
Thus, it is crucial to study the tradeoffs involved at different steps
of an RNA-seq analysis to get the best accuracy subject to the cost
and performance constraints.
Furthermore, figuring out the optimal workflow is even more
challenging since, in general, the best overall approaches may
have sub-optimal performance for a specific data set in terms of a
specific measure, which necessitates a comprehensive analysis of
workflows using a wide variety of data sets.
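The point that the best overall workflow can still be sub-optimal for a particular data set, and that accuracy has to be weighed against cost, can be made concrete with a small sketch. All workflow names, data set names, accuracy scores, and runtimes below are hypothetical values invented for illustration; they are not results from the study.

```python
from statistics import mean

# Hypothetical accuracy scores (0-1) per workflow and data set, for illustration only.
scores = {
    "workflow_A": {"dataset_1": 0.90, "dataset_2": 0.88, "dataset_3": 0.70},
    "workflow_B": {"dataset_1": 0.85, "dataset_2": 0.84, "dataset_3": 0.86},
    "workflow_C": {"dataset_1": 0.80, "dataset_2": 0.91, "dataset_3": 0.75},
}

# Hypothetical runtime per workflow, in hours.
runtime_hours = {"workflow_A": 10, "workflow_B": 4, "workflow_C": 2}


def mean_score(scores, workflow):
    """Average accuracy of one workflow across all data sets."""
    return mean(scores[workflow].values())


def best_overall(scores):
    """Workflow with the highest mean accuracy across data sets."""
    return max(scores, key=lambda w: mean_score(scores, w))


def best_per_dataset(scores):
    """Map each data set to the workflow that scores highest on it."""
    datasets = next(iter(scores.values())).keys()
    return {d: max(scores, key=lambda w: scores[w][d]) for d in datasets}


def best_within_budget(scores, runtime, max_hours):
    """Highest mean accuracy among workflows that fit the runtime budget."""
    eligible = [w for w in scores if runtime[w] <= max_hours]
    if not eligible:
        return None
    return max(eligible, key=lambda w: mean_score(scores, w))
```

With these made-up numbers, `workflow_B` has the highest mean accuracy yet is not the top performer on `dataset_1` or `dataset_2`, and a three-hour budget leaves only `workflow_C`. Which choice is right depends on the measure and constraints that matter for a given experiment.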
13. The authors report the performance of the evaluated tools and
propose a comprehensive RNA-seq analysis protocol, named RNACocktail,
along with a computational pipeline that achieves high accuracy.
Validation on different samples shows that the proposed
protocol could help researchers extract more biologically
relevant predictions through broad analysis of the transcriptome.
16. Several efforts have been made to compare the performance of different RNA-seq
analysis tools.
However, these studies have mostly focused on a single RNA-seq analysis step, or their
workflow analyses were limited to one or two steps such as alignment and
quantification.
Thus, a comprehensive and systematic analysis of the RNA-seq data from different
perspectives can contribute significantly toward extraction of maximal insights from
RNA-seq data.
Research question
24. In conclusion, this comprehensive assessment, with detailed investigation
at each analysis step, clearly outlines the current state of RNA-seq
analysis.
It highlights algorithmic issues that warrant the attention of
researchers and leads to a broad-spectrum analysis protocol that can enable
researchers to unleash the full power of RNA-seq.
The authors envision that this approach will help researchers gain better
and more comprehensive biological insights from their transcriptomic data,
as exemplified by the results of the authors' pipeline, which is only one possible
instantiation of the comprehensive protocol.
Conclusion