
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identification of microbial abundances


High-throughput DNA sequencing continues to offer comprehensive insights into microbial ecosystems [1]. Several bioinformatics tools have been benchmarked, but inconclusively [2], and variations in algorithms are known to affect microbiome results [3]. Thus, there is a need for detailed benchmarking of bioinformatics tools. Here we validated 16S rRNA amplicon sequencing and four bioinformatics tools for microbiome analyses.


Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identification of microbial abundances

Luca Cozzuto (1,2), Carlos Company (1,2), Nuria Andreu Somavilla (1,2), Jochen Hecht (1,2), David Harris Onywera (1,3) and Julia Ponomarenko (1,2)

(1) Bioinformatics Core Facility, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona, Spain
(2) Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
(3) Institute of Infectious Disease and Molecular Medicine (IDM), University of Cape Town (UCT), Anzio Road, Observatory 7925, Cape Town, South Africa

Contact: luca.cozzuto@crg.eu; carlos.company@crg.eu; harris.onywera@crg.eu; julia.ponomarenko@crg.eu

Introduction

High-throughput DNA sequencing continues to offer comprehensive insights into microbial ecosystems [1]. Several bioinformatics tools have been benchmarked, but inconclusively [2], and variations in algorithms are known to affect microbiome results [3]. Thus, there is a need for detailed benchmarking of bioinformatics tools. Here we validated 16S rRNA amplicon sequencing and four bioinformatics tools for microbiome analyses.

Methods

  • Genomic DNA from two microbial mock communities (Even: HM782D, Staggered: HM783D, BEI Resources) was sequenced by shotgun sequencing on Illumina HiSeq and by V3-V4 16S rRNA sequencing on Illumina MiSeq.
  • Eight independent sequencing runs were performed for 16S rRNA and three for whole DNA.
  • All reads were mapped to a database of 20 reference bacterial genomes using Bowtie2 [4].
  • Four bioinformatics tools for 16S rRNA analysis were set up and tested: mothur [5], QIIME [6], QUPARSE (UPARSE [7] imported into QIIME [6]) and riboPicker (based on the skewer [8], pear [9] and ribopicker [10] algorithms).
  • In QIIME, mothur and riboPicker, globally trimmed, non-chimeric representative sequences were taxonomically annotated by the RDP Classifier using the SILVA database v119 with ≥90% bootstrap confidence. In QUPARSE, the Greengenes Database (13_8 Release) was used.
  • Distributions of relative taxa abundances estimated by each tool were compared with the number of rRNA operons, as provided by BEI Resources and as obtained from whole genome sequencing (WGS).
  • Performance of the methods was evaluated using the HMP parametric R statistical package [11].

Mock bacterial community sequencing and analysis

Figure 1. Benchmarking metagenomics pipelines using mock communities. Bacterial DNA was extracted, and amplicons were barcoded for sequencing. The performance of the tools and of the sequencing was assessed statistically.

Results

Species abundances were significantly different between 16S and WGS approaches

Figure 2. Theoretical and observed species abundances. a) Even mock community, b) staggered mock community.

Even mock: distributions by all tools except QUPARSE were not significantly different from 16S mapping data

Figure 3. Relative abundances of mock genera. a) Histograms of genera distributions of eight mocks by each tool, b) bar plots comparing genera proportions of each tool against one another and against 16S mapping data. All but QUPARSE results were similar to 16S mapping data (QUPARSE: p-value < 0.0004, based on the likelihood-ratio test statistic comparing the Dirichlet parameter vectors).

Staggered mock: distributions by all tools were not significantly different from 16S mapping data

Figure 4. Relative abundances of mock genera. a) Histograms of genera distributions of eight mocks by each tool, b) bar plots comparing genera proportions of each pipeline against one another and against 16S mapping data. All results were similar.

Significant differences in fraction of assigned reads and false-positively assigned reads

Figure 5. Fraction of all sequenced reads assigned. QIIME and riboPicker assigned >70% of sequenced reads, which was significantly more than mothur or QUPARSE did.

Figure 6. Proportion of false-positively assigned reads. The percentage of false-positively assigned reads was low in all tested methods.

Significant differences in false genera at different thresholds on relative abundances

Figure 7. Even mock. mothur and QUPARSE had similar and significantly lower numbers of false positive genera than QIIME and riboPicker (p-value < 0.001).

Figure 8. Staggered mock, thresholds of 0.022% and 0.01% abundance. mothur and QUPARSE had similar numbers of positive genera, which were significantly lower (p-value < 0.001) than QIIME’s or riboPicker’s.

Conclusion

  • WGS and 16S approaches gave significantly different species distributions in both mocks.
  • Genera distributions in the staggered mock by all tools were similar to the 16S rRNA mapping data.
  • mothur and QUPARSE had similar and significantly lower numbers of false positive (FP) and false negative (FN) genera than riboPicker and QIIME, at different thresholds on genera abundance in all mocks. FN results are not shown.
  • QUPARSE left more than half of the sequenced reads unassigned to any genus. On the even mock, its performance was not as satisfactory as that of the other tools.
  • mothur performed better than the other three bioinformatics tools that were tested.

Acknowledgments

The authors acknowledge the CRG Genomics Core Facility for its sequencing services, and the CRG Bioinformatics Core Facility and the UCT ICTS High Performance Computing team for their computing facilities. The project was financed by CRG through Genomics and Bioinformatics Core Facilities funds as part of the “Saca la Lengua” project, which is an initiative of the CRG and the “la Caixa” Foundation, with the participation of the Center for Research into Environmental Epidemiology (CREAL), and the “Center d’Excellència Severo Ochoa 2013-2017” programme (SEV-2012-02-08) of the Ministry of Economy and Competitiveness. David Harris Onywera received a grant from the CRG-Novartis-Africa Mobility Programme.

References

1. Franzosa, E. A. et al. Sequencing and beyond: integrating molecular 'omics' for microbial community profiling. Nat. Rev. Microbiol. 13, 360–372 (2015).
2. Sun, Y. et al. A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. Brief. Bioinform. 13, 107–121 (2012).
3. White, J. R. et al. Alignment and clustering of phylogenetic markers - implications for microbial diversity studies. BMC Bioinformatics 11, 152 (2010).
4. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
5. Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).
6. Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
7. Edgar, R. C. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 10, 996–998 (2013).
8. Jiang, H. et al. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15, 182 (2014).
9. Zhang, J. et al. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
10. Schmieder, R. et al. Identification and removal of ribosomal RNA sequences from metatranscriptomes. Bioinformatics 28, 433–435 (2012).
11. La Rosa, P. et al. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PLoS ONE 7, e52078 (2012).
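
The false-genera comparison at different relative-abundance thresholds (Figures 7 and 8) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the placeholder genus names and the read counts are all hypothetical, and it assumes that a genus counts as detected when its relative abundance among assigned reads is at or above the threshold. Only the two threshold values (0.022% and 0.01%) come from the poster.

```python
from typing import Dict, Set, Tuple


def relative_abundances(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-genus read counts to percentages of all assigned reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no assigned reads")
    return {genus: 100.0 * n / total for genus, n in counts.items()}


def false_genera(
    counts: Dict[str, int], expected: Set[str], threshold_pct: float
) -> Tuple[Set[str], Set[str]]:
    """Return (false positive genera, false negative genera) at a threshold.

    A genus is treated as detected if its relative abundance (in percent)
    is >= threshold_pct. This detection rule is an assumption of the sketch.
    """
    abundances = relative_abundances(counts)
    detected = {g for g, pct in abundances.items() if pct >= threshold_pct}
    false_positives = detected - expected  # reported, but not in the mock
    false_negatives = expected - detected  # in the mock, but not reported
    return false_positives, false_negatives


# Hypothetical data: placeholder genera and made-up read counts,
# not the composition of the BEI Resources mock communities.
expected = {"GenusA", "GenusB", "GenusC", "GenusD"}
counts = {
    "GenusA": 10000,
    "GenusB": 8000,
    "GenusC": 1979,
    "ContaminantX": 18,  # 0.09% of assigned reads
    "ContaminantY": 3,   # 0.015% of assigned reads
}

for threshold in (0.022, 0.01):  # thresholds quoted in Figure 8
    fp, fn = false_genera(counts, expected, threshold)
    print(f"threshold {threshold}%: FP={sorted(fp)} FN={sorted(fn)}")
```

Lowering the threshold admits more low-abundance genera, so the number of false positives can only stay the same or grow while the number of false negatives can only stay the same or shrink. This is why tools need to be compared at several thresholds rather than one.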
