06.30.2014.brainstorming.the.downstream.analysis.of.metagenomic.data.using.r.bayesian.statistics.or.other.methods

Brainstorming the Downstream Analysis of Metagenomic Data Using R, Bayesian Statistics, or Other Methods
Mitch Fernandez
July 1st, 2014

Workflow
• Data Preparation
• Data Preprocessing
• Data Clustering
• Downstream Analysis

Alignment
http://genome.cshlp.org/content/17/2/127/F2.expansion.html

Pre-clustering
http://www.nature.com/nmeth/journal/v9/n5/full/nmeth.1990.html

Chimeras
http://genome.cshlp.org/content/21/3/494/F1.expansion.html

Contaminants
http://www.reids-workouts.com/the-reason-you-are-not-building-muscle-part-2
http://acceleratingscience.com/proteomics/proteomic-analysis-of-mitochondria-unravels-the-pathophysiology-of-pre-eclampsia/
http://vickgaza.deviantart.com/art/Mountain-Yeti-360756260
http://www.funchap.com/pictures-of-dogs/
http://en.wikipedia.org/wiki/Chloroplast

OTU Clustering
http://rosalind.info/glossary/distance-matrix/
https://peerj.com/articles/237/
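A minimal sketch of distance-based OTU clustering, assuming the reads are already aligned to equal length. It computes an uncorrected pairwise distance (the fraction of mismatched alignment columns) and groups sequences by single linkage at a 0.03 cutoff (roughly 97% similarity). Single linkage is swapped in here for simplicity; this is not mothur's implementation, and `p_distance` and `cluster_otus` are hypothetical names.

```python
from __future__ import annotations

from itertools import combinations

GAP = set("-.")  # gap characters used in aligned FASTA


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched columns between two aligned sequences.

    Columns where both sequences have a gap are skipped; a gap opposite a
    base counts as a mismatch (an assumption; other gap models exist).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP and y in GAP:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def cluster_otus(seqs: dict[str, str], cutoff: float = 0.03) -> list[set[str]]:
    """Single-linkage OTUs: a sequence joins an OTU if it is within `cutoff` of any member."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit each cell of the upper-triangular distance matrix once.
    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    otus: dict[str, set[str]] = {}
    for name in seqs:
        otus.setdefault(find(name), set()).add(name)
    return list(otus.values())


if __name__ == "__main__":
    # Hypothetical 100-column alignment: s2 differs from s1 at one column.
    seqs = {"s1": "A" * 100, "s2": "A" * 99 + "C", "s3": "C" * 100}
    print(sorted(sorted(otu) for otu in cluster_otus(seqs)))
    # [['s1', 's2'], ['s3']]
```

Single linkage can chain sequences that are far apart through intermediates (two sequences 0.04 apart still merge if a third lies 0.02 from each); average or furthest linkage is more conservative but costs more computation.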

Richness and Diversity
• Low Diversity
• High Diversity
Equal Richness
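To make the richness/diversity distinction concrete, here is a minimal sketch using textbook plug-in formulas (observed richness, Shannon H', inverse Simpson) on two hypothetical communities that contain the same four OTUs. mothur's calculators may apply bias-corrected variants, so these numbers will not necessarily match its output.

```python
import math


def richness(counts):
    """Observed richness (S_obs): number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), natural log."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2) (plug-in, no bias correction)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts)


even = [25, 25, 25, 25]   # hypothetical: four OTUs, evenly represented
uneven = [97, 1, 1, 1]    # hypothetical: same four OTUs, one dominant

for name, counts in (("even", even), ("uneven", uneven)):
    print(name, richness(counts), round(shannon(counts), 3), round(inverse_simpson(counts), 3))
# even 4 1.386 4.0
# uneven 4 0.168 1.062
```

Both communities have a richness of 4, but the evenly spread one scores far higher on both diversity indices.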

Text Parsing
• Automate Oligos file creation
• Produce read count tables
• Parse richness and diversity results
• Parse error logs
• Split data into groups
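A minimal sketch of the first parsing task, automating Oligos file creation. It assumes mothur's tab-separated oligos layout with a `forward` primer line, an optional `reverse` line, and one `barcode <sequence> <sample>` line per sample; check the mothur documentation for your version, especially for paired barcodes. `make_oligos` is a hypothetical helper, and the primer and barcodes in the example are placeholders.

```python
import re


def make_oligos(forward_primer, barcodes, reverse_primer=None):
    """Return the text of a mothur-style oligos file.

    forward_primer / reverse_primer: primer sequences (IUPAC codes allowed).
    barcodes: dict mapping sample name -> barcode sequence.
    """
    seen = {}
    for sample, barcode in barcodes.items():
        barcode = barcode.upper()
        # Barcodes must be unambiguous, so only plain A/C/G/T is accepted.
        if not re.fullmatch(r"[ACGT]+", barcode):
            raise ValueError(f"{sample}: barcode {barcode!r} is not plain ACGT")
        if barcode in seen:
            raise ValueError(f"barcode {barcode} used by both {seen[barcode]} and {sample}")
        seen[barcode] = sample

    lines = [f"forward\t{forward_primer}"]
    if reverse_primer:
        lines.append(f"reverse\t{reverse_primer}")
    # One line per sample, sorted so the file is reproducible between runs.
    for sample in sorted(barcodes):
        lines.append(f"barcode\t{barcodes[sample].upper()}\t{sample}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder primer and barcodes; substitute the ones from your run sheet.
    text = make_oligos("ACGTACGTACGT", {"S2": "ttggccaa", "S1": "AACCGGTT"})
    print(text, end="")
```

Validating barcodes while writing the file catches a copy-paste error (two samples sharing a barcode) before a long mothur run rather than after it.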

Metastats
Former (n=24) vs. Active (n=22); Difference = Former minus Active relative abundance
OTU | Former reads | Former rel. abundance | Former variance | Former std. error | Active reads | Active rel. abundance | Active variance | Active std. error | Difference in rel. abundance | p-value | q-value
Selenomonas_003 | 79 | 8.71E-04 | 1.00E-06 | 1.85E-04 | 216 | 2.03E-03 | 4.00E-06 | 4.01E-04 | -1.16E-03 | 1.10E-02 | 2.33E-02
Porphyromonas_001 | 607 | 8.30E-03 | 8.50E-05 | 1.89E-03 | 331 | 3.96E-03 | 7.00E-06 | 5.68E-04 | 4.34E-03 | 3.10E-02 | 6.38E-02
Family_Burkholderiaceae_001 | 96 | 1.51E-03 | 5.00E-06 | 4.40E-04 | 67 | 5.04E-04 | 0.00E+00 | 1.48E-04 | 1.00E-03 | 3.50E-02 | 7.05E-02
Neisseria_001 | 4,851 | 4.58E-02 | 4.68E-03 | 1.40E-02 | 1,353 | 2.00E-02 | 2.55E-04 | 3.41E-03 | 2.57E-02 | 3.80E-02 | 7.38E-02
Campylobacter_001 | 121 | 1.10E-03 | 2.00E-06 | 2.76E-04 | 115 | 2.10E-03 | 4.00E-06 | 4.26E-04 | -1.00E-03 | 5.29E-02 | 8.54E-02
Rhizobium_001 | 20 | 3.42E-04 | 0.00E+00 | 1.37E-04 | 76 | 7.55E-04 | 1.00E-06 | 1.74E-04 | -4.13E-04 | 5.99E-02 | 9.21E-02
Class_Gammaproteobacteria_002 | 158 | 2.03E-03 | 6.00E-06 | 5.06E-04 | 454 | 4.29E-03 | 2.90E-05 | 1.14E-03 | -2.26E-03 | 7.09E-02 | 9.93E-02
Catonella_001 | 274 | 2.13E-03 | 1.10E-05 | 6.81E-04 | 65 | 9.22E-04 | 1.00E-06 | 2.00E-04 | 1.21E-03 | 7.19E-02 | 1.00E-01
Family_Carnobacteriaceae_001 | 621 | 6.83E-03 | 4.30E-05 | 1.34E-03 | 283 | 4.31E-03 | 1.20E-05 | 7.30E-04 | 2.52E-03 | 1.01E-01 | 1.25E-01
Prevotella_001 | 3,064 | 3.58E-02 | 5.73E-04 | 4.89E-03 | 6,061 | 5.58E-02 | 2.50E-03 | 1.07E-02 | -2.00E-02 | 1.06E-01 | 1.30E-01
Paracoccus_001 | 24 | 2.76E-04 | 0.00E+00 | 1.28E-04 | 138 | 1.44E-03 | 1.40E-05 | 8.07E-04 | -1.17E-03 | 1.06E-01 | 1.30E-01
Actinomyces_001 | 903 | 7.32E-03 | 4.50E-05 | 1.37E-03 | 1,055 | 1.23E-02 | 1.76E-04 | 2.83E-03 | -5.01E-03 | 1.07E-01 | 1.31E-01
Prevotella_005 | 46 | 6.95E-04 | 2.00E-06 | 2.86E-04 | 328 | 3.09E-03 | 6.80E-05 | 1.76E-03 | -2.40E-03 | 1.10E-01 | 1.34E-01
Family_Rhodocyclaceae_001 | 14 | 1.92E-04 | 0.00E+00 | 1.36E-04 | 87 | 1.14E-03 | 8.00E-06 | 5.94E-04 | -9.47E-04 | 1.25E-01 | 1.47E-01
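In the Metastats table, the std. error columns appear consistent with sqrt(variance / n); for Neisseria_001 in the Former group, sqrt(4.68E-03 / 24) is about 1.40E-02. Metastats is generally described as comparing groups with a two-sample t statistic, estimating p-values by permuting group labels, and reporting q-values to account for testing many OTUs at once. The sketch below covers only the permutation t-test step, under that assumption; it is not the Metastats code, it omits the q-value correction, and the abundance values in the example are hypothetical.

```python
import math
import random
import statistics


def t_stat(a, b):
    """Two-sample t statistic with an unpooled (Welch) standard error."""
    diff = statistics.mean(a) - statistics.mean(b)
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    if se2 == 0:
        # Both groups constant: treat any difference as infinitely large.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the difference in means of a and b.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose |t| is at least the observed |t|, with +1 smoothing so
    it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_stat(pooled[: len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-sample relative abundances of one OTU in two groups.
    former = [0.010, 0.012, 0.011, 0.013, 0.012]
    active = [0.001, 0.002, 0.0015, 0.002, 0.001]
    print(permutation_p_value(former, active))
```

With groups as small as these, the number of distinct label splits is limited (252 for 5 vs. 5), which bounds how small a permutation p-value can get; the Former and Active groups above (24 vs. 22) have far more splits available.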

Running the Workflow
1. Gather your data
2. Prepare an Oligos file
3. Zip everything up and copy to the “work” folder
4. Run mothur.sh
5. Come back in a few hours/days
6. Run ReadCountTable.py on the taxonomy output
7. Do additional downstream processing
8. Publish results
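Step 6 relies on ReadCountTable.py, which is not shown in this deck. As a hypothetical stand-in, here is a minimal sketch of the same idea. It assumes mothur-style tab-separated `.taxonomy` lines (sequence ID, then a semicolon-separated lineage with optional `(confidence)` suffixes) and `.groups` lines (sequence ID, sample), and counts reads at the deepest assigned rank. `read_count_table` and `to_tsv` are illustrative names, not the real script's interface.

```python
from collections import Counter, defaultdict


def read_count_table(taxonomy_lines, group_lines):
    """Count reads per (taxon, sample) from taxonomy and groups lines.

    taxonomy_lines: "seqID<TAB>Bacteria(100);Firmicutes(99);...;"
    group_lines:    "seqID<TAB>sampleName"
    """
    sample_of = {}
    for line in group_lines:
        if line.strip():
            seq_id, sample = line.rstrip("\n").split("\t", 1)
            sample_of[seq_id] = sample

    table = defaultdict(Counter)
    for line in taxonomy_lines:
        if not line.strip():
            continue
        seq_id, lineage = line.rstrip("\n").split("\t", 1)
        ranks = [r for r in lineage.strip().strip(";").split(";") if r]
        # Deepest assigned rank, with the "(confidence)" suffix removed.
        taxon = ranks[-1].split("(")[0]
        table[taxon][sample_of[seq_id]] += 1
    return table


def to_tsv(table):
    """Render as tab-separated text: one row per taxon, one column per sample."""
    samples = sorted({s for counts in table.values() for s in counts})
    rows = ["taxon\t" + "\t".join(samples)]
    for taxon in sorted(table):
        # Counter returns 0 for samples where the taxon was never seen.
        rows.append(taxon + "\t" + "\t".join(str(table[taxon][s]) for s in samples))
    return "\n".join(rows)


if __name__ == "__main__":
    taxonomy = [
        "r1\tBacteria(100);Firmicutes(100);Selenomonas(98);",
        "r2\tBacteria(100);Bacteroidetes(100);Prevotella(95);",
        "r3\tBacteria(100);Bacteroidetes(100);Prevotella(97);",
    ]
    groups = ["r1\tS1", "r2\tS1", "r3\tS2"]
    print(to_tsv(read_count_table(taxonomy, groups)))
```

A table in this shape (taxa as rows, samples as columns) can be loaded directly into R for the downstream statistics in step 7.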

What we need help with
• Data management
• Post-hoc OTU naming
• Improved scripting
• Identifying new tools
• Other stuff I haven’t thought of
