Step-by-step tutorial for conducting GO enrichment analysis and then creating a network from the results.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
2. Download all material for the tutorial
https://sourceforge.net/projects/teachingdemos/files/
Choose 2014 UC Davis Proteomics Workshop or use the full URL below
https://sourceforge.net/projects/teachingdemos/files/2014%2
3.
• decrease
• increase
Use functional analysis to determine whether the changed variables are enriched (over-represented compared to random chance) in a biological pathway, domain, or ontological category.
5. Major Tasks
Using the proteins listed in the Excel workbook "proteomic data for analysis.xlsx", worksheet "protein IDs":
1. Conduct Gene Ontology (GO) enrichment analysis using DAVID Bioinformatics Resources
http://david.abcc.ncifcrf.gov/home.jsp
2. Investigate enriched terms using QuickGO
http://www.ebi.ac.uk/QuickGO/
3. Summarize and visualize the results using REVIGO
http://revigo.irb.hr/
4. Create and modify a GO network using Cytoscape
http://www.cytoscape.org/
6. Protein IDs
Common protein identifier: UniProt/SwissProt accession (default in Scaffold)
http://www.uniprot.org/
Use BioMart to translate to other database IDs, e.g. gene symbols
http://www.biomart.org/
18. Overview Results
Modified Fisher's Exact Test p-value
Optionally, check in R:
x <- data.frame(user = c(13, 47), genome = c(690, 13528)) # same counts as in the fold enrichment below
fisher.test(x) # p-value = 5.41e-06
(13/47) / (690/13528) # fold enrichment, about 5.42
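DAVID describes its EASE score as a modified Fisher's exact test in which one hit is removed from the user list before testing. The sketch below is one way to reproduce that idea in R. It assumes the counts implied by the fold-enrichment expression above (13 of 47 list proteins annotated with the term, 690 of 13528 background proteins) and a list-versus-background 2x2 layout; the variable names and the helper `make.table` are illustrative, and this is not necessarily the exact table DAVID builds internally.

```r
# Counts assumed from the fold-enrichment expression above
list.hits  <- 13     # list proteins annotated with the term
list.total <- 47     # proteins in the user list
pop.hits   <- 690    # background proteins annotated with the term
pop.total  <- 13528  # proteins in the background

# Hypothetical helper: 2x2 table with rows = annotated / not annotated
# and columns = user list / background (matrix() fills column by column)
make.table <- function(hits) {
  matrix(c(hits,     list.total - list.hits,
           pop.hits, pop.total - pop.hits),
         nrow = 2,
         dimnames = list(c("in.term", "not.in.term"),
                         c("list", "background")))
}

# Standard one-sided Fisher's exact test for over-representation
p.fisher <- fisher.test(make.table(list.hits), alternative = "greater")$p.value

# EASE-style variant: same test with one hit removed from the list
p.ease <- fisher.test(make.table(list.hits - 1), alternative = "greater")$p.value

# Fold enrichment: term frequency in the list relative to the background
fold <- (list.hits / list.total) / (pop.hits / pop.total)
```

Removing one hit makes the EASE-style p-value more conservative than the plain Fisher p-value, which mainly affects terms supported by only a few proteins.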
19. Alternative to Fisher Exact Test: Hypergeometric Test
How to calculate statistics to determine enrichment?
hit.num = 51 # number of significantly changed variables in the pathway
set.num = 1455 # number of variables in the pathway
full = 3358 # all possible variables in the organism
q.size = 72 # number of significantly changed variables
# P(X >= hit.num): probability of at least hit.num pathway members among the q.size changed variables
phyper(hit.num-1, set.num, full-set.num, q.size, lower.tail=FALSE)
# enrichment p-value = 1.717553e-06
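The hypergeometric tail probability and a one-sided Fisher's exact test are the same calculation on the same 2x2 table. The sketch below, using the numbers from this slide, builds that table and compares the two results; the table layout and the names `tab`, `p.hyper` and `p.fisher` are illustrative choices, not part of the original material.

```r
hit.num <- 51    # significantly changed variables in the pathway
set.num <- 1455  # variables in the pathway
full    <- 3358  # all possible variables in the organism
q.size  <- 72    # significantly changed variables

# Upper tail of the hypergeometric distribution: P(X >= hit.num)
p.hyper <- phyper(hit.num - 1, set.num, full - set.num, q.size,
                  lower.tail = FALSE)

# Same question as a 2x2 table:
# rows = in pathway / not in pathway, columns = changed / not changed
tab <- matrix(c(hit.num,          q.size - hit.num,
                set.num - hit.num, full - set.num - (q.size - hit.num)),
              nrow = 2,
              dimnames = list(c("in.pathway", "not.in.pathway"),
                              c("changed", "not.changed")))

# One-sided Fisher's exact test for over-representation
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# TRUE when both approaches agree
all.equal(p.hyper, p.fisher)
```

Either function can therefore be used for enrichment; `phyper` is convenient when looping over many pathways, while `fisher.test` also reports an odds ratio and confidence interval.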
21. Use REVIGO to filter redundant terms
http://revigo.irb.hr/
Prepare input (term, p-value)
1. Upload to REVIGO
2. Run
Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800
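Preparing the (term, p-value) input can be scripted. The following is a minimal sketch that assumes the enrichment results are in a data frame with a `Term` column of the form "GO:0006412~translation" and a `PValue` column, as in a DAVID chart export; the example rows, p-values and output file name are hypothetical placeholders.

```r
# Hypothetical rows standing in for exported enrichment results
david <- data.frame(
  Term   = c("GO:0006412~translation", "GO:0006457~protein folding"),
  PValue = c(1.2e-05, 3.4e-03),
  stringsAsFactors = FALSE
)

# Keep only the GO identifier (the text before "~") and its p-value
revigo.input <- data.frame(
  term    = sub("~.*$", "", david$Term),
  p.value = david$PValue,
  stringsAsFactors = FALSE
)

# Write a tab-delimited file without header, quotes or row names,
# so each line is "GO ID <tab> p-value"
out.file <- file.path(tempdir(), "revigo_input.txt")
write.table(revigo.input, out.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The contents of the written file can then be pasted into the REVIGO input box.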
24. REVIGO: network
• Edges: the strongest 3% of pairwise GO term similarities
• Node size: generality of term (small = specific)
• Node color: p-value
Download network
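To illustrate what keeping the strongest 3% of pairwise similarities means, the sketch below thresholds a similarity matrix at its 97th percentile and lists the surviving pairs as edges. The matrix is random and the term labels are hypothetical; this only illustrates the filtering idea and is not REVIGO's own code or similarity measure.

```r
set.seed(1)  # make the random example reproducible
n.terms <- 40
terms <- sprintf("term%02d", seq_len(n.terms))  # hypothetical term labels

# Hypothetical symmetric similarity matrix with values in [0, 1]
m <- matrix(runif(n.terms^2), n.terms, n.terms)
sim <- (m + t(m)) / 2
diag(sim) <- 1
dimnames(sim) <- list(terms, terms)

# Count each pair once (upper triangle) and find the 97th percentile
pair.sim <- sim[upper.tri(sim)]
cutoff <- quantile(pair.sim, 0.97)

# Keep only the pairs whose similarity exceeds the cutoff
idx <- which(upper.tri(sim) & sim > cutoff, arr.ind = TRUE)
edges <- data.frame(source     = terms[idx[, "row"]],
                    target     = terms[idx[, "col"]],
                    similarity = sim[idx],
                    stringsAsFactors = FALSE)

nrow(edges) / length(pair.sim)  # fraction of pairs kept, close to 0.03
```

Terms that share no strong similarity with any other term end up as unconnected nodes in such a network.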
27. Cytoscape: map data to network properties
1. Set edge width and color
2. Set node labels, size and color
28. Cytoscape: overview network components
Download edge information (steps 1 and 2 are marked on the slide image)
3. View in Excel
Download node information (steps 1 and 2 are marked on the slide image)
3. View in Excel
29. Bonus: Modify Edge and Node Attributes to show term to protein connections
See files "test edge.xlsx" and "test node.xlsx" for examples of upload formats
See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping
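Term to protein connections can be written as an edge table with one row per (term, protein) pair plus a node table that labels each node's type. The sketch below assumes enrichment results with a `Term` column and a comma-separated `Genes` column; the rows, protein IDs and column names are hypothetical and do not necessarily match the layout of the example files.

```r
# Hypothetical enrichment results; Genes holds comma-separated protein IDs
enrich <- data.frame(
  Term  = c("GO:0006412~translation", "GO:0006457~protein folding"),
  Genes = c("protA, protB, protC", "protB, protD"),
  stringsAsFactors = FALSE
)

# Split each Genes entry into individual protein IDs
proteins <- strsplit(enrich$Genes, ",\\s*")

# Edge table: repeat each GO ID once per protein it annotates
edges <- data.frame(
  source = rep(sub("~.*$", "", enrich$Term), lengths(proteins)),
  target = unlist(proteins),
  stringsAsFactors = FALSE
)

# Node table: every term and protein once, with a type attribute
term.ids    <- unique(edges$source)
protein.ids <- unique(edges$target)
nodes <- data.frame(
  id   = c(term.ids, protein.ids),
  type = rep(c("GO term", "protein"),
             c(length(term.ids), length(protein.ids))),
  stringsAsFactors = FALSE
)
```

Both tables can be saved with `write.csv` and imported into Cytoscape, where the `type` column can drive node shape or color.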
30. See more Statistical and Multivariate Analysis Examples at
http://imdevsoftware.wordpress.com/tutorials/
Questions?
dgrapov@ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154