Network Biology: from lists to underpinnings of molecular behaviour
1. Network Biology: from lists to underpinnings of molecular behaviour. Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Carleton University. BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
3. Provenance. This talk was prepared in part with input from the “Interpreting Gene Lists” workshop offered by the Canadian Bioinformatics Workshops (bioinformatics.ca). http://bioinformatics.ca/workshops/2009/course-content
4. So you did some mass spectrometry? Protein identification.
5. Database search vs de novo. [Figure. Sequence shown: W R V A L T G E P L K C W D T. Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ... Panel labels: Database Search, de novo. Highlighted peptide: AVGELTK.]
7. My experiment worked and I have dozens, hundreds, or thousands of hits. Now what? Protein identification, then what?
8. Use the list to explore biology: determine significant shared attributes; explore putative mechanisms of action; test hypotheses. Protein identification, then network biology, then (eureka!) a hypothesis on the molecular basis of a disease or process.
9. [Figure: attributes enriched in smokers (= up-regulated in smokers), e.g. detoxification and oxidative metabolism. Labels: number in list having the attribute; number in list sharing these attributes.]
11. A hypothesis underlies the list of identified proteins. An initial question was posed, an experiment performed and a list of candidates obtained. The question now is: what are the roles of these entities in the biological process being investigated? Examples: normal vs pathological; response to stimulus; interactions and complexes.
12. Biological answers from computational systems biology: information retrieval and summary; interaction network analysis; pathway analysis; function prediction.
13. Molecular attributes. An attribute provides information about the entity in question (e.g. shape, function, process). Sequence and structure provide information about: motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations, conservation/evolution; functions, localization, biological/pathological processes.
14. Gene Ontology. Captures terminology related to three aspects: biological processes, molecular functions and cellular components. Relationships between terms are largely defined with “is a” and “part of” relations. Example terms: cell division, isomerase activity.
15. GO structure. [Figure: the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relations.] Species independent: some lower-level terms are specific to a group of organisms, but higher-level terms are not.
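The is-a / part-of structure can be illustrated with a minimal Python sketch. The five terms come from the slide, but the specific links between them are an assumption made for illustration (the figure did not survive extraction), and `GO_EDGES` and `ancestors` are hypothetical names, not part of any GO tool.

```python
# Toy fragment of a GO-like graph. Each term maps to its (relation, parent) links.
# The links below are assumed for illustration, not copied from the ontology.
GO_EDGES = {
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, edges=GO_EDGES):
    """Return every term reachable from `term` by following is_a/part_of links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("chloroplast membrane")))
```

A protein annotated to a specific term is implicitly annotated to all of that term's ancestors, which is why enrichment tools usually count annotations against the ancestors as well.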
19. Good for making pie charts.
20. Annotation. Manual annotation: created by scientific curators; high quality; small in number (time-consuming to create). Electronic annotation: derived without human validation; computational predictions (accuracy varies); lower ‘quality’ than manual codes. Key point: be aware of annotation origin.
31. IEA: Inferred from Electronic Annotation.
32. Variable coverage. Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304.
33. GO software tools. GO resources are freely available to anyone without restriction; this includes the ontologies, gene associations and tools developed by GO. Other groups have used GO to create tools for many purposes. http://www.geneontology.org/GO.tools
38. Identifiers. Identifiers (IDs) are ideally unique, stable names or numbers that help track database records, e.g. Social Insurance Number, Entrez Gene ID 41232. Gene and protein information is stored in many databases, so genes have many IDs. There are records for genes, DNA, RNA and proteins, and it is important to recognize the correct record type. E.g. Entrez Gene records don’t store sequence; they link to DNA regions, RNA transcripts and proteins.
39. NCBI database links. NCBI: U.S. National Center for Biotechnology Information, part of the National Library of Medicine (NLM). http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf
41. Identifier mapping. So many IDs! Mapping (conversion) is a headache. Four main uses: disambiguating similarly named entities; referencing related information; tracking biological and informational provenance (e.g. genes to proteins, Entrez Gene to Affy); unifying equivalent entities during dataset merging.
42. ID mapping services. Synergizer: http://llama.med.harvard.edu/synergizer/translate/ Ensembl BioMart: http://www.ensembl.org UniProt: http://www.uniprot.org/
44. Attribute Enrichment (AE). Given a list (e.g. RRP6, MRD1, RRP7, RRP43, RRP42) and attributes (e.g. function, process, localization, interactions), the AE question is: are any of the attributes surprisingly enriched in the list? Details: how to assess “surprisingly” (statistics), and how to correct for repeating the tests.
45. What is a P-value? The P-value is the probability, assuming the “null hypothesis” is true, of observing a test statistic at least as extreme as the one calculated from the data. It is obtained by calculating a statistic from the data and asking how probable that value, or a more extreme one, would be for a sample of the same size distributed according to the null hypothesis. It is not the probability that the null hypothesis is true. Intuitively: if you reject the null hypothesis whenever the P-value falls below a threshold, that threshold is your rate of false positive results (aka “Type I errors”) when the null hypothesis holds.
46. How likely is it that the observed differences between the two distributions are due to chance? [Figure: two distributions of values.]
47. AE using the T-test. Answer: two-tailed T-test. Black: N1 = 500, mean m1 = 1.1, std s1 = 0.9. Red: N2 = 4500, mean m2 = 4.9, std s2 = 1.0. Formal question: what is the probability of observing the T-statistic, or one more extreme, if the means of the two distributions were the same? T-statistic: $T = \frac{m_1 - m_2}{\sqrt{s_1^2/N_1 + s_2^2/N_2}} = -88.5$
48. AE using the T-test. P-value = shaded area x 2. [Figure: T-distribution (probability density vs T-statistic) with the tail beyond T = -88.5 shaded.] Formal question: what is the probability of observing the T-statistic, or one more extreme, if the means of the two distributions were the same?
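As a worked check of the numbers on these two slides, here is a minimal Python sketch. The slide's own formula was an image that did not survive, so the unequal-variance (Welch) form of the statistic is an assumption; it is used here because it reproduces the quoted value of -88.5. The function name is hypothetical.

```python
import math


def welch_t_statistic(mean1, std1, n1, mean2, std2, n2):
    """T-statistic for two samples without assuming equal variances."""
    standard_error = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Black: N1 = 500, mean 1.1, std 0.9. Red: N2 = 4500, mean 4.9, std 1.0.
t = welch_t_statistic(1.1, 0.9, 500, 4.9, 1.0, 4500)
print(round(t, 1))  # -88.5
```

A statistic this far from zero corresponds to a two-tailed P-value that is effectively zero, so the sketch stops at the statistic rather than evaluating the tail area.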
49. T-test limitations. Assumes both distributions are approximately Gaussian (i.e. normal). The score distribution assumption is often true for log ratios from microarrays. It is rarely true for peptide counts, sequence tags (SAGE or NextGen sequencing) and transcription factor binding site hits. Problem cases: values that are positive and have increasing density near zero (e.g. sequence counts); bimodal (“two-bumped”) distributions; distributions with outliers, or “heavy-tailed” distributions. The T-test checks for a significant difference in the means of two distributions but does not test for other differences between them.
50. Kolmogorov-Smirnov (K-S) test. Question: are the red and black distributions significantly different? Method: calculate the cumulative distributions of red and black. Formal question: is the length of the largest difference between the “empirical distribution functions” statistically significant? [Figure: probability densities and cumulative distributions plotted against score; the largest difference between the cumulative curves has length 0.4.]
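The “length of the largest difference” can be computed directly from two samples. The following Python sketch only calculates that statistic; turning it into a P-value requires the K-S distribution and is not shown. The sample values are made up for illustration and are not the data behind the slide.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


black = [1, 2, 3, 4, 5]  # toy scores
red = [3, 4, 5, 6, 7]    # toy scores
d = ks_statistic(black, red)
```

Unlike the T-test, this statistic responds to any difference in shape between the two distributions, not only to a shift in the mean.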
51. What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins? List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 5000 proteins, of which 500 have feature X.
52. Fisher’s exact test. Answer: P-value = 4.6 x 10^-4. The P-value for Fisher’s exact test is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.” It depends on the size of the list, the number with the feature (in the list and in the background), and the size of the background population. List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 5000 proteins, of which 500 have feature X. [Figure: null distribution with the P-value region marked.]
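The quoted answer can be reproduced with a short Python sketch of the one-tailed calculation, which sums hypergeometric probabilities for 4 and 5 hits. It assumes the background is 5000 proteins in total with 500 carrying feature X (the reading that matches the 4.6 x 10^-4 on the slide); `hypergeom_tail` is a hypothetical helper name.

```python
from math import comb


def hypergeom_tail(k, list_size, with_feature, population):
    """P(at least k of list_size randomly drawn proteins carry the feature)."""
    all_draws = comb(population, list_size)
    upper = min(list_size, with_feature)
    favourable = sum(
        comb(with_feature, i) * comb(population - with_feature, list_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / all_draws


p_value = hypergeom_tail(4, 5, 500, 5000)
print(f"{p_value:.1e}")  # about 4.6e-04
```

Changing the last argument shows how strongly the result depends on the choice of background population.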
53. Important details. To test for under-enrichment of “black”, test for over-enrichment of “red”. Choose the “background population” appropriately: e.g. if only a portion of the total complement is queried (or has annotation), use only that population as background. To test for enrichment of more than one independent type of annotation (red vs black and circle vs square), apply Fisher’s exact test separately for each type. The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.
54. How to win the P-value lottery, part 1: random draws. Expect a random draw with the observed enrichment once every 1 / P-value draws. [Figure: “… 7,834 draws later …”. Background population: 500 X, 5000 Y.]
55. How to win the P-value lottery, part 2: keep the list the same, evaluate different annotations. [Figure: the observed draw (RRP6, MRD1, RRP7, RRP43, RRP42) shown under different annotations.]
56. Correcting for multiple tests. The Bonferroni correction controls the probability that at least one positive test is due to random chance, aka the Family-Wise Error Rate (FWER). If M = number of annotations tested: corrected P-value = M x original P-value. The Benjamini-Hochberg (B-H) correction controls the proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives, aka the False Discovery Rate (FDR). FDR is the expected proportion of the observed enrichments that are due to random chance. B-H is less stringent than Bonferroni.
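Both corrections are short enough to sketch in Python. The Bonferroni function follows the formula on the slide, capped at 1. The Benjamini-Hochberg function uses the standard step-up adjustment (P-value x M / rank, made monotone from the largest P-value down), which the slide does not spell out. The four P-values are made-up example inputs.

```python
def bonferroni(p_values):
    """Corrected P-value = M x original P-value, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """FDR-adjusted P-values via the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values never increase
    # as the raw P-values get smaller.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]      # toy P-values, one per annotation tested
fwer_adjusted = bonferroni(raw)      # roughly [0.04, 0.16, 0.12, 0.02]
fdr_adjusted = benjamini_hochberg(raw)  # roughly [0.02, 0.04, 0.04, 0.02]
```

At a 0.05 cut-off, all four toy annotations survive the B-H adjustment but only two survive Bonferroni, which illustrates the difference in stringency.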
57. Reducing multiple test correction stringency. The correction to the P-value threshold α depends on the number of tests that you do, so, no matter what, the more tests you do, the more sensitive the test needs to be. You can control the stringency by reducing the number of tests: e.g. use GO slim or restrict testing to the appropriate GO annotations.
58. AE tools. Web-based tools: Funspec (easy tool for yeast, not maintained, uses GO annotations and some other annotations, e.g. protein complexes); YeastFeatures (similar to Funspec, different datasets and presentation); GoMiner (uses GO annotations, covers many organisms, needs a background set of genes). Cytoscape-based tools: BINGO (does GO annotations, displays enrichment results graphically and visually organizes related categories).
60. Last updated 2002.
63. GoMiner, part 1. http://discover.nci.nih.gov/gominer 1. Click “web interface”. 2. Upload background. 3. Upload list. 4. Choose organism. 5. Choose evidence code (All or Level 1).
64. GoMiner, part 2. 6. Restrict # of tests via category size. 7. Restrict # of tests via GO hierarchy. 8. Results are emailed to the address you enter, in a few minutes.
65. DAVID, part 1. http://david.abcc.ncifcrf.gov/ Paste the list; DAVID automatically detects the organism; choose the ID type; choose the list type (list or background).
67. BINGO, an ORA Cytoscape plugin. http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm Nodes represent GO categories; links represent parent-child relationships in the GO ontology; colours represent significance of enrichment.
77. Network. In biology, a network is a graph composed of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities. [Figure: vertex (node), edge, directed edge (arc), weighted edge, cycle.]
78. Integration in a network context.
79. Integration in a network context: expression data mapped to node colours.
80. Mapping biology to a network. A simple mapping for protein-protein interactions: one protein per node, one interaction per edge. Edges can represent other relationships: physical (e.g. protein-protein interaction); regulatory (e.g. kinase activates target); genetic (e.g. epistasis); similarity (e.g. protein sequence similarity). Critical: understand the mapping before doing network analysis.
81. Protein sequence similarity network. http://apropos.icmb.utexas.edu/lgl/
82. Literature network. Computationally extract gene relationships from text, usually PubMed abstracts. Useful if the network is not in a database. A literature search tool, BUT not perfect: recognizing gene names is a problem, and natural language processing is difficult. Tools: Agilent Literature Search Cytoscape plugin; iHOP (www.ihop-net.org/UniPub/iHOP/).
84. Cytoscape network produced by Literature Search. [Figure: abstract from the scientific literature; sentences supporting an edge.]
85. Enrichment Map. [Figure: overlap between gene sets A and B.]
86. Nodes represent gene sets.
88. Physical networks. An interaction between two molecular objects (A and B): DNA, RNA, gene, protein, complex, small molecule, photon. Requires a site of interaction/binding. Biologically relevant if the partners are present/expressed at the same time, share a cellular location, and the interaction leads to some biologically relevant outcome.
89. Molecular interactions. RAS interacting with RALGDS (PDB: 1LFD). Synthetic protein interacting with ATP and zinc (PDB: 2P0X).
91. Experimental considerations. How do you know if the interaction really exists? Each method has its advantages and disadvantages. Be aware of systematic errors. Be aware of contaminants. Each method observes interactions under a slightly different experimental condition. Support from many different sources is certainly better (necessary) than just one.
92. Some affinity purification caveats. First and most importantly, this is only a representation of the observation. You can only tell what proteins are in the eluate; you can’t tell how they are connected to one another. If there is only one other protein present (B), then it’s likely that A and B are directly interacting. But what if I told you that two other proteins (B and C) were present along with A?
93. Complexes with unknown topology. [Figure: three alternative arrangements of A, B and C.] Which of these models is correct? The complex described by this experimental result is said to have an unknown topology.
94. Complexes with unknown stoichiometry. [Figure: a complex containing several copies of A and B.] Here’s another possibility. The complex described by this experimental result is also said to have unknown stoichiometry.
95. Interaction models. [Figure: an actual topology compared with its spoke and matrix representations.] Spoke: simple model, useful for data navigation; more accurate. Matrix: theoretical maximum number of interactions.
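The two interaction models can be sketched in Python as two ways of turning one purification result into edges. The spoke model links the bait to each co-purified protein; the matrix model links every pair. The bait and prey names are placeholders, and the function names are hypothetical.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """One edge from the bait to every co-purified protein."""
    return [(bait, prey) for prey in preys]


def matrix_model(bait, preys):
    """One edge between every pair of proteins found in the purification."""
    return list(combinations([bait] + list(preys), 2))


spoke_edges = spoke_model("A", ["B", "C"])    # [('A', 'B'), ('A', 'C')]
matrix_edges = matrix_model("A", ["B", "C"])  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

For a purification containing n proteins, the spoke model yields n - 1 edges while the matrix model yields n(n - 1)/2, so the two diverge quickly as complexes get larger.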
96. High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI). Mike Tyers, SLRI. [Figure: Ste12.] Ho et al. Nature. 2002 Jan 10;415(6868):180-3.
98. k-core analysis. A k-core is a part of a graph where every node is connected to other nodes in that part by at least k edges (k = 0, 1, 2, 3...). The highest k-core is a central, most densely connected region of a graph. Regions of dense connectivity may represent molecular complexes; therefore, high k-cores may be molecular complexes.
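A k-core can be found by repeatedly deleting nodes that have fewer than k neighbours until none remain. The Python sketch below implements that pruning on a toy edge list; it is a plain illustration of the definition, not the algorithm used by any particular Cytoscape plugin, and the node names are placeholders.

```python
def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected graph."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    pruned = True
    while pruned:
        pruned = False
        too_sparse = [n for n, nbrs in neighbours.items() if len(nbrs) < k]
        for node in too_sparse:
            # Removing a node lowers the degree of its remaining neighbours,
            # which may push them below k on the next pass.
            for nbr in neighbours.pop(node):
                neighbours[nbr].discard(node)
            pruned = True
    return set(neighbours)


# Toy network: a triangle A-B-C with one extra protein D hanging off C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
two_core = k_core(toy_edges, 2)    # D is pruned, the triangle remains
three_core = k_core(toy_edges, 3)  # nothing is dense enough
```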
99. Interaction can define function. [Figure panels: Pre MS, Ho, Gavin, Union. Panel labels: 6-core, 6-core, 6-core, 9-core.] MCODE plugin for Cytoscape.
102. Network classification of disease. Traditional: gene association. Limitation: too many genes reduces statistical power. New: active cell map based approaches combining network and molecular profiles. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16. Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 2007 Jun;3(6):e96. Efroni S, Schaefer CF, Buetow KH. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE. 2007 May 9;2(5):e425.
103. Network-based breast cancer classification. 57k interactions from Y2H, orthology, co-citation, HPRD, BIND, Reactome. 2 breast cancer cohorts, different expression platforms. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16.
104. Similar network markers across 2 data sets (better than original overlap). Increased classification accuracy. Better coverage of known cancer risk genes (*).
105. PIPE. Predicts yeast PPI from sequence. Uses interaction databases to find similar interacting proteins. Estimates the site of interaction. 75% accuracy (61% sensitivity, 89% specificity). Finds new interactions among complexes.
108. PIPE2. First all-to-all sequence-based computational screen of PPIs in yeast. 29,589 high-confidence interactions out of ~2 x 10^7 possible pairs. 16,000x faster than PIPE. 99.95% specificity.
109. Synthetic genetic interactions (lethal, slow growth). Mate two mutants without phenotypes to get a daughter cell with a phenotype. Synthetic lethal (SL) and slow-growth interactions are found by robotic mating using the yeast deletion library. Genetic interactions provide functional data on protein interactions or redundant genes. About 23% of known SLs (1295 - YPD+MIPS) were known protein interactions in yeast. Tong et al. Science. 2001 Dec 14;294(5550):2364-8.
110. Synthetic genetic interactions in yeast (Tong, Boone). [Figure legend: cell polarity, cell wall maintenance, cell structure, mitosis, chromosome structure, DNA synthesis, DNA repair, unknown, others.]
112. Have common cellular role. Sprinzak, Sattath, Margalit, J Mol Biol, 2003.
113. Comparisons. All methods except Y2H and the synthetic lethality technique are biased toward abundant proteins. PPI data are biased toward certain cellular localizations. Evolutionarily conserved proteins have much better coverage in Y2H than proteins restricted to a certain organism. C. von Mering et al., Nature, 2002.
121. Pathway. In biology, a pathway is a network which consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity.
122. Using pathway information. Sources of pathway information: expert knowledge, experimental data, databases, literature. Pathway analysis: find active processes underlying a phenotype.
124. Pathway data are extremely difficult to combine and use. Vuk Pavlovic, Sylva Donaldson.
125. Aim: convenient access to pathway information. http://www.pathwaycommons.org Facilitate creation and communication of pathway data. Aggregate pathway data in the public domain. Provide easy access for pathway analysis.
126. Access from Cytoscape.
127. Cardiomyopathy: downregulated genes. Fatty acid degradation? Other pathways/processes? GenMAPP.org
132. Network analysis. Cytoscape: visualize molecular interaction networks and integrate interactions with gene expression profiles and other state data; data filters and custom plug-in architecture. http://www.cytoscape.org Biolayout Express 3D: large networks, gene expression. www.sanger.ac.uk/Teams/Team101/biolayout/b3d.html
133. Network analysis using Cytoscape. Sources of network information: expert knowledge, experimental data, databases, literature. Network analysis: find biological processes underlying a phenotype.
134. http://cytoscape.org Network visualization and analysis: pathway comparison, literature mining, Gene Ontology analysis, active modules, complex detection, network motif search. UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, UToronto, U Texas.
137. Active community. http://www.cytoscape.org Help: 8 tutorials, >10 case studies; mailing lists for discussion; documentation, data sets. Annual conference: Houston, Nov 6-9, 2009. 10,000s of users, 2500 downloads/month. >40 plugins extend functionality; building your own requires programming. Cline MS et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2(10):2366-82.
138. LAB. Objective: create a map of the functional enrichments from the 14 input proteins. Methods: use HGNC to obtain the gene symbols from the names; submit the gene symbols to a tool that already has datasets loaded; get attributes and do analysis on the network.
139. 14 proteins:
ISOFORM OF APOPTOSIS-INDUCING FACTOR 1, MITOCHONDRIAL
QUINONE OXIDOREDUCTASE.; 26 KDA PROTEIN.; 22 KDA PROTEIN.; 32 KDA PROTEIN.
14-3-3 PROTEIN EPSILON.
ELONGATION FACTOR 1-GAMMA.; 50 KDA PROTEIN.
AFG3-LIKE PROTEIN 2.
3-KETOACYL-COA THIOLASE, MITOCHONDRIAL
IMPORTIN BETA-1 SUBUNIT.
FH1/FH2 DOMAIN-CONTAINING PROTEIN
ANNEXIN VI ISOFORM 2.; ANNEXIN A6.
2,4-DIENOYL-COA REDUCTASE, MITOCHONDRIAL
HYDROXYACYL GLUTATHIONE HYDROLASE ISOFORM 1.; HYDROXYACYLGLUTATHIONE HYDROLASE.
ISOFORM 1 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA.; ISOFORM 2 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA
ISOFORM 1 OF LONG-CHAIN-FATTY-ACID--COA LIGASE 1
PHOSPHOLIPASE C DELTA 4.
140. Get their gene symbols/identifiers. HGNC: http://www.genenames.org Provide a table of mappings. What challenges did you face when trying to identify the symbols from textual descriptions?
141. Identify functional enrichments. Discuss and provide a plot for the enrichment of Gene Ontology categories.
142. Build an attribute enrichment network. Which new proteins are functionally linked? What datasets were used in the network construction?
143. Attribute enrichment with a custom data set. Use BioMart to convert HGNC identifiers to Ensembl identifiers. Obtain the Gene Ontology categories for the target proteins and the background proteins. Use FUNC to do the enrichment analysis.
148. Collect the Gene Ontology attributes for the list, then for all the human genes.
149. Next steps are harder... http://func.eva.mpg.de/ To use FUNC, you need to convert the BioMart output to the input file format FUNC expects (shown on the original slide). This is pretty easy to do in Excel for the protein list, but Excel can’t handle the results for all the human proteins. You need to write a small script: take BIOC3008 and become competent in simple data manipulation.
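The “small script” could look something like the Python sketch below. It assumes the BioMart export is tab-separated with a header row, a gene identifier in the first column and one GO term in the second. The three-column output (gene, GO term, 1 for list members and 0 for background) is a hypothetical layout chosen for illustration, because the FUNC format shown on the slide did not survive; check the FUNC documentation and adjust the last step accordingly. The gene and GO identifiers in the example are placeholders.

```python
import csv


def biomart_to_rows(lines, target_genes):
    """Turn BioMart (gene, GO term) rows into tab-separated gene/GO/flag rows."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    seen = set()
    rows = []
    for record in reader:
        if len(record) < 2 or not record[1]:
            continue  # gene with no GO annotation
        gene, go_term = record[0], record[1]
        if (gene, go_term) in seen:
            continue  # BioMart can repeat a pair, e.g. once per transcript
        seen.add((gene, go_term))
        flag = "1" if gene in target_genes else "0"
        rows.append("\t".join([gene, go_term, flag]))
    return rows


example_export = [
    "Gene ID\tGO term accession",
    "GENE_A\tGO:0000001",
    "GENE_A\tGO:0000001",
    "GENE_B\t",
    "GENE_C\tGO:0000002",
]
converted = biomart_to_rows(example_export, target_genes={"GENE_A"})
```

Because `csv.reader` accepts any iterable of lines, an open file handle can be passed in place of the example list, so the full set of human genes is processed one line at a time.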
Editor's notes
GeneMANIA uses query-specific weights for multifaceted function queries. Let’s say you have a co-expression network that was generated from microarray data. You know there is a cluster of cell cycle genes, and a cluster of DNA repair genes, and a few unknown genes between or within those clusters. This tells you a little bit about your genes of interest. But you want to add in a genetic interaction network, which is considerably more complex, and a protein interaction network, which is even more complex. How do you know which network contains the most relevant information about your query genes? The GeneMANIA algorithm weights the networks based on how connected your query genes are: a network is weighted more heavily if your query genes are more connected within that network. GeneMANIA produces a composite network showing the weights of the genetic interaction, protein interaction and co-expression networks used to generate it.