Network Biology: from lists to underpinnings of molecular behaviour

Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Carleton University
BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]


Provenance: This talk was prepared in part with input from the “Interpreting Gene Lists” workshop put forward by the Canadian Bioinformatics Workshops (bioinformatics.ca), http://bioinformatics.ca/workshops/2009/course-content

So you did some mass spectrometry? Protein identification.

Database search vs. de novo sequencing. Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, … [Figure: database search matches the spectrum to the known peptide AVGELTK; de novo sequencing reads residues directly from the spectrum (W R V A L T G E P L K C W D T).]
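The first filtering step of a database search can be sketched as comparing an observed peptide mass with the theoretical masses of database peptides. This is a minimal sketch under stated assumptions: peptides are unmodified, masses are neutral monoisotopic values, and the 0.02 Da tolerance and the `candidates` helper are illustrative choices. Real search engines go on to score the fragment-ion spectrum, which is not shown here.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def candidates(observed_mass: float, database: list[str], tolerance: float = 0.02) -> list[str]:
    """Hypothetical helper: database peptides within `tolerance` Da of the observed mass."""
    return [pep for pep in database if abs(peptide_mass(pep) - observed_mass) <= tolerance]

database = [
    "MDERHILNM", "KLQWVCSDL", "PTYWASDL", "ENQIKRSACVM", "TLACHGGEM", "NGALPQWRT",
    "HLLERTKMNVV", "GGPASSDA", "GGLITGMQSD", "MQPLMNWE", "ALKIIMNVRT", "AVGELTK",
    "HEWAILF", "GHNLWAMNAC", "GVFGSVLRA", "EKLNKAATYIN",
]
print(round(peptide_mass("AVGELTK"), 3))  # 716.407
print(candidates(716.41, database))       # ['AVGELTK']
```

De novo sequencing skips the database entirely and infers residues from mass differences between fragment peaks, which is why it can identify peptides that are absent from the database.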

My experiment worked and I have dozens, hundreds, or thousands of hits… now what? Protein identification → ?

Use the list to explore biology: determine significant shared attributes, explore putative mechanisms of action, test hypotheses. Protein identification → network biology → Eureka! A hypothesis on the molecular basis of the disease/process.

[Figure: attributes shared by list members, e.g. Detoxification and Oxidative Metabolism, are enriched in smokers (= up-regulated in smokers); axes: # in list having attribute, # in list sharing these attributes.]

Outline: Explore identified proteins; Attribute enrichment; Networks; Pathways; Lab

A hypothesis underlies the list of identified proteins. An initial question was posed, an experiment performed and a list of candidates obtained. The question is: what are the roles of these entities in the biological process being investigated? E.g. normal vs. pathological; response to stimulus; interactions and complexes.

Biological Answers: computational systems biology; information retrieval and summary; interaction network analysis; pathway analysis; function prediction.

Molecular Attributes. An attribute provides information about the entity in question (e.g. shape, function, process). Sequence and structure provide information about motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations and conservation/evolution, as well as functions, localization and biological/pathological processes.

Gene Ontology. Captures terminology related to three aspects: biological processes, molecular functions and cellular components (e.g. Cell division, Isomerase activity). Relationships between terms are largely defined with “is a” and “part of” relations.

GO Structure. [Figure: the terms membrane, cell membrane, mitochondrial membrane, chloroplast membrane and chloroplast linked by is-a and part-of relations.] GO is species independent. Some lower-level terms are specific to a group, but higher-level terms are not.
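Because terms are linked by is-a and part-of relations, a gene annotated to a specific term is implicitly annotated to every ancestor of that term, and enrichment tools generally count annotations after this upward propagation. A minimal sketch, using a hypothetical GO fragment loosely modelled on the membrane/chloroplast terms of the GO structure slide (the edges are illustrative, not taken from a GO release):

```python
from collections import deque

# Hypothetical GO fragment: child term -> list of (relation, parent term).
PARENTS = {
    "membrane": [],
    "chloroplast": [],
    "cell membrane": [("is_a", "membrane")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}

def ancestors(term):
    """All terms reachable from `term` by following is_a / part_of edges upward (BFS)."""
    found = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in PARENTS.get(current, []):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found

def propagate(direct_annotations):
    """Map each gene to its direct terms plus every ancestor of those terms."""
    return {
        gene: set(terms).union(*(ancestors(t) for t in terms))
        for gene, terms in direct_annotations.items()
    }

print(propagate({"geneA": ["chloroplast membrane"], "geneB": ["cell membrane"]}))
```

Following both relation types is a simplification; some analyses propagate only along is-a, so check how a given tool treats part-of.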

Gene Ontology: 30,393 terms, 99.2% with definitions; …; 8,719 molecular functions. GO Slim is an official reduced set of GO terms: …; good for making pie charts.

Annotation. Manual annotation: created by scientific curators; high quality; small number (time-consuming to create). Electronic annotation: derived without human validation; computational predictions (accuracy varies); lower ‘quality’ than manual codes. Key point: be aware of annotation origin.

Evidence types (provenance of facts):

IDA: Inferred from Direct Assay

IPI: Inferred from Physical Interaction

IMP: Inferred from Mutant Phenotype

IGI: Inferred from Genetic Interaction

IEP: Inferred from Expression Pattern

TAS: Traceable Author Statement

NAS: Non-traceable Author Statement

IEA: Inferred from Electronic Annotation
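A common use of these codes is to restrict an analysis to annotations with a given kind of support, e.g. dropping IEA. A minimal sketch over hypothetical (gene, term, evidence) records; the gene and term names are placeholders, and grouping IDA, IPI, IMP, IGI and IEP as "experimental" is an illustrative choice:

```python
# Evidence codes from the list above that come from experiments.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}

def filter_by_evidence(records, allowed):
    """Keep (gene, term, evidence) records whose evidence code is in `allowed`."""
    return [record for record in records if record[2] in allowed]

# Hypothetical annotation records.
records = [
    ("geneA", "term1", "IDA"),
    ("geneA", "term2", "IEA"),
    ("geneB", "term1", "IMP"),
    ("geneC", "term3", "NAS"),
]
print(filter_by_evidence(records, EXPERIMENTAL))
# [('geneA', 'term1', 'IDA'), ('geneB', 'term1', 'IMP')]
```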

Variable Coverage (figure from Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304.)

GO Software Tools. GO resources are freely available to anyone without restriction, including the ontologies, gene associations and tools developed by GO. Other groups have used GO to create tools for many purposes: http://www.geneontology.org/GO.tools

Accessing GO: QuickGO, http://www.ebi.ac.uk/ego/

Explore Ontologies: http://www.ebi.ac.uk/ontology-lookup

Databases of Molecular Annotation
NCBI: GenBank/RefSeq, Entrez Gene
EBI: UniProt, Ensembl BioMart (eukaryotes)
Model organism databases: Berkeley Drosophila Genome Project (BDGP); dictyBase (Dictyostelium discoideum); FlyBase (Drosophila melanogaster); GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei); UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases; Gramene (grains, including rice, Oryza); Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus); Rat Genome Database (RGD) (Rattus norvegicus); Reactome; Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae); The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana); The Institute for Genomic Research (TIGR): databases on several bacterial species; WormBase (Caenorhabditis elegans); Zebrafish Information Network (ZFIN) (Danio rerio)

Identifiers. Identifiers (IDs) are ideally unique, stable names or numbers that help track database records, e.g. a Social Insurance Number or Entrez Gene ID 41232. Gene and protein information is stored in many databases, so genes have many IDs. There are separate records for the gene, DNA, RNA and protein, and it is important to recognize the correct record type. E.g. Entrez Gene records don't store sequence; they link to DNA regions, RNA transcripts and proteins.

NCBI Database Links. NCBI: U.S. National Center for Biotechnology Information, part of the National Library of Medicine (NLM). http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf

Common Identifiers
Species-specific: HUGO HGNC BRCA2; MGI MGI:109337; RGD 2219; ZFIN ZDB-GENE-060510-3; FlyBase CG9097; WormBase WBGene00002299 or ZK1067.1; SGD S000002187 or YDL029W
Annotations: InterPro IPR015252; OMIM 600185; Pfam PF09104; Gene Ontology GO:0000724; SNPs rs28897757
Experimental platform: Affymetrix 208368_3p_s_at; Agilent A_23_P99452; CodeLink GE60169; Illumina GI_4502450-S
Gene: Ensembl ENSG00000139618; Entrez Gene 675; UniGene Hs.34012
RNA transcript: GenBank BC026160.1; RefSeq NM_000059; Ensembl ENST00000380152
Protein: Ensembl ENSP00000369497; RefSeq NP_000050.2; UniProt BRCA2_HUMAN or A1YBP1_HUMAN; IPI IPI00412408.1; EMBL AF309413; PDB 1MIU
Recommended identifiers were shown in red.

Identifier Mapping. So many IDs! Mapping (conversion) is a headache. Four main uses: disambiguate similarly named entities; reference related information; track biological and informational provenance (e.g. genes to proteins, Entrez Gene to Affymetrix probes); unify equivalent entities when merging datasets.
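Mapping is rarely one-to-one, so a conversion step should report IDs that map uniquely, IDs that map to several targets, and IDs that fail to map. A minimal sketch over a hypothetical two-column mapping table (as might be exported from a service such as BioMart); the `675` to `ENSG00000139618` row uses the BRCA2 identifiers from the identifier table, while the `999999` rows are made up to show a one-to-many case:

```python
from collections import defaultdict

# Hypothetical (Entrez Gene ID, Ensembl gene ID) rows.
MAPPING_ROWS = [
    ("675", "ENSG00000139618"),
    ("999999", "ENSG_EXAMPLE_1"),
    ("999999", "ENSG_EXAMPLE_2"),
]

def build_index(rows):
    """Group target IDs by source ID (one source may map to several targets)."""
    index = defaultdict(set)
    for source_id, target_id in rows:
        index[source_id].add(target_id)
    return index

def map_ids(ids, index):
    """Split IDs into unique mappings, ambiguous (one-to-many) mappings and failures."""
    mapped, ambiguous, unmapped = {}, {}, []
    for source_id in ids:
        targets = index.get(source_id, set())
        if not targets:
            unmapped.append(source_id)
        elif len(targets) == 1:
            mapped[source_id] = next(iter(targets))
        else:
            ambiguous[source_id] = sorted(targets)
    return mapped, ambiguous, unmapped

print(map_ids(["675", "999999", "unknown_id"], build_index(MAPPING_ROWS)))
# ({'675': 'ENSG00000139618'}, {'999999': ['ENSG_EXAMPLE_1', 'ENSG_EXAMPLE_2']}, ['unknown_id'])
```

Keeping ambiguous and unmapped IDs visible, rather than silently dropping them, matters later: they change the effective size of the list and of the background population in enrichment tests.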

ID Mapping Services: Synergizer, http://llama.med.harvard.edu/synergizer/translate/; Ensembl BioMart, http://www.ensembl.org; UniProt, http://www.uniprot.org/

Outline: Explore identified proteins; Attribute enrichment; Networks; Pathways

Attribute Enrichment (AE). Given: a list, e.g. RRP6, MRD1, RRP7, RRP43, RRP42, and attributes, e.g. function, process, localization, interactions. AE question: are any of the attributes surprisingly enriched in the list? Details: how to assess “surprisingly” (statistics); how to correct for repeating the tests.

What is a P-value? The P-value is the probability, calculated assuming the “null hypothesis” is true, of observing a test statistic at least as extreme as the one computed from the data, for a sample of the same size. It is not the probability that the null hypothesis is true. Intuitively: if results with P below a threshold α are called significant, then among tests where the null hypothesis actually holds, a fraction α will be false positives (aka “Type I errors”).

How likely is it that the observed differences between the two distributions are due to chance? [Figure: value distributions (histograms) of two sets of values.]

AE using the T-test. Answer: two-tailed T-test. Black: N1 = 500, mean m1 = 1.1, std s1 = 0.9. Red: N2 = 4500, mean m2 = 4.9, std s2 = 1.0. Formal question: what is the probability of observing the T-statistic, or one more extreme, if the means of the two distributions were the same?
$t = \frac{m_1 - m_2}{\sqrt{s_1^2/N_1 + s_2^2/N_2}} = -88.5$

AE using the T-test (continued). P-value = 2 × the shaded area under the T-distribution beyond the observed T-statistic of −88.5 (two-tailed test). [Figure: T-distribution, probability density vs. T-statistic, centred at 0.]
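The T-statistic on these slides can be reproduced from the summary statistics alone. This sketch assumes the unequal-variance (Welch) form of the statistic, which is the form that gives −88.5 for these numbers, and uses the normal approximation to the T-distribution for the two-tailed P-value, which is reasonable only because the samples are large. With real data, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` computes both quantities with the exact T-distribution.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """T-statistic for a difference in means, without assuming equal variances."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def two_tailed_p_normal(t):
    """Two-tailed P-value (2 x tail area) under a normal approximation to the T-distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

t = welch_t(1.1, 0.9, 500, 4.9, 1.0, 4500)
print(round(t, 1))                          # -88.5
print(two_tailed_p_normal(t))               # 0.0: far below the smallest representable double
print(round(two_tailed_p_normal(1.96), 3))  # 0.05
```

A P-value that underflows to zero should be reported as a bound (e.g. "< 10^-300") rather than as exactly zero.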

T-test limitations. The T-test assumes both distributions are approximately Gaussian (i.e. normal). This assumption is often true for log ratios from microarrays, and rarely true for peptide counts, sequence tags (SAGE or next-generation sequencing) and transcription factor binding site hits. Problem cases: values that are positive with increasing density near zero (e.g. sequence counts); bimodal (“two-bumped”) distributions; distributions with outliers, or “heavy-tailed” distributions. [Figure: probability density vs. score for each problem case.] The T-test tests for a significant difference in the means of two distributions, but does not test for other differences between the distributions.

Kolmogorov-Smirnov (K-S) test. Question: are the red and black distributions significantly different? Formal question: is the length of the largest difference between the “empirical distribution functions” statistically significant? Calculate the cumulative distributions of red and black and measure the largest vertical gap between them (length = 0.4 in the example). [Figure: probability density vs. score, and cumulative probability (0 to 1.0) vs. score.]
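The K-S statistic described above (the largest vertical gap between two empirical cumulative distribution functions) can be sketched in a few lines. This computes only the statistic; turning it into a P-value needs the K-S null distribution, which `scipy.stats.ks_2samp` provides.

```python
from bisect import bisect_right

def ecdf(sorted_values, x):
    """Fraction of values <= x (the empirical cumulative distribution function)."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest vertical gap between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions that only change at observed values,
    # so the maximum gap is attained at one of those values.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Because the statistic compares whole distributions, it responds to differences in shape or spread as well as in location, unlike the T-test.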

What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins? List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 500 X proteins, 5000 proteins.

Fisher’s exact test. Answer (from the null distribution): P-value = 4.6 × 10^-4. The P-value for Fisher’s exact test is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.” It depends on the size of the list, the number of proteins with the feature (in the list and in the background), and the size of the background population. List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 500 X proteins, 5000 proteins.
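The one-tailed Fisher's exact (hypergeometric) P-value can be computed directly from binomial coefficients. This sketch reads the slide's background as 5,000 proteins in total, 500 of which have feature X, which reproduces the stated 4.6 × 10^-4; `scipy.stats.hypergeom.sf(3, 5000, 500, 5)` should give the same number.

```python
from math import comb

def enrichment_p_value(k, list_size, feature_count, population):
    """P(X >= k) for a hypergeometric X: one-tailed Fisher's exact test for over-enrichment.

    k             -- proteins in the list that have the feature
    list_size     -- proteins in the list (the size of the random draw)
    feature_count -- proteins in the background that have the feature
    population    -- proteins in the background population
    """
    total = comb(population, list_size)
    upper = min(list_size, feature_count)
    tail = sum(
        comb(feature_count, i) * comb(population - feature_count, list_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total

p = enrichment_p_value(k=4, list_size=5, feature_count=500, population=5000)
print(f"{p:.1e}")  # 4.6e-04
```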

Important details. To test for under-enrichment of “black”, test for over-enrichment of “red”. Choose the “background population” appropriately: e.g., if only a portion of the total complement is queried (or has annotation), use only that population as the background. To test for enrichment of more than one independent type of annotation (red vs. black and circle vs. square), apply Fisher’s exact test separately for each type. The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.

How to win the P-value lottery, part 1. Random draws: expect a random draw with the observed enrichment once every 1/P-value draws. … 7,834 draws later … [Figure: random draws from a background population of 500 X and 5000 Y.]
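The "once every 1/P-value draws" intuition can be checked by simulation. This sketch assumes a background of 5,000 proteins of which 500 carry feature X, draws random lists of 5 with a fixed seed, and counts how often 4 or more carry X; the observed frequency should land near the exact value of about 4.6 × 10^-4, i.e. roughly one "lottery win" per couple of thousand random lists.

```python
import random

def simulate_draws(n_draws, list_size=5, feature_count=500, population=5000, k=4, seed=1):
    """Fraction of random lists with at least k feature proteins (a Monte Carlo P-value)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Proteins numbered 0..feature_count-1 carry feature X; the rest do not.
        draw = rng.sample(range(population), list_size)
        if sum(1 for protein in draw if protein < feature_count) >= k:
            hits += 1
    return hits / n_draws

freq = simulate_draws(200_000)
print(freq, 1 / freq if freq else float("inf"))  # frequency, and draws per "win"
```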

How to win the P-value lottery, part 2. Keep the list the same and evaluate different annotations. [Figure: the observed draw, RRP6, MRD1, RRP7, RRP43, RRP42, shown with several different annotations.]

Correcting for multiple tests. The Bonferroni correction controls the probability that any of the tests is a false positive, aka the Family-Wise Error Rate (FWER). If M = # of annotations tested: corrected P-value = min(1, M × original P-value). The Benjamini-Hochberg (B-H) procedure controls the expected proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives, aka the False Discovery Rate (FDR): the expected proportion of the observed enrichments that are due to random chance. B-H is less stringent than Bonferroni.
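Both corrections can be sketched in plain Python. The P-values below are hypothetical; `statsmodels.stats.multitest.multipletests` with `method="bonferroni"` or `method="fdr_bh"` implements the same adjustments for real analyses.

```python
def bonferroni(p_values):
    """FWER control: multiply each P-value by the number of tests M (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """FDR control: B-H adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]  # hypothetical P-values for M = 4 annotations
print([round(x, 3) for x in bonferroni(p)])          # [0.04, 0.16, 0.12, 0.02]
print([round(x, 3) for x in benjamini_hochberg(p)])  # [0.02, 0.04, 0.04, 0.02]
```

At a 0.05 cutoff, Bonferroni keeps two of the four annotations while B-H keeps all four, which illustrates the difference in stringency.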

Reducing multiple test correction stringency. The correction to the P-value threshold α depends on the number of tests you do, so, no matter what, the more tests you do, the more stringent each individual test must be. You can control the stringency by reducing the number of tests, e.g. use GO slim or restrict testing to the appropriate GO annotations.

AE tools. Web-based tools: Funspec: easy tool for yeast, not maintained; uses GO annotations and some other annotations (e.g. protein complexes). YeastFeatures: similar to Funspec, with different datasets and presentation. GoMiner: uses GO annotations, covers many organisms, needs a background set of genes. Cytoscape-based tools: BINGO: uses GO annotations, displays enrichment results graphically and visually organizes related categories.

Funspec: simple ORA (over-representation analysis) for yeast, http://funspec.med.utoronto.ca/. Paste list here; choose sources of annotation; Bonferroni correct? YES! Caveats: …; last updated 2002.

YeastFeatures: http://software.dumontierlab.com/yeastfeatures

GoMiner, part 1: http://discover.nci.nih.gov/gominer
1. Click “web interface”
2. Upload background
3. Upload list
4. Choose organism
5. Choose evidence code (All or Level 1)

GoMiner, part 2
6. Restrict # of tests via category size
7. Restrict # of tests via GO hierarchy
8. Results are emailed to this address within a few minutes

DAVID, part 1: http://david.abcc.ncifcrf.gov/ Paste list here (DAVID automatically detects the organism); choose ID type; list type: list or background?

DAVID, part 2: http://david.abcc.ncifcrf.gov/

BINGO, an ORA Cytoscape plugin: http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm Nodes represent GO categories; links represent parent-child relationships in the GO hierarchy; colours represent significance of enrichment.

Outline: Explore identified proteins; Attribute enrichment; Networks (…, functional networks); Pathways

Why Network and Pathway Analysis?
Intuitive to biologists: …; more efficient than searching databases gene-by-gene; intuitive display for sharing data.
Computation on pathway content: …; identify potential regulators.

Network: In biology, a network is a graph composed of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities. [Figure: example graph labelling a vertex (node), an edge, a directed edge (arc), weighted edges (weights −5, 10, 7) and a cycle.]

Integration in a Network Context

Integration in a Network Context: expression data mapped to node colours.

Mapping Biology to a Network. A simple mapping for protein-protein interactions: one protein per node, one interaction per edge. Edges can represent other relationships: physical (e.g. protein-protein interaction), regulatory (e.g. kinase activates target), genetic (e.g. epistasis), similarity (e.g. protein sequence similarity). Critical: understand the mapping before doing network analysis.
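Because one graph can mix edge types, it helps to store the type, direction and weight with every edge so that an analysis can select the mapping it means. A minimal sketch; the `Network` class and the node names are hypothetical, not the data model of Cytoscape or any other tool:

```python
from collections import defaultdict

class Network:
    """Minimal graph: each edge records its type, whether it is directed, and a weight."""

    def __init__(self):
        self.edges = []                     # (source, target, edge_type, directed, weight)
        self.neighbours = defaultdict(set)  # outgoing (or undirected) neighbours per node

    def add_edge(self, source, target, edge_type, directed=False, weight=1.0):
        self.edges.append((source, target, edge_type, directed, weight))
        self.neighbours[source].add(target)
        if not directed:
            self.neighbours[target].add(source)

    def subnetwork(self, edge_type):
        """New Network containing only the edges of one type (e.g. 'physical')."""
        sub = Network()
        for source, target, etype, directed, weight in self.edges:
            if etype == edge_type:
                sub.add_edge(source, target, etype, directed, weight)
        return sub

net = Network()
net.add_edge("kinaseK", "targetT", "regulatory", directed=True)  # kinase activates target
net.add_edge("proteinA", "proteinB", "physical")                # protein-protein interaction
net.add_edge("geneX", "geneY", "genetic")                       # epistasis
physical = net.subnetwork("physical")
print(sorted(physical.neighbours))  # ['proteinA', 'proteinB']
```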

Protein Sequence Similarity Network: http://apropos.icmb.utexas.edu/lgl/

Literature Network. Computationally extract gene relationships from text, usually PubMed abstracts. Useful if the network is not in a database, and as a literature search tool, BUT not perfect: gene names are hard to recognize, and natural language processing is difficult. Tools: Agilent Literature Search Cytoscape plugin; iHOP (www.ihop-net.org/UniPub/iHOP/).

Agilent Literature Search

Cytoscape network produced by Literature Search, showing an abstract from the scientific literature and the sentences supporting an edge.

Enrichment Map. [Figure: two gene-sets, A and B, connected according to their overlap.]

Nodes represent gene-sets.

[Figure: enrichment map with clusters of related gene-sets labelled Muscle Contraction, Olfactory Receptor, Ubiquitin Processes, Ubiquitin-dependent Proteolysis, Ectodermal Dev. & Keratinocyte Diff., DNA Repair, Mitotic Cell Cycle, Ubiquitin Ligase, DNA Processes, Cytoskeleton, DNA Replication, Intermediate Filament Cytoskeleton, Microtubule Cytoskeleton, Ras GTPase, mRNA Transport, Chromosome, RNA Processes, Serine Endopeptidase, Chromatin Remodeling, RNA Splicing, Fatty Acid Metabolism, Ion Channel, Transcription, Calcium, rRNA Processing, Mitochondrial Oxidative Metabolism, Ribonucleotide Metabolism, Potassium, Sodium, Translation.]
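The slides only say that gene-sets are linked by their overlap. Two common ways to quantify that overlap are the Jaccard index and the overlap coefficient; the sketch below uses them to build enrichment-map edges above a cutoff. The gene-sets, the 0.5 cutoff and the choice of overlap coefficient as the default are all illustrative assumptions.

```python
def jaccard(a, b):
    """Shared genes divided by all genes in either set."""
    return len(a & b) / len(a | b)

def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller set."""
    return len(a & b) / min(len(a), len(b))

def enrichment_map_edges(gene_sets, cutoff=0.5, similarity=overlap_coefficient):
    """Connect every pair of gene-sets whose similarity reaches the cutoff."""
    names = sorted(gene_sets)
    edges = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = similarity(gene_sets[x], gene_sets[y])
            if score >= cutoff:
                edges.append((x, y, score))
    return edges

# Hypothetical gene-sets (gene names are placeholders).
gene_sets = {
    "DNA Repair": {"g1", "g2", "g3", "g4"},
    "DNA Replication": {"g3", "g4", "g5"},
    "Translation": {"g9", "g10"},
}
print(enrichment_map_edges(gene_sets))  # one edge: DNA Repair - DNA Replication (2/3)
```

The overlap coefficient links a small set to a larger set that contains it, whereas Jaccard penalises the size difference; which behaviour is preferable depends on the analysis.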

Physical Networks. [Figure: A and B connected.] Between two molecular objects: DNA, RNA, gene, protein, complex, small molecule, photon. Requires a site of interaction/binding. Biologically relevant: present/expressed at the same time; share a cellular location; leads to some biologically relevant outcome.

Molecular Interactions: RAS interacting with RALGDS (PDB: 1LFD); synthetic protein interacting with ATP and zinc (PDB: 2P0X).

Experimental Interaction Discovery. Methods: mass spectrometry, genetics, two-hybrid, microarray, X-ray, NMR, grouped as direct physical, indirect physical, or indirect genetic evidence.

Experimental Considerations. How do you know if the interaction really exists? Each method has its advantages and disadvantages. Be aware of systematic errors. Be aware of contaminants. Each method observes interactions under slightly different experimental conditions. Support from many different sources is certainly better than support from just one, and may be necessary.

Some affinity purification caveats. First and most importantly, this is only a representation of the observation. You can only tell which proteins are in the eluate; you can’t tell how they are connected to one another. If there is only one other protein present (B), then it’s likely that A and B are directly interacting. But what if two other proteins (B and C) were present along with A? [Figure: A with B; A with B and C.]

Complexes with unknown topology. [Figure: three possible arrangements of A, B and C.] Which of these models is correct? The complex described by this experimental result is said to have an unknown topology.

Complexes with unknown stoichiometry. [Figure: a complex containing multiple copies of A and B.] Here’s another possibility. The complex described by this experimental result is also said to have an unknown stoichiometry.

Interaction Models. Actual topology vs. two models. Spoke: simple model, useful for data navigation; more accurate. Matrix: theoretical maximum number of interactions.
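The two models turn one purification (a bait plus its co-purified proteins) into different edge sets. A minimal sketch, with hypothetical protein names: the spoke model gives n edges for a bait with n co-purified proteins, while the matrix model gives every pair, (n + 1)n/2 edges.

```python
from itertools import combinations

def spoke_model(bait, preys):
    """Spoke model: the bait is linked to every co-purified protein."""
    return {frozenset((bait, prey)) for prey in preys}

def matrix_model(bait, preys):
    """Matrix model: every pair of proteins in the purification is linked."""
    return {frozenset(pair) for pair in combinations([bait, *preys], 2)}

print(len(spoke_model("A", ["B", "C"])), len(matrix_model("A", ["B", "C"])))  # 2 3
```

The gap grows quickly: a bait with 10 co-purified proteins yields 10 spoke edges but 55 matrix edges, most of which may not be direct contacts.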

High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI). Mike Tyers, SLRI. [Figure: Ste12 complex.] Ho et al. Nature. 2002 Jan 10;415(6868):180-3

k-core analysis. A k-core is a maximal part of a graph in which every node is connected to at least k other nodes of that part (k = 0, 1, 2, 3, …). The highest k-core is the central, most densely connected region of a graph. Regions of dense connectivity may represent molecular complexes; therefore, high k-cores may be molecular complexes.
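A k-core can be found by repeatedly peeling away nodes with fewer than k remaining neighbours. A minimal sketch for an undirected graph stored as a symmetric adjacency dictionary without self-loops; the example graph is hypothetical. NetworkX offers the same operation as `networkx.k_core`.

```python
def k_core(adjacency, k):
    """Nodes of the k-core of an undirected graph given as {node: set(neighbours)}."""
    alive = set(adjacency)
    degree = {node: len(adjacency[node]) for node in alive}
    to_remove = [node for node in alive if degree[node] < k]
    while to_remove:
        node = to_remove.pop()
        if node not in alive:        # may have been queued more than once
            continue
        alive.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in alive:
                degree[neighbour] -= 1
                if degree[neighbour] < k:
                    to_remove.append(neighbour)
    return alive

# Hypothetical graph: a 4-clique A-B-C-D with a tail A-E-F.
adjacency = {
    "A": {"B", "C", "D", "E"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C"}, "E": {"A", "F"}, "F": {"E"},
}
print(sorted(k_core(adjacency, 3)))  # ['A', 'B', 'C', 'D']
```

Here the 3-core is the clique, a dense region of the kind that may correspond to a molecular complex; asking for a 4-core removes every node.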

[Figure: highest k-cores of yeast interaction networks from pre-MS data, Ho et al., Gavin et al. and their union (labelled 6-core, 6-core, 6-core and 9-core).] Interaction can define function. MCODE plugin for Cytoscape.

PathGuide: http://pathguide.org

Interaction Databases [table of databases annotated by content type: Experiment (E), Structure detail (S), Predicted (P), Physical (P), Functional (F), Curated (C), Homology modeling (H); *IMEx consortium]

Network Classification of Disease. Traditional approach: gene association. Limitation: testing too many genes reduces statistical power. New approach: active cell map-based approaches that combine networks with molecular profiles.
Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16.
Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 2007 Jun;3(6):e96.
Efroni S, Schaefer CF, Buetow KH. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE. 2007 May 9;2(5):e425.

Network-Based Breast Cancer Classification. 57k interactions from Y2H, orthology, co-citation, HPRD, BIND and Reactome. 2 breast cancer cohorts, different expression platforms. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16.

Similar network markers across 2 data sets (better than the original overlap). Increased classification accuracy. Better coverage of known cancer risk genes (*).

PIPE. Predicts yeast PPIs from sequence. Uses interaction databases to find similar interacting proteins. Estimates the site of interaction. 75% accuracy (61% sensitivity, 89% specificity). Finds new interactions among complexes.
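The three PIPE figures are related through the confusion matrix. This sketch assumes a balanced test set of 100 known interacting and 100 non-interacting pairs (an assumption; the slide does not give the test set), in which case accuracy is simply the mean of sensitivity and specificity, consistent with 75% = (61% + 89%) / 2.

```python
def performance(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # fraction of true interactions recovered
    specificity = tn / (tn + fp)                # fraction of non-interactions correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all pairs classified correctly
    return sensitivity, specificity, accuracy

# Hypothetical balanced test set: 100 interacting and 100 non-interacting pairs.
print(performance(tp=61, fn=39, tn=89, fp=11))  # (0.61, 0.89, 0.75)
```

On an unbalanced test set, where non-interacting pairs vastly outnumber interacting ones as in a whole proteome, accuracy is dominated by specificity, which is why proteome-scale screens emphasise very high specificity.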

PIPE2. First all-to-all sequence-based computational screen of PPIs in yeast: 29,589 high-confidence interactions out of ~2 × 10^7 possible pairs. 16,000× faster than PIPE. 99.95% specificity.
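The “~2 × 10^7 possible pairs” figure is the number of unordered protein pairs in the proteome. Assuming a yeast proteome of roughly n ≈ 6,000 proteins (an assumption; the slide does not state n), the count works out as follows, so the 29,589 predictions cover roughly 0.16% of all pairs:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{6000 \times 5999}{2} \approx 1.8 \times 10^{7},
\qquad
\frac{29\,589}{1.8 \times 10^{7}} \approx 1.6 \times 10^{-3}
```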

Synthetic Genetic Interactions. Synthetic genetic interactions (lethal, slow growth): mate two mutants without phenotypes to get a daughter cell with a phenotype, i.e. synthetic lethal (SL) or slow growth. Robotic mating using the yeast deletion library. Genetic interactions provide functional data on protein interactions or redundant genes. About 23% of known SLs (1,295, from YPD + MIPS) were known protein interactions in yeast. Tong et al. Science. 2001 Dec 14;294(5550):2364-8

Synthetic Genetic Interactions in Yeast (Tong, Boone). [Figure: synthetic genetic interaction network with nodes grouped by function: cell polarity, cell wall maintenance, cell structure, mitosis, chromosome structure, DNA synthesis, DNA repair, unknown, others.]

Validation: Protein Localization. [Figure: localization agreement of interacting pairs by method: A-A3: Y2H; B: physical methods; C: genetic; E: immunological.] True positives: …; have a common cellular role. Sprinzak, Sattath, Margalit, J Mol Biol, 2003

Comparisons. All methods except Y2H and synthetic lethality are biased toward abundant proteins. PPI data are biased toward certain cellular localizations. Evolutionarily conserved proteins have much better coverage in Y2H than proteins restricted to a certain organism. C. von Mering et al., Nature, 2002

Functional Associations: molecular interactions; regulatory interactions; genetic interactions; similarity relationships: co-expression, protein sequence, domain architecture, phylogenetic profiles, gene neighborhood, gene fusion, …

STRING: http://string.embl.de/ von Mering et al., Nucleic Acids Res., 2005


Gene Function Prediction using a Multiple Association Network Integration Algorithm (GeneMANIA). Query-specific weights for multifaceted function queries: the combined network is a weighted sum (w1 × + w2 × + w3 ×) of a co-complexed network (Durrett 2006), a genetic network (Tong et al. 2001) and a co-expression network. [Figure: cell cycle genes CDC27, CDC23 and APC11 linked to the unknown gene UNK1; DNA repair genes RAD54, XRS2 and MRE11 linked to UNK2.] Pavlidis et al, 2002; Lanckriet et al, 2004; Mostafavi et al, 2008
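The weighted-sum idea can be sketched as combining several association networks into one and then scoring unannotated genes by their links to the query genes. This is a simplified illustration, not GeneMANIA's algorithm: the published method fits query-specific weights and then uses label propagation, whereas this sketch takes fixed, hypothetical weights and a one-step neighbour sum. All edges and edge values are hypothetical.

```python
# Hypothetical association networks: (gene, gene) -> association strength.
CO_COMPLEX = {("CDC27", "CDC23"): 1.0, ("CDC23", "APC11"): 1.0, ("APC11", "UNK1"): 1.0}
GENETIC = {("RAD54", "XRS2"): 1.0, ("MRE11", "UNK2"): 1.0}
CO_EXPRESSION = {("CDC27", "UNK1"): 0.6, ("RAD54", "UNK2"): 0.7, ("CDC23", "UNK2"): 0.2}

def combine(networks, weights):
    """Weighted sum of undirected networks: W = w1*W1 + w2*W2 + ..."""
    combined = {}
    for network, w in zip(networks, weights):
        for (a, b), value in network.items():
            key = frozenset((a, b))
            combined[key] = combined.get(key, 0.0) + w * value
    return combined

def score(gene, query_genes, combined):
    """Sum of combined edge weights linking `gene` directly to the query genes."""
    return sum(combined.get(frozenset((gene, q)), 0.0) for q in query_genes)

W = combine([CO_COMPLEX, GENETIC, CO_EXPRESSION], weights=[0.5, 0.2, 0.3])
cell_cycle = {"CDC27", "CDC23", "APC11"}
print(round(score("UNK1", cell_cycle, W), 2), round(score("UNK2", cell_cycle, W), 2))  # 0.68 0.06
```

With a DNA repair query ({"RAD54", "XRS2", "MRE11"}) the ranking of UNK1 and UNK2 reverses, which is the point of query-specific integration.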

GeneMANIA Cytoscape Plugin

Outline: Explore identified proteins; Attribute enrichment; Networks; Pathways; Lab

Pathway: In biology, a pathway is a network that consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity.

Using Pathway Information. [Figure: pathway information from databases, literature and expert knowledge is combined with experimental data in pathway analysis to find active processes underlying a phenotype.]

More than 290 pathway databases! http://pathguide.org: …; pathway data are extremely difficult to combine and use. (Vuk Pavlovic, Sylva Donaldson)

Aim: Convenient Access to Pathway Information, http://www.pathwaycommons.org: facilitate creation and communication of pathway data; aggregate pathway data in the public domain; provide easy access for pathway analysis.

Access From Cytoscape

Cardiomyopathy: downregulated genes. Fatty acid degradation? Other pathways/processes? GenMAPP.org
