Network Biology: from lists to underpinnings of molecular behaviour. Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Carleton University. BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Provenance: This talk was prepared in part with input from the “Interpreting Gene Lists” workshop put forward by the Canadian Bioinformatics Workshops (bioinformatics.ca) http://bioinformatics.ca/workshops/2009/course-content
So you did some mass spectrometry? Protein Identification
Database search vs de novo. [Figure: spectrum peak labels W R V A L T G E P L K C W D T; panel “Database Search” matching against a database of known peptides (MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ...) with hit AVGELTK; panel “de novo”]
My experiment worked and I have dozens, hundreds, or thousands of hits... now what? [Diagram: Protein Identification, then “?”]
Use the list to explore biology: determine significant shared attributes, explore putative mechanisms of action, test hypotheses. [Diagram: Protein Identification, then Network Biology, then “Eureka!”: a hypothesis on the molecular basis of a disease/process]
[Figure: attributes such as Detoxification and Oxidative Metabolism, with the # in list having each attribute and the # in list sharing these attributes. Enriched in smokers = UP-regulated in smokers]
Outline Explore identified proteins Attribute enrichment Networks  Pathways Lab 10 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
A hypothesis underlies the list of identified proteins. An initial question was posed, an experiment performed and a list of candidates obtained. The question is: what are the roles of these entities in the biological process being investigated? Normal vs pathological. Response to stimulus. Interactions and complexes.
Biological Answers Computational systems biology Information retrieval and summary Interaction network analysis Pathway analysis Function prediction 12 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Molecular Attributes. An attribute provides information about the entity in question (e.g. shape, function, process). Sequence and structure provide information about: motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations, conservation/evolution; functions, localization, biological / pathological processes.
Gene Ontology Captures terminology related to three aspects  biological processes molecular functions  cellular components Relationships between terms are largely defined with “is a” and “part of” relations Cell division Isomerase activity 14 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
GO Structure. Species independent. Some lower-level terms are specific to a group, but higher-level terms are not. [Figure: example terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane linked by is-a and part-of relations]
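Because terms are linked by is-a and part-of relations, an annotation to a specific term also implies the more general terms above it. The sketch below walks such a graph to collect a term's ancestors. The edges are assumptions made up from the labels on the slide, not an extract of the real ontology.

```python
# Toy fragment of a GO-like graph: child term -> list of (relation, parent term).
# These edges are illustrative assumptions, not taken from a GO release.
PARENTS = {
    "chloroplast membrane": [("is-a", "membrane"), ("part-of", "chloroplast")],
    "mitochondrial membrane": [("is-a", "membrane"), ("part-of", "mitochondrion")],
    "chloroplast": [("part-of", "cell")],
    "mitochondrion": [("part-of", "cell")],
    "membrane": [("part-of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` through is-a or part-of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("chloroplast membrane")))
```

A protein annotated to "chloroplast membrane" in this toy graph would therefore also count towards "membrane", "chloroplast" and "cell" in any enrichment test.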
Gene Ontology: 30,393 terms, 99.2% with definitions
2,735 cellular components
8,719 molecular functions
GO Slim is an official reduced set of GO terms
Good for making pie charts
Annotation Manual annotation Created by scientific curators High quality Small number (time-consuming to create) Electronic annotation Annotation derived without human validation Computational predictions (accuracy varies) Lower ‘quality’ than manual codes Key point: be aware of annotation origin  17 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Evidence Type (provenance of facts)
IDA: Inferred from Direct Assay
IPI:  Inferred from Physical Interaction
IMP:  Inferred from Mutant Phenotype
IGI:   Inferred from Genetic Interaction
IEP:  Inferred from Expression Pattern
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
IC:    Inferred by Curator
ND:   No Data available
IEA: Inferred from Electronic Annotation
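To act on the "be aware of annotation origin" advice, annotations can be split by evidence code before any analysis. A minimal sketch follows, assuming annotations arrive as (gene, GO term, evidence code) tuples. That tuple layout and the gene names are hypothetical.

```python
# Of the codes listed above, IEA marks annotations made without human validation.
ELECTRONIC_CODES = {"IEA"}


def split_by_origin(annotations):
    """Split (gene, go_term, evidence_code) tuples into manual and electronic lists."""
    manual, electronic = [], []
    for annotation in annotations:
        evidence_code = annotation[2]
        if evidence_code in ELECTRONIC_CODES:
            electronic.append(annotation)
        else:
            manual.append(annotation)
    return manual, electronic


# Hypothetical annotations, for illustration only.
example = [
    ("geneA", "GO:0000724", "IDA"),
    ("geneB", "GO:0000724", "IEA"),
    ("geneC", "GO:0000724", "TAS"),
]
manual, electronic = split_by_origin(example)
print(len(manual), "manual;", len(electronic), "electronic")
```

Running an enrichment analysis once on all annotations and once on the manual subset is one way to see how much a result depends on electronic predictions.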
Variable Coverage Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304. 19 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
GO Software Tools GO resources are freely available to anyone without restriction Includes the ontologies, gene associations and tools developed by GO Other groups have used GO to create tools for many purposes http://www.geneontology.org/GO.tools 20 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Accessing GO: QuickGO http://www.ebi.ac.uk/ego/ 21 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Explore Ontologies http://www.ebi.ac.uk/ontology-lookup 22 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Databases of Molecular Annotation
NCBI: GenBank / RefSeq, Entrez Gene
EBI: UniProt, Ensembl BioMart (eukaryotes)
Model Organism Databases:
Berkeley Drosophila Genome Project (BDGP)
dictyBase (Dictyostelium discoideum)
FlyBase (Drosophila melanogaster)
GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
Gramene (grains, including rice, Oryza)
Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
Rat Genome Database (RGD) (Rattus norvegicus)
Reactome
Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
The Institute for Genomic Research (TIGR): databases on several bacterial species
WormBase (Caenorhabditis elegans)
Zebrafish Information Network (ZFIN) (Danio rerio)
Identifiers Identifiers (IDs) are ideally unique, stable names or numbers that help track database records E.g. Social Insurance Number, Entrez Gene ID 41232 Gene and protein information stored in many databases  Genes have many IDs Records for: Gene, DNA, RNA, Protein Important to recognize the correct record type E.g. Entrez Gene records don’t store sequence. They link to DNA regions, RNA transcripts and proteins. 25 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
NCBI Database Links NCBI: U.S. National Center for Biotechnology Information Part of National Library of Medicine (NLM) http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf 26 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Common Identifiers
Species-specific: HUGO HGNC BRCA2; MGI MGI:109337; RGD 2219; ZFIN ZDB-GENE-060510-3; FlyBase CG9097; WormBase WBGene00002299 or ZK1067.1; SGD S000002187 or YDL029W
Annotations: InterPro IPR015252; OMIM 600185; Pfam PF09104; Gene Ontology GO:0000724; SNPs rs28897757
Experimental Platform: Affymetrix 208368_3p_s_at; Agilent A_23_P99452; CodeLink GE60169; Illumina GI_4502450-S
Gene: Ensembl ENSG00000139618; Entrez Gene 675; Unigene Hs.34012
RNA transcript: GenBank BC026160.1; RefSeq NM_000059; Ensembl ENST00000380152
Protein: Ensembl ENSP00000369497; RefSeq NP_000050.2; UniProt BRCA2_HUMAN or A1YBP1_HUMAN; IPI IPI00412408.1; EMBL AF309413; PDB 1MIU
Red = Recommended
Identifier Mapping So many IDs! Mapping (conversion) is a headache Four main uses Disambiguate similarly named entities Used to reference related information Biological and informational provenance E.g. Genes to proteins, Entrez Gene to Affy Unification during dataset merging Equivalent entities 28 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
ID Mapping Services Synergizer http://llama.med.harvard.edu/synergizer/translate/ Ensembl BioMart http://www.ensembl.org UniProt http://www.uniprot.org/ 29 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
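Whatever service does the mapping, the operation is a lookup from one identifier namespace to another, and some inputs will have no counterpart. A small sketch, using the BRCA2 identifiers from the Common Identifiers slide; the record layout and function names are hypothetical, not any service's API.

```python
# One record per gene, holding its identifiers in several namespaces.
# The values are the BRCA2 examples from the Common Identifiers slide.
RECORDS = [
    {
        "hgnc_symbol": "BRCA2",
        "entrez_gene": "675",
        "ensembl_gene": "ENSG00000139618",
        "refseq_rna": "NM_000059",
        "uniprot": "BRCA2_HUMAN",
    },
]


def build_index(records, source):
    """Index records by the identifier they carry in the `source` namespace."""
    return {record[source]: record for record in records if source in record}


def translate(ids, source, target, records=RECORDS):
    """Map each id from `source` to `target`; unmapped ids map to None."""
    index = build_index(records, source)
    return {identifier: index.get(identifier, {}).get(target) for identifier in ids}


print(translate(["675", "99999999"], "entrez_gene", "ensembl_gene"))
```

Reporting unmapped identifiers as None, rather than dropping them, makes it easier to see how much of a list was lost in conversion.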
Outline Explore identified proteins Attribute enrichment Networks  Pathways 30 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Attribute Enrichment (AE). Given: a list, e.g. RRP6, MRD1, RRP7, RRP43, RRP42; and attributes, e.g. function, process, localization, interactions. AE question: are any of the attributes surprisingly enriched in the list? Details: how to assess “surprisingly” (statistics); how to correct for repeating the tests.
What is a P-value? The P-value is the probability, assuming the “null hypothesis” is true, of observing a test statistic at least as extreme as the one obtained. It is calculated by computing statistics from the data and testing the probability of observing those statistics, or ones more extreme, given a sample of the same size distributed according to the null hypothesis. Intuitively: rejecting the null hypothesis whenever the P-value falls below a threshold bounds the probability of a false positive result (aka “Type I error”) by that threshold.
How likely are the observed differences between the two distributions due to chance? [Figure: two distributions of values, plotted as distribution against value]
AE using the T-test. Answer: two-tailed T-test. Black: N1 = 500, mean m1 = 1.1, std s1 = 0.9. Red: N2 = 4500, mean m2 = 4.9, std s2 = 1.0. T-statistic = (m1 - m2) / sqrt(s1^2/N1 + s2^2/N2) = -88.5. Formal question: what is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
AE using the T-test. P-value = shaded area * 2. [Figure: T-distribution, probability density against T-statistic, with the tail beyond -88.5 shaded] T-statistic = -88.5. Formal question: what is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
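The T-statistic of -88.5 can be checked from the summary numbers on the previous slide. The sketch below assumes the unequal-variance (Welch) form of the statistic, which is the form that reproduces the quoted value; the pooled-variance form gives a different number.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """T-statistic for two samples, without assuming equal variances."""
    standard_error = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Black: N1 = 500, mean 1.1, std 0.9. Red: N2 = 4500, mean 4.9, std 1.0.
t_stat = welch_t(1.1, 0.9, 500, 4.9, 1.0, 4500)
print(round(t_stat, 1))  # -88.5
```

A statistic this far from zero lies deep in the tail of the T-distribution, so the two-tailed P-value is vanishingly small.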
T-test limitations. Assumes distributions are both approximately Gaussian (i.e. normal). The score distribution assumption is often true for: log ratios from microarrays. It is rarely true for: peptide counts, sequence tags (SAGE or NextGen sequencing), transcription factor binding site hits. Problem cases (each shown as probability density against score): values that are positive and have increasing density near zero, e.g. sequence counts; bimodal (“two-bumped”) distributions; distributions with outliers, or “heavy-tailed” distributions. Tests for significance of a difference in the means of two distributions but does not test for other differences between distributions.
Kolmogorov-Smirnov (K-S) test. Question: are the red and black distributions significantly different? Calculate the cumulative distributions of red and black. Formal question: is the length of the largest difference between the “empirical distribution functions” statistically significant? [Figure: probability density against score for the two distributions, and their cumulative distributions (cumulative probability against score) with the largest difference marked, length = 0.4]
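The "length" in the K-S test is the largest vertical gap between the two empirical distribution functions. A plain-Python sketch of that statistic follows; the two samples are made up, and assessing significance would additionally need the K-S null distribution, which is not shown here.

```python
def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical distribution functions."""
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in sorted(set(sample_a) | set(sample_b)):
        cdf_a = sum(1 for value in sample_a if value <= x) / len(sample_a)
        cdf_b = sum(1 for value in sample_b if value <= x) / len(sample_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical scores for the two groups.
black = [0, 1, 1, 2, 2, 3]
red = [2, 3, 4, 4, 5, 6]
print(ks_statistic(black, red))
```

Unlike the T-test, this statistic reacts to any difference in shape between the distributions, not only to a shift in the mean.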
What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins? List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 500 X proteins out of 5000 proteins.
Fisher’s exact test. [Figure: null distribution with the P-value tail marked] Answer = 4.6 x 10^-4. List: RRP6, MRD1, RRP7, RRP43, RRP42. The P-value for Fisher’s exact test is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.” It depends on the size of the list, the number with the feature (in the list and in the background), and the size of the background population. Background population: 500 X proteins out of 5000 proteins.
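The quoted answer can be reproduced by summing the hypergeometric tail directly. The sketch assumes the background is 5000 proteins in total, of which 500 have feature X; that reading reproduces the 4.6 x 10^-4 on the slide. The function and parameter names are my own.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_in_background, background_size):
    """P(X >= hits_in_list) when drawing list_size items without replacement."""
    misses_in_background = background_size - hits_in_background
    total_draws = comb(background_size, list_size)
    tail = 0
    for k in range(hits_in_list, min(list_size, hits_in_background) + 1):
        tail += comb(hits_in_background, k) * comb(misses_in_background, list_size - k)
    return tail / total_draws


# 4 or more proteins with feature X in a list of 5; 500 of 5000 background proteins have X.
p_value = enrichment_p_value(4, 5, 500, 5000)
print(f"{p_value:.1e}")  # 4.6e-04
```

This is the one-tailed (over-enrichment) test; swapping the roles of "has X" and "lacks X" gives the under-enrichment version.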
Important details. To test for under-enrichment of “black”, test for over-enrichment of “red”. Need to choose the “background population” appropriately: e.g., if only a portion of the total complement is queried (or has annotation), use only that population as background. To test for enrichment of more than one independent type of annotation (red vs black and circle vs square), apply Fisher’s exact test separately for each type. The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.
How to win the P-value lottery, part 1 Random draws Expect a random draw with observed enrichment once every 1 / P-value draws … 7,834 draws later … Background population: 500 X 5000 Y 41 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
How to win the P-value lottery, part 2. Keep the list the same, evaluate different annotations. [Figure: the observed draw (RRP6, MRD1, RRP7, RRP43, RRP42) shown against different annotations]
Correcting for multiple tests. The Bonferroni correction controls the probability that any one of the tests is positive due to random chance, aka the Family-Wise Error Rate (FWER). If M = # of annotations tested: corrected P-value = M x original P-value. The Benjamini-Hochberg (B-H) correction controls the proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives, aka the False Discovery Rate (FDR). FDR is the expected proportion of the observed enrichments that are due to random chance. Less stringent than the Bonferroni correction.
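Both corrections are short enough to write out. The sketch below applies Bonferroni (M x P-value, capped at 1 here, which the slide does not mention) and the usual step-up form of Benjamini-Hochberg to a made-up list of P-values.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical P-values from testing four annotations.
p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

At a threshold of 0.05, Bonferroni keeps two of these four tests while B-H keeps all four, which illustrates why B-H is described as less stringent.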
Reducing multiple test correction stringency. The correction to the P-value threshold α depends on the number of tests that you do, so, no matter what, the more tests you do, the more sensitive the test needs to be. You can control the stringency by reducing the number of tests: e.g. use GO slim or restrict testing to the appropriate GO annotations.
AE tools Web-based tools  Funspec:   easy tool for yeast, not maintained, uses GO annotations and some annotations (e.g. protein complexes) YeastFeatures Similar to Funspec, different datasets and presentation GoMiner:  Uses GO annotations, covers many organisms, needs a background set of genes Cytoscape-based tools BINGO: Does GO annotations and displays enrichment results graphically and visually organizes related categories 45 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Funspec: Simple ORA for yeast. http://funspec.med.utoronto.ca/ Choose sources of annotation. Bonferroni correct? YES! Paste list here. Caveats:
last updated 2002
http://software.dumontierlab.com/yeastfeatures 47 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
GoMiner, part 1. http://discover.nci.nih.gov/gominer 1. Click “web interface” 2. Upload background 3. Upload list 4. Choose organism 5. Choose evidence code (All or Level 1)
GoMiner, part 2 6. Restrict # of tests via category size 7. Restrict # of tests via GO hierarchy 8. Results emailed to this address, in a few minutes 50 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
DAVID, part 1 http://david.abcc.ncifcrf.gov/ Paste list here DAVID automatically detects organism Choose ID type List type: list or background? 51 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
DAVID, part 2. http://david.abcc.ncifcrf.gov/
BINGO, an ORA Cytoscape plugin. http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm Nodes represent GO categories. Links represent parent-child relationships in the GO ontology. Colours represent significance of enrichment.
Outline: Explore identified proteins. Attribute enrichment. Networks
Genetic networks
Functional networks
Pathways
Why Network and Pathway Analysis? Intuitive to Biologists
More efficient than searching databases gene-by-gene
Intuitive display for sharing data
Computation on Pathway Content
Find active pathways
Identify potential regulators
Network: in biology, a network is a graph composed of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities. [Figure: example graph labelling a vertex (node), an edge, a directed edge (arc), a cycle, and weighted edges with weights -5, 10 and 7]
Integration in a Network Context 58 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Integration in a Network Context Expression data mapped to node colours 59 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Mapping Biology to a Network A simple mapping: Protein-protein interactions one protein/node, one interaction/edge Edges can represent other relationships Physical e.g. protein-protein interaction Regulatory e.g. kinase activates target Genetic e.g. epistasis Similarity e.g. protein sequence similarity Critical: understand the mapping for network analysis 60 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
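One way to keep the mapping explicit is to store the relationship type on every edge. The sketch below builds such a network from a list of typed interactions. The protein names and edges are hypothetical, and all edges are treated as undirected, which is a simplification for regulatory relationships.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map: node -> {neighbour: set of edge types}."""
    network = defaultdict(lambda: defaultdict(set))
    for source, target, edge_type in interactions:
        network[source][target].add(edge_type)
        network[target][source].add(edge_type)
    return network


# Hypothetical edges, one protein per node and one relationship per edge.
interactions = [
    ("P1", "P2", "physical"),
    ("P1", "P3", "genetic"),
    ("P2", "P3", "physical"),
    ("P2", "P3", "similarity"),
]
network = build_network(interactions)
degree = {node: len(neighbours) for node, neighbours in network.items()}
print(degree)
print(sorted(network["P2"]["P3"]))
```

Keeping the edge type lets later analysis restrict itself to, say, physical edges only, rather than mixing relationships with different meanings.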
Protein Sequence Similarity Network http://apropos.icmb.utexas.edu/lgl/ 61 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Literature Network Computationally extract gene relationships from text, usually PubMed abstracts Useful if network is not in a database Literature search tool BUT not perfect Problems recognizing gene names Natural language processing is difficult Agilent Literature Search Cytoscape plugin iHOP (www.ihop-net.org/UniPub/iHOP/) 62 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Agilent Literature Search 63 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Cytoscape Network produced by Literature Search. Abstract from the scientific literature Sentences for an edge 64 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Enrichment Map. [Figure: overlap between gene-sets A and B]
Nodes represent gene-sets
[Figure: enrichment map; cluster labels: Muscle Contraction Olfactory Receptor Ubiquitin Processes Ubiquitin-dependent Proteolysis Ectodermal Dev. & Keratinocyte Diff. DNA Repair Mitotic Cell Cycle Ubiquitin Ligase DNA Processes Cytoskeleton DNA Replication Intermediate Filament Cytoskeleton Microtubule Cytoskeleton Ras GTPase mRNA Transport Chromosome RNA Processes Serine Endopeptidase Chromatin Remodeling RNA Splicing Fatty Acid Metabolism Ion Channel Transcription Calcium rRNA Processing Mitochondrial Oxidative Metabolism Ribonucleotide Metabolism Potassium Sodium Translation]
Physical Networks. [Diagram: A interacting with B] Between two molecular objects: DNA, RNA, gene, protein, complex, small molecule, photon. Requires a site of interaction / binding. Biologically relevant: present/expressed at the same time; share a cellular location; leads to some biologically relevant outcome.
Molecular Interactions RAS interacting with RALGDS (PDB: 1LFD) Synthetic protein interacting with ATP and Zinc (PDB: 2P0X) 69 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
70 Experimental Interaction Discovery MassSpectrometry Genetics Two-Hybrid Direct, Physical Indirect, Physical Indirect, Genetic Microarray X-Ray NMR BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
71 Experimental Considerations How do you know if the interaction really exists?  Each method has its advantages and disadvantages.   Be aware of systematic errors Be aware of contaminants. Each method observes interactions from a slightly different experimental condition. Support from many different sources is certainly better (necessary) than just one. BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Some affinity purification caveats. First and most importantly, this is only a representation of the observation. You can only tell what proteins are in the eluate; you can’t tell how they are connected to one another. If there is only one other protein present (B), then it’s likely that A and B are directly interacting. But what if I told you that two other proteins (B and C) were present along with A... [Diagram: A with B; A with B and C]
Complexes with unknown topology. [Diagram: three alternative arrangements of A, B and C] Which of these models is correct? The complex described by this experimental result is said to have an Unknown Topology.
Complexes with unknown stoichiometry. [Diagram: two copies of A with three copies of B] Here’s another possibility. The complex described by this experimental result is also said to have Unknown Stoichiometry.
75 Interaction Models Actual Topology Spoke Matrix Simple model,  useful for data navigation More accurate Theoretical max. number of interactions BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
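The two models differ only in which pairs are turned into edges. In the spoke model the bait is linked to each co-purified protein; in the matrix model every pair in the purification is linked. A short sketch, with hypothetical bait and prey names:

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: one edge from the bait to each co-purified protein."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_edges(bait, preys):
    """Matrix model: an edge between every pair of proteins in the purification."""
    members = {bait, *preys}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}


# Bait A pulled down with B and C, as in the affinity purification example.
spoke = spoke_edges("A", ["B", "C"])
matrix = matrix_edges("A", ["B", "C"])
print(len(spoke), "spoke edges;", len(matrix), "matrix edges")
```

For a purification with n proteins the spoke model yields n - 1 edges and the matrix model n(n - 1)/2, so the gap between the two grows quickly with complex size.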
76 High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI) Mike Tyers, SLRI Ste12 Ho et al. Nature. 2002 Jan 10;415(6868):180-3 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
k-core analysis. A k-core is a part of a graph in which every node is connected to at least k other nodes of that part (k = 0, 1, 2, 3...). The highest k-core is the central, most densely connected region of a graph. Regions of dense connectivity may represent molecular complexes. Therefore, high k-cores may be molecular complexes.
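A k-core can be found by repeatedly removing nodes that have fewer than k remaining neighbours. The sketch below does this on a small made-up graph; it is a plain illustration of the definition, not the MCODE algorithm.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core: prune nodes with < k neighbours until stable."""
    remaining = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        too_sparse = [node for node, neighbours in remaining.items() if len(neighbours) < k]
        for node in too_sparse:
            # Remove the node and erase it from its surviving neighbours.
            for neighbour in remaining.pop(node):
                if neighbour in remaining:
                    remaining[neighbour].discard(node)
            changed = True
    return set(remaining)


# Hypothetical network: a triangle A-B-C with one extra protein D hanging off A.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(sorted(k_core(adjacency, 2)))
```

Here D drops out of the 2-core because it has a single partner, leaving the triangle as the densely connected region; no node survives in the 3-core.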
79 Pre MS Ho 6-core 6-core Interaction can define function  Gavin Union 6-core 9-core  MCODE plugin for Cytoscape BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
80 http://pathguide.org BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Interaction Databases Experiment (E) Structure detail (S) Predicted Physical (P) Functional (F) Curated (C) Homology modeling (H) *IMEx consortium 81 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Network Classification of Disease Traditional: Gene association Limitations: Too many genes reduces statistical power New: Active cell map based approaches combining network and molecular profiles Chuang HY, Lee E, Liu YT, Lee D, Ideker T Network-based classification of breast cancer metastasis Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16 Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S Network-based analysis of affected biological processes in type 2 diabetes models PLoS Genet. 2007 Jun;3(6):e96 Efroni S, Schaefer CF, Buetow KH Identification of key processes underlying cancer phenotypes using biologic pathway analysis PLoS ONE. 2007 May 9;2(5):e425 82 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Network-Based Breast Cancer Classification 57k intx from Y2H, orthology, co-citation, HPRD, BIND, Reactome 2 breast cancer cohorts, different expression platforms Chuang HY, Lee E, Liu YT, Lee D, Ideker T Network-based classification of breast cancer metastasis Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16 83 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Similar network markers across 2 data sets (better than original overlap) Increased classification accuracy Better coverage of known cancer risk genes (*) 84 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
PIPE Predicts yeast PPI from sequence Uses interaction databases to find similar interacting proteins Estimates the site of interaction 75% accuracy (61% sensitivity, 89% specificity) Finds new interactions among complexes 85 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
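Accuracy, sensitivity and specificity are linked through the make-up of the test set. The slide does not state the class balance, so the sketch below assumes equal numbers of interacting and non-interacting pairs, an assumption that happens to reproduce the quoted 75%.

```python
def accuracy(sensitivity, specificity, positives, negatives):
    """Overall accuracy given per-class rates and the number of examples per class."""
    true_positives = sensitivity * positives
    true_negatives = specificity * negatives
    return (true_positives + true_negatives) / (positives + negatives)


# Assumed balanced test set: 100 interacting and 100 non-interacting pairs.
balanced = accuracy(0.61, 0.89, positives=100, negatives=100)
print(round(balanced, 2))  # 0.75
```

With a skewed test set the same sensitivity and specificity give a different accuracy, which is why the three numbers are best reported together.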
PIPE2. First all-to-all sequence-based computational screen of PPIs in yeast. 29,589 high confidence interactions out of ~2 x 10^7 possible pairs. 16,000x faster than PIPE. 99.95% specificity.
89 Synthetic Genetic Interactions Synthetic genetic interactions (lethal, slow growth) Mate two mutants without phenotypes to get a daughter cell with a phenotype Synthetic lethal (SL), slow growth robotic mating using the yeast deletion library Genetic interactions provide functional data on protein interactions or redundant genes About 23% of known SLs (1295 - YPD+MIPS) were known protein interactions in yeast Tong et al. Science. 2001 Dec 14;294(5550):2364-8 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
90 Cell Polarity Cell Wall Maintenance  Cell Structure Mitosis Chromosome Structure DNA Synthesis  DNA Repair Unknown Others Synthetic Genetic Interactions in Yeast Tong, Boone BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Validation: Protein Localization. A – A3: Y2H; B: physical methods; C: genetic; E: immunological. True positives:
Have common cellular role
Sprinzak, Sattath, Margalit, J Mol Biol, 2003
Comparisons All methods except for Y2H and synthetic lethality technique are biased toward abundant proteins.  PPI bias toward certain cellular localizations.  Evolutionarily conserved proteins have much better coverage in Y2H than the proteins restricted to a certain organism.  C. Von Mering et al, Nature, 2002: 92 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Functional Associations Molecular Interactions Regulatory Interactions Genetic Interactions Similarity relationships Co-expression Protein sequence Domain architecture Phylogenetic profiles Gene neighborhood Gene fusion … 93 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
http://string.embl.de/ von Mering et al., Nucleic Acids Res., 2005 94 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Gene Function Prediction using a Multiple Association Network Integration Algorithm. Query-specific weights for multifaceted function queries. [Figure: several association networks (labels: Co-complexed, Durrett 2006; Genetic, Tong et al. 2001; Co-expression) are each multiplied by a weight (w1, w2, w3) and summed into one network; example queries: cell cycle (CDC27, CDC23, APC11, UNK1) and DNA repair (RAD54, XRS2, MRE11, UNK2)] Pavlidis et al, 2002; Lanckriet et al, 2004; Mostafavi et al, 2008
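The weighted-sum step in the figure can be sketched as follows. This shows only how several networks are combined once weights are chosen; the weights and edge scores are made up, and the way the algorithm learns query-specific weights and then predicts function is not shown.

```python
def combine_networks(networks, weights):
    """Sum edge scores across networks, scaling each network by its weight."""
    combined = {}
    for name, edges in networks.items():
        weight = weights[name]
        for pair, score in edges.items():
            key = frozenset(pair)  # undirected edge, so (a, b) equals (b, a)
            combined[key] = combined.get(key, 0.0) + weight * score
    return combined


# Hypothetical edge scores and weights, using gene names from the figure.
networks = {
    "co_complexed": {("CDC27", "CDC23"): 1.0},
    "co_expression": {("CDC23", "CDC27"): 0.5, ("CDC27", "UNK1"): 0.8},
}
weights = {"co_complexed": 0.6, "co_expression": 0.4}
combined = combine_networks(networks, weights)
for pair, score in combined.items():
    print(sorted(pair), round(score, 2))
```

A gene of unknown function such as UNK1 then inherits a score from its links to the query genes in the combined network.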
GeneMANIA Cytoscape Plugin 98 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Outline Explore identified proteins Attribute enrichment Networks  Pathways Lab 99 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
pathway 	In biology, a pathway is a network which consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity. 100 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Using Pathway Information Expert knowledge Experimental Data Find active processes underlying a phenotype Databases Literature Pathway Information Pathway Analysis 101 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
>290 Pathway Databases! http://pathguide.org ,[object Object]
Pathway data extremely difficult to combine and useVuk Pavlovic Sylva Donaldson 102 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Aim: Convenient Access to Pathway Information http://www.pathwaycommons.org Facilitate creation and communication of pathway data Aggregate pathway data in the public domain Provide easy access for pathway analysis 103 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
Access From Cytoscape 104 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
cardiomyopathy: downregulated genes Fatty Acid Degradation? Other pathways / processes? GenMAPP.org 105 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]

Mais conteúdo relacionado

Mais procurados

Next Generation Sequencing of DNA
Next Generation Sequencing of DNANext Generation Sequencing of DNA
Next Generation Sequencing of DNAmaryamshah13
 
Single nucleotide polymorphisms (sn ps), haplotypes,
Single nucleotide polymorphisms (sn ps), haplotypes,Single nucleotide polymorphisms (sn ps), haplotypes,
Single nucleotide polymorphisms (sn ps), haplotypes,Karan Veer Singh
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomicssonam786
 
Comparative genomics 2
Network Biology: from lists to underpinnings of molecular behaviour

  • 1. Network Biology: from lists to underpinnings of molecular behaviour. Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Carleton University. BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 3. Provenance: This talk was prepared in part with input from the “Interpreting Gene Lists” workshop put forward by the Canadian Bioinformatics Workshops (bioinformatics.ca) http://bioinformatics.ca/workshops/2009/course-content
  • 4. So you did some mass spectrometry? Protein Identification
  • 5. database search vs de novo W R V A L T Database ofknown peptidesMDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN.. G E P L K C W D T W R V A L T G E P L K C W D T Database Search de novo AVGELTK 5 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 6. 6 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 7. My experiment worked and I have dozens, hundreds, or thousands of hits…. now what? Protein Identification ? 7 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 8. Use the list to explore Biology Determine significant shared attributes Explore putative mechanisms of actions Test hypotheses Protein Identification Network Biology Eureka! Hypothesis on the molecular basis of disease/process 8 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 9. Detoxification Oxidative Metabolism # in list having attribute Enriched in smokers = UP-regulated in smokers # in list sharing these attributes 9 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 10. Outline Explore identified proteins Attribute enrichment Networks Pathways Lab 10 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 11. A hypothesis underlies the list of identified proteins. An initial question was posed, an experiment performed, and a list of candidates obtained. The question now is: what are the roles of these entities in the biological process being investigated? Normal vs pathological; response to stimulus; interactions and complexes.
  • 12. Biological Answers. Computational systems biology: information retrieval and summary; interaction network analysis; pathway analysis; function prediction.
  • 13. Molecular Attributes. An attribute provides information about the entity in question (e.g. shape, function, process). Sequence and structure provide information about motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations, and conservation/evolution, as well as functions, localization, and biological/pathological processes.
  • 14. Gene Ontology. Captures terminology related to three aspects: biological processes, molecular functions, and cellular components. Relationships between terms are largely defined with “is a” and “part of” relations. Examples: cell division; isomerase activity.
  • 15. GO Structure. [Figure: the terms cell, membrane, chloroplast, mitochondrial membrane, and chloroplast membrane linked by is-a and part-of relations] Species independent. Some lower-level terms are specific to a group of organisms, but higher-level terms are not.
  • 19. Good for making pie charts
  • 20. Annotation. Manual annotation: created by scientific curators; high quality; small in number (time-consuming to create). Electronic annotation: derived without human validation; computational predictions (accuracy varies); lower “quality” than manual codes. Key point: be aware of annotation origin.
  • 22. IDA: Inferred from Direct Assay
  • 23. IPI: Inferred from Physical Interaction
  • 24. IMP: Inferred from Mutant Phenotype
  • 25. IGI: Inferred from Genetic Interaction
  • 26. IEP: Inferred from Expression Pattern
  • 29. IC: Inferred by Curator
  • 30. ND: No Data available
  • 31. IEA: Inferred from Electronic Annotation
  • 32. Variable coverage. Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304.
  • 33. GO Software Tools. GO resources are freely available to anyone without restriction. This includes the ontologies, gene associations, and tools developed by GO. Other groups have used GO to create tools for many purposes. http://www.geneontology.org/GO.tools
  • 34. Accessing GO: QuickGO. http://www.ebi.ac.uk/ego/
  • 35. Explore ontologies. http://www.ebi.ac.uk/ontology-lookup
  • 36. Databases of Molecular Annotation. NCBI: GenBank / RefSeq; Entrez Gene. EBI: UniProt; Ensembl BioMart (eukaryotes). Model organism databases: Berkeley Drosophila Genome Project (BDGP); dictyBase (Dictyostelium discoideum); FlyBase (Drosophila melanogaster); GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei); UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases; Gramene (grains, including rice, Oryza); Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus); Rat Genome Database (RGD) (Rattus norvegicus); Reactome; Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae); The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana); The Institute for Genomic Research (TIGR): databases on several bacterial species; WormBase (Caenorhabditis elegans); Zebrafish Information Network (ZFIN) (Danio rerio).
  • 38. Identifiers. Identifiers (IDs) are ideally unique, stable names or numbers that help track database records, e.g. Social Insurance Number, Entrez Gene ID 41232. Gene and protein information is stored in many databases, so genes have many IDs. Records exist for genes, DNA, RNA, and proteins, and it is important to recognize the correct record type. E.g. Entrez Gene records don’t store sequence; they link to DNA regions, RNA transcripts, and proteins.
  • 39. NCBI Database Links. NCBI: U.S. National Center for Biotechnology Information, part of the National Library of Medicine (NLM). http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf
  • 40. Common Identifiers. Species-specific: HUGO HGNC BRCA2; MGI MGI:109337; RGD 2219; ZFIN ZDB-GENE-060510-3; FlyBase CG9097; WormBase WBGene00002299 or ZK1067.1; SGD S000002187 or YDL029W. Annotations: InterPro IPR015252; OMIM 600185; Pfam PF09104; Gene Ontology GO:0000724; SNPs rs28897757. Experimental platform: Affymetrix 208368_3p_s_at; Agilent A_23_P99452; CodeLink GE60169; Illumina GI_4502450-S. Gene: Ensembl ENSG00000139618; Entrez Gene 675; Unigene Hs.34012. RNA transcript: GenBank BC026160.1; RefSeq NM_000059; Ensembl ENST00000380152. Protein: Ensembl ENSP00000369497; RefSeq NP_000050.2; UniProt BRCA2_HUMAN or A1YBP1_HUMAN; IPI IPI00412408.1; EMBL AF309413; PDB 1MIU. Red = Recommended.
  • 41. Identifier Mapping. So many IDs! Mapping (conversion) is a headache. Four main uses: disambiguating similarly named entities; referencing related information; biological and informational provenance (e.g. genes to proteins, Entrez Gene to Affy); unification during dataset merging (equivalent entities).
  • 42. ID Mapping Services. Synergizer: http://llama.med.harvard.edu/synergizer/translate/ Ensembl BioMart: http://www.ensembl.org UniProt: http://www.uniprot.org/
  • 43. Outline: Explore identified proteins; Attribute enrichment; Networks; Pathways
  • 44. Attribute Enrichment (AE). Given: a list (e.g. RRP6, MRD1, RRP7, RRP43, RRP42) and attributes (e.g. function, process, localization, interactions). AE question: are any of the attributes surprisingly enriched in the list? Details: how to assess “surprisingly” (statistics); how to correct for repeating the tests.
  • 45. What is a P-value? The P-value is the probability, calculated under the “null hypothesis”, of observing the statistics computed from the data, or ones more extreme, given a sample of the same size distributed according to the null hypothesis. It is not the probability that the null hypothesis is true. Intuitively: if you call a result significant whenever its P-value falls below a chosen threshold, that threshold bounds the probability of a false positive result (aka “Type I error”).
  • 46. How likely are the observed differences between the two distributions due to chance? [Figure: histograms of two value distributions]
  • 47. AE using the T-test. Answer: two-tailed T-test. Black: N1 = 500, mean m1 = 1.1, std s1 = 0.9. Red: N2 = 4500, mean m2 = 4.9, std s2 = 1.0. Formal question: what is the probability of observing the T-statistic, or one more extreme, if the means of the two distributions were the same? T-statistic = (m1 - m2) / sqrt(s1^2/N1 + s2^2/N2) = -88.5
  • 48. AE using the T-test. P-value = shaded area × 2. [Figure: T-distribution, probability density vs T-statistic, with the observed value -88.5 in the left tail] Formal question: what is the probability of observing the T-statistic, or one more extreme, if the means of the two distributions were the same? T-statistic = -88.5
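The T-statistic quoted on these two slides can be checked from the summary statistics alone. The sketch below assumes the unequal-variance (Welch) form of the statistic, which is the form that reproduces the slide's value of -88.5 for N1 = 500, mean 1.1, std 0.9 and N2 = 4500, mean 4.9, std 1.0; the function name is illustrative.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Unequal-variance (Welch) t-statistic computed from summary statistics."""
    standard_error = math.sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Black distribution vs red distribution, values taken from the slide.
t_stat = welch_t(1.1, 0.9, 500, 4.9, 1.0, 4500)
print(round(t_stat, 1))
```

A statistic this far from zero corresponds to a two-tailed P-value that is effectively zero, which is why the shaded area on the slide is invisible at the scale of the plot.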
  • 49. T-test limitations. Assumes both distributions are approximately Gaussian (i.e. normal). This score distribution assumption is often true for log ratios from microarrays. It is rarely true for peptide counts, sequence tags (SAGE or NextGen sequencing), or transcription factor binding site hits. Problem cases [Figure: three plots of probability density vs score]: values that are positive and have increasing density near zero (e.g. sequence counts); bimodal, “two-bumped” distributions; distributions with outliers, or “heavy-tailed” distributions. The test assesses the significance of a difference in the means of two distributions, but does not test for other differences between distributions.
  • 50. Kolmogorov-Smirnov (K-S) test. Question: are the red and black distributions significantly different? Calculate the cumulative distributions of red and black. Formal question: is the length of the largest difference between the “empirical distribution functions” statistically significant? [Figure: probability density vs score, and cumulative probability vs score with the largest difference marked, length = 0.4]
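The K-S statistic itself, the "length of the largest difference" between the two empirical distribution functions, is simple to compute. The following is a minimal sketch with made-up sample values; it returns only the statistic, not its significance, which would need the K-S null distribution (for example from a statistics library).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest = 0.0
    # Both step functions only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Hypothetical scores for two groups (illustrative values only).
black = [1, 2, 3, 4, 5]
red = [3, 4, 5, 6, 7]
ks = ks_statistic(black, red)
print(ks)
```

Unlike the T-test, this comparison makes no assumption about the shape of either distribution, which is why it suits counts and other non-Gaussian scores.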
  • 51. What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins? List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 500 X proteins, 5000 proteins.
  • 52. Fisher’s exact test. Answer: P-value = 4.6 × 10^-4. [Figure: null distribution with the P-value as the tail area] The P-value for Fisher’s exact test is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.” It depends on the size of the list, the number with the feature (in the list and in the background), and the background population. List: RRP6, MRD1, RRP7, RRP43, RRP42. Background population: 500 X proteins, 5000 proteins.
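The quoted answer can be reproduced with a hypergeometric tail sum, which is the one-tailed form of Fisher's exact test. This sketch assumes the background of 5000 proteins is the total and includes the 500 proteins with feature X; under that reading the result rounds to the slide's 4.6 × 10^-4. The function and parameter names are illustrative.

```python
from math import comb


def hypergeom_tail(observed, list_size, with_feature, population):
    """P(at least `observed` feature-bearing items in a random draw of `list_size`)."""
    total_draws = comb(population, list_size)
    most_possible = min(list_size, with_feature)
    favourable = sum(
        comb(with_feature, hits) * comb(population - with_feature, list_size - hits)
        for hits in range(observed, most_possible + 1)
    )
    return favourable / total_draws


# 4 or more of 5 listed proteins have feature X; 500 of 5000 background proteins do.
p_value = hypergeom_tail(4, 5, 500, 5000)
print(f"{p_value:.1e}")
```

Changing the background (for example to only the annotated proteins) changes `population` and `with_feature`, and with them the P-value, which is why the choice of background matters.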
  • 53. Important details. To test for under-enrichment of “black”, test for over-enrichment of “red”. Choose the “background population” appropriately: e.g., if only a portion of the total complement is queried (or has annotation), use only that population as background. To test for enrichment of more than one independent type of annotation (red vs black and circle vs square), apply Fisher’s exact test separately for each type. The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.
  • 54. How to win the P-value lottery, part 1: random draws. Expect a random draw with the observed enrichment once every 1 / P-value draws. … 7,834 draws later … Background population: 500 X 5000 Y
  • 55. How to win the P-value lottery, part 2: keep the list the same, evaluate different annotations. [Figure: the observed draw (RRP6, MRD1, RRP7, RRP43, RRP42) shown under different annotations]
  • 56. Correcting for multiple tests. The Bonferroni correction controls the probability that at least one of the tests gives a false positive, aka the Family-Wise Error Rate (FWER). If M = # of annotations tested: corrected P-value = M × original P-value (capped at 1). The Benjamini-Hochberg (B-H) procedure controls the expected proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives, aka the False Discovery Rate (FDR). FDR is the expected proportion of the observed enrichments that are due to random chance. B-H is less stringent than Bonferroni.
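Both corrections are short enough to write out. The sketch below uses the standard textbook forms of the Bonferroni and Benjamini-Hochberg adjustments; the P-values are invented for illustration and the function names are not from any particular tool.

```python
def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    tests = len(p_values)
    return [min(1.0, p * tests) for p in p_values]


def benjamini_hochberg(p_values):
    """B-H adjusted P-values: p * M / rank, made monotone from the largest rank down."""
    tests = len(p_values)
    order = sorted(range(tests), key=lambda i: p_values[i])
    adjusted = [0.0] * tests
    running_min = 1.0
    for rank in range(tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * tests / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical P-values for five annotations tested against one list.
raw = [0.001, 0.01, 0.02, 0.04, 0.5]
fwer_adjusted = bonferroni(raw)
fdr_adjusted = benjamini_hochberg(raw)
print(fwer_adjusted)
print(fdr_adjusted)
```

Every B-H adjusted value is at most the Bonferroni value for the same test, which is the sense in which B-H is less stringent.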
  • 57. Reducing multiple test correction stringency. The correction to the P-value threshold α depends on the number of tests that you do, so, no matter what, the more tests you do, the stricter the threshold each individual test has to pass. You can control the stringency by reducing the number of tests: e.g. use GO slim or restrict testing to the appropriate GO annotations.
  • 58. AE tools. Web-based tools: Funspec: easy tool for yeast, not maintained, uses GO annotations and some other annotations (e.g. protein complexes). YeastFeatures: similar to Funspec, with different datasets and presentation. GoMiner: uses GO annotations, covers many organisms, needs a background set of genes. Cytoscape-based tools: BiNGO: tests GO annotations, displays enrichment results graphically, and visually organizes related categories.
  • 60. Last updated 2002
  • 61. http://software.dumontierlab.com/yeastfeatures
  • 63. GoMiner, part 1. http://discover.nci.nih.gov/gominer 1. Click “web interface”. 2. Upload background. 3. Upload list. 4. Choose organism. 5. Choose evidence code (All or Level 1).
  • 64. GoMiner, part 2. 6. Restrict # of tests via category size. 7. Restrict # of tests via GO hierarchy. 8. Results are emailed to the address entered, within a few minutes.
  • 65. DAVID, part 1. http://david.abcc.ncifcrf.gov/ Paste the list; DAVID automatically detects the organism. Choose ID type. List type: list or background?
  • 66. DAVID, part 2. http://david.abcc.ncifcrf.gov/
  • 67. BiNGO, an ORA Cytoscape plugin. http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm Nodes represent GO categories. Links represent parent-child relationships in the GO ontology. Colours represent significance of enrichment.
  • 71. Functional networks; Pathways
  • 73. More efficient than searching databases gene-by-gene
  • 76. Identify potential regulators
  • 77. Network. In biology, a network is a graph composed of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities. [Figure legend: vertex (node); edge; directed edge (arc); weighted edge (example weights -5, 10, 7); cycle]
  • 78. Integration in a Network Context
  • 79. Integration in a Network Context: expression data mapped to node colours
  • 80. Mapping Biology to a Network. A simple mapping: protein-protein interactions, with one protein per node and one interaction per edge. Edges can represent other relationships: physical (e.g. protein-protein interaction); regulatory (e.g. kinase activates target); genetic (e.g. epistasis); similarity (e.g. protein sequence similarity). Critical: understand the mapping for network analysis.
  • 81. Protein Sequence Similarity Network. http://apropos.icmb.utexas.edu/lgl/
  • 82. Literature Network. Computationally extract gene relationships from text, usually PubMed abstracts. Useful if the network is not in a database. A literature search tool, BUT not perfect: problems recognizing gene names; natural language processing is difficult. Tools: Agilent Literature Search Cytoscape plugin; iHOP (www.ihop-net.org/UniPub/iHOP/).
  • 83. Agilent Literature Search
  • 84. Cytoscape network produced by Literature Search. [Figure labels: abstract from the scientific literature; sentences for an edge]
  • 85. Enrichment Map. [Figure: overlap between A and B]
  • 86. Nodes represent gene-sets
  • 87. [Figure: enrichment map with labelled clusters] Muscle Contraction; Olfactory Receptor; Ubiquitin Processes; Ubiquitin-dependent Proteolysis; Ectodermal Dev. & Keratinocyte Diff.; DNA Repair; Mitotic Cell Cycle; Ubiquitin Ligase; DNA Processes; Cytoskeleton; DNA Replication; Intermediate Filament Cytoskeleton; Microtubule Cytoskeleton; Ras GTPase; mRNA Transport; Chromosome; RNA Processes; Serine Endopeptidase; Chromatin Remodeling; RNA Splicing; Fatty Acid Metabolism; Ion Channel; Transcription; Calcium; rRNA Processing; Mitochondrial Oxidative Metabolism; Ribonucleotide Metabolism; Potassium; Sodium; Translation
  • 88. Physical Networks. [Figure: A interacting with B] Between two molecular objects: DNA, RNA, gene, protein, complex, small molecule, photon. Requires a site of interaction/binding. Biologically relevant: present/expressed at the same time; share a cellular location; leads to some biologically relevant outcome.
  • 89. Molecular Interactions. RAS interacting with RALGDS (PDB: 1LFD). Synthetic protein interacting with ATP and zinc (PDB: 2P0X).
  • 90. Experimental Interaction Discovery. [Figure labels: mass spectrometry; genetics; two-hybrid; microarray; X-ray; NMR. Categories: direct, physical; indirect, physical; indirect, genetic]
  • 91. Experimental Considerations. How do you know if the interaction really exists? Each method has its advantages and disadvantages. Be aware of systematic errors. Be aware of contaminants. Each method observes interactions under slightly different experimental conditions. Support from many different sources is certainly better than (and arguably necessary, rather than) support from just one.
  • 92. Some affinity purification caveats. First and most importantly, this is only a representation of the observation. You can only tell what proteins are in the eluate; you can’t tell how they are connected to one another. If there is only one other protein present (B), then it’s likely that A and B are directly interacting. But what if I told you that two other proteins (B and C) were present along with A? [Figure: A with B; A with B and C]
  • 93. Complexes with unknown topology. [Figure: alternative arrangements of A, B and C] Which of these models is correct? The complex described by this experimental result is said to have an unknown topology.
  • 94. Complexes with unknown stoichiometry. [Figure: a complex containing several copies of A and B] Here’s another possibility. The complex described by this experimental result is also said to have unknown stoichiometry.
  • 95. Interaction Models. Actual topology; spoke; matrix. Simple model, useful for data navigation. More accurate. Theoretical max. number of interactions (matrix).
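The spoke and matrix models are two ways of turning one purification result (a bait plus the proteins found with it) into binary edges. The sketch below uses the usual definitions, spoke as bait-to-each-prey and matrix as all pairs; the protein names are placeholders.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: the bait is linked to every co-purified protein, nothing else."""
    return [(bait, prey) for prey in preys]


def matrix_edges(bait, preys):
    """Matrix model: every pair of proteins in the purification is linked."""
    return list(combinations([bait] + list(preys), 2))


# One hypothetical purification: bait A pulled down B, C and D.
spoke = spoke_edges("A", ["B", "C", "D"])
matrix = matrix_edges("A", ["B", "C", "D"])
print(len(spoke), len(matrix))
```

For a complex of n proteins the spoke model yields n - 1 edges and the matrix model n(n - 1)/2, so the matrix count is the theoretical maximum and grows much faster with complex size.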
  • 96. High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI). Mike Tyers, SLRI. [Figure: Ste12] Ho et al. Nature. 2002 Jan 10;415(6868):180-3
  • 98. k-core analysis. A k-core is a part of a graph where every node is connected to other nodes in that part by at least k edges (k = 0, 1, 2, 3...). The highest k-core is the central, most densely connected region of a graph. Regions of dense connectivity may represent molecular complexes. Therefore, high k-cores may be molecular complexes.
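A k-core can be found by repeatedly removing nodes that have fewer than k remaining neighbours. The following is a minimal sketch of that peeling procedure on a toy undirected graph; the adjacency data are invented, and real analyses would normally use a graph library or a Cytoscape plugin instead.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core of an undirected graph given as {node: neighbours}."""
    remaining = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Drop the node and remove it from its neighbours' lists.
                for neighbour in remaining.pop(node):
                    if neighbour in remaining:
                        remaining[neighbour].discard(node)
                changed = True
    return set(remaining)


# Toy network: a triangle A-B-C with a tail A-D-E hanging off it.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
print(sorted(k_core(toy, 2)))
```

In the toy network the tail is peeled away one node at a time, leaving the triangle as the 2-core, the kind of densely connected region that may correspond to a complex.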
  • 99. Interaction can define function. [Figure: k-cores for the Pre MS, Ho, Gavin and Union datasets; labels: 6-core, 6-core, 6-core, 9-core] MCODE plugin for Cytoscape.
  • 100. http://pathguide.org
  • 101. Interaction Databases. Legend: Experiment (E); Structure detail (S); Predicted Physical (P); Functional (F); Curated (C); Homology modeling (H). *IMEx consortium
  • 102. Network Classification of Disease. Traditional: gene association. Limitation: too many genes reduces statistical power. New: active cell map based approaches combining network and molecular profiles. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16. Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 2007 Jun;3(6):e96. Efroni S, Schaefer CF, Buetow KH. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE. 2007 May 9;2(5):e425.
  • 103. Network-Based Breast Cancer Classification. 57k interactions from Y2H, orthology, co-citation, HPRD, BIND, Reactome. 2 breast cancer cohorts, different expression platforms. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16.
  • 104. Similar network markers across 2 data sets (better than original overlap). Increased classification accuracy. Better coverage of known cancer risk genes (*).
  • 105. PIPE. Predicts yeast PPI from sequence. Uses interaction databases to find similar interacting proteins. Estimates the site of interaction. 75% accuracy (61% sensitivity, 89% specificity). Finds new interactions among complexes.
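The three PIPE figures are linked by a simple identity: accuracy is the average of sensitivity and specificity, weighted by the share of true interactions in the test set. The sketch below assumes a balanced test set (half interacting, half non-interacting pairs), which the slide does not state but which is the split that reproduces 75% from 61% and 89%.

```python
def accuracy(sensitivity, specificity, positive_fraction=0.5):
    """Overall accuracy as a weighted mix of sensitivity and specificity."""
    return sensitivity * positive_fraction + specificity * (1 - positive_fraction)


# Assumed balanced test set: half true interactions, half non-interactions.
balanced = accuracy(0.61, 0.89)
# The same classifier on a set where only 1% of pairs truly interact.
skewed = accuracy(0.61, 0.89, positive_fraction=0.01)
print(round(balanced, 2), round(skewed, 4))
```

The second call shows why specificity dominates in genome-wide screens, where true interactions are a tiny fraction of all possible pairs.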
  • 108. PIPE2. First all-to-all sequence-based computational screen of PPIs in yeast. 29,589 high-confidence interactions out of ~2 × 10^7 possible pairs. 16,000x faster than PIPE. 99.95% specificity.
  • 109. Synthetic Genetic Interactions. Synthetic genetic interactions (lethal, slow growth): mate two mutants without phenotypes to get a daughter cell with a phenotype. Synthetic lethal (SL) and slow-growth screens use robotic mating with the yeast deletion library. Genetic interactions provide functional data on protein interactions or redundant genes. About 23% of known SLs (1295 - YPD+MIPS) were known protein interactions in yeast. Tong et al. Science. 2001 Dec 14;294(5550):2364-8
  • 110. Synthetic Genetic Interactions in Yeast (Tong, Boone). [Figure legend: Cell Polarity; Cell Wall Maintenance; Cell Structure; Mitosis; Chromosome Structure; DNA Synthesis; DNA Repair; Unknown; Others]
  • 112. Have common cellular role. Sprinzak, Sattath, Margalit, J Mol Biol, 2003
  • 113. Comparisons (C. von Mering et al., Nature, 2002). All methods except Y2H and the synthetic lethality technique are biased toward abundant proteins. PPI methods are biased toward certain cellular localizations. Evolutionarily conserved proteins have much better coverage in Y2H than proteins restricted to a certain organism.
  • 114. Functional Associations. Molecular interactions; regulatory interactions; genetic interactions; similarity relationships: co-expression, protein sequence, domain architecture, phylogenetic profiles, gene neighborhood, gene fusion, …
  • 115. http://string.embl.de/ von Mering et al., Nucleic Acids Res., 2005
  • 118. Gene Function Prediction using a Multiple Association Network Integration Algorithm. Query-specific weights for multifaceted function queries. [Figure: a weighted sum (weights w1, w2, w3) of three networks: co-complexed (Durrett 2006), genetic (Tong et al. 2001), and co-expression (Pavlidis et al, 2002; Lanckriet et al, 2004); cell cycle genes CDC27, CDC23, APC11; DNA repair genes RAD54, XRS2, MRE11; unknown genes UNK1, UNK2] Mostafavi et al, 2008
  • 119. GeneMANIA Cytoscape Plugin
  • 120. Outline Explore identified proteins Attribute enrichment Networks Pathways Lab 99 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 121. pathway In biology, a pathway is a network which consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity. 100 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 122. Using Pathway Information Expert knowledge Experimental Data Find active processes underlying a phenotype Databases Literature Pathway Information Pathway Analysis 101 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 123.
  • 124. Pathway data extremely difficult to combine and useVuk Pavlovic Sylva Donaldson 102 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 125. Aim: Convenient Access to Pathway Information http://www.pathwaycommons.org Facilitate creation and communication of pathway data Aggregate pathway data in the public domain Provide easy access for pathway analysis 103 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 126. Access From Cytoscape 104 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 127. cardiomyopathy: downregulated genes Fatty Acid Degradation? Other pathways / processes? GenMAPP.org 105 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 128. Fatty Acid Degradation Pathway 106 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 129. Cardiomyopathy Data on Fatty Acid Degradation Pathway 107 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 130. Visualizing Time Course Data on Pathways: Multiple Comparison View 108 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 131. Outline Explore identified proteins Attribute enrichment Networks Pathways Lab 109 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 132. 110 Network Analysis Cytoscape Visualize molecular interaction networks and integrate interactions with gene expression profiles and other state data. Data filters & custom plug-in architecture. http://www.cytoscape.org Biolayout Express 3D Large networks Gene expression www.sanger.ac.uk/Teams/Team101/biolayout/b3d.html BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 133. Expert knowledge Experimental Data Network Analysis using Cytoscape Find biological processes underlying a phenotype Databases Literature Network Information Network Analysis 111 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 134. http://cytoscape.org Network visualization and analysis Pathway comparison Literature mining Gene Ontology analysis Active modules Complex detection Network motif search UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, UToronto, U Texas 112 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 135. Manipulate Networks Filter/Query Interaction Database Search Automatic Layout 113 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 136. Overview Zoom Focus PKC Cell Wall Integrity 114 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 137. Active Community http://www.cytoscape.org Help 8 tutorials, >10 case studies Mailing lists for discussion Documentation, data sets Annual Conference: Houston Nov 6-9, 2009 10,000s users, 2500 downloads/month >40 Plugins Extend Functionality Build your own, requires programming Cline MS et al. Integration of biological networks and gene expression data using Cytoscape Nat Protoc. 2007;2(10):2366-82 115 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 138. LAB Objective Create a map of the functional enrichments from the 14 input proteins Methods Use HGNC to obtain the gene symbols from the names Submit the gene symbols to a tool that already has datasets loaded. Get Attributes and do analysis on network 116 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 139. 14 Proteins ISOFORM of APOPTOSIS-INDUCING FACTOR 1, MITOCHONDRIAL QUINONE OXIDOREDUCTASE.; 26 KDA PROTEIN.;22 KDA PROTEIN.; 32 KDA PROTEIN. 14-3-3 PROTEIN EPSILON. ELONGATION FACTOR 1-GAMMA.; 50 KDA PROTEIN. AFG3-LIKE PROTEIN 2. 3-KETOACYL-COA THIOLASE, MITOCHONDRIAL IMPORTIN BETA-1 SUBUNIT. FH1/FH2 DOMAIN-CONTAINING PROTEIN ANNEXIN VI ISOFORM 2.; ANNEXIN A6. 2,4-DIENOYL-COA REDUCTASE, MITOCHONDRIAL HYDROXYACYL GLUTATHIONE HYDROLASE ISOFORM 1.; HYDROXYACYLGLUTATHIONE HYDROLASE. ISOFORM 1 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA.; ISOFORM 2 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA ISOFORM 1 OF LONG-CHAIN-FATTY-ACID--COA LIGASE 1 PHOSPHOLIPASE C DELTA 4. 117 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 140. Get their gene symbol/identifiersHGNC - http://www.genenames.org Provide a table of mappings What challenges did you face when trying to identify the symbols from textual descriptions? 118 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 141. Identify functional enrichments Discuss and provide a plot for the enrichment of Gene Ontology categories 119 BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
  • 142. Build an attribute enrichment network. Which new proteins are functionally linked? What datasets were used in the network construction?
  • 143. Attribute enrichment with a custom data set. Use BioMart to convert HGNC identifiers to Ensembl identifiers. Obtain the Gene Ontology categories for the target proteins and the background proteins. Use FUNC to do the enrichment analysis.
  • 148. Collect the Gene Ontology attributes for the list, then for all human genes.
  • 149. Next steps are harder… http://func.eva.mpg.de/ To use FUNC, you need to convert the BioMart output to the file format above. This is pretty easy to do in Excel for the protein list, but Excel can’t handle the results for all the human proteins. You need to write a small script… take BIOC3008 and become competent in simple data manipulation :)
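
The "small script" mentioned in slide 149 might look like the following Python sketch. It assumes the BioMart export is a tab-separated file with a header row and columns named `Ensembl Gene ID` and `GO Term Accession` (hypothetical names; adjust them to your export). It also assumes that the FUNC input is one line per gene/GO pair with three tab-separated columns: gene identifier, GO term, and 1 for target proteins or 0 for background. That layout is an assumption, since the slide showing the format is not reproduced in this transcript; check the documentation at http://func.eva.mpg.de/ before use. The helper `hypergeom_pvalue` is likewise hypothetical and only illustrates the kind of over-representation test an enrichment tool applies to a single category; it is not FUNC's implementation. All identifiers in the demo are made up.

```python
import csv
import io
from math import comb


def biomart_to_func(biomart_tsv, target_ids, out,
                    gene_col="Ensembl Gene ID", go_col="GO Term Accession"):
    """Stream BioMart rows into 'gene<TAB>GO term<TAB>flag' lines.

    flag is 1 if the gene is in the target list, 0 if it is background.
    The column names and the three-column layout are assumptions.
    """
    reader = csv.DictReader(biomart_tsv, delimiter="\t")
    seen = set()
    for row in reader:
        gene = (row.get(gene_col) or "").strip()
        term = (row.get(go_col) or "").strip()
        # Skip genes without a GO annotation and repeated gene/term pairs.
        if not gene or not term or (gene, term) in seen:
            continue
        seen.add((gene, term))
        flag = 1 if gene in target_ids else 0
        out.write(f"{gene}\t{term}\t{flag}\n")
    return len(seen)


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when k of n target genes carry a term held by K of N genes."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Tiny in-memory demo with made-up identifiers; real input would be a file
# opened with open(path, newline="").
demo = io.StringIO(
    "Ensembl Gene ID\tGO Term Accession\n"
    "ENSG_A\tGO:0000001\n"
    "ENSG_B\tGO:0000001\n"
    "ENSG_C\tGO:0000002\n"
    "ENSG_C\t\n"
)
out = io.StringIO()
n_written = biomart_to_func(demo, {"ENSG_A"}, out)

# One target gene, and it carries a term held by 2 of the 3 genes overall.
p = hypergeom_pvalue(k=1, n=1, K=2, N=3)
```

Because the rows are read and written one at a time, the same function works for the full human gene set that Excel cannot open; only the set of gene/term pairs already seen is kept in memory.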

Editor's Notes

  1. GeneMANIA uses query-specific weights for multifaceted function queries. Let’s say you have a co-expression network that was generated from microarray data. You know there is a cluster of cell cycle genes, a cluster of DNA repair genes, and a few unknown genes between or within those clusters. This tells you a little bit about your genes of interest. But you want to add in a genetic interaction network, which is considerably more complex, and a protein interaction network, which is even more complex. How do you know which network contains the most relevant information about your query genes? The GeneMANIA algorithm weights the networks based on how connected your query genes are: a network is weighted more heavily if your query genes are more connected within that network. GeneMANIA produces a composite network showing the weights of the genetic interaction, protein interaction, and co-expression networks used to generate it.
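
The idea of weighting each network by how connected the query genes are within it can be illustrated with a deliberately simplified Python sketch. This is not the GeneMANIA algorithm, whose details are not given in these notes; it is a stand-in heuristic that scores each network by the fraction of query-gene pairs joined by a direct edge and then normalizes the scores so they sum to 1. The function names, network names, and gene labels are all hypothetical.

```python
from itertools import combinations


def query_connectivity(edges, query):
    """Fraction of query-gene pairs that are directly linked in one network."""
    linked = {frozenset(edge) for edge in edges}
    pairs = list(combinations(sorted(query), 2))
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if frozenset(pair) in linked)
    return hits / len(pairs)


def network_weights(networks, query):
    """Normalize per-network connectivity scores into weights summing to 1."""
    scores = {name: query_connectivity(edges, query)
              for name, edges in networks.items()}
    total = sum(scores.values())
    if total == 0:
        # No network links the query genes: fall back to equal weights.
        return {name: 1 / len(scores) for name in scores}
    return {name: score / total for name, score in scores.items()}


# Toy networks over made-up genes A..E.
networks = {
    "co-expression": [("A", "B"), ("B", "C"), ("A", "C")],
    "genetic interaction": [("A", "B"), ("C", "D")],
    "protein interaction": [("C", "D"), ("D", "E")],
}
weights = network_weights(networks, {"A", "B", "C"})
```

In this toy case the co-expression network links all three query pairs, the genetic interaction network links one, and the protein interaction network links none, so the co-expression network receives the largest weight.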