Associative methods in Systems Biology

Hugh Shanahan

Department of Computer Science
Royal Holloway, University of London

September 22, 2009

Outline

1   Outline
2   Gene Ontologies
      Over-representation
      Semantic similarity
3   Associative Measures
      Hypotheses
      Linear Correlation
      Partial Correlation
      Non-linear measures
4   Validation
      DREAM

Gene Ontologies

Before finding interactions, we need to be able

    to systematically annotate all genes,
    to determine which functional groupings are over-represented,
    to measure objectively the “functional similarity” of two genes.

Gene Ontology (GO) is a means to do this.

Ontologies

    Abstract method for expressing structured data.
    Annotation of any gene can be expressed in terms of increasingly specific descriptions, e.g.
        This gene is involved in transport.
        This gene is involved in vesicle-mediated transport.
        This gene is involved in vesicle fusion.
    Genes may not have a specific annotation, so the description can stop higher up in this hierarchy.

More complexity required

    Annotation is not a simple chain.
    A single gene can have a very specific annotation, which comes from two (or more) more general descriptions.
    There are different types of annotation as well: location, biochemistry, part of organism expressed in, and so on.
    An Ontology is a Directed Acyclic Graph (DAG), not a Tree (this means a lot to Graph Theorists).
    Each node in the DAG is an annotation term.
    Each “child” node can have more than one “parent” node.

GO’s visualised

[Figure: three panels a, b, c. Panel c shows the root "Biological process" at the top and the node "Vesicle fusion" at the bottom, reached by two routes. Vesicle-mediated transport is_a Transport, and Vesicle fusion is part_of Vesicle-mediated transport. Membrane fusion is_a Membrane organization and biogenesis, and Vesicle fusion is_a Membrane fusion.]

Figure 1 | Simple trees versus directed acyclic graphs. Boxes represent nodes and arrows represent edges. a | An example of a simple tree, in which each child has only one parent and the edges are directed, that is, there is a source (parent) and a destination (child) for each edge. b | A directed acyclic graph (DAG), in which each child can have one or more parents. The node with multiple parents is coloured red and the additional edge is coloured grey. c | An example of a node, vesicle fusion, in the biological process ontology with multiple parentage. The dashed edges indicate that there are other nodes not shown between the nodes and the root node (biological process). A root is a node with no incoming edges, and at least one leaf (also called a sink). A leaf node is a node with no outgoing edges, that is, a terminal node with no children (vesicle fusion). Similar to a simple tree, a DAG has directed edges and does not have cycles, that is, no path starts and ends at the same node, and will always have at least one root node. The depth of a node is the length of the longest path from the root to that node, whereas the height is the length of the longest path from that node to a leaf. is_a and part_of are types of relationships that link the terms in the GO ontology. More information about the relationships between GO terms can be found online (An Introduction to the Gene Ontology).

Rhee et al., Nature Reviews Genetics, (2008)

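The multiple parentage of "vesicle fusion" can be made concrete with a small sketch. The Python below is only an illustration, assuming a hand-written parent map that contains just the terms named in panel c; the direct links to the root stand in for the dashed edges, whose intermediate terms are not reproduced. The names PARENTS and ancestors are hypothetical and do not come from any GO library.

```python
# Parent map for the fragment of the biological process ontology in panel c.
# Assumption: only the terms named in the figure are included, and the links
# to the root replace the dashed edges (intermediate terms are omitted).
PARENTS = {
    "biological process": [],
    "transport": ["biological process"],
    "membrane organization and biogenesis": ["biological process"],
    "vesicle-mediated transport": ["transport"],                   # is_a
    "membrane fusion": ["membrane organization and biogenesis"],   # is_a
    "vesicle fusion": [
        "vesicle-mediated transport",                              # part_of
        "membrane fusion",                                         # is_a
    ],
}


def ancestors(term, parents=PARENTS):
    """Return the set holding `term` and every term reachable via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


if __name__ == "__main__":
    # Walks both routes from "vesicle fusion" up to the root.
    for name in sorted(ancestors("vesicle fusion")):
        print(name)
```

In a tree the walk from a node to the root would follow a single path; here it fans out over both parents, which is why a set (rather than a list of steps) is the natural return type.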
GO’s visualised

http://amigo.geneontology.org/

Different types of Annotation

    Typically, there are three distinct ontologies (overwhelmingly used).
        Cellular Component
        Biological Process
        Molecular Function
    Many other ontologies have been constructed, e.g. Plant Organ for Arabidopsis.

Caveat

The annotation of most genes (90%) has been carried out computationally. The annotations usually work pretty well, though they tend not to be as accurate as those obtained by direct assay.

Every annotation has an evidence code (for example IEA, Inferred from Electronic Annotation, or IDA, Inferred from Direct Assay) associated with it to indicate how much we can rely on it.

Increasingly, co-expression is being used as a means to annotate genes, so one should be careful not to use this information in constructing annotations!

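One way to act on this caveat is to filter annotations by evidence code before using them. The sketch below is an assumption-laden illustration: the record layout (gene, term, evidence code), the gene names and the function name are all hypothetical, and the choice to exclude IEA (Inferred from Electronic Annotation) and IEP (Inferred from Expression Pattern) is one plausible reading of the caveat rather than a prescribed rule.

```python
# Evidence codes to exclude (an assumption of this sketch):
#   IEA = Inferred from Electronic Annotation (computational, not curated)
#   IEP = Inferred from Expression Pattern (derived from expression data)
EXCLUDED_CODES = frozenset({"IEA", "IEP"})


def filter_annotations(annotations, excluded=EXCLUDED_CODES):
    """Keep (gene, term, evidence_code) records whose code is not excluded."""
    return [record for record in annotations if record[2] not in excluded]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        ("geneA", "vesicle fusion", "IDA"),
        ("geneB", "transport", "IEA"),
        ("geneC", "membrane fusion", "IEP"),
    ]
    for gene, term, code in filter_annotations(records):
        print(gene, term, code)
```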
Over-representation

One of the most useful things to do when analysing micro-array data is to ask which functional groupings occur more often than one expects.

    Notation
    N   number of genes in the genome.
    n   number of genes which have been found to be differentially expressed.
    m   number of genes in the genome with a specific annotation.
    k   number of genes which are differentially expressed with the same annotation.

Probabilities

One can derive the probability P_k that k genes would be found by chance amongst n genes using the hypergeometric probability distribution and the above parameters.

For the record

    P_k = \frac{{}^{m}C_{k} \; {}^{N-m}C_{n-k}}{{}^{N}C_{n}},    (1)

    {}^{N}C_{n} = \frac{N!}{(N-n)! \, n!}.    (2)

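The hypergeometric probability can be evaluated directly with exact binomial coefficients. The following is a minimal Python sketch using the slide's symbols N, n, m and k; the function names are hypothetical. The second function sums the upper tail (k or more genes), which is a common way to turn P_k into an over-representation p-value but is an addition of this sketch, not something stated on the slide.

```python
from math import comb


def hypergeom_pmf(k, N, n, m):
    """P_k: probability that exactly k of the n selected genes carry the annotation.

    N = genes in the genome, n = differentially expressed genes,
    m = genes in the genome with the annotation.
    """
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def over_representation_pvalue(k, N, n, m):
    """Probability of finding k or more annotated genes among the n selected."""
    upper = min(n, m)  # k cannot exceed either the sample or the annotated set
    return sum(hypergeom_pmf(i, N, n, m) for i in range(k, upper + 1))


if __name__ == "__main__":
    # Toy numbers: 10 genes, 3 differentially expressed, 4 annotated, 2 in both.
    print(hypergeom_pmf(2, N=10, n=3, m=4))
    print(over_representation_pvalue(2, N=10, n=3, m=4))
```

For the toy numbers, P_2 = (6 x 6) / 120 = 0.3, and adding P_3 = 4 / 120 gives an upper-tail probability of 1/3.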
Difficulties

    There are thousands of possible GO terms and one should adjust the probabilities to deal with multiple hypotheses.
    Applying a Bonferroni correction over all GO terms means that a per-term p-value of about 10^-7 is needed for significance at the 1% level.
    Because of the structure of the GO terms these probabilities are highly correlated with each other.
    In all these cases, focussing on as short a list of possible biological processes as possible will minimise the above difficulties.

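The Bonferroni arithmetic is simply the target significance divided by the number of tests. The sketch below illustrates it in Python; the function names are hypothetical, and the figure of 100,000 tests in the usage example is back-calculated from the slide's 1% and 10^-7 values, not an actual count of GO terms.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_terms(pvalues, alpha=0.01):
    """Return the terms whose uncorrected p-value passes the Bonferroni cut-off.

    `pvalues` maps a term name to its uncorrected p-value; the number of
    tests is taken to be the number of terms supplied.
    """
    if not pvalues:
        return set()
    cutoff = bonferroni_threshold(alpha, len(pvalues))
    return {term for term, p in pvalues.items() if p <= cutoff}


if __name__ == "__main__":
    # Assumed number of tests, chosen only to match the figures on the slide.
    print(bonferroni_threshold(0.01, 100_000))
```

Testing a short, pre-selected list of terms raises the cut-off in direct proportion: with three terms at the 1% level the per-term threshold is 0.01 / 3 rather than 10^-7.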
What genes match

In benchmarking methods to infer interactions between gene products, we expect genes that interact to have similar GO terms, though perhaps not entirely the same.

Semantic Similarity is a means to measure how similar the annotations of two genes are (0 being no similarity, 1 meaning total similarity).

GO provides us with a means to do this in a quantitative fashion.

Simple implementation

Determine the ratio of the number of nodes two genes share
with the total number of nodes they have between them.

\[
  \mathrm{GOsim}_{UI} = \frac{\#\{N(G_1) \cap N(G_2)\}}{\#\{N(G_1) \cup N(G_2)\}} \tag{3}
\]

N(G1) being the set of nodes associated with G1’s annotation.

More elaborate methods are available.
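
Equation (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `go_sim_ui` and the node labels are made up for the example, each node set is assumed to already contain every GO node associated with the gene's annotation, and returning 0.0 when neither gene has any annotation is a choice made here rather than something the slides specify.

```python
def go_sim_ui(nodes_g1, nodes_g2):
    """Union-intersection similarity: shared GO nodes / all GO nodes."""
    n1, n2 = set(nodes_g1), set(nodes_g2)
    union = n1 | n2
    if not union:
        # Neither gene is annotated. Returning 0.0 is an assumption
        # made for this sketch.
        return 0.0
    return len(n1 & n2) / len(union)


# Hypothetical node sets (placeholder labels, not real GO identifiers).
nodes_a = {"root", "n1", "n2", "n3"}
nodes_b = {"root", "n1", "n4"}

sim_ab = go_sim_ui(nodes_a, nodes_b)    # 2 shared nodes out of 5 in total
sim_aa = go_sim_ui(nodes_a, nodes_a)    # identical annotations
sim_none = go_sim_ui({"n5"}, {"n6"})    # nothing shared
```

With these placeholder sets the two genes share 2 of 5 nodes, identical annotations score 1, and disjoint annotations score 0, matching the 0 to 1 range described earlier.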
Motivation

    Yesterday, we encountered clustering.
    Hypothesis 1 (weak) :- coexpression implies involvement in the same process.
    Expand to many different experiments.
    Hypothesis 2 (strong) :- the greater the level of association, the greater
    the chance of interaction.
    Hypothesis 2 is often referred to as “guilt by association”.
    Association may tell us about interactions between gene products. It does
    not tell us anything about the regulation mechanism.

http://www.arabidopsis.leeds.ac.uk/act/index.php

266841_at AT2G26150
heat shock transcription factor family protein contains Pfam profile: PF00447 HSF-type DNA-binding domain
260978_at AT1G53540
17.6 kDa class I small heat shock protein

What do we mean by association?

Knowing something about the expression level of one gene (over many different
experiments) means we know something about the expression level of the other.
Replotting the above

Linear Correlation
coexpression

      Simplest form of association.
      Assume that there is a linear relationship between genes.
      Formally :-

      \[
        y_1 = a_{12} + c_{12}\, y_2 + \eta_{12} , \tag{4}
      \]

           y1, y2 are (log) expression levels.
           η12 is a noise term.
           a12, c12 are parameters to be determined.

      But we’re not interested in that!
      We are interested in asking how good a model this is for this pair of genes.
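
To make the distinction concrete, here is a minimal sketch that fits equation (4) by ordinary least squares and then looks at how much is left unexplained. The slides do not say how a12 and c12 would be estimated, so least squares is an assumption here, as are the helper names `fit_line` and `residual_sum_of_squares` and the made-up expression values.

```python
def fit_line(y1, y2):
    """Least-squares estimates (a12, c12) for y1 = a12 + c12 * y2 + noise."""
    n = len(y1)
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    sxy = sum((u - mean1) * (v - mean2) for u, v in zip(y1, y2))
    sxx = sum((v - mean2) ** 2 for v in y2)
    c12 = sxy / sxx
    a12 = mean1 - c12 * mean2
    return a12, c12


def residual_sum_of_squares(y1, y2, a12, c12):
    """Sum of squared differences between y1 and the fitted line."""
    return sum((u - (a12 + c12 * v)) ** 2 for u, v in zip(y1, y2))


# Made-up (log) expression levels over four experiments.
y2 = [0.0, 1.0, 2.0, 3.0]
y1_exact = [1.0, 3.0, 5.0, 7.0]   # exactly 1 + 2 * y2
y1_noisy = [1.5, 2.5, 5.5, 6.5]   # roughly linear, with scatter

a_exact, c_exact = fit_line(y1_exact, y2)
a_noisy, c_noisy = fit_line(y1_noisy, y2)

rss_exact = residual_sum_of_squares(y1_exact, y2, a_exact, c_exact)
rss_noisy = residual_sum_of_squares(y1_noisy, y2, a_noisy, c_noisy)
```

The fitted parameters themselves are of little interest for inferring association. What matters is that the residual is zero for the exactly linear pair and grows as the linear model describes the pair less well.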

Covariance

Can estimate how good the linear model is by computing the covariance

\[
  E\big((y_1 - \bar{y}_1)(y_2 - \bar{y}_2)\big) ,
\]

where ȳ1 = E(y1) and ȳ2 = E(y2) are the means of y1 and y2.

    E means the expectation value of the above (think of it for now as taking
    the average over all the points in the previous figure).
    Can prove to oneself (exercise) that, for given variances of y1 and y2,
    the magnitude of the covariance is largest when y1 can be perfectly
    expressed as a linear function of y2.
    The covariance is zero when there is no relationship at all between y1 and y2.
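
The covariance can be estimated directly from paired measurements by replacing E with an average over the points. The sketch below assumes that plain average (dividing by the number of points), and the function name `covariance` and the data values are invented for illustration. It also computes the largest magnitude the covariance can take for the given variances, which is the square root of their product.

```python
import math


def covariance(y1, y2):
    """Average of (y1 - mean(y1)) * (y2 - mean(y2)) over all points."""
    n = len(y1)
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    return sum((u - mean1) * (v - mean2) for u, v in zip(y1, y2)) / n


# Made-up (log) expression levels over four experiments.
y2 = [0.0, 1.0, 2.0, 3.0]
y1_linear = [1.0, 3.0, 5.0, 7.0]      # exactly 1 + 2 * y2
y1_no_trend = [1.0, -1.0, -1.0, 1.0]  # no linear trend with y2

cov_linear = covariance(y1_linear, y2)
cov_no_trend = covariance(y1_no_trend, y2)

# covariance(y, y) is the variance of y, so this is the upper bound
# on |covariance| for the given variances.
bound = math.sqrt(covariance(y1_linear, y1_linear) * covariance(y2, y2))
```

For the exactly linear pair the covariance reaches the bound, while for the pair with no linear trend it vanishes.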

[Figure: two scatter plots with y2 on the vertical axis (ticks at 0, 1, 2).]
     −1




                                                                                                                                                −1
                q                                                                                                                                                     q

            q                                                                                                                                                                                                                                                       q
                                                                                                                                                                                                                                                                                             Hypotheses
          0.6           0.8                1.0                   1.2                     1.4             1.6           1.8                                   −1               0                                1                                    2                                3       Linear Correlation
                                                                     y1                                                                                                                                            y1
                                                                                                                                                                                                                                                                                             Partial Correlation
                                                                                                                                                                                                                                                                                             Non-linear measures

Maximum covariance                                                                                                                         Zero covariance                                                                                                                                   Validation
                                                                                                                                                                                                                                                                                             DREAM




                                                                                                                                 Hugh Shanahan                    Associative methods in Systems Biology
Correlation

We want to compare every possible pair of genes, so using the covariance is not very practical: the maximum covariance varies from one pair of genes to the next. However,

\[
\rho_{12} = \frac{E\big((y_1 - \bar{y}_1)(y_2 - \bar{y}_2)\big)}{\sqrt{E\big((y_1 - \bar{y}_1)^2\big)\, E\big((y_2 - \bar{y}_2)^2\big)}} \tag{5}
\]

is bounded: $-1 \le \rho_{12} \le 1$.
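
The contrast between covariance and the correlation of equation (5) can be checked numerically. The following is a minimal sketch, assuming synthetic expression values generated with NumPy; the data, the seed and the helper names `covariance` and `pearson` are illustrative and not taken from the talk.

```python
import numpy as np


def covariance(y1, y2):
    """E((y1 - mean(y1)) (y2 - mean(y2))), estimated over the experiments."""
    return np.mean((y1 - y1.mean()) * (y2 - y2.mean()))


def pearson(y1, y2):
    """Equation (5): covariance divided by the square root of the product of variances."""
    return covariance(y1, y2) / np.sqrt(covariance(y1, y1) * covariance(y2, y2))


# Hypothetical (log) expression levels for two genes over 200 experiments.
rng = np.random.default_rng(0)
y1 = rng.normal(size=200)
y2 = 0.8 * y1 + 0.2 * rng.normal(size=200)

# The same pair of genes, but with both expression scales multiplied by 10.
cov_small = covariance(y1, y2)
cov_large = covariance(10 * y1, 10 * y2)
rho_small = pearson(y1, y2)
rho_large = pearson(10 * y1, 10 * y2)

print("covariance :", cov_small, "->", cov_large)   # grows with the scale
print("correlation:", rho_small, "->", rho_large)   # unchanged by the scale
```

Under these assumptions the covariance changes by a factor of 100 when both genes are rescaled, while the correlation stays the same and remains within [-1, 1], which is what makes it comparable across gene pairs.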
How well does it work?

    There are a number of examples of improved functional annotation.

    An unannotated gene that is highly correlated with a gene in a known response is likely to be involved in the same response.
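
The annotation argument above can be sketched as a ranking exercise. This is only an illustration under stated assumptions: the gene names, the annotation labels and the synthetic expression data are all hypothetical, and taking the single most correlated annotated gene is one simple choice, not necessarily the procedure used in the examples the talk refers to.

```python
import numpy as np

rng = np.random.default_rng(1)
n_experiments = 100

# Hypothetical data: a shared response signal drives three of the four genes.
response = rng.normal(size=n_experiments)
expression = {
    "known_heat_1": response + 0.3 * rng.normal(size=n_experiments),
    "known_heat_2": response + 0.3 * rng.normal(size=n_experiments),
    "known_other": rng.normal(size=n_experiments),
    "unannotated": response + 0.3 * rng.normal(size=n_experiments),
}
# Hypothetical annotations; the query gene has none.
annotation = {
    "known_heat_1": "heat response",
    "known_heat_2": "heat response",
    "known_other": "other process",
}

names = list(expression)
# np.corrcoef treats each row as one variable, so rows are genes here.
rho = np.corrcoef(np.array([expression[g] for g in names]))

query = names.index("unannotated")
# Rank the annotated genes by their correlation with the unannotated gene.
ranked = sorted(
    (g for g in names if g in annotation),
    key=lambda g: rho[query, names.index(g)],
    reverse=True,
)
best = ranked[0]
suggested = annotation[best]
print("most correlated annotated gene:", best, "-> suggested annotation:", suggested)
```

In practice one would want a threshold on the correlation and several supporting genes before transferring an annotation; the sketch only shows the mechanics.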
Difficulty: genes correlate with many other genes, not just a few.

Why?

Suggestion: correlations do not distinguish between potential direct interactions and indirect interactions between gene products.
Example

[Figure: network diagram of genes A, B, C, D, E and F, with the label "Other interactions".]

   B directly interacts with three other genes, but could be highly correlated with others.

   C and D would be highly correlated with each other even though they are not directly interacting.
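
The point about C and D can be illustrated with a small simulation. This is a sketch under an assumed wiring: B drives both C and D linearly, and C and D never influence each other. The coefficients, noise level and seed are hypothetical, and the edges of the diagram itself are not reproduced here. The last step anticipates the partial correlation item in the outline by removing the linear effect of B before correlating.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Assumed model: B is the common driver; C and D depend on B only.
B = rng.normal(size=n)
C = B + 0.3 * rng.normal(size=n)
D = B + 0.3 * rng.normal(size=n)

# Plain correlation between C and D is high despite no direct interaction.
rho_CD = np.corrcoef(C, D)[0, 1]


def residual(y, x):
    """Remove the best straight-line fit of y on x and return what is left."""
    slope_intercept = np.polyfit(x, y, 1)
    return y - np.polyval(slope_intercept, x)


# Correlation of C and D once the linear effect of B has been removed.
rho_CD_given_B = np.corrcoef(residual(C, B), residual(D, B))[0, 1]

print("corr(C, D)         :", rho_CD)
print("corr(C, D) given B :", rho_CD_given_B)
```

Under these assumptions the first value is close to 1 and the second is close to 0, so the high correlation between C and D reflects their shared dependence on B rather than a direct interaction.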
Hugh Shanahan Association Talk

  • 1. Associative methods in Systems Biology Hugh Shanahan Associative methods in Systems Biology Outline Gene Ontologies Hugh Shanahan Over-representation Semantic similarity Associative Measures Department of Computer Science Hypotheses Royal Holloway, University of London Linear Correlation Partial Correlation Non-linear measures September 22, 2009 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 2. Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 3. Gene Ontologies Associative methods in Systems Biology Hugh Before finding interactions, need to be able Shanahan to systematically annotate all genes Outline Gene to determine which functional groupings are Ontologies Over-representation over-represented Semantic similarity measure objectively the “functional similarity” of two Associative Measures genes. Hypotheses Linear Correlation Partial Correlation Gene Ontology (GO) is a means to do this. Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 4. Ontologies Associative methods in Systems Biology Abstract method for expressing structured data. Hugh Shanahan Annotation of any gene can be expressed in terms of Outline incresingly accurate description, e.g. Gene This gene is involved in transport. Ontologies Over-representation This gene is involved in vesicle mediated Semantic similarity Associative transport. Measures Hypotheses This gene is involved in vesicle fusion. Linear Correlation Partial Correlation Genes may not have an accurate annotation, so Non-linear measures Validation definition can stop higher up in this hierarchy. DREAM Hugh Shanahan Associative methods in Systems Biology
  • 5. More complexity required Associative methods in Systems Biology Annotation is not a simple chain. Hugh A single gene can have have a very specific annotation, Shanahan which comes from two (or more) more general Outline descriptions. Gene Ontologies Different types of annotation as well: location, Over-representation Semantic similarity biochemistry, part of organism expressed in, and so on. Associative Measures An Ontology is a Directed Acyclic Graph (DAG), not a Hypotheses Linear Correlation Tree (this means a lot to Graph Theorists). Partial Correlation Non-linear measures Each node in the DAG is an annotation term. Validation DREAM Each “child” node can more than one “parent” nodes. Hugh Shanahan Associative methods in Systems Biology
  • 6. GO’s visualised Associative IEWS methods in Systems Biology a b c Hugh Biological process (root) Shanahan Outline Transport Membrane organization and biogenesis Gene asing ficity Ontologies is_a is_a or Over-representation larity Vesicle-mediated Semantic similarity Membrane fusion transport Associative part_of is_a Measures Hypotheses Vesicle fusion Linear Correlation Partial Correlation Figure 1 | Simple trees versus directed acyclic graphs. Boxes represent nodes and arrows represent edges. a | An Nature Reviews | Genetics example of a simple tree, in which each child has only one parent and the edges are directed, that is, there is a source Non-linear measures (parent) and a destination (child) for each edge. b | A directed acyclic graph (DAG), in which each child can have one or Validation Rhee et al., Nature Reviews Genetics, (2008) more parents. The node with multiple parents is coloured red and the additional edge is coloured grey.c | An example of a node, vesicle fusion, in the biological process ontology with multiple parentage. The dashed edges indicate that there DREAM are other nodes not shown between the nodes and the root node (biological process). A root is a node with no incoming edges, and at least one leaf (also called a sink). A leaf node is a node with no outgoing edges, that is, a terminal node with no children (vesicle fusion). Similar to a simple tree, A DAG has directed edges and does not have cycles, that is, no path starts and ends at the same node, and will always have at least one root node. The depth of a node is the length of the longest path from the root to that node, whereas the height is the length of the longest path from that node to a leaf41. is_a and part_of are types of relationships that link the terms in the GO ontology. More information about the relationships between GO terms are found online (An Introduction to the Gene Ontology). Hugh Shanahan Associative methods in Systems Biology
  • 7. GO’s visualised Associative methods in Systems Biology Hugh Shanahan Outline Gene Ontologies Over-representation Semantic similarity Associative Measures Hypotheses Linear Correlation Partial Correlation http://amigo.geneontology.org/ Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 8. Different types of Annotation Associative methods in Systems Biology Hugh Typically, there are three distinct ontologies Shanahan (overwhelmingly used). Outline Cellular Compartment Gene Ontologies Over-representation Biological Process Semantic similarity Molecular Function Associative Measures Hypotheses Many other ontologies have been constructed, e.g. Linear Correlation Partial Correlation Plant Organ for Arabidopsis. Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 9. Caveat Associative methods in Systems Biology The annotation of most genes (90%) have been carried out Hugh Shanahan computationally. The annotations usually work pretty well, though they have a tendency not to be as accurate as those Outline obtained by direct assay. Gene Ontologies All annotated genes have an evidence code (IED) Over-representation Semantic similarity associated with them in order to demonstrate how much we Associative can rely on it. Measures Hypotheses Increasingly, co-expression is being used as a means to Linear Correlation Partial Correlation annotate genes, so one should be careful in not using this Non-linear measures information in constructing annotations ! Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 10. Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 11. Over-representation Associative methods in Systems One of the most useful tools to hand when one analyses Biology micro-array data is to ask what functional groupings occur Hugh Shanahan more often than one expects. Outline Notation Gene N number of genes in the genome. Ontologies Over-representation Semantic similarity n number of genes which have been found to be Associative differentially expressed. Measures Hypotheses m number of genes in the genome with a specific Linear Correlation Partial Correlation annotation. Non-linear measures Validation k number of genes which are differentially expressed DREAM with the same annotation. Hugh Shanahan Associative methods in Systems Biology
  • 12. Probabilities Associative methods in Systems One can derive the probability Pk that k genes would be Biology found by chance amongst n genes using the Hugh Shanahan hypergeometric probability distribution and the above Outline parameters. Gene For the record Ontologies Over-representation Semantic similarity Associative m C N−m C Measures k n−k Pk = NC , (1) Hypotheses Linear Correlation n Partial Correlation N N! Non-linear measures Cm = . (2) Validation (N − n)!n! DREAM Hugh Shanahan Associative methods in Systems Biology
  • 13. Difficulties Associative methods in Systems Biology There are thousand’s of possible GO terms and one Hugh should adjust the probabilities to deal with multiple Shanahan hypotheses. Outline Applying Bonferroni correction using all GO terms gives Gene Ontologies a p-value of 10−7 equivalent to 1% significence. Over-representation Semantic similarity Because of the structure of the GO terms these Associative Measures probabilities are highly correlated with each other. Hypotheses Linear Correlation In all these cases focussing on as short a list of Partial Correlation Non-linear measures possible biological processes as possible will minimise Validation the above difficulties. DREAM Hugh Shanahan Associative methods in Systems Biology
  • 14. Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 15. What genes match In benchmarking methods to infer interactions between Associative methods in gene products, we expect genes that interact to have similar Systems Biology GO terms, though perhaps not entirely the same. Hugh Semantic Similarity is a means to measure how similar the Shanahan annotations of two genes are (0 being no similarity, 1 Outline meaning total similarity). Gene Ontologies GO provides us with a means to do this in a quantitative Over-representation Semantic similarity fashion. Associative Measures Hypotheses Linear Correlation Partial Correlation Non-linear measures Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 16. Simple implementation Determine the ratio of the number of nodes two genes share Associative methods in with the total number of nodes they have between them. Systems Biology Hugh #{N(G1 ) ∩ N(G2 )} Shanahan GOsimUI = (3) #{N(G1 ) ∪ N(G2 )} Outline N(G1 ) being the set of nodes associated with G1 ’s Gene Ontologies annotation. Over-representation Semantic similarity Associative Measures Hypotheses Linear Correlation Partial Correlation Non-linear measures Validation DREAM More elaborate methods are available. Hugh Shanahan Associative methods in Systems Biology
  • 17. Outline Associative methods in 1 Outline Systems Biology 2 Gene Ontologies Hugh Shanahan Over-representation Outline Semantic similarity Gene Ontologies 3 Associative Measures Over-representation Semantic similarity Hypotheses Associative Linear Correlation Measures Hypotheses Partial Correlation Linear Correlation Partial Correlation Non-linear measures Non-linear measures Validation DREAM 4 Validation DREAM Hugh Shanahan Associative methods in Systems Biology
  • 18. Motivation Associative methods in Systems Yesterday, encountered clustering. Biology Hugh Hypothesis 1 (weak) :- coexpression implies involvment Shanahan in the same process. Outline Expand to many different experiments. Gene Ontologies Hypothesis 2 (strong) :- greater a level of association, Over-representation Semantic similarity greater the chance of interaction. Associative Measures Hypothesis 2 is often referred to as “guilt by Hypotheses association”. Linear Correlation Partial Correlation Non-linear measures Association may tell us about interactions between Validation gene products. It does not tell us anything about the DREAM regulation mechanism. Hugh Shanahan Associative methods in Systems Biology
  • 19. http://www.arabidopsis.leeds.ac.uk/act/index.php: 266841_at, AT2G26150, heat shock transcription factor family protein, contains Pfam profile PF00447 (HSF-type DNA-binding domain); 260978_at, AT1G53540, 17.6 kDa class I small heat shock protein.
  • 20. What do we mean by association? Knowing something about the expression level of one gene (over many different experiments) means we know something about the expression level of the other. Replotting the above (figure).
  • 21. Outline: 1 Outline; 2 Gene Ontologies (Over-representation, Semantic similarity); 3 Associative Measures (Hypotheses, Linear Correlation, Partial Correlation, Non-linear measures); 4 Validation (DREAM)
  • 22. Linear Correlation (coexpression): The simplest form of association. Assume that there is a linear relationship between genes. Formally: $y_1 = a_{12} + c_{12}\, y_2 + \eta_{12}$ (4), where $y_1, y_2$ are (log) expression levels, $\eta_{12}$ is a noise term, and $a_{12}, c_{12}$ are parameters to be determined. But we're not interested in that! We are interested in asking how good a model this is for this pair of genes.
  • 23. Covariance: We can estimate how good the linear model is by computing the covariance $E((y_1 - \bar{y}_1)(y_2 - \bar{y}_2))$, where $\bar{y}_1 = E(y_1)$ and $\bar{y}_2 = E(y_2)$ are the means of $y_1$ and $y_2$. $E$ means the expectation value (think of it for now as taking the average over all the points in the previous figure). One can prove to oneself (exercise) that, for given variances of $y_1$ and $y_2$, the magnitude of the covariance is largest when $y_1$ can be perfectly expressed as a linear function of $y_2$. The covariance is zero when there is no relationship at all between $y_1$ and $y_2$.
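As a rough sketch of the covariance just described, the expectation can be replaced by an average over the measured points. The function name `covariance` and the small data vectors are hypothetical and chosen only so that the arithmetic is easy to check by hand; the second vector is constructed to have zero covariance with `y2`.

```python
def covariance(y1, y2):
    """Sample estimate of E((y1 - mean(y1)) * (y2 - mean(y2)))."""
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n


# Hypothetical (log) expression levels over five experiments.
y2 = [0.0, 1.0, 2.0, 3.0, 4.0]
y1_linear = [1.0 + 0.5 * v for v in y2]     # exact linear function of y2
y1_no_trend = [1.0, 0.0, 3.0, 0.0, 1.0]     # built to have zero covariance with y2

print(covariance(y1_linear, y2))    # 1.0
print(covariance(y1_no_trend, y2))  # 0.0
```

Note that the first value depends on the slope and the units of the data, which is why the covariance is awkward to compare across gene pairs.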
  • 24. [Figure: two scatter plots of y2 against y1. Left panel: maximum covariance. Right panel: zero covariance.]
  • 25. Correlation: We want to compare every possible pair of genes, so using the covariance is not very practical, since the maximum covariance will vary from one pair of genes to another. However, $\rho_{12} = \frac{E((y_1 - \bar{y}_1)(y_2 - \bar{y}_2))}{\sqrt{E((y_1 - \bar{y}_1)^2)\, E((y_2 - \bar{y}_2)^2)}}$ (5) is bounded: $-1 \le \rho_{12} \le 1$.
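A minimal sketch of the correlation coefficient follows, assuming the usual Pearson form in which the covariance is divided by the square root of the product of the two variances (this normalisation is what keeps the result between -1 and 1). The function name `pearson` and the data are hypothetical. The example shows that two gene pairs with very different covariances can have the same correlation.

```python
import math


def pearson(y1, y2):
    """Covariance divided by the square root of the product of the variances."""
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    var1 = sum((a - m1) ** 2 for a in y1) / n
    var2 = sum((b - m2) ** 2 for b in y2) / n
    # Note: a constant vector has zero variance and would divide by zero here.
    return cov / math.sqrt(var1 * var2)


y2 = [0.0, 1.0, 2.0, 3.0, 4.0]
y1_small_slope = [1.0 + 0.5 * v for v in y2]
y1_large_slope = [1.0 + 50.0 * v for v in y2]
y1_decreasing = [4.0 - v for v in y2]

rho_small = pearson(y1_small_slope, y2)   # perfectly linear, small covariance
rho_large = pearson(y1_large_slope, y2)   # perfectly linear, large covariance
rho_neg = pearson(y1_decreasing, y2)      # perfectly linear, negative slope
print(rho_small, rho_large, rho_neg)
```

Both increasing examples give a correlation of 1 (up to rounding) despite a hundredfold difference in covariance, and the decreasing example gives -1.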
  • 26. How well does it work? There are a number of examples of improved functional annotation. An unannotated gene that is highly correlated with a gene in a known response is likely to be involved in the same response.
  • 27. Outline: 1 Outline; 2 Gene Ontologies (Over-representation, Semantic similarity); 3 Associative Measures (Hypotheses, Linear Correlation, Partial Correlation, Non-linear measures); 4 Validation (DREAM)
  • 28. Difficulty: genes correlate with many other genes, not just a few. Why? Suggestion: correlations do not distinguish between potential direct interactions and indirect interactions between gene products.
  • 29. Example: [Figure: interaction network with nodes A, B, C, D, E and F, plus a label "Other interactions".] B directly interacts with three other genes, but could be highly correlated with others. C and D would be highly correlated with each other even though they are not directly interacting.
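The point about C and D can be illustrated with a small simulation. This is a toy model under stated assumptions, not the network from the slide: it assumes that C and D each follow B plus independent Gaussian noise, with no direct link between C and D. The noise level, sample size, seed, and the helper name `pearson` are all arbitrary choices made for this sketch.

```python
import math
import random


def pearson(y1, y2):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    var1 = sum((a - m1) ** 2 for a in y1) / n
    var2 = sum((b - m2) ** 2 for b in y2) / n
    return cov / math.sqrt(var1 * var2)


random.seed(0)  # fixed seed so the run is repeatable
n_experiments = 500

# Assumed toy model: B varies across experiments; C and D each depend on B only.
b = [random.gauss(0.0, 1.0) for _ in range(n_experiments)]
c = [x + random.gauss(0.0, 0.3) for x in b]
d = [x + random.gauss(0.0, 0.3) for x in b]

rho_bc = pearson(b, c)  # direct dependence
rho_cd = pearson(c, d)  # no direct dependence, yet strongly correlated
print(round(rho_bc, 2), round(rho_cd, 2))
```

In this setup the C-D correlation comes out high even though neither gene influences the other, which is the indirect effect that correlation alone cannot separate from a direct interaction.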