SlideShare uma empresa Scribd logo
1 de 61
Baixar para ler offline
Mining Usage Patterns from a Repository
                of Scientific Workflows

       Claudia Diamantini, Domenico Potena, Emanuele Storti

                                  ´
                    DII, Universita Politecnica delle Marche, Ancona, Italy


                   SAC 2012, Riva del Garda, March 26-30




Emanuele Storti (UNIVPM)              Mining Usage Patterns                   SAC 2012   1 / 21
Introduction
Process mining techniques allow for extracting knowledge from event
logs, i.e. traces of running processes:
    audit trails of a workflow management system
    transaction logs of an enterprise resource planning system

Main applications:
    process discovery: what is really happening?
    conformance check: are we doing what was agreed upon?
    process extension: how can we redesign the process?




  Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012   2 / 21
Introduction
Process mining techniques allow for extracting knowledge from event
logs, i.e. traces of running processes:
    audit trails of a workflow management system
    transaction logs of an enterprise resource planning system

Main applications:
    process discovery: what is really happening?
    conformance check: are we doing what was agreed upon?
    process extension: how can we redesign the process?




  Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012   2 / 21
Introduction
Process mining techniques allow for extracting knowledge from event
logs, i.e. traces of running processes:
    audit trails of a workflow management system
    transaction logs of an enterprise resource planning system

Main applications:
    process discovery: what is really happening?
    conformance check: are we doing what was agreed upon?
    process extension: how can we redesign the process?




  Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012   2 / 21
Motivation
Little work about how to extract knowledge from a set of models (i.e.:
process schemas)




    understanding of common patterns of usage (best/worst
    practices)
    support in process design
    process integration (from multiple sources)
    process optimization/re-engineering
    indexing of processes and their retrieval
  Emanuele Storti (UNIVPM)   Mining Usage Patterns           SAC 2012   3 / 21
Methodology


Processes schemas have an inherent graph structure: many
graph-mining tools available

Approach
 1     representation of original processes into graphs
 2     usage of graph-mining techniques to extract SUBs


Which representation form?
Which specific technique?
Graph clustering techniques (sub-processes are clusters) with a
hierarchical approach (various levels of abstractions, more/less
specificity)



     Emanuele Storti (UNIVPM)   Mining Usage Patterns     SAC 2012   4 / 21
Methodology


Processes schemas have an inherent graph structure: many
graph-mining tools available

Approach
 1     representation of original processes into graphs
 2     usage of graph-mining techniques to extract SUBs


Which representation form?
Which specific technique?
Graph clustering techniques (sub-processes are clusters) with a
hierarchical approach (various levels of abstractions, more/less
specificity)



     Emanuele Storti (UNIVPM)   Mining Usage Patterns     SAC 2012   4 / 21
Methodology


Processes schemas have an inherent graph structure: many
graph-mining tools available

Approach
 1     representation of original processes into graphs
 2     usage of graph-mining techniques to extract SUBs


Which representation form?
Which specific technique?
Graph clustering techniques (sub-processes are clusters) with a
hierarchical approach (various levels of abstractions, more/less
specificity)



     Emanuele Storti (UNIVPM)   Mining Usage Patterns     SAC 2012   4 / 21
Methodology


Processes schemas have an inherent graph structure: many
graph-mining tools available

Approach
 1     representation of original processes into graphs
 2     usage of graph-mining techniques to extract SUBs


Which representation form?
Which specific technique?
Graph clustering techniques (sub-processes are clusters) with a
hierarchical approach (various levels of abstractions, more/less
specificity)



     Emanuele Storti (UNIVPM)   Mining Usage Patterns     SAC 2012   4 / 21
SUBDUE


SUBDUE [Joyner, 2000]
    graph-based hierarchical clustering algorithm
    suited for discrete-valued and structured data
    searches for substructures (i.e., subgraphs) that best compress
    the input graph, according to MDL


Comments:
    to compress G with a SUB: to replace every occurrence of SUB in
    G with a single new node
    SUB best compresses G: num. of bits needed to represent G after
    the compression is minimum



  Emanuele Storti (UNIVPM)   Mining Usage Patterns         SAC 2012   5 / 21
SUBDUE


SUBDUE [Joyner, 2000]
    graph-based hierarchical clustering algorithm
    suited for discrete-valued and structured data
    searches for substructures (i.e., subgraphs) that best compress
    the input graph, according to MDL


Comments:
    to compress G with a SUB: to replace every occurrence of SUB
    in G with a single new node
    SUB best compresses G: num. of bits needed to represent G after
    the compression is minimum



  Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012   5 / 21
SUBDUE


SUBDUE [Joyner, 2000]
    graph-based hierarchical clustering algorithm
    suited for discrete-valued and structured data
    searches for substructures (i.e., subgraphs) that best compress
    the input graph, according to MDL


Comments:
    to compress G with a SUB: to replace every occurrence of SUB in
    G with a single new node
    SUB best compresses G: num. of bits needed to represent G
    after the compression is minimum



  Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012   5 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)

 1     searches for the best SUB (i.e., that minimizes the Description
       Length of the graph)
 2     compresses the graph by using the best SUB




Subsequent SUBs may be defined in terms of previously defined
SUBs. Iterations of this basic step results in a lattice

     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012    6 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
                                         interClusterDistance
                             quality =   intraClusterDistance

Desirable features of a good clustering:
    few and big clusters (large coverage, good generality)
    minimal/no overlap (better defined concepts)

New indexes
    completeness: % of original nodes/edges still present in the final
    lattice
    representativeness (of a SUB): % of input graphs holding the
    SUB at least once
    significance: qualitative measure of the meaningfulness of a
    cluster w.r.t. the domain and application

  Emanuele Storti (UNIVPM)         Mining Usage Patterns        SAC 2012   7 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
                                         interClusterDistance
                             quality =   intraClusterDistance

Desirable features of a good clustering:
    few and big clusters (large coverage, good generality)
    minimal/no overlap (better defined concepts)

New indexes
    completeness: % of original nodes/edges still present in the final
    lattice
    representativeness (of a SUB): % of input graphs holding the
    SUB at least once
    significance: qualitative measure of the meaningfulness of a
    cluster w.r.t. the domain and application

  Emanuele Storti (UNIVPM)         Mining Usage Patterns        SAC 2012   7 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
                                         interClusterDistance
                             quality =   intraClusterDistance

Desirable features of a good clustering:
    few and big clusters (large coverage, good generality)
    minimal/no overlap (better defined concepts)

New indexes
    completeness: % of original nodes/edges still present in the final
    lattice
    representativeness (of a SUB): % of input graphs holding the
    SUB at least once
    significance: qualitative measure of the meaningfulness of a
    cluster w.r.t. the domain and application

  Emanuele Storti (UNIVPM)         Mining Usage Patterns        SAC 2012   7 / 21
SUBDUE (cont’d)
Traditional cluster evaluation is not applicable to hierarchic domains
                                         interClusterDistance
                             quality =   intraClusterDistance

Desirable features of a good clustering:
    few and big clusters (large coverage, good generality)
    minimal/no overlap (better defined concepts)

New indexes
    completeness: % of original nodes/edges still present in the final
    lattice
    representativeness (of a SUB): % of input graphs holding the
    SUB at least once
    significance: qualitative measure of the meaningfulness of a
    cluster w.r.t. the domain and application

  Emanuele Storti (UNIVPM)         Mining Usage Patterns        SAC 2012   7 / 21
Representation

Approach
 1     representation of original processes into graphs
 2     usage of graph-mining techniques to extract SUBs




Common operators:
       sequence
       split-AND (parallel split), split-XOR (exclusive choice)
       join-AND (synchronization), join-XOR (simple merge)
How to map the process to the graph? Several choices
     Emanuele Storti (UNIVPM)   Mining Usage Patterns             SAC 2012   8 / 21
Representation models: A rule




   lowest level of compactness
   every operator is represented by a node
   called operator, linked to another node
   specifying its type (AND/OR/SEQ) and to
   its operands
   join/split are distinguishable by the
   number of in/out-going arcs


  Emanuele Storti (UNIVPM)    Mining Usage Patterns   SAC 2012   9 / 21
Representation models: B rule




   medium level of compactness
   the closest to the original representation
   join/split are replaced by different nodes,
   one for each kind of operator




  Emanuele Storti (UNIVPM)   Mining Usage Patterns   SAC 2012   10 / 21
Representation models: C rule




   highest level of compactness
   implicit representation of operators
   multiple alternative graphs in case of
   SPLIT-XOR or JOIN-XOR
   arcs are labeled to preserve information
   about domain/range nodes



  Emanuele Storti (UNIVPM)   Mining Usage Patterns   SAC 2012   11 / 21
Evaluation
Comments
    the three representations hold the same information, with no loss
    easy extension with more attributes (actor name/role, info about
    time/resources, ...):
            attribute name → edge
            value → a node




Need of experimentations to assess which one performs best w.r.t.
the quality of clustering.
  Emanuele Storti (UNIVPM)      Mining Usage Patterns      SAC 2012   12 / 21
Evaluation
Comments
    the three representations hold the same information, with no loss
    easy extension with more attributes (actor name/role, info about
    time/resources, ...):
            attribute name → edge
            value → a node




Need of experimentations to assess which one performs best w.r.t.
the quality of clustering.
  Emanuele Storti (UNIVPM)      Mining Usage Patterns      SAC 2012   12 / 21
Evaluation
Comments
    the three representations hold the same information, with no loss
    easy extension with more attributes (actor name/role, info about
    time/resources, ...):
            attribute name → edge
            value → a node




Need of experimentations to assess which one performs best w.r.t.
the quality of clustering.
  Emanuele Storti (UNIVPM)      Mining Usage Patterns      SAC 2012   12 / 21
my
     Experiment
Dataset:
       258 processes (XScufl language for Taverna framework)
       e-Science WF for distributed computation over scientific data
       applications: local tools, Web Services, scripts



Setting:
      WF extraction/parsing
      preprocessing
      translation into graphs (A,B,C
      rules)
      SUBDUE execution



     Emanuele Storti (UNIVPM)   Mining Usage Patterns        SAC 2012   13 / 21
Experimental results

Output lattice (fragment)




Red boxes = top level SUBs



  Emanuele Storti (UNIVPM)   Mining Usage Patterns   SAC 2012   14 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Experimental results

                                                         A rule    B rule    C rule
 Size (nodes)                                             5618     2871      2414
 SUBs (num)                                                671      611       580
 top SUBs                                                2.24%    52.37%    52.93%
 Completeness                                           95.41%    90.90%    90.27%
 Representativeness (first 10 top SUBs)                  99.22%    31.78%    23.26%
 Significance of top level clusters                          –        +        ++
 Execution time (hh:mm)                                  03:38     00:06     00:03

Considerations:
    A model: low significance, high execution time
    No definitively best choice between B and C:
            B model: higher representativeness → indexing
            C model: higher significance → usage pattern discovery


  Emanuele Storti (UNIVPM)      Mining Usage Patterns                   SAC 2012   15 / 21
Use Case


Scenario: a scientist wants to use web service X for DNA alignment

Baseline: browse my Experiment to find every workflow with X:
       too many results
       given a workflow: common practice or exception?

New approach: search for X in the lattice’s SUBs:
 1     find all usage patterns for X (i.e., top SUBs containing X)
 2     analyse their representativeness
 3     choose one of them or browse more in depth the lattice




     Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   16 / 21
Use Case


Scenario: a scientist wants to use web service X for DNA alignment

Baseline: browse my Experiment to find every workflow with X:
       too many results
       given a workflow: common practice or exception?

New approach: search for X in the lattice’s SUBs:
 1     find all usage patterns for X (i.e., top SUBs containing X)
 2     analyse their representativeness
 3     choose one of them or browse more in depth the lattice




     Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   16 / 21
Use Case


Scenario: a scientist wants to use web service X for DNA alignment

Baseline: browse my Experiment to find every workflow with X:
       too many results
       given a workflow: common practice or exception?

New approach: search for X in the lattice’s SUBs:
 1     find all usage patterns for X (i.e., top SUBs containing X)
 2     analyse their representativeness
 3     choose one of them or browse more in depth the lattice




     Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   16 / 21
Use Case


Scenario: a scientist wants to use web service X for DNA alignment

Baseline: browse my Experiment to find every workflow with X:
       too many results
       given a workflow: common practice or exception?

New approach: search for X in the lattice’s SUBs:
 1     find all usage patterns for X (i.e., top SUBs containing X)
 2     analyse their representativeness
 3     choose one of them or browse more in depth the lattice




     Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   16 / 21
Use Case


Scenario: a scientist wants to use web service X for DNA alignment

Baseline: browse my Experiment to find every workflow with X:
       too many results
       given a workflow: common practice or exception?

New approach: search for X in the lattice’s SUBs:
 1     find all usage patterns for X (i.e., top SUBs containing X)
 2     analyse their representativeness
 3     choose one of them or browse more in depth the lattice




     Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   16 / 21
Use Case


Scenario: a scientist wants to use web service X for DNA alignment

Baseline: browse my Experiment to find every workflow with X:
       too many results
       given a workflow: common practice or exception?

New approach: search for X in the lattice’s SUBs:
 1     find all usage patterns for X (i.e., top SUBs containing X)
 2     analyse their representativeness
 3     choose one of them or browse more in depth the lattice




     Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   16 / 21
Use case (cont’d)




Applications:
    reference during process design (learn-by-example)
    process retrieval: find every my Experiment WF containing such a
    pattern

  Emanuele Storti (UNIVPM)   Mining Usage Patterns       SAC 2012   17 / 21
Use case (cont’d)




Applications:
    reference during process design (learn-by-example)
    process retrieval: find every my Experiment WF containing such a
    pattern

  Emanuele Storti (UNIVPM)   Mining Usage Patterns       SAC 2012   17 / 21
Discussion



A graph-based clustering technique (SUBDUE) to recognize the most
frequent/common subprocesses among a set of workflows/process
schemas.

Comments
    several representation alternatives for application of SUBDUE
    evaluations on specialized e-Science domain
    useful to find typical patterns and schemas of usage for tools/tasks




  Emanuele Storti (UNIVPM)   Mining Usage Patterns         SAC 2012   18 / 21
Conclusion

Applications:
    recognition of reference processes (common/best/worst
    practices)
    organize a process repository (by indexing processes through
    substructures)
    enterprise integration: find similarities (differences, overlapping,
    complementariness) in BP in different companies


Future work:
    extending experimentations, especially with (real) Business
    Processes
    management of heterogeneity (syntactic/semantic, level of
    granularity)

  Emanuele Storti (UNIVPM)   Mining Usage Patterns          SAC 2012   19 / 21
Mining Usage Patterns from a Repository
                of Scientific Workflows

       Claudia Diamantini, Domenico Potena, Emanuele Storti

                                  ´
                    DII, Universita Politecnica delle Marche, Ancona, Italy


                   SAC 2012, Riva del Garda, March 26-30




Emanuele Storti (UNIVPM)              Mining Usage Patterns                   SAC 2012   20 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints

 2     Compress the graph with the best SUB




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE
Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

 1     Search the best SUB
               Init: set of SUBs consisting of all uniquely labeled vertices
               extend each SUB in all possible ways by a single edge and a vertex
               order SUBs according to MDL principle
               keep only the top n SUBs
               End: exhaustion of search space or user constraints

 2     Compress the graph with the best SUB




     Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21
SUBDUE

Subdue uses a variant of beam search (heuristic)
Best SUB minimizes the Description Length of the graph

  1     Search the best SUB
                Init: set of SUBs consisting of all uniquely labeled vertices
                extend each SUB in all possible ways by a single edge and a vertex
                order SUBs according to MDL principle
                keep only the top n SUBs
                End: exhaustion of search space or user constraints

  2     Compress the graph with the best SUB


Subsequent SUBs may be defined in terms of previously defined SUBs
Iterations of this basic step results in a lattice

      Emanuele Storti (UNIVPM)       Mining Usage Patterns           SAC 2012   21 / 21

Mais conteúdo relacionado

Destaque

Promo Gadget Natale 2011
Promo Gadget Natale 2011Promo Gadget Natale 2011
Promo Gadget Natale 2011e.20 srl
 
Merchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Merchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVOMerchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Merchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVOe.20 srl
 
Fiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Fiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVOFiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Fiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVOe.20 srl
 
E20 Brochure 2010
E20 Brochure 2010E20 Brochure 2010
E20 Brochure 2010e.20 srl
 
Presentazione Golf e.20 srl
Presentazione Golf  e.20 srl Presentazione Golf  e.20 srl
Presentazione Golf e.20 srl e.20 srl
 

Destaque (6)

Presentation - Recreating Treasures
Presentation - Recreating TreasuresPresentation - Recreating Treasures
Presentation - Recreating Treasures
 
Promo Gadget Natale 2011
Promo Gadget Natale 2011Promo Gadget Natale 2011
Promo Gadget Natale 2011
 
Merchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Merchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVOMerchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Merchandising 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
 
Fiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Fiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVOFiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
Fiere 2012 - LA VOSTRA IMMAGINE, IL NOSTRO OBIETTIVO
 
E20 Brochure 2010
E20 Brochure 2010E20 Brochure 2010
E20 Brochure 2010
 
Presentazione Golf e.20 srl
Presentazione Golf  e.20 srl Presentazione Golf  e.20 srl
Presentazione Golf e.20 srl
 

Semelhante a Mining Usage Patterns from a Repository of Scientific Workflow

Compositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing GradientsCompositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing GradientsRoberto Casadei
 
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters  Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters Nakul Sharma
 
Woop - Workflow Optimizer
Woop - Workflow OptimizerWoop - Workflow Optimizer
Woop - Workflow OptimizerMartin Homik
 
Complex Background Subtraction Using Kalman Filter
Complex Background Subtraction Using Kalman FilterComplex Background Subtraction Using Kalman Filter
Complex Background Subtraction Using Kalman FilterIJERA Editor
 
Easy edd phd talks 28 oct 2008
Easy edd phd talks 28 oct 2008Easy edd phd talks 28 oct 2008
Easy edd phd talks 28 oct 2008Taha Sochi
 
Generic Model-based Approaches for Software Reverse Engineering and Comprehen...
Generic Model-based Approaches for Software Reverse Engineering and Comprehen...Generic Model-based Approaches for Software Reverse Engineering and Comprehen...
Generic Model-based Approaches for Software Reverse Engineering and Comprehen...Hugo Bruneliere
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniquesJaeJun Yoo
 
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...VLSICS Design
 
IRJET- Advanced Control Strategies for Mold Level Process
IRJET- Advanced Control Strategies for Mold Level ProcessIRJET- Advanced Control Strategies for Mold Level Process
IRJET- Advanced Control Strategies for Mold Level ProcessIRJET Journal
 
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...VLSICS Design
 
5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...
5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...
5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...vezzoli
 
IRJET - Conversion of Ancient Tamil Characters to Modern Tamil Characters
IRJET -  	  Conversion of Ancient Tamil Characters to Modern Tamil CharactersIRJET -  	  Conversion of Ancient Tamil Characters to Modern Tamil Characters
IRJET - Conversion of Ancient Tamil Characters to Modern Tamil CharactersIRJET Journal
 

Semelhante a Mining Usage Patterns from a Repository of Scientific Workflow (20)

NX QUESTIONS
NX QUESTIONSNX QUESTIONS
NX QUESTIONS
 
Bf36342346
Bf36342346Bf36342346
Bf36342346
 
Compositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing GradientsCompositional Blocks for Optimal Self-Healing Gradients
Compositional Blocks for Optimal Self-Healing Gradients
 
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters  Visualizing UML’s Sequence and   Class Diagrams Using Graph-Based Clusters
Visualizing UML’s Sequence and Class Diagrams Using Graph-Based Clusters
 
Woop - Workflow Optimizer
Woop - Workflow OptimizerWoop - Workflow Optimizer
Woop - Workflow Optimizer
 
Complex Background Subtraction Using Kalman Filter
Complex Background Subtraction Using Kalman FilterComplex Background Subtraction Using Kalman Filter
Complex Background Subtraction Using Kalman Filter
 
N046018089
N046018089N046018089
N046018089
 
Easy edd phd talks 28 oct 2008
Easy edd phd talks 28 oct 2008Easy edd phd talks 28 oct 2008
Easy edd phd talks 28 oct 2008
 
CSMR10a.ppt
CSMR10a.pptCSMR10a.ppt
CSMR10a.ppt
 
Generic Model-based Approaches for Software Reverse Engineering and Comprehen...
Generic Model-based Approaches for Software Reverse Engineering and Comprehen...Generic Model-based Approaches for Software Reverse Engineering and Comprehen...
Generic Model-based Approaches for Software Reverse Engineering and Comprehen...
 
Pass design
Pass designPass design
Pass design
 
Usability Studies in Virtual and Traditional Computer Aided Design Environmen...
Usability Studies in Virtual and Traditional Computer Aided Design Environmen...Usability Studies in Virtual and Traditional Computer Aided Design Environmen...
Usability Studies in Virtual and Traditional Computer Aided Design Environmen...
 
[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques[CVPR2020] Simple but effective image enhancement techniques
[CVPR2020] Simple but effective image enhancement techniques
 
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
 
Distributed Streams
Distributed StreamsDistributed Streams
Distributed Streams
 
Adaptive13 test
Adaptive13 testAdaptive13 test
Adaptive13 test
 
IRJET- Advanced Control Strategies for Mold Level Process
IRJET- Advanced Control Strategies for Mold Level ProcessIRJET- Advanced Control Strategies for Mold Level Process
IRJET- Advanced Control Strategies for Mold Level Process
 
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
A Detailed Survey on VLSI Architectures for Lifting based DWT for efficient h...
 
5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...
5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...
5.1 Sustainability Oriented System Design Methods And Tools Vezzoli 07 08 (28...
 
IRJET - Conversion of Ancient Tamil Characters to Modern Tamil Characters
IRJET -  	  Conversion of Ancient Tamil Characters to Modern Tamil CharactersIRJET -  	  Conversion of Ancient Tamil Characters to Modern Tamil Characters
IRJET - Conversion of Ancient Tamil Characters to Modern Tamil Characters
 

Último

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Mining Usage Patterns from a Repository of Scientific Workflow

  • 1. Mining Usage Patterns from a Repository of Scientific Workflows Claudia Diamantini, Domenico Potena, Emanuele Storti ´ DII, Universita Politecnica delle Marche, Ancona, Italy SAC 2012, Riva del Garda, March 26-30 Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 1 / 21
  • 2. Introduction Process mining techniques allow for extracting knowledge from event logs, i.e. traces of running processes: audit trails of a workflow management system transaction logs of an enterprise resource planning system Main applications: process discovery: what is really happening? conformance check: are we doing what was agreed upon? process extension: how can we redesign the process? Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
  • 3. Introduction Process mining techniques allow for extracting knowledge from event logs, i.e. traces of running processes: audit trails of a workflow management system transaction logs of an enterprise resource planning system Main applications: process discovery: what is really happening? conformance check: are we doing what was agreed upon? process extension: how can we redesign the process? Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
  • 4. Introduction Process mining techniques allow for extracting knowledge from event logs, i.e. traces of running processes: audit trails of a workflow management system transaction logs of an enterprise resource planning system Main applications: process discovery: what is really happening? conformance check: are we doing what was agreed upon? process extension: how can we redesign the process? Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 2 / 21
  • 5. Motivation Little work about how to extract knowledge from a set of models (i.e.: process schemas) understanding of common patterns of usage (best/worst practices) support in process design process integration (from multiple sources) process optimization/re-engineering indexing of processes and their retrieval Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 3 / 21
  • 6. Methodology Processes schemas have an inherent graph structure: many graph-mining tools available Approach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBs Which representation form? Which specific technique? Graph clustering techniques (sub-processes are clusters) with a hierarchical approach (various levels of abstractions, more/less specificity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
  • 7. Methodology Processes schemas have an inherent graph structure: many graph-mining tools available Approach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBs Which representation form? Which specific technique? Graph clustering techniques (sub-processes are clusters) with a hierarchical approach (various levels of abstractions, more/less specificity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
  • 8. Methodology Processes schemas have an inherent graph structure: many graph-mining tools available Approach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBs Which representation form? Which specific technique? Graph clustering techniques (sub-processes are clusters) with a hierarchical approach (various levels of abstractions, more/less specificity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
  • 9. Methodology Processes schemas have an inherent graph structure: many graph-mining tools available Approach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBs Which representation form? Which specific technique? Graph clustering techniques (sub-processes are clusters) with a hierarchical approach (various levels of abstractions, more/less specificity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 4 / 21
  • 10. SUBDUE SUBDUE [Joyner, 2000] graph-based hierarchical clustering algorithm suited for discrete-valued and structured data searches for substructures (i.e., subgraphs) that best compress the input graph, according to MDL Comments: to compress G with a SUB: to replace every occurrence of SUB in G with a single new node SUB best compresses G: num. of bits needed to represent G after the compression is minimum Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
  • 11. SUBDUE SUBDUE [Joyner, 2000] graph-based hierarchical clustering algorithm suited for discrete-valued and structured data searches for substructures (i.e., subgraphs) that best compress the input graph, according to MDL Comments: to compress G with a SUB: to replace every occurrence of SUB in G with a single new node SUB best compresses G: num. of bits needed to represent G after the compression is minimum Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
  • 12. SUBDUE SUBDUE [Joyner, 2000] graph-based hierarchical clustering algorithm suited for discrete-valued and structured data searches for substructures (i.e., subgraphs) that best compress the input graph, according to MDL Comments: to compress G with a SUB: to replace every occurrence of SUB in G with a single new node SUB best compresses G: num. of bits needed to represent G after the compression is minimum Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 5 / 21
  • 13. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 14. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 15. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 16. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 17. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 18. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 19. SUBDUE (cont’d) 1 searches for the best SUB (i.e., that minimizes the Description Length of the graph) 2 compresses the graph by using the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs. Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 6 / 21
  • 20. SUBDUE (cont’d) Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistance Desirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better defined concepts) New indexes completeness: % of original nodes/edges still present in the final lattice representativeness (of a SUB): % of input graphs holding the SUB at least once significance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
  • 21. SUBDUE (cont’d) Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistance Desirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better defined concepts) New indexes completeness: % of original nodes/edges still present in the final lattice representativeness (of a SUB): % of input graphs holding the SUB at least once significance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
  • 22. SUBDUE (cont’d) Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistance Desirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better defined concepts) New indexes completeness: % of original nodes/edges still present in the final lattice representativeness (of a SUB): % of input graphs holding the SUB at least once significance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
  • 23. SUBDUE (cont’d) Traditional cluster evaluation is not applicable to hierarchic domains interClusterDistance quality = intraClusterDistance Desirable features of a good clustering: few and big clusters (large coverage, good generality) minimal/no overlap (better defined concepts) New indexes completeness: % of original nodes/edges still present in the final lattice representativeness (of a SUB): % of input graphs holding the SUB at least once significance: qualitative measure of the meaningfulness of a cluster w.r.t. the domain and application Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 7 / 21
  • 24. Representation Approach 1 representation of original processes into graphs 2 usage of graph-mining techniques to extract SUBs Common operators: sequence split-AND (parallel split), split-XOR (exclusive choice) join-AND (synchronization), join-XOR (simple merge) How to map the process to the graph? Several choices Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 8 / 21
  • 25. Representation models: A rule lowest level of compactness every operator is represented by a node called operator, linked to another node specifying its type (AND/OR/SEQ) and to its operands join/split are distinguishable by the number of in/out-going arcs Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 9 / 21
  • 26. Representation models: B rule medium level of compactness the closest to the original representation join/split are replaced by different nodes, one for each kind of operator Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 10 / 21
  • 27. Representation models: C rule highest level of compactness implicit representation of operators multiple alternative graphs in case of SPLIT-XOR or JOIN-XOR arcs are labeled to preserve information about domain/range nodes Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 11 / 21
  • 28. Evaluation Comments the three representations hold the same information, with no loss easy extension with more attributes (actor name/role, info about time/resources, ...): attribute name → edge value → a node Need of experimentations to assess which one performs best w.r.t. the quality of clustering. Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
  • 29. Evaluation Comments the three representations hold the same information, with no loss easy extension with more attributes (actor name/role, info about time/resources, ...): attribute name → edge value → a node Need of experimentations to assess which one performs best w.r.t. the quality of clustering. Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
  • 30. Evaluation Comments the three representations hold the same information, with no loss easy extension with more attributes (actor name/role, info about time/resources, ...): attribute name → edge value → a node Need of experimentations to assess which one performs best w.r.t. the quality of clustering. Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 12 / 21
  • 31. my Experiment Dataset: 258 processes (XScufl language for Taverna framework) e-Science WF for distributed computation over scientific data applications: local tools, Web Services, scripts Setting: WF extraction/parsing preprocessing translation into graphs (A,B,C rules) SUBDUE execution Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 13 / 21
  • 32. Experimental results Output lattice (fragment) Red boxes = top level SUBs Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 14 / 21
  • 33. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 34. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 35. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 36. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 37. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 38. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 39. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 40. Experimental results A rule B rule C rule Size (nodes) 5618 2871 2414 SUBs (num) 671 611 580 top SUBs 2.24% 52.37% 52.93% Completeness 95.41% 90.90% 90.27% Representativeness (first 10 top SUBs) 99.22% 31.78% 23.26% Significance of top level clusters – + ++ Execution time (hh:mm) 03:38 00:06 00:03 Considerations: A model: low significance, high execution time No definitively best choice between B and C: B model: higher representativeness → indexing C model: higher significance → usage pattern discovery Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 15 / 21
  • 41. Use Case Scenario: a scientist wants to use web service X for DNA alignment Baseline: browse my Experiment to find every workflow with X: too many results given a workflow: common practice or exception? New approach: search for X in the lattice’s SUBs: 1 find all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
  • 42. Use Case Scenario: a scientist wants to use web service X for DNA alignment Baseline: browse my Experiment to find every workflow with X: too many results given a workflow: common practice or exception? New approach: search for X in the lattice’s SUBs: 1 find all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
  • 43. Use Case Scenario: a scientist wants to use web service X for DNA alignment Baseline: browse my Experiment to find every workflow with X: too many results given a workflow: common practice or exception? New approach: search for X in the lattice’s SUBs: 1 find all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
  • 44. Use Case Scenario: a scientist wants to use web service X for DNA alignment Baseline: browse my Experiment to find every workflow with X: too many results given a workflow: common practice or exception? New approach: search for X in the lattice’s SUBs: 1 find all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
  • 45. Use Case Scenario: a scientist wants to use web service X for DNA alignment Baseline: browse my Experiment to find every workflow with X: too many results given a workflow: common practice or exception? New approach: search for X in the lattice’s SUBs: 1 find all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
  • 46. Use Case Scenario: a scientist wants to use web service X for DNA alignment Baseline: browse my Experiment to find every workflow with X: too many results given a workflow: common practice or exception? New approach: search for X in the lattice’s SUBs: 1 find all usage patterns for X (i.e., top SUBs containing X) 2 analyse their representativeness 3 choose one of them or browse more in depth the lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 16 / 21
  • 47. Use case (cont’d) Applications: reference during process design (learn-by-example) process retrieval: find every my Experiment WF containing such a pattern Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 17 / 21
  • 48. Use case (cont’d) Applications: reference during process design (learn-by-example) process retrieval: find every my Experiment WF containing such a pattern Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 17 / 21
  • 49. Discussion A graph-based clustering technique (SUBDUE) to recognize the most frequent/common subprocesses among a set of workflows/process schemas. Comments several representation alternatives for application of SUBDUE evaluations on specialized e-Science domain useful to find typical patterns and schemas of usage for tools/tasks Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 18 / 21
  • 50. Conclusion Applications: recognition of reference processes (common/best/worst practices) organize a process repository (by indexing processes through substructures) enterprise integration: find similarities (differences, overlapping, complementariness) in BP in different companies Future work: extending experimentations, especially with (real) Business Processes management of heterogeneity (syntactic/semantic, level of granularity) Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 19 / 21
  • 51. Mining Usage Patterns from a Repository of Scientific Workflows Claudia Diamantini, Domenico Potena, Emanuele Storti ´ DII, Universita Politecnica delle Marche, Ancona, Italy SAC 2012, Riva del Garda, March 26-30 Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 20 / 21
  • 52. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 53. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 54. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 55. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 56. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 57. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 58. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 59. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints 2 Compress the graph with the best SUB Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 60. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints 2 Compress the graph with the best SUB Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21
  • 61. SUBDUE Subdue uses a variant of beam search (heuristic) Best SUB minimizes the Description Length of the graph 1 Search the best SUB Init: set of SUBs consisting of all uniquely labeled vertices extend each SUB in all possible ways by a single edge and a vertex order SUBs according to MDL principle keep only the top n SUBs End: exhaustion of search space or user constraints 2 Compress the graph with the best SUB Subsequent SUBs may be defined in terms of previously defined SUBs Iterations of this basic step results in a lattice Emanuele Storti (UNIVPM) Mining Usage Patterns SAC 2012 21 / 21