Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland


Process Mining for ERP Systems
Process Discovery


 event log  →  process discovery algorithm  →  process model

 example log
 c1: A B C D E
 c2: A C B D E
 c3: A F D E
 …

 assumptions
 • case = sequence of events of this case
 • cases are isolated:
   event A in c1 happens only in c1 (and not in c2)
 • cases of the same process
 • one unique case id,
 • each event associated to exactly one case id



                                                              PAGE 1
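The example log on this slide can be made concrete with a minimal sketch. Assuming each case is simply a list of activity names, the snippet below counts the directly-follows relation, which is a common starting point for discovery algorithms (not necessarily the algorithm the authors have in mind). The names `log` and `directly_follows` are illustrative and do not come from the slides.

```python
from collections import Counter

# Event log from the slide: case id -> sequence of events of this case.
log = {
    "c1": ["A", "B", "C", "D", "E"],
    "c2": ["A", "C", "B", "D", "E"],
    "c3": ["A", "F", "D", "E"],
}

def directly_follows(log):
    """Count how often activity x is directly followed by activity y."""
    counts = Counter()
    for trace in log.values():
        # Pair every event with its immediate successor in the same case.
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += 1
    return counts

dfg = directly_follows(log)
for (x, y), n in sorted(dfg.items()):
    print(f"{x} -> {y}: {n}")
```

The sketch relies on exactly the assumptions listed on the slide: one case id per event, and no event shared between cases.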
Typical Process in an ERP System

 [figure: Build to Order. Customers Alice and Bob order product X and product Y
  from the Manufacturer; the Manufacturer orders the required materials
  (Material A, B, C) from the suppliers ACME Inc. and Mega Corp.]

                                                               PAGE 2
n-to-m relations  database


 process discovery algorithm  →  process model

 ProductOrder (id attributes: poID, cust.; time-stamp attributes: created … shipped)
 poID   cust.   …   created       processed     built         shipped
 po1    Alice       30-08 9:22    30-08 13:12   01-09 15:12   03-09 10:15
 po2    Bob         30-08 10:15   30-08 13:14   01-09 16:13   03-09 17:18

 Customer (data attributes)
 cust.   address   …
 Alice   …         …
 Bob     …         …

 OrderedMaterial (relations: poID, moID)
 poID   moID   type   added
 po1    mo3    B      30-08 13:13
 po1    mo4    A      30-08 13:14
 po2    mo3    B      30-08 13:15
 po2    mo4    C      30-08 13:16

 MaterialOrder (id attribute: moID)
 moID   suppl.   …   completed     sent          received
 mo3    ACME         30-08 13:15   30-08 14:15   01-09 9:05
 mo4    MEGA         30-08 13:17   30-08 16:12   01-09 10:13
                                                                                                  PAGE 3
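To see the n-to-m relation in the example data, here is a small sketch using Python's built-in `sqlite3` module. The table and column names follow the slide, but the column types and the choice of an in-memory SQLite database are assumptions made only for illustration. Time stamps are stored as the plain strings shown on the slide.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ProductOrder    (poID TEXT PRIMARY KEY, cust TEXT);
CREATE TABLE MaterialOrder   (moID TEXT PRIMARY KEY, suppl TEXT);
CREATE TABLE OrderedMaterial (poID TEXT, moID TEXT, type TEXT, added TEXT);
INSERT INTO ProductOrder  VALUES ('po1','Alice'), ('po2','Bob');
INSERT INTO MaterialOrder VALUES ('mo3','ACME'), ('mo4','MEGA');
INSERT INTO OrderedMaterial VALUES
  ('po1','mo3','B','30-08 13:13'), ('po1','mo4','A','30-08 13:14'),
  ('po2','mo3','B','30-08 13:15'), ('po2','mo4','C','30-08 13:16');
""")

# How many material orders does each product order refer to?
mo_per_po = con.execute(
    "SELECT poID, COUNT(DISTINCT moID) FROM OrderedMaterial "
    "GROUP BY poID ORDER BY poID"
).fetchall()

# And, in the other direction, how many product orders per material order?
po_per_mo = con.execute(
    "SELECT moID, COUNT(DISTINCT poID) FROM OrderedMaterial "
    "GROUP BY moID ORDER BY moID"
).fetchall()

print(mo_per_po)
print(po_per_mo)
```

Both directions yield a count of 2 per identifier, so neither poID nor moID alone partitions the `added` events into isolated cases.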
Process Discovery for ERP Systems


 process discovery algorithm  →  process model

 reality: data in a relational DB
 • events stored as time-stamped attributes in tables
 • multiple primary keys → multiple notions of case
 • tables are related → one event related to multiple cases

 schema:
   Customer       (cust, …)
   ProductOrder   (poID, cust, created, processed, built, shipped)
   OrderedMat.    (poID, moID, type, added)
   MaterialOrder  (moID, supplier, completed, sent, received)

   Customer 1 : 0..* ProductOrder
   ProductOrder 1 : 1..* OrderedMat.
   OrderedMat. 1..* : 1 MaterialOrder

                                                                                              PAGE 4
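A short sketch of what "one event related to multiple cases" means for the example data: the same `added` events are grouped once by poID and once by moID. The helper name `group_events` and the tuple layout are assumptions for illustration only.

```python
from collections import defaultdict

# Rows of OrderedMaterial from the slides: (poID, moID, type, added).
ordered_material = [
    ("po1", "mo3", "B", "30-08 13:13"),
    ("po1", "mo4", "A", "30-08 13:14"),
    ("po2", "mo3", "B", "30-08 13:15"),
    ("po2", "mo4", "C", "30-08 13:16"),
]

def group_events(rows, key_index):
    """Group the 'added' events by the identifier at position key_index."""
    cases = defaultdict(list)
    for row in rows:
        cases[row[key_index]].append(("added", row[3]))
    return dict(cases)

by_product_order = group_events(ordered_material, 0)   # case notion: poID
by_material_order = group_events(ordered_material, 1)  # case notion: moID

print(by_product_order["po1"])
print(by_material_order["mo3"])
```

The event with time stamp 30-08 13:13 shows up both in case po1 and in case mo3, which violates the "cases are isolated" assumption of classical process discovery.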
Outline


 [figure: overview of the approach]
 1. decompose by primary keys:  log f. PO, log f. MO
 2. discovery (per log):  model f. PO, model f. MO
 3. models related by primary/foreign-key relations:  process model
                                                                        PAGE 6
Find Artifact Schemas


                                                                        PAGE 7
Step 0: discover database schema

 documented schema vs. actual schema → identify
 • column types (esp. time-stamped columns)
 • primary keys
 • foreign keys
 various (non-trivial) techniques available
 key discovery is NP-complete in the size of the table(s)
 result:




                                                PAGE 8
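As a rough illustration of what key and foreign-key discovery involve, the following brute-force sketch checks column combinations for uniqueness and tests a single-column inclusion between two tables. It is not one of the techniques the slide refers to; the function names and the `max_size` limit are assumptions, and the exhaustive search over combinations is exactly what becomes infeasible on large tables.

```python
from itertools import combinations

def candidate_keys(columns, rows, max_size=2):
    """Return minimal column combinations whose values are unique over all rows."""
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(range(len(columns)), size):
            # Skip supersets of keys already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            values = [tuple(row[i] for i in combo) for row in rows]
            if len(set(values)) == len(values):
                keys.append(combo)
    return [tuple(columns[i] for i in k) for k in keys]

def is_foreign_key_candidate(child_values, parent_values):
    """Single-column inclusion check: every child value occurs in the parent."""
    return set(child_values) <= set(parent_values)

# OrderedMaterial rows from the earlier slide.
columns = ["poID", "moID", "type", "added"]
rows = [
    ("po1", "mo3", "B", "30-08 13:13"),
    ("po1", "mo4", "A", "30-08 13:14"),
    ("po2", "mo3", "B", "30-08 13:15"),
    ("po2", "mo4", "C", "30-08 13:16"),
]

keys = candidate_keys(columns, rows)
print(keys)
print(is_foreign_key_candidate([r[0] for r in rows], ["po1", "po2"]))
```

On these four rows, `added` and (poID, type) also come out as unique next to (poID, moID); the data alone cannot distinguish accidental keys from intended ones.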
Step 1: decompose schema into processes

= schema summarization

 find:
 1. sets of corresponding tables
 2. links between those

 ProductOrder   MaterialOrder




                                                 PAGE 9
Automatic Schema Summarization

= group similar tables through clustering

 define a distance between any 2 tables
 • by relations
 • by information content

 tables that are close to each other → same cluster
 # of clusters: user input



                                    PAGE 10
Automatic Schema Summarization


1. structural distance between tables

   fanout ~ avg. # of child records related to the same parent record

   parent table (column A): 1, 2

   child table 1     child table 2     child table 3
   A B               A B               A B
   1 X               1 X               1 X
   2 Y               1 Y               1 Y
                     2 Z
                     2 U
   fanout: 1         fanout: 2         fanout: 1 = (2+0)/2




                                                           PAGE 11
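The three fanout values on this slide can be reproduced with a few lines. This is a minimal sketch that assumes fanout is the plain average over all parent records, with unmatched parents counting as 0, as suggested by the "(2+0)/2" example; the function name `fanout` is illustrative.

```python
from collections import Counter

def fanout(parent_keys, child_keys):
    """Average number of child records per parent record (unmatched parents count as 0)."""
    per_parent = Counter(child_keys)
    return sum(per_parent[k] for k in parent_keys) / len(parent_keys)

parent = [1, 2]                      # values of column A in the parent table
print(fanout(parent, [1, 2]))        # child table 1
print(fanout(parent, [1, 1, 2, 2]))  # child table 2
print(fanout(parent, [1, 1]))        # child table 3: (2 + 0) / 2
```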
Automatic Schema Summarization


1. structural distance between tables

   fanout ~ avg. # of child records related to the same parent record
   matched fraction ~ 1 / (fraction of records in parent with matching child record)

   parent table (column A): 1, 2

   child table 1     child table 2     child table 3
   A B               A B               A B
   1 X               1 X               1 X
   2 Y               1 Y               1 Y
                     2 Z
                     2 U
   fanout: 1         fanout: 2         fanout: 1
   m.fr: 1           m.fr: 1           m.fr: 2 = 1/(1/2)




                                                                PAGE 12
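The matched fraction can be sketched in the same way. The function below follows the definition on the slide; raising an error when no parent record is matched is an assumption, since the slide does not cover that case.

```python
def matched_fraction(parent_keys, child_keys):
    """1 / (fraction of parent records that have at least one matching child record)."""
    children = set(child_keys)
    matched = sum(1 for k in parent_keys if k in children)
    if matched == 0:
        # Assumption: the measure is undefined if nothing matches.
        raise ValueError("no parent record has a matching child record")
    return len(parent_keys) / matched

parent = [1, 2]
print(matched_fraction(parent, [1, 2]))        # child table 1
print(matched_fraction(parent, [1, 1, 2, 2]))  # child table 2
print(matched_fraction(parent, [1, 1]))        # child table 3: 1 / (1/2)
```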
Grouping by Clustering

1. structural distance
2. information distance
   importance of each table = entropy (is maximal if all records are different)
   distance: 2 tables with high entropies → large distance
3. weighted distance by structure + information
4. k-means clustering: k clusters based on weighted distance
   most important table of cluster = table with least distance to all
   → key attribute of the cluster
                                                            PAGE 13
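The slide does not spell out the entropy formula. Assuming plain Shannon entropy over whole records, the sketch below shows why the value is maximal when all records are different; `table_entropy` is a hypothetical helper, not part of the authors' tool.

```python
import math
from collections import Counter

def table_entropy(rows):
    """Shannon entropy (in bits) of the distribution of distinct records."""
    counts = Counter(rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

all_different = [("po1",), ("po2",), ("po3",), ("po4",)]
two_pairs = [("B",), ("B",), ("C",), ("C",)]
all_same = [("B",)] * 4

print(table_entropy(all_different))  # log2(4) bits, the maximum for 4 records
print(table_entropy(two_pairs))
print(table_entropy(all_same))
```

How the entropies are turned into an information distance, and how that is weighted against the structural distance, is not specified on the slide and is therefore left out here.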
Artifact Schema → Artifact Log


                                                                        PAGE 14
Log Extraction

 cluster = set of related tables
           + primary key of most important table → case id
 time-stamped attribute → event
 related attributes → event attributes

 log f. PO

 poID   cust.   …   created       processed     built         shipped
 po1    Alice       30-08 9:22    30-08 13:12   01-09 15:12   03-09 10:15
 po2    Bob         30-08 10:15   30-08 13:14   01-09 16:13   03-09 17:18

 poID   moID   type   added
 po1    mo3    B      30-08 13:13
 po1    mo4    A      30-08 13:14
 po2    mo3    B      30-08 13:15
 po2    mo4    C      30-08 13:16

 po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
      (processed, poID=po1, time=30-08 13:12, …)
      (added, poID=po1, time=30-08 13:13, moID=mo3, …)
      moID=mo3 refers to artifact “MaterialOrder”
                                                                                        PAGE 19
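The extraction rule on this slide (case id from the primary key, one event per time-stamped attribute, remaining attributes as event attributes) can be sketched as follows. This is a simplified illustration on the slide's data, not the authors' implementation: the set `TIMESTAMP_COLUMNS`, the helpers `sort_key` and `extract_log`, and the dictionary layout are all assumptions.

```python
# Rows copied from the slide; time stamps stay in the "DD-MM H:MM" form shown there.
product_order = [
    {"poID": "po1", "cust.": "Alice", "created": "30-08 9:22",
     "processed": "30-08 13:12", "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust.": "Bob", "created": "30-08 10:15",
     "processed": "30-08 13:14", "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]

# Assumed to be known from schema discovery (Step 0).
TIMESTAMP_COLUMNS = {"created", "processed", "built", "shipped", "added"}

def sort_key(ts):
    """Turn 'DD-MM H:MM' into a sortable tuple (month, day, hour, minute)."""
    date, clock = ts.split()
    day, month = date.split("-")
    hour, minute = clock.split(":")
    return int(month), int(day), int(hour), int(minute)

def extract_log(tables, case_id):
    """Build {case id value: time-ordered list of events} from the tables of one cluster."""
    log = {}
    for rows in tables:
        for row in rows:
            # Non-time-stamp columns become event attributes.
            attributes = {k: v for k, v in row.items() if k not in TIMESTAMP_COLUMNS}
            for column, value in row.items():
                if column in TIMESTAMP_COLUMNS:
                    event = {"activity": column, "time": value, **attributes}
                    log.setdefault(row[case_id], []).append(event)
    for events in log.values():
        events.sort(key=lambda e: sort_key(e["time"]))
    return log

po_log = extract_log([product_order, ordered_material], case_id="poID")
for event in po_log["po1"]:
    print(event)
```

The `added` events keep their moID attribute, which is the reference to the MaterialOrder artifact noted on the slide.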
Outline


 [figure: overview of the approach]
 1. decompose by primary keys:  log f. quote, log f. order
 2. discovery (per log):  model f. order, model f. quote
 3. compose by primary/foreign-key relations:  process model
                                                                        PAGE 20
Resulting Model(s)
 [figure: two lifecycle models]
 Product Order:   create → processed → added (1..*) → built → shipped
 Material Order:  added (1..*) → completed → sent → received

 (added, poID=po1, …, moID=mo3)
                                                                   PAGE 21
Implementation & Evaluation

 prototype tool
 • input: relational database (via JDBC), .csv tables
 • steps
   − discover database schema (types, keys, relations)
   − discover artifact schema
     − by k-means clustering
     − by user picking tables
   − extract logs → ProM




                                                     PAGE 22
Evaluation: SAP System of Sligro

 > 300 tables, > 40 GiB of data

 schema extraction    time-stamp attributes:   15 hrs
                      primary keys:             4 hrs
                      foreign keys:             5 hrs (single col.) /
                                                6 days (double col.)

 clustering           entropies:               17 hrs
                      table distances:          5 hrs
                      clustering:              a few seconds
                      ~20 different artifacts found
                      largest: 47 tables, 869 columns

 log extraction       extract 1000 traces of > 246,000 events
                      query database:           1 hr
                      write log file:          32 hrs

                                                             PAGE 23
Sligro: Artikel lifecycle model




                                  PAGE 24
Open issues

 performance
 • key discovery: NP-complete in R (# of records)
 • foreign key discovery: NP-complete in R^2
 • problem is in the “hard part” of NP
 → sampling of data, domain knowledge, semi-automatic
 requires good database structure
 • proper relations, proper keys
 • otherwise wrong clusters are formed
 • events don’t get right attributes
 → semi-automatic approach
 events shared by multiple cases… working on it…
                                                    PAGE 25

 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Process Mining for ERP Systems

  • 1. Erik Nooijen, Boudewijn v. Dongen, Dirk Fahland: Process Mining for ERP Systems
  • 2. Process Discovery
      event log → process discovery algorithm → process model
      Example log:
        c1: A B C D E
        c2: A C B D E
        c3: A F D E
        …
      Assumptions:
      • case = sequence of events of this case
      • cases are isolated: event A in c1 happens only in c1 (and not in c2)
      • cases of the same process
      • one unique case id
      • each event associated to exactly one case id
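A minimal Python sketch of the input that slide 2 assumes: an event log as a mapping from case id to that case's sequence of events, filled with the three example cases c1 to c3. The directly-follows count is not taken from the slides; it is shown only because many discovery algorithms start from such a relation. The names `log` and `directly_follows` are illustrative.

```python
from collections import Counter

# Event log from the slide: one sequence of events per case id.
log = {
    "c1": ["A", "B", "C", "D", "E"],
    "c2": ["A", "C", "B", "D", "E"],
    "c3": ["A", "F", "D", "E"],
}


def directly_follows(log):
    """Count how often activity x is immediately followed by y within one case."""
    counts = Counter()
    for trace in log.values():
        # Pairs never cross case boundaries, reflecting the isolation assumption.
        counts.update(zip(trace, trace[1:]))
    return counts


dfg = directly_follows(log)
for (x, y), n in sorted(dfg.items()):
    print(f"{x} -> {y}: {n}")
```

Because every event sits in exactly one trace, the counts are well defined; the later slides explain why this does not hold directly for ERP data.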
  • 3. Typical Process in an ERP System
      [Figure: build to order. Alice orders product X (Material A, Material B) and Bob orders product Y (Material B, Material C) from the manufacturer; the manufacturer orders materials from ACME Inc. (Material B, Material B) and Mega Corp. (Material A, Material C).]
  • 4. n-to-m relations → database
      database → process discovery algorithm → process model
      Column kinds marked on the slide: id attributes, time-stamp attributes, relations, data attributes

      ProductOrder
        poID  cust.  …  created      processed    built        shipped
        po1   Alice     30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
        po2   Bob       30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

      Customer
        cust.  address  …
        Alice  …        …
        Bob    …        …

      OrderedMaterial
        poID  moID  type  added
        po1   mo3   B     30-08 13:13
        po1   mo4   A     30-08 13:14
        po2   mo3   B     30-08 13:15
        po2   mo4   C     30-08 13:16

      MaterialOrder
        moID  suppl.  …  completed    sent         received
        mo3   ACME       30-08 13:15  30-08 14:15  01-09 9:05
        mo4   MEGA       30-08 13:17  30-08 16:12  01-09 10:13
  • 5. Process Discovery for ERP Systems
      database → process discovery algorithm → process model
      reality: data in a relational DB
      • events stored as time-stamped attributes in tables
      • multiple primary keys → multiple notions of case
      • tables are related → one event related to multiple cases
      [Figure: schema diagram]
        Customer (cust, …) 1 to 0..* ProductOrder (poID, cust, created, processed, built, shipped)
        ProductOrder 1 to 1..* OrderedMat. (poID, moID, type, added)
        OrderedMat. 1..* to 1 MaterialOrder (moID, supplier, completed, sent, received)
  • 7. Outline
      [Figure: database → decompose by primary keys → log for PO, log for MO → discovery → model for PO, model for MO → related by primary foreign-key relations → process model]
  • 8. Find Artifact Schemas
      [Figure: same pipeline as on slide 7: decompose by primary keys, discovery per log (PO, MO), models related by primary foreign-key relations]
  • 9. Step 0: discover database schema
      • document schema vs. actual schema
      • identify
        − column types (esp. time-stamped columns)
        − primary keys
        − foreign keys
      • various (non-trivial) techniques available
      • key discovery is NP-complete in the size of the table(s)
      • result: [Figure: discovered schema]
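To make the cost of Step 0 concrete, here is a brute-force Python sketch that searches for minimal unique column combinations in the OrderedMaterial rows from slide 4. It is not the technique used by the authors' tool (the slides only say that various non-trivial techniques exist); the function name `candidate_keys` is an assumption made for illustration.

```python
from itertools import combinations


def candidate_keys(columns, rows):
    """Return the minimal column combinations whose values are unique over all rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(range(len(columns)), size):
            # A superset of an already found key is unique too, but not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            projected = [tuple(row[i] for i in combo) for row in rows]
            if len(set(projected)) == len(rows):
                keys.append(combo)
    return [tuple(columns[i] for i in key) for key in keys]


columns = ["poID", "moID", "type", "added"]
rows = [
    ("po1", "mo3", "B", "30-08 13:13"),
    ("po1", "mo4", "A", "30-08 13:14"),
    ("po2", "mo3", "B", "30-08 13:15"),
    ("po2", "mo4", "C", "30-08 13:16"),
]

keys = candidate_keys(columns, rows)
print(keys)
```

The number of column combinations grows exponentially with the number of columns. On a four-row sample the check also accepts combinations that are unique only by coincidence, so data-derived keys generally need to be reviewed against the documented schema.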
  • 10. Step 1: decompose schema into processes = schema summarization
      find:
      1. sets of corresponding tables
      2. links between those
      [Figure: table groups ProductOrder and MaterialOrder]
  • 11. Automatic Schema Summarization = group similar tables through clustering
      • define a distance between any 2 tables
        − by relations
        − by information content
      • tables that are close to each other → same cluster
      • # of clusters: user input
  • 12. Automatic Schema Summarization
      1. structural distance between tables
      fanout ~ avg. # of child records related to the same parent record
      Example: parent table A with records 1, 2; three child tables with columns (A, B):
        (1, X), (2, Y)                  fanout: 1
        (1, X), (1, Y)                  fanout: 1 = (2+0)/2
        (1, X), (1, Y), (2, Z), (2, U)  fanout: 2
  • 13. Automatic Schema Summarization
      1. structural distance between tables
      fanout ~ avg. # of child records related to the same parent record
      matched fraction ~ 1 / (fraction of records in parent with matching child record)
      Example: parent table A with records 1, 2; three child tables with columns (A, B):
        (1, X), (2, Y)                  fanout: 1   m.fr: 1
        (1, X), (1, Y)                  fanout: 1   m.fr: 2 = 1/(1/2)
        (1, X), (1, Y), (2, Z), (2, U)  fanout: 2   m.fr: 1
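The fanout and matched-fraction figures from slides 12 and 13 can be reproduced with a few lines of Python. This is a sketch under the reading that parent table A holds the records 1 and 2 and that each child row is an (A, B) pair; the function and variable names are illustrative, and the slides do not say how the two measures are combined into a distance, so that step is left out.

```python
from collections import Counter


def fanout(parent_ids, child_rows):
    """Average number of child records per parent record."""
    per_parent = Counter(parent for parent, _ in child_rows)
    return sum(per_parent[p] for p in parent_ids) / len(parent_ids)


def matched_fraction(parent_ids, child_rows):
    """1 / (fraction of parent records that have at least one matching child)."""
    matched = {parent for parent, _ in child_rows} & set(parent_ids)
    # Raises ZeroDivisionError if no parent record has a child at all.
    return 1 / (len(matched) / len(parent_ids))


parent_a = [1, 2]
child_tables = {
    "one child each": [(1, "X"), (2, "Y")],
    "both under parent 1": [(1, "X"), (1, "Y")],
    "two children each": [(1, "X"), (1, "Y"), (2, "Z"), (2, "U")],
}

results = {
    name: (fanout(parent_a, rows), matched_fraction(parent_a, rows))
    for name, rows in child_tables.items()
}
for name, (fo, mfr) in results.items():
    print(f"{name}: fanout={fo}, m.fr={mfr}")
```

The second table shows why both measures are needed: its fanout equals that of the first table, and only the matched fraction reveals that parent record 2 has no child.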
  • 14. Grouping by Clustering
      1. structural distance
      2. information distance
         importance of each table = entropy (is maximal if all records are different)
         distance: 2 tables with high entropies → large distance
      3. weighted distance by structure + information
      4. k-means clustering: k clusters based on weighted distance
         most important table of cluster = table with least distance to all → key attribute of the cluster
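Two ingredients of slide 14 can be sketched in Python: a table's entropy and the choice of a cluster's most important table. The slides do not give the exact entropy definition, so the sketch assumes Shannon entropy over the distribution of identical records. The distance values are made-up numbers for illustration only, and the names `table_entropy` and `most_important_table` are hypothetical.

```python
import math
from collections import Counter


def table_entropy(rows):
    """Shannon entropy in bits of the record distribution (assumed definition)."""
    total = len(rows)
    return sum((n / total) * math.log2(total / n) for n in Counter(rows).values())


def most_important_table(cluster, distance):
    """Table of the cluster with the least summed distance to all other tables."""
    return min(cluster, key=lambda t: sum(distance[t][u] for u in cluster if u != t))


all_different = [("po1", "mo3"), ("po1", "mo4"), ("po2", "mo3"), ("po2", "mo4")]
all_equal = [("po1", "mo3")] * 4
print(table_entropy(all_different), table_entropy(all_equal))

# Hypothetical weighted distances between three tables of one cluster.
distance = {
    "ProductOrder": {"OrderedMaterial": 0.2, "Customer": 0.5},
    "OrderedMaterial": {"ProductOrder": 0.2, "Customer": 0.9},
    "Customer": {"ProductOrder": 0.5, "OrderedMaterial": 0.9},
}
center = most_important_table(list(distance), distance)
print(center)
```

With four records the entropy reaches its maximum of log2(4) = 2 bits when all records differ, which matches the remark on the slide.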
  • 15. Artifact Schema → Artifact Log
      [Figure: same pipeline as on slide 7: decompose by primary keys, discovery per log (PO, MO), models related by primary foreign-key relations]
  • 16. Log Extraction
      cluster = set of related tables + primary key of most important table
      • primary key of most important table → case id
      Tables of the cluster (log for PO):
        poID  cust.  …  created      processed    built        shipped
        po1   Alice     30-08 9:22   30-08 13:12  01-09 15:12  03-09 10:15
        po2   Bob       30-08 10:15  30-08 13:14  01-09 16:13  03-09 17:18

        poID  moID  type  added
        po1   mo3   B     30-08 13:13
        po1   mo4   A     30-08 13:14
        po2   mo3   B     30-08 13:15
        po2   mo4   C     30-08 13:16
      Cases:
        po1:
        po2:
  • 17. Log Extraction (tables as on slide 16)
      • time-stamped attribute → event
      po1: (created, poID=po1, time=30-08 9:22, …)
  • 18. Log Extraction (tables as on slide 16)
      • time-stamped attribute → event
      • related attributes → event attributes
      po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
  • 19. Log Extraction (tables as on slide 16)
      po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
           (processed, poID=po1, time=30-08 13:12, …)
  • 20. Log Extraction (tables as on slide 16)
      po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)
           (processed, poID=po1, time=30-08 13:12, …)
           (added, poID=po1, time=30-08 13:13, moID=mo3, …)
      The attribute moID=mo3 refers to artifact “MaterialOrder”.
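The extraction rules of slides 16 to 20 can be sketched in Python on the example rows: the primary key poID serves as case id, every time-stamped cell becomes an event, and the remaining cells of the row become event attributes. This is an illustration, not the authors' tool. It assumes that every table of the cluster carries the case id column directly, and it assumes the year 2000, because the slides show only day, month, and time. The name `extract_log` is hypothetical.

```python
from datetime import datetime

ASSUMED_YEAR = 2000  # assumption: the slides give no year


def parse_time(text):
    """Parse a 'day-month hour:minute' stamp from the slides."""
    return datetime.strptime(f"{ASSUMED_YEAR}-{text}", "%Y-%d-%m %H:%M")


product_order = [
    {"poID": "po1", "cust.": "Alice", "created": "30-08 9:22",
     "processed": "30-08 13:12", "built": "01-09 15:12", "shipped": "03-09 10:15"},
    {"poID": "po2", "cust.": "Bob", "created": "30-08 10:15",
     "processed": "30-08 13:14", "built": "01-09 16:13", "shipped": "03-09 17:18"},
]
ordered_material = [
    {"poID": "po1", "moID": "mo3", "type": "B", "added": "30-08 13:13"},
    {"poID": "po1", "moID": "mo4", "type": "A", "added": "30-08 13:14"},
    {"poID": "po2", "moID": "mo3", "type": "B", "added": "30-08 13:15"},
    {"poID": "po2", "moID": "mo4", "type": "C", "added": "30-08 13:16"},
]


def extract_log(tables, case_id, timestamp_columns):
    """Build one time-ordered trace per case id from the tables of a cluster."""
    log = {}
    for rows in tables:
        for row in rows:
            # Non-timestamp cells of the row become attributes of its events.
            attributes = {k: v for k, v in row.items() if k not in timestamp_columns}
            for column in timestamp_columns:
                if column in row:
                    event = {"activity": column, "time": parse_time(row[column]), **attributes}
                    log.setdefault(row[case_id], []).append(event)
    for trace in log.values():
        trace.sort(key=lambda e: e["time"])
    return log


log = extract_log(
    [product_order, ordered_material],
    case_id="poID",
    timestamp_columns={"created", "processed", "built", "shipped", "added"},
)
for event in log["po1"]:
    print(event["activity"], event["time"], event.get("moID", ""))
```

The two `added` events of po1 carry moID=mo3 and moID=mo4, which is the reference to the MaterialOrder artifact noted on slide 20.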
  • 21. Outline
      [Figure: database → decompose by primary keys → log for order, log for quote → discovery → model for order, model for quote → compose by primary foreign-key relations → process model]
  • 22. Resulting Model(s)
      [Figure: two lifecycle models linked by a 1..* to 1..* relation.
        Product Order: create, processed, added, built, shipped
        Material Order: added, completed, sent, received
        The shared event is (added, poID=po1, …, moID=mo3).]
  • 23. Implementation & Evaluation
      • prototype tool
        − input: relational database (via JDBC), .csv tables
        − steps
          1. discover database schema (types, keys, relations)
          2. discover artifact schema: by k-means clustering or by user picking tables
          3. extract logs → ProM
  • 24. Evaluation: SAP System of Sligro
      • > 300 tables, > 40 GiB of data
      • schema extraction
          time-stamp attributes:  15 hrs
          primary keys:           4 hrs
          foreign keys:           5 hrs (single col.) / 6 days (double col.)
      • clustering
          entropies:              17 hrs
          table distances:        5 hrs
          clustering:             a few seconds
          ~20 different artifacts found; largest: 47 tables, 869 columns
      • log extraction: extract 1000 traces of > 246,000 events
          query database:         1 hr
          write log file:         32 hrs
  • 25. Sligro: Artikel lifecycle model [Figure: discovered lifecycle model]
  • 26. Open issues
      • performance
        − key discovery: NP-complete in R (# of records)
        − foreign key discovery: NP-complete in R²
        − problem is in the “hard part” of NP
        → sampling of data, domain knowledge, semi-automatic
      • requires good database structure
        − proper relations, proper keys
        − otherwise wrong clusters are formed
        − events don’t get right attributes
        → semi-automatic approach
      • events shared by multiple cases… working on it…