Keynote by Marco Montali "Marrying data and processes: from model to event data analysis" at the Workshop on Algorithms & Theories for the Analysis of Event Data (ATAED 2016), satellite event of the 37th International Conference on Application and Theory of Petri Nets and Concurrency and of the 16th International Conference on Application of Concurrency to System Design (PN 2016 and ACSD 2016).
1. From Model to Event Data Analysis
Marco Montali
Free University of Bozen-Bolzano
ATAED 2016
Marrying Data and Processes
2. Our Starting Point
Marrying processes and data is a must
if we want to really understand
how complex dynamic systems operate
Dynamic systems of interest:
• business processes
• multiagent systems
• distributed systems
3. Our Thesis
Knowledge representation and
computational logics
are a Swiss-army knife to
understand data-aware dynamic systems,
and
provide automated reasoning and verification
capabilities along their entire lifecycle
5. Formal Verification
Automated analysis
of a formal model of the system
against a property of interest,
considering all possible system behaviors
picture by Wil van der Aalst
6. Process Mining
Extraction of valuable,
process-related information
from event logs,
i.e., the footprint of reality
picture by Wil van der Aalst
9. Data/Process Fragmentation
• A business process consists of a set of activities that
are performed in coordination in an organizational and
technical environment [Weske, 2007]
• Activities change the real world
• The corresponding updates are reflected into the
organizational information system(s)
• Data trigger decision-making, which in turn determines
the next steps to be taken in the process
• Survey by Forrester [Karel et al, 2009]: lack of
interaction between data and process experts
10. Experts Dichotomy
• BPM professionals: data are subsidiary to
processes
• Master data managers: data are the main driver
for the company’s existence
• Forrester: in 83/100 companies, no interaction at
all between these two groups
• This isolation propagates to languages and tools,
which never properly account for the
process-data connection
11. Conventional Data Modeling
Focus: relevant entities, relations, static constraints
(ER diagram relating Sales and Procurement/Manufacturing entities: Customer PO, Line Item, Work Order, Material PO, Supplier, and Material, with a “spawns” association)
But… how do data evolve?
Where can we find the “state” of a purchase order?
12. Conventional Process Modeling
Focus: control-flow of activities in response to events
But… how do activities update data?
What is the impact of canceling an order?
14. Do you like Spaghetti?
(Diagram: activities such as Decompose Customer PO, Manage Material POs, Assemble, Ship, and Manage Cancelation, each wired to its own process/data silo over Customers, Suppliers & Catalogues, Customer POs, Work Orders, and Material POs)
IT integration: difficult to manage, understand, evolve
15. Too Late…
• Where are the data?
• Where shall we model relevant business rules?
Too late to reconstruct the missing pieces:
• Where are the data? Part is in the DBs, part is hidden in the process execution engine.
• Where are the relevant business rules, and how are they modeled? At the DB level? Which DB? How to import the process data? (Also) in the business model? How to import data from the DBs?
(Diagram: the data model of slide 11 next to a process fragment with activities “Determine cancelation penalty” and “Notify penalty”, plus the process engine holding the process state)
Business rules:
For each work order W
  For each material PO M in W
    if M has been shipped
      add returnCost(M) to penalty
16. How is Research Reacting?
A recent review…
Verification typically takes place at the design stage
of a business process type. However, at this stage,
required knowledge about data (database
schema, integrity constraints) is typically not yet
available.
17. …But There is Hope!
• [Meyer et al, 2011]: data-process integration
crucial to assess the value of processes and
evaluate KPIs
• [Dumas, 2011]: data-process integration crucial to
aggregate all relevant information, and to suitably
inject business rules into the system
• [Reichert, 2012]: “Process and data are just two
sides of the same coin”
25. Why FO Temporal Logics
• To inspect data: FO queries
• To capture system dynamics: temporal
modalities
• To track the evolution of objects: FO
quantification across states
• Example:
It is always the case that every order
is eventually either cancelled or paid
26. Why FO Temporal Logics
(same slide as before, with the example property formalized)
G ( ∀x. Order(x) → F ( State(x, cancelled) ∨ State(x, paid) ) )
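To make the semantics of such a property concrete, here is a minimal sketch (not from the talk) that checks it over a finite trace of database snapshots; the relation names Order and State follow the slide, everything else is hypothetical, and F is interpreted over the remaining finite suffix (the actual semantics is over full runs of the system):

```python
# Check G( forall x. Order(x) -> F( State(x,'cancelled') or State(x,'paid') ) )
# over a FINITE trace of database snapshots. Hypothetical illustration:
# each snapshot maps relation names to sets of tuples.

def property_holds(trace):
    for i, snap in enumerate(trace):
        for (x,) in snap.get("Order", set()):
            # F: some later (or current) snapshot must close this order
            if not any((x, s) in later.get("State", set())
                       for later in trace[i:]
                       for s in ("cancelled", "paid")):
                return False
    return True

trace = [
    {"Order": {("o1",)}, "State": set()},
    {"Order": {("o1",), ("o2",)}, "State": {("o1", "paid")}},
    {"Order": {("o2",)}, "State": {("o2", "cancelled")}},
]
print(property_holds(trace))  # True: every order is eventually closed
```

Note the FO quantification across states: the same x bound in one snapshot is tracked into later snapshots, which is exactly the "persistent quantification" the slides refer to.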
27. Problem Dimensions
• Data component: relational DB | description logic KB | OBDA system | inconsistency-tolerant KB | …
• Process component: condition-action rules | BPMN | Golog program | Petri nets | …
• Task modeling: conditional effects | add/delete assertions | programs | user forms | …
• External inputs: none | external services | input DB | fixed input | …
• Network topology: single orchestrator | full mesh | connected, fixed graph | ring | …
• Interaction mechanism: none | synchronous | asynchronous and ordered | asynchronous lossy | …
33. RAW-SYS
• Integrated data+process modeling
• Standard relational model for capturing data
• Standard workflow nets (or other types of Petri nets) for capturing
processes
• Net transitions interplay with data
• Conditionally enabled by FO queries over the data
• Described in terms of full-fledged CRUD operations over the data
• Bridge between theory and practice
• Mimics how BPMSs actually work
• Has unambiguous execution semantics
35.–40. Example: User Cart
Shared DB: Customer(Id, …), Product(name, …) (read-only)
Local DB (per case): InCart(BarCode, Product), Owner(CustId)
Slides 35–40 incrementally build the case net, whose transitions interplay with the data:
• create case: guard Customer(x, …); effect ADD Owner(x)
• open cart
• insert item(p): guard Product(p, …); effect ADD InCart(getBC(), p)
• empty cart: guard ∃x,p. InCart(x,p); effect ∀x,p. InCart(x,p) -> DEL InCart(x,p)
• close cart
• close case
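The guards and effects of the cart example can be rendered operationally as a toy sketch (my own illustration, not the RAW-SYS formalism itself; get_barcode stands in for the slides' getBC() and all data layouts are simplified):

```python
# Toy rendition of the user-cart actions: each transition has an FO guard
# (a query over the current data) and a CRUD effect on the local DB.
import itertools

_bc = itertools.count(1)
def get_barcode():                # stands in for getBC() on the slides
    return f"bc{next(_bc)}"

def create_case(shared, local, cust_id):
    # guard: Customer(x, ...) holds in the shared DB
    if cust_id in shared["Customer"]:
        local["Owner"].add(cust_id)                    # ADD Owner(x)
        return True
    return False

def insert_item(shared, local, product):
    # guard: Product(p, ...) holds in the (read-only) shared DB
    if product in shared["Product"]:
        local["InCart"].add((get_barcode(), product))  # ADD InCart(getBC(), p)
        return True
    return False

def empty_cart(local):
    # guard: exists x,p. InCart(x,p); effect: delete every InCart tuple
    if local["InCart"]:
        local["InCart"].clear()
        return True
    return False

shared = {"Customer": {"c1"}, "Product": {"lamp", "chair"}}
local = {"Owner": set(), "InCart": set()}
create_case(shared, local, "c1")
insert_item(shared, local, "lamp")
insert_item(shared, local, "chair")
print(len(local["InCart"]))   # 2
empty_cart(local)
print(local["InCart"])        # set()
```

A transition whose guard fails (e.g., inserting an unknown product) is simply not enabled, mirroring how net transitions are conditionally enabled by FO queries.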
41. Execution Semantics
Relational transition system. Each state is labeled by:
• Instance of the shared DB
• Case IDs of running cases, together with corresponding
• Instances of local DBs
• Markings of their nets
Successors constructed considering all possible ground
executable actions and all possible input
configurations (s.t. the resulting state satisfies the
schema constraints) —> infinite-state transition system
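A state of this relational transition system can be represented concretely; the following is a toy sketch under the slide's definitions (all type and field names are my own):

```python
# Toy representation of one state of the relational transition system:
# a shared-DB instance plus, per running case, a local DB and a net marking.
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseState:
    local_db: frozenset     # frozen set of (relation, tuple) facts
    marking: frozenset      # marked places of the case's net

@dataclass(frozen=True)
class State:
    shared_db: frozenset    # (relation, tuple) facts of the shared DB
    cases: frozenset        # frozen set of (case_id, CaseState) pairs

s = State(
    shared_db=frozenset({("Customer", ("c1",))}),
    cases=frozenset({("case1", CaseState(frozenset(), frozenset({"p0"})))}),
)
print(("Customer", ("c1",)) in s.shared_db)  # True
```

Making states frozen (hence hashable) is what a successor-construction loop needs in order to detect revisited states, although here the state space is in general infinite.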
42. The Good…
RAW-SYS are:
• Markovian: Next state only depends on the
current state + input.
Two states with identical DBs are bisimilar.
• Generic: FO/SQL (like every generic query language)
cannot distinguish structures that are identical
modulo uniform renaming of data objects.
—> Two isomorphic states are bisimilar
43. … and the Bad
Reachability undecidable even with a single safe net
• Counter —> “size” of a unary relation
• Test counter for zero: check whether counter relation is empty
• What matters is the # of tuples, not the actual values
• Can be reconstructed also without negation in the queries
(Net with transitions New, Increment, Decrement simulating a counter)
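The encoding behind this undecidability argument can be illustrated directly with a toy sketch (my own illustration, not from the talk): a counter becomes the cardinality of a unary relation, increments insert fresh input values, and the zero-test is an emptiness query.

```python
# A counter simulated by a unary relation: the VALUE of the counter is the
# NUMBER of tuples; the actual data values are irrelevant.
import itertools

fresh = itertools.count()        # unbounded supply of fresh data values

def increment(rel):
    rel.add(next(fresh))         # "New": insert a fresh value

def decrement(rel):
    rel.pop()                    # delete some tuple (which one doesn't matter)

def is_zero(rel):
    return len(rel) == 0         # zero-test = emptiness check, an FO query

counter = set()
increment(counter); increment(counter); decrement(counter)
print(len(counter), is_zero(counter))  # 1 False
```

Two such relations suffice to simulate a two-counter machine, which is why reachability is undecidable even with a single safe net.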
44. State-Boundedness
[PODS 2013]
Put a pre-defined bound on the DB size
(not the size of the data domain!)
• Resulting transition system: still infinite-state
• But: infinitely-many encountered values along a
run cannot be “accumulated” in a single state
45. RAW-SYS, Boundedness,
and Reachability
Reachability undecidable as soon as one of the following
conditions holds:
• Shared DB with unbounded size
• Local DB with unbounded size
• Unboundedly many simultaneously running cases
What happens if all these three sources are “bounded in
size”?
48. Magic!
Infinite-state transition system |= first-order temporal formula
(FO-CTL or FO-LTL with persistent quantification)
if and only if
finite-state abstraction |= propositional temporal formula φ′
49. Towards Implementations
• [IJCAI 2015] Planning can be lifted to deal with this
infinite-state setting
• Ongoing implementation effort using DLVk and
state-of-the-art ADL planners
• [SEBD 2015, AMW 2015] Ongoing effort for
implementing model checking techniques based on
our abstraction natively in relational technology
• Goal: combine the best of databases and formal
methods
59. Key Issues
• How to resolve the
“impedance mismatch”?
• How to get a “view” of
the data tailored to
process mining?
(UML diagram of a conference management domain: Conference, User, Paper, Review Request, Review, Decision, uploads, and Accepted Paper, each with creation/submission/decision timestamps, linked by associations such as author, reviewer, organizer of, submitted to, and corresponds to)
60. Impedance Mismatch is
Really an Issue
Crompton (2008): domain experts lose too much
time digging into data to turn them into
knowledge
• Engineers in the oil/gas industry: 30-70% of
their working time spent on data searching
and data quality
61. Optique
Scalable, End-User Access to Big Data
• http://optique-project.eu
• Goal: engineering techniques for enabling
end-users to access data through domain ontologies
• Case studies: Statoil, Siemens
62. Facts on Statoil
• 1000 TB of data inside relational DBMSs
• Schemas not aligned
• More than 2000 tables, in a plethora of different
DBs
• 900 experts part of “Statoil Exploration”
• Up to 4 days to formulate queries and encode
them in SQL
63. Query Example
How much time/money is spent searching for data?
A user query at Statoil
Show all norwegian wellbores with some aditional attributes
(wellbore id, completion date, oldest penetrated age,result). Limit
to all wellbores with a core and show attributes like (wellbore id,
core number, top core depth, base core depth, intersecting
stratigraphy). Limit to all wellbores with core in Brentgruppen and
show key atributes in a table. After connecting to EPDS (slegge)
we could for instance limit futher to cores in Brent with measured
permeability and where it is larger than a given value, for instance 1
mD. We could also find out whether there are cores in Brent which
are not stored in EPDS (based on NPD info) and where there could
be permeability values. Some of the missing data we possibly own,
other not.
Diego Calvanese (FUB) Ontologies for Data Integration FOfAI 2015, Buenos Aires – 27/7/2015 (5/52)
64. The same information need, encoded in SQL:
SELECT [...]
FROM
db_name.table1 table1,
db_name.table2 table2a,
db_name.table2 table2b,
db_name.table3 table3a,
db_name.table3 table3b,
db_name.table3 table3c,
db_name.table3 table3d,
db_name.table4 table4a,
db_name.table4 table4b,
db_name.table4 table4c,
db_name.table4 table4d,
db_name.table4 table4e,
db_name.table4 table4f,
db_name.table5 table5a,
db_name.table5 table5b,
db_name.table6 table6a,
db_name.table6 table6b,
db_name.table7 table7a,
db_name.table7 table7b,
db_name.table8 table8,
db_name.table9 table9,
db_name.table10 table10a,
db_name.table10 table10b,
db_name.table10 table10c,
db_name.table11 table11,
db_name.table12 table12,
db_name.table13 table13,
db_name.table14 table14,
db_name.table15 table15,
db_name.table16 table16
WHERE [...]
table2a.attr1=‘keyword’ AND
table3a.attr2=table10c.attr1 AND
table3a.attr6=table6a.attr3 AND
table3a.attr9=‘keyword’ AND
table4a.attr10 IN (‘keyword’) AND
table4a.attr1 IN (‘keyword’) AND
table5a.kinds=table4a.attr13 AND
table5b.kinds=table4c.attr74 AND
table5b.name=‘keyword’ AND
(table6a.attr19=table10c.attr17 OR
(table6a.attr2 IS NULL AND
table10c.attr4 IS NULL)) AND
table6a.attr14=table5b.attr14 AND
table6a.attr2=‘keyword’ AND
(table6b.attr14=table10c.attr8 OR
(table6b.attr4 IS NULL AND
table10c.attr7 IS NULL)) AND
table6b.attr19=table5a.attr55 AND
table6b.attr2=‘keyword’ AND
table7a.attr19=table2b.attr19 AND
table7a.attr17=table15.attr19 AND
table4b.attr11=‘keyword’ AND
table8.attr19=table7a.attr80 AND
table8.attr19=table13.attr20 AND
table8.attr4=‘keyword’ AND
table9.attr10=table16.attr11 AND
table3b.attr19=table10c.attr18 AND
table3b.attr22=table12.attr63 AND
table3b.attr66=‘keyword’ AND
table10a.attr54=table7a.attr8 AND
table10a.attr70=table10c.attr10 AND
table10a.attr16=table4d.attr11 AND
table4c.attr99=‘keyword’ AND
table4c.attr1=‘keyword’ AND
table11.attr10=table5a.attr10 AND
table11.attr40=‘keyword’ AND
table11.attr50=‘keyword’ AND
table2b.attr1=table1.attr8 AND
table2b.attr9 IN (‘keyword’) AND
table2b.attr2 LIKE ‘keyword’% AND
table12.attr9 IN (‘keyword’) AND
table7b.attr1=table2a.attr10 AND
table3c.attr13=table10c.attr1 AND
table3c.attr10=table6b.attr20 AND
table3c.attr13=‘keyword’ AND
table10b.attr16=table10a.attr7 AND
table10b.attr11=table7b.attr8 AND
table10b.attr13=table4b.attr89 AND
table13.attr1=table2b.attr10 AND
table13.attr20=’‘keyword’’ AND
table13.attr15=‘keyword’ AND
table3d.attr49=table12.attr18 AND
table3d.attr18=table10c.attr11 AND
table3d.attr14=‘keyword’ AND
table4d.attr17 IN (‘keyword’) AND
table4d.attr19 IN (‘keyword’) AND
table16.attr28=table11.attr56 AND
table16.attr16=table10b.attr78 AND
table16.attr5=table14.attr56 AND
table4e.attr34 IN (‘keyword’) AND
table4e.attr48 IN (‘keyword’) AND
table4f.attr89=table5b.attr7 AND
table4f.attr45 IN (‘keyword’) AND
table4f.attr1=‘keyword’ AND
table10c.attr2=table4e.attr19 AND
(table10c.attr78=table12.attr56 OR
(table10c.attr55 IS NULL AND
table12.attr17 IS NULL))
65. The same SQL query, with the estimated cost of this way of working: 50.000.000 €/year
66. Ontology-Based Data Access
(Diagram: ontology-based data integration framework. The ontology provides the global vocabulary and a conceptual view; mappings semantically link the external, heterogeneous data sources to the ontology; queries are posed over the ontology and results flow back.)
We achieve logical transparency in accessing data:
• the user does not know where and how the data is stored
• the user can only see a conceptual view of the data
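Concretely, a mapping pairs an SQL query over the sources with a template over the ontology vocabulary. Here is a minimal hand-rolled illustration of that idea; the table, column, and class names are invented, and real systems such as Ontop express mappings in R2RML or their own mapping language rather than in Python:

```python
# Hand-rolled flavor of an OBDA mapping: an SQL query over the sources plus a
# template producing ontology-level assertions. All names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wellbore (id INTEGER, name TEXT, completed TEXT)")
conn.execute("INSERT INTO wellbore VALUES (1, 'NO 15/9-11', '1981-06-01')")

mapping = {
    # source: relational query over the (hypothetical) source schema
    "source": "SELECT id, name FROM wellbore WHERE completed IS NOT NULL",
    # target: template producing assertions over the ontology vocabulary
    "target": lambda row: [(f":wellbore/{row[0]}", "rdf:type", ":Wellbore"),
                           (f":wellbore/{row[0]}", ":name", row[1])],
}

triples = [t for row in conn.execute(mapping["source"])
             for t in mapping["target"](row)]
for t in triples:
    print(t)
```

The user then queries the ontology vocabulary (:Wellbore, :name) and never sees the table layout, which is the logical transparency the slide describes.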
67. Ontop
• Open-source OBDA technology developed at
UNIBZ (supervisor: Diego Calvanese)
• Fully supports semantic web standards
(OWL/SPARQL)
• Integrates with a plethora of relational DBMSs
• Apache open license
• http://ontop.inf.unibz.it
69. What if my DB is Very Nice?
Ontology bootstrapping automatically creates
• a conceptual model that mirrors the relational DB one-to-one
• identity mappings
Useful for “small” case studies
70. OBDA for Process Mining
• Need to resolve a second impedance mismatch
problem!
• From here…
(The conference management UML diagram from slide 59 again)
84. Questions
• How to optimize and test the scalability of the
approach? Fine-tuning is a must!
• Real vs simulated data? (Benchmarking OBDA)
• Initial benchmarking using CPN tools
• Is the “virtual” approach useful? How do process
mining algorithms access the data?
• Hybrid virtual approach with caching strategies?
85. KAOS Project
Knowledge-Aware Operational Support
• Goal: Empowering process mining and online
operational support with domain knowledge
• Euregio project: Trento + Bolzano + Innsbruck
• Mix of expertise from AI, BPM, database theory, formal
methods, formal ontology, conceptual modeling,
process mining, machine learning, software engineering
• Just started: we are hiring!!!
87. Acknowledgments
All coauthors of this research,
in particular
Diego Calvanese (UNIBZ)
Giuseppe De Giacomo (UNIROMA)
Riccardo De Masellis (FBK-Trento)
Alin Deutsch (UCSD)
Chiara Difrancescomarino (FBK-Trento)
Chiara Ghidini (FBK-Trento)
Fabio Patrizi (UNIBZ)
Sergio Tessaris (UNIBZ)
Alifah Syamsiyah (TU/e)
Wil van der Aalst (TU/e)