Paper presentation at the International Conference on Advanced Information Systems Engineering (CAiSE).
This paper presents an approach to automatically discover business process simulation models from event logs by combining process mining and deep learning techniques.
Paper available at: https://link.springer.com/chapter/10.1007/978-3-031-07472-1_4
Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning
1. Learning Accurate Business Process Simulation Models from Event Logs via Automated Process Discovery and Deep Learning
Manuel Camargo
Marlon Dumas
Oscar González-Rojas
2.
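[Figure: payment process with activities Identify payment method, Process Credit Card, Accept Cash or Check, and Prepare package for customer, annotated with processing / waiting times (18 / 8 min, 20 / 5 min, 5 / 2 min, 10 / 5 min). Simulation outputs: cycle time, processing / waiting times, costs per activity, instance, and resource, and resource utilization (example values: 1h, 5 min, 38 min, 10 min)]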
How to assess the impact of a business process change on temporal & cost measures?
BUSINESS PROCESS SIMULATION
3.
DESIGNING BUSINESS PROCESS SIMULATION (BPS) MODELS
• Interviews
• Expert knowledge
• Observations
• Sampling
Time consuming (hard to tune)
Execution paths left aside
Prone to human errors
Unrealistic parameters
[Figure: BPMN model of the payment process with activities Identify payment method, Accept Cash or Check, Process Credit Card, Process Virtual Currencies, and Prepare package for customer, and XOR gateways "Payment Method?" and "Credit Card Accepted?"]
4.
Data to the Rescue!
[Figure: Enterprise system (CRM, ERP,…) -> Event log -> Simulation model]
5.
PROBLEM STATEMENT
How to automatically create accurate business process simulation models based on data extracted from enterprise information systems?
6.
OBSERVATION
A generative model of business processes is a statistical model constructed from an event log that can generate traces that resemble those observed in the log, as well as other traces of the process that were not observed in the log.
A Business Process Simulation (BPS) model is a generative model of a business process.
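To make the definition concrete, the following is a minimal sketch of one very simple generative model: a first-order Markov chain over activities, fitted to the ten toy traces over T1, T2, T3 used on the DeepSimulator overview slide. It is an illustration only, not the model used in the paper; `fit_markov_chain`, `trace_probability`, and `generate_trace` are hypothetical helpers. The point is that the model assigns non-zero probability to a trace that never occurs in the log, which is the generalization property the definition asks for.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # artificial markers for trace boundaries

def fit_markov_chain(traces):
    """Estimate P(next activity | current activity) from directly-follows counts."""
    counts = defaultdict(Counter)
    for trace in traces:
        path = [START, *trace, END]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(succ.values()) for nxt, n in succ.items()}
        for current, succ in counts.items()
    }

def trace_probability(model, trace):
    """Probability that the chain generates exactly this trace (0.0 for unseen transitions)."""
    path = [START, *trace, END]
    p = 1.0
    for current, nxt in zip(path, path[1:]):
        p *= model.get(current, {}).get(nxt, 0.0)
    return p

def generate_trace(model, rng, max_len=50):
    """Sample one trace by walking the chain from START until END (or max_len activities)."""
    trace, current = [], START
    while len(trace) < max_len:
        successors = model[current]
        current = rng.choices(list(successors), weights=list(successors.values()))[0]
        if current == END:
            break
        trace.append(current)
    return trace

# Toy log: 7 x T1 T2 T3, plus T1 T3 T3, T1 T2 T2, T1 T3 T2.
log = [["T1", "T2", "T3"] for _ in range(7)] + [
    ["T1", "T3", "T3"], ["T1", "T2", "T2"], ["T1", "T3", "T2"]]
model = fit_markov_chain(log)
print(trace_probability(model, ["T1", "T2", "T3"]))        # observed trace
print(trace_probability(model, ["T1", "T2", "T2", "T3"]))  # never observed, still > 0
print(generate_trace(model, random.Random(42)))
```

A Markov chain only looks one step back, so it over-generalizes easily; that trade-off between fitting the log and generalizing beyond it is exactly what the more elaborate models discussed next try to balance.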
7.
LEARNING GENERATIVE BUSINESS PROCESS MODELS
Data-Driven Discrete Event Simulation
• Use process mining & data mining techniques to discover branching probabilities, resource pools, resource calendars, etc.
Deep Learning
• Use deep learning methods to train a neural network (e.g., an LSTM network) that predicts successive events and generates event sequences
8.
Data-Driven Simulation
• May take as input a process specification (helps with interpretability)
• Requires specifications of resource constraints
• Models the case creation process via a probability distribution
• Assumes undifferentiated resources with robotic behavior
• Models resource availability as calendars (possibly discovered from historical data)
• Relies on branching probabilities to model local conditional choices
• Provides a natural mechanism for capturing the effect of changes to the process
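The data-driven simulation assumptions listed above can be illustrated with a minimal discrete-event simulation sketch. It assumes exponential inter-arrival and processing times, a single pool of identical ("undifferentiated") resources that start work as soon as they are free ("robotic" behavior, FIFO queue), and one XOR choice governed by a branching probability. Resource calendars are omitted. Activity names borrow the payment example from slide 2; `simulate` and every parameter value are hypothetical, and this is not the API of Simod, DeepSimulator, or any simulation engine.

```python
import heapq
import random

def simulate(n_cases, arrival_rate, n_resources, branch_prob, mean_durations, seed=0):
    """Return the mean cycle time of n_cases simulated cases (same time unit as the inputs)."""
    rng = random.Random(seed)
    # Case creation: exponential inter-arrival times (assumed distribution).
    arrivals, t = [], 0.0
    for _ in range(n_cases):
        t += rng.expovariate(arrival_rate)
        arrivals.append(t)
    # Control flow: Identify -> XOR (Card with probability branch_prob, else Cash) -> Prepare.
    routes = [["Identify", "Card" if rng.random() < branch_prob else "Cash", "Prepare"]
              for _ in range(n_cases)]
    # Undifferentiated, robotic resources: identical and start as soon as they are free.
    free_at = [0.0] * n_resources  # min-heap of times at which each resource becomes free
    ready = [(arrivals[c], c, 0) for c in range(n_cases)]  # (ready time, case, step index)
    heapq.heapify(ready)
    finished = [0.0] * n_cases
    while ready:
        # Tasks are popped in non-decreasing ready time, so the queue is FIFO.
        ready_time, case, step = heapq.heappop(ready)
        start = max(ready_time, heapq.heappop(free_at))  # earliest-free resource takes it
        end = start + rng.expovariate(1.0 / mean_durations[routes[case][step]])
        heapq.heappush(free_at, end)
        if step + 1 < len(routes[case]):
            heapq.heappush(ready, (end, case, step + 1))
        else:
            finished[case] = end
    return sum(finished[c] - arrivals[c] for c in range(n_cases)) / n_cases

durations = {"Identify": 5.0, "Card": 18.0, "Cash": 20.0, "Prepare": 10.0}  # minutes, hypothetical
print(simulate(200, 1 / 15, 3, 0.6, durations))
```

Because every parameter is explicit, a "what if" change (say, a fourth resource or a different branching probability) is just a different argument, which is the "natural mechanism for capturing the effect of changes" mentioned above.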
Deep Learning (DL) Sequence Generation Methods
• No interpretable process specification
• Does not explicitly consider resource constraints
• Learns the case arrival process from data
• May capture differentiated resources and non-robotic behavior
• Captures resource availability via non-linear functions
• Models branching behavior via neural networks, which may capture complex relations
• Does not have a mechanism for capturing the effect of changes to the process
LEARNING GENERATIVE BUSINESS PROCESS MODELS
M. Camargo, M. Dumas, O. González Rojas: Discovering generative models from event logs: data-driven simulation vs deep learning. PeerJ Computer Science vol. 7, e577, 2021
9.
HYPOTHESIS
By combining data-driven simulation and deep learning, we can learn generative business process models that are:
1. More accurate than those we can learn by using either method in isolation
2. Suitable for both "as is" and "what if" analysis use cases
10.
DeepSimulator: HYBRID LEARNING OF BUSINESS PROCESS SIMULATION MODELS
{T1 -> T2 -> T3}
{T1 -> T3 -> T3}
{T1 -> T2 -> T3}
{T1 -> T2 -> T3}
{T1 -> T2 -> T2}
{T1 -> T2 -> T3}
{T1 -> T2 -> T3}
{T1 -> T3 -> T2}
{T1 -> T2 -> T3}
{T1 -> T2 -> T3}
Event log
Phase 1: Activity sequence generation
• Stochastic process model discovery & optimization
• Generation of activity sequences
Accuracy assessment
Phase 2: Case start-times generation
• Time-series analysis & optimization
• Enrichment of traces with start times
Phase 3: Activity timestamps generation
• Deep-learning model training & optimization
• Enrichment of traces with timestamps
11.
PHASE 1 - ACTIVITY SEQUENCE GENERATION
[Figure: process model over activities A1 to A6 with example traces]
• Control-flow discovery (BPMN model) using the Split Miner algorithm
• Discovery of the branching probabilities, either by:
  I. Assigning equal values to each conditional branch, or
  II. Computing the branching probabilities by replaying the aligned event log against the process model
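The two options for branching probabilities can be sketched as follows. Option I is exact as stated. Option II is deliberately simplified: instead of replaying an aligned log against a BPMN model, it assumes each branch is identified by its first activity and counts how often that activity directly follows the activity before the gateway. `equal_branching` and `frequency_branching` are hypothetical helpers, and the toy log is the one from the DeepSimulator overview slide.

```python
from collections import Counter

def equal_branching(branches):
    """Option I: split the probability mass evenly across a gateway's outgoing branches."""
    return {b: 1.0 / len(branches) for b in branches}

def frequency_branching(traces, before_gateway, branches):
    """Simplified option II: relative frequency of each branch right after `before_gateway`.

    Assumption: the first activity of each branch identifies it and directly follows
    `before_gateway` in the trace. This stands in for alignment-based replay.
    """
    counts = Counter()
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            if current == before_gateway and nxt in branches:
                counts[nxt] += 1
    total = sum(counts.values())
    if total == 0:
        return equal_branching(branches)  # no evidence in the log: fall back to option I
    return {b: counts[b] / total for b in branches}

log = [["T1", "T2", "T3"] for _ in range(7)] + [
    ["T1", "T3", "T3"], ["T1", "T2", "T2"], ["T1", "T3", "T2"]]
print(equal_branching(["T2", "T3"]))                 # {'T2': 0.5, 'T3': 0.5}
print(frequency_branching(log, "T1", ["T2", "T3"]))  # {'T2': 0.8, 'T3': 0.2}
```

Activity sequences can then be generated by sampling a branch at each gateway with these probabilities.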
12.
PHASE 2 - CASE START-TIMES GENERATION
Time-series decomposition (trend, seasonality, and holidays):
y(t) = g(t) + s(t) + h(t) + ε_t
Trend g(t): models non-periodic changes in the value of the time series, using a saturating growth (logistic growth) model
Seasonality s(t): represents periodic changes (e.g., weekly and yearly seasonality), modeled with Fourier series
Holidays h(t): represents the effects of holidays that occur on potentially irregular schedules over one or more days; provided by the analyst
Error ε_t: idiosyncratic changes that are not accommodated by the model
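This decomposition has the same form as the additive model used by the Prophet forecasting library. Below is a minimal standard-library sketch of evaluating y(t) and turning it into case start times, assuming: t in days; logistic growth without changepoints; a single weekly Fourier term; a constant analyst-supplied holiday effect; the error term dropped; and start times spread uniformly at random within each day. Fitting the parameters, which is the actual work of Phase 2, is not shown. All helper names and parameter values are hypothetical.

```python
import math
import random

def logistic_trend(t, cap, k, m):
    """Saturating growth: g(t) = cap / (1 + exp(-k * (t - m)))."""
    return cap / (1.0 + math.exp(-k * (t - m)))

def fourier_seasonality(t, period, coeffs):
    """s(t) = sum over n of a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P), n = 1..len(coeffs)."""
    total = 0.0
    for n, (a_n, b_n) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total

def expected_daily_cases(t, params):
    """y(t) without the error term, clipped at 0 because case counts cannot be negative."""
    g = logistic_trend(t, params["cap"], params["k"], params["m"])
    s = fourier_seasonality(t, 7.0, params["weekly"])  # weekly period, t in days
    h = params["holiday_effect"] if t in params["holidays"] else 0.0  # analyst-supplied
    return max(0.0, g + s + h)

def generate_start_times(days, params, seed=0):
    """Hypothetical generator: round y(t) to a daily case count, spread starts within the day."""
    rng = random.Random(seed)
    starts = []
    for day in range(days):
        n_cases = round(expected_daily_cases(day, params))
        starts.extend(sorted(day + rng.random() for _ in range(n_cases)))
    return starts  # fractional days since the start of the horizon

params = {"cap": 100.0, "k": 0.1, "m": 50.0, "weekly": [(5.0, 0.0)],
          "holidays": {0}, "holiday_effect": -30.0}
starts = generate_start_times(14, params)
print(len(starts), starts[:3])
```

Clipping at zero and rounding are simplifications; a sampler that treats y(t) as the rate of a counting process would be a natural refinement.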
13.
PHASE 3 - ACTIVITY TIMESTAMPS GENERATION
[Figure: trace σ1 with consecutive events e1 (activity Ac1) and e2 (activity Ac2), each with a start and a complete timestamp]
Waiting time predictive model
Features: Wait+Ac2+Cx+WIP+RO
Processing time predictive model
Features: Proc+Ac1+Cx+WIP+RO
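The targets of the two predictive models can be derived from a trace's timestamps as in the figure: processing time runs from an event's start to its completion, and waiting time from the previous event's completion to the next event's start. The sketch below computes both, plus a simple work-in-progress (WIP) count. It assumes WIP means the number of cases started but not yet completed, that events in a trace are ordered, and that overlapping (parallel) events get a waiting time of 0. The Cx and RO features are not reproduced. `duration_features`, `work_in_progress`, and the timestamps are hypothetical.

```python
from datetime import datetime

def duration_features(trace):
    """Per event: processing = complete - start; waiting = start - previous event's complete."""
    rows, prev_complete = [], None
    for event in trace:
        processing = (event["complete"] - event["start"]).total_seconds()
        # The first event has no predecessor; overlapping (parallel) events are clamped to 0.
        waiting = 0.0 if prev_complete is None else max(
            0.0, (event["start"] - prev_complete).total_seconds())
        rows.append({"activity": event["activity"], "waiting": waiting, "processing": processing})
        prev_complete = event["complete"]
    return rows

def work_in_progress(log, at):
    """Assumed WIP definition: cases whose first start <= at < last complete."""
    return sum(1 for trace in log if trace[0]["start"] <= at < trace[-1]["complete"])

sigma1 = [
    {"activity": "Ac1", "start": datetime(2022, 6, 6, 9, 0), "complete": datetime(2022, 6, 6, 9, 30)},
    {"activity": "Ac2", "start": datetime(2022, 6, 6, 10, 0), "complete": datetime(2022, 6, 6, 10, 15)},
]
print(duration_features(sigma1))
print(work_in_progress([sigma1], datetime(2022, 6, 6, 9, 45)))  # 1
```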
14.
DATASETS (EVENT LOGS)
Size | Source | Log | #Traces | #Events | #Act. | Avg. activities per trace | Avg. duration | Max. duration | Description
LARGE | R | POC | 70512 | 415261 | 8 | 5.89 | 15.21 days | 269.23 days | Undisclosed banking process*
LARGE | R | BPI17W | 30276 | 240854 | 8 | 7.96 | 12.66 days | 286.07 days | Dutch financial institution (updated)
LARGE | R | BPI12W | 8616 | 59302 | 6 | 6.88 | 8.91 days | 85.87 days | Dutch financial institution
LARGE | S | CVS | 10000 | 103906 | 15 | 10.39 | 7.58 days | 21.0 days | CVS retail pharmacy**
LARGE | S | CFM | 1670 | 44373 | 29 | 26.57 | 0.76 days | 5.83 days | Anonymized confidential process**
SMALL | R | INS | 1182 | 23141 | 9 | 19.58 | 70.93 days | 599.9 days | Insurance claims process*
SMALL | R | ACR | 954 | 4962 | 16 | 5.2 | 14.89 days | 135.84 days | Academic Credential Recognition
SMALL | R | MP | 225 | 4503 | 24 | 20.01 | 20.63 days | 87.5 days | Manufacturing Production
SMALL | S | CFS | 800 | 21221 | 29 | 26.53 | 0.83 days | 4.09 days | Anonymized confidential process**
SMALL | S | P2P | 608 | 9119 | 21 | 15 | 21.46 days | 108.31 days | Purchase-to-Pay process
(R) Real log, (S) Synthetic log; (*) Private logs, (**) Generated from simulation models of real processes
15.
EXP1 - AS-IS ANALYSIS USING DEEPSIMULATOR
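[Figure: experimental setup. The event log is split by time into Partition 1 (70%, training) and Partition 2 (30%, testing); Partition 1 is further split into training (80%) and validation (20%). Three approaches are trained: a deep learning trainer (best DL model, used by a trace generator), Simod parameter extraction (best simulation model, executed by a simulator), and DeepSimulator (best DeepSimulator model). An evaluator compares the generated logs with the testing partition using ELS, DL, MAE, and EMD.]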
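The time-based splitting in this setup can be sketched as a chronological cut. The sketch assumes traces are ordered by case start time and that each split is a simple cut at the given fraction; the slide only says "time splitting", so the paper may cut differently (for example, at a timestamp, trimming traces that straddle it). `temporal_split` and the toy log are hypothetical.

```python
def temporal_split(traces, fraction):
    """Order traces by case start time and cut at `fraction` (earlier part, later part)."""
    ordered = sorted(traces, key=lambda trace: trace["start"])
    cut = round(len(ordered) * fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical log: 100 cases with start times 0..99 (e.g., hours) listed in shuffled order.
log = [{"case_id": i, "start": float((i * 37) % 100)} for i in range(100)]
partition1, partition2 = temporal_split(log, 0.7)       # 70% training / 30% testing
training, validation = temporal_split(partition1, 0.8)  # 80% / 20% of Partition 1
print(len(partition1), len(partition2), len(training), len(validation))  # 70 30 56 14
```

Cutting by time rather than at random keeps the test cases strictly later than the training cases, so the evaluation mimics predicting the future behavior of the process.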
16.
EXP1 – EVALUATION RESULTS
DeepSimulator generally outperforms classical data-driven simulation (DDS) with respect to temporal measures
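The evaluator in the EXP1 setup reports ELS, DL, MAE, and EMD. The sketch below shows two building blocks, assuming DL refers to the Damerau-Levenshtein distance between activity sequences (here the optimal string alignment variant) and EMD to the earth mover's distance between two samples of a temporal measure. The paper's measures also pair generated and observed traces before comparing them, which this sketch omits; MAE is the usual mean absolute error. `dl_distance`, `dl_similarity`, and `emd_1d` are hypothetical helpers.

```python
def dl_distance(a, b):
    """Damerau-Levenshtein distance (optimal string alignment) between two sequences."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def dl_similarity(a, b):
    """Normalize the distance to a similarity in [0, 1]."""
    return 1.0 - dl_distance(a, b) / max(len(a), len(b), 1)

def emd_1d(xs, ys):
    """EMD between two equal-size, equally weighted 1D samples: mean gap of sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

print(dl_similarity(["T1", "T2", "T3"], ["T1", "T3", "T2"]))  # one transposition
print(emd_1d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```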
18.
EXP2 - WHAT-IF ANALYSIS (ADDING A NEVER-BEFORE-OBSERVED ACTIVITY)
[Figure: what-if evaluation pipeline (Scenario 2). Components: removal of an activity from the event log to obtain a modified log; time splitting into Partition 1 (70%; training 80%, validation 20%) and Partition 2 (30%, testing); training of a DSIM baseline model and a DSIM modified model; a list of changes used to update the models and to replace the embeddings of the generative models with updated embeddings; log generation from the baseline and updated models; and an evaluator using MAE, RMSE, and SMAPE.]
19.
EXP2 - EVALUATION RESULTS
• DeepSimulator can better estimate the impact of changes in the demand in settings where such changes have been previously observed in the data.
• The accuracy of DeepSimulator degraded when evaluated in a previously unobserved scenario (a new activity is added to the process).
Scenario 1
Log | MAE (SIMOD) | MAE (DSIM) | EMD (SIMOD) | EMD (DSIM) | DTW (SIMOD) | DTW (DSIM)
Version 1
BPI17W | 971151 | 417572 | 0.02222 | 0.03593 | 3185 | 3647
BPI12W | 660211 | 534341 | 0.11295 | 0.04853 | 515 | 458
CVS | 1489252 | 467572 | 0.03213 | 0.00001 | 3380 | 849
Version 2
BPI17W | 895524 | 290980 | 0.06438 | 0.03218 | 4528 | 3431
BPI12W | 550266 | 524995 | 0.25888 | 0.22003 | 726 | 507
CVS | 540112 | 246159 | 0.15674 | 0.05708 | 2453 | 1967

Scenario 2
Log | MAE (AS-IS) | MAE (WHAT-IF) | RMSE (AS-IS) | RMSE (WHAT-IF) | SMAPE (AS-IS) | SMAPE (WHAT-IF)
CFM | 7155 | 17546 | 22006 | 33137 | 0.15629 | 0.28762
CVS | 283061 | 1040344 | 357717 | 1052255 | 0.31972 | 1.84601
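The Scenario 2 evaluation uses MAE, RMSE, and SMAPE. SMAPE has several variants in the literature; the sketch below assumes the fraction form bounded by 2, which is consistent with the scale of the reported values (up to 1.84601) but is not confirmed by the slides. The helper names and the example values are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large deviations more than MAE does."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def smape(actual, predicted):
    """Symmetric MAPE as a fraction in [0, 2]; a pair of zeros contributes 0 (assumed convention)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return total / len(actual)

observed = [3600.0, 7200.0, 5400.0]   # hypothetical values, e.g. cycle times in seconds
generated = [4000.0, 6000.0, 5400.0]
print(mae(observed, generated), rmse(observed, generated), smape(observed, generated))
```

MAE and RMSE are in the unit of the measured quantity, whereas SMAPE is scale-free, which makes it the easier of the three to compare across logs such as CFM and CVS.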
20.
CONCLUSION
The DeepSimulator method combines data-driven simulation, to capture the control-flow perspective of a process, with deep learning techniques, to capture the temporal perspective.
The evaluation in the AS-IS setting shows that DeepSimulator outperforms a pure Data-Driven Discrete Event Simulation method and a pure Deep Learning method.
The evaluation on WHAT-IF analysis scenarios shows that DeepSimulator can better estimate the impact of changes on the arrival rate of cases (the demand) in settings where such changes have been previously observed in the data.
However, the accuracy of DeepSimulator degrades when applied to a previously unobserved scenario, specifically a scenario where a completely new activity is added to the process.
21.
FUTURE WORK
Explore other mechanisms for modeling the activities of the process via embeddings (e.g., word2vec, transformer models).
Generate events that include resource and domain-specific attributes.
Support a broader range of changes, such as changes in the resource perspective.