The purpose of this presentation is to provide an overview of the two main approaches to using big data: a data focus vs. a business analytics focus. The following topics will be covered:
- Why getting data should not be the starting point in business analytics, and why more data does not always result in more accurate predictions
- The simulation analytics methodology in comparison to the machine learning and data science approaches
- Examples of two business cases:
(i) Healthcare: Pediatric Triage in a Severe Pandemic-Maximizing Population Survival by Establishing Admission Thresholds
(ii) Banking & Finance: Analysis of the staffing and utilization of a team of mutual fund analysts to produce timely ‘buy-sell’ reports
DATA ANALYTICS FOR SOLVING BUSINESS PROBLEMS
1. DATA ANALYTICS FOR SOLVING BUSINESS
PROBLEMS:
SHIFTING FOCUS FROM THE TECHNOLOGY
DEPLOYMENT TO THE ANALYTICS METHODOLOGY
Alexander Kolker, PhD
April 3, 2018
Alexander Kolker. All rights reserved 1
2. Some professional highlights…
• 4 business consulting projects: US Bank, Boston Consulting Group,
Children’s Hospital of Wisconsin, Ohio Hospital Association
• 12 years at GE (General Electric) Healthcare: Data Scientist
• 3 years at Froedtert Hospital: Process Simulation Leader
• 5 years at Children’s Hospital of Wisconsin: Simulation and Data
Analytics
• UW-Milwaukee Lubar School of Business-Adjunct Faculty: A graduate
course Healthcare Delivery Systems-Data Analytics
• Lead Editor and Author of 2 books, 8 book chapters, 10 reviewed papers,
18 conference presentations in the areas of operations management,
process modeling and simulation, and business analytics
3. BIG DATA AND BUSINESS
ANALYTICS BACKGROUND
4. A bold statement to start with:
Big data without actionable analytics and business
decision-making is a ‘sleeping beast’
Big Data includes 2 components:
1. Technology for storing, processing and
managing large amounts of data of various
nature- the current trend
2. Methodology for making business decisions
using modeling and data.
This is called Analytics, and it is gaining momentum… (this presentation’s focus)
5. Mainstream Big Data Paradigm
• Big Data (its Volume, Variety and Velocity) and
machine learning algorithms (mainly supervised) are
necessary and sufficient to make reliable predictions of
the future
• More data always result in more accurate predictions
6. Organizational structure of a digital company
7. Key points:
• Analytics must help in:
➢ Developing New products
➢ Improving Operational efficiency
➢ Business Decision support
Translated to $$
8. Devil’s Advocate Counter-point #1
• Data (big or small) represent the past.
• We are trying to predict the future using solely the
past.
• Is it always possible?
9. FOOD FOR THOUGHT
A quote:
“We may regard the present state of the universe as the effect of
its past (collected data!?) and the cause of its future (Predictive
analytics?!)
An intellect that is vast enough to submit all collected data to
analysis (analytics!) would embrace in a single formula (algorithm?)
the movements of the greatest bodies of the universe ...
For such an intellect nothing would be uncertain and the future
(predictive analytics?) just like the past (collected data!) would be
present before its eyes.”
Guess where it is quoted from?
10. FOOD FOR THOUGHT
Quoted from:
Pierre-Simon Laplace, A Philosophical Essay on Probabilities (published 1814; based on lectures given in 1795)
Can contemporary Big Data technology (e.g. distributed storage and
processing platforms such as Hadoop or Apache Spark) function as that
‘intellect’, capable of analyzing all collected data and
accurately predicting the future?
11. Devil’s Advocate Counter-point #2
• When sensors or social media are used to collect data without planned
experimentation, then the data usually contain a lot of noise and
correlations.
• The number of truly independent observations I is less than the
total number of observations N by a scaling factor S whenever the data are
positively autocorrelated.
• If the 1st-order autocorrelation coefficient of the data is R1, then
$I = N \cdot S$, where $S = \dfrac{1 - R_1}{1 + R_1}$, i.e. $I < N$ for $R_1 > 0$.
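A minimal Python sketch of this scaling is given here for illustration. The function names are hypothetical, and the lag-1 autocorrelation estimator is one common choice rather than anything specified on the slide.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation coefficient R1 of a sequence (one common estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var


def effective_sample_size(n, r1):
    """Number of truly independent observations: I = N * S, with S = (1 - R1) / (1 + R1)."""
    return n * (1.0 - r1) / (1.0 + r1)


# Example: 1,000 observations with R1 = 0.5 carry the information
# of roughly a third as many independent observations.
print(effective_sample_size(1000, 0.5))
```

With R1 = 0 the scaling factor is 1 and nothing is lost; the closer R1 gets to 1, the fewer independent observations remain.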
12. Devil’s Advocate Counter-point #3
• Only the stable signal (pattern) that can be extended into the
future can provide meaningful prediction using validated
models.
• Noise is a data pattern that is unstable (disappears) in the
longer run.
• It is difficult in general to tell the signal from the noise. It
requires a lot of subject matter knowledge.
• Data must be placed in a business context. Without it, there is
no sure way to differentiate the useful signal from the useless
noise.
• A forecast made using mostly noisy data will not be reliable, or
prediction (say, classification) will be inundated by false
positives or false negatives.
13. Main Steps in Business Operations Analytics
1. Formulate and define the business problem
2. Translate the business problem into an analytics problem
3. Collect the data needed to solve the analytics problem
4. Generate solutions and possible scenarios
5. Validate solutions and develop an implementation plan
6. Get solution buy-in from the stakeholders
14. Key points
Data focus:
Capturing massive amounts of raw data in different formats (numerical values,
descriptive texts, videos, audio, etc.), and storing, retrieving, and managing the data.
Business Analytics focus:
(i) Defining a business problem, (ii) identifying an analytic method (algorithm),
(iii) collecting the data required to feed the algorithm, (iv) turning the solution into
actionable managerial decisions.
• Data is NOT a starting point in business analytics
• More data does not always result in more accurate predictions
• Analytics is NOT a side effect of collecting, keeping and presenting / querying
data in general.
• Data have business value only in a specific business context.
• Currently there is a trend toward focusing on the data itself instead of focusing on
business context and methodology.
15. What are the recent research trends in the field of
industrial engineering & operations management?
Technology focus:
• integration of big data analytics in operations management, including
blockchain technology
• Internet of Things (IoT) and big data analytics
Methodology focus:
• business decision-making under uncertainty
• predicting early warning signs of critical transitions in complex systems
• modeling of complex dynamic systems that exhibit critical thresholds, the
so-called tipping points, at which the system suddenly shifts from one state to another
Industrial engineering is a field of balancing man, money and
machines that form truly complex nonlinear systems.
16. WHAT WILL BE COVERED NEXT…
The concept of simulation analytics for addressing some real
business problems
1. Use case 1: Healthcare
Pediatric Triage in a Severe Pandemic: Maximizing
Population Survival by Establishing Admission Thresholds
2. Use case 2: Banking & Finance
Analysis of the Staffing and Utilization of a Team of
Mutual Fund Analysts to produce timely ‘buy-sell’
reports
17. SIMULATE!
• In general, simulation is a process of studying complex
systems using their mathematical representation called a
model or a digital twin, e.g.
• Flight simulators: the aircraft response to the cockpit input
controls
• Nuclear plant operator simulators: the reactor output
response to various operator inputs
• Surgical and physiological procedure simulators on
mannequins
• The focus of this presentation is simulation of business
operations
18. Some Typical Business Operations Questions
• Healthcare:
➢ How many nurses are needed to provide the required
patient quality of care?
➢ Is another CT scanner needed to serve all patients on time?
➢ How many beds and staff are needed at different times of
the day or days of the week?
➢ What additional resources (nurses, beds, etc.) are needed
to decrease the wait time in the ED?
➢ How to maximize population survival during a severe
pandemic with limited resources?
19. Some Typical Business Operations Questions
• Banking & Finance:
➢ What transaction volume can be processed in a given
time period?
➢ Do we need to open a new branch (or close an existing
one)?
➢ What amount of capital is needed to mitigate risk (credit,
interest, etc.)?
➢ What staffing is needed to serve all prospective customers?
20. Key Point:
There is no way to answer any of the above and many
other business operations questions without
quantitative analysis based on simulation modeling.
Everything else will be just gut feeling & guessing
The Key issue is Uncertainty & VARIABILITY
21. Basic types of business operations simulation:
System Dynamics (SD)
SD operates mostly with macro-level flows of patients,
manufactured parts, or documents/transactions.
SD is most appropriate for analyzing large-scale
nationwide healthcare or manufacturing systems and the
implications of implementing different policies.
22. Discrete Event Simulation (DES)
DES operates mostly with individual patients, parts,
workpieces or documents and their attributes.
DES is the most appropriate for analyzing business
operations of separate hospitals, clinics, manufacturing
facilities, banks and various separate organizations.
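To make the idea of tracking individual entities concrete, here is a stripped-down Python sketch in the spirit of DES: customers arrive at random, wait first-come, first-served, and are served by one of several identical servers. Everything in it (the function name, exponential arrival and service times, the rates) is an illustrative assumption, not a model from this presentation.

```python
import heapq
import random


def simulate_queue(n_servers, arrival_rate, service_rate, n_customers, seed=0):
    """Tiny event-stepping simulation of a first-come, first-served queue.

    Returns the average time a customer waits before service starts.
    Exponential inter-arrival and service times are an assumption of this sketch.
    """
    rng = random.Random(seed)
    # Each server is represented by the time at which it next becomes free.
    free_at = [0.0] * n_servers
    heapq.heapify(free_at)

    clock = 0.0
    total_wait = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)   # time of the next arrival event
        earliest_free = heapq.heappop(free_at)   # server that frees up first
        start = max(clock, earliest_free)        # wait only if all servers are busy
        total_wait += start - clock
        # Schedule the service-completion event for this server.
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / n_customers


# Hypothetical scenario comparison: one server vs. two servers.
print(simulate_queue(1, 0.9, 1.0, 2000, seed=1))
print(simulate_queue(2, 0.9, 1.0, 2000, seed=1))
```

Running the same random inputs through different resource levels is the kind of "what if" comparison behind staffing questions such as those listed earlier.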
23. Agent-Based Simulation (ABS)
ABS operates mostly with the actions and
interactions of autonomous entities. It includes emergent
rules of behavior that did not exist in the original
model design.
ABS is most appropriate for analyzing the effect of
emergent individual behavior on the response of the system as
a whole.
24. Key Point:
The most powerful and versatile simulation
methodology for analyzing manufacturing, finance,
healthcare, military and other business operations is
Discrete Event Simulation
25. A validated DES model is used for predicting various
scenarios of the system’s responses to random
inputs, as a virtual reality.
Compare this with Data Science validation and cross-validation of a
‘black box’ model for predicting future outcomes…
26. Key points:
• A simulation model is not a ‘black box’.
• It is a scalable digital twin of the reality.
• The model reflects what’s actually happening in the system.
• This capability gives a sense of the expected system
output before incurring the cost and risk of
implementing the business solution.
Compare this with the Data Science approach: map the input to an
output through a black box model or algorithm.
27. Use case 1
Healthcare:
Pediatric Triage in a Severe Pandemic:
Maximizing Population Survival by
Establishing Admission Thresholds
C. Gall, A. Kolker, et al.,
Critical Care Medicine, 2016, Vol. 44, Issue 9, pp. 1762–1768
28. Problem Description
• Recent experiences with SARS as well as H1N1 influenza
and Ebola virus have highlighted the potential for the
medical infrastructure to be overwhelmed.
• Currently the hospital admission policy is First Come, First
Served (FCFS), regardless of the severity of the patient’s condition.
• However, this results in the admission of patients who
either will not benefit from care or are too sick to survive
regardless of the amount of resources.
29. Business Problem - Project Goal
The objective of this work was:
(i) to develop a new admission policy that will result in maximal
population survival given the limited number of available beds
and mechanical ventilators
(ii) to compare the population survival for the current FCFS admission
policy with the new suggested policy based on the identified
thresholds for the probability of death and patient days on
ventilator
This work was funded by the Ohio Hospital Association to develop pediatric
triage algorithms to augment their disaster readiness planning.
31. High Level Discrete Event Simulation Model Outline.
• The simulation model explored various combinations of two thresholds,
the probability of death (POD) threshold and the days on ventilator (DV)
threshold, to identify the specific values that
would maximize population survival and bed occupancy for a given
number of patients
32. High Level Discrete Event Simulation Model Outline.
The input values for the simulation
1. Pattern of patient arrivals and patient volumes during the pandemic.
• We assumed that the pandemic would follow the same epidemiologic
pattern as the 1918-19 Spanish influenza pandemic. The 1918-19
pandemic spanned 38 weeks, with a peak in deaths during a 6-week period
in 1918.
• We used the numbers of deaths during this peak period as a surrogate for
the numbers of patients developing respiratory failure during crisis
standards of care (CSC).
• Daily numbers of patients arriving at pediatric intensive care units (PICUs)
were estimated assuming a Poisson
distribution with the parameter equal to the total casualty volume times the
fraction of the weekly peak death average divided by 7.
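As a rough illustration of the arrival input described above, the following Python sketch computes a daily Poisson parameter and draws one week of random daily arrivals. The helper names and the example weekly fraction of 0.14 are hypothetical placeholders, not values from the study, and Knuth's multiplication method is simply one textbook way to sample Poisson variates with the standard library.

```python
import math
import random


def daily_arrival_rate(total_casualties, weekly_fraction):
    """Poisson parameter: total casualty volume times the weekly fraction, divided by 7 days."""
    return total_casualties * weekly_fraction / 7.0


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Suitable for moderate lam only (exp(-lam) must not underflow).
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(42)
lam = daily_arrival_rate(5000, 0.14)  # hypothetical: 14% of all casualties in this week
week = [sample_poisson(lam, rng) for _ in range(7)]
print(lam, week)
```

Repeating the draw for each week of the assumed pandemic curve would give one random realization of the daily arrival pattern.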
33. The pattern of random daily arrival of children presenting in respiratory failure
(Ohio state children’s hospitals)
34. 2. Starting census.
It was assumed that at the beginning of the activation of CSC, all PICU beds
would be occupied, because CSC would not be activated until existing
resources were overwhelmed.
3. Probability of death (POD) and the number of days on ventilator (DV)
These relationships were derived using logistic regression from patient data in
the Virtual Pediatric Systems (VPS) database that contains over 900,000
clinical records of children hospitalized in approximately 130 participating
children’s hospitals.
35. 4. Patient wait-time limits and half-life time constant.
• The simulation model assumed that patients could be accepted for admission even if a
PICU bed is not immediately available.
• The patient would await admission (“queue”) for a specified maximal period
of time.
• For children entering the queue, there could be 3 potential outcomes:
• palliative care, because a bed was unavailable within the allowable time
• death in the queue
• PICU admission
• The mortality of patients waiting in the queue was modeled using the
concept of a mortality half-life time $T_{1/2}$.
• The percentage of those who died while waiting for a PICU bed was
calculated as:
$$P_{\text{died waiting}} = \left[1 - e^{-\,\text{wait\_time} / (1.443\, T_{1/2})}\right] \times 100$$
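A short Python sketch of the waiting-mortality formula is shown here for illustration. The function name and the example times are assumptions. Since 1.443 is approximately 1/ln 2, a wait equal to one half-life gives about 50% mortality, which is a handy sanity check.

```python
import math


def percent_died_waiting(wait_time, half_life):
    """Percentage of queued patients who die within wait_time.

    Implements P = [1 - exp(-wait_time / (1.443 * half_life))] * 100,
    where 1.443 is approximately 1 / ln(2). Both arguments use the same time unit.
    """
    return (1.0 - math.exp(-wait_time / (1.443 * half_life))) * 100.0


# Hypothetical example: a 24-hour half-life and waits of 12, 24 and 48 hours.
for wait in (12, 24, 48):
    print(wait, round(percent_died_waiting(wait, 24), 1))
```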
36. Model’s Optimization Module
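The model’s optimization objective function $OF(T_{POD}, T_{DV})$ is
$$OF = w_1 \cdot \text{time-averaged occupancy (\%)} - w_2 \cdot \text{mortality untreated (\%)} - w_3 \cdot \text{mortality treated (\%)},$$
where $w_1$, $w_2$, and $w_3$ are relative weights of the corresponding
components of the OF, and $T_{POD}$ and $T_{DV}$ are the POD and DV thresholds.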
To place higher importance on the priority of minimizing mortality over
maximizing bed occupancy, the weights w2 and w3 were assigned
twice the weight of w1.
OF optimization is always a trade-off between desired and undesired
trends of the contributing components.
The 3 components of the OF move in different directions:
higher occupancy contributes to a higher OF, a desired trend, but
higher mortality reduces the OF, an undesired trend.
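The weighted objective can be written as a one-line function. The sketch below uses the stated 1 : 2 : 2 weighting; the function name and the example percentages are hypothetical, chosen only to show the trade-off.

```python
def objective(occupancy_pct, mortality_untreated_pct, mortality_treated_pct,
              w1=1.0, w2=2.0, w3=2.0):
    """OF = w1 * occupancy - w2 * untreated mortality - w3 * treated mortality.

    Default weights follow the stated rule that w2 and w3 are twice w1.
    """
    return (w1 * occupancy_pct
            - w2 * mortality_untreated_pct
            - w3 * mortality_treated_pct)


# Hypothetical comparison of two threshold settings:
print(objective(90.0, 10.0, 5.0))   # high occupancy, moderate mortality
print(objective(95.0, 20.0, 5.0))   # slightly higher occupancy, much higher mortality
```

In this made-up comparison, the second setting gains 5 points of occupancy but loses 20 points through mortality, so it scores lower overall.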
37. Maximizing the objective function
Maximizing the OF was achieved through an evolutionary
algorithm.
• An evolutionary algorithm is an optimization technique that
generates various solutions in the defined range that must
adapt to their environment to be retained for further steps.
• The algorithm then focuses only on those narrow areas that
could contain better solutions.
• The process stops and the best solution is determined when
no further meaningful improvement in maximizing the OF is
possible.
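The general idea can be sketched with a very small evolutionary loop in Python. This is not the optimizer used in the study: the selection rule, the Gaussian mutation, the shrinking search width, the fixed number of generations (instead of an improvement-based stopping rule), and the toy objective are all assumptions made for illustration.

```python
import random


def evolve(objective, bounds, pop_size=20, n_keep=5, generations=60, seed=0):
    """Maximize objective(x, y) over a rectangular range with a simple evolutionary loop."""
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = bounds

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    # Initial population: random candidate solutions across the whole range.
    population = [(rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi))
                  for _ in range(pop_size)]
    sigma_x, sigma_y = (x_hi - x_lo) / 4.0, (y_hi - y_lo) / 4.0
    for _ in range(generations):
        # Selection: only the fittest candidates are retained.
        ranked = sorted(population, key=lambda p: objective(*p), reverse=True)
        survivors = ranked[:n_keep]
        # Mutation: new candidates are sampled near the survivors.
        children = []
        while len(survivors) + len(children) < pop_size:
            px, py = rng.choice(survivors)
            children.append((clip(rng.gauss(px, sigma_x), x_lo, x_hi),
                             clip(rng.gauss(py, sigma_y), y_lo, y_hi)))
        population = survivors + children
        # Narrow the search around the promising areas.
        sigma_x *= 0.9
        sigma_y *= 0.9
    return max(population, key=lambda p: objective(*p))


def toy_objective(t_pod, t_dv):
    """Hypothetical stand-in for the OF; its maximum is at (0.3, 7.0)."""
    return -(t_pod - 0.3) ** 2 - ((t_dv - 7.0) ** 2) / 100.0


best = evolve(toy_objective, bounds=((0.0, 1.0), (0.0, 20.0)))
print(best)
```

In the real model, each evaluation of the objective would be a full simulation run for one pair of thresholds, which is why the search needs to concentrate on promising regions rather than scan the whole range.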
39. Results
Overall simulated survival of pandemic casualties and bed
occupancy during CSC were compared using triage and FCFS
• Population survival using identified admission thresholds was
significantly improved compared with FCFS.
• The survival advantage increased as patient volume increased.
• Admission thresholds improved survival compared with FCFS
by 7.4% (99% CI = 6.5%-8.3%) for a volume of 5,000 patients
and by 24.0% (99% CI = 23.4%-24.6%) for 10,000 patients.
42. Model Validation
Suppose we want to test the validity of the optimal thresholds for a
given patient volume.
1. Generate a random sample of intubated patients from a VPS data
set.
2. For each patient in this random sample calculate actual POD % and
DV
3. Apply the optimal thresholds, and exclude the patients who do not
meet both thresholds (they are not admitted and later die).
4. Calculate % mortality of admitted patients.
43. Model Validation
Next, calculate the corresponding FCFS mortality:
5. Generate a random sample of the same size as above in step 3.
6. Count the number of deaths in this sample with NO thresholds (FCFS).
7. Calculate % mortality in that sample.
8. Compare it with the % mortality for patients admitted with thresholds from
step 4.
That comparison is the desired test: % mortality with NO thresholds
(FCFS) is expected to be greater than that for patients admitted with the optimal
thresholds for the given patient volume.
9. Repeat these steps for other random samples to make sure that
there is no random sample bias (sort of cross-validation).
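The validation steps can be sketched in Python on synthetic data. Nothing below comes from the VPS database: the records are randomly generated, the thresholds (POD at most 0.5, DV at most 20 days) are hypothetical, and treating both thresholds as upper limits is an assumption of this sketch.

```python
import random


def percent_mortality(patients):
    """Percent of patients in the sample who died."""
    return 100.0 * sum(p["died"] for p in patients) / len(patients)


def validate_once(records, sample_size, pod_threshold, dv_threshold, rng):
    """One round: mortality among threshold-admitted patients vs. an FCFS sample."""
    sample = rng.sample(records, sample_size)                       # step 1
    admitted = [p for p in sample                                   # steps 2-3
                if p["pod"] <= pod_threshold and p["dv"] <= dv_threshold]
    fcfs_sample = rng.sample(records, len(admitted))                # step 5
    return percent_mortality(admitted), percent_mortality(fcfs_sample)


# Synthetic stand-in for patient records (assumption: death occurs with probability POD).
rng = random.Random(7)
records = []
for _ in range(5000):
    pod = rng.random()
    records.append({"pod": pod, "dv": rng.uniform(1, 30), "died": rng.random() < pod})

# Step 9: repeat for several random samples.
results = [validate_once(records, 1000, 0.5, 20.0, rng) for _ in range(5)]
for triage_pct, fcfs_pct in results:
    print(round(triage_pct, 1), round(fcfs_pct, 1))
```

On real data the same loop would use observed outcomes instead of simulated ones; the point of the repetition is to check that the comparison does not depend on one lucky sample.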
44. Use case 2
Banking & Finance:
Staffing and Utilization of the Team of Mutual
Fund Analysts to produce timely ‘buy-sell’
reports
45. Business Problem
The team of 11 fund analysts (FA) is supposed to
submit fund pre-price reports to upper management
daily by 1 pm.
Management often receives the reports late in
the day (after 1 pm), which hurts real-time buy-sell
decision-making.
Potential reasons for late reports could be:
• the uneven workload of the analysts
• variability in the time needed to get fund information
• random analyst absences
46. Project Goal
The simulation analysis is aimed at answering the
following management questions:
• How does the daily absence of any two analysts
affect the performance of the other team members
and the consistency of the timeliness of report
submission?
• How does the absence of specific team members
affect the team productivity and the timeliness of
report submission?
47. Description of Data
Baseline Fund Analysts Staffing Model
• All fund analysts (FA) work from 8:30 am to 5:30 pm.
• Lunch time 2 pm to 3 pm (can be modified in model
scenarios)
• Industry standard daily availability 85% (can be modified
in model scenarios)
• Variable time for all activities (variability can be modified
in model scenarios)
• Rework time buffer: additional 10% of the pre-price
completion time (can be modified in model scenarios)
51. Take-away:
• Daily FA utilization is relatively low for all FAs, below 50%.
This indicates, in general, substantial spare staffing capacity.
• At the same time, daily FA utilization is highly uneven:
FA1 Lee and FA6 Maria have much higher utilization than the other FAs.
• Bob and Brian have very low daily utilization (they are supervisors).
• Completion time for most FAs ranges from 12:30 pm (on
“good” days) to 1:30 pm and later (on “bad” days).
• The latest completion times are for FA1 Lee (as late as 2:11 pm on “bad”
days) and FA6 Maria (as late as 1:30 pm).
This is consistent with the higher utilization of these FAs compared to
the other FAs.
52. The Use of the Validated Model to Analyze Staffing
Scenarios
Scenario 1
The two busiest analysts are out on the same day. Their
transaction workload is picked up by the other
available team members.
A question to be answered by the simulation model:
How does the absence of two analysts affect the daily utilization of
the other team members and the time to complete the buy-sell
report?
53. The busiest analysts, FA1_Lee and FA6_Maria, are out:
54. Overall conclusions for scenario 1:
Two FA team members are out:
1. The reduced team can handle the remaining workload
and meet the deadlines most of the time (but not always).
2. In order to consistently meet the completion deadline the
workload should be distributed more evenly between the
team’s FAs – this is a management issue
55. Concluding Key Points:
So, how can you tell if simulation is right for you?
• It is the methodology of choice for analyzing the dynamic
behavior of complex systems/processes with random
components
• There is a big decision to make with high potential for failure or
reward
• It provides a framework for experimenting with the system
and testing various business scenarios
• It reveals unintended consequences of business solutions
• There is a commitment to use the findings and recommendations even if
they are not what you want to hear
56. Overall Concluding Remarks and Reflections
• As analytics or operations research professionals we are
rewarded for helping the organization in solving business
problems
• Building analytics that influences business decision-making
requires attention to a non-technical side of the project
(the organization’s culture, internal politics and power-sharing)
• Analytics has no practical value for the organization if it does
not affect business decision-making, regardless of how much
a new trendy technology is used
So, how much of your work is about understanding and
addressing real business problems vs. falling in love with a
sexy technology or just finding patterns in the data?