DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics
1. The First Step in Information Management
www.firstsanfranciscopartners.com
Produced by:
MONTHLY SERIES
Brought to you in partnership with:
March 2, 2017
Descriptive, Prescriptive and Predictive Analytics
4. Topics For Today’s Webinar
pg 3© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
§ Overview of statistical analysis process
− Forming a hypothesis
− Identifying appropriate sources
− Proving/Disproving the hypothesis
§ Types of data analysis
− Descriptive data analytics
− Predictive data analytics
− Prescriptive data analytics
§ How these types compare within the analytic environment
§ Key takeaways and suggested resources
[Diagram: Descriptive, Predictive, Prescriptive. Combine?]
5. The Process of Statistical Analysis
Form Hypotheses
• Null: Nothing special
• Alternative: Something unique, an actionable finding, etc.
Identify Data Source
• Don’t go overboard!
• Collect your own, OR
• Use secondary data
Prove/Disprove Hypothesis
• Is Type I or Type II error worse?
• Choose confidence level
• Reject/not reject null
When we have resource constraints, Statistical Analysis enables us to make quantitative
inferences based on an amount of information we can analyze (a sample).
6. Step 1: Forming a Hypothesis
§ In statistical analysis, we have two hypotheses:
− Null hypothesis: Claims that any irregularities in the sample are due
to chance
− Alternative hypothesis: Claims that irregularities in the sample are due
to non-random causes (and would therefore reflect the population)
§ What are you really looking to discover/prove?
− Experiment 1:
§ Null: There is no difference in the amount sold when comparing salespeople who did
and did not receive training.
§ Alternative: There is a difference in the amount sold when comparing salespeople who
did and did not receive training.
− Experiment 2:
§ Null: The salespeople who received training do not sell more on average than the
salespeople who did not receive training.
§ Alternative: Salespeople who received the training sell more on average than those who
did not receive the training.
7. Step 2: Identifying Appropriate Sources
§ Remember, you don’t need Big Data for every decision!
§ Sometimes, knowing what data you don’t need is just as important
as knowing what you do need. Keep your end decision in mind.
§ Potential sources of data:
− Primary data − collect new data
§ Who to include: Random sample, stratified random sample, etc.
§ How many to include: Sample size calculators online (free)
§ Determine the level of measurement needed for your desired analysis:
nominal (categorical), ordinal, interval, ratio
§ As necessary, design a control group
− Secondary data − utilize existing data
§ Census records, syndicated data, government data, etc.
§ Consider your data needs, data cleanliness, cost, etc., when determining
appropriate sources.
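The "how many to include" question above can be sketched with Cochran's formula for estimating a proportion, a formula of the kind that sample size calculators commonly apply. This is a minimal sketch, not the webinar's method: the function name `sample_size`, the 5% margin of error, the assumed proportion of 0.5 and the population of 1,000 are all illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, proportion=0.5, population=None):
    """Cochran's sample size for estimating a proportion (hypothetical helper)."""
    # Two-tailed z value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: small populations need fewer responses.
        n = n / (1 + (n - 1) / population)
    return ceil(n)


# Assumed inputs: 5% margin of error, 95% confidence, worst-case proportion 0.5.
print(sample_size(0.05))                   # large or unknown population
print(sample_size(0.05, population=1000))  # population of 1,000 (assumption)
```

Using a proportion of 0.5 gives the most conservative (largest) sample size, which is why it is a common default when nothing is known in advance.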
8. Step 3: Proving/Disproving the Hypothesis
§ Establish a confidence level prior to analysis.
§ Confidence levels:
1. Determine how significant a difference/irregularity must be for you
to reject the null hypothesis.
2. Determine how confident you can be in your decision.
§ Even with a high confidence level, you aren’t always right:
− Type I error: You reject the null hypothesis but shouldn’t have.
− Type II error: You do not reject the null hypothesis but should have.
− How to decrease the likelihood of these errors: change the confidence level (a higher
level lowers the risk of a Type I error but raises the risk of a Type II error), increase
sample size (be aware of effect size), etc.
§ Determine which type of error is more detrimental to your investigation and set
up your study accordingly.
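The trade-off between the two error types can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the webinar: it uses a simplified two-sample z-test with a known standard deviation, and the function name `rejection_rate`, the population mean of 100, the standard deviation of 10 and the true difference of 3.75 are hypothetical values chosen for illustration.

```python
import random
from statistics import NormalDist, mean


def rejection_rate(true_diff, n=74, sd=10.0, alpha=0.05, trials=2000, seed=0):
    """Share of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    std_error = sd * (2 / n) ** 0.5               # known-sd simplification
    rejections = 0
    for _ in range(trials):
        untrained = [rng.gauss(100.0, sd) for _ in range(n)]
        trained = [rng.gauss(100.0 + true_diff, sd) for _ in range(n)]
        z = (mean(trained) - mean(untrained)) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# Null is true (no real difference): every rejection is a Type I error.
type_1_rate = rejection_rate(true_diff=0.0)
# Alternative is true: every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_diff=3.75)
# A larger sample lowers the Type II rate for the same real difference.
type_2_rate_large_n = 1 - rejection_rate(true_diff=3.75, n=300)

print(f"Type I rate: {type_1_rate:.3f}")
print(f"Type II rate (n=74): {type_2_rate:.3f}")
print(f"Type II rate (n=300): {type_2_rate_large_n:.3f}")
```

The Type I rate should land near the chosen alpha, while the Type II rate depends on both the sample size and the size of the real difference.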
9. Step 3: Proving/Disproving the Hypothesis
Group statistics for QPctQ3:
Training      N    Mean       Std. Deviation   Std. Error Mean
No training   74   102.643    9.95482          1.15722
Training      74   106.3889   9.83445          1.14323

Independent samples test for QPctQ3:
Levene's Test for Equality of Variances: F = 0.029, Sig. = 0.865
t-test for Equality of Means:
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -2.303   146       0.023             -3.74595          1.6267                  -6.96086       -0.53103
Equal variances not assumed   -2.303   145.978   0.023             -3.74595          1.6267                  -6.96087       -0.53102
§ Confidence level = 95%
§ Alpha = 0.05
[Bar chart: Percent of 3rd Quarter Quota Sold by Trained vs. Untrained Salespeople; y-axis from 100 to 108, bars for No training and Training]
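The t statistic and confidence interval reported on this slide can be reproduced from the group statistics alone. The following is a minimal sketch of a pooled-variance two-sample t-test. The critical value `T_CRITICAL` is an assumption taken from a t table for 146 degrees of freedom rather than computed, so the interval matches the slide only to about two decimal places.

```python
from math import sqrt

# Group statistics copied from the slide.
n1, mean1, sd1 = 74, 102.643, 9.95482    # no training
n2, mean2, sd2 = 74, 106.3889, 9.83445   # training

# Assumption: two-tailed critical t for df = 146 at alpha = 0.05,
# looked up from a t table (about 1.976), not computed here.
T_CRITICAL = 1.976

df = n1 + n2 - 2
# Pooled variance: weighted average of the two group variances.
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))

mean_diff = mean1 - mean2
t_stat = mean_diff / std_error
ci_low = mean_diff - T_CRITICAL * std_error
ci_high = mean_diff + T_CRITICAL * std_error

# Reject the null when |t| exceeds the critical value.
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI of the difference: [{ci_low:.3f}, {ci_high:.3f}]")
print("Reject null" if reject_null else "Do not reject null")
```

Because the interval excludes zero and the two-tailed significance (0.023) is below alpha (0.05), the null hypothesis of no difference between trained and untrained salespeople is rejected.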
11. Types of Data Analysis
Descriptive
• Aims to help uncover valuable insight from the data being analyzed
• Answers the question “What happened?”
Predictive
• Helps forecast behavior of people and markets
• Answers the question “What could happen?”
Prescriptive
• Suggests conclusions or actions that may be taken based on the analysis
• Answers the question “What should be done?”
12. Descriptive Data Analytics
§ Though the simplest type, it is used most often.
§ Two types of descriptive analysis:
1. Measures of central tendency (tell us about the middle)
§ Mean − the average
§ Median − the midpoint of the responses
§ Mode − the response with the highest frequency
2. Measures of dispersion (tell us about the spread)
§ Range − the min, the max and the distance between the two
§ Variance − the average squared distance of each point from the mean
§ Standard Deviation − the square root of the variance; the most common/standard way of expressing the spread of data
Customer_ID   Items Purchased   Amount Spent
29304         1                 $1.09
28308         3                 $44.43
19962         21                $218.58
30281         1                 $73.02
[Bar chart: Mean, Median and Mode Amounts of Items Purchased. Mean = 6.5, Median = 2, Mode = 1]
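The measures on this slide can be computed for the four customers in the table with Python's standard `statistics` module. This is a minimal sketch: the variable names are illustrative, and the choice of population (rather than sample) variance is an assumption, since the slide does not say which one is meant.

```python
import statistics

# "Items Purchased" column from the slide's customer table.
items_purchased = [1, 3, 21, 1]

# Measures of central tendency.
mean_items = statistics.mean(items_purchased)      # 6.5
median_items = statistics.median(items_purchased)  # 2.0
mode_items = statistics.mode(items_purchased)      # 1

# Measures of dispersion.
value_range = max(items_purchased) - min(items_purchased)
# Assumption: population variance (divide by n), not sample variance (n - 1).
variance = statistics.pvariance(items_purchased)
std_dev = statistics.pstdev(items_purchased)

print(f"mean={mean_items}, median={median_items}, mode={mode_items}")
print(f"range={value_range}, variance={variance}, std dev={std_dev:.2f}")
```

The single customer who bought 21 items pulls the mean well above the median and mode, which is why it helps to report more than one measure of central tendency.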
14. Predictive Data Analytics
§ Some assume that predictive analysis is relevant only to predicting future events.
− However, in cases such as sentiment analysis, existing data (e.g., the text
of a tweet) is used to predict unknown data (whether the tweet is positive
or negative).
§ Several of the models that can be used for predictive analysis are:
− Forecasting
− Simulation
− Regression
− Classification
− Clustering
15. Forecasting
§ Forecasting:
− Moving average technique: use the mean of prior periods to predict the next
§ Forecast for period 5 = the mean of periods 1−4
§ Forecast for period 6 = the mean of periods 2−5
− Exponential smoothing technique: similar, but more recent data points
are weighted more heavily because they are more relevant
− Regression techniques
§ Use caution in forecasting: the longer the forecast horizon, the less
accurate the projections.
[Line chart: Net Income of Store C Projected 2017-2020; x-axis from 2006 to 2022, y-axis from $0 to $25,000]
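The moving average and exponential smoothing techniques described on this slide can be sketched in a few lines. The function names, the smoothing factor `alpha` and the income figures below are hypothetical. They are not the Store C data behind the chart.

```python
def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    if len(history) < window:
        raise ValueError("need at least `window` periods of history")
    return sum(history[-window:]) / window


def exponential_smoothing_forecast(history, alpha=0.5):
    """Forecast the next period with simple exponential smoothing.

    A higher alpha (between 0 and 1) puts more weight on recent periods.
    """
    smoothed = history[0]
    for value in history[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


# Hypothetical net income for five periods (illustrative values only).
net_income = [12000, 13500, 12800, 14200, 15100]

print(moving_average_forecast(net_income, window=4))
print(exponential_smoothing_forecast(net_income, alpha=0.5))
```

To forecast several periods ahead with either technique, each forecast has to be fed back in as if it were an observation, which is one reason accuracy degrades as the horizon grows.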
16. Simulation
§ Simulation
− Queuing models: used to predict wait time and queue length
§ Results can be used to create staff schedules in a way that reduces inefficiencies, etc.
− Discrete event model: used in special situations when queuing cannot be used
§ Results can be used to identify bottlenecks, etc.
− Monte Carlo simulations: used to identify probable outcomes of a scenario
based on many possible outcomes (uses random number generation and many
iterations of the scenario).
§ Results can be used to predict the likelihood of profitability within the first two years, etc.
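A Monte Carlo simulation of the profitability example can be sketched as follows. Every number here is a hypothetical assumption (startup cost, monthly profit distribution, number of trials), and the normal distribution for monthly profit is a simplification chosen for illustration.

```python
import random


def probability_of_profit(trials=10_000, months=24, startup_cost=50_000.0,
                          mean_monthly_profit=2_500.0, sd_monthly_profit=4_000.0,
                          seed=42):
    """Estimate the chance that cumulative profit is positive after `months`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    profitable = 0
    for _ in range(trials):
        cumulative = -startup_cost
        for _ in range(months):
            # Each month's profit is drawn at random (assumed normal).
            cumulative += rng.gauss(mean_monthly_profit, sd_monthly_profit)
        if cumulative > 0:
            profitable += 1
    return profitable / trials


estimate = probability_of_profit()
print(f"Estimated probability of being profitable within two years: {estimate:.1%}")
```

Running more trials narrows the sampling noise in the estimate, but the result is only as good as the assumed distributions that feed it.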
19. Regression
§ Regression − generally speaking, used to understand the relationship between
independent and dependent variables
§ Types of regression models:
− Logistic: used when the dependent variable is categorical (e.g., will customers shop
at your store or a competitor's?)
− Linear: used to identify a linear relationship between the dependent variable and
at least one independent variable (e.g., daily store revenue predicted by the
number of customers entering the store)
− Step-wise: used to identify a relationship between dependent/independent
variables. This is done by adding/removing variables based on how those
variables impact the overall strength of the model.
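The linear case (daily store revenue predicted by the number of customers entering the store) can be sketched with ordinary least squares on one independent variable. The function name `fit_line` and the customer and revenue figures are hypothetical, used only to show the mechanics.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily observations (illustrative values only).
customers = [120, 150, 180, 210, 240]
revenue = [2500, 3050, 3400, 4100, 4450]

slope, intercept = fit_line(customers, revenue)
predicted = intercept + slope * 200  # predicted revenue for 200 customers
print(f"revenue = {intercept:.2f} + {slope:.2f} * customers")
print(f"predicted revenue for 200 customers: {predicted:.2f}")
```

The slope reads as the expected change in revenue for each additional customer, under the assumption that the relationship really is linear over the observed range.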
23. Linear Programming Example
                    Product A   Product B   Product C   Product D   Product E
Quantity to Order
Profit per Unit     $5          $3          $20         $50         $200        Total Profit: $-

                    Product A   Product B   Product C   Product D   Product E   Used    Available
Storage Space       0.05        0.5         1           5           10                  1000
Selling Effort      0.25        5           0.5         2           7                   500
Minimum Order       100         15          20          60          5

Solution:
                    Product A   Product B   Product C   Product D   Product E
Quantity to Order   100         15          490         60          5
Profit per Unit     $5          $3          $20         $50         $200        Total Profit: $14,345.00

                    Product A   Product B   Product C   Product D   Product E   Used    Available
Storage Space       0.05        0.5         1           5           10          852.5   1000
Selling Effort      0.25        5           0.5         2           7           500     500
Minimum Order       100         15          20          60          5
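The solution on this slide can be checked with a small brute-force sketch. It assumes the goal is to maximize total profit subject to the storage, selling effort and minimum order limits shown, and it enumerates the corner points of that feasible region by hand. This is not how the slide's solution was produced and not how a real solver works (those typically use the simplex method or similar); it only works here because there are just two resource limits.

```python
from itertools import combinations

# Data from the slide, for products A to E.
profit = [5, 3, 20, 50, 200]
storage = [0.05, 0.5, 1, 5, 10]
effort = [0.25, 5, 0.5, 2, 7]
minimum = [100, 15, 20, 60, 5]
STORAGE_AVAILABLE = 1000
EFFORT_AVAILABLE = 500


def solve_by_vertex_enumeration():
    n = len(profit)
    # Resources left once every minimum order has been placed.
    s_left = STORAGE_AVAILABLE - sum(s * m for s, m in zip(storage, minimum))
    e_left = EFFORT_AVAILABLE - sum(e * m for e, m in zip(effort, minimum))

    candidates = [[0.0] * n]  # corner point: order exactly the minimums
    # Corner points with one product above its minimum, one resource used up.
    for i in range(n):
        for extra in (s_left / storage[i], e_left / effort[i]):
            y = [0.0] * n
            y[i] = extra
            candidates.append(y)
    # Corner points with two products above minimum, both resources used up.
    for i, j in combinations(range(n), 2):
        det = storage[i] * effort[j] - storage[j] * effort[i]
        if abs(det) < 1e-12:
            continue
        y = [0.0] * n
        y[i] = (s_left * effort[j] - storage[j] * e_left) / det
        y[j] = (storage[i] * e_left - s_left * effort[i]) / det
        candidates.append(y)

    def feasible(y):
        return (all(v >= -1e-9 for v in y)
                and sum(s * v for s, v in zip(storage, y)) <= s_left + 1e-9
                and sum(e * v for e, v in zip(effort, y)) <= e_left + 1e-9)

    best = max((y for y in candidates if feasible(y)),
               key=lambda y: sum(p * v for p, v in zip(profit, y)))
    quantities = [m + v for m, v in zip(minimum, best)]
    total_profit = sum(p * q for p, q in zip(profit, quantities))
    return quantities, total_profit


quantities, total_profit = solve_by_vertex_enumeration()
print("Quantity to order:", quantities)
print(f"Total profit: ${total_profit:,.2f}")
```

All spare selling effort goes to Product C because it earns the most profit per unit of selling effort, and selling effort, not storage space, is the limit that runs out first.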
26. Closing Q&A