DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics

The First Step in Information Management
www.firstsanfranciscopartners.com
Produced by:
MONTHLY SERIES
Brought to you in partnership with:
March 2, 2017
Descriptive, Prescriptive and Predictive Analytics

Polling Questions
§ What type of statistical analyses do you use or plan to use (can choose multiple answers)?
− Descriptive
− Predictive
− Prescriptive
− I don’t use any of these
− I don’t know the difference between these
§ How frequently do you use statistical analyses in your work?
− I don’t currently do any type of statistical analysis
− Less than once a week
− Once or a few times a week
− At least once a day
© 2017 First San Francisco Partners www.firstsanfranciscopartners.com

Topics For Today’s Webinar
§ Overview of statistical analysis process
− Forming a hypothesis
− Identifying appropriate sources
− Proving/Disproving the hypothesis
§ Types of data analysis
− Descriptive data analytics
− Predictive data analytics
− Prescriptive data analytics
§ How these types compare within the analytic environment
§ Key takeaways and suggested resources

The Process of Statistical Analysis
§ Form hypotheses
− Null: Nothing special
− Alternative: Something unique, an actionable finding, etc.
§ Identify data source
− Don’t go overboard!
− Collect your own, OR
− Use secondary data
§ Prove/disprove hypothesis
− Is Type I or Type II error worse?
− Choose confidence level
− Reject/not reject null
When we have resource constraints, statistical analysis enables us to make quantitative
inferences based on an amount of information we can analyze (a sample).

Step 1: Forming a Hypothesis
§ In statistical analysis, we have two hypotheses:
− Null hypothesis: Claims that any irregularities in the sample are due
to chance
− Alternative hypothesis: Claims that irregularities in the sample are due
to non-random causes (and would therefore reflect the population)
§ What are you really looking to discover/prove?
− Experiment 1:
§ Null: There is no difference in the amount sold when comparing salespeople who did
and did not receive training.
§ Alternative: There is a difference in the amount sold when comparing salespeople who
did and did not receive training.
− Experiment 2:
§ Null: The salespeople who received training do not sell more on average than the
salespeople who did not receive training.
§ Alternative: Salespeople who received the training sell more on average than those who
did not receive the training.
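Written in symbols, the two experiments differ only in direction. This is a sketch, assuming \mu_T and \mu_N (names that do not appear in the slides) stand for the mean amount sold by trained and untrained salespeople:

```latex
% Experiment 1 (two-sided): any difference counts as evidence
H_0: \mu_T = \mu_N \qquad H_A: \mu_T \neq \mu_N

% Experiment 2 (one-sided): only "trained sell more" counts as evidence
H_0: \mu_T \le \mu_N \qquad H_A: \mu_T > \mu_N
```

Experiment 1 therefore calls for a two-tailed test and Experiment 2 for a one-tailed test.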

Step 2: Identifying Appropriate Sources
§ Remember, you don’t need Big Data for every decision!
§ Sometimes, knowing what data you don’t need is just as important
as knowing what you do need. Keep your end decision in mind.
§ Potential sources of data:
− Primary data − collect new data
§ Who to include: Random sample, stratified random sample, etc.
§ How many to include: Sample size calculators online (free)
§ Determine the level of measurement needed for your desired analysis:
nominal (categorical), ordinal, interval, ratio
§ As necessary, design a control group
− Secondary data − utilize existing data
§ Census records, syndicated data, government data, etc.
§ Consider your data needs, data cleanliness, cost, etc., when determining
appropriate sources.
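The two sampling approaches named above can be sketched with Python's standard library. The sampling frame, region names and sample sizes here are hypothetical, chosen only to make the example run:

```python
import random

# Hypothetical sampling frame: (salesperson_id, region) pairs.
rng = random.Random(42)  # fixed seed so the draw is repeatable
regions = ["East", "West", "North", "South"]
frame = [(i, regions[i % 4]) for i in range(200)]

def simple_random_sample(population, n, rng):
    """Draw n units without replacement, each equally likely."""
    return rng.sample(population, n)

def stratified_random_sample(population, n_per_stratum, rng, key):
    """Draw n_per_stratum units without replacement from each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

srs = simple_random_sample(frame, 20, rng)
stratified = stratified_random_sample(frame, 5, rng, key=lambda u: u[1])
```

The stratified draw guarantees five salespeople per region, while the simple random sample may over- or under-represent a region by chance.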

Step 3: Proving/Disproving the Hypothesis
§ Establish a confidence level prior to analysis.
§ Confidence levels:
1. Determine how significant a difference/irregularity must be for you
to prove/disprove your alternative hypothesis.
2. Determine how confident you can be in your decision.
§ Even with a high confidence level, you aren’t always right:
− Type I error: You reject the null hypothesis but shouldn’t have.
− Type II error: You do not reject the null hypothesis but should have.
− How to decrease the likelihood of these errors: change the confidence level (raising it lowers
the risk of a Type I error but raises the risk of a Type II error), increase
sample size (be aware of effect size), etc.
§ Determine which type of error is more detrimental to your investigation and set
up your study accordingly.
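A small simulation can illustrate what a Type I error rate means. This is a minimal sketch under simplifying assumptions that are not in the slides: both groups are drawn from the same normal population (so the null is true by construction), the standard deviation is known to be 1, and a two-sided z-test is used instead of a t-test:

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Share of trials that reject a TRUE null (two-sided z-test, known sigma = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population, so the null is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # Standard error of the difference in means is sqrt(1/n + 1/n).
        z = (mean(a) - mean(b)) / (2 / n) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Share of true nulls rejected: {rate:.3f}")
```

With a 95% confidence level, the share of false rejections should land near alpha = 0.05.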

Step 3: Proving/Disproving the Hypothesis
Group statistics for QPctQ3 (percent of 3rd quarter quota sold):
Training      N    Mean      Std. Deviation  Std. Error Mean
No training   74   102.643   9.95482         1.15722
Training      74   106.3889  9.83445         1.14323

Independent samples test for QPctQ3:
Levene's Test for Equality of Variances: F = 0.029, Sig. = 0.865
t-test for Equality of Means:
                             t       df       Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      -2.303  146      0.023            -3.74595         1.6267                 -6.96086      -0.53103
Equal variances not assumed  -2.303  145.978  0.023            -3.74595         1.6267                 -6.96087      -0.53102
§ Confidence level = 95%
§ Alpha = 0.05
[Chart: Percent of 3rd Quarter Quota Sold by Trained vs. Untrained Salespeople]
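The t statistic on this slide can be recomputed from the group statistics alone. This is a minimal sketch of the pooled (equal variances assumed) calculation. The critical value 1.976 is an assumption taken from a t table for alpha = 0.05 and 146 degrees of freedom, not something computed here:

```python
import math

# Group statistics copied from the slide.
n_no, mean_no, sd_no = 74, 102.643, 9.95482     # no training
n_tr, mean_tr, sd_tr = 74, 106.3889, 9.83445    # training

# Pooled variance (the "equal variances assumed" calculation).
df = n_no + n_tr - 2
pooled_var = ((n_no - 1) * sd_no**2 + (n_tr - 1) * sd_tr**2) / df
se_diff = math.sqrt(pooled_var * (1 / n_no + 1 / n_tr))

mean_diff = mean_no - mean_tr
t_stat = mean_diff / se_diff

# Assumption: two-tailed critical t for alpha = 0.05 and df = 146,
# read from a t table (about 1.976); it is not computed here.
T_CRIT = 1.976
ci_lower = mean_diff - T_CRIT * se_diff
ci_upper = mean_diff + T_CRIT * se_diff
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI: ({ci_lower:.2f}, {ci_upper:.2f}), reject null: {reject_null}")
```

Because the whole interval sits below zero and the significance value (0.023) is below alpha, the null hypothesis of no difference is rejected at the 95% confidence level.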

Types of Data Analysis

Types of Data Analysis
§ Descriptive
− Aims to help uncover valuable insight from the data being analyzed
− Answers the question “What happened?”
§ Predictive
− Helps forecast behavior of people and markets
− Answers the question “What could happen?”
§ Prescriptive
− Suggests conclusions or actions that may be taken based on the analysis
− Answers the question “What should be done?”

Descriptive Data Analytics
§ Though it is the simplest type, descriptive analysis is used most often.
§ Two types of descriptive analysis:
1. Measures of central tendency (tell us about the middle)
§ Mean − the average
§ Median − the midpoint of the responses
§ Mode − the response with the highest frequency
2. Measures of dispersion
§ Range − the min, the max and the distance between the two
§ Variance − the average squared distance of each point from the mean
§ Standard Deviation − the square root of the variance; the most common/standard way of expressing the spread of data
Customer_ID  Items Purchased  Amount Spent
29304        1                $1.09
28308        3                $44.43
19962        21               $218.58
30281        1                $73.02
[Chart: Mean, Median and Mode Amounts of Items Purchased (Mean = 6.5, Median = 2, Mode = 1)]
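The measures on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch using the four "Items Purchased" values from the customer table. The variable names are illustrative:

```python
import statistics

# "Items Purchased" column from the customer table.
items = [1, 3, 21, 1]

# Measures of central tendency
mean_items = statistics.mean(items)
median_items = statistics.median(items)
mode_items = statistics.mode(items)

# Measures of dispersion
value_range = max(items) - min(items)
variance = statistics.variance(items)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(items)       # square root of the sample variance

print(mean_items, median_items, mode_items, value_range)
```

The mean, median and mode match the charted values of 6.5, 2 and 1. The single 21-item purchase pulls the mean well above the median, which is why it helps to report more than one measure.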

Predictive Analysis

Predictive Data Analytics
§ Some mistakenly assume that predictive analysis applies only to predicting
future events.
− However, in cases such as sentiment analysis, existing data (e.g., the text
of a tweet) is used to predict unobserved data (whether the tweet is positive
or negative).
§ Several of the models that can be used for predictive analysis are:
− Forecasting
− Simulation
− Regression
− Classification
− Clustering

Forecasting
§ Forecasting:
− Moving average technique: use the
mean of prior periods to predict the
next
§ The forecast for period 5 = the mean of periods 1−4
§ The forecast for period 6 = the mean of periods 2−5
− Exponential smoothing technique:
similar, but more recent data points
are weighted more heavily due to
relevance
− Regression techniques
§ Use caution in forecasting: the
longer the forecast horizon,
the less accurate the
projections.
[Chart: Net Income of Store C Projected 2017-2020; x-axis: year (2006-2022), y-axis: net income ($0-$25,000)]
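The moving average and exponential smoothing techniques can be sketched in a few lines. The income figures are hypothetical (they are not the Store C data), and the smoothing weight `alpha` is an illustrative choice:

```python
def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing_forecast(history, alpha=0.5):
    """Forecast the next period; a larger alpha weights recent periods more."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical net income for periods 1-4 (not the Store C data).
income = [10_000, 12_000, 11_000, 15_000]

period_5 = moving_average_forecast(income)               # mean of periods 1-4
# Period 5 has no actual value yet, so its forecast stands in for it.
period_6 = moving_average_forecast(income + [period_5])  # mean of periods 2-5
smoothed_5 = exponential_smoothing_forecast(income)
```

Here the exponential smoothing forecast is higher than the moving average because the most recent (and largest) period carries half the weight.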

Simulation
§ Simulation
− Queuing models: used to predict wait time and queue length
§ Results can be used to create staff schedules in a way that reduces inefficiencies, etc.
− Discrete event model: used in special situations when queuing cannot be used
§ Results can be used to identify bottlenecks, etc.
− Monte Carlo simulations: used to identify probable outcomes of a scenario
based on many possible outcomes (uses random number generation and many
iterations of the scenario).
§ Results can be used to predict the likelihood of profitability within the first two years, etc.

Queuing Model Example
[Figure: queuing model results for Scenario 1 and Scenario 2]
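A minimal sketch of a two-scenario queuing comparison follows. It assumes the simplest textbook model, M/M/1 (one server, random arrivals, exponentially distributed service times), and uses hypothetical arrival and service rates. The scenarios from the original slide are not specified in the text:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 results; rates are customers per hour."""
    if arrival_rate >= service_rate:
        raise ValueError("queue grows without bound unless arrival < service rate")
    utilization = arrival_rate / service_rate
    queue_length = utilization**2 / (1 - utilization)  # mean number waiting (Lq)
    wait_hours = queue_length / arrival_rate           # mean wait in queue (Wq), by Little's law
    return utilization, queue_length, wait_hours

# Hypothetical scenarios: same demand, faster service in the second.
scenarios = {"Scenario 1": (8, 10), "Scenario 2": (8, 12)}
results = {name: mm1_metrics(lam, mu) for name, (lam, mu) in scenarios.items()}

for name, (rho, lq, wq) in results.items():
    print(f"{name}: utilization {rho:.0%}, {lq:.2f} waiting, {wq * 60:.0f} min wait")
```

Under these assumptions a 20% increase in service capacity cuts the average wait by more than half, which is the kind of result that can inform staff scheduling.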

Monte Carlo Simulation Example
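A minimal Monte Carlo sketch of the profitability question follows. The startup cost and the monthly profit distribution are hypothetical, and monthly profits are assumed to be independent and normally distributed:

```python
import random

def probability_profitable(trials=10_000, months=24, seed=7):
    """Estimate P(cumulative profit over `months` exceeds the startup cost)."""
    rng = random.Random(seed)
    startup_cost = 10_000                   # hypothetical
    monthly_mean, monthly_sd = 500, 4_000   # hypothetical monthly profit
    successes = 0
    for _ in range(trials):
        # One iteration = one simulated two-year run of random monthly profits.
        total = sum(rng.gauss(monthly_mean, monthly_sd) for _ in range(months))
        if total > startup_cost:
            successes += 1
    return successes / trials

p_profit = probability_profitable()
print(f"Estimated probability of profitability within two years: {p_profit:.1%}")
```

Each iteration is one possible outcome of the scenario, and the share of iterations that end in profit is the estimated likelihood.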

Regression
§ Regression − generally speaking, used
to model the relationship between
independent and dependent variables
§ Types of regression models:
− Logistic: used when the dependent variable is categorical (e.g., will customers shop at your store or a
competitor’s?)
− Linear: used to identify a linear relationship between the dependent variable and
at least one independent variable (e.g., daily store revenue predicted by the
number of customers entering the store)
− Stepwise: used to select which independent variables belong in the model.
This is done by adding/removing variables based on how those
variables impact the overall strength of the model.
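The linear case can be sketched with ordinary least squares on one independent variable. The daily customer counts and revenue figures are hypothetical, not taken from the slides:

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical daily observations (not from the slides).
customers = [100, 150, 200, 250, 300]
revenue = [1150, 1580, 2120, 2570, 3130]

slope, intercept = fit_line(customers, revenue)
predicted = intercept + slope * 220   # expected revenue on a 220-customer day
```

The slope is read as the expected change in daily revenue for each additional customer entering the store.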

Classification & Clustering
§ Classification: used to assign objects to
one of several categories
− Sentiment analysis of social media
postings
§ Clustering: another method of forming
groups
− Intragroup differences are minimized
− Intergroup differences are maximized
− Commonly used to create and better
understand customer groups
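One common clustering method is k-means, which the slides do not name. This is a minimal sketch on a single variable, with hypothetical customer spend values and hand-picked starting centroids:

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain k-means on one variable, starting from the given centroids."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids

# Hypothetical monthly spend per customer.
spend = [5, 8, 12, 95, 110, 120]
clusters, centers = kmeans_1d(spend, centroids=[min(spend), max(spend)])
```

Assigning each customer to the nearest centroid keeps members of a group close together, and the centroids end up far apart, which is the minimize-within, maximize-between idea described above.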

Prescriptive Analysis

Prescriptive Data Analytics
§ Decisions can be formulated from descriptive and predictive analysis
− If I need to cut a product and I know that product C is least preferred and least
profitable, I will cut product C.
§ However, prescriptive analytics explicitly tells you the decisions that should
be made. This can be done using a variety of techniques:
− Linear programming
− Integer programming
− Mixed integer programming
− Nonlinear programming

Linear Programming Example
                   Product A  Product B  Product C  Product D  Product E
Quantity to Order
Profit per Unit    $5         $3         $20        $50        $200       Total Profit: $-

                   Product A  Product B  Product C  Product D  Product E  Used   Available
Storage Space      0.05       0.5        1          5          10                1000
Selling Effort     0.25       5          0.5        2          7                 500
Minimum Order      100        15         20         60         5

Solution:
                   Product A  Product B  Product C  Product D  Product E
Quantity to Order  100        15         490        60         5
Profit per Unit    $5         $3         $20        $50        $200       Total Profit: $14,345.00

                   Product A  Product B  Product C  Product D  Product E  Used   Available
Storage Space      0.05       0.5        1          5          10         852.5  1000
Selling Effort     0.25       5          0.5        2          7          500    500
Minimum Order      100        15         20         60         5
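The arithmetic of the reported solution can be re-checked in a few lines. This sketch only evaluates the given order quantities against the constraints. It does not solve the linear program, which would require an LP solver:

```python
# Figures copied from the example, in product order A to E.
profit_per_unit = [5, 3, 20, 50, 200]
storage_per_unit = [0.05, 0.5, 1, 5, 10]
effort_per_unit = [0.25, 5, 0.5, 2, 7]
minimum_order = [100, 15, 20, 60, 5]
storage_available, effort_available = 1000, 500

solution = [100, 15, 490, 60, 5]   # "Quantity to Order" row of the solution

def dot(a, b):
    """Sum of element-wise products, e.g. units ordered times profit per unit."""
    return sum(x * y for x, y in zip(a, b))

total_profit = dot(profit_per_unit, solution)
storage_used = dot(storage_per_unit, solution)
effort_used = dot(effort_per_unit, solution)

# Small tolerance guards against floating-point rounding.
feasible = (
    storage_used <= storage_available + 1e-9
    and effort_used <= effort_available + 1e-9
    and all(q >= m for q, m in zip(solution, minimum_order))
)
print(total_profit, storage_used, effort_used, feasible)
```

Selling effort is fully used (500 of 500) while storage has slack (852.5 of 1000), so selling effort is the binding constraint. Every product except C is ordered at its minimum quantity.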

Comparing the Three Types of Data Analytics
§ Descriptive analysis is most common.
− It is best practice to perform descriptive
analyses before predictive/prescriptive ones
§ Understand that distribution, variance,
skew, etc., may rule out certain models
§ How to know which type of analysis to
pursue:
− How much time do you have?
− What resources are available to you?
− How accurate is your data? How accurate
do you need the model/analysis to be?
− How popular/accepted is the model you are considering?
§ Don’t subscribe to “that’s how we’ve always done it,” but
remember to use a model that stakeholders will accept.

Key Takeaways and Suggested Resources
§ Gaining meaningful insights from data requires planning, technical awareness and consistency.
§ Statistical analysis isn’t a replacement for your own logic (don’t go on statistical autopilot).
§ Utilize available resources (blogs, podcasts, articles, webinars and online courses) to learn more.
− Look for APPLIED statistics topics
§ Big data is not always required.
§ Basic understanding of the statistical
analysis process goes a long way!
Podcast: Not So Standard Deviations
https://soundcloud.com/nssd-podcast
Guide: When Predictive Models Fail
searchdatamanagement.techtarget.com/ezine/Business-Information/When-predictive-analytics-models-produce-false-outcomes
Book: Statistics in Plain English, Timothy C. Urdan

Closing Q&A

Thank you!
See you Thursday, April 6 for our next DIA webinar,
Building a Flexible and Scalable Analytics Architecture
Catch our webinar recap next week here:
firstsanfranciscopartners.com/blog
John Ladley @jladley
john@firstsanfranciscopartners.com
Kelle O’Neal @kellezoneal
kelle@firstsanfranciscopartners.com
© 2016 First San Francisco Partners www.firstsanfranciscopartners.com
