This tutorial covers Driver Analysis and Product Optimization with the Probabilistic Structural Equation Models of BayesiaLab. It provides hands-on examples of how to use Bayesian Belief networks in the field of marketing science.
Tutorial on Driver Analysis and Product Optimization with BayesiaLab
Stefan Conrady, stefan.conrady@conradyscience.com
Dr. Lionel Jouffe, jouffe@bayesia.com
December 1, 2010
Conrady Applied Science, LLC - Bayesia’s North American Partner for Sales and Consulting
Conrady Applied Science, LLC - www.conradyscience.com
Table of Contents
Tutorial on Driver Analysis and Product Optimization with BayesiaLab
Introduction
BayesiaLab
Conrady Applied Science
Acknowledgements
Abstract
Bayesian Networks
Structural Equation Models
Probabilistic Structural Equation Models
Tutorial
Model Development
Data Preparation
Consumer Research
Data Import
Unsupervised Learning
Preliminary Analysis
Variable Clustering
Multiple Clustering
Analysis of Factors
Completing the PSEM
Market Driver Analysis
Product Driver Analysis
Product Optimization
Conclusion
Contact Information
Conrady Applied Science, LLC
Bayesia SAS
Copyright
Driver Analysis and Product Optimization with BayesiaLab
Introduction
This tutorial is intended for new or prospective users of BayesiaLab. The example in this tutorial is taken from the field of marketing science and is meant to illustrate the capabilities of BayesiaLab with a real-world case study and actual consumer data. Beyond market researchers, analysts and researchers in many fields will hopefully find the proposed methodology valuable and intuitive. In this context, many of the technical steps are outlined in great detail, such as data preparation and the network learning, as they are applicable to research with BayesiaLab in general, regardless of the domain.
BayesiaLab
Bayesia SAS, based in Laval, France, has been developing BayesiaLab since 1999 and it has emerged as the leading software package for knowledge discovery, data mining and knowledge modeling using Bayesian networks. BayesiaLab enjoys broad acceptance in academic communities as well as in business and industry. The relevance of Bayesian networks, especially in the context of market research, is highlighted by Bayesia’s strategic partnership with Procter & Gamble, who has deployed BayesiaLab globally since 2007.
Conrady Applied Science
Conrady Applied Science, based in Franklin, TN, is a consulting firm specializing in knowledge discovery and probabilistic reasoning with Bayesian networks. In 2010, Conrady Applied Science was appointed Bayesia’s authorized sales and consulting partner for North America.
Acknowledgements
We would like to express our gratitude to Ares Research (www.ares-etudes.com) for generously providing data from their consumer research for our case study.
Abstract
Market driver analysis and product optimization are among the central tasks in Product Marketing and thus relevant to virtually all types of businesses. BayesiaLab provides a unified software platform, which can, based on consumer data,
1. provide deep understanding of the market preference structure
2. directly generate recommendations for prioritized product actions.
The proposed approach utilizes Probabilistic Structural Equation Models (PSEM), based on machine-learned Bayesian networks. PSEMs provide an efficient alternative to Structural Equation Models (SEM), which have been used traditionally in market research.
Bayesian Networks
A Bayesian network, or belief network, is a directed acyclic graphical model that represents the joint probability distribution over a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
Structural Equation Models
Structural Equation Modeling (SEM) is a statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions. This definition of SEM was articulated by the geneticist Sewall Wright (1921), the economist Trygve Haavelmo (1943) and the cognitive scientist Herbert Simon (1953), and formally defined by Judea Pearl (2000).
Structural Equation Models (SEM) allow both confirmatory and exploratory modeling, meaning they are suited to both theory testing and theory development.
Probabilistic Structural Equation Models
Traditionally, specifying and estimating an SEM required a multitude of manual steps, which are typically very time consuming, often requiring weeks or even months of an analyst’s time. PSEMs are based on the idea of leveraging machine learning for automatically generating a structural model. As a result, creating PSEMs with BayesiaLab is extremely fast and can thus form an immediate basis for much deeper analysis and optimization.
Tutorial
At the beginning of this tutorial, we want to emphasize the overarching objectives of this case study, so we don’t lose sight of the “big picture” as we immerse ourselves into the technicalities of BayesiaLab and Bayesian networks.
In this study we want to examine how product attributes perceived by consumers relate to purchase intention for specific products. Put simply, we want to understand the key drivers for purchase intent. Given the large number of attributes in our study, we also want to identify common concepts among these attributes in order to make interpretation easier and communication with managerial decision makers more effective.
Secondly, we want to utilize the generated understanding of consumer dynamics, so product developers can optimize the characteristics of the products under study in order to increase purchase intent among consumers, which is our ultimate business objective.
Notation
In order to clearly distinguish between natural language, BayesiaLab-specific functions and study-specific variable names, the following notation is used:
• BayesiaLab functions, keywords, commands, etc., are shown in bold type.
• Variable names are capitalized and italicized.
Model Development
Data Preparation
Consumer Research
This study is based on a monadic1 consumer survey about perfumes, which was conducted in France. In this example we use survey responses from 1,320 women, who have evaluated a total of 11 fragrances on a wide range of attributes:
• 27 ratings on fragrance-related attributes, such as “sweet”, “flowery”, “feminine”, etc., measured on a 1-to-10 scale.
• 12 ratings on projected imagery related to someone who would be wearing the respective fragrance, e.g. “is sexy”, “is modern”, measured on a 1-to-10 scale.
• 1 variable for Intensity, a measure reflecting the level of intensity, measured on a 1-to-5 scale.2
• 1 variable for Purchase Intent, measured on a 1-to-6 scale.
• 1 nominal variable, Product, for product identification purposes.
Data Import
To start the analysis with BayesiaLab, we first import the data set, which is formatted as a CSV file.3 With Data>Open Data Source>Text File, we start the Data Import wizard, which immediately provides a preview of the data file.
1 a product test only involving one product, i.e. in our study each respondent evaluated only one perfume.
2 The variable Intensity is listed separately due to the a-priori knowledge of its non-linearity and the existence of a “just-about-right” level.
3 CSV stands for “comma-separated values”, a common format for text-based data files.
The table displayed in the Data Import wizard shows the individual variables as columns and the responses as rows. There are a number of options available, e.g. for sampling. However, this is not necessary in our example given the relatively small size of the database.
Clicking the Next button prompts a data type analysis, which provides BayesiaLab’s best guess regarding the data type of each variable.
Furthermore, the Information box provides a brief summary regarding the number of records, the number of missing values, filtered states, etc.4
For this example, we will need to override the default data type for the Product variable, as each value is a nominal product identifier rather than a numerical scale value. We can change the data type by highlighting the Product variable and clicking the Discrete check box, which changes the color of the Product column to red.
We will also define Purchase Intent and Intensity as discrete variables, as the default number of states of these variables is already adequate for our purposes.5
The next screen provides options as to how to treat any missing values. In our case, there are no missing values so the corresponding panel is grayed-out.
Clicking the small upside-down triangle next to the variable names brings up a window with key statistics of the selected variable, in this case Fresh.
The next step is the Discretization and Aggregation dialogue, which allows the analyst to determine the type of discretization, which must be performed on all continuous variables.6 For this survey, and given the number of observations, it is appropriate to reduce the number of states from the original 10 states (1 through 10) to a smaller number. One could, for instance, bin the 1-10 rating into low, mid and high, or apply any other arbitrary method deemed appropriate by the analyst.
The screenshot shows the dialogue for the Manual selection of discretization steps, which permits selecting binning thresholds by point-and-click.
Note
For choosing discretization algorithms beyond this example, the following rule of thumb may be helpful:
• For supervised learning, choose Decision Tree.
• For unsupervised learning, choose, in the order of priority, K-Means, Equal Distances or Equal Frequencies.
For this particular example, we select Equal Distances with 5 intervals for all continuous variables. This was the analyst’s choice in order to be consistent with prior research.
Clicking Select All Continuous followed by Finish completes the import process and the 49 variables (columns) from our database are now shown as blue nodes in the Graph Panel, which is the main window for network editing.
This initial view represents a fully unconnected Bayesian network.
For reasons which will become clear later, we will initially exclude two variables, Product and Purchase Intent. We can do so by right-clicking the nodes and selecting Properties>Exclusion. Alternatively, holding “x” while double-clicking the nodes performs the same exclusion function.
4 There are no missing values in our database and filtered states are not applicable in this survey.
5 The desired number of variable states is largely a function of the analyst’s judgment.
6 BayesiaLab requires discrete distributions for all variables.
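As a rough illustration of what an Equal Distances discretization with 5 intervals does to a 1-to-10 rating, consider the following Python sketch. The function names and the convention that upper bounds are inclusive are assumptions made for this example; this is not BayesiaLab's implementation. The resulting cut points are consistent with state labels such as <=2.8 and >8.2 that appear in the Monitors later in this tutorial.

```python
def equal_distance_thresholds(low, high, intervals):
    """Return the interior cut points that split [low, high] into equal-width bins."""
    width = (high - low) / intervals
    # Rounding only removes floating-point noise from the cut points.
    return [round(low + k * width, 10) for k in range(1, intervals)]


def discretize(value, thresholds):
    """Return the 0-based bin index of value (upper bounds treated as inclusive)."""
    for index, threshold in enumerate(thresholds):
        if value <= threshold:
            return index
    return len(thresholds)


thresholds = equal_distance_thresholds(1, 10, 5)
print(thresholds)  # [2.8, 4.6, 6.4, 8.2]

# Map every possible rating on the 1-to-10 scale to one of the 5 states.
bins = [discretize(rating, thresholds) for rating in range(1, 11)]
print(bins)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With integer ratings on a 1-to-10 scale, each of the five states therefore collects exactly two of the original rating values.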
Unsupervised Learning
As the next step, we will perform the first unsupervised learning of a network by selecting Learning>Association Discovering>EQ.
The resulting view shows the learned network with all the nodes in their original position.
Needless to say, this view of the network is not very intuitive. BayesiaLab has numerous built-in layout algorithms, of which the Force Directed Layout is perhaps the most commonly used.
It can be invoked by View>Automatic Layout>Force Directed Layout or alternatively through the keyboard shortcut “p”. This shortcut is worthwhile to remember as it is one of the most commonly used functions.
The resulting network will look similar to the following screenshot.
To optimize the use of the available screen, clicking the Best Fit button in the toolbar “zooms to fit” the graph to the screen. In addition, rotating the graph with the Rotate Left and Rotate Right buttons helps to create a suitable view.
The final graph should closely resemble the following screenshot and, in this view, the properties of this first learned Bayesian network become immediately apparent. This network is now a compact representation of the 47 dimensions of the joint probability distribution of the underlying database.
It is very important to note that, although this learned graph happens to have a tree structure, this is not the result of an imposed constraint.
Preliminary Analysis
The analyst can further examine this graph by switching into the Validation Mode, which immediately opens up the Monitor Panel on the right side of the screen.
This panel is initially empty, but by clicking on any node or multiple nodes in the network, Monitors appear inside the Monitor Panel and the corresponding nodes are highlighted in yellow.
By default, the Monitors show the marginal distributions of all selected variables. This shows, for instance, that 9.7% of respondents rated their perfume at <=2.8 in terms of the Fresh attribute.
On this basis, one can start to experiment with the properties of this particular Bayesian network and query it. With BayesiaLab this can be done in an extremely intuitive way, i.e. by setting evidence (or observations) directly on the Monitors. For instance, we can compute the conditional probability distribution of Flowery, given that we have observed a specific value, i.e. a specific state of Fresh. In formal notation, this would be
P(Flowery | Fresh)
We will now set Flowery to the state that represents the highest rating (>8.2) and we can immediately observe the conditional probability distribution of Fresh, i.e.
P(Fresh | Flowery = ">8.2")
The gray arrows inside the bars indicate how the distributions have changed compared to the previous distributions. This means that respondents who have rated the Flowery attribute of a perfume at the top level will have a 67% probability of also assigning a top rating to the Fresh attribute.
P(Fresh = ">8.2" | Flowery = ">8.2") = 66.9%
Note
The structure of our Bayesian network may be directed, but the directions of the arcs do not necessarily have to be meaningful.
For observational inference, it is only necessary that the Bayesian network correctly represents the joint probability distribution of the underlying database.
Switching briefly back into the Modeling Mode and clicking on the Flowery node, one can see the probabilistic relationship between Flowery and Fresh in detail. By learning the network, BayesiaLab has automatically created a contingency table for every single direct relationship between nodes.
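To make the step from a marginal to a conditional distribution concrete, the following Python sketch computes both from a small set of paired observations. The records are hypothetical and are not the survey data; the function names and the two-state "low"/"high" coding are likewise assumptions chosen to keep the example short.

```python
from collections import Counter

# Hypothetical (Flowery, Fresh) observations, NOT the survey data.
records = (
    [("high", "high")] * 6 + [("high", "low")] * 2 +
    [("low", "high")] * 3 + [("low", "low")] * 9
)


def marginal_distribution(records):
    """Return P(Fresh), ignoring Flowery."""
    counts = Counter(fresh for _, fresh in records)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


def conditional_distribution(records, flowery_state):
    """Return P(Fresh | Flowery = flowery_state) by keeping only matching records."""
    counts = Counter(fresh for flowery, fresh in records if flowery == flowery_state)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


print(marginal_distribution(records))             # {'high': 0.45, 'low': 0.55}
print(conditional_distribution(records, "high"))  # {'high': 0.75, 'low': 0.25}
```

Setting evidence on a Monitor corresponds to the filtering step here: once Flowery is fixed to its top state, the distribution of Fresh is recomputed from the matching portion of the joint distribution.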
All contingency tables, together with the graph structure, thus encode the joint probability distribution of our original database.
Returning to the Validation Mode, we can further examine the properties of our network. Of great interest is the strength of the probabilistic relationships between the variables. In BayesiaLab this can be shown by selecting Analysis>Graphic>Arcs’ Mutual Information. The thickness of the arcs is now proportional to the Mutual Information, i.e. the strength of the relationship between the nodes.
Intuitively, Mutual Information measures the information that X and Y share: it measures how much knowing one of these variables reduces our uncertainty about the other. For example, if X and Y are independent, then knowing X does not provide any information about Y and vice versa, so their mutual information is zero. At the other extreme, if X and Y are identical then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa.
Formal Definition of Mutual Information
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$$
We can also show the values of the Mutual Information on the graph by clicking on Display Arc Comments.
In the top part of the comment box attached to each arc the Mutual Information of the arc is shown. Below, expressed as a percentage and highlighted in blue, we see the relative Mutual Information in the direction of the arc (parent node ➔ child node). And, at the bottom, we have the relative mutual information in the opposite direction of the arc (child node ➔ parent node).
Variable Clustering
The information about the strength between the manifest variables can also be utilized for purposes of Variable Clustering. More specifically, a concept related closely to the Mutual Information, namely the Kullback-Leibler Divergence (K-L Divergence), is utilized for clustering.
For probability distributions P and Q of a discrete random variable, their K–L divergence is defined to be
$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}$$
In words, it is the average of the logarithmic difference between the joint probability distributions P(i) and Q(i), where the average is taken using the probabilities P(i).
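The close relationship between the two measures can be checked numerically: the Mutual Information of X and Y equals the K-L divergence between their joint distribution and the product of their marginals. The Python sketch below is a minimal illustration under stated assumptions: the natural logarithm is used (the text does not specify a base), the function names are made up for this example, and the two joint tables are toy inputs for the independent and identical cases described earlier.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)); zero-probability terms contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(X;Y) for a joint table where joint[x][y] = p(x, y)."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(column) for column in zip(*joint)]  # marginal of Y
    flat_joint = [pxy for row in joint for pxy in row]
    flat_product = [a * b for a in px for b in py]
    # Mutual information is the K-L divergence from p(x)p(y) to p(x, y).
    return kl_divergence(flat_joint, flat_product)


independent = [[0.25, 0.25], [0.25, 0.25]]  # X tells us nothing about Y
identical = [[0.5, 0.0], [0.0, 0.5]]        # X fully determines Y

print(mutual_information(independent))  # 0.0
print(mutual_information(identical))    # log(2), about 0.693
```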
Such variable clusters will allow us to induce new latent variables, which each represent a common concept among the manifest variables.7 From here on, we will make a very clear distinction between manifest variables, which are directly observed, such as the survey responses, and latent variables, which are derived. In traditional statistics, deriving such latent variables or factors is typically performed by means of Factor Analysis, e.g. Principal Components Analysis (PCA).
In BayesiaLab, this “factor extraction” can be done very easily via the Analysis>Graphics>Variable Clustering function, which is also accessible through the keyboard shortcut “s”.
The speed with which this is performed is one of the strengths of BayesiaLab, as the resulting variable clusters are presented instantly.
In this case, BayesiaLab has identified 15 variable clusters and each node is color-coded according to the cluster membership. To interpret these newly-found clusters, we can zoom in and visually examine the structure on the graph panel.
To support the interpretation process, BayesiaLab can also display a Dendrogram, which allows the analyst to review the linkage of nodes into variable clusters.
7 An alternative approach is to interpret the derived concept or factor as a hidden common cause.
The analyst may also choose a different number of clusters, based on his own judgement relating to the domain. A slider in the toolbar allows choosing various numbers of clusters and the color association of the nodes will be updated instantly.
By clicking the Validate Clustering button in the toolbar, the clusters are saved and the color codes will be formally associated with the nodes. A clustering report provides us with a formal summary of the new factors and their associated manifest variables.8
The analyst also has the option to use his domain knowledge to modify which manifest variables belong to specific factors. This can be done by right-clicking on the Graph Panel and selecting Class Editor.
Multiple Clustering
As our next step towards building the PSEM, we will
introduce these newly-generated latent factors into our
existing network and also estimate their probabilistic
relationships with the manifest variables. This means we
will create a new node for each latent factor, creating 15
new dimensions in our network. For this step, we will
need to return to the Modeling Mode, because the
introduction of the factor nodes into the networks
requires the learning algorithms.
8 Variable cluster = derived concept = unobserved latent variable = hidden cause = extracted factor.
More specifically, we select Learning>Multiple Clustering, which brings up the Multiple Clustering dialogue. There is a range of settings, but we will focus here on only a subset. Firstly, we need to specify an output directory for the to-be-learned networks. Secondly, we need to set some parameters for the clustering process, such as the minimum and maximum number of states, which can be created during the learning process.
In our example, we select Automatic Selection of the Number of Classes, which will allow the learning algorithm to find the optimum number of factor states up to a maximum of five states. This means that each new factor will need to represent the corresponding manifest variables with up to five states.
The Multiple Clustering process concludes with a report, which shows details regarding the generated clustering. The top portion of the report is shown in the following screenshot.
The detail section of Factor_0, as it relates to the manifest variables, is worth highlighting. Here we can see the strength of the relationship between the manifest variables, such as Trust, Bold, etc., and Factor_0. In a traditional Factor Analysis, this would be the equivalent of factor loading.
After closing the report, we will now see a new (unconnected) network, with 15 additional nodes, one for each factor, i.e. Factor_0 through Factor_14, highlighted in yellow in the screenshot.
Analysis of Factors
We can also further examine how the new factors relate to the manifest variables and how well they represent them. In the case of Factor_0, we want to understand how it can summarize our five manifest variables.
By going into our previously-specified output directory, using the Windows Explorer or the Mac Finder, we can see that 15 new networks (in BayesiaLab’s xbl format for networks) were generated. We open the specific network for Factor_0, either by directly double-clicking the xbl file or by selecting Network>Open. The factor-specific networks are identified by a suffix/extension of the format “_[Factor_#].xbl” and “#” stands for the factor number. We then see a network including the manifest variables and with the factor being linked by arcs going from the factor to the manifest variables.
Returning to the Validation Mode, we can see five states for Factor_0, labeled C1 through C5, as well as their marginal distribution. As Factor_0 is a target node by default, it automatically appears highlighted in red in the Monitor Panel.
Here we can also study how the states of the manifest variables relate to the states of Factor_0. This can be done easily by setting observations to the monitors, e.g. setting C1 to 100%.
We now see that given that Factor_0 is in state C1, the variable Active has a probability of approx. 75% of being in state <=2.8. Expressed more formally, we would state P(Active = “<=2.8” | Factor_0 = C1) = 74.57%. This means that for respondents who have been assigned to C1, it is likely that they would rate the Active attribute very low as well.
In the Monitor for Factor_0, in parentheses behind the cluster name, we find the expected mean value of the numeric equivalents of the states of the manifest variables, e.g. “C1 (2.08)”. That means that given the state C1 of Factor_0, we expect the mean value of Trust, Bold, Fulfilled, Active and Character to be 2.08.
To go into even greater detail, we can actually look at every single respondent, i.e. every record in the database, and see what cluster they were assigned to. We select Inference>Interactive Inference, which will bring up a record selector in the toolbar. With this record selector, we can now scroll through the entire database, review the actual ratings of the respondents and then see the estimation to which cluster each respondent belongs.
In our first case, record 0, we see the ratings of this respondent indicated by the manifest Monitors. In the highlighted Monitor for Factor_0 we read that this respondent, given her responses, has an 82% probability of belonging to Cluster 5 (C5) in Factor_0.
Moving to our second case, record 1, we see that the respondent belongs to Cluster 3 (C3) with a 96% probability.
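The per-respondent cluster probabilities can be understood as an application of Bayes' rule. The Python sketch below assumes a factor-specific network in which the only arcs run from the factor to its manifest variables, so that the ratings are conditionally independent given the cluster. All tables and numbers are hypothetical (two clusters, two variables, two states), and the function name is made up for this example.

```python
def cluster_posterior(prior, likelihoods, observed):
    """Return P(cluster | observed ratings), assuming ratings are independent given the cluster."""
    scores = {}
    for cluster, p_cluster in prior.items():
        score = p_cluster
        for variable, state in observed.items():
            # Multiply in P(variable = state | cluster) from the contingency table.
            score *= likelihoods[variable][cluster][state]
        scores[cluster] = score
    total = sum(scores.values())
    return {cluster: score / total for cluster, score in scores.items()}


# Hypothetical tables, for illustration only.
prior = {"C1": 0.5, "C2": 0.5}
likelihoods = {
    "Active": {"C1": {"low": 0.8, "high": 0.2}, "C2": {"low": 0.1, "high": 0.9}},
    "Bold": {"C1": {"low": 0.5, "high": 0.5}, "C2": {"low": 0.2, "high": 0.8}},
}

posterior = cluster_posterior(prior, likelihoods, {"Active": "low", "Bold": "low"})
print(posterior)  # C1 receives roughly 95% of the probability mass
```

A respondent with low ratings on both variables is assigned to C1 with high probability here because C1 is the cluster under which low ratings are most likely.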
We can also evaluate the performance of our new network based on Factor_0 by selecting Analysis>Network Performance>Global.
This will return the log-likelihood density function, as shown in the following screenshot.
Completing the PSEM
We are now returning to our main task and our principal network, which has been augmented by the 15 new factors.
Before we re-learn our network with the new factors, we need to include Purchase Intent as a variable and also impose a number of constraints in the form of Forbidden Arcs.
Being in the Modeling Mode, we can include Purchase Intent by right-clicking the node and unchecking Exclusion.
This makes the Purchase Intent variable available in the next stage of learning, which is reflected visually as well in the node color and the icon.
Our desired SEM-type network structure stipulates that
manifest variables be connected exclusively to the factors
and that all the connections with Purchase Intent must
also go through the factors. We achieve such a structure
by imposing the following sets of forbidden arcs:
1. No arcs between manifest variables
2. No arcs from manifest variables to factors
3. No arcs between manifest variables and Purchase
Intent
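The three rules above can be read as a set of directed arcs that the learning algorithm may not create. The following Python sketch enumerates that set for a toy example. It is only an illustration of the constraint logic; the function name and the small variable lists are assumptions, and this is not how BayesiaLab stores forbidden arcs internally.

```python
from itertools import permutations, product


def forbidden_arcs(manifest, factors, target):
    """Enumerate the directed arcs ruled out by the three constraints."""
    forbidden = set(permutations(manifest, 2))        # 1. manifest -> manifest
    forbidden |= set(product(manifest, factors))      # 2. manifest -> factor
    forbidden |= {(m, target) for m in manifest}      # 3. manifest -> Purchase Intent
    forbidden |= {(target, m) for m in manifest}      # 3. Purchase Intent -> manifest
    return forbidden


# Small illustrative subset of nodes.
manifest = ["Fresh", "Flowery", "Active"]
factors = ["Factor_0", "Factor_1"]
arcs = forbidden_arcs(manifest, factors, "Purchase Intent")

print(len(arcs))                      # 18 = 6 + 6 + 3 + 3
print(("Factor_0", "Fresh") in arcs)  # False: factor -> manifest stays allowed
```

Arcs between factors, and between factors and Purchase Intent, remain unconstrained, which is what lets the learning step attach Purchase Intent to the factor layer.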
We can define these forbidden arcs by right-clicking anywhere on the graph panel, which brings up the following menu.
In BayesiaLab, all manifest variables and all factors are conveniently grouped into classes, so we can easily define which arcs are forbidden in the Forbidden Arc Editor.
Upon completing this step, we can proceed to learning our network again: Learning>Association Discovering>EQ
The initial result will resemble the following screenshot.
Using the Force Directed Layout algorithm (shortcut “p”), as before, we can quickly transform this network into a much more interpretable format.
Now we see manifest variables “laddering up” to the factors and we also see how the factors are related to each other. Most importantly, we can observe where the Purchase Intent node was attached to the network during the learning process. The structure conveys that Purchase Intent has the strongest link with Factor_2.
Now that we can see the big picture, it is perhaps appropriate to give the factors more descriptive names. For obvious reasons, this task is the responsibility of the analyst. In this case study, Factor_0 was given the name “Self-Confident”. We add this name into the node comments by double-clicking Factor_0 and scrolling to the right inside the Node Editor until we see the Comments tab.
We repeat this for all other nodes and we can subsequently display the node comments for all factors by clicking the Display Node Comment icon in the toolbar or by selecting View>Display Node Comments from the menu.
Market Driver Analysis
Our model, the PSEM, is complete and we can now use it to perform the actual analysis part of this exercise, namely to find out what “drives” Purchase Intent.
We return to the Validation Mode and right-click on Purchase Intent and then check Set As Target Node. Double-clicking the node while pressing “t” is a helpful shortcut.
This will also change the appearance of the node and literally give it the look of a target.
In order to understand the relationship between the factors and Purchase Intent, we want to tune out all the manifest variables for the time being. We can do so by right-clicking the Use of Classes icon in the bottom right corner of the screen. This will bring up a list of all classes. By default, all are checked and thus visible.
For our purposes, we want to deselect All and then only check the Factor class.
The resulting view has all the manifest variables grayed-out, so the relationship between the factors becomes more prominent. By deselecting the manifest variables, we also exclude them from subsequent analysis.
We will now right-click inside the (currently empty) Monitor Panel and select Monitors Sorted wrt Target Variable Correlations. The keyboard shortcut “x” will do the same.
This brings up the monitor for the target node, Purchase Intent, plus all the monitors for the factors, in the order of the strength of relationship with the Target Node.
This immediately highlights the order of importance of the factors relative to the Target Node, Purchase Intent. Another way of comprehensively displaying the importance is by selecting Reports>Target Analysis>Correlations With the Target Node.
“Correlations” is more of a metaphor here, as BayesiaLab actually orders the factors by their mutual information relative to the target node, Purchase Intent.
By clicking Quadrants, we can obtain a type of opportunity graph, which shows the mean value of each factor on the x-axis and the relative Mutual Information with Purchase Intent on the y-axis. Mutual Information can be interpreted as importance in this context.
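A quadrant view of this kind divides the plot at the mean of each axis, so that factors with above-average importance but below-average rating fall into the upper-left quadrant and stand out as opportunities. The Python sketch below shows that classification logic. The factor names and the (mean rating, relative Mutual Information) pairs are hypothetical, and the function is an illustration rather than BayesiaLab's own computation.

```python
from statistics import mean


def quadrants(factors):
    """Label each factor by comparing its rating (x) and importance (y) to the axis means."""
    x_mean = mean(x for x, _ in factors.values())
    y_mean = mean(y for _, y in factors.values())
    labels = {}
    for name, (x, y) in factors.items():
        vertical = "upper" if y > y_mean else "lower"
        horizontal = "right" if x > x_mean else "left"
        labels[name] = f"{vertical}-{horizontal}"
    return labels


# Hypothetical (mean rating, relative Mutual Information) pairs.
factors = {"A": (4.0, 0.30), "B": (7.0, 0.25), "C": (5.0, 0.05), "D": (8.0, 0.04)}
labels = quadrants(factors)
print(labels)
# {'A': 'upper-left', 'B': 'upper-right', 'C': 'lower-left', 'D': 'lower-right'}

# Factors that are important but rated below average.
opportunities = [name for name, label in labels.items() if label == "upper-left"]
print(opportunities)  # ['A']
```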
By right-clicking on the graph, we can switch between
the display of the formal factor names, e.g. Factor_0,
Factor_1, etc., and the factor comments, such as
Adequacy, Seduction, which is much easier for
interpretation.

As in the previous views, it becomes very obvious that
the factor Adequacy is most important with regard to
Purchase Intent, followed by the factor Seduction. This is
very helpful for understanding the overall market
dynamics and for communicating the key drivers to
managerial decision makers.

The lines dividing the graph into quadrants reflect the
mean values for each axis. The upper-left quadrant
highlights opportunities as these particular factors are
“above average” in importance, but “below average” in
terms of their rating.

Product Driver Analysis
Although this insight is relevant for the whole market, it
does not yet allow us to work on improving specific
products. For this we need to look at product-specific
graphs. In addition, we may need to introduce
constraints as to where we may not have the ability to
impact any attributes. Such information must come from
the domain expert, in our case from the perfumer, who
will determine if and how odoriferous compounds can
affect the consumers’ perception of the product
attributes.

These constraints can be entered into BayesiaLab’s Cost
Editor, which is accessible by right-clicking anywhere in
the Graph Panel. Those attributes, which cannot be
changed (as determined by the expert), will be set to
“Not Observable”. As we proceed with our analysis,
these constraints will be extremely important when
searching for realistic product scenarios.

On a side note, an example from the presumably more
tangible auto industry may better illustrate such kinds of
constraints. For instance, a vehicle platform may have an
inherent wheelbase limitation, which thus sets a hard
limit regarding the maximum amount of rear passenger
legroom. Even if consumers perceived a need for
improvement on this attribute, making such a
recommendation to the engineers would be futile. As we
search for optimum product solutions with our Bayesian
network, this is very important to bear in mind and thus
we must formally encode these constraints of our
domain through the Cost Editor.

Product Optimization
We now return briefly to the Modeling Mode to include
the Product variable, which has been excluded from our
analysis thus far. Right-clicking the node and then
unchecking Properties>Exclusion will achieve this.

At this time, we will also move beyond the analysis of
factors and actually look at the individual product
attributes, so we select Manifest from the Display
Classes menu.

Back in the Validation Mode, we can perform a Multi
Quadrant Analysis: Tools>Multi Quadrant Analysis

This tool allows us to look at the attribute ratings of
each product and their respective importance, as
expressed with the Mutual Information. Thus we pick
Product as the Selector Node and choose Mutual
Information for Analysis. In this case, we also want to
check Linearize Nodes’ Values, Regenerate Values and
specify an Output Directory, where the product-specific
networks will be saved. In the process of generating the
Multi Quadrant Analysis, BayesiaLab will actually
generate one Bayesian network for each Product. For all
Products, the network structure will be identical to the
network for the entire market; however, the parameters,
i.e. the contingency tables, will be specific to each
Product.
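
As a rough illustration of what "importance, as expressed with the Mutual Information" means on a per-product basis, the following Python sketch estimates the mutual information between one attribute and Purchase Intent directly from survey rows, grouped by the Product selector. This is not BayesiaLab's implementation, which works from the learned network and its contingency tables. The function names, column names and toy data are assumptions made purely for this example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (in bits) between two discrete variables.

    `pairs` is a list of (attribute_state, target_state) observations.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def importance_by_product(rows, attribute,
                          target="PurchaseIntent", selector="Product"):
    """Split the rows by the selector and compute MI(attribute; target) per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[selector], []).append((row[attribute], row[target]))
    return {product: mutual_information(pairs)
            for product, pairs in groups.items()}


# Hypothetical toy data, not taken from the case study.
rows = [
    {"Product": 1, "Fresh": "low", "PurchaseIntent": "low"},
    {"Product": 1, "Fresh": "high", "PurchaseIntent": "high"},
    {"Product": 1, "Fresh": "low", "PurchaseIntent": "low"},
    {"Product": 1, "Fresh": "high", "PurchaseIntent": "high"},
    {"Product": 5, "Fresh": "low", "PurchaseIntent": "low"},
    {"Product": 5, "Fresh": "low", "PurchaseIntent": "high"},
    {"Product": 5, "Fresh": "high", "PurchaseIntent": "low"},
    {"Product": 5, "Fresh": "high", "PurchaseIntent": "high"},
]

importance = importance_by_product(rows, "Fresh")
for product, value in sorted(importance.items()):
    print(f"Product {product}: MI(Fresh; PurchaseIntent) = {value:.3f} bits")
```

In the toy data, Fresh fully determines Purchase Intent for Product 1 and is unrelated to it for Product 5, so the same attribute can carry very different importance from one product to the next, which is the reason for generating one network per Product.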
However, before we proceed to the product-specific
networks, we will first see a Multi Quadrant Analysis by
Product and we can select each product’s graph simply
by right-clicking and choosing the appropriate product
identification number.

Please note that only the observable variables are visible
on the chart, i.e. those variables which were not
previously defined as “Not Observable” in the Cost
Editor.

For Product No. 5, Personality is at the very top of the
importance scale. But how will the Personality attribute
compare in the competitive context? If we Display Scales
by right-clicking on the graph, it appears that Personality
is already at the best level among the competitors, i.e. to
the far right of the horizontal scale. On the other hand,
on the Fresh attribute Product No. 5 [9] marks the bottom
end of the competitive range.

[9] Any similarities of identifiers with actual product names are purely coincidental.
For a perfumer it would thus be reasonable to assume
that there is limited room for improvement in regard to
Personality and that Fresh offers perhaps significant
opportunity for Product No. 5.

To highlight the differences between products, we will
also show Product No. 1 in comparison.

For Product No. 1 it becomes apparent that Intensity is
highly important, but that its rating is towards the
bottom end of the scale. The perfumer may thus
conclude a bolder version of the same fragrance will
improve Purchase Intent.

Finally, by hovering over any data point in the
opportunity chart, BayesiaLab can also display the
position of competitors compared to the reference
product for any attribute. The screenshot shows Product
No. 5 as the reference and the position of competitors on
the Personality attribute.

BayesiaLab also allows us to measure and save the “gap
to best level” (=variations) for each product and each
variable through the Export Variations function. This
formally captures our opportunity for improvement.

Please note that these variations need to be saved
individually by Product.

By now we have all the components necessary for a
comprehensive optimization of product attributes:
1. Constraints on “non-actionable” attributes, i.e.
excluding those variables, which can’t be affected
through product changes.
2. A Bayesian network for each Product.
3. The current attribute rating of each Product and each
attribute’s importance relative to Purchase Intent.
4. The “gap to best level” (variation) for each attribute
and Product.
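
The “gap to best level” in the list above can be made concrete with a small calculation. The Python sketch below assumes the gap is expressed as the percentage by which a product's mean rating would have to rise to match the best-rated competitor on the same attribute. The tutorial does not state the exact formula BayesiaLab uses, so this definition, the function name and the ratings are all assumptions for illustration.

```python
def gaps_to_best(mean_ratings, product):
    """Percent gap between `product` and the best-rated product, per attribute.

    mean_ratings maps attribute -> {product_id: mean rating}.
    Assumed definition: 100 * (best - own) / own.
    """
    gaps = {}
    for attribute, by_product in mean_ratings.items():
        own = by_product[product]
        best = max(by_product.values())
        gaps[attribute] = 100.0 * (best - own) / own
    return gaps


# Hypothetical mean ratings, not taken from the case study.
mean_ratings = {
    "Fresh": {1: 4.0, 5: 3.2, 7: 3.6},
    "Personality": {1: 3.5, 5: 4.1, 7: 3.9},
}

variations = gaps_to_best(mean_ratings, product=5)
for attribute, gap in variations.items():
    print(f"{attribute}: {gap:.1f}% below the best level")
```

An attribute on which the product already leads gets a gap of zero, so it offers no room for improvement within the competitive range, while a large gap marks headroom that a later search may use.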
With the above, we are now in a position to search for
realistic product configurations, based on the existing
product, which would optimize Purchase Intent.

We proceed individually by Product and for illustration
purposes we use Product No. 5 again. We load the
product-specific network, which was previously saved
when the Multi Quadrant Analysis was performed.
One of the powerful features of BayesiaLab is Target
Dynamic Profile, which we will apply here on this
network to optimize Purchase Intent:
Analysis>Report>Target Analysis>Target Dynamic
Profile

The Target Dynamic Profile provides a number of
important options:
• Profile Search Criterion: we intend to optimize the
mean of the Purchase Intent.
• Criterion Optimization: maximization is the objective.
• Search Method: We select Mean and also click on Edit
Variations, which allows us to manually stipulate the
range of possible variations of each attribute. In our
case, however, we had saved the actual variations of
Product No. 5 versus the competition, so we load that
data set, which subsequently displays the values in the
Variation Editor. For example, Fresh could be
improved by 10.7% before catching up to the
highest-rated product in this attribute.
• Search Stop Criterion: We check Maximum Number
of Evidence Reached and set this parameter to 4. This
means that no more than the top-four attributes will
be suggested for improvement.

Upon completion of all computations, we will obtain a
list of product action priorities: Fresh, Fruity, Flowery
and Wooded.

The highlighted Value/Mean column shows the
successive improvement upon implementation of each
action. From initially 3.76, the Purchase Intent improves
to 3.92, which may seem like a fairly small step.
However, the importance lies in the fact that this
improvement is not based on utopian thinking, but
rather on attainable product improvements within the
range of competitive performance.

To further illustrate the impact of our product actions,
we will simulate their implementation step-by-step,
which is available through Inference>Interactive
Inference.

With the selector in the toolbar, we can go through each
product action step-by-step in the order in which they
were recommended.

Initially, we have the marginal distribution of the
attributes and the original mean value for Purchase
Intent, i.e. 3.77.

Upon implementation of the first product action, we
obtain the following picture and Purchase Intent grows
to 3.9. Please note that this is not a sea change in terms
of Purchase Intent, but rather a realistic consumer
response to a product change.
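
As an aside, the kind of search that yields an ordered list of product actions can be sketched as a simple greedy procedure: at each step, apply the single attribute change (capped at its allowed variation) that raises the predicted mean of the target the most, and stop after a fixed number of actions. The Python sketch below is only an illustration of that idea. It is not BayesiaLab's Target Dynamic Profile algorithm, and `toy_predict` with its weights is a hypothetical stand-in for inference in the product-specific Bayesian network.

```python
def greedy_profile(predict, current, max_levels, max_evidence=4):
    """Greedily choose attribute changes that maximize predict(state).

    predict:      callable mapping {attribute: level} -> mean of the target
    current:      current attribute levels of the product
    max_levels:   highest level each actionable attribute may reach
    max_evidence: maximum number of actions to recommend
    Returns a list of (action, resulting target mean), starting with the
    initial state.
    """
    state = dict(current)
    steps = [("(initial)", predict(state))]
    remaining = set(max_levels)
    while remaining and len(steps) - 1 < max_evidence:
        best_attr, best_value = None, steps[-1][1]
        for attr in sorted(remaining):
            trial = dict(state, **{attr: max_levels[attr]})
            value = predict(trial)
            if value > best_value:
                best_attr, best_value = attr, value
        if best_attr is None:
            break  # no remaining action improves the target
        state[best_attr] = max_levels[best_attr]
        remaining.remove(best_attr)
        steps.append((best_attr, best_value))
    return steps


# Hypothetical linear stand-in for the network; weights are invented.
WEIGHTS = {"Fresh": 0.5, "Fruity": 0.3, "Flowery": 0.2,
           "Wooded": 0.1, "Sweet": 0.05}


def toy_predict(state):
    return 1.0 + sum(WEIGHTS[attr] * level for attr, level in state.items())


current = {attr: 3.0 for attr in WEIGHTS}
# Upper bounds play the role of the saved variations (gap to best level).
max_levels = {"Fresh": 3.4, "Fruity": 3.3, "Flowery": 3.3,
              "Wooded": 3.2, "Sweet": 3.1}

steps = greedy_profile(toy_predict, current, max_levels, max_evidence=4)
for action, value in steps:
    print(f"{action:10s} -> predicted mean {value:.3f}")
```

Capping each attribute at its competitive best level is what keeps the recommendations attainable: the search cannot propose a rating that no product in the market has achieved.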
The second change results in further subtle improvement
to Purchase Intent:

The third and fourth steps are analogous and bring us to
the final value for Purchase Intent of 3.92.

Although BayesiaLab generates these recommendations
very quickly and easily, they represent a major
innovation in the field of marketing science. This
particular optimization task has not been tractable with
traditional methods.

Conclusion
The presented case study demonstrates how BayesiaLab
can transform simple survey data into a deep
understanding of consumers’ thinking and quickly
provide previously inconceivable product
recommendations. As such, BayesiaLab is a
revolutionary tool, especially as the workflow shown
here may take no more than a few hours for an analyst
to implement. This kind of rapid and “actionable” [10]
insight is clearly a breakthrough and creates an entirely
new level of relevance of research for business
applications.

[10] The authors cringe at the inflationary use of “actionable”, but here, for once, it actually seems appropriate.