Context
1. Social Enterprise collected data on customers & wants to make insight-informed decisions.
Objective
2. To identify customer segments and customise offers for each segment.
Strategy
3. Explore & Clean data for analysis.
4. Perform K-Means Clustering, in Orange, to find possible segments in the customer data.
5. Tune the model to improve its performance.
6. Visualise the findings, share conclusions, and give insight-driven recommendations.
Author: Anthony Mok
Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
Identify Customer Segments to Create Customer Offers for Each Segment - Application of K-Means Clustering With Orange
1. K-MEANS CLUSTERING
WITH ORANGE
IDENTIFY CUSTOMER SEGMENTS
OF A SOCIAL ENTERPRISE TO
CREATE CUSTOMER OFFERS FOR EACH SEGMENT
AUTHOR: ANTHONY MOK
DATE: 18 NOV 2023
EMAIL: XXIAOHAO@YAHOO.COM
2. WHAT IS ORANGE
• Open-source and Extensible: Freely available, adaptable, and customisable data mining tool
• Visual Programming: Drag-and-drop interface for building data analysis workflows
• Interactive Data Exploration: Quickly understand data patterns and trends using visualisations
• Wide Range of Data Mining Algorithms: Identify patterns, make predictions, and solve data mining problems
3. PROJECT’S CONTEXT, OBJECTIVE & STRATEGIES
To identify customer
segments and customise
offers for each segment
Social Enterprise
collected data on
customers & wants to make
insight-informed decisions
• Explore & Clean data for
analysis
• Perform K-Means Clustering,
in Orange, to find possible
segments in the customer
data
• Tune the model to improve its
performance
• Visualise the findings, share
conclusions, and give insight-
driven recommendations
4. EXPLORATORY DATA ANALYSIS
Findings
• Target = Recency_in_Day
• Provides insights into customer behavior,
preferences, and churn risk
• Feature Columns = 9
• Instances = 2,240
• Blanks & Outliers
Age Column | Income Column
23 Blanks | -
1 Outlier | 3 Outliers
6. LOADING DATA & DEALING WITH BLANKS
The Customer.csv file was imported into
the workflow, with the ‘Role’ of
‘Recency_days’ set as ‘Target’, ‘ID’ as
‘meta’, and the rest as ‘features’
The Exploratory Data Analysis (EDA) findings
were taken into account, and blanks were
imputed with the ‘Average’ (mean) of the
values in the ‘Income’ column
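The imputation step can be illustrated outside Orange with a minimal, dependency-free sketch. It assumes blanks are represented as `None`; the function name `impute_with_mean` and the income values are hypothetical and are not taken from Customer.csv.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace None entries with the mean of the observed (non-blank) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical income values; None marks a blank cell.
incomes = [40000.0, None, 60000.0, 50000.0, None]
imputed = impute_with_mean(incomes)
print(imputed)
```

Mean imputation keeps every row in the dataset, at the cost of slightly shrinking the spread of the imputed column.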
7. EXAMINING RELATIONSHIPS & PATTERNS
Scatter Plots were created
to explore the relationships
and patterns in the dataset
‘Recency_days’ is the ‘Target’,
with four feature columns
selected for the model:
‘Income’ & ‘Age’ (numerical
data) and ‘Marital Status’ &
‘Education’ (categorical data),
since these are more informative
8. IDENTIFYING IDEAL NUMBER OF CLUSTERS
• To determine the ideal number of
clusters, Silhouette Scores were
calculated for 2 to 12 clusters
• Overall, the Silhouette Scores are
positive, but relatively low, suggesting
the clustering is fair, but there is still
some overlap between clusters
• Clustering parameters can be
adjusted to improve the separation
between clusters
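The idea of scoring each candidate number of clusters can be sketched in plain Python. This is an illustration under stated assumptions, not Orange's implementation: `kmeans` and `silhouette_score` are hypothetical helper names, the points are synthetic two-dimensional data rather than the customer dataset, and the range is cut to 2 to 6 to suit the small sample. For one point, the silhouette is (b - a) / max(a, b), where a is the mean distance to its own cluster and b is the mean distance to the nearest other cluster.

```python
import math
import random


def kmeans(points, k, max_iter=300, seed=0):
    """Lloyd's algorithm with k-means++ style seeding; returns one label per point."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Pick the next centroid with probability proportional to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data: three loose groups of 15 points each.
rng = random.Random(1)
centres = [(0, 0), (10, 0), (5, 9)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in centres for _ in range(15)]

scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Scores close to 1 indicate well-separated clusters, while scores near 0 indicate overlap.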
9. BOOSTING MODEL’S PERFORMANCE & LIMITATIONS
• By default, ‘K-Means++’ & ‘Normalise Columns’
are enabled in the Hyperparameters
• So only ‘Maximum Iterations’ was raised to 100,000
(from 300) and ‘Re-runs’ to 100 (from 10) to
boost the performance of the model
• But the Silhouette Scores did not improve in
the range of 2 to 12 clusters after these changes,
suggesting that the K-Means Clustering
Algorithm had already converged to a stable solution
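Column normalisation matters for K-Means because the algorithm relies on distances, so a wide-ranging column such as income would otherwise dominate a narrow one such as age. The sketch below assumes that normalisation means standardising each numeric column to zero mean and unit standard deviation; the slides do not state the exact scaling Orange applies, and the function name `standardise` and the values are hypothetical.

```python
from statistics import mean, pstdev


def standardise(column):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant column carries no distance information
    return [(v - mu) / sigma for v in column]


# Hypothetical values: the two columns differ in scale by roughly three orders of magnitude.
incomes = [20000.0, 50000.0, 80000.0]
ages = [30.0, 45.0, 60.0]

z_incomes = standardise(incomes)
z_ages = standardise(ages)
print(z_incomes, z_ages)
```

After standardising, both columns contribute on a comparable scale to the Euclidean distances used for cluster assignment.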
10. BOOSTING MODEL’S PERFORMANCE & LIMITATIONS
In this stable state, scores can be
increased by choosing cluster counts at
the upper end of the range, but this will
result in overfitting the model to the dataset
To avoid this outcome, the
conservative number of 3 Clusters
was chosen (Silhouette Score =
0.217) instead
11. FINDINGS & CONCLUSIONS
• Maximum income of the customer base is
$100,000/annum
• Among customers aged 30 to 55, half earned
below $50,000/annum and could be price-sensitive
bargain hunters, while the other half earned
above this threshold and may be able to pay
a premium for quality
• A higher concentration of customers hold
undergraduate degrees; these more educated
customers are split equally into two clusters:
singles, with more capacity for discretionary
spending, and married couples, with less
spending power given children/teens in
their households
• Customers above 55 are evenly distributed across
all income groups
* More comprehensive findings and conclusions were provided in the project report, which
are not released at the request of the Social Enterprise
12. RECOMMENDATIONS*
Segment 1 - Customers in the age range of 30 to 55
who earned below $50,000/annum
• Offer value-for-money products and services
• Highlight discounts and promotions
• Offer bundle deals and loyalty programs
• Target them with personalised marketing campaigns
based on their purchase history and interests
Segment 3 - Customers with undergraduate degrees
• Offer educational and informative content
• Highlight the benefits of products and services for their
careers and personal development
• Partner with other businesses that offer complementary
products and services
• Target them with personalised marketing campaigns
based on their interests and areas of expertise
* More recommendations were provided for each identified cluster in the project report,
which are not released at the request of the Social Enterprise