9. What is Python?
Python is a high-level programming language which is perfect for automating repetitive tasks
Very popular in the data science community
Becoming very popular with technical SEOs, especially for data blending and automation
20. LEATHER SOFAS, VELVET SOFAS, LOUNGE SOFAS, SOFAS
Grouping each product type into subcategories would better align them to search demand
21. LEATHER SOFAS, VELVET SOFAS, LOUNGE SOFAS, SOFAS
Creating three new subcategories would create an additional 21,000+ searches a month*
+19,000 +1,200 +150
*source ahrefs.com
22. LEATHER SOFAS, VELVET SOFAS, LOUNGE SOFAS, SOFAS
This method will produce a lot of additional traffic for any eCommerce site
+19,000 +1,200 +150
30. We wrote a Python script to automate the process and do the hard work for us!
@LeeFootSEO | #BrightonSEO
31. LEATHER SOFAS, VELVET SOFAS, LOUNGE SOFAS, SOFAS
The products suggest the categories for us!
+19,000 +1,200 +150
Leather Buttoned Sofa
Mid Century Leather Sofa
Tetbury Leather Sofa - Black
Hardwick Leather Sofa
Tetbury Leather Sofa - Tan
32. LEATHER SOFAS, VELVET SOFAS, LOUNGE SOFAS, SOFAS
By clustering the product names together our script was able to find opportunities for new categories
+19,000 +1,200 +150
Leather Buttoned Sofa
Mid Century Leather Sofa
Tetbury Leather Sofa - Black
Hardwick Leather Sofa
Tetbury Leather Sofa - Tan
33. Total Opportunity – Cox & Cox
New Subcategories: 185
Search Volume: 1,400,000
34. In testing we ran the script on Homebase and found an opportunity to create 1,650 subcategories with over 13,000,000 estimated monthly searches
35. This would take a LONG time to do manually! (Assuming you could work as efficiently as a computer!)
36. At the end of this talk I’m going to share this script with instructions so you can use it on your own websites
39. The Method
We’ll be using Python and the NLTK library to generate hundreds of thousands of n-gram combinations from product names
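A minimal sketch of the n-gram step, assuming product names are split on letters and digits only. The talk uses NLTK for this (its `nltk.util.ngrams` helper does the same job); the version below uses plain Python so it runs without extra installs. The function name and the length limits are illustrative, not taken from the shared script.

```python
import re


def word_ngrams(name, min_len=2, max_len=3):
    """Return every contiguous word n-gram of min_len..max_len words."""
    # Lowercase and keep only alphanumeric tokens, so "Sofa - Tan" -> ["sofa", "tan"].
    words = re.findall(r"[a-z0-9]+", name.lower())
    grams = []
    for n in range(min_len, max_len + 1):
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


# Product name taken from the sofa example earlier in the deck.
example_grams = word_ngrams("Hardwick Leather Sofa")
print(example_grams)
# ['hardwick leather', 'leather sofa', 'hardwick leather sofa']
```

Run over every product name on a large site, this is how a few thousand names can turn into hundreds of thousands of candidate phrases.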
41. The Challenge
Using this method to generate so many n-grams will create a lot of nonsensical words in the process
42. The Challenge
The goal is to keep only relevant keywords with commercial value and discard the rest
43. The Challenge
At a high level, our solution to this problem is to check the keywords for search volume & CPC data
44. The Challenge
If they have neither search volume nor CPC data then those keywords will be discarded before the final output
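The discard rule can be sketched as below. This is an illustration under assumptions: the dictionary layout and the function name are hypothetical, the search volumes are the ones shown in the talk's battery example, and the CPC values are placeholders set to zero because the slides do not give them.

```python
def has_commercial_value(metrics):
    """Keep a keyword if it has search volume or CPC data (either is enough)."""
    return metrics["volume"] > 0 or metrics["cpc"] > 0


# Volumes from the talk's battery example; CPC values are placeholders (assumption).
keyword_metrics = {
    "aa alkaline": {"volume": 20, "cpc": 0.0},
    "aa alkaline batteries": {"volume": 80, "cpc": 0.0},
    "aa alkaline batteries command": {"volume": 0, "cpc": 0.0},
    "aa alkaline batteries command adjustables": {"volume": 0, "cpc": 0.0},
}

kept = [kw for kw, m in keyword_metrics.items() if has_commercial_value(m)]
print(kept)
# ['aa alkaline', 'aa alkaline batteries']
```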
46. Examples of n-grams the script will generate from clustering product names
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
47. Only one of these suggestions has commercial value
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
48. Our goal is to programmatically discard the nonsensical ones and keep any with commercial value
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
49. So Let’s Check for Search Volume!
aa alkaline (20)
aa alkaline batteries (80)
aa alkaline batteries command (0)
aa alkaline batteries command adjustables (0)
aa alkaline batteries command adjustables self (0)
50. Everything in red will be discarded automatically because it has no search volume
aa alkaline (20)
aa alkaline batteries (80)
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
51. Checking n-grams for keyword volume does a lot of the hard work but it’s not perfect
aa alkaline (20)
aa alkaline batteries (80)
52. To deal with this we have included configurable pre- and post-filtering options
aa alkaline (20)
aa alkaline batteries (80)
Keep Longest Word Fragment = True
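The slides do not define what the "Keep Longest Word Fragment" option does. One plausible reading, sketched here purely as an assumption, is that a keyword is dropped when a longer surviving keyword starts with the same words. The function name is hypothetical.

```python
def keep_longest_fragments(keywords):
    """Drop any keyword that is a leading word-fragment of a longer keyword."""
    kept = []
    for kw in keywords:
        words = kw.split()
        # kw is a fragment if another keyword begins with exactly the same words.
        is_fragment = any(
            other != kw and other.split()[:len(words)] == words
            for other in keywords
        )
        if not is_fragment:
            kept.append(kw)
    return kept


survivors = keep_longest_fragments(["aa alkaline", "aa alkaline batteries"])
print(survivors)
# ['aa alkaline batteries']
```

Under this reading the incomplete phrase "aa alkaline" is removed even though it has search volume, leaving only the phrase that works as a category name.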
67. You Will Need
Screaming Frog – To crawl the site
Keywords Everywhere API – To check search volume ($10 for 100,000 credits)
Python with the following libraries imported:
NLTK – Used to create n-gram word combinations
PolyFuzz – To match keywords to existing categories
77. r
.csv exports are read into Python and processed with the Natural Language Toolkit (NLTK) library.
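A rough sketch of the read-in step using the standard library's `csv` module. The column names `Address` and `H1-1`, the example URLs and the function name are assumptions for illustration; check the headers of your own crawl export and adjust them.

```python
import csv
import io

# Stand-in for a crawl export file; URLs and column names are illustrative assumptions.
SAMPLE_EXPORT = """Address,H1-1
https://example.com/sofas/hardwick-leather-sofa,Hardwick Leather Sofa
https://example.com/sofas/leather-buttoned-sofa,Leather Buttoned Sofa
"""


def read_products(csv_file, url_col="Address", name_col="H1-1"):
    """Return one dict per crawled page that has a product name."""
    reader = csv.DictReader(csv_file)
    return [
        {"url": row[url_col], "name": row[name_col].strip()}
        for row in reader
        if row.get(name_col)
    ]


products = read_products(io.StringIO(SAMPLE_EXPORT))
print(products[0]["name"])
# Hardwick Leather Sofa
```

With a real export, pass an open file handle (for example `open("export.csv", newline="", encoding="utf-8")`) instead of the in-memory sample.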
78. Cluster
Product names are clustered together using n-grams to generate new words
Keyword
aa alkaline
aa alkaline batteries
aa alkaline batteries command
aa alkaline batteries command adjustables
aa alkaline batteries command adjustables self
aa alkaline batteries command adjustables self adhesive
aa alkaline batteries duracell
aa alkaline batteries duracell optimum
aa alkaline batteries duracell optimum aa
aa alkaline batteries duracell optimum aa batteries
aa alkaline batteries duracell plus
aa alkaline batteries duracell plus battery
aa alkaline batteries duracell plus battery pack
aa alkaline batteries duracell plus lr
aa alkaline batteries duracell plus lr aa
aa alkaline batteries duracell specialty
aa alkaline batteries duracell specialty alkaline
aa alkaline batteries duracell specialty alkaline button
aa alkaline batteries energizer
aa alkaline batteries energizer maxplus
aa alkaline batteries energizer maxplus aa
aa alkaline batteries energizer maxplus aa batteries
79. Cluster
Products are clustered category by category (so if a product lives in two categories, it’ll be clustered twice)
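The category-by-category behaviour can be sketched like this. It is an illustration only: the category paths, the function names and the n-gram length limits are hypothetical, and the n-grams are built with plain Python rather than NLTK.

```python
import re
from collections import defaultdict


def word_ngrams(name, min_len=2, max_len=3):
    """Contiguous word n-grams of a product name (plain-Python stand-in for NLTK)."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def cluster_by_category(products):
    """Map category -> n-gram -> set of product names that contain the n-gram."""
    clusters = defaultdict(lambda: defaultdict(set))
    for name, categories in products:
        # A product listed in two categories is processed once per category.
        for category in categories:
            for gram in word_ngrams(name):
                clusters[category][gram].add(name)
    return clusters


# Category paths are made up for the example.
sample_products = [
    ("Tetbury Leather Sofa - Black", ["/sofas", "/leather-furniture"]),
    ("Hardwick Leather Sofa", ["/sofas"]),
]
clusters = cluster_by_category(sample_products)
print(sorted(clusters["/sofas"]["leather sofa"]))
# ['Hardwick Leather Sofa', 'Tetbury Leather Sofa - Black']
```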
81. Filtering
We started by generating over half a million n-grams using existing products on wilko.com
597,664
82. Filtering
34,000 were matched to a minimum of three products and the rest discarded
597,664
34,100
83. Filtering
Just under 9,000 keywords remained after deduplication. These were then checked for search volume
597,664
34,100
8,969
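A sketch of the two filtering steps above, under stated assumptions. The minimum of three products comes from the slides; the deduplication rule is not described in the talk, so the version here simply keeps each n-gram once, in the category where it matched the most products. Category paths and the function name are hypothetical.

```python
def filter_and_dedupe(clusters, min_products=3):
    """Keep n-grams matched to enough products, once each.

    clusters maps category -> n-gram -> set of product names.
    Returns n-gram -> (category, product count).
    """
    best = {}
    for category, grams in clusters.items():
        for gram, names in grams.items():
            if len(names) < min_products:
                continue  # too few products to populate a subcategory
            # Assumed dedupe rule: keep the category with the most matching products.
            if gram not in best or len(names) > best[gram][1]:
                best[gram] = (category, len(names))
    return best


sofa_names = {
    "Leather Buttoned Sofa",
    "Mid Century Leather Sofa",
    "Tetbury Leather Sofa - Black",
    "Hardwick Leather Sofa",
    "Tetbury Leather Sofa - Tan",
}
sample_clusters = {
    "/sofas": {
        "leather sofa": sofa_names,
        "tetbury leather": {"Tetbury Leather Sofa - Black", "Tetbury Leather Sofa - Tan"},
    },
    "/living-room": {
        "leather sofa": {"Leather Buttoned Sofa", "Hardwick Leather Sofa", "Mid Century Leather Sofa"},
    },
}
opportunities = filter_and_dedupe(sample_clusters)
print(opportunities)
# {'leather sofa': ('/sofas', 5)}
```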
84. Filtering
The final output contained 1,883 subcategorisation opportunities ready to QA
597,664
34,100
8,969
1,883
85. Filtering
99.68% of all keywords were discarded before the final output! Essentially, we brute forced the opportunity
597,664
34,100
8,969
1,883
86. Typical Script Output
Total Subcategories Generated : 597,6
Matched to Min of: 3 Products: 34,088
Remaining after de-duplication: 8,969
Subcategories with Search Volume: 1,8
Total Volume: 8,023,629
Discarded: 99.68 % of Keywords!
Completed in: 16.15 Minutes
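As a quick check of the discard figure, the percentage can be recomputed from the funnel numbers on the filtering slides (597,664 n-grams generated, 1,883 in the final output). The variable names are illustrative.

```python
generated = 597_664   # n-grams generated from product names
final_output = 1_883  # subcategory opportunities left after all filters

discarded_pct = (1 - final_output / generated) * 100
print(f"Discarded: {discarded_pct:.2f}% of keywords")
# Discarded: 99.68% of keywords
```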
93. It also shows the number of products available to populate the new categories!
Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
/outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
/outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
/outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
/outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
/outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
/outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
/outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
/outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
/outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
/outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
/outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 3 | 59% | 6 seater tables
/outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
/outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
94. Suggested categories with high search demand, but low inventory can signal that it could be time to expand the range to tap into the demand…
Low Inventory, High Demand
Parent Category | Suggested Subcategory | Volume | CPC | # Products | Similarity | Closest Matched Category
/outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
/outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
/outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
/outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
/outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 4 | 70% | wooden garden swing seats
/outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
/outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
/outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 3 | 38% | cavity wall
/outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 4 | 76% | planters
/outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
/outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 3 | 59% | 6 seater table
/outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
/outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
/outdoor-toys/trampolines.list | trampoline accessory kits | 70 | 0.26 | 4 | 69% | accessory d-line
95. All category suggestions are fuzzy matched against existing categories.
Parent Category | Suggested Subcategory | Vol | CPC | # Products | Similarity | Closest Matched Category
/outdoor-toys/climbing-frames.list | rope ladders | 2,400 | 0.28 | 4 | 73% | loft ladder new ladders
/outdoor-toys/climbing-frames.list | wooden climbing frames | 90 | 0.78 | 3 | 72% | climbing plants
/outdoor-toys/garden-swings.list | double swing sets | 1,900 | 0.54 | 3 | 61% | double beds
/outdoor-toys/garden-swings.list | single swing sets | 1,000 | 0.32 | 4 | 58% | garden swings
/outdoor-toys/garden-swings.list | wooden swing sets | 8,100 | 0.97 | 8 | 70% | wooden garden swing seats
/outdoor-toys/ride-on-toys.list | pro stunt scooter | 320 | 0.61 | 5 | 20% | protect garden
/outdoor-toys/role-play-toys.list | outdoor play kitchen | 320 | 0.38 | 3 | 82% | outdoor kitchens
/outdoor-toys/sandpits.list | activity tables | 6,600 | 0.24 | 7 | 38% | cavity wall
/outdoor-toys/sandpits.list | planter tables | 5,400 | 0.28 | 3 | 76% | planters
/outdoor-toys/sandpits.list | plum discovery toys | 320 | 0.7 | 3 | 43% | ecover
/outdoor-toys/sandpits.list | water tables | 27,100 | 0.37 | 6 | 59% | 6 seater tables
/outdoor-toys/sandpits.list | water tracks | 480 | 0.3 | 3 | 47% | track set shop by room
/outdoor-toys/trampolines.list | junior trampolines | 880 | 0.31 | 4 | 66% | trampolines
Categories which closely match existing categories (including plurals and words out of order) are removed automatically!
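A minimal sketch of the idea behind this step. The talk uses PolyFuzz for the matching; the version below swaps in the standard library's `difflib.SequenceMatcher` so it runs without extra installs. The plural handling (stripping a trailing "s"), the word sorting and the 0.9 threshold are all assumptions for illustration, not the script's actual settings.

```python
from difflib import SequenceMatcher


def normalise(phrase):
    """Lowercase, crudely singularise, and sort words so order does not matter."""
    words = [
        w[:-1] if w.endswith("s") and len(w) > 3 else w  # naive plural stripping
        for w in phrase.lower().split()
    ]
    return " ".join(sorted(words))


def closest_category(suggestion, existing):
    """Return (similarity, category) for the best-matching existing category."""
    target = normalise(suggestion)
    return max(
        (SequenceMatcher(None, target, normalise(cat)).ratio(), cat)
        for cat in existing
    )


def is_new_category(suggestion, existing, threshold=0.9):
    """A suggestion is 'new' if nothing existing matches it closely enough."""
    score, _ = closest_category(suggestion, existing)
    return score < threshold


existing_categories = ["garden swings", "trampolines"]
print(is_new_category("swing garden", existing_categories))   # False: same words, reordered
print(is_new_category("rope ladders", existing_categories))   # True: nothing similar exists
```

The similarity score returned by `closest_category` plays the same role as the Similarity column in the output table, although the exact percentages would differ from those PolyFuzz produces.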
96. Limitations and Considerations
The output is only as good as the naming conventions. If product names are short or non-descriptive then that’ll affect the final output.
97. Limitations and Considerations
The script will output keywords in the singular form, whereas categories will be pluralised because they contain more than a single product
99. Automation
This script can be automated on a VPS in conjunction with an automated crawl setup.
100. Automation
Perhaps client work can be road mapped every three months with the output automatically sent as an email or to a Slack channel
101. Remixes and Mashups
I’d love to see some remixes, mashups and improvements to the script.
Just make sure you tag me in anything you make!
110. Don’t Wait🐍🔥
There is an awesome community of SEOs online who are passionate about Python.
If you’re thinking about getting started, come and join us!
111. Python Resources
YouTube Channels: Corey Schafer, Data School, Socratica, MIT Introduction to Computer Science & Python
Apps: SoloLearn (Android / iPhone)
Books: Automate the Boring Stuff
112. Python SEOs to follow on Twitter
@GregBernhardt4
@DataChaz
@OritSiMu
@DanielHereMe
@SEOPythonistas
@rvtheverett
@vdrweb
@LeeFootSEO 😃
and since then my productivity has gone through the roof and it’s gotten to the point where I’m not even sure how I did my job without it before!
Talk about internal search mapping
Examples of n-grams generated. Not the highest value of categories – but useful to get the idea across
I know
In other words
Tried to account for this in the past by adding an ‘s’ to the final output – but there are too many edge cases: ‘es’ words and the like
I’ll tweet the link out at the end as well
I’ll tweet this out at the end too
Great community of Python enthusiasts and professionals online.
If you want to get started – don’t wait! Make things and dive