Scaling Keyword Research to Find Content Gaps

Hamlet Batista

#TTTLive
@hamletbatista
Hamlet Batista
Founder/CEO @ RankSense
Hamlet Batista is CEO and founder of
RankSense, an agile SEO platform for
online retailers and manufacturers.
He holds US patents on innovative
SEO technologies, started doing SEO
as a successful affiliate marketer
back in 2002, and believes great SEO
results should not take 6 months.

How Low Can #1 Go?
Moz’s February 2020 report finds the ten organic blue links pushed further down the page.
The report refreshes a 2013 research study.
https://moz.com/blog/how-low-can-number-one-go-2020

What is an Organic Listing in 2020?
In his response to the article, Google’s
Danny Sullivan contends organic
listings are no longer just the ten plain
blue links.
Users’ expectations of Google have changed over time, and Google has adapted to them.
https://twitter.com/dannysullivan/status/1232745667119865856

Keyword Research in 2013
01 Research keywords/topics
● Low competition
● Relevant
● High search volume
02 Build content-rich web pages to match the keywords
● Content word length
● Social media promotion
● Compelling headlines
03 Track the keyword rankings
● Position tracking
● Share of voice
● SERP pixel tracking

Keyword Research in 2020
As the ten blue web links get pushed down the SERP, our
research should focus on the features replacing them.
https://moz.com/learn/seo/serp-features

Agenda
1. What are content formats?
2. Mapping content formats to SERP features
3. Using SERP features to research content format gaps
4. Automating the process with Python

What are content formats?
Content templates:
1. Article
2. Forum post
3. Product page
4. Tool/calculator
5. Directory listing
6. Etc.
Content formats:
1. Video
2. Image
3. List (ordered, unordered)
4. Table
5. Answers
6. Reviews
7. Etc.

How to detect content formats in
web pages?
We can find missed content format opportunities using structured data:
1. If there is relevant content and no structured data, there is an opportunity to add the structured data
2. If there is structured data and no relevant content, there is an opportunity to add the content
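
The two rules above can be written as a small decision function. This is only a sketch: the function name and the returned labels are hypothetical, and how you decide that a page "has relevant content" is left to you.

```python
def structured_data_opportunity(has_relevant_content: bool, has_structured_data: bool) -> str:
    """Classify one page for one content format (for example, video).

    Hypothetical helper: the labels are illustrative, not from the deck.
    """
    if has_relevant_content and not has_structured_data:
        return "add structured data"   # rule 1: content exists, markup is missing
    if has_structured_data and not has_relevant_content:
        return "add content"           # rule 2: markup exists, content is missing
    if has_relevant_content and has_structured_data:
        return "covered"               # both present, nothing to do
    return "not applicable"            # neither present, the rules say nothing


print(structured_data_opportunity(True, False))
```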

Mapping SEMrush SERP features to
content formats

Checking for EmbedURL
JSONPath to detect Video

Let’s automate this!
Here is our technical plan:
1. Extract keywords (and pages) with high impressions and no clicks
2. Extract SERP features for those keywords
3. Use our Feature->Format (JSONPaths) map to identify the expected content format
4. Check if the page includes the format
5. Report the missing content formats

Extracting underperforming
keywords and pages from
Google Search Console

Extract keywords with high impressions
and no clicks
Using code from TTT webinar
https://trafficthinktank.com/courses/automation-for-seo/
1. !git clone https://github.com/hamletbatista/google-searchconsole
2. !pip3 install google-searchconsole/

Extract keywords with high impressions
and no clicks
Configure Search Console API
https://developers.google.com/webmaster-tools
1. Activate Search Console API in the Google Cloud console
https://console.cloud.google.com/apis/api/webmasters.googleapis.com/overview?project=&folder=&organizationId=
2. Create New Credentials / Help me choose (Search Console API, Other UI, User data)
https://console.cloud.google.com/apis/credentials/wizard?api=iamcredentials.googleapis.com&project=
3. Download client_id.json

Extract keywords with high impressions and no clicks
Upload client_id.json
# Upload client_id.json to the Colab runtime
from google.colab import files
files.upload()
# run once
import searchconsole
account = searchconsole.authenticate(client_config="client_id.json", serialize='credentials.json', from_colab=True)

Extract keywords with high impressions and no clicks
Get keywords and pages
webproperty = account['https://www.domain.com/']
# Last 7 days of GSC data
query = webproperty.query.range(start='today', days=-7).dimension('page', 'query')  # .limit(100)
r = query.get()
import pandas as pd
df = pd.DataFrame(r.rows)
df.head()

Filter by high impressions and no clicks
high_potential = df.query("clicks == 0.0 & impressions > 10 & position < 20")
high_potential
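
To see what the filter keeps, here is a small self-contained sketch. The rows are made up and only mimic the shape of the Search Console export (page, query, clicks, impressions, position); the thresholds are the ones used in the query above.

```python
import pandas as pd

# Made-up rows shaped like the Search Console data; not real measurements.
sample = pd.DataFrame(
    [
        {"page": "https://www.example.com/a", "query": "blue widgets",
         "clicks": 0.0, "impressions": 120.0, "position": 8.2},
        {"page": "https://www.example.com/b", "query": "red widgets",
         "clicks": 5.0, "impressions": 300.0, "position": 3.1},
        {"page": "https://www.example.com/c", "query": "green widgets",
         "clicks": 0.0, "impressions": 4.0, "position": 12.0},
        {"page": "https://www.example.com/d", "query": "widget repair",
         "clicks": 0.0, "impressions": 50.0, "position": 35.0},
    ]
)

# Same conditions as above: no clicks, more than 10 impressions, ranking in the top 20.
filtered = sample.query("clicks == 0 and impressions > 10 and position < 20")
print(filtered[["page", "query", "impressions", "position"]])
```

Only the "blue widgets" row passes all three conditions: the others either have clicks, too few impressions, or rank below position 20.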

Extracting SERP features from
SEMrush

Using code from SEMrush webinar
https://www.semrush.com/blog/weekly-wisdom-hamlet-batista-python-javascript-marketers/
1. Extracting data from SEMrush
2. You can find the SEMrush API reference here
https://www.semrush.com/api-analytics/
3. You can find your API key here
https://www.semrush.com/api-use/
4. Fk > All SERP Features triggered by a keyword (see SEMrush's list of available SERP Features)
5. Ph > Keyword bringing users to the website via Google's top 20 organic search results

Get SERP features Gist
https://gist.github.com/hamletbatista/ed5e810b56acf0f8490e29050caa4351

Get SERP features names
(from indices) Gist
https://gist.github.com/hamletbatista/74730874b7e0540cd51d3ab749f18ffd

Get SERP features names by keywords from SEMrush
df["SERP Feature by Keyword Names"] = df["SERP Features by Keyword"].apply(lambda x: ",".join(get_feature_names(x)) )
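
The real get_feature_names lives in the Gist linked earlier. For readers without it at hand, here is a hypothetical stand-in that shows the idea of turning indices into names. It assumes the features arrive as a comma-separated string of numeric indices, and the lookup table is invented for illustration: the index/name pairs are placeholders, not SEMrush's actual codes.

```python
# PLACEHOLDER table: these pairs are invented for illustration only.
# Replace them with the codes from SEMrush's list of available SERP Features.
PLACEHOLDER_FEATURE_NAMES = {
    1: "video",
    2: "image",
    3: "review",
}


def get_feature_names(raw, lookup=PLACEHOLDER_FEATURE_NAMES):
    """Turn a comma-separated string of indices into a list of feature names.

    Hypothetical stand-in for the Gist's function; unknown indices are kept
    as "unknown_<index>" and non-numeric input (such as NaN) yields no names.
    """
    names = []
    for token in str(raw).split(","):
        token = token.strip()
        if token.isdigit():
            names.append(lookup.get(int(token), f"unknown_{token}"))
    return names


print(",".join(get_feature_names("1,3")))
```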

Let’s merge SEMrush features with our Google Search Console data!
We merge on query and Keyword columns.
new_df = pd.merge(high_potential, df, how="right", left_on="query", right_on="Keyword")
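
A toy version of this merge shows what how="right" does. The two frames below are made up; only the column names query and Keyword match the ones used above.

```python
import pandas as pd

# Made-up Search Console rows (left side of the merge).
gsc = pd.DataFrame({
    "page": ["https://www.example.com/a", "https://www.example.com/d"],
    "query": ["blue widgets", "widget repair"],
    "impressions": [120, 50],
})

# Made-up SEMrush rows (right side of the merge).
semrush = pd.DataFrame({
    "Keyword": ["blue widgets", "widget prices"],
    "SERP Feature by Keyword Names": ["video,image", "review"],
})

# A right join keeps every SEMrush keyword, matched or not.
merged = pd.merge(gsc, semrush, how="right", left_on="query", right_on="Keyword")

# Rows without a Search Console match have NaN in the left-hand columns.
matched = merged.dropna(subset=["query"])
print(merged)
```

With how="right", "widget prices" stays in the result with empty Search Console columns, while "widget repair" (no SEMrush row) is dropped. If you only want keywords present in both sources, how="inner" is the usual alternative.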

Checking if pages include expected
content formats

Using third-party libraries: requests, extruct and jsonpath-ng
1. Extract all structured data from the page
2. Map expected formats to JSONPaths
1. !pip install extruct==0.7.3
2. !pip install rdflib==4.2.2
I needed to revert from the latest version due to an error.

Extract structured data
import extruct
import requests
import pprint
from w3lib.html import get_base_url
pp = pprint.PrettyPrinter(indent=2)
r = requests.get('https://www.cnn.com/videos/health/2020/04/25/elmo-sesame-street-people-wearing-masks-gupta-sot-town-hall-vpx.cnn')
base_url = get_base_url(r.text, r.url)
data = extruct.extract(r.text, base_url=base_url)
pp.pprint(data)

JSONPath selectors
1. $..acceptedAnswer
2. $..address
3. $..review
4. $..embedUrl
5. $..employmentType
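
Each selector above means "find this key at any depth". The deck uses jsonpath-ng for that; the sketch below is a dependency-free stand-in that mimics the `$..key` behaviour with plain recursion, so it runs anywhere. The sample data only imitates the shape of extruct's output, and the pairing of format names with selectors is an assumption inferred from the selector names.

```python
def find_key(node, key):
    """Return every value stored under `key` at any depth, like the JSONPath `$..key`."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_key(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Made-up structured data shaped like an extruct result (one list per syntax).
sample_data = {
    "json-ld": [
        {
            "@type": "VideoObject",
            "name": "Example clip",
            "embedUrl": "https://www.example.com/embed/123",
        }
    ],
    "microdata": [],
}

# Assumed pairing of content formats with the keys from the selectors above.
SELECTOR_KEYS = {
    "faq": "acceptedAnswer",
    "local_business": "address",
    "review": "review",
    "video": "embedUrl",
    "job": "employmentType",
}

# True when the page's structured data contains the key for that format.
formats_found = {fmt: bool(find_key(sample_data, key)) for fmt, key in SELECTOR_KEYS.items()}
print(formats_found)
```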

Does the page include our content
formats?
https://gist.github.com/hamletbatista/f77d6cd6343b240f6451116a5a7c08b6

This function uses the content formats and expected SERP features to calculate the opportunity gaps.
https://gist.github.com/hamletbatista/157e7cad373113e9764e280f106bdac5
We count an opportunity when the SERP shows a feature for the keyword (for example, a video carousel) and the page has no corresponding content format (no video in the structured data).
Each opportunity counts as 1.
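
The counting rule can be sketched in a few lines. This is not the Gist's code: the function name and signature are hypothetical, and it assumes the SERP feature names have already been normalized to the same labels as the content formats.

```python
# Format labels, matching the columns used in the heatmap.
FORMATS = ["image", "video", "local_business", "review", "top_story", "faq", "job"]


def content_gaps(expected_features, formats_found, all_formats=FORMATS):
    """Return {format: 1 or 0} for one page.

    1 means the SERP shows the feature but the page lacks the format;
    0 means either the feature is not expected or the page already has it.
    """
    return {
        fmt: int(fmt in expected_features and not formats_found.get(fmt, False))
        for fmt in all_formats
    }


# The SERP shows a video carousel and reviews; the page only has review markup.
row = content_gaps({"video", "review"}, {"review": True, "video": False})
print(row)
```

One such dictionary per page, stacked into a DataFrame with a url column, gives a 0/1 matrix that can be reported directly.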

Report missing
content formats

Yellow spots in the heatmap represent opportunities

Visualize our content gap matrix
import plotly.graph_objects as go
columns = ["image", "video", "local_business",
"review", "top_story", "faq", "job"]
data=go.Heatmap(z=gap_df[columns], x=columns, y=gap_df.url)

Visualize our content gaps as a binary heatmap
fig = go.Figure(data)
fig.update_xaxes(side="top")
fig.show()

Resources to learn more
Python Introduction for SEOs
https://www.searchenginejournal.com/introduction-to-python-seo-spreadsheets/342779/
Search-driven Content Strategy
https://www.slideshare.net/stephaniebeadell/searchdriven-content-strategy-mozcon-2018-105014924
Query Syntax
http://www.blindfiveyearold.com/query-syntax
SEO Automation Course
https://trafficthinktank.com/courses/automation-for-seo/

About RankSense
Automate tedious SEO tasks in Google Sheets.
Import the sheets and deploy them as
experiments to Cloudflare.
Learn which changes are effective.
https://www.ranksense.com
