2. #TTTLive
@hamletbatista
Hamlet Batista
Founder/CEO @ RankSense
Hamlet Batista is CEO and founder of
RankSense, an agile SEO platform for
online retailers and manufacturers.
He holds US patents on innovative
SEO technologies, started doing SEO
as a successful affiliate marketer
back in 2002, and believes great SEO
results should not take 6 months.
3. #TTTLive
@hamletbatista
How Low Can #1 Go?
Moz’s Feb 2020 report finds the ten
organic blue links pushed further
down the page.
This is a refresh of a 2013
research study.
https://moz.com/blog/how-low-can-number-one-go-2020
4. #TTTLive
@hamletbatista
What is an Organic Listing in 2020?
In his response to the article, Google’s
Danny Sullivan contends organic
listings are no longer just the ten plain
blue links.
Users' expectations of Google have
changed over time, and Google has
adapted to them.
https://twitter.com/dannysullivan/status/1232745667119865856
5. #TTTLive
@hamletbatista
Keyword Research in 2013
01 Research keywords/topics
● Low competition
● Relevant
● High search volume
02 Build content-rich web pages to match the keywords
● Content word length
● Social media promotion
● Compelling headlines
03 Track the keyword rankings
● Position tracking
● Share of voice
● SERP pixel tracking
6. #TTTLive
@hamletbatista
Keyword Research in 2020
As the ten blue web links get pushed down the SERP, our
research should focus on the features replacing them.
https://moz.com/learn/seo/serp-features
7. #TTTLive
@hamletbatista
Agenda
1. What are content formats?
2. Mapping content formats to SERP features
3. Using SERP features to research content format gaps
4. Automating the process with Python
8. #TTTLive
@hamletbatista
What are content formats?
Content templates:
1. Article
2. Forum post
3. Product page
4. Tool/calculator
5. Directory listing
6. Etc.
Content formats:
1. Video
2. Image
3. List (ordered, unordered)
4. Table
5. Answers
6. Reviews
7. Etc.
10. #TTTLive
@hamletbatista
How to detect content formats in
web pages?
We can find missed content format opportunities using structured
data:
1. If there is relevant content and no structured data, there is an
opportunity to add the structured data
2. If there is structured data and no relevant content, there is an
opportunity to add the content
15. #TTTLive
@hamletbatista
Let’s automate this!
Here is our technical plan:
1. Extract keywords (and pages) with high impressions and no clicks
2. Extract SERP features for those keywords
3. Use our Feature->Format (JSONPaths) map to identify the expected
content format
4. Check if the page includes the format
5. Report the missing content formats
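Step 3 of the plan can be sketched as a simple lookup. This is a minimal illustration only: the feature names, the schema.org types they are paired with, and the helper name expected_formats are all assumptions, and it uses plain type names where the talk's map uses JSONPaths.

```python
# Hypothetical map from SERP feature names to the schema.org type a page
# would be expected to expose. Names and pairings are illustrative
# assumptions, not the map used in the talk (which stores JSONPaths).
FEATURE_TO_FORMAT = {
    "Video": "VideoObject",
    "Video carousel": "VideoObject",
    "Reviews": "Review",
    "People also ask": "FAQPage",
    "Image pack": "ImageObject",
}


def expected_formats(feature_names: str) -> set:
    """Turn a comma-separated feature string into the set of expected formats."""
    names = [name.strip() for name in feature_names.split(",") if name.strip()]
    # Features without an entry in the map are skipped rather than raising.
    return {FEATURE_TO_FORMAT[name] for name in names if name in FEATURE_TO_FORMAT}
```

Features that have no entry in the map are ignored, so an unmapped feature never produces a false gap.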
17. #TTTLive
@hamletbatista
Extract keywords with high impressions
and no clicks
Using code from TTT webinar
https://trafficthinktank.com/courses/automation-for-seo/
1. !git clone https://github.com/hamletbatista/google-searchconsole
2. !pip3 install google-searchconsole/
18. #TTTLive
@hamletbatista
Extract keywords with high impressions
and no clicks
Configure Search Console API
https://developers.google.com/webmaster-tools
1. Activate Search Console API in Compute Engine
https://console.cloud.google.com/apis/api/webmasters.googleapis.com/overview?project=&folder=&organizationId=
2. Create New Credentials / Help me choose (Search
Console API, Other UI, User data)
https://console.cloud.google.com/apis/credentials/wizard?api=iamcredentials.googleapis.com&project=
3. Download client_id.json
19. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Upload client_id.json
from google.colab import files
files.upload()
# run once
import searchconsole
account = searchconsole.authenticate(client_config="client_id.json",
                                     serialize='credentials.json', from_colab=True)
20. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Get keywords and pages
webproperty = account['https://www.domain.com/']
#Last 7 days of GSC data
query = webproperty.query.range(start='today', days=-7).dimension('page', 'query')#.limit(100)
r = query.get()
import pandas as pd
df = pd.DataFrame(r.rows)
df.head()
22. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Filter by high impressions and no clicks
high_potential = df.query("clicks == 0.0 & impressions > 10 & position < 20")
high_potential
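The same filter can be wrapped in a small function so the thresholds are easy to tune. This is a sketch under assumptions: the function name filter_high_potential, its parameter names, and the sample rows are made up for illustration, and it expects the clicks, impressions, and position columns shown in the Search Console data above.

```python
import pandas as pd


def filter_high_potential(df, min_impressions=10, max_position=20):
    """Keep rows with zero clicks, enough impressions, and a good enough position."""
    # "@name" lets DataFrame.query read the local variables of this function.
    return df.query(
        "clicks == 0 and impressions > @min_impressions and position < @max_position"
    )


# Made-up rows shaped like the Search Console export above.
sample = pd.DataFrame({
    "page": ["/a", "/b", "/c", "/d"],
    "query": ["q1", "q2", "q3", "q4"],
    "clicks": [0.0, 3.0, 0.0, 0.0],
    "impressions": [50.0, 80.0, 5.0, 40.0],
    "position": [8.2, 3.1, 12.0, 35.5],
})
result = filter_high_potential(sample)
print(result["query"].tolist())  # ['q1']
```

Only q1 survives: q2 already gets clicks, q3 has too few impressions, and q4 ranks beyond position 20.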
24. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Using code from SEMrush webinar
https://www.semrush.com/blog/weekly-wisdom-hamlet-batista-python-javascript-marketers/
1. Extracting data from SEMrush
2. You can find the SEMrush API reference here
https://www.semrush.com/api-analytics/
3. You can find your API key here
https://www.semrush.com/api-use/
4. Fk: all SERP features triggered by a keyword (see SEMrush's list of
available SERP features)
5. Ph: keyword bringing users to the website via Google's
top 20 organic search results
26. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Get SERP feature names
(from indices)
Gist: https://gist.github.com/hamletbatista/74730874b7e0540cd51d3ab749f18ffd
27. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Get SERP feature names by keyword from SEMrush
df["SERP Feature by Keyword Names"] = df["SERP Features by Keyword"].apply(lambda x: ",".join(get_feature_names(x)) )
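The helper get_feature_names comes from the gist on the previous slide and is not reproduced here. As a rough idea of what such a helper might look like, here is a stand-in that assumes the column holds a comma-separated string of numeric feature codes. The code-to-name pairs below are placeholders for illustration; the authoritative list is in the SEMrush API documentation.

```python
# Placeholder code -> name pairs for illustration only. Check the SEMrush
# API documentation (or the gist) for the real, complete list.
FEATURE_NAMES = {
    "0": "Instant answer",
    "5": "Image pack",
    "9": "Video",
    "11": "Featured snippet",
}


def get_feature_names(codes):
    """Map a comma-separated string of feature codes to readable names."""
    if not isinstance(codes, str):  # e.g. NaN for keywords with no features
        return []
    names = []
    for code in codes.split(","):
        code = code.strip()
        if code:
            # Unknown codes are kept visible instead of being dropped silently.
            names.append(FEATURE_NAMES.get(code, f"unknown({code})"))
    return names
```

Returning an empty list for non-string input keeps the lambda above from failing on keywords that trigger no SERP features.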
28. #TTTLive
@hamletbatista
Extract keywords with high
impressions and no clicks
Let’s merge SEMrush features with our Google Search Console data!
We merge on query and Keyword columns.
new_df = pd.merge(high_potential, df, how="right", left_on="query", right_on="Keyword")
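A toy example helps show what the how="right" choice does. The frames and values below are made up, and the SEMrush frame is named semrush here (an assumption for readability; the slide reuses the name df). With how="right", every SEMrush keyword is kept, including ones with no match in the Search Console data, which get NaN in the Search Console columns. how="inner" would keep only keywords present in both.

```python
import pandas as pd

# Toy stand-ins for the two frames in the slide (all values are made up).
high_potential = pd.DataFrame({
    "page": ["https://www.domain.com/a", "https://www.domain.com/b"],
    "query": ["how to tie a tie", "best running shoes"],
    "clicks": [0.0, 0.0],
    "impressions": [120.0, 45.0],
})
semrush = pd.DataFrame({
    "Keyword": ["how to tie a tie", "tie knots"],
    "SERP Feature by Keyword Names": ["Video,Featured snippet", "Image pack"],
})

# Right join: one row per SEMrush keyword, matched or not.
right = pd.merge(high_potential, semrush, how="right",
                 left_on="query", right_on="Keyword")
# Inner join: only keywords found in both frames.
inner = pd.merge(high_potential, semrush, how="inner",
                 left_on="query", right_on="Keyword")

print(len(right), int(right["page"].isna().sum()))  # 2 1
print(len(inner))  # 1
```

If the unmatched rows are not useful for the later page checks, dropping rows where page is missing (or switching to an inner join) is one way to handle them.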
30. #TTTLive
@hamletbatista
Checking if pages include
expected content formats
Using third-party libraries:
requests, extruct and jsonpath-ng
1. Extract all structured data
from the page
2. Map expected formats to
JSONPaths
1. !pip install extruct==0.7.3
2. !pip install rdflib==4.2.2
I had to roll back from the latest
versions because of an error.
31. #TTTLive
@hamletbatista
Checking if pages include
expected content formats
Extract structured data
import extruct
import requests
import pprint
from w3lib.html import get_base_url
pp = pprint.PrettyPrinter(indent=2)
r = requests.get('https://www.cnn.com/videos/health/2020/04/25/elmo-sesame-street-people-wearing-masks-gupta-sot-town-hall-vpx.cnn')
base_url = get_base_url(r.text, r.url)
data = extruct.extract(r.text, base_url=base_url)
pp.pprint(data)
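extruct returns a dict keyed by syntax (for example "json-ld" and "microdata"), each holding a list of items. One simple way to see which content formats a page declares is to collect the schema.org types from that dict. The sketch below is an assumption-laden illustration: collect_types is a hypothetical helper, the sample dict is a hand-written, heavily abbreviated stand-in for real output, and only top-level items are inspected (nested items and @graph containers are ignored).

```python
def collect_types(data: dict) -> set:
    """Collect top-level schema.org type names from extruct-style output."""
    found = set()
    for item in data.get("json-ld", []):
        types = item.get("@type", [])
        if isinstance(types, str):  # "@type" may be a string or a list
            types = [types]
        found.update(types)
    for item in data.get("microdata", []):
        types = item.get("type", [])
        if isinstance(types, str):
            types = [types]
        # Microdata types are full URLs; keep only the last path segment.
        found.update(t.rstrip("/").rsplit("/", 1)[-1] for t in types)
    return found


# Hand-written, abbreviated stand-in for extruct output (not real page data).
sample = {
    "json-ld": [{"@context": "https://schema.org", "@type": "VideoObject", "name": "Example"}],
    "microdata": [{"type": "http://schema.org/Review", "properties": {}}],
}
print(sorted(collect_types(sample)))  # ['Review', 'VideoObject']
```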
34. #TTTLive
@hamletbatista
Checking if pages include
expected content formats
Does the page include our content
formats?
https://gist.github.com/hamletbatista/f77d6cd6343b240f6451116a5a7c08b6
36. #TTTLive
@hamletbatista
Checking if pages include
expected content formats
Does the page include our content
formats?
This function uses the content formats and expected SERP
features to calculate the opportunity gaps.
https://gist.github.com/hamletbatista/157e7cad373113e9764e280f106bdac5
We consider it an opportunity if there is a SERP feature
requested (for example, a video carousel), and there is no
corresponding content format in the page (no video in
the structured data).
We count each opportunity as 1.
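The counting rule can be sketched with a set difference. This is not the function from the gist; count_opportunities is a hypothetical name, and it assumes the expected formats (from the SERP features) and the formats found on the page have already been reduced to comparable labels such as schema.org type names.

```python
def count_opportunities(expected_formats, found_formats):
    """Return (count, missing): one opportunity per expected format not on the page."""
    missing = sorted(set(expected_formats) - set(found_formats))
    return len(missing), missing


# A video carousel is expected, but the page only declares a review.
count, missing = count_opportunities({"VideoObject", "Review"}, {"Review"})
print(count, missing)  # 1 ['VideoObject']
```

Formats that the page has but no SERP feature asks for are not counted, which matches the rule above: only a requested feature with no matching format is a gap.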