It's all about getting ahead of the competition and winning the war on the web. Learn how to scrape your competitors' top-performing content and keywords, analyze the text with AI tools to find tone, style, and consistent themes, and apply that intelligence to develop a content strategy rooted in performance that will better appeal to your readers and fans and deliver results.
Attend this session to learn advanced optimization secrets:
• Key elements of a web page that can be extracted for research.
• Top discovery tools to quickly find optimized topics, titles, and tags.
• How to use XPath and the Screaming Frog web crawler to fuel research.
• New tools to analyze content and predict the Big Five characteristics.
• A sneak peek at some new tools for advanced search engine optimization.
3. Hello!
I am Melissa Sciorra
Sr Manager, SEO @ SmarterTravel (a TripAdvisor Company)
@mel_arroics
#cmc2019
FamilyVacationCritic.com | SmarterTravel.com | Jetsetter.com | WhatToPack.com | AirfareWatchdog.com | Oyster.com
4. SHOW OF HANDS
How many people in this session work on Search Engine Optimization?
How many people in this session have used Screaming Frog?
How many people in this session have used XPath?
5. Agenda
How SEOs and Content Teams can work together, SMARTER
Web Scraping technology & intro to XPath
Elements of webpages that can be EXTRACTED
REAL LIFE USE CASES so you can go into work next week and impress your boss
6. EVOLVE @mel_arroics #cmc2019
Content Strategy Workflow
Ideation Stage = the time to brainstorm topics.
Brainstorming topics can consist of:
- aha moments
- discovering topics through reading
- watching TV
- something you care about
Workflow: idea → writing & optimization → publishing
8. XPath
(short for XML Path Language)
A query language that describes a way to find and process items in XML (and HTML) documents.
It’s supported by modern web browsers.
In plain English: you can select any element, attribute, table, content of an element, or meta object in a webpage.
9. Let’s See an Example
“I want to find all <h3> tags in my blog post.”
Screaming Frog can extract <h1> and <h2> tags out of the box (up to two of each), but <h3> extraction doesn’t come standard; the crawler doesn’t pull more than two header-tag types by default.
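To make the idea concrete, here is a minimal Python sketch of what an //h3 extraction does, using only the standard library. The sample HTML is invented for illustration; real pages are rarely well-formed XML, so a production scrape would use a forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Invented sample blog post for illustration.
html = """<html><body>
  <h1>Post Title</h1>
  <h3>First Subsection</h3>
  <p>Some text.</p>
  <h3>Second Subsection</h3>
</body></html>"""

root = ET.fromstring(html)
# ElementTree uses ".//" where browser XPath uses "//":
# select every <h3> anywhere in the document.
h3_texts = [el.text for el in root.findall(".//h3")]
print(h3_texts)  # ['First Subsection', 'Second Subsection']
```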
12. PSA: The internet is a collection of pages. LOTS of pages.
Every website is built differently from the next.
It’s all HTML, CSS, JavaScript, etc.
Some are built well. Some are not.
Inconsistency in coding can make data collection hard.
…XPath can help!
13. XPath: Location Paths
XPath expressions can begin at the root node (the document’s top-level element) with /
/ selects the entire document
/html/head selects the contents of the head element only
/html/head/title selects the contents of a title element
Node-by-node paths are important to understand for XPath, but not necessary to use:
//title selects a title element no matter where it is
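The location-path forms above can be checked with Python's standard-library parser (sample markup invented for illustration; ElementTree wants relative paths, so the leading slash is dropped and // becomes .//).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><head><title>My Page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# Node-by-node: /html/head/title (the root <html> is implicit here).
title_by_path = doc.find("head/title")
# Anywhere-in-the-document form, equivalent to //title.
title_anywhere = doc.find(".//title")

print(title_by_path.text, title_anywhere.text)  # both "My Page"
```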
14. What if you want to extract all of the links on a page?
A link is defined by <a href="www.website.com/example">anchor text</a>
Your XPath syntax should be //a/@href. This is because //@href would give you ALL href attributes, from any line of code, including references to JS, CSS, and so on.
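The difference between //@href and //a/@href shows up in a small stdlib sketch (invented markup; note that a stylesheet <link> also carries an href, which is exactly what //@href would sweep up).

```python
import xml.etree.ElementTree as ET

html = (
    "<html><head>"
    '<link rel="stylesheet" href="style.css"/>'
    "</head><body>"
    '<a href="https://www.example.com/one">One</a>'
    '<a href="https://www.example.com/two">Two</a>'
    "</body></html>"
)
root = ET.fromstring(html)

# //@href: every href attribute, including the CSS reference.
all_hrefs = [el.get("href") for el in root.iter() if el.get("href")]
# //a/@href: only hrefs that belong to <a> elements.
anchor_hrefs = [a.get("href") for a in root.findall(".//a")]

print(all_hrefs)     # includes 'style.css'
print(anchor_hrefs)  # only the two real links
```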
15. The Tools You Need
Screaming Frog
https://bit.ly/29AEs8Q
Google Chrome
http://bit.ly/2CqZqp7
Scraper for Chrome
http://bit.ly/2W6dbAT
XPath Helper
https://bit.ly/2n8gtTC
Make sure Developer Tools are enabled.
17. SCREAMING FROG XPATH EXTRACTION
• 10 fields allow you to insert XPath, CSSPath, or RegEx to search for and extract custom elements
• Includes a syntax validator
Extract HTML Element: the selected element and all of its inner HTML content.
Extract Inner HTML: the inner HTML content of the selected element; if the selected element contains other HTML elements, they’ll be included.
Extract Text: the text content of a selected element and the text content of any sub-elements.
Tip: You choose what you want to extract.
22. Use Case 1: Extract External Lists
“Airlines + Luggage Policy” = opportunity
1. I need to find all the airlines to create a keyword tree to provide to my content team.
2. I find a Ranker.com page that lists out all airlines, but if I copy + paste it into Excel it would be messy.
3. Right-click on an airline header > Scrape Similar.
26. Use Case 2: Extract Article Publish Date
1. Identify top pages in Google Search Console, export, and open up a page in your browser.
2. Find the date of your article and right-click > Inspect.
3. Right-click on the highlighted entry > Copy > Copy XPath. For example, on
https://www.jetsetter.com/magazine/cool-things-to-do-in-denver/
the copied XPath is:
//*[@id="container-scroll"]/div/div[2]/div[2]/div[1]/div/span[2]/time
4. Close the source code and open XPath Helper. Paste your copied XPath into “Query” and make sure it returns the date result.
5. Open Screaming Frog > Configuration > Custom > Extraction.
27. Use Case 2: Extract Article Publish Date (continued)
5. Open Screaming Frog > Configuration > Custom > Extraction.
6. Paste your XPath function and name it. Extract Inner HTML. Check for checkmark validation.
7. Paste your top URLs into Screaming Frog and crawl.
Find your extractions under the Custom tab > Extraction filter.
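When the date is also exposed in the commonly found WordPress article:published_time meta tag, the extraction boils down to one attribute lookup. A minimal stdlib sketch; the sample HTML and date value are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document carrying the WordPress-style meta tag.
html = (
    "<html><head>"
    '<meta property="article:published_time" content="2019-04-02T10:00:00Z"/>'
    "</head><body><p>Article body</p></body></html>"
)
root = ET.fromstring(html)

# Equivalent of //meta[@property='article:published_time']/@content
meta = root.find(".//meta[@property='article:published_time']")
published = meta.get("content")
print(published)  # 2019-04-02T10:00:00Z
```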
30. Use Case 3: Analyze Competitors’ Article Titles
Competitive analysis = find content gaps
I want to see the main themes of what they are writing about to begin my competitive analysis.
1. Run a crawl of the competitor’s website, or extract the highest-performing URLs from SEMrush and crawl those.
2. Download <h1> or title tags.
3. Paste into a text analyzer, like online-utility.org.
32. Use Case 4: Extract YouTube Video Titles and Tags
New video strategy = more visibility
I need to see where to start with my video SEO strategy.
1. Visit the YouTube channel and keep loading videos until you can’t load any more under Channel Videos.
2. Right-click on a video title and select Scrape Similar.
3. Export to Google Docs.
33. Use Case 4: Extract YouTube Video Titles and Tags (continued)
4. Add youtube.com onto all URLs through a CONCATENATE formula:
=CONCATENATE("https://www.youtube.com",B2)
5. Paste the full URLs into Screaming Frog.
6. Export the crawl into Excel to analyze title, meta description, and meta keywords.
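The CONCATENATE step is plain string concatenation; a hypothetical Python equivalent, where the relative paths are made-up placeholders for the scraped video URLs:

```python
# Scraped relative paths (placeholder values for illustration).
relative_paths = ["/watch?v=abc123", "/watch?v=def456"]

# Same as =CONCATENATE("https://www.youtube.com", B2) filled down a column.
full_urls = ["https://www.youtube.com" + path for path in relative_paths]
print(full_urls)
```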
34. Use Case 5: Find Pages With Specific Anchor Text
Extract certain on-page links = more opportunity
I want to see if any of my on-page link anchor text contains “Amazon”.
1. Open Screaming Frog.
2. Enter the below formula into Configuration > Custom > Extraction:
//a[contains(translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'amazon')]/@href
3. Replace ‘amazon’ with any other anchor text you want to search for. Extract Inner HTML.
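XPath 1.0 has no lower-case() function, which is why the expression pipes the anchor text through translate() before the contains() check. ElementTree's XPath subset doesn't support these string functions, so this sketch replicates the same logic in Python (invented sample links):

```python
import string
import xml.etree.ElementTree as ET

html = (
    "<html><body>"
    '<a href="https://www.AMAZON.com/item">Buy on Amazon</a>'
    '<a href="https://www.example.com">Other link</a>'
    "</body></html>"
)
root = ET.fromstring(html)

# translate(A-Z -> a-z), then contains(..., 'amazon'), applied to the
# anchor's text content just like the "." in the XPath expression.
lower = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
amazon_links = [
    a.get("href")
    for a in root.findall(".//a")
    if "amazon" in "".join(a.itertext()).translate(lower)
]
print(amazon_links)  # ['https://www.AMAZON.com/item']
```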
36. Use Case 6: Find Pages That Contain External Links From Specific Sites
Optimize profitable pages = more money
I want to extract a list of all my affiliate URLs (fave.co).
1. Open Screaming Frog.
2. Enter the below formula into Configuration > Custom > Extraction (XPath is case-sensitive, so keep the element name and the href value lowercase):
//a[contains(@href,'fave.co')]/@href
3. Extract Inner HTML and crawl your website to find your URLs that contain fave.co.
38. Use Case 7: Find Your Content Fans for Outreach
Your fans = interested in YOU
I want to reach out to people who left comments on my site and let them know about a new piece of content.
Most users who comment on WordPress blogs enter their name and website.
39. Use Case 7: Find Your Content Fans for Outreach (continued)
If this is something you or your competitor has enabled, scrape the names and websites of the commenters to reach out and tell them about your content.
40. Use Case 8: Analyze Which of Your Content Performs Best
Finding valuable category types = content opportunity
I want to find which type of content gets the most organic clicks.
1. Pull the top 100 URLs from Google Search Console and paste into Screaming Frog.
2. Open a sample URL and find the location of your primary tag.
3. Copy the XPath (right-click > Inspect > Copy > Copy XPath).
4. Paste the formula into Screaming Frog custom extraction:
//*[@id="container-scroll"]/div/div[2]/div[1]/div[1]
41. Use Case 8: Analyze Which of Your Content Performs Best (continued)
5. Combine tag data with Google Search Console data via VLOOKUP and create a pivot table. Build a bar chart of clicks by tag.
42. XPATH OUTPUT
//h1  Extract all H1 tags
//h3[1]  Extract the first H3 tag
//h3[2]  Extract the second H3 tag
//div/p  Extract any <p> contained within a <div>
//div[@class='author']  Extract any <div> with class “author”
//p[@class='bio']  Extract any <p> with class “bio”
//*[@class='bio']  Extract any element with class “bio”
//ul/li[last()]  Extract the last <li> in a <ul>
//ol[@class='cat']/li[1]  Extract the first <li> in an <ol> with class “cat”
count(//h2)  Count the number of H2s (set the extraction filter to “Function Value”)
//a[contains(.,'click here')]  Extract any link with anchor text containing “click here”
//a[starts-with(@title,'Written by')]  Extract any link with a title starting with “Written by”
//@href  Extract all href attributes (use //a/@href for links only)
//a[starts-with(@href,'mailto')]/@href  Extract links that start with “mailto” (email addresses)
//meta[@property='article:published_time']/@content  Extract the article publish date (a commonly found meta tag on WordPress)
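A few rows of the cheat sheet can be tried directly with Python's standard-library parser, which supports a useful subset of XPath 1.0 (paths start with "." and functions like contains() or count() aren't available; the sample HTML is invented for illustration):

```python
import xml.etree.ElementTree as ET

html = (
    "<html><body>"
    '<ol class="cat"><li>Hotels</li><li>Flights</li><li>Cruises</li></ol>'
    '<div class="author"><p class="bio">Travel writer</p></div>'
    "</body></html>"
)
root = ET.fromstring(html)

# //ol[@class='cat']/li[1] - first <li> in an <ol> with class "cat"
first_cat = root.find(".//ol[@class='cat']/li[1]").text
# //ol[@class='cat']/li[last()] - last <li> in that list
last_cat = root.find(".//ol[@class='cat']/li[last()]").text
# //p[@class='bio'] - any <p> with class "bio"
bio = root.find(".//p[@class='bio']").text

print(first_cat, last_cat, bio)
```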
Welcome to Advanced SEO, competitive intelligence, modern web scraping, and more.
My name is Melissa Sciorra, and I’m currently the senior manager of SEO at SmarterTravel, a TripAdvisor company. We own and operate travel websites that reach nearly 200 million unique visitors each month. You may have heard of some of my sites, including Jetsetter.com, Airfarewatchdog.com, Oyster.com, and our newest site, whattopack.com.
Feel free to tweet at me using my handle, @mel_arroics, and use the hashtag #cmc2019.
I want to preface this talk with a disclaimer: I’ve been in SEO for almost 9 years, and I’m by no means a developer who is proficient in Python. We all know that in SEO, things can sometimes get a little repetitive, and I’ve discovered ways of fueling my research that help save time, automate processes, and provide competitive insights.
This topic gets very technical very quickly, so I’m going to try to break it down to a level where anyone can understand and use these functions to make custom extraction easy.
Quick poll:
How many people in this session work in SEO full time?
How many people in this session work in SEO part time?
How many people have used Screaming Frog?
How many people have never heard of Screaming Frog?
How many people have used XPath?
How many people have never heard of XPath?
Today, we are going to learn how SEOs can automate research processes to fuel their own competitive research and provide insights to content teams. We’re going to dive into web scraping technology as it stands today, and what XPath is. We’ll go over elements of webpages that can be extracted using real-life examples, and by the end of this session, you’ll have takeaways that you can start using at work to impress your boss, your colleagues, your friends, and maybe even your mother.
Let’s dive in.
We know that SEO in 2019 is still about creating really awesome content for our users.
This means you and your team must continuously come up with great ideas, or find great ideas from existing posts, search query reports, or competitive analysis and content gaps.
Content strategy begins with the ideation stage, and brainstorming topics can consist of aha moments, watching tv, things you are passionate about, and more.
You can also come up with ideas through web scraping. That is, scraping what your competitors are doing, and this starts at the type of content they are writing about.
What is Web Scraping?
A way of automating the process of gathering information from different sites on the internet.
The trick with web scraping is that you have to have a basic understanding of how a web page’s markup is laid out. This, plus an understanding of XPath, helps you extract data quickly and easily.
So what is XPath and how can it make my life easier?
XPath is a query language for selecting pieces of information in an XML document. It allows you to extract elements, attributes, and objects from the HTML in a webpage. It’s supported by most web browsers.
This means that any website, your own and your competitors’, can be scraped for the information you want based on commands you write in XPath.
Let’s see an example. For those of you who have used Screaming Frog before, we know that H1 and H2 tags are pulled automatically with every site crawl, but let’s say we also want to identify and analyze H3 tags.
I’d open the custom extraction field in Screaming Frog and enter the syntax for H3.
The two slashes mean: search the entire XML document and look for any element containing <h3>.
When I enter the syntax, I can find the extraction within the custom field in Screaming Frog.
But there is more to it than just copying and pasting expressions. If only it were that simple…
The internet is full of webpages, each built differently from the next. What they have in common is that they’re made of HTML, CSS, and JavaScript.
XPath can help automate the process of data collection, saving you time at your keyboard to work on more strategic goals.
Node-by-node navigation begins at the root node, a single slash. Two slashes search the whole document.
Use XPath to extract any HTML element of a webpage: information contained in a div, span, p, heading tag, or really any other HTML element.
The Screaming Frog SEO Spider is a website crawler that allows you to crawl websites’ URLs and fetch key elements to analyse and audit technical and on-site SEO.
Inspect and live-edit the HTML and CSS of a page using the Chrome DevTools Elements panel.
Google Chrome has a feature that makes writing XPath easier. Using the Inspect tool, you can right-click on any element and copy the XPath syntax. It’ll often be the case that you’ll need to modify what Chrome gives you before pasting the XPath into Screaming Frog, but it at least gets you started.
Scraper for Chrome is a simple and fast tool that allows you to identify and refine xpath expressions.
QA your xpath queries
Let’s start off with an easy example. Our content team came up with the idea to create a large piece of content explaining luggage policy by airline, after doing a few searches on Google and using SEMrush. As the SEO, I have to provide the content team with the highest-volume search terms so they can narrow down their list. I Google “list of American airlines” and find a Ranker.com page that lists all the airlines in America. I could copy and paste this list into Excel, but I would be left with a really messy spreadsheet that would take time to clean up. Instead, I right-click on an airline header and use my tool, “Scrape Similar”.
Right click > Scrape similar
From here, the XPath reference is /html/body/article/h2/div/a, but I remove my root-node info and include two slashes next to my h2 to find all H2s in the XML document. I can then export these into Excel, put together a CONCATENATE formula based off of popular luggage-policy terms, and upload them into Google AdWords to find average monthly search volume.
Let’s see another example. We know that having updated content not only makes Google happy, it also makes users happy. For example, I search for the best shows on Netflix and am presented with the position-1 and position-2 SERPs. One shows me it was updated in April and the other was updated in March. Which one do you think I’m going to click into?
You should make this a regular deliverable for your client or content team. Here’s how you do it. First, identify your top pages in Google Search Console and export. Open one of those pages in your browser and find the date on the page. Right-click and Inspect Element, which brings up the code in the browser’s developer tools. Right-click on the highlighted entry within the code, and Copy XPath. For example, on my Jetsetter.com URL for cool things to do in Denver, my XPath looks like this. To QA, I’m going to open XPath Helper and paste the XPath into it.
Analyze competitors’ recent post titles. Plug them into a text analysis tool to see what the posts are about.
We advise being very careful with this strategy. Remember, these people may have left a comment, but they didn’t opt into your email list. That could have been for a number of reasons, but chances are they were only really interested in this post. We, therefore, recommend using this strategy only to tell commenters about the updates to the post and/or other new posts that are similar. In other words, don’t email people about stuff they’re unlikely to care about!
…Use the Hunter.io add-on in Google Sheets to find emails.