It's all about getting ahead of the competition and winning the war on the web. Learn how to scrape your competitors' top-performing content and keywords, analyze the text with AI tools to find tone, style, and consistent themes, and apply that intelligence to develop a content strategy rooted in performance that will better appeal to your readers and fans and deliver results.
Attend this session to learn advanced optimization secrets:
• Key elements of a web page that can be extracted for research.
• Top discovery tools to quickly find optimized topics, titles, and tags.
• How to use XPath and the Screaming Frog web crawler to fuel research.
• New tools to analyze content and predict the Big Five characteristics.
• A sneak peek at some new tools for advanced search engine optimization.
3. Hello!
I am Melissa Sciorra
Sr Manager, SEO @ SmarterTravel (a TripAdvisor Company)
@mel_arroics
#cmc2019
FamilyVacationCritic.com | SmarterTravel.com | Jetsetter.com | WhatToPack.com | AirfareWatchdog.com | Oyster.com
4. SHOW OF HANDS
How many people in this session work on Search Engine Optimization?
How many people in this session have used Screaming Frog?
How many people in this session have used XPath?
5. Agenda
How SEOs and Content Teams can work together, SMARTER
Web Scraping technology & intro to XPath
Elements of webpages that can be EXTRACTED
REAL LIFE USE CASES so you can go into work next week and impress your boss
6. EVOLVE @mel_arroics #cmc2019
Content Strategy Workflow
Ideation Stage = the time to brainstorm topics.
Brainstorming topics can consist of:
- aha moments
- discovering topics through reading
- watching TV
- something you care about
Workflow: idea → writing & optimization → publishing
8. XPath
(short for XML Path Language)
A query language that describes a way to find and process items in XML (and HTML) documents.
It’s supported by modern web browsers.
In plain English: you can select any element, attribute, table, content of an element, or meta object in a webpage.
9. Let’s See an Example
“I want to find all <h3> tags in my blog post.”
Screaming Frog can extract <h1> and <h2> tags out of the box (up to two of each), but <h3> extraction doesn’t come standard; the crawler doesn’t pull more than two header-tag types by default.
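To make the idea concrete, here is a minimal Python sketch of what an //h3 extraction does, using only the standard library. The sample HTML is invented for illustration; real pages are rarely well-formed XML, so a production scrape would use a forgiving HTML parser.

```python
import xml.etree.ElementTree as ET

# Invented sample blog post for illustration.
html = """<html><body>
  <h1>Post Title</h1>
  <h3>First Subsection</h3>
  <p>Some text.</p>
  <h3>Second Subsection</h3>
</body></html>"""

root = ET.fromstring(html)
# ElementTree uses ".//" where browser XPath uses "//":
# select every <h3> anywhere in the document.
h3_texts = [el.text for el in root.findall(".//h3")]
print(h3_texts)  # ['First Subsection', 'Second Subsection']
```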
12. PSA: The internet is a collection of pages. LOTS of pages.
Every website is built differently from the next.
It’s all HTML, CSS, JavaScript, etc.
Some are built well. Some are not.
Inconsistency in coding can make data collection hard.
…XPath can help!
13. XPath: Location Paths
XPath expressions can begin at the root node (the document’s top-level element) with /
/ selects the entire document
/html/head selects the contents of the head element only
/html/head/title selects the contents of a title element
Node-by-node paths are important to understand for XPath, but not necessary to use:
//title selects a title element no matter where it is
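The location-path forms above can be checked with Python's standard-library parser (sample markup invented for illustration; ElementTree wants relative paths, so the leading slash is dropped and // becomes .//).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><head><title>My Page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# Node-by-node: /html/head/title (the root <html> is implicit here).
title_by_path = doc.find("head/title")
# Anywhere-in-the-document form, equivalent to //title.
title_anywhere = doc.find(".//title")

print(title_by_path.text, title_anywhere.text)  # both "My Page"
```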
14. What if you want to extract all of the links on a page?
A link is defined by <a href="www.website.com/example">anchor text</a>
Your XPath syntax should be //a/@href. This is because //@href would give you ALL href attributes, from any line of code, including references to JS, CSS, and so on.
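The difference between //@href and //a/@href shows up in a small stdlib sketch (invented markup; note that a stylesheet <link> also carries an href, which is exactly what //@href would sweep up).

```python
import xml.etree.ElementTree as ET

html = (
    "<html><head>"
    '<link rel="stylesheet" href="style.css"/>'
    "</head><body>"
    '<a href="https://www.example.com/one">One</a>'
    '<a href="https://www.example.com/two">Two</a>'
    "</body></html>"
)
root = ET.fromstring(html)

# //@href: every href attribute, including the CSS reference.
all_hrefs = [el.get("href") for el in root.iter() if el.get("href")]
# //a/@href: only hrefs that belong to <a> elements.
anchor_hrefs = [a.get("href") for a in root.findall(".//a")]

print(all_hrefs)     # includes 'style.css'
print(anchor_hrefs)  # only the two real links
```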
15. The Tools You Need
Screaming Frog
https://bit.ly/29AEs8Q
Google Chrome
http://bit.ly/2CqZqp7
Scraper for Chrome
http://bit.ly/2W6dbAT
XPath Helper
https://bit.ly/2n8gtTC
Make sure Developer Tools are enabled.
17. SCREAMING FROG XPATH EXTRACTION
• 10 fields allow you to insert XPath, CSSPath, or RegEx to search for and extract custom elements
• Includes a syntax validator
Extract HTML Element: the selected element and all of its inner HTML content.
Extract Inner HTML: the inner HTML content of the selected element; if the selected element contains other HTML elements, they’ll be included.
Extract Text: the text content of a selected element and the text content of any sub-elements.
Tip: You choose what you want to extract.
22. Use Case 1: Extract External Lists
“Airlines + Luggage Policy” = opportunity
1. I need to find all the airlines to create a keyword tree to provide to my content team.
2. I find a Ranker.com page that lists out all airlines, but if I copy + paste it into Excel it would be messy.
3. Right-click on an airline header > Scrape Similar.
26. Use Case 2: Extract Article Publish Date
1. Identify top pages in Google Search Console, export, and open up a page in your browser.
2. Find the date of your article and right-click > Inspect.
3. Right-click on the highlighted entry > Copy > Copy XPath. For example, on
https://www.jetsetter.com/magazine/cool-things-to-do-in-denver/
the copied XPath is:
//*[@id="container-scroll"]/div/div[2]/div[2]/div[1]/div/span[2]/time
4. Close the source code and open XPath Helper. Paste your copied XPath into “Query” and make sure it returns the date result.
5. Open Screaming Frog > Configuration > Custom > Extraction.
27. Use Case 2: Extract Article Publish Date (continued)
5. Open Screaming Frog > Configuration > Custom > Extraction.
6. Paste your XPath function and name it. Extract Inner HTML. Check for checkmark validation.
7. Paste your top URLs into Screaming Frog and crawl.
Find your extractions under the Custom tab > Extraction filter.
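When the date is also exposed in the commonly found WordPress article:published_time meta tag, the extraction boils down to one attribute lookup. A minimal stdlib sketch; the sample HTML and date value are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document carrying the WordPress-style meta tag.
html = (
    "<html><head>"
    '<meta property="article:published_time" content="2019-04-02T10:00:00Z"/>'
    "</head><body><p>Article body</p></body></html>"
)
root = ET.fromstring(html)

# Equivalent of //meta[@property='article:published_time']/@content
meta = root.find(".//meta[@property='article:published_time']")
published = meta.get("content")
print(published)  # 2019-04-02T10:00:00Z
```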
30. Use Case 3: Analyze Competitors’ Article Titles
Competitive analysis = find content gaps
I want to see the main themes of what they are writing about to begin my competitive analysis.
1. Run a crawl of the competitor’s website, or extract the highest-performing URLs from SEMrush and crawl those.
2. Download <h1> or title tags.
3. Paste into a text analyzer, like online-utility.org.
32. Use Case 4: Extract YouTube Video Titles and Tags
New video strategy = more visibility
I need to see where to start with my video SEO strategy.
1. Visit the YouTube channel and keep loading videos until you can’t load any more under Channel Videos.
2. Right-click on a video title and select Scrape Similar.
3. Export to Google Docs.
33. Use Case 4: Extract YouTube Video Titles and Tags (continued)
4. Add youtube.com onto all URLs through a CONCATENATE formula:
=CONCATENATE("https://www.youtube.com",B2)
5. Paste the full URLs into Screaming Frog.
6. Export the crawl into Excel to analyze title, meta description, and meta keywords.
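The CONCATENATE step is plain string concatenation; a hypothetical Python equivalent, where the relative paths are made-up placeholders for the scraped video URLs:

```python
# Scraped relative paths (placeholder values for illustration).
relative_paths = ["/watch?v=abc123", "/watch?v=def456"]

# Same as =CONCATENATE("https://www.youtube.com", B2) filled down a column.
full_urls = ["https://www.youtube.com" + path for path in relative_paths]
print(full_urls)
```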
34. Use Case 5: Find Pages With Specific Anchor Text
Extract certain on-page links = more opportunity
I want to see if any of my on-page link anchor text contains “Amazon”.
1. Open Screaming Frog.
2. Enter the below formula into Configuration > Custom > Extraction:
//a[contains(translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'),'amazon')]/@href
3. Replace ‘amazon’ with any other anchor text you want to search for. Extract Inner HTML.
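XPath 1.0 has no lower-case() function, which is why the expression pipes the anchor text through translate() before the contains() check. ElementTree's XPath subset doesn't support these string functions, so this sketch replicates the same logic in Python (invented sample links):

```python
import string
import xml.etree.ElementTree as ET

html = (
    "<html><body>"
    '<a href="https://www.AMAZON.com/item">Buy on Amazon</a>'
    '<a href="https://www.example.com">Other link</a>'
    "</body></html>"
)
root = ET.fromstring(html)

# translate(A-Z -> a-z), then contains(..., 'amazon'), applied to the
# anchor's text content just like the "." in the XPath expression.
lower = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
amazon_links = [
    a.get("href")
    for a in root.findall(".//a")
    if "amazon" in "".join(a.itertext()).translate(lower)
]
print(amazon_links)  # ['https://www.AMAZON.com/item']
```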
36. Use Case 6: Find Pages That Contain External Links From Specific Sites
Optimize profitable pages = more money
I want to extract a list of all my affiliate URLs (fave.co).
1. Open Screaming Frog.
2. Enter the below formula into Configuration > Custom > Extraction (XPath is case-sensitive, so keep the element name and the href value lowercase):
//a[contains(@href,'fave.co')]/@href
3. Extract Inner HTML and crawl your website to find your URLs that contain fave.co.
38. Use Case 7: Find Your Content Fans for Outreach
Your fans = interested in YOU
I want to reach out to people who left comments on my site and let them know about a new piece of content.
Most users who comment on WordPress blogs enter their name and website.
39. Use Case 7: Find Your Content Fans for Outreach (continued)
If this is something you or your competitor has enabled, scrape the names and websites of the commenters to reach out and tell them about your content.
40. Use Case 8: Analyze Which of Your Content Performs Best
Finding valuable category types = content opportunity
I want to find which type of content gets the most organic clicks.
1. Pull the top 100 URLs from Google Search Console and paste into Screaming Frog.
2. Open a sample URL and find the location of your primary tag.
3. Copy the XPath (right-click > Inspect > Copy > Copy XPath).
4. Paste the formula into Screaming Frog custom extraction:
//*[@id="container-scroll"]/div/div[2]/div[1]/div[1]
41. Use Case 8: Analyze Which of Your Content Performs Best (continued)
5. Combine tag data with Google Search Console data via VLOOKUP and create a pivot table. Build a bar chart of clicks by tag.
42. XPATH OUTPUT
//h1  Extract all H1 tags
//h3[1]  Extract the first H3 tag
//h3[2]  Extract the second H3 tag
//div/p  Extract any <p> contained within a <div>
//div[@class='author']  Extract any <div> with class “author”
//p[@class='bio']  Extract any <p> with class “bio”
//*[@class='bio']  Extract any element with class “bio”
//ul/li[last()]  Extract the last <li> in a <ul>
//ol[@class='cat']/li[1]  Extract the first <li> in an <ol> with class “cat”
count(//h2)  Count the number of H2s (set the extraction filter to “Function Value”)
//a[contains(.,'click here')]  Extract any link with anchor text containing “click here”
//a[starts-with(@title,'Written by')]  Extract any link with a title starting with “Written by”
//@href  Extract all href attributes (use //a/@href for links only)
//a[starts-with(@href,'mailto')]/@href  Extract links that start with “mailto” (email addresses)
//meta[@property='article:published_time']/@content  Extract the article publish date (a commonly found meta tag on WordPress)
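A few rows of the cheat sheet can be tried directly with Python's standard-library parser, which supports a useful subset of XPath 1.0 (paths start with "." and functions like contains() or count() aren't available; the sample HTML is invented for illustration):

```python
import xml.etree.ElementTree as ET

html = (
    "<html><body>"
    '<ol class="cat"><li>Hotels</li><li>Flights</li><li>Cruises</li></ol>'
    '<div class="author"><p class="bio">Travel writer</p></div>'
    "</body></html>"
)
root = ET.fromstring(html)

# //ol[@class='cat']/li[1] - first <li> in an <ol> with class "cat"
first_cat = root.find(".//ol[@class='cat']/li[1]").text
# //ol[@class='cat']/li[last()] - last <li> in that list
last_cat = root.find(".//ol[@class='cat']/li[last()]").text
# //p[@class='bio'] - any <p> with class "bio"
bio = root.find(".//p[@class='bio']").text

print(first_cat, last_cat, bio)
```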
Welcome to Advanced SEO, competitive intelligence, modern web scraping, and more.
My name is Melissa Sciorra, and I’m currently the senior manager of SEO at SmarterTravel, a TripAdvisor company. We own and operate travel websites that reach nearly 200 million unique visitors each month. You may have heard of some of my sites, including Jetsetter.com, Airfarewatchdog.com, Oyster.com, and our newest site, whattopack.com.
Feel free to tweet at me using my handle, @mel_arroics, and use the hashtag #cmc2019.
I want to preface this talk with a disclaimer: I’ve been in SEO for almost 9 years, and I’m by no means a developer who is proficient in Python. We all know that in SEO, things can sometimes get a little repetitive, and I’ve discovered ways of fueling my research that help save time, automate processes, and provide competitive insights.
This topic gets very technical very quickly, so I’m going to try to break it down to a level where anyone can understand and use these functions to make custom extraction easy.
Quick poll:
How many people in this session work in SEO full time?
How many people in this session work in SEO part time?
How many people have used Screaming Frog?
How many people have never heard of Screaming Frog?
How many people have used XPath?
How many people have never heard of XPath?
Today, we are going to learn how SEOs can automate research processes to fuel their own competitive research and provide insights to content teams. We’re going to dive into web scraping technology as it stands today, and what XPath is. We’ll go over elements of webpages that can be extracted using real-life examples, and by the end of this session, you’ll have takeaways that you can start using at work to impress your boss, your colleagues, your friends, and maybe even your mother.
Let’s dive in.
We know that SEO in 2019 is still about creating really awesome content for our users.
This means you and your team must continuously come up with great ideas, or find great ideas from existing posts, search query reports, or competitive analysis and content gaps.
Content strategy begins with the ideation stage, and brainstorming topics can consist of aha moments, watching tv, things you are passionate about, and more.
You can also come up with ideas through web scraping. That is, scraping what your competitors are doing, and this starts at the type of content they are writing about.
What is Web Scraping?
A way of automating the process of gathering information from different sites on the internet.
The trick with web scraping is that you have to have a basic understanding of how a web page’s markup is laid out. This, plus an understanding of XPath, helps you extract data quickly and easily.
So what is XPath and how can it make my life easier?
XPath is a query language for selecting pieces of information in an XML document. It allows you to extract elements, attributes, and objects from the HTML in a webpage. It’s supported by most web browsers.
This means that any website, your own and your competitors’, can be scraped for the information you want based on commands you write in XPath.
Let’s see an example. For those of you who have used Screaming Frog before, we know that H1 and H2 tags are pulled automatically with every site crawl, but let’s say we also want to identify and analyze H3 tags.
I’d open the custom extraction field in Screaming Frog and enter the syntax for H3.
The two slashes mean: search the entire XML document and look for any element containing <h3>.
When I enter the syntax, I can find the extraction within the custom field in Screaming Frog.
But there is more to it than just copying and pasting expressions. If only it were that simple…
The internet is full of webpages, each built differently from the next. What they have in common is that they’re made of HTML, CSS, and JavaScript.
XPath can help automate the process of data collection, saving you time at your keyboard to work on more strategic goals.
Node-by-node navigation begins at the root node, a single slash. Two slashes search the whole document.
Use XPath to extract any HTML element of a webpage: information contained in a div, span, p, heading tag, or really any other HTML element.
The Screaming Frog SEO Spider is a website crawler that allows you to crawl websites’ URLs and fetch key elements to analyse and audit technical and on-site SEO.
Inspect and live-edit the HTML and CSS of a page using the Chrome DevTools Elements panel.
Google Chrome has a feature that makes writing XPath easier. Using the Inspect tool, you can right-click on any element and copy the XPath syntax. It’ll often be the case that you’ll need to modify what Chrome gives you before pasting the XPath into Screaming Frog, but it at least gets you started.
Scraper for Chrome is a simple and fast tool that allows you to identify and refine xpath expressions.
QA your xpath queries
Let’s start off with an easy example. Our content team came up with the idea to create a large piece of content explaining luggage policy by airline, after doing a few searches on Google and using SEMrush. As the SEO, I have to provide the content team with the highest-volume search terms so they can narrow down their list. I Google “list of American airlines” and find a Ranker.com page that lists all the airlines in America. I could copy and paste this list into Excel, but I would be left with a really messy spreadsheet that would take time to clean up. Instead, I right-click on an airline header and use my tool, “Scrape Similar”.
Right click > Scrape similar
From here, the XPath reference is /html/body/article/h2/div/a, but I remove my root-node info and include two slashes next to my h2 to find all H2s in the XML document. I can then export these into Excel, put together a CONCATENATE formula based off of popular luggage-policy terms, and upload them into Google AdWords to find average monthly search volume.
Let’s see another example. We know that having updated content not only makes Google happy, it also makes users happy. For example, I search for the best shows on Netflix and am presented with the position-1 and position-2 SERPs. One shows me it was updated in April and the other was updated in March. Which one do you think I’m going to click into?
You should make this a regular deliverable for your client or content team. Here’s how you do it. First, identify your top pages in Google Search Console and export. Open one of those pages in your browser and find the date on the page. Right-click and Inspect Element, which brings up the code in the browser’s developer tools. Right-click on the highlighted entry within the code, and Copy XPath. For example, on my Jetsetter.com URL for cool things to do in Denver, my XPath looks like this. To QA, I’m going to open XPath Helper and paste the XPath into it.
Analyze competitors’ recent post titles. Plug them into a text analysis tool to see what the posts are about.
We advise being very careful with this strategy. Remember, these people may have left a comment, but they didn’t opt into your email list. That could have been for a number of reasons, but chances are they were only really interested in this post. We, therefore, recommend using this strategy only to tell commenters about the updates to the post and/or other new posts that are similar. In other words, don’t email people about stuff they’re unlikely to care about!
…Use the Hunter.io add-on in Google Sheets to find emails.