4. Crawl & Technical SEO Site Data
When crawling the site, we found:
• An uncrawlable number of on-site URLs, 3,445 of which were designated ‘core’ site URLs. Core addresses include only those which should be unique and search-accessible, with unique Meta data
• Because the site is so vast, we performed two crawls: our usual ‘inclusive’ crawl (which includes all addresses: URLs, PDFs, CSS, images etc.) and an additional ‘exclusive’ crawl
• Our exclusive crawl filtered out URLs which were spawning in excessive numbers, in order to get a better view of the core site URLs
• There was a lot to discover in terms of architectural issues with Example Site, so read on and we’ll divulge everything we have learned
• Our ‘inclusive’ crawl reached only an estimated 2% completion once it had crawled 45,000+ URLs and ‘discovered’ over 1.3 million! We estimate that the site may have over 5 million URLs, meaning that core site URLs make up only 0.06% of the site’s web pages
Site Overview
5. Pages by Address Type
As mentioned, we ran two crawls of the site. One was a 2%-completed ‘inclusive’ crawl which would never have fully resolved, even if we had left it running for over a week. The other was our ‘exclusive’ crawl, which excluded page-types that were over-numerous.
This pie-chart adds together the data from both our crawls. Because we cut the inclusive crawl short, it shows that core URLs make up 8% of the site. In actual fact this segment is likely closer to 0.06% in size (by our estimates) and, in realistic terms, wouldn’t even be visible on this pie chart!
By request: our data and workings (should you be interested) will be available to you in Excel format. Your developer may be able to use the information we provide to streamline your website. We would highly recommend this activity!
Pie-chart legend: Blocked in Robots.txt; NO INDEX = Blocked via Meta No-index
6. Core URLs: Page Titles
47% of page titles across core-site URLs are in ‘good’ health from a top-down perspective; this is moderate
51% of page titles are suffering from ‘bad length’ related issues, meaning that they are either too short or too long
We should re-write all afflicted Page Titles in order to improve our SERPs (search engine ranking positions) on Google, Bing and other leading search engines
7. Core URLs: Meta Descriptions
Less than 1% of descriptions across core-site URLs are in ‘good’ health from a top-down perspective; this is very poor
75% of descriptions are suffering from combined bad-length and internal-duplication issues. These descriptions are either too short or too long, as well as being duplicated internally
We should re-write all afflicted Descriptions in order to optimise the conversion rates of our current SERPs (search engine ranking positions)
8. Core URLs: H1s
14% of H1s across core-site URLs are in ‘good’ health from a top-down perspective; this is poor
85% of H1s are entirely missing
We should re-write all afflicted H1s in order to improve our SERPs (search engine ranking positions) on Google, Bing and other leading search engines
All pages missing H1s should have a unique, keyword-optimised heading written for them
10. Architectural Issues: Review Pages
• 14% of crawled URLs were made up of addresses which contain either “/referer/” or “/form_key/” in the URL. In terms of SEO, these URLs add nothing and should be discarded. That being said: these URLs look like they bind into some greater CMS / dev functionality of which we are not aware. We would suggest keeping these URLs for now, as they have already been blocked via robots.txt (and thus shouldn’t cause major problems)
• 7% of on-site addresses were made up of URLs containing “/review/product/” in the URL. We found 2,985 of these URLs. We crawled them all using XPath and found that only one out of 2,985 pages did not contain the on-page text: “Be the first to review this product”
• This surely shows that the downside of these product-review URLs (they cause site architecture to sprawl by creating new addresses) far outweighs their potential benefit (only ONE person has ever left a product review, and when you click to view it nothing happens, so the functionality is broken anyway). Merge the review-page functionality onto the actual product pages; don’t have separate URLs for these. Get rid of them (as separate addresses)!
• Example Site would be far better off getting their services reviewed than their products. What do people have to say about
packaging, shipping and customer service? Remember: tradesmen know what they need. You’re not selling iPods, one ‘bit’ produced by
the same manufacturer is much the same as another. That being the case – why would product reviews be successful on this type of
website? How is one screw or washer miles better than another? What can be better are the service-elements which Example Site
provides. Focus on that area for reviews
Issue #1 Action Point(s): Remove or integrate review pages with product URLs
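The XPath-style check described above can be reproduced with a short script. This is a sketch only: the function names and sample markup are ours, not taken from the live site, and a real run would first fetch each “/review/product/” URL before testing its HTML.

```python
from html.parser import HTMLParser

REVIEW_PROMPT = "Be the first to review this product"

class PromptFinder(HTMLParser):
    """Scans text nodes for the 'no reviews yet' prompt."""
    def __init__(self):
        super().__init__()
        self.found = False

    def handle_data(self, data):
        if REVIEW_PROMPT in data:
            self.found = True

def has_no_reviews(page_html: str) -> bool:
    """True if the page still shows the 'Be the first to review' prompt."""
    parser = PromptFinder()
    parser.feed(page_html)
    return parser.found

# Illustrative markup, not real site source:
sample = "<html><body><p>Be the first to review this product</p></body></html>"
print(has_no_reviews(sample))  # True -> the page has no customer reviews yet
```

Running a function like this over all 2,985 review URLs is how the “only one page has a review” figure above could be re-verified after any changes.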
11. Severe Architectural SEO Issues
• Looking at our combined crawl data, we found 14,165 unique URLs which included a ‘thickness’-based filter (e.g. /thickness-%28mm%29/ in the URL). These URLs represent 31% of all on-site addresses and have been Meta no-indexed. This stops potential content-duplication problems but causes crawl-allowance issues (as Google still has to crawl these pages to find the Meta no-index tag)
• We also found 5,457 colour-filtered URLs (e.g. /colour/light-blue-lime-green-burgundy-mustard-yellow-bronze-tint-bronze-dark-grey-light-green-silver-teal-white/ in the URL). These are particularly problematic as they extend URLs to unreasonable lengths, due to citing every chosen matching colour in the URL structure (completely unnecessary and causing SEO problems). These URLs have also been no-indexed, so they’re also causing crawl-allowance issues
• A further 12% of URLs contain on-page results sorting (e.g. /sort-by/position/ in the URL). This information really shouldn’t be present in the URL at all. Whereas people may search for green acrylic plastic, NO ONE is specifically searching for a URL which lists sheet plastic products sorted A-Z on the first page. A keyword query like “sheet plastic products only display A-Z” is likely to have no search volume; this data adds absolutely nothing for the searcher or end-user contextually. Because these URLs multiply the number of addresses on site, they do more harm than good. Again: these URLs utilise Meta no-index tags, meaning they will still affect crawl allowance on Example Site
Issue #2 Action Point(s): See action points on slide 13
12. Severe Architectural SEO Issues
• One problem caused by the URL segments referenced on the previous slide is that they are all Meta no-indexed rather than blocked via robots.txt
which means they are using up a lot of crawl allowance unnecessarily. Another issue is that there may be some examples of niche filter
combinations (e.g: blue acrylic of a particular thickness) which may be particularly popular. Because of the usage of Meta no-index, we can’t rank
for those long-tail queries now
• All URLs containing “/sort-by/” or “/show/” should have canonicals added, pointing to their unfiltered, product-based parent pages. They should then
also be blocked in robots.txt so that they do not impact crawl allowance at all
• Colour-filtering technology desperately needs to be streamlined so that only one colour can be chosen to view at once. A URL like examplesite.co.uk/products/acrylic-disc-circles/acrylic-discs/colour/bronze would be ok, but the current structure is producing URLs like: examplesite.co.uk/products/acrylic-disc-circles/acrylic-discs/colour/anti-reflective-ivory-bronze-tint-grey-tint-black-brown-clear-grey-light-green-lightblue-purple-red - this is insanity. Please PLEASE try to refine and streamline this functionality as a matter of urgency. Once this is sorted out, the Meta no-index tags can be lifted; this will give Example Site a greater long-tail SEO presence. Until the issue is fixed, add a robots.txt block for these URLs in addition to the Meta no-index tags (which do not help with crawl-allowance issues)
• With thickness-data-containing URLs, we have similar problems. Luckily these URLs aren’t becoming incredibly long, but they are sometimes combining with the colour-filtered URLs to create ridiculous addresses like: https://www.examplesite.co.uk/products/colour/bronze-clear-cream-grey-white/thickness-%28mm%29/3-5-30/show/all?SID=lm60dkklm6pkgrapfpe4o2j0k2 – this cannot be allowed to continue. The following rules will stop Google from crawling URLs where various parameters and filters combine:
• Disallow: /*/colour/*/thickness*
• Disallow: /*/show/
• Disallow: /*/sort-by/
Potentially remove the Meta no-index blocks from thickness filtered URLs
Issue #2 Action Point(s): See action points on following slide
13. Severe Architectural SEO Issues
• Action points:
• Add these rules to the Example Site robots.txt file:
• Disallow: /*/colour/*/thickness*
• Disallow: /*/show/*
• Disallow: /*/sort-by/*
• Disallow: /*/*/*/*/*/*/*
• For “/show/” and “/sort-by/” URLs, add canonical tags pointing to their product-based, unfiltered parent pages (e.g: examplesite.co.uk/products/acrylic-perspexequivalent-sheet/fluorescent-acrylic/show/all to examplesite.co.uk/products/acrylic-perspexequivalent-sheet/fluorescent-acrylic)
• Amend the URL structure of colour-filtered URLs so that users can only filter by one colour at once, thus stopping thousands of useless URLs spawning from selecting multiple colours simultaneously (which is resulting in crazy-long URL strings)
• Once the above is done, lift the Meta no-index tags from colour filtered URLs
• Depending on how all of that goes, we may also lift Meta no-index from thickness filtered URLs
• Once Example Site gives notification that this work is complete, we must conduct a search traffic flow review immediately
Issue #2 Action Point(s): Action points are in the content of this slide, not here
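To make the first action point above concrete, here is a sketch of how the suggested directives might sit in the Example Site robots.txt file (the existing live directives should be kept; the patterns are taken from the action points above):

```text
# Suggested additions to the Example Site robots.txt
User-agent: *
Disallow: /*/colour/*/thickness*
Disallow: /*/show/*
Disallow: /*/sort-by/*
Disallow: /*/*/*/*/*/*/*
```

Note that wildcard (*) support in Disallow rules is a convention honoured by Google and Bing rather than part of the original robots.txt standard, so the rules should be verified in Google Search Console’s robots.txt tester after deployment.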
14. Redundant URL Architecture
• Many on-site URLs contain entirely redundant URL architecture which is unnecessarily bloating URL strings (a problem for Google). Here’s an example:
• examplesite.co.uk/products/acrylic-disc-circles/colour-acrylic-discs-circles/colour/anti-reflective-ivory
• In red, you can see there is a superfluous “/products/” layer. This is something that can be dropped. Obviously your plastics are your products; we don’t think Google (or anyone else) needs this made explicit in the URL structure. “Acrylic Disc Circles” are obviously products. Removing this will help keywords push higher up the URL string, which is beneficial for SEO
• In purple you can see that acrylic disc circles are referenced twice, redundantly. They have already been mentioned earlier in the URL string! In green you have more redundancy with the word ‘colour’: if the colour is anti-reflective ivory (as specified later), obviously the discs are coloured, and this doesn’t need to be mentioned twice.
• Imagine how much better this URL string would be:
examplesite.co.uk/acrylic-disc-circles/colour/anti-reflective-ivory – the same amount of information portrayed, with fewer characters used. You can also see that important keywords are now much nearer the start of the URL string. Amend and minify your URL structure to see some quick SEO gains. Neglect this and matters will only get worse…
Issue #3 Action Point(s): Client to work with their dev to streamline URL architecture
15. False Status Code 200s
• 404 pages which falsely mask their status code as 3XX or 200 (OK) can be a problem for SEO.
• Luckily this is not a problem on
Example Site
• A custom 404 page with navigation
is used, but the correct status code
is returned
• The on-page content describes the
404 making it highly relevant
• No changes are needed
No Issue
16. Trailing Slash Canonicalisation Not Implemented
• Here are a pair of example URLs:
https://www.examplesite.co.uk/products
https://www.examplesite.co.uk/products/
• One has a trailing slash, the other does not. Both are linked internally (we know
because we discovered both URLs through the Screaming Frog crawler)
• There is no 301 redirect process in place to force the loading of the preferred
page version
• There are canonical tags in place; however, failing to rectify this issue will cause ‘data pollution’ in Google Analytics (you won’t be able to see the ‘consolidated’ visits for a single page easily, etc)
• Although content duplication isn’t a problem, crawl allowance remains an issue
(Google has to crawl duplicate pages to find the canonical tags)
Issue #4 Action Point(s): Client to liaise with dev to decide trailing slash consistency
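As a sketch of one way to implement the missing redirect (assuming an Apache server with mod_rewrite; which version to prefer, with or without the trailing slash, is for the client and dev to decide):

```apacheconf
# Hypothetical .htaccess rule: 301-redirect URLs without a trailing
# slash to the trailing-slash version. Reverse the logic if the
# non-slash version is chosen as the preferred one.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
```

On Nginx or a CMS-level routing layer the equivalent would be configured differently; the point is simply that one version should 301 to the other.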
17. Canonical Redirects are Incomplete
• One side of the canonical redirect equation (‘non-www’ to ‘www’) is implemented
correctly and is functioning as intended
• The other side of the equation (/index) is not in place:
• SEOP. recommends fixing this as soon as possible. All URLs which can be made
to render (status code 200 – OK) with “/index.php” appended - should be 301
redirected to their ‘non- /index’ parent
Issue #5
Note: canonical tags are in place to alleviate this issue, but it would still be best to fix this, to alleviate crawl allowance and data pollution issues
Action Point(s): Client to liaise with dev to finalise canonical redirects
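A sketch of the missing “/index.php” side of the equation, again assuming Apache with mod_rewrite (syntax would differ on other servers):

```apacheconf
# Hypothetical rule: 301-redirect any URL ending in /index.php
# to its 'non-/index' parent, as recommended above.
RewriteEngine On
RewriteRule ^(.*/)?index\.php$ /$1 [R=301,L]
```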
19. Issues with Meta Data, Titles & H1s
• Page Titles should be further optimised, with 51% suffering from bad-length issues. 47% of core addresses contain Page Titles with no surface-level issues
• Meta Descriptions should be further optimised, with 75% suffering from
combined bad-length and duplication. 15% of core-site Meta
descriptions are missing
• 14% of H1s are in good health. 85% of H1s are entirely missing
• We’d like to recommend unique, high quality Meta data (descriptions, titles
and H1s) for all core-site URLs
Issue #7 Action Point(s): Write out Meta sheet with fixed / new Meta data
20. • Although no single tag can turn a website’s SEO performance around by itself, the H1 is important for SEO:
• The H1 used on the homepage is a sentence, not a heading (you can tell by the punctuation and also by the non-title casing)
• The H1 does include keywords, but doesn’t ‘open’ with them (diluting their SEO potency)
• A better H1 would be “Example Site: Supplier of Acrylic, Perspex & Polycarbonate Sheets”
• The old H1 can still be included on-page, just put it underneath in styled <p> tags as a ‘strapline’. Just
make sure it does not use the H1 tag(s) as there should only be one H1 per page!
Mediocre H1 on Homepage
Issue #8 Action Point(s): SEOP. to suggest new homepage H1
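In markup terms, the recommended structure might look like this (the strapline text below is a placeholder, not the site’s actual sentence):

```html
<!-- One keyword-led H1 per page; the old sentence-style heading is
     demoted to a styled paragraph 'strapline' beneath it. -->
<h1>Example Site: Supplier of Acrylic, Perspex &amp; Polycarbonate Sheets</h1>
<p class="strapline">Plastics cut to size and delivered, whatever you’re making.</p>
```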
21. • Here is the Example Site homepage keyword cloud:
• Overall, the keyword cloud shows that the homepage is relatively well optimised. Quite a few words and terms which relate to relevant products (or super-categories) appear larger.
• No significant problems here, no action required.
Homepage Keyword Cloud
No Issue
22. • Here’s an example of in-source JavaScript from the homepage:
• It’s best to save JavaScript code into source-linked JavaScript (‘.js’) files. Some scripts on-site are already linked in correctly, but not all
• Saving JavaScript modules to external files means the browser can cache them after a user’s first visit; they don’t have to be re-downloaded for every new page the user visits. Doing this has the potential to positively impact page-load speeds
Issue #9
In-source JavaScript
Action Point(s): Client’s design / dev to look into this
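As a hedged illustration (the file name and script content here are ours, not taken from the homepage source):

```html
<!-- Instead of embedding behaviour in every page's source ... -->
<script>
  document.getElementById("menu-toggle").addEventListener("click", function () {
    document.getElementById("nav").classList.toggle("open");
  });
</script>

<!-- ... save it once to a cacheable external file and link it in: -->
<script src="/js/site.js" defer></script>
```

The external file is downloaded once and then served from the browser cache on subsequent pages, which is where the page-load benefit comes from.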
23. • Some documents on the Example Site website contain JavaScript and CSS
which could be minified to improve site performance:
• This basically means that the scripts and style-sheets in question could have
their code re-written so that they are smaller / shorter whilst retaining the
same functionality. Think of this as an analogue of writing down the simplest,
most ‘elegant’ version of a mathematical equation
• The result will be the same, but the time taken to ‘read’ the scripts will be
lessened (slightly improving page-speed metrics). Examples of non-minified
scripts and sheets will be supplied separately (To-Be-Minified.xlsx)
CSS and JavaScript Minification
Issue #10 Action Point(s): Client’s design / dev to look into this
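To illustrate what minification means in practice (the selector and values here are hypothetical, not taken from the site’s stylesheets):

```css
/* Un-minified rule as it might appear in a stylesheet ... */
.product-title {
    color: #333333;
    margin-top: 0px;
    margin-bottom: 16px;
}

/* ... and its minified equivalent: identical rendering, fewer bytes. */
.product-title{color:#333;margin-top:0;margin-bottom:16px}
```

In practice this is done by a build tool or CMS plugin rather than by hand.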
24. • The IP address of Example Site is not canonicalised to redirect to the base
domain:
• This could, in some rare circumstances, cause authority to leak (if other webmasters are linking to the IP address)
• The larger problem is that the IP results in a broken page which may affect site
health metrics
IP Address Canonicalisation
Issue #11 Action Point(s): Client to explore potential fix with dev
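A sketch of one possible fix, assuming an Apache server (203.0.113.10 below is a placeholder, not the site’s real IP address):

```apacheconf
# Hypothetical rule: 301-redirect requests made directly to the
# server's IP address across to the base domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^203\.0\.113\.10$
RewriteRule ^(.*)$ https://www.examplesite.co.uk/$1 [R=301,L]
```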
25. • The Example Site homepage is not making use of any structured data or schema:
• Some other pages do use schema {Example URLs}
• It may be a good idea to look over Google’s guide to structured data. There are several
qualifying content types which may benefit from richer markup
• Looking at the Example Site website, we can see that reviews might be a good starting
point. It’s a bit strange that review-based URLs aren’t using review schema
Microdata & Schema
Issue #12 Action Point(s): Client’s design / dev to look into this
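For instance, a product page carrying review data could declare it with JSON-LD along these lines (the product name and figures below are placeholders, not real Example Site data):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Clear Acrylic Sheet",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "18"
  }
}
</script>
```

Valid markup of this kind makes a page eligible (though not guaranteed) for rich results such as star ratings in Google’s listings.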
26. Web & SEO Security
Google loves secure sites
SEO Security
27. Issue #13
Server Signature & Libwww-perl Access
• The Example Site website has an active server signature and grants access to the
user-agent of Libwww-perl (this is bad!):
• Not all applications identified as Libwww-perl are malicious, but many are
• A common application for Libwww-perl is on bot-nets. First a bot-net will look for a vulnerability on your website; then it will embed itself, using up your storage space and (more importantly) server processing power. This should be fixed ASAP!
Action Point(s): Client to liaise with dev to secure the website
Note: Server signature to be switched off,
Libwww-perl access to be disabled
28. HTTPS Set as Default
• HTTPS is the default method for accessing pages on Example Site
• Google is pushing all webmasters to make the switch to HTTPS; this came about as a result of Edward Snowden’s revelations on state-sponsored spying
• HTTPS encrypts connections and data for users as they interact with the website,
providing greater overall security
• Google will be impressed by Example Site’s commitment to security and end-user data encryption
No Issue
30. Google Page-Speed Insights Read Out
Neither the desktop nor the mobile score is positive; both are relatively poor.
Google gives some handy hints on what to fix – your developer(s) may be able to tackle these
Issue #14 Action Point(s): Design / Dev to look into this
31. Google Mobile Friendly Tool Read Out
Example Site has a mobile friendly site design. This also means the site will gain some slight
ranking bonuses on mobile and tablet search.
No Issue
32. Bing Mobile Friendly Tool Read Out
Microsoft’s tool (Bing branded) agrees that Example Site has a mobile friendly site design.
This also means the site will gain some slight ranking bonuses on Bing’s mobile search.
No Issue
33. Google Mobile Web-Speed Tester Read Out
Google’s Mobile Website Speed Tester is a new tool aiming to combine some data from Page-Speed Insights
and the old Mobile Friendly tool. The insights are supposed to be simpler, clearer and more accurate
As referenced by the read-outs from the previous two tools, mobile UX is doing really well whilst page-load speed is suffering; this is especially true on mobile:
Issue #15 Action Point(s): See action point(s) from slide 30
34. The homepage makes over 20 HTTP requests in order to locate, return and load all its appended resources.
This is something a designer / developer may be able to help you look into in greater detail.
Whilst the tool we use to check this cites 20 HTTP requests as too many, we actually feel that up to 50 (maybe even 60 at a push) is acceptable
That being said, the homepage is loading a
total of 75 objects which does seem a little bit
over the top
Consult a designer to see if ‘sprite-sheets’ could
be used to cut down on image requests. A
developer may be able to combine scripts to
reduce the number being pulled into each page
Issue #16
Excessive HTTP Requests
Action Point(s): Design / Dev to look into this
Note: Excessive HTTP requests can cause problems for a
site’s page-load speed. Loading the same amount of data from
more fragmented sources takes longer, as processing power must
be expended switching between different ‘objects’
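As a quick sketch of the sprite-sheet idea (file name and offsets below are hypothetical):

```css
/* Many small icons combined into one image (one HTTP request),
   with each icon displayed via its background-position offset. */
.icon {
  display: inline-block;
  width: 32px;
  height: 32px;
  background-image: url("/images/sprites.png");
}
.icon-cart   { background-position: 0 0; }
.icon-search { background-position: -32px 0; }
```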
36. Are the Indexation Files Optimised?
Example Site is using a Sitemap.xml file, which helps Google to index the site and find new
content more quickly. We checked the Sitemap file for broken links and found none:
The Robots.txt file may need some amendments; however, these will be suggested in line with any architectural recommendations…
One important note: the sitemap link in the robots.txt is wrong. It should point to:
https://www.examplesite.co.uk/sitemaps/sitemap.xml
Action Point(s): Sitemap link in robots.txt to be fixed!
Issue #17
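The one-line fix, for clarity (the corrected path is taken from the note above):

```text
# Corrected Sitemap directive in robots.txt:
Sitemap: https://www.examplesite.co.uk/sitemaps/sitemap.xml
```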