You have a shiny new site and your brand is looking for a fresh start with its offering. It may be one of many past migrations, protocol switches and redirections you've undertaken over the years. But then you find that things didn't quite go as you expected: you never really got back to where you wanted to be in organic search. Part of this is because 'Gone is never Gone'. Every URL that was ever known on your site is listed in the history logs of Google's search engine system, and those history logs are used to determine how much crawling time your site is apportioned. You inherited technical SEO debt and generational cruft, which blur Google's understanding of which URL is the target for a particular term. This is particularly common when you migrate from one ecommerce platform to another, because the crawling rules developed for your old site no longer apply but remain in the history and in the crawl patterns already discovered.
2. @dawnieando #BigDigitalADL
A New Beginning
§ “A new website will solve ALL our problems”
§ “Let’s start again”
§ “We’ll just migrate… and redirect everything”
Web Crawler System
GOOGLE NEVER FORGETS
The history logs play a role in deciding when every URL that was EVER discovered gets visited again
History Log Records Include:
• URL fingerprint
• Timestamp (last crawl or download attempt)
• Crawl status (success or error) (response code)
• Content checksum (binary code)
• Source ID (accessed from cache or downloaded)
• Segment identifier (the crawl segment the URL is assigned to)
• Page importance (a measure of importance assigned to the URL)
May be calculated by identifying historical importance scores based on the past X number of crawls
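To make the list above concrete, here is a minimal Python sketch of what one history log record might hold. The field names, the SHA-1 based fingerprint and the "average of the last N importance scores" formula are assumptions for illustration only; Google's actual record format and scoring are not public.

```python
import hashlib
from dataclasses import dataclass, field
from statistics import mean


def fingerprint(text: str) -> str:
    """Short, stable hash used here for both URL fingerprints and content checksums (assumed scheme)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]


@dataclass
class HistoryLogRecord:
    """Hypothetical record mirroring the fields listed on the slide."""
    url_fingerprint: str      # fingerprint of the URL
    timestamp: float          # last crawl or download attempt (Unix time)
    status_code: int          # HTTP response code of that attempt
    content_checksum: str     # checksum of the downloaded content
    source_id: str            # "cache" or "download"
    segment_id: int           # crawl segment the URL is assigned to
    past_importance: list[float] = field(default_factory=list)

    @property
    def crawl_succeeded(self) -> bool:
        # Treat any 2xx response as a successful crawl.
        return 200 <= self.status_code < 300

    def importance(self, last_n: int = 5) -> float:
        # Assumed formula: mean of the last `last_n` historical importance scores.
        recent = self.past_importance[-last_n:]
        return mean(recent) if recent else 0.0


record = HistoryLogRecord(
    url_fingerprint=fingerprint("https://www.example.com/old-category/"),
    timestamp=1_500_000_000.0,
    status_code=301,
    content_checksum=fingerprint("<html>...</html>"),
    source_id="download",
    segment_id=3,
    past_importance=[0.9, 0.8, 0.7, 0.4, 0.3],
)
print(record.crawl_succeeded, round(record.importance(last_n=3), 2))  # prints: False 0.47
```

The point of the sketch is that an old, redirected URL still has a record with its own history of importance, so it keeps competing for revisits long after the migration.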
Gone Is Never Gone
“We knew there was content there at some point so we just swing by every now and then to see if anything came back”
(John Mueller, 2016)
The Generational ‘Snail Trail’
• Old XML sitemaps
• Redirects dropping away from the old site’s .htaccess
• DNS issues
• People linking to the old site using the wrong protocol
• Old sites not verified in GSC
• Not all protocols redirecting
Leaving its slithery footprint
The Slow Page Evolution of Near Duplicates
In a study over 11 weeks, Dennis Fetterly and Mark Najork found that near-duplicate pages rarely change and that they are still near-duplicates of each other 10 weeks later.
Therefore, once near-duplicates are identified, their download priority may be reduced so that crawling resources can be used more efficiently and productively elsewhere (Fetterly & Najork, 2003)
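The near-duplicate idea can be illustrated with word shingling and Jaccard similarity, a standard technique for this task. This is not a reproduction of Fetterly and Najork's exact method: the shingle size `k=4`, the 0.8 similarity threshold and the `download_priority` rule that halves priority for known near-duplicates are all illustrative assumptions.

```python
def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences (shingles) in a document."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles the two documents have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8, k: int = 4) -> bool:
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


def download_priority(base: float, near_duplicate: bool, penalty: float = 0.5) -> float:
    """Hypothetical rule: lower the priority of a page already known to be a near-duplicate."""
    return base * penalty if near_duplicate else base


page_a = "red running shoes for women size guide free delivery and returns on all orders"
page_b = "red running shoes for women size guide free delivery and returns on all orders today"
page_c = "blue walking boots for men waterproof leather"

print(is_near_duplicate(page_a, page_b), is_near_duplicate(page_a, page_c))
print(download_priority(1.0, is_near_duplicate(page_a, page_b)))
```

Because near-duplicate clusters tend to stay stable, a crawler that has paid once to identify them can keep the lower priority for many subsequent crawl cycles.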
CRAWLING RULES BUILT OVER TIME
Crawl Frequency Patterns
No two sites will have the same crawl schedules or the same rules built for them
Moving from one CMS to another may mean that different parameters are created.
New parameters = new rules
YOU INHERITED SEO TECHNICAL DEBT
• Previous content / link manual actions
• Previous algorithmic suppressions
• Past infinite loops
• “We’ll SEO it after launch”
• “SEO is dead… so we won’t optimise”
• Dodgy URL parameters
• Misconfigured URL parameters
• Old URL crawling ‘rules / hints’
GENERATIONAL CRUFT
EVERY SINGLE TIME YOU MIGRATE, CHANGE DESIGN, REDIRECT OR REINVENT A SITE / URL:
• FIRST SITE STRUCTURE: A CLEAN START, EVERYTHING IS ‘200 OK’, CRAWLING ‘RULES’ BUILT
• ANOTHER STRUCTURE: REDIRECTIONS, MORE URLs, MIXED RESPONSE CODES, NEW CRAWLING ‘RULES’ BUILT, ‘FUZZINESS’ IS EMERGING
• MORE REDIRECTIONS: MORE URLs, REDIRECT CHAINS & MIXED RESPONSE CODES, NEW CRAWLING ‘RULES’ BUILT, NEW SEOs DON’T KNOW THE ‘HISTORY’, TARGET URLs NOW ‘VERY FUZZY’
Time Seems To Fly… The Older You Get
Your new site URL is just one of very many historical URLs on your IP to be visited periodically
A tiny fish in a very big URL pond (queue)
SOLUTION - THE BELOVED CANONICAL
§ 30X redirects
§ Canonical tag
§ Hreflang
§ HTTPS protocol
§ Global canonicalization rules
In ‘ALL’ its forms
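Global canonicalization rules boil down to mapping every variant of a URL onto one preferred form. The Python sketch below applies two common site-wide rules (force HTTPS, use one host) and builds the matching canonical tag. The choice of the `www.` host is an assumption; a real site would enforce the same mapping with 301 redirects as well as the tag.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: this site's chosen canonical host is the "www." version.
PREFERRED_HOST_PREFIX = "www."


def canonical_url(url: str) -> str:
    """Force HTTPS, lowercase the host, add the preferred www. prefix and drop the fragment."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # .hostname also drops any port or credentials
    if not host.startswith(PREFERRED_HOST_PREFIX):
        host = PREFERRED_HOST_PREFIX + host
    path = parts.path or "/"
    return urlunsplit(("https", host, path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the rel=canonical element the page would carry in its <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


print(canonical_url("http://Example.com"))
print(canonical_link_tag("HTTP://www.example.com/shoes/?colour=red#reviews"))
```

The sketch deliberately leaves trailing-slash and parameter policy alone, since those rules differ from site to site.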
Oh Yeah – Canonicalization is Easy
76% of SEOs surveyed considered:
“CANONICALIZATION IS AN EASY CONCEPT TO UNDERSTAND”
REL=NEXT / REL=PREV is NOT Canonicalization
47% of SEOs categorizing themselves as ‘TECHNICAL SEOs’ considered:
“REL=NEXT / REL=PREV” IS A FORM OF CANONICALIZATION
On Hreflang as Canonicalization
Only 64% of ‘Technical SEOs’ thought hreflang was a form of canonicalization
IT IS
URL Parameter Handling is Your Friend
Help Google build ‘crawling rules’ for your site, rather than wasting time on ‘sampling’ and giving a bad impression
GIVE HELP AND GUIDANCE WITH THE CRAWL RULE AND HINT BUILDING
SOLUTION - Understand URL Parameters
ACTIVE PARAMETERS == CHANGE THE CONTENT ON YOUR PAGE (e.g. sort, filter, translate, paginate, specify)
PASSIVE PARAMETERS == DO NOT CHANGE THE CONTENT ON YOUR PAGE (often used for tracking) (ALIAS: REPRESENTATIVE)
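Because passive parameters do not change the content, every URL that differs only in passive parameters can be collapsed onto one representative URL. A minimal Python sketch, assuming a hypothetical `PASSIVE_PARAMS` list for an example shop (every site needs its own inventory) and assuming that parameter order never changes the content on this site:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical classification for an example shop; audit your own parameters first.
PASSIVE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def representative_url(url: str) -> str:
    """Drop passive (tracking) parameters and keep active ones in a stable order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in PASSIVE_PARAMS
    ]
    kept.sort()  # assumed safe here: parameter order does not change the page
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(representative_url("https://www.example.com/shoes?utm_source=newsletter&sort=price&page=2"))
print(representative_url("https://www.example.com/shoes?page=2&sort=price&sessionid=abc"))
print(representative_url("https://www.example.com/shoes?utm_source=x&ref=y"))
```

Both of the first two calls return the same URL, which is exactly the collapse a crawler (or a canonical tag) should reflect; active parameters such as `sort` and `page` survive because they do change the content.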
ACTIVE Parameters (CHANGE CONTENT)
SORT == Sorts dynamic items, reordering them by price, popularity or date added, in ascending or descending order
NARROWING == Filters dynamically added items down to include only the features & attributes in a chosen consideration set
SPECIFYING == Identifies a particular dynamically populated content set within a site section (e.g. store=women)
TRANSLATING == Indicates a language-driven translation URL (e.g. lang=fr)
PAGINATING == Indicates a paginated display of long content (e.g. page=2)
Examples of Multiple Parameter Handling
KNOW THE RULES
http://www.example.com?shopping-category=DVD-movies&sort-by=production-year&sort-order=asc WILL BE CRAWLED
http://www.example.com?shopping-category=shoes&sort-by=size&sort-order=asc WILL NOT BE CRAWLED (the sort-by=production-year rule blocks it)
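The two examples follow one principle: a URL is crawled only if every parameter in it satisfies its rule, so a single restrictive parameter is enough to block it. The Python sketch below models that principle. The `RULES` table is an assumption reconstructed from the examples (sort-by limited to production-year, the other parameters unrestricted), and the sketch is a simplified model, not Google's implementation.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rules: None means "any value is fine"; a set restricts crawling
# to URLs where the parameter takes one of the listed values.
RULES = {
    "shopping-category": None,
    "sort-by": {"production-year"},
    "sort-order": None,
}


def will_be_crawled(url: str, rules: dict = RULES) -> bool:
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        allowed = rules.get(name)  # parameters without a rule are treated as unrestricted
        if allowed is not None and value not in allowed:
            return False  # one restrictive parameter blocks the whole URL
    return True


print(will_be_crawled("http://www.example.com?shopping-category=DVD-movies&sort-by=production-year&sort-order=asc"))
print(will_be_crawled("http://www.example.com?shopping-category=shoes&sort-by=size&sort-order=asc"))
```

The first call returns `True` and the second `False`, matching the slide.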
Help Googlebot Get Round its Shopping List
OPEN MORE CHECKOUTS
WIDEN THE AISLES
MAKE THINGS EASY TO FIND
DON’T CONFUSE GOOGLEBOT
HELP FILL THE TROLLEY QUICKLY
SPEED, SPEED, SPEED
SOLUTION - XML Sitemaps Are Your Friend… (Strong Foundations)
They help to pass ‘importance’ signals within a site
But… never leave them to just autogenerate without periodically checking
‘The foundations’ underneath a site
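A periodic check of an autogenerated sitemap can be partly automated. The Python sketch below reads a sitemap in the standard sitemaps.org namespace and flags entries that are not HTTPS, carry parameters, or are not on your list of target URLs. The `TARGETS` set and the sample sitemap are hypothetical; in practice the target list would come from your own site inventory.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def audit_sitemap(xml_text: str, target_urls: set[str]) -> dict[str, list[str]]:
    problems = {"not_https": [], "has_parameters": [], "not_a_target": []}
    for url in sitemap_urls(xml_text):
        if not url.startswith("https://"):
            problems["not_https"].append(url)
        if "?" in url:
            problems["has_parameters"].append(url)
        if url not in target_urls:
            problems["not_a_target"].append(url)
    return problems


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/shoes/</loc></url>
  <url><loc>http://www.example.com/old-shoes/</loc></url>
  <url><loc>https://www.example.com/shoes/?sort=price</loc></url>
</urlset>"""
TARGETS = {"https://www.example.com/shoes/"}  # hypothetical list of target URLs

for issue, urls in audit_sitemap(SAMPLE_SITEMAP, TARGETS).items():
    print(issue, urls)
```

Any URL that shows up under `not_a_target` is sending an ‘importance’ signal to something you do not want to rank.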
Validate & Retain in GSC ALL Past Domains & Past Site Versions (Protocols: HTTPS / HTTP)
THERE MAY STILL BE UNDETECTED ACTIVITY GOING ON THERE
Server Log File Analysis is Your Friend…
You’ll be surprised by what you find
Find out what Googlebot is visiting, when (how often), and whether it should be visiting it at all
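A first pass at log file analysis is simply counting which URLs Googlebot requests and what status codes it gets back. The Python sketch below assumes the Apache/Nginx "combined" log format and uses made-up sample lines. It matches on the user-agent string only, which can be spoofed; Google documents a reverse DNS lookup for verifying genuine Googlebot, which this sketch does not perform.

```python
import re
from collections import Counter, defaultdict

# Assumed "combined" log format; adjust the pattern to your server's configuration.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_report(lines):
    """Count Googlebot hits per path and the status codes returned for each."""
    hits = Counter()
    statuses = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # unparseable line or not (claiming to be) Googlebot
        path = match.group("path")
        hits[path] += 1
        statuses[path][int(match.group("status"))] += 1
    return hits, statuses


GBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Oct/2017:13:55:36 +0000] "GET /old-category/ HTTP/1.1" 301 0 "-" "{GBOT}"',
    f'66.249.66.1 - - [10/Oct/2017:14:02:11 +0000] "GET /old-category/ HTTP/1.1" 301 0 "-" "{GBOT}"',
    '203.0.113.5 - - [10/Oct/2017:14:05:00 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    f'66.249.66.1 - - [10/Oct/2017:14:10:42 +0000] "GET /shoes/?sessionid=abc HTTP/1.1" 200 5120 "-" "{GBOT}"',
]

hits, statuses = googlebot_report(SAMPLE_LOG)
for path, count in hits.most_common():
    print(path, count, dict(statuses[path]))
```

In this made-up sample Googlebot keeps returning to a redirected old category URL and to a session-ID variant, which are exactly the kinds of visits worth questioning.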
SOLUTION - Save & Grow The URLs
Not EVERYTHING is worthy of its own URL
VARIANTS
STEMMINGS
PLURALS
RANDOM TAGS
LONG, LONG, LONG TAIL
PARAMETERS
Quickly Increase the ‘Importance’ of Target URLs
• Internal link optimization
• Canonicalise to them (if relevant)
• Strengthen importance signals
• Inclusion in front-facing and XML sitemaps
• Improve the content & keep it updated
• 301 redirect relevant redundant content to them
Quickly Reduce the ‘Importance’ of Old URLs
• Internal link unoptimization
• 410 (Gone) status codes
• Dig out URLs with links to them
• Orphan URLs
• Canonicals to HTTPS
• Exclusion from XML sitemaps (even old ones still on the server)
• Strip out content
REVISIT PAST .HTACCESS FILES
Can you rewrite the rules to be more efficient, or cut out some old rules still firing unnecessarily?
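One way to make old redirect rules more efficient is to flatten redirect chains so every old URL points straight at its final destination in a single hop. A minimal Python sketch, assuming the existing rules have already been exported as a simple old-path to new-path mapping (the paths here are hypothetical). It raises an error on loops or over-long chains rather than guessing.

```python
def flatten_redirects(redirects: dict[str, str], max_hops: int = 10) -> dict[str, str]:
    """Point every source straight at its final destination; raise on loops."""
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        hops = 1
        while target in redirects:  # the target itself redirects: follow the chain
            if target in seen or hops >= max_hops:
                raise ValueError(f"redirect loop or over-long chain starting at {source}")
            seen.add(target)
            target = redirects[target]
            hops += 1
        flat[source] = target
    return flat


def to_apache_redirects(flat: dict[str, str]) -> list[str]:
    """Render one mod_alias Redirect line per flattened rule."""
    return [f"Redirect 301 {src} {dst}" for src, dst in sorted(flat.items())]


# Hypothetical chain left behind by three generations of site structure.
OLD_RULES = {
    "/old-category/": "/category/",
    "/category/": "/shop/category/",
    "/shop/category/": "/shop/womens-shoes/",
}

for line in to_apache_redirects(flatten_redirects(OLD_RULES)):
    print(line)
```

Each old path now resolves to `/shop/womens-shoes/` in one hop. Note that mod_alias `Redirect` matches on URL-path prefixes, so check the generated lines against any broader rules before deploying them.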
You Have a Shiny New Site… So What?
You may still have… GENERATIONAL CRUFT & TECHNICAL DEBT TO PAY OFF
GONE IS NEVER GONE