Google is using Large Language Models and Machine Learning in the algorithms that rank your sites and show them to users.
This talk will help you better understand these systems, from BERT to Rank Brain to Neural Matching and SGE: how they work and what you should do about them.
2. @schachin
Kristine Schachinger
• Started as a front-end dev & designer
Claim to Fame – Designed Reba McEntire’s site
• Started in SEO 2005
• Consultant 2009 – Present
• Some sites I have worked with:
GoodRx, Vice Media, Zappos, Instacart, Healthline, Jack in the Box, Discover,
USA.gov, Salon.com, Paychex.com, AndroidHeadlines.com, Patch Media, etc.
• Judge: US Search Awards, UK Search Awards, EU Search Awards
and since I said yes to all the Search Awards during the pandemic, there might be more.
• Specialties: Site Auditing, Site Recoveries, Technical SEO, and all the rest.
• Articles in: WIX SEO, Search Engine Journal, Marketing Land, Search Engine Land,
and Search Engine Watch -- among others.
• Speaker: BrightonSEO San Diego, iGaming, Affiliate Summit West, BarbadosSEO,
Ungagged UK/US, State of Search, SearchLeeds, Pubcon, SMX, RIMC, SXSWi -- and others.
9. @schachin
In ONE SECOND today, there were…
[live counter of Google searches per second]
http://www.internetlivestats.com/google-search-statistics/
15. @schachin
Google Myth: AI, machine learning, & deep learning are all the same thing
While artificial intelligence (AI) is a convenient and commonplace term, it has no widely agreed-upon technical definition. One helpful way to think about AI is as the science of making things smart. Much of the recent progress we've seen in AI is based on machine learning (ML), a subfield of AI where computers learn and recognize patterns from examples, rather than being programmed with specific rules. There are many different ML techniques, but deep learning is a particularly popular one right now. Deep learning is based on neural network technology, an algorithm whose architecture is inspired by the human brain and can learn to recognize pretty complex patterns, such as what "hugs" are or what a "party" looks like.
https://ai.google/static/documents/exploring-6-myths.pdf
16. @schachin
Google Myth: AI is approaching human intelligence
"While AI systems are nearing or outperforming human beings at increasingly complex tasks like generating musical melodies or playing the game of Go, they remain narrow and brittle, and lack true agency or creativity."
https://ai.google/static/documents/exploring-6-myths.pdf
17. @schachin
THERE ARE THREE PLACES GOOGLE APPLIES MACHINE LEARNING
IN THE ORGANIC SEARCH ENGINE.
+ PRE-SCORING
LANGUAGE MODELS
+ AD HOC POST-SCORING
RANK BRAIN
NEURAL MATCHING
+ LIVE RANKING FACTORS
HELPFUL CONTENT UPDATE
THE BIG DADDIES! SGE AND MUM ARE IN A CLASS BY THEMSELVES.
19. @schachin
In the beginning there was…
Word2Vec, the word embedding model.
Semantic Search.
https://www.tensorflow.org/tutorials/representation/word2vec
20. @schachin
Word Embedding
Vector space models (VSMs) represent (embed) words in a continuous vector space where semantically similar words are mapped to nearby points ('are embedded nearby each other').
Word2Vec
https://www.tensorflow.org/tutorials/representation/word2vec
23. @schachin
• Words go in.
• Words get assigned a mathematical address in a vector.
• Similar and related words sit close to each other in the vector space.
• Words are retrieved by matching your query against the words in the "best fit" (nearest) vectors.
• These word "interpretations" are used to return results.
The beginning of Semantic Search.
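To make the "mathematical address" idea concrete, here is a minimal sketch of nearest-neighbor lookup in a vector space. This is not Google's system; the tiny 3-dimensional vectors are invented for illustration (real Word2Vec embeddings have hundreds of dimensions learned from text):

```python
import numpy as np

# Toy 3-d embeddings -- the numbers are made up purely for illustration.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.75, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
    "fruit": np.array([0.15, 0.25, 0.85]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 = same direction, 0.0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(word, k=2):
    # Rank every other word by similarity to the query word.
    q = embeddings[word]
    scores = {w: cosine(q, v) for w, v in embeddings.items() if w != word}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

print(nearest("king"))   # "queen" ranks first: it sits nearby in the space
print(nearest("apple"))  # "fruit" ranks first
```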
27. @schachin
Sesame Street and Search
What is BERT?
A Natural Language Processing pre-training technique called Bidirectional Encoder Representations from Transformers, or BERT.
Moving from NLP into early NLU.
28. @schachin
Google
https://searchengineland.com/how-google-uses-artificial-intelligence-in-google-search-379746
BERT. "BERT, Bidirectional Encoder Representations from Transformers, came in 2019. It is a neural
network-based technique for natural language processing pre-training. It looks at the sequence of words
on a page, so even seemingly unimportant words in your queries are accounted for in the result."
• Year Launched: 2019
• Used For Ranking: No
• Looks at the query and content language
• All languages
• Language training model: used in pre-scoring
• Very commonly used for many queries
• Can you optimize for it? No
30. @schachin
Sesame Street and Search: BERT Definition
https://bensen.ai/elmo-meet-bert-recent-advances-in-natural-language-embeddings/
BERT, or Bidirectional Encoder Representations from Transformers, improves upon
standard Transformers by removing the unidirectionality constraint by using a masked language
model (MLM) pre-training objective. The masked language model randomly masks some of the tokens
from the input, and the objective is to predict the original vocabulary id of the masked word based only
on its context. Unlike left-to-right language model pre-training, the MLM objective enables the
representation to fuse the left and the right context, which allows us to pre-train a deep bidirectional
Transformer. In addition to the masked language model, BERT uses a next sentence prediction task
that jointly pre-trains text-pair representations.
There are two steps in BERT: pre-training and fine-tuning. During pre-training, the model is trained on
unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with
the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the
downstream tasks. Each downstream task has separate fine-tuned models, even though they are
initialized with the same pre-trained parameters.
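A quick way to see masked-token prediction in action is the Hugging Face transformers library (an assumption for illustration; this public bert-base-uncased checkpoint is not Google Search's internal model):

```python
from transformers import pipeline

# Load a public pre-trained BERT checkpoint with a fill-mask head.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses BOTH the left context ("The man went to the") and the right
# context ("to buy milk") to predict the masked token.
for prediction in unmasker("The man went to the [MASK] to buy milk."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Top predictions are words like "store" or "market".
```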
31. @schachin
LLM Transformers are Bidirectional
LLMs can go forward and backward to predict an unknown (masked) term and/or sentence.
They also use root words, so "play" covers player/playing/played (see the stemming sketch below).
This allows them to derive context for what is being written.
Previous models were based on word vectors (entities and knowledge graphs).
https://blog.google/products/search/search-language-understanding-bert/
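The "root words" point above is essentially stemming. A minimal sketch with NLTK's PorterStemmer (an assumption: the deck names no tool, and BERT-style models actually learn this effect from shared subword tokens rather than running a stemmer):

```python
from nltk.stem import PorterStemmer

# Reduce inflected forms toward a shared root ("play").
stemmer = PorterStemmer()
for word in ["play", "plays", "playing", "played"]:
    print(word, "->", stemmer.stem(word))
# All four reduce to the same stem: play
```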
32. @schachin
Sesame Street and Search: Why is BERT Special?
BERT can disambiguate words in a sentence and apply meaning forward and backward to those
words in order to predict a masked word from those applied contexts. This is SUPER EFFICIENT!
34. @schachin
Why are LLMs So Special?
Large language models can determine the meaning of words in context,
so they can better predict the next word in the sentence.
[Slide example: sentences that mean two different things read forward and backward.]
35. @schachin
How does this work? Transformers
What are transformers?
A transformer in language processing is a type of computer program that is designed to understand and generate text. It does this by using a special type of algorithm called self-attention.
Self-attention allows the program to look at all the words in a sentence or a piece of text at once, and understand how they relate to each other, rather than just one word at a time like traditional methods. This way it can better understand the meaning of the text, and can generate text that is more similar to how a human would write.
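Under the hood, self-attention is a small amount of linear algebra: softmax(QK^T / sqrt(d)) V. A minimal numpy sketch with made-up sizes follows (real transformers learn the projection matrices during training and use many attention heads):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project each token's embedding into query, key, and value vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    # Every token scores every other token at once -- this is how the model
    # relates all the words in a sentence simultaneously.
    weights = softmax(Q @ K.T / np.sqrt(d))
    # Each output is a weighted mix of all tokens' value vectors.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8): one vector per token
```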
37. @schachin
Simply put, BERT, or language modeling, is:
“Language modeling – although it sounds formidable –
is essentially just predicting words in a blank.”
38. @schachin
Why does it matter to us as SEOs?
It mostly doesn’t.
It was a breakthrough in language model
processing because it is …
+ VERY Fast
+ Uses fewer resources
+ Provides better understanding of content
42. @schachin
Rank Brain.
Rank Brain & Neural Matching & the Deep Relevance Matching Model (DRMM)
"Document relevance ranking, also known as ad-hoc retrieval, is the task of ranking documents from a large collection using the query and the text of each document only."
43. @schachin
Rank Brain vs Neural Matching.
Both are used to re-order the results post-retrieval
according to "ad hoc retrieval" methods and "dynamic relevancy."
Ranking uses ONLY the document text.
• https://www.searchenginejournal.com/google-neural-matching/271125/
• http://www2.aueb.gr/users/ion/docs/emnlp2018.pdf
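This is not Google's algorithm (Rank Brain and neural matching use learned neural representations), but a toy sketch of the "ad hoc retrieval" idea the quoted paper describes: scoring documents with nothing but the query text and each document's text. Plain TF-IDF stands in for the learned model here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Candidate documents already fetched by a first-stage retrieval step.
docs = [
    "How to change a flat bicycle tire at home",
    "Bicycle tire pressure guide for road bikes",
    "History of the penny-farthing bicycle",
]
query = "fix a flat tire on my bike"

# Score each candidate using only the query text and the document text.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform([query])
scores = cosine_similarity(query_vec, doc_matrix).ravel()

# Re-order the candidates by relevance score, best first.
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(round(score, 3), doc)
```

Note that plain TF-IDF cannot see that "bike" and "bicycle" mean the same thing; closing exactly that lexical gap is what the neural approaches are for.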
59. @schachin
Rank Brain.
• When do you see it? When relationships between entities & search intent are weak or unknown -- enter Rank Brain.
• Behind the scenes, data is continually fed into the machine learning process to make results more relevant the next time.
• Can be combined with other algorithms such as neural matching.
• No way to optimize for it.
• BUT you can help prevent your page from getting one of these results: check the results for your queries.
Make sure Google is NOT CONFUSED.
63. @schachin
Google
https://searchengineland.com/how-google-uses-artificial-intelligence-in-google-search-379746
Neural matching. Neural matching was released in 2018 and expanded to the local search results in 2019.
Neural matching specifically helps Google rank search results and is part of the post-scoring ad hoc
ranking algorithms.
Links CANNOT affect this ranking sort.
• Year Launched: 2018
• Used For Ranking: Yes (but post scoring)
• Looks at the query and content language
• Works for all languages
• Very commonly used for many queries
• Applied post scoring ad hoc
• Can you optimize for it? Yes and No
70. @schachin
Rank Brain vs Neural Matching.
RankBrain helps Google better relate pages to concepts.
Neural Matching helps Google better relate words to searches.
• Rank Brain = page concepts
• Neural Matching = linking words to the page concepts
"…neural matching – AI method to better connect words to concepts." - Google
https://www.seroundtable.com/google-explains-neural-matching-vs-rankbrain-27300.html
73. @schachin
Google Helpful Content Update
“Our classifier for this update runs continuously, allowing it to monitor newly-launched sites and
existing ones. As it determines that the unhelpful content has not returned in the long-term, the
classification will no longer apply.
This classifier process is entirely automated, using a machine-learning model.”
https://developers.google.com/search/blog/2022/08/helpful-content-update
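Google has not published how this classifier works. Purely as a hypothetical sketch of what "an entirely automated classifier using a machine-learning model" means, here is a tiny supervised text classifier; the labels, examples, and features are all invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training examples -- Google's real signals and labels are unknown.
pages = [
    "Step-by-step repair guide with photos, tools list, and safety notes",
    "Original test results comparing ten products we actually bought",
    "Top 10 best things ever, click here, as seen on other top 10 lists",
    "Content written only to rank, repeating the keyword in every sentence",
]
labels = ["helpful", "helpful", "unhelpful", "unhelpful"]

# TF-IDF features + logistic regression: a classic minimal text classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(pages, labels)

print(model.predict(["A hands-on guide with our own measurements and photos"]))
```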
75. @schachin
Google Helpful Content Update
Main Points
• Ranking signal, NOT an update
• First known ranking signal that uses machine learning
• Continually rolling but with delays, so it can take 2-3 months to catch up with your site
• Sitewide, but severity is based on the number of affected pages
• Other factors can lessen the devaluation (like content quality on other pages)
• Seems to target what Panda and Penguin did, with an additional focus on the quality of "usefulness" or "helpfulness"
• Is your content differentiating itself?
[DALL-E image: "Angry SEO"]
76. @schachin
Helpful Content + Page Experience
“Helpful content generally offers a good page
experience. That's why today, we've added a
section on page experience to our guidance on
creating helpful content and revised our help
page about page experience. We think this all will
help site owners consider page experience more
holistically as part of the content creation
process…”
https://developers.google.com/search/blog/2023/04/page-experience-in-search
78. @schachin
Google Myth: can't detect AI content.
AI systems can predict that content is likely
created by AI.
How?
AI cannot create anything. It is only able to
use what it knows to detect patterns and then,
in the case of content, use those patterns to
"write content."
So, AI can recognize the patterns of how AI would
"write" and determine a likelihood that a given
item was written by AI.
It is not 100% accurate, but it can be done.
Google has an algorithm that detects
AI-repurposed scraped content.
https://ai.google/static/documents/exploring-6-myths.pdf
https://www.seroundtable.com/google-ai-plagiarized-content-34495.html
80. @schachin
AI Content, Google, and the HCU.
Google says AI content is okay IF it provides value and is not "spammy."
But since it is writing from what it was trained on, how does it provide value?
81. @schachin
AI Content, Google, and the HCU.
How does Google define "spammy" content?
82. @schachin
AI Content, Google, and the HCU.
Google and the Helpful Content Update.
https://developers.google.com/search/blog/2022/08/helpful-content-update
85. @schachin
Google MUM (Multitask Unified Model)
“…has the potential to transform how Google helps you with complex tasks. MUM
uses the T5 text-to-text framework and is 1,000 times more powerful than BERT.
MUM not only understands language, but also generates it.”
Built on top of BERT.
____________
Possible related patent
https://www.searchenginejournal.com/what-is-google-mum/407844/
https://blog.google/products/search/introducing-mum/
https://www.fastcompany.com/90681337/google-mum-search
86. @schachin
“The choice of multimodal models fits Google because of the increased number of non-text
based sources, such as video in the form of livestreams or similar, and audio files, as in the
case of podcasts. To develop MUM, Google trained the algorithm "across 75 different
languages and many different tasks at once" to refine its comprehension of information and
digital details.
MUM also considers knowledge across languages, comparing a query to sources that aren’t
written in the user's native language to bring better information accuracy.
As a result Google claims MUM is 1,000 times more powerful than
BERT.”
https://www.cmswire.com/digital-marketing/what-marketers-can-expect-from-google-mum/
Google MUM (Multitask Unified Model)
87. @schachin
Reid acknowledges that MUM carries its own risks. "Any time you're training a model based on
humans, if you're not thoughtful, you'll get the best and worst parts," she says. She emphasizes
that Google uses human raters to analyze the data used to train the algorithm and then assess
the results, based on extensive published guidelines.
“Our raters help us understand what is high quality content, and that’s what we use as
the basis,” she says. “But even after we’ve built the model, we do extensive testing, not
just on the model overall, but trying to look at slices so that we can ensure that there is
no bias in the system.”
The importance of this step is one reason why Google isn’t
deploying all its MUM-infused features today.”
https://www.cmswire.com/digital-marketing/what-marketers-can-expect-from-google-mum/
Google MUM (Multitask Unified Model)
91. @schachin
Do you optimize for Machine Learning?
AI is ever-changing and unfixed.
Don't waste time and resources on gaming it.
But you can make it easier for the machine
learning to get it right.
97. @schachin
Simple answer to a very complex issue?
Do your normal query research,
check the SERPs for Rank Brain issues
and then just write naturally.
Use specificity (topical hubs) PLUS
depth & breadth to create holistic content.
98. @schachin
Write holistic content? Does your content have depth, breadth, & semantic relationships?
Use terms that are semantically related. Image search is great for showing related terms.
102. @schachin
What is Structured Data?
Structured data, for SEO purposes, is on-page markup that
enables search engines to better understand the information
on your site's web pages, and then use this information
to improve search result listings by better matching user intent.
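Structured data is most often added as schema.org markup in JSON-LD. A minimal sketch that emits Article markup (the field values are placeholders to swap for your page's real details):

```python
import json

# Placeholder values -- swap in your page's real details.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How Google Uses Machine Learning in Search",
    "author": {"@type": "Person", "name": "Kristine Schachinger"},
    "datePublished": "2023-06-01",
}

# Paste the output into the page inside:
# <script type="application/ld+json"> ... </script>
print(json.dumps(article, indent=2))
```

Google's structured data documentation lists which types and properties are eligible for rich results.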
108. @schachin
We can help give Google a clearer understanding.
That helps us help Google better answer
the questions users ask
and better surface our content for those users.
We give our data meaning.
Google understands.
112. @schachin
Well Formed Text & Parsey McParseFace.
http://www.kurzweilai.net/google-open-sources-natural-language-understanding-tools
Ray Kurzweil on Google NLU
113. @schachin
Questions = Well Formed Text
https://ai.google/research/pubs/pub47323
“Understanding natural language queries is fundamental to many practical NLP
systems. Often, such systems comprise of a brittle processing pipeline, that is not
robust to "word salad" text ubiquitously issued by users. However, if a query
resembles a grammatical and well-formed question, such a pipeline is able to
perform more accurate interpretation, thus reducing downstream compounding
errors.”
117. @schachin
Takeaways.
• Think Search Queries NOT Simple Keywords
• Write in natural language
• Write using holistic content
• Focus on depth and breadth with related terms
• Add Structured Data
• Use well-formed text (i.e., questions) when you can.