4. #pubcon
SEO: Then & Now
Back then:
•Keyword-focused:
• Text retrieval systems relied on exact-match keywords
• Documents were weighted by keyword frequency
•Unable to distinguish synonyms and homographs
• Synonym: words that share the same meaning (e.g. car and automobile)
• Homograph: a word with more than one meaning depending on context (e.g. “charge”)
5.
SEO: Then & Now
Now:
•Driven by intent and context
•Provides relevant answers to complex and vague queries
11.
What is Semantic Search
Semantics:
A branch of linguistics that studies the relationship between words and sentences and their actual meanings.
Semantic Search:
The improvement of search accuracy by understanding intent and context, using various on-site elements to crawl, index, and serve relevant results.
12.
What is Semantic Search
•Entity Optimization
•Knowledge Graph
•Structured Data
•Information Architecture
•Co-occurrence and Clustering
13.
What is Semantic Search:
Entity Optimization
Paul Haahr – Google Ranking Engineer – SMX 2016
14.
What is Semantic Search:
Knowledge Graph
•Understands relationships between things
•Stores and interprets the connections between different entities
•Not just a catalog of objects, but a data model for inter-relationships
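The "data model for inter-relationships" idea can be sketched as a store of (subject, predicate, object) triples; the entities and predicates below are illustrative placeholders, not Google's actual schema.

```python
# Minimal sketch of a knowledge-graph-style triple store.
# Facts are (subject, predicate, object) tuples.
triples = [
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Paris", "instance_of", "City"),
]

def related(entity):
    """Return every fact the entity participates in, as subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]

# Querying an entity surfaces its relationships, not just its existence.
print(related("Paris"))
```

The point of the structure is that every fact links two entities, so traversing the triples answers relationship questions ("what is Paris the capital of?") that a flat catalog of objects cannot.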
15.
What is Semantic Search:
Structured Data
•Google is a data-driven machine that must be fed in order to learn
•Feed it structured data: a piece of intelligence the crawler uses to build semantic relevance and authority
•This is how entities are indexed!
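As an illustration of what "feeding the crawler structured data" looks like in practice, here is a minimal sketch that emits schema.org JSON-LD markup; the organization name and URLs are hypothetical placeholders.

```python
import json

# Hypothetical schema.org Organization entity expressed as JSON-LD.
# "sameAs" links the entity to an external reference, helping the
# crawler connect it to a known node in the knowledge graph.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Widgets Co.",
    "url": "https://www.example.com",
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example",
    ],
}

# The output would be embedded in the page inside a
# <script type="application/ld+json"> tag.
print(json.dumps(org, indent=2))
```

Marking the page up this way tells the crawler explicitly which entity the page is about and how it relates to other entities, rather than leaving that to inference from raw text.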
16.
What is Semantic Search:
Information Architecture
•Allows a crawler to clearly understand content and how it’s connected
•Provides a clear, hierarchical path of information
•Lends itself to a good UX
•The RIGHT approach is the most LOGICAL approach
•Must read: Information Architecture for the World Wide Web (3rd Edition, by Peter Morville): https://www.amazon.com/Information-Architecture-World-Wide-Web/dp/0596527349
17.
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence Clustering
• Generates topics from words frequently occurring together
Weighted Bigraph Clustering
• Uses URLs from Google search results to induce query similarity and generate topics
The combination of these two methods demonstrated greater usefulness and accuracy when compared to Latent Semantic Analysis.
Read the paper here:
https://pdfs.semanticscholar.org/dcf7/05ba07ee1b73fda0c94e9d01b2474173e470.pdf
18.
What is Semantic Search:
Co-Occurrence and Clustering
Word Co-Occurrence
• A set of word anchors serves as initial topics, which are then generalized to other words co-occurring in the same queries.
• Topics are created using hierarchical clustering on query similarity, which measures the extent to which two queries agree on their intersections with the list of words in each topic.
Bigraph Clustering
• Uses organic results to create a bigraph with a set of queries and a set of URLs as nodes. Edge weights are computed from impression and click data.
• Bigraph clustering works very well even when the queries share no common words.
21.
• Learning the mathematical foundations helps you understand search on a functional level
• LSI uses Singular Value Decomposition (SVD), a linear-algebraic factorization that underlies many modern algorithms
• It is not a way to “do SEO”
• LSI KEYWORDS ARE NOT A THING
22.
Latent Semantic Indexing
Latent Semantic Indexing (LSI):
•Mathematical algorithm based on Singular Value Decomposition (SVD)
•Text indexing and retrieval method
•How terms and concepts are related
23.
Latent Semantic Indexing
•LSI works by projecting a large multi-dimensional space down into a smaller number of dimensions
•Semantically similar words get bunched together
•Boundary blurring allows LSI to go beyond exact keyword matching
24.
Latent Semantic Indexing
•LSI uses Singular Value Decomposition (SVD) to decompose the term-document matrix
•The decomposition preserves information about relative distances between document vectors
•The space is collapsed into fewer dimensions
•Some information is lost, and words are superimposed on one another
25.
Latent Semantic Indexing
•Noise reduction
•Reveals similarities that were latent
•Similar terms become more similar, while dissimilar terms remain distinct
This method is widely used to unveil latent themes in text data: such models learn hidden topics from document-level word co-occurrence patterns.
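The projection described above can be sketched with a toy term-document matrix and NumPy's SVD. The counts and the choice of k = 2 dimensions are illustrative assumptions: "car" and "automobile" never co-occur here, but both appear alongside "engine", so the reduced space pulls them together.

```python
import numpy as np

# Tiny term-document matrix (rows = terms, columns = documents).
terms = ["car", "automobile", "engine", "pizza"]
A = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 3],   # pizza
], dtype=float)

# SVD factorizes A; keeping only the top-k singular values projects
# each term into a smaller "concept" space.
U, s, Vt = np.linalg.svd(A)
k = 2
term_vectors = U[:, :k] * s[:k]   # each row is a term in concept space

def cos(u, v):
    """Cosine similarity between two concept-space vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

idx = {t: n for n, t in enumerate(terms)}
print(cos(term_vectors[idx["car"]], term_vectors[idx["automobile"]]))
print(cos(term_vectors[idx["car"]], term_vectors[idx["pizza"]]))
```

In the collapsed space, "car" and "automobile" end up nearly identical while "car" and "pizza" stay far apart: exactly the "boundary blurring" that lets LSI match beyond exact keywords.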
26.
Latent Semantic Indexing
Short texts such as search queries, tweets, or instant messages suffer from data sparsity, which causes problems for traditional topic-modeling techniques. Unlike full-length documents, short text snippets do not provide enough word counts for models to learn how words are related or to disambiguate the multiple meanings of a single word.
*This is why the binary co-occurrence/clustering model works better*
28.
Key Takeaways
•Craft and optimize content for topics and concepts, not just keywords
•Use structured data to feed the crawler the semantic intelligence it needs to understand your site better
•Align the information architecture of your website to the consumer journey
•Navigation, sitemaps, page structure, content organization
•Stop saying/using “LSI keywords”
•The best approach is the most logical approach!