This was my talk at Ungagged London on understanding the history of Google's "The Knowledge Graph" and how it is being used as the first step towards scaling natural language understanding and machine learning as we move past the simple keyword and into the time of queries, questions, and voice search.
14. #GetUngagged @schachin
Kristine Schachinger
Unstructured data (or unstructured information) is information that
either does not have a pre-defined data model or is not organized in a
pre-defined manner. Unstructured information is typically text-heavy,
but may contain data such as dates, numbers, and facts as well.
https://www.google.co.uk/search?q=definition+unstructured+data&oq=definition+unstructured+data&aqs=chrome..69i57j0l5.5175j0j7&sourceid=chrome&ie=UTF-8
15. #GetUngagged @schachin
This is known as the “Bag of Words” approach.
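A minimal sketch of the bag-of-words idea, assuming a simple lowercase word tokenizer (the function name `bag_of_words` and the example sentences are illustrative, not from the talk):

```python
from collections import Counter
import re


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, pull out word tokens, and count each one."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


doc_a = bag_of_words("The dog bit the man")
doc_b = bag_of_words("The man bit the dog")

# Word order is thrown away, so two sentences with opposite meanings
# end up with identical bags of counts.
print(doc_a == doc_b)
print(doc_a["the"])
```

Because only counts survive, the model cannot tell who bit whom. That gap is what the entity and relationship work on the later slides tries to close.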
25. #GetUngagged @schachin
“Graph-based knowledge representation has been
researched for decades and the term knowledge
graph does not constitute a new technology.
Rather, it is a buzzword reinvented by Google
and adopted by other companies and academia to
describe different knowledge representation
applications.”
Knowledge Graphs
http://ceur-ws.org/Vol-1695/paper4.pdf
34. #GetUngagged @schachin
(Knowledge Graphs)
“…quite possibly ...
one of Google's significant achievements”
Nathania Johnson of Search Engine Watch
https://web.archive.org/web/20090516213508/http://blog.searchenginewatch.com/090512-201139
Knowledge Graphs
35. #GetUngagged @schachin
Google's Knowledge Graph is seeded with known things (entities).
Instead of text without meaning, the KG is a relational
graph of known objects and the mapped relationships between them.
THE Knowledge Graph
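A rough sketch of what "known objects and mapped relationships" can look like as data, assuming facts are stored as (subject, predicate, object) triples. The entities, predicate names, and helper functions here are illustrative only and say nothing about how Google stores its graph:

```python
from collections import defaultdict

# Illustrative facts as (subject, predicate, object) triples.
TRIPLES = [
    ("Leonardo da Vinci", "created", "Mona Lisa"),
    ("Mona Lisa", "located_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
    ("Leonardo da Vinci", "born_in", "Vinci"),
]


def build_graph(triples):
    """Index the triples by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def related(graph, entity, predicate):
    """Return every object linked to `entity` by `predicate`."""
    return [obj for pred, obj in graph.get(entity, []) if pred == predicate]


graph = build_graph(TRIPLES)
print(related(graph, "Leonardo da Vinci", "created"))
print(related(graph, "Mona Lisa", "located_in"))
```

The point of the structure is that "Mona Lisa" is a node with typed connections, not just a string that happens to appear on a page.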
37. #GetUngagged @schachin
Why?
Google doesn’t truly do
Natural Language Processing (NLP), but it does use
Natural Language Understanding (NLU).
The Knowledge Graph was the first step
towards language understanding.
41. #GetUngagged @schachin
The Knowledge Graph enables you to search for things, people or places
that Google knows about—landmarks, celebrities, cities, sports teams,
buildings, geographical features, movies, celestial objects, works of art
and more—and instantly get information that’s relevant to your query
THE Knowledge Graph
45. #GetUngagged @schachin
Knowledge Graph entities
The Knowledge Graph has millions of entries that describe real-world entities like people, places, and things. These
entities form the nodes of the graph.
The following are some of the types of entities found in the Knowledge Graph:
Book
BookSeries
EducationalOrganization
Event
GovernmentOrganization
LocalBusiness
Movie
MovieSeries
MusicAlbum
MusicGroup
MusicRecording
Organization
Periodical
Person
Place
SportsTeam
TVEpisode
TVSeries
VideoGame
VideoGameSeries
WebSite
THE Knowledge Graph ENTITIES
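One way to explore these entity types is Google's public Knowledge Graph Search API. The sketch below only builds a request URL. It assumes the endpoint and the `query`, `types`, `limit`, and `key` parameters as publicly documented at the time of writing, so check the current documentation before relying on it. `build_kg_search_url` and `YOUR_API_KEY` are placeholders:

```python
from urllib.parse import urlencode

# Assumed endpoint of the Knowledge Graph Search API (verify against the docs).
KG_SEARCH_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_search_url(query: str, entity_type: str, api_key: str, limit: int = 5) -> str:
    """Build a search URL restricted to one entity type, e.g. "Person"."""
    params = {"query": query, "types": entity_type, "limit": limit, "key": api_key}
    return f"{KG_SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_kg_search_url("Taylor Swift", "Person", api_key="YOUR_API_KEY")
print(url)
```

No request is sent here; fetching the URL requires a real API key.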
51. #GetUngagged @schachin
"Four years ago this July, Google
acquired Metaweb,
bringing Freebase and
linked open data to Google,"
he wrote.
Google software engineer Barak Michener
http://www.eweek.com/database/google-releases-cayley-open-source-graph-database
THE Knowledge Graph Seeds
52. #GetUngagged @schachin
Seeds also include trusted
sources such as the
CIA World Factbook, Wikipedia,
Wikidata, etc.
http://www.eweek.com/database/google-releases-cayley-open-source-graph-database
THE Knowledge Graph Seeds
59. #GetUngagged @schachin
KEY FACTOR word2vec:
Vector space models (VSMs) represent (embed)
words in a continuous vector space where
semantically similar words are mapped to nearby
points ('are embedded nearby each other').
Hummingbird
https://www.tensorflow.org/tutorials/representation/word2vec
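A toy illustration of "semantically similar words are mapped to nearby points". The three-dimensional vectors are made up for the example (real word2vec embeddings are learned from text and have hundreds of dimensions), and cosine similarity is one common way to measure closeness:

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration.
EMBEDDINGS = {
    "king": [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "banana": [0.05, 0.10, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_royal = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
sim_fruit = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["banana"])

# "king" sits much closer to "queen" than to "banana" in this toy space.
print(sim_royal > sim_fruit)
```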
61. #GetUngagged @schachin
“…words that appear in the same contexts share semantic meaning. The
different approaches that leverage this principle can be divided into two
categories: count-based methods (e.g. Latent Semantic Analysis),
and predictive methods (e.g. neural probabilistic language models).”
Hummingbird
https://www.tensorflow.org/tutorials/representation/word2vec
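The count-based side of that split can be sketched with a small co-occurrence table. This is a simplified stand-in for methods such as Latent Semantic Analysis, not an implementation of them; the three-sentence corpus and the one-word window are assumptions chosen to keep the example short:

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]


def cooccurrence_counts(sentences, window=1):
    """Count how often each word appears within `window` words of another."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


counts = cooccurrence_counts(CORPUS)
print(counts["cat"])
print(counts["dog"])
```

"cat" and "dog" both appear next to "the" and "drinks", so their context counts overlap. That overlap is the signal count-based methods use; predictive methods learn vectors that capture the same regularity.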
70. #GetUngagged @schachin
Entity Salience.
This part of the algorithm
determines meaning
through known
relationships.
+
2018-19 Google adds the
“topic layer” to the
knowledge graph
(categorical classification)
https://moz.com/blog/7-advanced-seo-concepts
71. #GetUngagged @schachin
So Hummingbird moves from
strict word-count-based modeling
(i.e., keyword counts) to
probabilistic modeling
(i.e., predictive interpretation)
via known word vectors + nodes (relationships).
Hummingbird
77. #GetUngagged @schachin
What is Structured Data?
Structured data for SEO purposes is on-page markup that
enables search engines to better understand the information
currently on your site’s web pages, and then use this information
to improve search result listings by better matching user intent.
78. #GetUngagged @schachin
What is Structured Data?
Structured data is defined with a schema vocabulary, which acts as the
interpreter: it is the definition we add to the page as schema markup.
Google supports three formats:
• RDFa
• Microdata
• JSON-LD (preferred)
79. #GetUngagged @schachin
Schema
JSON-LD is the recommended format for schema markup.
JSON-LD stands for JavaScript Object Notation for Linked Data.
It is a way to implement schema outside the HTML mark-up
structure. RDFa and Microdata require the markup to be written
into the HTML elements themselves.
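A minimal sketch of a JSON-LD block generated in Python, assuming the schema.org vocabulary and an `Article` type. The headline, the author name, and the helper `to_jsonld_script` are placeholders, not values from the talk:

```python
import json

# Placeholder page data; swap in the real values for your page.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Jane Doe"},
}


def to_jsonld_script(data: dict) -> str:
    """Wrap a dict as a JSON-LD <script> tag that can sit anywhere in the page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


snippet = to_jsonld_script(article)
print(snippet)
```

The whole definition lives in one self-contained script tag, which is why it does not have to be threaded through the visible HTML elements.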
80. #GetUngagged @schachin
Schema
The benefit is that it sits apart from the HTML structure, which
makes it easier to write, implement, and maintain.
Resources.
For a good breakdown of what JSON-LD is at the code level,
Portent’s JSON-LD Implementation Guide is very helpful.
https://www.portent.com/blog/seo/json-ld-implementation-guide.htm
Google also has a section in its Developer Guides:
https://developers.google.com/search/docs/guides/intro-structured-data
83. #GetUngagged @schachin
Schema
NOTE: this tool only tells you whether the markup is syntactically valid, NOT whether
you are using the proper schema.
Make sure to check Google’s guides on schema implementation.
Improper use or implementation can result in a manual action.
• https://developers.google.com/search/docs/guides/intro-structured-data
• https://developers.google.com/search/docs/guides/prototype
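The gap between valid markup and appropriate markup can be sketched with a small checker. This is a hypothetical helper, not Google's testing tool: it only confirms that the text parses as JSON and carries `@context` and `@type`, and for simplicity it handles a single top-level object only:

```python
import json


def check_jsonld_syntax(raw: str) -> list:
    """Return a list of structural problems; an empty list means 'looks valid'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for key in ("@context", "@type"):
        if key not in data:
            problems.append(f"missing {key}")
    return problems


# Passes the structural check even if the page is not about a recipe at all.
print(check_jsonld_syntax('{"@context": "https://schema.org", "@type": "Recipe"}'))
print(check_jsonld_syntax('{"@type": "Person"}'))
```

A check like this cannot know whether `Recipe` is the right type for the page. That judgment still has to come from the guidelines.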
90. #GetUngagged @schachin
We can help give Google a clearer understanding.
That helps us help Google better answer
the questions users ask
and better surface our content for those users.
We give our data meaning
Google Understands
100. #GetUngagged @schachin
• Words go in.
• Words get assigned a mathematical address (a vector) in a vector space.
• Similar and related words sit close to each other in the vector space.
• Words are retrieved based on your query and the words located nearest to it (the “best fit” vectors).
• These word “interpretations” are used to return results.
• If the relationships are weak or unknown, enter RankBrain.
• Behind the scenes, data is continually fed into the machine learning process to make
those results more relevant the next time.
RankBrain – Known Relationships.
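The steps above can be sketched as a nearest-neighbour lookup with a fallback. Everything here is a toy: the two-dimensional vectors, the 0.95 threshold, and the function `interpret` are invented for illustration and do not describe Google's actual system:

```python
import math

# Hypothetical 2-dimensional word vectors, invented for illustration.
VECTORS = {
    "car": (0.90, 0.10),
    "automobile": (0.88, 0.15),
    "vehicle": (0.80, 0.30),
    "recipe": (0.10, 0.95),
}


def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def interpret(word, threshold=0.95):
    """Return the closest known word, or None if the link is weak or unknown."""
    if word not in VECTORS:
        return None  # unknown word: hand off to a fallback process
    best_word, best_score = None, -1.0
    for other, vec in VECTORS.items():
        if other == word:
            continue
        score = cosine(VECTORS[word], vec)
        if score > best_score:
            best_word, best_score = other, score
    return best_word if best_score >= threshold else None


print(interpret("car"))      # strong relationship found
print(interpret("recipe"))   # nothing close enough: fallback case
```

A `None` result stands in for the "weak or unknown" case on the slide, where a separate learning system has to work out the interpretation.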
113. #GetUngagged @schachin
RankBrain vs Neural Matching.
RankBrain = concepts
Neural Matching = linking words to concepts
“…neural matching, – AI method to better connect words to concepts.” - Google
114. #GetUngagged @schachin
RankBrain vs Neural Matching.
A Google patent related to RankBrain and Neural Matching
describes a system that uses traditional ranking factors to decide
what is relevant, but NOT what is in the top 10.
The top results may be re-ordered post-retrieval according to
“ad hoc retrieval” methods and “dynamic relevancy”.
https://www.searchenginejournal.com/google-neural-matching/271125/
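The retrieve-then-re-order idea can be sketched as a two-stage pipeline. The patent's actual scoring is not described in these slides, so the scores, the threshold, and both function names below are assumptions made purely to show the shape of the process:

```python
# Hypothetical candidates with two made-up scores each:
# "traditional" = classic ranking factors, "match" = query-to-content fit.
CANDIDATES = [
    {"url": "a.example", "traditional": 0.9, "match": 0.2},
    {"url": "b.example", "traditional": 0.6, "match": 0.8},
    {"url": "c.example", "traditional": 0.3, "match": 0.99},
    {"url": "d.example", "traditional": 0.7, "match": 0.5},
]


def retrieve(candidates, min_traditional=0.5):
    """Stage 1: traditional factors decide what counts as relevant at all."""
    return [c for c in candidates if c["traditional"] >= min_traditional]


def rerank(relevant, top_n=10):
    """Stage 2: a second signal decides the final order of the relevant set."""
    return sorted(relevant, key=lambda c: c["match"], reverse=True)[:top_n]


results = rerank(retrieve(CANDIDATES))
print([c["url"] for c in results])
```

In this toy run the page with the strongest traditional score ends up last, and the page with the best match never appears because it failed the first stage.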
123. #GetUngagged @schachin
Write holistic content.
Use terms that are semantically related.
For a detailed explanation Google explains here > https://www.youtube.com/watch?v=vzoe2G5g-w4&feature=youtu.be&t=32m19s
124. #GetUngagged @schachin
Write holistic content.
DOES YOUR CONTENT HAVE DEPTH AND BREADTH?
130. #GetUngagged @schachin
Takeaways.
• Think Search Queries NOT Simple Keywords
• Write in natural, conversational language
• Write using holistic content
• Focus on depth and breadth with related terms
• Add Structured Data
• Use well-formed text (e.g., questions) when you can.