Introduction to Knowledge Graphs: What, Why, Who, How

Introduction to Knowledge Graphs
Mukul Joshi
Enterprise Architect
Tech Mahindra Ltd.
mukul@techmahindra.com
@mukulashokjoshi

Introduction to Knowledge Graphs
Knowledge Graphs are used in some of the world's largest enterprises, as well as in Intelligent Agents, to capture the concepts/entities
in their domains (i.e. context) and the relationships among those concepts/entities, in order to:
• drive their businesses,
• generate insights/inferences, and
• enhance the entities/relationships by applying appropriate learning techniques to generate new entities/relationships
This presentation will give a brief overview of Knowledge Graphs
What? – Preamble/Background to Knowledge Graphs
Who? – Who uses Knowledge Graphs
Why? – Why use Knowledge Graphs
How? – How to use Knowledge Graphs
A few Knowledge Graph use cases:
• Intelligent Agents
• Semantic Web
• Semantic Search Engine
• Social Network graphs
• Biological Networks
• Enterprise Knowledge Graph (including the likes of CMDB/IT Architecture)
• Master Data Management
• Dialog Systems/NLP/NLG
• Financial Knowledge Graphs (risk analysis, fraud detection, GDPR etc.)
• Cybersecurity

Knowledge Graphs
What Why Who How

AI Dimensions – Machine Evolution – What is the End Game?
Matrix Digital Rain are Sushi Recipes
https://en.wikipedia.org/wiki/Matrix_digital_rain
[Diagram: Evolution of Human and Machine. Labels: Artificial Intelligence, Human Intelligence, Human Machine Interface, Neural Chip (Elon Musk), Science Fiction?, Cyborg, I, Robot, Consciousness, Matrix, Control Devices, Control Humans??, Artificial Consciousness, Human Level AI, Artificial General Intelligence, Cognitive Architecture]
Wake Up, Neo
In an interview with MIT researcher Lex Fridman, Tesla and SpaceX CEO Elon Musk
reiterated his belief that we’re all living inside a simulation.
When Fridman asked him what his first question would be for the first-ever artificial
general intelligence system, Musk replied: “What’s outside the simulation?”
Pong Of My People
Musk has floated the possibility that we’re all living in a “The Matrix” style simulation for
several years. During a 2016 interview at a tech conference, Musk asserted
that there’s “a one in billions chance we’re in base reality.”
The argument: video games have evolved from a “Pong, two rectangles and a dot” he
told audiences at the 2016 conference, to “photorealistic 3D simulations with millions of
people playing simultaneously.” Given enough time, those games would become
“indistinguishable from reality.”
https://futurism.com/elon-musk-smart-ai-simulation

AI Dimensions - Intelligent Agent/System
[Diagram: an Intelligent Agent/System described along two dimensions: Thinking vs. Acting, and Human vs. Rational]
Human: the agent mimics human thinking and behavior
• Fidelity to human performance
• Empirical, i.e. based on observations of and hypotheses about human behavior
Rational: the agent has rational behavior
• A system is rational if it does the “right” thing given what it knows
• The rationalist approach is based on a combination of mathematics and engineering
Source : From the book “Artificial Intelligence : A Modern Approach” by Stuart Russell and Peter Norvig

AI Dimensions – Turing Test for Intelligent Agent/System
[Diagram: Intelligent Agent/System capabilities: Natural Language Processing, Knowledge Representation, Automated Reasoning, Machine Learning, Computer Vision, Robotics]
The Turing test, proposed by Alan Turing, was designed to
provide a satisfactory operational definition of
Intelligence.
A computer passes the test if a human interrogator, after
posing some written questions, cannot tell whether the
responses come from a person or from a computer.
The computer would need to possess the following
capabilities:
• Natural Language Processing – to enable it to
communicate successfully
• Knowledge Representation – to store what it knows
or hears
• Automated Reasoning – to use the stored
information to answer queries and to draw new
conclusions
• Machine Learning – to adapt to new circumstances
and to detect and extrapolate patterns
The so-called total Turing Test includes a video signal so
that the interrogator can test the subject’s perceptual
abilities, as well as the opportunity for the interrogator to
pass physical objects “through the hatch”. To pass the total
Turing Test, the computer will need:
• Computer Vision – to perceive objects, and
• Robotics – to manipulate objects and move about
Source : From the book “Artificial Intelligence : A Modern Approach” by Stuart Russell and Peter Norvig

AI Dimensions – Broad Learning Categories (ML Tribes)
[Diagram: Machine Learning Tribes: Symbolists, Bayesians, Connectionists, Evolutionaries, Analogizers; Master Algorithm]
“In practice, each of these algorithms is good for some things but not for others. What we really want is a single algorithm combining the key features of all of them: the ultimate master algorithm.”
- Pedro Domingos, The Master Algorithm
Source: Machine Learning Evolution Infographic: http://usblogs.pwc.com/emerging-technology/machine-learning-evolution-infographic/

AI Dimensions – Third Wave of AI - DARPA
Source : DARPA perspective on Artificial Intelligence: https://www.darpa.mil/about-us/darpa-perspective-on-ai
AI Next (DARPA program) will accelerate
“the Third Wave” which enables machines
to adapt to changing situations.
For instance, adaptive reasoning will enable
computer algorithms to discern the
difference between the use of ‘principal’
and ‘principle’ based on the analysis of
surrounding words to help determine
context.
Dr Steven Walker, Director of DARPA, said:
“Today, machines lack contextual reasoning
capabilities and their training must cover
every eventuality – which is not only costly –
but ultimately impossible.
We want to explore how machines can
acquire human-like communication and
reasoning capabilities, with the ability to
recognize new situations and environments
and adapt to them.”

[Figure: Artificial Intelligence capabilities: Perceive, Learn, Abstract, Reason]
Source : DARPA perspective on Artificial Intelligence: https://www.darpa.mil/about-us/darpa-perspective-on-ai

Source: A DARPA perspective on AI: https://sites.nationalacademies.org/cs/groups/pgasite/documents/webpage/pga_177035.pdf

cognitive adjective
cog·ni·tive | ˈkäg-nə-tiv
Definition of cognitive
1: of, relating to, being, or involving conscious intellectual activity (such
as thinking, reasoning, or remembering)
//cognitive impairment
2: based on or capable of being reduced to empirical factual knowledge
AI Dimensions – Dictionary definition of Cognitive

AI Dimensions – Cognitive AI
This concept of cognitive computing and human-like
reasoning has propelled Abdallat to take a giant leap beyond
the limits of "conventional AI" to help solve complex problems
and engage customers. "Machine learning or neural networks is
what we call the numeric side of AI," he explained. "The human
brain is really not good at doing large calculations; a calculator
can do that faster. But what we're good at is that symbolic side.
That's where we are superior to machines. Beyond Limits takes
the outputs from these numeric tools -- machine learning, deep
learning, neural nets -- and we make actionable intelligence
when the presence of human beings is not possible or when
you want to automate that layer."
[Diagram labels: Numeric, Symbolic, COGNITIVE]
Source: AI for the Enterprise transmitted directly from Mars:
https://searchcustomerexperience.techtarget.com/feature/AI-for-the-enterprise-transmitted-directly-from-Mars

AI Dimensions – Cognitive AI
Trust in the machine
In high-risk, high-value industries such as energy, healthcare, and finance there is too much at stake to trust the decisions of a machine at face value, with no
explainable understanding of its reasoning. Since machine learning is a component for many AI systems, it’s important for us to know precisely what the machine is
learning.
Machine learning is a great method for handling lots of data that can tell you the what. But for AI systems to become trusted advisors to human decision-makers
they need to be able to explain the why.
In order to make an AI system explainable, we add symbolic reasoning on top of numerical calculation. Symbolic reasoning allows the system to think like a person
and supply human-like reasoning to its recommendations. This hybrid approach combining conventional, numeric techniques with symbolic reasoning is called
cognitive AI. To comprehend this more intuitive approach, we have to turn to space exploration.
Source: Beyond Conventional AI: More Intelligent, More Explainable AI:
http://stories.venturebeat.com/beyond-conventional-ai-more-intelligent-more-explainable-ai/
Thinking for itself
If you really think about it, our cognitive technology is based on concepts. You can describe a concept at a strict algorithmic level, or you can choose to add more
natural language components that give the system the ability to explain itself.
Beyond Limits’ cognitive systems have to say: “I have been educated to understand this kind of problem; you're presenting me with a set of features, so I need to
manipulate those features relative to my education (Context)” That process of manipulating the features in order to perform inductive, deductive, and abductive
reasoning produces an explainable trail. If the natural language declarations are in place, then the system can (at a later time or in conjunction) produce natural
language descriptions of what it's doing at any given moment.
Explainability becomes a powerful tool when AI engineers work with subject matter experts to learn about their respective specialties. The engineers study the
specialty from an algorithm/process/detective perspective, and then annotate it in a form to enable the machine to provide explainability at a human level of
understanding. This was a requirement for space missions that Beyond Limits’ scientists solved years ago.

AI Dimensions – Knowledge Graphs – third era of Computing
Source: Knowledge Graphs: The third era of computing: https://medium.com/@dmccreary/knowledge-graphs-the-third-era-of-computing-a8106f343450

Perspectives - Data, Information, Knowledge, Wisdom (DIKW) Pyramid
Data is a collection of facts in a raw or unorganized form, such as numbers or
characters. Without context, however, data can mean little. Adding context
and value gives the data more meaning.
Information is the next building block of the DIKW Pyramid. This is data that has
been “cleaned” of errors and further processed in a way that makes it easier to
measure, visualize and analyze for a specific purpose.
By asking relevant questions about ‘who’, ‘what’, ‘when’, ‘where’, etc., we can
derive valuable information from the data and make it more useful for us.
“How” is the information, derived from the collected data, relevant to our goals?
“How” are the pieces of this information connected to other pieces to add
more meaning and value? And, maybe most importantly, “how” can we apply the
information to achieve our goal?
When we don’t just view information as a description of collected facts, but also
understand how to apply it to achieve our goals, we turn it into knowledge. This
knowledge is often the edge that enterprises have over their competitors.
Wisdom is the top of the DIKW hierarchy and to get there, we must answer
questions such as ‘why do something’ and ‘what is best’. In other words, wisdom is
knowledge applied in action.
How Do Enterprises and Organizations Move Up the Knowledge Pyramid?
One easy and fast way for enterprises to take the steps from data to information to knowledge and wisdom is to use Semantic Technologies such as Linked Data and Semantic Graph
Databases. These technologies can create links between disparate and heterogeneous data and infer new knowledge out of existing facts.
Armed with this new knowledge, enterprises can climb up the mountain of wisdom and gain a competitive advantage by supporting their business decisions with data-driven analytics.
Source: What is the DIKW Pyramid: https://ontotext.com/knowledgehub/fundamentals/dikw-pyramid/

“ISRO knows that Vikram has landed on the
Moon”
Knowledge is a relation between a
Knower and a Proposition:
Knower: ISRO
Proposition: the idea expressed by a simple
declarative sentence, like “Vikram has
landed on the Moon”
What can be said about Propositions? For
KR&R, what matters about propositions is
that they are abstract entities that can be
true or false, right or wrong
When we say, “ISRO knows that p”, we can
just as well say, “ISRO knows that it is true
that p”
Representation is a relation between domains,
where the first is meant to “stand for” or take the
place of the second (surrogate). It can also
be known as “Digital Twin”
Usually, the first domain, the representor, is more
concrete or immediate, or accessible in some way
than the second
• For example, a road sign indicating sharp turn
ahead stands for a less immediately visible
sharp turn on the road
One type of representor is the formal symbol, that
is, a character or a group of characters taken from
pre-determined alphabets
• The digit “7”, for example, stands for the
number 7 as does the group of letters “VII”
Knowledge Representation, then, is the field of
study concerned with using formal symbols
to represent a collection of
propositions believed by some agents
Reasoning, in general, is the formal
manipulation of the symbols
representing a collection of believed propositions to
produce representations of new ones
• Symbols are more accessible than the
propositions they represent: they must be
concrete enough that we can manipulate them
in such a way as to construct representations
of new propositions
• Socrates is human and all humans are
mortal. Thus Socrates is mortal
• This form of reasoning is called logical
inference because the final sentence
represents a logical conclusion of the
propositions represented by the initial
ones
Reasoning is a form of calculation, not unlike
arithmetic, but over symbols standing for
propositions rather than numbers
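The Socrates example can be sketched as symbol manipulation in a few lines of Python. This is only an illustration under simple assumptions: facts are (predicate, subject) pairs, each rule maps one predicate to another, and the names `facts`, `rules` and `forward_chain` are made up for this sketch.

```python
# Facts are (predicate, subject) symbol pairs; the encoding is illustrative.
facts = {("human", "Socrates")}

# Each rule reads: whatever satisfies the first predicate also satisfies the second.
rules = [("human", "mortal")]  # "All humans are mortal"


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


inferred = forward_chain(facts, rules)
print(("mortal", "Socrates") in inferred)
```

The new proposition is produced purely by rewriting symbols; the program has no notion of who Socrates is.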
Perspectives – Knowledge Representation & Reasoning (KR&R)
Knowledge Representation Reasoning

Perspectives – Symbolic Methods of Knowledge Representation

In their pivotal article, Davis, Shrobe, and Szolovits set out five principles for knowledge representation:
1. Surrogate: a KR provides a symbolic counterpart of actual objects, events and relationships.
2. Ontological commitments: a KR is a set of statements about the categories of things that may exist in the domain under consideration.
3. Fragmentary theory of intelligent reasoning: a KR is a model of what things can do or what can be done with them.
4. Medium for efficient computation: making knowledge understandable by computers is a necessary step for any learning curve.
5. Medium for human expression: one prerequisite of a KR is to improve communication between specific domain experts on the one
hand and generic knowledge managers on the other.
Whereas models are meant to fully and consistently meet these requirements, ontologies do not: ontological commitments (2) are not
supposed to apply to surrogates (1) nor be designed for efficient computation (4). Instead, ontologies are to focus on intelligibility and
transparent reasoning (3, 5), and that can be seen as the main objective of enterprise architecture.
Perspectives – Ontologies to drive KR&R
Source: What is Knowledge Representation: http://groups.csail.mit.edu/medg/ftp/psz/k-rep.html

Ontology (Knowledge Representation)
An ontology is a formal description of knowledge as a set of
concepts within a domain and the relationships that hold between
them. To enable such a description, we need to formally specify
components such as individuals (instances of objects), classes,
attributes and relations as well as restrictions, rules and axioms. As a
result, ontologies do not only introduce a sharable and reusable
knowledge representation but can also add new knowledge about
the domain.
There are, of course, other methods that use formal specifications for
knowledge representation such as vocabularies, taxonomies,
thesauri, topic maps and logical models. However, unlike taxonomies
or relational database schemas, for example, ontologies express
relationships and enable users to link multiple concepts to other
concepts in a variety of ways.
Source: What are Ontologies? https://www.ontotext.com/knowledgehub/fundamentals/what-are-ontologies/

The Benefits of Using Ontologies
One of the main features of ontologies is that, by having the
essential relationships between concepts built into them, they
enable automated reasoning about data. Such reasoning is easy
to implement in semantic graph databases that use ontologies as
their semantic schemata.
What’s more, ontologies function like a ‘brain’. They ‘work and
reason’ with concepts and relationships in ways that are close to
the way humans perceive interlinked concepts.
In addition to reasoning, ontologies provide more coherent
and easier navigation as users move from one concept to
another in the ontology structure.
Another valuable feature is that ontologies are easy to extend as
relationships and concept matching are easy to add to existing
ontologies. As a result, this model evolves with the growth of data
without impacting dependent processes and systems if something
goes wrong or needs to be changed.
Ontologies also provide the means to represent any data formats,
including unstructured, semi-structured or structured data,
enabling smoother data integration, easier concept and text
mining, and data-driven analytics.
Source: What are Ontologies? https://www.ontotext.com/knowledgehub/fundamentals/what-are-ontologies/

What is the Difference Between a Taxonomy and Ontology?
A taxonomy is a set of definitions that are organized by a hierarchy that starts at the most general description of something and gets more defined and specific as
you go down the hierarchy of terms. For example, a red-tailed hawk could be represented in a common language taxonomy as follows:
Bird
  Raptors
    Hawks
      Red-tailed Hawk
An ontology describes a concept both by its position in a hierarchy of common factors like the above description of the red-tailed hawk but also by its
relationships to other concepts. For example, the red-tailed hawk would also be associated with the concept of predators or animals that live in trees.
The richness of the relationships described in an ontology is what makes it a powerful tool for modeling complex business ecosystems.
Source: Semantic Ontology – The Basics https://www.semanticarts.com/semantic-ontology-the-basics/
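The difference can be sketched in Python. The taxonomy is a single parent link per term, while the ontology adds other typed relationships as (subject, relation, object) triples. The relation names `is_a` and `lives_in` and the helper functions are assumptions made for this illustration, not part of any standard vocabulary.

```python
# Taxonomy: each term has exactly one parent ("is a kind of").
taxonomy_parent = {
    "Red-tailed Hawk": "Hawks",
    "Hawks": "Raptors",
    "Raptors": "Bird",
}


def ancestors(term):
    """Walk up the hierarchy from a term to the root."""
    chain = []
    while term in taxonomy_parent:
        term = taxonomy_parent[term]
        chain.append(term)
    return chain


# Ontology: the same hierarchy plus other typed relationships, as triples.
# The relation names are hypothetical.
ontology = {(child, "is_a", parent) for child, parent in taxonomy_parent.items()}
ontology |= {
    ("Red-tailed Hawk", "is_a", "Predator"),
    ("Red-tailed Hawk", "lives_in", "Trees"),
}


def related(subject):
    """Return every (relation, object) pair asserted for a subject."""
    return sorted((p, o) for s, p, o in ontology if s == subject)


print(ancestors("Red-tailed Hawk"))
print(related("Red-tailed Hawk"))
```

The taxonomy can only answer "what is this a kind of?", whereas the triple set can also answer questions such as "what else is a predator?" or "what lives in trees?".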

• The history of artificial intelligence shows that knowledge is critical for intelligent systems. In many cases, better knowledge can be more important for solving
a task than better algorithms. To have truly intelligent systems, knowledge needs to be captured, processed, reused, and communicated. Ontologies support all
these tasks.
• The term "ontology" can be defined as an explicit specification of a conceptualization.
• Ontologies capture the structure of the domain, i.e. the conceptualization. This includes the model of the domain with possible restrictions. The
conceptualization describes knowledge about the domain, not about the particular state of affairs in the domain. In other words, the conceptualization
does not change, or changes very rarely. An ontology is then a specification of this conceptualization: the conceptualization is specified using a particular
modeling language and particular terms. Formal specification is required in order to process and operate on ontologies
automatically.
• An ontology describes a domain, while a knowledge base (based on an ontology) describes a particular state of affairs.
• Each knowledge-based system or agent has its own knowledge base, and only what can be expressed using an ontology can be stored and used in the
knowledge base. When an agent wants to communicate with another agent, it uses the constructs from some ontology. For agents to understand each
other, ontologies must be shared between them.
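A minimal Python sketch of the split between ontology and knowledge base: the ontology fixes which classes and relations exist, and the knowledge base only accepts facts expressible in those terms. The class names, relation names and the `KnowledgeBase` class are hypothetical and chosen only to reuse the ISRO/Vikram example.

```python
# Ontology: which classes exist and which relations may hold between them.
# All names here are illustrative assumptions.
ontology = {
    "classes": {"Agency", "Lander", "Body"},
    # relation name -> (class of subject, class of object)
    "relations": {
        "operates": ("Agency", "Lander"),
        "landed_on": ("Lander", "Body"),
    },
}


class KnowledgeBase:
    """Stores a particular state of affairs, constrained by an ontology."""

    def __init__(self, ontology):
        self.ontology = ontology
        self.types = {}     # individual -> class
        self.facts = set()  # (subject, relation, object)

    def add_individual(self, name, cls):
        if cls not in self.ontology["classes"]:
            raise ValueError(f"unknown class: {cls}")
        self.types[name] = cls

    def add_fact(self, subject, relation, obj):
        if relation not in self.ontology["relations"]:
            raise ValueError(f"unknown relation: {relation}")
        domain, range_ = self.ontology["relations"][relation]
        if self.types.get(subject) != domain or self.types.get(obj) != range_:
            raise ValueError(f"{relation} must link a {domain} to a {range_}")
        self.facts.add((subject, relation, obj))


kb = KnowledgeBase(ontology)
kb.add_individual("ISRO", "Agency")
kb.add_individual("Vikram", "Lander")
kb.add_individual("Moon", "Body")
kb.add_fact("ISRO", "operates", "Vikram")
kb.add_fact("Vikram", "landed_on", "Moon")
```

The facts can change from day to day, but the ontology they are checked against stays the same. Two agents that share this ontology can exchange such facts and interpret them the same way.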

Ontologies capture knowledge
in context. The context is expressed as
taxonomical trees in which higher-order
entities (categories and
domains) generalize the properties of
things, so that the things can be grouped.
All the entities from the highest to the
lowest are concepts, and often
instances, and have properties of their
own. Lower order entities (concepts
and instances) inherit general
properties from above and possess
unique properties of their own.
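Property inheritance down a taxonomical tree can be sketched as follows. The concepts, property names and the `effective_properties` helper are invented for illustration.

```python
# Hypothetical taxonomy: child concept -> parent concept.
parents = {"Hawk": "Raptor", "Raptor": "Bird"}

# Properties stated directly on each concept.
own_properties = {
    "Bird": {"has_feathers": True},
    "Raptor": {"diet": "carnivore"},
    "Hawk": {"hunts_by": "sight"},
}


def effective_properties(concept):
    """Merge inherited properties with the concept's own properties."""
    lineage = [concept]
    while lineage[-1] in parents:
        lineage.append(parents[lineage[-1]])
    merged = {}
    # Start at the root so that more specific concepts can override.
    for ancestor in reversed(lineage):
        merged.update(own_properties.get(ancestor, {}))
    return merged


print(effective_properties("Hawk"))
```

Here "Hawk" ends up with the general properties of "Bird" and "Raptor" plus its own.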

A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new
knowledge.
Knowledge Graph – Definition – Key Concepts
It’s a graph
A knowledge graph is organized as a graph, which is not always true of knowledge bases. The primary benefits of a graph are that
connections (relationships) in the data are first-class citizens, you can easily connect new data items as they are injected into the data pool,
and, finally, you can easily traverse links to discover how remote parts of a domain relate to each other (there’s a huge value in linking
information). A graph is one of the most flexible formal data structures, so you can easily map other data formats to graphs using generic
tools and pipelines.
It’s semantic
The meaning of the data is encoded alongside the data in the graph, in the form of the ontology. A knowledge graph is self-descriptive, or,
simply put, it provides a single place to find the data and understand what it’s all about.
There is an additional benefit in that you can submit queries in a style that is much closer to a natural language, using a familiar domain
vocabulary. That is, the meaning of the data is typically expressed in terms of entity and relation names that are familiar to those interested
in the given domain. This enables smarter search, more efficient discovery, and narrows the communication gap between data providers and
consumers.
Source: What is a Knowledge Graph? https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f

It’s smart
The underlying basis of a knowledge graph is the ontology, which specifies the semantics of the data. An ontology is typically based on logical
formalisms which support some form of inference: allowing implicit information to be derived from explicitly asserted data. Some of the
information inferred can be otherwise hard to discover.
Knowledge graphs being actual graphs, in the proper mathematical sense, allow for the application of various graph-computing techniques and
algorithms (for example, shortest path computations, or network analysis), which add additional intelligence over the stored data.
Knowledge graphs are also easy to extend over time, as the schema is not as strict or prohibitive as it is for SQL databases.
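As a sketch of applying a graph algorithm to such data, the snippet below runs a breadth-first shortest-path search over a handful of triples. The entities, relation names and the `shortest_path` function are made up, and edges are treated as undirected for simplicity.

```python
from collections import deque

# Hypothetical (subject, relation, object) triples.
triples = [
    ("Alice", "works_for", "Acme"),
    ("Acme", "located_in", "Pune"),
    ("Pune", "part_of", "India"),
    ("Bob", "lives_in", "Pune"),
]


def shortest_path(triples, start, goal):
    """Breadth-first search over the graph, ignoring edge direction."""
    neighbours = {}
    for subject, _, obj in triples:
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between start and goal


print(shortest_path(triples, "Alice", "Bob"))
```

The search shows how two entities with no direct link (Alice and Bob) are still connected through shared nodes.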
Knowledge Graph – Definition – Key Concepts
It’s alive
Knowledge Graphs have a flexible structure: the ontology can be extended and revised as new data arrives. This makes it convenient to store
and manage data in a knowledge graph if you have use cases where regular updates and data growth are important, particularly when data is
arriving from diverse, heterogeneous sources. A knowledge graph can support a continuously running data pipeline that keeps adding new
knowledge to the graph, refining it as new information arrives.
Knowledge graphs are also able to capture diverse meta-data annotations such as provenance or versioning information, which make them
ideal for working with a dynamic dataset. There is an increasing need to account for the provenance of data and include it so that the
knowledge can be assessed by its consumers in terms of credibility and trustworthiness. A knowledge graph can answer what it knows, and also
how and why it knows it.
Source: What is a Knowledge Graph? https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f
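One possible way to carry provenance and versioning alongside each statement is sketched below. The `Statement` fields, the source names and the `why` helper are assumptions made for this example, not a standard model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A triple annotated with where it came from and which load produced it."""
    subject: str
    predicate: str
    obj: str
    source: str   # hypothetical provenance label
    version: int  # hypothetical version of the load that asserted it


statements = [
    Statement("Taj Mahal", "located_in", "Agra", source="encyclopedia-dump", version=1),
    Statement("Taj Mahal", "located_in", "Agra", source="web-crawl", version=2),
    Statement("Taj Mahal", "type", "Monument", source="encyclopedia-dump", version=1),
]


def why(subject, predicate, obj):
    """Return the sources that assert a given triple."""
    return sorted(
        {
            st.source
            for st in statements
            if (st.subject, st.predicate, st.obj) == (subject, predicate, obj)
        }
    )


print(why("Taj Mahal", "located_in", "Agra"))
```

A consumer can then weigh a fact by how many sources support it and how trustworthy they are.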

What Google says about its knowledge graph:
• Take a query like [taj mahal]. For more than four decades, search has essentially been about matching keywords to queries. To a search
engine the words [taj mahal] have been just that—two words.
• But we all know that [taj mahal] has a much richer meaning. You might think of one of the world’s most beautiful monuments, or a
Grammy Award-winning musician, or possibly even a casino in Atlantic City, NJ. Or, depending on when you last ate, the nearest Indian
restaurant. It’s why we’ve been working on an intelligent model—in geek-speak, a “graph”—that understands real-world entities and
their relationships to one another: things, not strings.
• The Knowledge Graph enables you to search for things, people or places that Google knows about—landmarks, celebrities, cities,
sports teams, buildings, geographical features, movies, celestial objects, works of art and more—and instantly get information that’s
relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence
of the web and understands the world a bit more like people do.
• Google’s Knowledge Graph isn’t just rooted in public sources such as Freebase, Wikipedia and the CIA World Factbook. It’s also
augmented at a much larger scale—because we’re focused on comprehensive breadth and depth. It currently contains more than 500
million objects, as well as more than 3.5 billion facts about and relationships between these different objects. And it’s tuned based on
what people search for, and what we find out on the web.
Google Knowledge Graph
Source: Introducing the Knowledge Graph:
https://www.blog.google/products/search/introducing-knowledge-graph-things-not/

Diffbot Knowledge Graph
Source: Diffbot Knowledge Graph: https://www.diffbot.com/knowledge-graph/

Diffbot Knowledge Graph
That is also what Diffbot is unveiling today: The ability to query the web
as a database. This impressive feat is also based on a knowledge graph.
The difference is that, in Diffbot's case, the knowledge graph is only
partially curated by humans, and is automatically populated by crawling
the web.
First off, you have to crawl the web. This is where Gigablast and Matt
Wells come in. Gigablast is a search engine created by Matt Wells,
Diffbot's VP of Search, in 2000. Tung says this is what Diffbot uses to
crawl, and store, every single document on the web. Hard as this may
be, however, it's not even half the job.
The really hard part is getting the information out of documents, and
this is where the magic is. Tung explains this is done using computer
vision, machine learning (ML), and natural language processing (NLP).
Computer vision helps Diffbot understand the structure of documents. It
mimics the way humans break down documents, figuring out what are
the structural elements of each document -- things such as headers,
blocks, etc. In a perfect world, this should be possible by inspecting the
HTML structure of web documents. But not everything on the web is
HTML, and HTML documents are not perfect either.
After structure comes content. Content is parsed using a combination of
NLP and ML, the result of which is structured knowledge which is added
to Diffbot's knowledge graph (DKG).
Source: The web as a database: The biggest knowledge graph ever: https://www.zdnet.com/article/the-web-as-a-database-the-biggest-knowledge-graph-ever/

Microsoft Graph
Microsoft Graph exposes REST APIs and client
libraries to access data on the following
Microsoft 365 services:
• Office 365 services: Delve, Excel, Microsoft
Bookings, Microsoft Teams, OneDrive,
OneNote, Outlook/Exchange, Planner, and
SharePoint
• Enterprise Mobility and Security services:
Advanced Threat Analytics, Advanced Threat
Protection, Azure Active Directory, Identity
Manager, and Intune
• Windows 10 services: activities, devices,
notifications
• Dynamics 365 Business Central
Source: Microsoft Graph Overview: https://docs.microsoft.com/en-us/graph/overview

Microsoft Graph

Use Microsoft Graph to build experiences around the user's unique context to help them be more productive. Imagine an app that...
• Looks at your next meeting and helps you prepare for it by providing profile information for attendees, including their job titles and managers, as
well as information about the latest documents they're working on, and people they're collaborating with.
• Scans your calendar, and suggests the best times for the next team meeting.
• Fetches the latest sales projection chart from an Excel file in your OneDrive and lets you update the forecast in real time, all from your phone.
• Subscribes to changes in your calendar, sends you an alert when you’re spending too much time in meetings, and provides recommendations for
the ones you can miss or delegate based on how relevant the attendees are to you.
• Helps you sort out personal and work information on your phone; for example, by categorizing pictures that should go to your personal OneDrive
and business receipts that should go to your OneDrive for Business.
• Analyzes at-scale Office 365 data so that decision makers can unlock valuable insights into time allocation and collaboration patterns that improve
business productivity.
• Brings custom business data into Microsoft Graph, indexing it to make it searchable along with data from Microsoft 365 services.
Pick the first scenario about researching meeting attendees as an example. With the Microsoft Graph API, you can:
• Get the email addresses of the meeting event attendees.
• Look them up individually as a user in Azure Active Directory to get their profile information.
You can then navigate to other resources using relationships:
• Connect to their manager through a manager relationship.
• Get valuable insights and intelligence including the popular files trending around the user.
• Get the most relevant people around the user.
• Extend the scenario to get to the user's groups through a memberOf relationship.
• Reach other members in each group.
• Tap into other scenarios enabled by groups, such as education and teamwork.
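The attendee-research scenario above maps onto a handful of REST resources. The sketch below only builds the request URLs and makes no network calls; authentication and permissions are omitted entirely. The helper names and the example address are hypothetical, and the v1.0 resource paths shown should be checked against the current Microsoft Graph reference before use.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def attendee_research_urls(user):
    """Return the request URLs for one attendee, keyed by purpose.

    `user` is a user id or userPrincipalName taken from a meeting's attendee list.
    """
    base = f"{GRAPH_ROOT}/users/{quote(user, safe='@')}"
    return {
        "profile": base,
        "manager": f"{base}/manager",                   # manager relationship
        "trending_files": f"{base}/insights/trending",  # files trending around the user
        "relevant_people": f"{base}/people",            # most relevant people
        "groups": f"{base}/memberOf",                   # memberOf relationship
    }


def group_members_url(group_id):
    """URL that lists the members of one group."""
    return f"{GRAPH_ROOT}/groups/{quote(group_id)}/members"


urls = attendee_research_urls("adele@contoso.example")
for purpose, url in urls.items():
    print(f"{purpose}: GET {url}")
```

Each URL corresponds to following one relationship outward from the user node, which is what makes the API feel like traversing a graph.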

Facebook – Social Graph – TAO
This simple example shows a subgraph of
objects and associations that is created in TAO
after Alice checks in at the Golden Gate Bridge
and tags Bob there, while Cathy comments on
the check-in and David likes it.
Every data item, such as a user, check-in, or
comment, is represented by a typed object
containing a dictionary of named fields.
Relationships between objects, such as “liked
by" or “friend of," are represented by typed
edges (associations) grouped in association lists
by their origin. Multiple associations may
connect the same pair of objects as long as the
types of all those associations are distinct.
Together objects and associations form a
labeled directed multigraph.
Source: TAO: The power of the graph: https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/
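The objects-and-associations model described above can be sketched as a toy in-memory store. This is not TAO's actual API: the class, its method names and the association type names are hypothetical, and the real system is a distributed, cached service.

```python
from collections import defaultdict


class ObjectAssociationStore:
    """Toy in-memory model of typed objects and typed, directed associations."""

    def __init__(self):
        self.objects = {}                     # object id -> {"type", "fields"}
        self.assoc_lists = defaultdict(list)  # (origin id, assoc type) -> [destination ids]
        self._next_id = 1

    def add_object(self, otype, **fields):
        """Create a typed object holding a dictionary of named fields."""
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = {"type": otype, "fields": fields}
        return oid

    def add_assoc(self, origin, atype, dest):
        """Add a typed edge; a pair of objects gets at most one edge per type."""
        destinations = self.assoc_lists[(origin, atype)]
        if dest not in destinations:
            destinations.append(dest)

    def get_assocs(self, origin, atype):
        """Return the association list for one origin and association type."""
        return list(self.assoc_lists.get((origin, atype), []))


store = ObjectAssociationStore()
alice = store.add_object("user", name="Alice")
bob = store.add_object("user", name="Bob")
cathy = store.add_object("user", name="Cathy")
david = store.add_object("user", name="David")
bridge = store.add_object("location", name="Golden Gate Bridge")
checkin = store.add_object("checkin")
comment = store.add_object("comment", text="Wish we were there!")

store.add_assoc(alice, "authored", checkin)
store.add_assoc(checkin, "at", bridge)
store.add_assoc(checkin, "tagged", bob)
store.add_assoc(cathy, "authored", comment)
store.add_assoc(checkin, "has_comment", comment)
store.add_assoc(checkin, "liked_by", david)

print(store.get_assocs(checkin, "liked_by"))
```

Grouping edges by origin and type means a question like "who liked this check-in?" is a single list lookup rather than a general graph query.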

Facebook puts an extremely demanding workload on its data backend:
Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces
of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends -- the
list goes on.
The high degree of output customization, combined with a high update rate of a typical user’s News Feed, makes it impossible to generate the views presented
to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds.
This challenge is made more difficult because the data set is not easily partitionable, and by the tendency of some items, such as photos of celebrities, to have
request rates that can spike significantly. Multiply this by the millions of times per second this kind of highly customized data set must be delivered to users, and
you have a constantly changing, read-dominated workload that is incredibly challenging to serve efficiently.
Facebook – Social Graph – TAO
 Facebook users record their relationships, share their interests, upload text, images, and video, and curate semantic information about their data. This data is
the Facebook “Social Graph” and the personalized user experience requires timely, efficient and scalable access to this flood of data
 TAO (The Associations and Objects) is an implementation of a simple data model and API for serving the Social Graph. TAO provides basic access to the nodes and
edges of a constantly changing graph in data centers across multiple regions. It is optimized heavily for reads, and explicitly favors efficiency and availability over
consistency
 TAO is a geographically distributed data store
 Facebook focuses on people, actions, and relationships. These entities and connections are modeled as nodes and edges in a graph. This representation is very
flexible; it directly models real-life objects, and can also be used to store an application’s internal implementation-specific data. TAO’s goal is not to support a
complete set of graph queries, but to provide sufficient expressiveness to handle most application needs while allowing a scalable and efficient implementation
A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every
item with privacy checks that take into account the current viewer. This extreme customization makes it infeasible to perform most aggregation and filtering when
content is created; instead we resolve data dependencies and check privacy each time the content is viewed. As much as possible we pull the social graph, rather
than pushing it. This implementation strategy places extreme read demands on the graph data store; it must be efficient, highly available, and scale to high query
rates.
Source: TAO: The power of the graph: https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/
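
The "pull" strategy described above, where filtering and privacy checks run each time content is viewed, might look like the following Python sketch. Everything here is an assumption made for illustration: the `Item` type, the two audience values, the `can_view` rule and the sample data are hypothetical and do not reflect Facebook's actual privacy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    author: str
    text: str
    audience: str  # hypothetical rule: "public" or "friends"


def can_view(viewer, item, friends):
    """Privacy check evaluated for the current viewer at read time."""
    if item.audience == "public" or viewer == item.author:
        return True
    return viewer in friends.get(item.author, set())


def render_feed(viewer, items, friends, limit=10):
    """Pull model: aggregate and filter when the content is viewed."""
    visible = [it for it in items if can_view(viewer, it, friends)]
    return visible[:limit]


# Hypothetical sample data.
friends = {"alice": {"bob"}}
items = [
    Item("alice", "Checked in at the Golden Gate Bridge", "friends"),
    Item("alice", "Hello, world", "public"),
]
```

Because nothing is precomputed per viewer, the same stored items yield different results for `render_feed("bob", items, friends)` and `render_feed("david", items, friends)`, which is why the read path has to be fast.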

Airbnb – Knowledge Graph
 Imagine you’re finally getting to take that vacation you’ve dreamed of
— three countries, seven cities, thousands of miles. It’s everything you
could want and more, right? But where do you start? How do you
know what to eat, where to visit, how to experience what makes your
destination truly unique? All the information you’re looking for is on
Airbnb…somewhere…the question is, “How do we surface
the relevant parts of that information to you at the exact time you’re
looking for it?”
 Discovering what you want and need to know about a destination is
crucial to the overall trip experience, especially when traveling to a
place you’ve never been to before.
 In order to surface relevant context to people, we need to have some
way of representing relationships between distinct but related
entities (think cities, activities, cuisines, etc.) on Airbnb to easily
access important and relevant information about them. These types
of information will become increasingly important as we move towards
becoming an end-to-end travel platform as opposed to just a place for
staying in homes.
 The knowledge graph is our solution to this need, giving us the
technical scalability we need to power all of Airbnb’s verticals and the
flexibility to define abstract relationships.
Source: Scaling Knowledge Access and Retrieval at Airbnb:
https://medium.com/airbnb-engineering/scaling-knowledge-access-and-retrieval-at-airbnb-665b6ba21e95

Why is a graph structure scalable?
 Normally, engineers work with relational structures, where a schema defines what each row of data contains. This is the preferred way for
holding data because it works great for transactional processes since it makes it really quick to access rows of data. However, there is an
operational burden when you have many tables for distinct objects that may contain the same relational information in individual columns (e.g.
the city homes or experiences are located in, or the type of activity that an experience and that a destination is known for). This is where the
graph structure comes into play.
 Although our underlying data store is still relational, structuring our queries in terms of this graph gives us power in maintaining data semantics.
We want the same Surfing that an experience is associated with to be the same Surfing that Hawaii is known for. This type of structure around
the relationships between the entities on Airbnb’s platform gives us the scalability and flexibility needed to expand categorization to any number
of things. By having the same object to represent all of the things in our world, we remove the operational overhead for redefining the world
whenever we introduce a new product to our platform.
 In this way, we can support our objective to 1) encode an exponentially growing number of relationships between entities and 2) enable easy
traversal along those connections.

 The taxonomy of our knowledge graph refers to the vocabulary
that we use to describe our inventory and world around us. The
taxonomy is hierarchical (as shown above), so that we can map
high-level concepts like “Sport” down to a very specific activity
such as “Surfing.” The main constraint that we want to maintain
is that the knowledge graph is Mutually Exclusive and Collectively
Exhaustive, so that we can keep the taxonomy very streamlined
in avoiding duplicate data. Because of the graph structure that
we have, it’s very easy to scale this taxonomy to tens or hundreds
of layers deep and still surface the relevant inventory for high
level concepts.
 In the graph, we have nodes and edges. Nodes refer to any type
of entity on the Airbnb platform (restaurants, neighborhoods,
experiences, events, etc.). Edges refer to the types of
relationships that exist between any of the entities in the graph.
Under this model, there are different types of nodes for different
types of entities and different types of edges for different types
of relationships (located in, tagged by, etc.). From there, we have
a flexible API to query for neighbors connected by certain types
of relationships and can index our inventory items by the unique
identifiers of their corresponding representation in the
knowledge graph.
 Once a critical mass of data is reached, we can start thinking
about making automatic inferences based on the data already in
the graph. For example, if something is tagged with “Nature” and
“Walking,” maybe we can infer that it should be tagged with
“Hiking” as well.
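
The typed nodes and edges, the neighbor query and the tag inference described above can be sketched as follows. This is a minimal Python illustration, not Airbnb's API: the edge type names (`child_of`, `tagged_by`, `known_for`, `located_in`), the node identifiers and the single hand-written rule are all assumptions.

```python
from collections import defaultdict

# (source node, edge type) -> set of target nodes.
edges = defaultdict(set)


def add_edge(src, etype, dst):
    edges[(src, etype)].add(dst)


def neighbors(node, etype):
    """Nodes reachable from `node` over one edge of type `etype`."""
    return set(edges.get((node, etype), set()))


def ancestors(concept):
    """Walk 'child_of' edges up the taxonomy.

    Assumes each concept has at most one parent."""
    out = []
    parents = neighbors(concept, "child_of")
    while parents:
        parent = next(iter(parents))
        out.append(parent)
        parents = neighbors(parent, "child_of")
    return out


def infer_tags(item, rule_if=frozenset({"Nature", "Walking"}), rule_then="Hiking"):
    """If the item carries every tag in `rule_if`, also tag it `rule_then`."""
    if rule_if <= neighbors(item, "tagged_by"):
        add_edge(item, "tagged_by", rule_then)
    return neighbors(item, "tagged_by")


# Taxonomy: a specific activity maps up to a high-level concept.
add_edge("Surfing", "child_of", "Sport")

# The same "Surfing" node is shared by an experience and a destination.
add_edge("experience:1", "located_in", "Hawaii")
add_edge("experience:1", "tagged_by", "Surfing")
add_edge("Hawaii", "known_for", "Surfing")

# An item that the rule should extend.
add_edge("experience:2", "tagged_by", "Nature")
add_edge("experience:2", "tagged_by", "Walking")
```

Calling `infer_tags("experience:2")` adds the "Hiking" tag, while `neighbors("Hawaii", "known_for")` and `neighbors("experience:1", "tagged_by")` both resolve to the one shared "Surfing" node.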

Flipkart Commerce Graph
A commerce graph is a data structure that connects products, categories, sellers and users who view or purchase products on Flipkart.
 The shopping interests of customers keep changing with their lifestyle. A customer might want to shop for a book today and 3 months later shop for a laptop.
With an increase in such behavioural changes, there was a need to relate such users.
 Commerce graph is one such solution at Flipkart to identify users with similar shopping patterns. It relates users and products based on the user’s
behaviour on our website and apps.
 Our commerce graph has more than 500 million nodes and 2 billion edges. The graph involves buyers, sellers, products and
categories in a single connected graph.
 A few use cases currently being powered by our commerce graph are:
 Generating features for insights ML models
 Finding out heterogeneous users as beta and control set
 Detecting fraudulent sellers and buyers
 Sending targeted marketing campaigns to users
Source: Flipkart Commerce Graph - Evaluation of Graph Data Stores:
https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd
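
One simple way to relate users with similar shopping patterns on a user-product graph is to compare the sets of products they interacted with. The Python sketch below uses Jaccard similarity for that comparison. The similarity measure, the threshold and the sample interactions are assumptions chosen for illustration; the source does not say which technique Flipkart uses.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction edges: (user, product).
interactions = [
    ("u1", "book"), ("u1", "laptop"),
    ("u2", "book"), ("u2", "laptop"), ("u2", "phone"),
    ("u3", "shoes"),
]

# Adjacency view of the bipartite graph: user -> set of products.
products_by_user = defaultdict(set)
for user, product in interactions:
    products_by_user[user].add(product)


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(threshold=0.5):
    """Return (user, user, score) for every pair at or above the threshold."""
    pairs = []
    for u, v in combinations(sorted(products_by_user), 2):
        score = jaccard(products_by_user[u], products_by_user[v])
        if score >= threshold:
            pairs.append((u, v, score))
    return pairs
```

Comparing every pair of users is quadratic, so this form only suits small samples; at the graph sizes quoted above a store-side traversal or an approximate method would be needed.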

LinkedIn Knowledge Graph
 The LinkedIn knowledge graph is a large knowledge base built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical
locations, schools, etc. These entities and the relationships among them form the ontology of the professional world and are used by LinkedIn to enhance its
recommender systems, search, monetization and consumer products, and business and consumer analytics.
 We build the LinkedIn knowledge graph from a large amount of user-generated content from members, recruiters, advertisers and company administrators, who are not aware of
their participation in building the knowledge graph, and supplement it with data extracted from the Web, which is noisy and can have duplicates. The knowledge graph needs to
scale as new members register, new jobs are posted, and new companies, skills and titles appear in member profiles, etc.
 We apply machine learning techniques to solve the challenges of building the LinkedIn knowledge graph, which is essentially a process of data standardization on
user-generated content and external data sources, in which machine learning is applied to:
entity taxonomy construction,
entity relationship inference,
data representation for downstream data consumers,
knowledge penetration (insights) from graph, and
active data acquisition from users to validate our inference and collect training data.
 The LinkedIn knowledge graph is a dynamic graph. New entities are added to the graph and new relationships are formed continuously. Existing relationships can also change.
For example, the mapping from a member to her current title changes when she has a new job. We need to update the LinkedIn knowledge graph in real time upon member profile
changes and the emergence of new entities.
Source: Building the LinkedIn Knowledge Graph:
https://engineering.linkedin.com/blog/2016/10/building-the-linkedin-knowledge-graph

Semantic Web Technologies
The term “Semantic Web” was coined by Tim Berners-Lee for a web of data
(or data web)[6] that can be processed by machines[7]—that is, one in which
much of the meaning is machine-readable. While its critics have questioned its
feasibility, proponents argue that applications in library and information science,
industry, biology and human sciences research have already proven the validity
of the original concept.[8]
Berners-Lee originally expressed his vision of the Semantic Web as follows:
I have a dream for the Web [in which computers] become capable of analyzing
all the data on the Web – the content, links, and transactions between people
and computers. A "Semantic Web", which makes this possible, has yet to
emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy
and our daily lives will be handled by machines talking to machines. The
"intelligent agents" people have touted for ages will finally materialize.[9]

Semantic Web Technologies
The term "Semantic Web" is often used more specifically to refer to the formats and technologies that enable it.[5] The collection, structuring and recovery of linked
data are enabled by technologies that provide a formal description of concepts, terms, and relationships within a given knowledge domain.
These technologies are specified as W3C standards and include:
Resource Description Framework (RDF), a general method for describing information
RDF Schema (RDFS)
Simple Knowledge Organization System (SKOS)
SPARQL, an RDF query language
Notation3 (N3), designed with human-readability in mind
N-Triples, a format for storing and transmitting data
Turtle (Terse RDF Triple Language)
Web Ontology Language (OWL), a family of knowledge representation languages
Rule Interchange Format (RIF), a framework of web rule language dialects supporting rule interchange on the Web
 RDF is a simple language for expressing data models, which refer to objects ("web resources") and their relationships. An RDF-based model can be represented
in a variety of syntaxes, e.g., RDF/XML, N3, Turtle, and RDFa. RDF is a fundamental standard of the Semantic Web.[25][26]
 RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized-hierarchies of such
properties and classes.
 OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"),
equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
 SPARQL is a protocol and query language for semantic web data sources.
 RIF is the W3C Rule Interchange Format. It is an XML language for expressing Web rules that computers can execute. RIF provides multiple versions, called
dialects. It includes a RIF Basic Logic Dialect (RIF-BLD) and a RIF Production Rules Dialect (RIF-PRD).
An ontology in OWL is made up of statements about Classes (i.e., sets of things) and Properties (ways that things relate to other things).
Classes and Properties are Concepts.
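
To make the class hierarchy semantics of RDF Schema concrete, the Python sketch below applies two well-known RDFS entailment patterns to a small set of triples: `rdfs:subClassOf` is transitive, and an instance of a class is also an instance of its superclasses. The `ex:` names and the function name are hypothetical, and triples are plain string tuples rather than objects from an RDF library.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Repeat the two subclass rules until no new triple is produced."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only pairs of the form (?, ?, C) and (C, subClassOf, D) match.
                if p2 != SUBCLASS_OF or o != s2:
                    continue
                if p == SUBCLASS_OF:
                    new.add((s, SUBCLASS_OF, o2))   # transitivity
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))      # type propagation
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Hypothetical example data.
triples = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
}
closure = rdfs_closure(triples)
```

The closure contains the three asserted triples plus three inferred ones, including that `ex:alice` is an `ex:Agent`. OWL reasoners cover many more constructs (disjointness, cardinality, property characteristics) than this two-rule loop.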

Semantic Web Technologies - Wikidata
Source: Wikidata: https://www.wikidata.org/wiki/Wikidata:Main_Page

Semantic Web Technologies - Wikidata

How to build Knowledge Graphs?

Example Content to be represented in different graph storage models
We consider the following Graph Data Storage Models:
 Labelled Property Graph (LPG)
 Resource Description Framework (RDF)
 Specialized Data Storage Model
The left-hand side (LHS) shows a Summary Content Card for a well-defined search
query on Google. We will use this as an example to show the
differences in modelling between LPG and RDF graphs.
Source: Flipkart Commerce Graph – Evaluation of Graph Data Stores: https://tech.flipkart.com/flipkart-commerce-graph-evaluation-of-graph-data-stores-8fe0f964affd

Labelled Property Graph (LPG) Stores
The LPG storage model is extensible and is used when the requirements are
extensive storage and the ability to query the stored data.
An LPG store has the following characteristics:
 The vertices (nodes) have a uniquely identifiable ID and a set of
key-value pairs, or properties, that characterise them.
 The edges (relations) have an ID, a type and a set of key-value
pairs, or properties, that characterise the connections.
The LHS labelled property graph depicts the Summary Card illustration.
The LPG has certain elements in common with the RDF graph, along with
the following distinctions:
 Nodes have an internal structure. Example: the node ‘v-wall’ has
properties such as name and genre that characterise the content stored in
the node.
 Attribute values are not vertices in the graph. Example: the node
‘v-floyd’ has properties such as name that characterise the content
stored in the node.
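
A labelled property graph with these characteristics can be sketched in Python as below. The node IDs ‘v-wall’ and ‘v-floyd’ and the property keys come from the slide; the property values, the edge ID, the edge type and its property are placeholders assumed for illustration, since the slide's figure is not reproduced in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    type: str
    src: str
    dst: str
    properties: dict = field(default_factory=dict)


# Property values below are assumed placeholders.
vertices = {
    "v-wall": Vertex("v-wall", {"name": "The Wall", "genre": "Progressive rock"}),
    "v-floyd": Vertex("v-floyd", {"name": "Pink Floyd"}),
}

# Hypothetical edge: ID, type and property are not taken from the slide.
edges = [
    Edge("e-1", "ARTIST", "v-wall", "v-floyd", {"role": "performer"}),
]


def out_neighbors(vertex_id, edge_type):
    """Vertices reached from `vertex_id` over edges of the given type."""
    return [vertices[e.dst] for e in edges
            if e.src == vertex_id and e.type == edge_type]
```

Note that the genre value lives inside the ‘v-wall’ node as a property; it is not a vertex of its own, which is the main structural difference from the RDF model.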

Resource Description Framework (RDF) Store
The RDF storage model is mostly used in solutions that exchange data
and metadata. At its core is the notion of a ‘triple’: a
statement composed of three elements that represent two vertices
connected by an edge. The elements are called subject, predicate and object.
 The subject is a resource, i.e. a node in the graph
 The predicate represents an edge, i.e. a relationship
 The object is another node or a literal value. From the RDF
perspective, a literal is another vertex.
Resources (vertices/nodes) and relationships (edges) are identified by
a URI, which is a unique identifier.
The LHS RDF graph depicts the Summary Card illustration.
An RDF store creates an atomic decomposition of the data, so the graph
contains nodes that are resources as well as nodes that are literal values.
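
The subject-predicate-object decomposition can be illustrated with plain Python tuples. This sketch does not use an RDF library. The `http://example.org/` namespace, the predicate names and the literal values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


EX = "http://example.org/"  # hypothetical namespace

wall = URI(EX + "wall")
floyd = URI(EX + "floyd")

# Each statement is one (subject, predicate, object) triple.
triples = {
    (wall, URI(EX + "name"), Literal("The Wall")),
    (wall, URI(EX + "artist"), floyd),
    (floyd, URI(EX + "name"), Literal("Pink Floyd")),
}


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def nodes():
    """Every subject and object is a vertex, including literal values."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}
```

Here `nodes()` returns four vertices: the two resources and the two name literals. In the property graph model the same names would be properties stored inside two nodes.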

Comparison

Grakn.ai
Source: GRAKN.AI: The Hyper-Relational Database for knowledge oriented systems:
https://www.slideshare.net/GraknLabs/graknai-the-hyperrelational-database-for-knowledgeoriented-systems-80937069

Grakn.ai
Source: Logical inference in a Hyper-Relational Database:
https://www.slideshare.net/GraknLabs/logical-inference-in-a-hyperrelational-database-80936775

Why use Knowledge Graphs?
Source: Data Centric Architecture in Business Transformation using Knowledge Graphs:
https://www.slideshare.net/AlanMorrison/datacentric-business-transformation-using-knowledge-graphs

Data Models
Relational
 A database model that manages data as tuples grouped into relations (tables)
 Arranges data in tables
 Represents both “one to many” and “many to many” relationships
 Easier to access data
 Flexible
Hierarchical
 A structure of data organized in a tree-like model using parent-child relationships
 Arranges data in a tree-like structure
 Represents “one to many” relationships
 Difficult to access data
 Less flexible
 Each child node can have only 1 parent
Network
 A database model that allows multiple records to be linked to the same owner file
 Organizes data in a graph structure
 Represents “many to many” relationships
 Easier to access data
 Flexible
 A child node can have multiple parents

Artificial Intelligence – Five Tribes of ML
 Symbolists – view learning as the inverse of deduction and take ideas
from philosophy, psychology and logic
Master algorithm – Inverse deduction
 Connectionists – reverse engineer the brain and are inspired by
neuroscience and physics
Master algorithm – Back propagation
 Evolutionaries – simulate evolution on the computer and draw on
genetics and evolutionary biology
Master algorithm – genetic programming
 Bayesians – believe learning is a form of probabilistic inference and have
their roots in statistics
Master algorithm – Bayesian inference
 Analogizers – learn by extrapolating from similarity judgements and are
influenced by psychology and mathematical optimization
Master algorithm – support vector machine
Source: The five tribes of machine learning https://medium.com/@jrodthoughts/the-five-tribes-of-machine-learning-c74d702e88da
