2. Can you make your decisions on
just 20% of your data?
◦ According to IDC Research, less than 20%
of an enterprise’s information is in the form of
structured data which can reside neatly in traditional
columnar relational databases
◦ 80% of information is unstructured or
semi-structured, in the form of documents, web pages,
emails, images and videos, which are growing
at a tremendous rate
◦ Current Enterprise Systems of Record
(ERPs, CRMs) capture a minuscule amount of the
information generated within an Enterprise
◦ However, the Systems of Record remain the main
focus of the IT team and the main source of
information for the enterprise leadership
3. Unstructured Data creates
Enterprise Information
Management Challenges
• Information is scattered and inaccessible
• Spread across documents, spreadsheets, emails
• Data is stored in multiple, often incompatible
formats
• Data sources are not linked
• No documented relationships between pieces of
information
• No easy way to harness data from external sources
including social networks
• Information is hard to understand
• Different terminology and vocabularies
4. How do employees create and
share information? Through
Systems of Engagement
These systems are the primary way
employees in an Enterprise
communicate and share information,
namely
◦ Email
◦ IM (Lync)
◦ Social collaboration tools like Yammer, Tibbr,
Jive
Not surprisingly, these systems
generate unstructured data at high
5. Systems of Engagement
Loosely structured knowledge flows
Conversational
Dynamic and in flux
6. How does the industry extract information
from unstructured text? Google
Knowledge Graph
The Google Knowledge Graph provides “Things not just
Strings”, that is, it enhances its search results with semantic
information gathered from multiple sources. It provides
structured information about entities and links to other related
entities. Its goal is to help people
• Find the right thing: Find the right entity, understand the
difference between Taj Mahal the monument and Taj Mahal
the musician
• Get the best summary: Summarize relevant content
related to the entity, key facts and other related entities
• Go deeper and broader: Help make unexpected discoveries
and relationships
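The "find the right thing" idea can be illustrated with a minimal sketch: two entities share the label "Taj Mahal" but differ in type, so a lookup can be narrowed by type. The `Entity` class, the identifiers and the `find_entities` helper are hypothetical names chosen for this illustration; they are not part of any Google API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str            # hypothetical identifier scheme
    label: str                # the "string" a user types
    entity_type: str          # the kind of "thing" it refers to
    facts: dict = field(default_factory=dict)
    related: list = field(default_factory=list)


# Two distinct things that share one string.
ENTITIES = [
    Entity("e1", "Taj Mahal", "Monument", {"location": "Agra, India"}, ["Agra"]),
    Entity("e2", "Taj Mahal", "Musician", {"genre": "Blues"}, ["Blues"]),
]


def find_entities(label, entity_type=None):
    """Return entities whose label matches, optionally narrowed by type."""
    matches = [e for e in ENTITIES if e.label.lower() == label.lower()]
    if entity_type is not None:
        matches = [e for e in matches if e.entity_type == entity_type]
    return matches
```

Calling `find_entities("Taj Mahal")` returns both entities, while `find_entities("Taj Mahal", "Musician")` returns only the musician, together with its key facts and related entities.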
7. How does an Enterprise extract
information from these Systems of
Engagement? Enter the Enterprise
Knowledge Graph (EKG)
Along the lines of the Google Knowledge Graph, the EKG aims to help
enterprises extract and explore information created by systems of
engagement. Core EKG concepts are:
• Knowledge Capture: Extract key concepts and relationships from
unstructured documents using an Enterprise Ontology. This allows
concept based indexing of content
• Example: An employee submits a trip report in the form of an email. EKG automatically extracts
the Who, What, When and Where information and links it to other relevant resources.
• Knowledge Discovery: Search multiple data sources for information
using a relevant Enterprise Ontology
• Example: A proposal manager can ask, “Who has background information about
the Army CIO/G6?”
• Knowledge Exploration: Expose information to a host of graphical tools
to visualize and further analyze relationships between data
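Knowledge Capture and Knowledge Discovery can be sketched together in a few lines, assuming a toy ontology that maps known terms to concept types. Everything here is illustrative: the `ONTOLOGY` contents, the people's names, the sample date and both function names are hypothetical, and a real system would use proper entity extraction rather than substring matching.

```python
import re

# Hypothetical mini-ontology: surface term -> concept type.
ONTOLOGY = {
    "Alice Example": "Person",
    "Bob Sample": "Person",
    "Army CIO/G6": "Organization",
    "Semantic Technology Symposium": "Event",
}

# Only ISO-style dates are recognized, to keep the sketch short.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def capture(text):
    """Extract Who / What / When / Where facts from one report."""
    found = {"Person": [], "Organization": [], "Event": []}
    for term, concept in ONTOLOGY.items():
        if term in text:
            found[concept].append(term)
    return {
        "who": found["Person"],
        "what": found["Event"],
        "when": DATE_PATTERN.findall(text),
        # Simplification: the organization visited stands in for "where".
        "where": found["Organization"],
    }


def who_has_information_about(topic, reports):
    """Return people named in any report that also mentions the topic."""
    people = set()
    for report in reports:
        facts = capture(report)
        if topic in facts["where"] or topic in facts["what"]:
            people.update(facts["who"])
    return sorted(people)
```

Given a report such as "Trip report: Alice Example met the Army CIO/G6 team on 2013-05-14.", `capture` yields the who, when and where facts, and `who_has_information_about("Army CIO/G6", reports)` answers the proposal manager's question from the captured facts.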
8. How is the EKG seeded?
Crowd-source the creation
The major source of information generation in an enterprise
is email. The process to seed the EKG with email would
be:
◦ The sender copies their email to a monitored EKG email
mailbox
◦ The EKG parses, analyzes and adds the extracted facts
to the Knowledge Graph
◦ The EKG then sends an automated email back to the
sender, describing the facts and a link to correct the
extracted information
Start with a specific Ontology geared towards a high value
use case and then build out the entities and their
relationships
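The seeding steps above can be sketched with Python's standard `email` package. This is a sketch under stated assumptions, not the EKG implementation: the mailbox address, the correction URL, the `KNOWN_TERMS` stand-in for the ontology and the "mentions" predicate are all hypothetical, and fact extraction is reduced to substring matching.

```python
from email import message_from_string
from email.message import EmailMessage

EKG_MAILBOX = "ekg@example.com"                 # hypothetical monitored address
CORRECTION_URL = "https://ekg.example.com/fix"  # hypothetical correction page

# Stand-in for a use-case-specific ontology: term -> concept type.
KNOWN_TERMS = {"Army CIO/G6": "Organization"}

# The knowledge graph, reduced to (subject, predicate, object) triples.
graph = []


def process_incoming(raw_message):
    """Parse one copied email, store extracted facts, and build the reply."""
    msg = message_from_string(raw_message)
    sender = msg["From"]
    body = msg.get_payload()  # a plain string for a non-multipart message

    # Step 2: parse, analyze, and add extracted facts to the graph.
    facts = [(sender, "mentions", term) for term in KNOWN_TERMS if term in body]
    graph.extend(facts)

    # Step 3: describe the facts back to the sender with a correction link.
    reply = EmailMessage()
    reply["From"] = EKG_MAILBOX
    reply["To"] = sender
    reply["Subject"] = "EKG extracted facts: " + (msg["Subject"] or "")
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    lines.append(f"Correct these facts at {CORRECTION_URL}")
    reply.set_content("\n".join(lines))
    return reply
```

The function returns the reply rather than sending it, so the mail transport (for example SMTP) stays outside the sketch.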
9. Benefits of adding email to the
EKG
◦ Bigger insights: the EKG can leverage the
collective interactions of all employees
(not just the respondent), and each
subsequent interaction enriches the
EKG, allowing even more questions to be
answered
◦ Liberate employee knowledge, expertise
and interactions from the mailbox and
make them available for the enterprise to
leverage
10. EKG Benefits
• Utilize all available knowledge sources
• Allows documents, spreadsheets and emails to serve as “top
level” information sources
• Integration
• Ties disconnected pieces of data together into meaningful
wholes that provide a basis for planning and decision making
• Meaning-Centric
• Facts around an object or an entity can be easily explored
• Search phrases are better “understood” as they are based
upon concepts and not literals
• Serendipity
• Related searches allow the formerly “unknown” to be
discovered
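The "concepts and not literals" point can be illustrated with a small concept-based index, assuming a vocabulary that maps different surface terms to one canonical concept. The `TERM_TO_CONCEPT` mapping, the concept names and the function names are hypothetical and chosen only for this sketch.

```python
# Hypothetical vocabulary: each surface term maps to one canonical concept.
TERM_TO_CONCEPT = {
    "army cio/g6": "ARMY_CIO_G6",
    "cio/g6": "ARMY_CIO_G6",
    "semantic technology": "SEMANTIC_TECH",
    "semantic web": "SEMANTIC_TECH",
}


def concepts_in(text):
    """Return the set of concepts whose terms occur in the text."""
    lowered = text.lower()
    return {c for term, c in TERM_TO_CONCEPT.items() if term in lowered}


def build_index(documents):
    """Index documents by concept instead of by literal word."""
    index = {}
    for doc_id, text in documents.items():
        for concept in concepts_in(text):
            index.setdefault(concept, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents sharing at least one concept with the query."""
    hits = set()
    for concept in concepts_in(query):
        hits |= index.get(concept, set())
    return sorted(hits)
```

With this index, a search for "semantic technology" finds a document that only says "semantic web", because both phrases resolve to the same concept; a literal keyword match would miss it.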
11. How we discover information within an
Enterprise today
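[Diagram: To answer “Who has information about the Army CIO/G6?”, the proposal manager today runs separate searches (labeled A through D) against the Resume System, the Opportunity Management System, the CRM, and the social network and web, which are grouped into Systems of Record and Systems of Engagement. The facts shown as scattered across these sources include people (Sumeet Vij, Cliff Daus), the DoD SOA & Semantic Technology Symposium, a trip report, a follow-on meeting, a demonstration at CIO/G6, a customer and the topic Semantic Technologies, connected by relationships such as “presented at”, “attended” and “employee of”.]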
12. Knowledge Discovery using EKG
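[Diagram: Knowledge Capture: trip reports, meeting minutes and similar content arrive by web submission or email submission (submitter shown: Sumeet Vij), are parsed, and pass through entity extraction to update the knowledgebase. Knowledge Discovery: the proposal manager asks “Who has information about the Army CIO/G6?”; the EKG determines the sources for the information and queries them. Sources shown: Knowledgebase, Resume System, Opportunity Management System, CRM.]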
13. Conceptual EKG Architecture
• An open architecture composed of reusable
open source components
• User Interface Layer
◦ Document Upload
◦ Knowledge Browser
◦ Query UI
• Semantic Processing Layer
◦ Entity Extraction
◦ Data Source Catalog
◦ Concept Catalog
• Integration Layer
◦ E-Mail Connector
◦ Database Connector
◦ Web Services Client
• Persistence Layer
◦ NoSQL Store
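One way to picture how the layers fit together is the sketch below, which wires an integration-layer connector, a semantic-processing extractor and a persistence-layer store into one ingest loop. All class and function names are hypothetical, the in-memory connector stands in for the e-mail, database and web service connectors, and a dictionary stands in for the NoSQL store; the slides do not specify any of these interfaces.

```python
from typing import Iterable, Protocol


class Connector(Protocol):
    """Integration layer: yields raw documents from one source."""

    def fetch(self) -> Iterable[str]: ...


class InMemoryConnector:
    """Stand-in for an e-mail, database or web service connector."""

    def __init__(self, documents):
        self._documents = list(documents)

    def fetch(self):
        return iter(self._documents)


class EntityExtractor:
    """Semantic processing layer: matches text against a concept catalog."""

    def __init__(self, concept_catalog):
        self._catalog = concept_catalog  # term -> concept type

    def extract(self, text):
        return [(term, kind) for term, kind in self._catalog.items() if term in text]


class EntityStore:
    """Persistence layer: a dict stands in for the NoSQL store."""

    def __init__(self):
        self._by_entity = {}

    def add(self, entity, document):
        self._by_entity.setdefault(entity, []).append(document)

    def documents_for(self, entity):
        return list(self._by_entity.get(entity, []))


def ingest(connectors, extractor, store):
    """Pull documents from every connector and index them by entity."""
    for connector in connectors:
        for document in connector.fetch():
            for term, _kind in extractor.extract(document):
                store.add(term, document)
```

A query UI would then call `store.documents_for("Army CIO/G6")` to list the documents that mention that entity. Keeping each layer behind a small interface is what lets individual components be swapped for other open source parts.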