2. The Company & The Opportunity
We believe there is huge business value locked away in content, because content
contains the majority of the organization's intelligence. Organizations that
unlock that value can outperform competitors in their market. Content
augments the value that most organizations have already found in structured data
- it is the untapped frontier of competitive advantage.
To realize this value we know you must understand your content, the information
and knowledge it contains, how it can be applied in the context of your
operations and how it enhances the insights from structured data. The content
must be described completely and consistently with metadata.
We focus all our energy on creating value from unstructured content – something
we call Content Intelligence.
Jeremy Bentley, CEO
4. Search “Maturity” - why am I searching?
Search Volume
Search Value
Document Search Subject Search
“Eureka” Search
€0
€1M?
SEARCH
REFINERS
ENTITIES
CLUSTERING
RELATIONSHIPS
FACT EXTRACTION
5. • Digital universe is growing
dramatically
• Most of this information is
unstructured
• Only a small fraction of the
digital universe has been
explored for analytical
value
• Valuable knowledge and
relationships are hidden in
this data
The Challenges … Part 1
Relentless growth in content volumes ……
7. The Challenges …. Part 3
The proliferation of systems within and beyond the “Hyper-Connected Enterprise”
is creating a HUGE range of sources of ‘content’:
• File shares
• DMS
• ECMS
• ERP
• HR (HCM)
• Finance & Legal
• Email
• Knowledge Base
• CRM
• SFA
• Twitter (etc.)
• LIMS (e.g.)
• Call Centre Logs
Board & Meeting Minutes, Engineers’ Reports, H&S Audits (and Actions), Annual Appraisals, Processes &
Procedures, Newsletters, Marketing & Product Materials, Maintenance & Repair Manuals, Contracts, Letters of
Credit, Insurance Policies, Supply Chain Information, Strategy Documents, Business Plans, Annual Report,
Regulatory Submissions, Management Reports, Performance Management Documents, Grievance Procedure
Evidence ……..
8. Contextual metadata driven experience
User interfaces to leverage the ontologies to deliver the
richest experience for users when publishing, using and
analyzing content
Semaphore delivers these capabilities – enterprise scale
Build and manage semantic models
Simplify the ingestion, development or customization of
ontologies
Assisted and automated metadata enrichment
Automatically describe all your content with rich
metadata
What is Semaphore?
17. Executing a SOLR Search
The screenshot
shows the
Semaphore SAF
(Search Application
Framework) front-end
where the user
wants to search
NASA content for
information on the
“moon buggy”.
The search box is
prompting with
suggestions from the
model, but we’re
going to ignore these
to illustrate the
benefits of using
Semaphore to
enhance Solr search.
18. Standard Search Results
Standard SOLR Search Results
You can see there are
over 2000 results.
The standard SOLR
method for joining two
words is to use an
‘OR’; as a result,
the majority of
results mention
“Moon” but are not
about the “Moon
Buggy”.
19. More sophisticated searching
still doesn’t get better results
A more
knowledgeable user
might search for the
phrase “moon buggy”
which is likely
to return
more relevant results,
but may not return
ALL the relevant
results as there may
be other ways to
describe this item.
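The difference between the two searches above can be sketched as query parameters. This is only an illustration: the endpoint URL and the default field name are assumptions that do not appear in the slides, while `q`, `q.op` and `df` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and default search field (not taken from the slides).
SOLR_SELECT = "http://localhost:8983/solr/nasa/select"
DEFAULT_FIELD = "text"


def solr_url(query: str, operator: str = "OR") -> str:
    """Build a select URL using Solr's q, q.op and df parameters."""
    params = {"q": query, "q.op": operator, "df": DEFAULT_FIELD}
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Two bare words joined with OR: any document mentioning "moon" matches.
loose = solr_url("moon buggy")

# Both words required, but they may appear anywhere in the document.
strict = solr_url("moon buggy", operator="AND")

# Quoted phrase: the words must appear together, so documents that only
# say "Lunar Roving Vehicle" or "LRV" are still missed.
phrase = solr_url('"moon buggy"')
```

None of the three variants knows about alternative names for the same thing, which is the gap the semantic model is meant to fill.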
20. Standard Search Results
Ontology Driven Widgets provide “Did You Mean?”
Each set of results
includes some
suggested terms,
extracted from the
Semantic Model using
a process called
“Concept Mapping”.
The most common use
for this is to provide a
“Did You Mean” panel.
The user can hover
over terms and see
information such as a
description and images
surfaced from the
Model. In this case, the
user has selected the
preferred-term of
“Lunar Roving Vehicle”
as the picture matches
what they call the
“Moon Buggy”.
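A “Did You Mean” lookup of this kind can be sketched as a mapping from alternative labels to preferred terms. This is a toy illustration, not Semaphore's Concept Mapping: the `Concept` class, the `did_you_mean` function and the second model entry are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One term in a toy semantic model (hypothetical structure)."""
    preferred: str
    alt_labels: list[str] = field(default_factory=list)


# Illustrative model fragment; a real model would be far richer.
MODEL = [
    Concept("Lunar Roving Vehicle", ["LRV", "Moon Buggy", "Geological Rover"]),
    Concept("Lunar Module", ["LM"]),
]


def did_you_mean(query: str, model: list[Concept] = MODEL) -> list[str]:
    """Return preferred terms whose alternative labels match the query."""
    wanted = query.casefold().strip()
    return [
        concept.preferred
        for concept in model
        if wanted in (label.casefold() for label in concept.alt_labels)
    ]


suggestions = did_you_mean("moon buggy")
```

A query that is already a preferred term returns no suggestions, since there is nothing to redirect the user to.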
21. Model Assisted Search Result
Search Results enhanced by Semantic Model
In this case, the user has
selected the preferred term
“Lunar Roving Vehicle”
(either when prompted in
the search box or via the
“Did you mean” panel).
The search engine is now
returning the 59 results that
were categorised as being
relevant to the Lunar
Roving Vehicle, using the
rules built automatically
from the Semantic Model,
using as evidence the term,
its acronyms (‘LRV’), its
synonyms (such as ‘Moon
Buggy’) and the context of
the related missions
(Apollo 15, 16 and 17).
Results returned in this
type of search will be more
relevant, as the match is
determined by a linguistic
analysis of the content –
not by a search algorithm.
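The idea of classifying on evidence (the term, its acronym, its synonyms and the surrounding context) can be sketched as a weighted rule. The weights, the threshold and the function names below are assumptions chosen for illustration; they are not the rules Semaphore generates.

```python
import re

# Hypothetical evidence weights: strong for the term, its acronym and a
# synonym; weak for contextual mentions of the related missions.
EVIDENCE = {
    "lunar roving vehicle": 1.0,
    "lrv": 0.8,
    "moon buggy": 0.8,
    "apollo 15": 0.3,
    "apollo 16": 0.3,
    "apollo 17": 0.3,
}
THRESHOLD = 0.8  # assumed cut-off for "this document is about the LRV"


def score(text: str) -> float:
    """Sum the weights of every evidence phrase found as whole words."""
    lowered = text.casefold()
    total = 0.0
    for phrase, weight in EVIDENCE.items():
        if re.search(rf"\b{re.escape(phrase)}\b", lowered):
            total += weight
    return total


def is_about_lrv(text: str) -> bool:
    """Classify a text against the single toy concept."""
    return score(text) >= THRESHOLD
```

Under these assumed weights a mission name alone is not enough to classify a document, while a synonym is, and context only adds to the score.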
22. Search refiners augment the
Semantic Model
The search results
page includes
refiners, populated
from document
metadata which can
be obtained from the
document itself, or by
classification against
the Semantic Model.
These refiners can be
used to supplement
the Semantic Model,
for example you could
use an author refiner
to identify experts on
the subject that you
are researching.
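Refiner values are essentially counts over the metadata of the current result set. A minimal sketch, assuming each result is a dictionary of metadata fields; the field names and author names are placeholders, not data from the demo:

```python
from collections import Counter

# Placeholder result set; in practice this metadata would come from the
# search index.
results = [
    {"title": "Crew debrief", "author": "Author A", "doc_type": "Transcript"},
    {"title": "Rover checkout", "author": "Author A", "doc_type": "Report"},
    {"title": "Traverse plan", "author": "Author B", "doc_type": "Report"},
]


def refiner_counts(docs: list[dict], field_name: str) -> list[tuple[str, int]]:
    """Count how often each value of a metadata field occurs, most common first."""
    counts = Counter(doc[field_name] for doc in docs if field_name in doc)
    return counts.most_common()


authors = refiner_counts(results, "author")
```

Sorting by count is what lets an author refiner double as a rough expert finder: the most prolific authors on the subject appear first.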
24. Document for Categorisation
This slide shows how you
can apply the Semantic
Model to documents (in this
case a transcription of an
Apollo crew de-brief) to
automatically identify the
areas of the model that are
discussed in this document.
These items are stored as
various items of metadata,
in this case when the
document is uploaded to
SharePoint, although
Semaphore integrates with
many other systems.
Semaphore has also
identified the type of
document, and this can be
used to drive additional
workflow such as
compliance etc.
Lastly, the Model is
interactive – document
authors can browse the
model for relevant terms, or
use search-as-you-type.
25. Entity Extraction
In this example a
document (taken from
Wikipedia) is not only
being categorised for
Subject (in this case
topics from a civilian
government
taxonomy) but
Semaphore is also
extracting
Organisations and
People found in the
document using
Natural Language
Processing, names
that can be included
as Metadata even
though they aren’t
part of the Semantic
Model.
26. Fact Extraction
In this example
Semaphore is being
used to process legal
documents to
automatically extract
key pieces of
information such as
Party names,
amounts, terms and
conditions etc. Where
these items can be
extracted explicitly
they can be stored as
metadata properties;
where they cannot be
extracted explicitly,
the clauses referring
to these items can be
stored for manual
processing.
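The two outcomes described here (store an explicit value as metadata, otherwise keep the clause for manual processing) can be sketched with a simple pattern match. The regular expression, the keyword list and the output structure are assumptions for illustration; real fact extraction relies on linguistic analysis rather than a single pattern.

```python
import re

# Assumed pattern for monetary amounts such as "US$1,500,000" or "€250.50".
AMOUNT = re.compile(r"(?:US\$|\$|€|£)\s?\d+(?:,\d{3})*(?:\.\d+)?")

# Assumed cue words suggesting a clause talks about money without stating it.
# These are matched as plain substrings, which is crude but enough for a sketch.
KEYWORDS = ("amount", "fee", "price")


def extract_amounts(clauses: list[str]) -> dict[str, list[str]]:
    """Split clauses into explicit amounts and clauses needing manual review."""
    metadata: dict[str, list[str]] = {"amounts": [], "review": []}
    for clause in clauses:
        found = AMOUNT.findall(clause)
        if found:
            metadata["amounts"].extend(found)
        elif any(word in clause.casefold() for word in KEYWORDS):
            metadata["review"].append(clause)
    return metadata


contract = [
    "The Buyer shall pay US$1,500,000 on completion.",
    "The fee shall be as agreed in Schedule 2.",
    "This agreement is governed by English law.",
]
facts = extract_amounts(contract)
```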
28. Model; High Level Concepts
Browsing the Semantic Model
Semaphore provides
a collaborative
environment for
managing semantic
models, capitalising
on subject matter
experts within an
organisation.
This illustration shows
the Semaphore
Workbench being
used to browse the
NASA Model. The user
can choose to browse
by top-level category,
or can type a search,
which will be matched
to terms in the model.
29. Concept Relationships (Collaboration Tool)
Term information
The Semaphore
workbench shows
how each term fits
into the model,
including related
terms, synonyms and
term properties. All
this information can
be used in document
categorization and in
search enhancement
as illustrated in this
presentation.
30. Obtaining feedback
The Semaphore
Workbench also
allows collaboration:
subject matter experts
can contribute to the
quality of the
Semantic Model by
suggesting additional
terms, synonyms and
related terms.
31. The Value of a Semantic Solution
Our clients describe the value they derive in a number of ways; here are just three:
Cost Efficiency:
One organisation, which has a largely engineering and scientific workforce, indicates that it saves the
equivalent of c. US$700 per employee per year due to the reduction in time taken to find the right
content from across many content repositories ($700/$45 (hourly salary) = 15.5 hours/year saved =
19 minutes/week saved). With over 10,000 employees, that is equivalent to more than US$7 million per year.
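The arithmetic behind these figures can be checked with a few lines. The per-employee saving, hourly salary and headcount come from the slide; the number of working weeks is not stated, so 48 is an assumption (it is the value that reproduces roughly 19 minutes per week).

```python
SAVING_PER_EMPLOYEE = 700   # US$ per employee per year (from the slide)
HOURLY_SALARY = 45          # US$ per hour (from the slide)
EMPLOYEES = 10_000          # lower bound; the slide says "over 10,000"
WORKING_WEEKS = 48          # assumption: not stated in the slide

hours_saved = SAVING_PER_EMPLOYEE / HOURLY_SALARY      # about 15.6 hours/year
minutes_per_week = hours_saved * 60 / WORKING_WEEKS    # about 19 minutes/week
total_saving = SAVING_PER_EMPLOYEE * EMPLOYEES         # US$ per year

print(f"{hours_saved:.1f} hours/year, {minutes_per_week:.0f} minutes/week")
print(f"US${total_saving:,} per year across {EMPLOYEES:,} employees")
```

Spread over 52 calendar weeks instead, the same saving works out at about 18 minutes per week.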
Cost Savings:
Another organisation calculated the cost of classifying documents manually at US$3 per document
(based on staff costs, office space, etc). With over 500,000 documents needing to be classified the
Return on Investment was tenfold – and would continue to increase as more documents are produced.
They also reported that the quality and consistency of auto-classification were significantly better than
those of human-classified content.
Risk Reduction:
Financial Services companies that cannot prove compliance with a host of regulations are being fined
millions of Euros/Pounds/Dollars. One reason they cannot prove compliance is that the evidence they
need is lost or locked away in textual content, in a file-share or in a Content Repository, poorly
classified. Our semantic solution makes the evidence readily available and provides consistency over
time. Looking for the same evidence at a later date will still deliver the same results.
Thank you … and I am delighted to be here and have the opportunity to talk about – and show – how the “new-ish” world of semantic technologies, combined with Information Science, can aid “Discovery”.
I am humbled to be in a mainland European country and be delivering a presentation in my native tongue. However, of course, as an Englishman in Holland, it is impossible to converse in any other language – you are all too good!
As I am speaking in English I do sometimes go a little too fast (as I get excited). If I do, please shout, wave or throw things and I will attempt to slow down.
I am Paul Gunstone, the EMEA/APAC Sales Director for Smartlogic.
Smartlogic has been in this world of semantic technologies since 2006. We set out then to help organisations benefit from the business value locked away in their unstructured and semi-structured content. That is what we still do today; it is all we do.
In order to be able to realise the business value in ‘content’ you have to know what the content is, what it is about, what entities and facts it contains.
Our platform (Semaphore) uses a combination of a semantic domain model and NLP to read documents and surface the “aboutness”, the entities and the facts as a rich layer of metadata or as RDF triples.
We don’t keep copies of any of the content and there are no duplication or ETL processes; we simply process it as it is being ingested into some form of content store or is being re-indexed.
This is the last “marketing” slide but it illustrates that the company, whilst a UK company, has a global footprint and that the platform scales.
It is worth noting that this slide shows use cases in Financial Services, Government or Public Sector, Media, Life Sciences, more general manufacturing and Retail, Professional Services, Heavy Engineering.
Any industry, any organization that has large volumes of “content” will benefit from understanding what that content is and what information/data it contains. We call that Content Intelligence.
I just wanted to spend a moment or two understanding why people search for things and the value those searches deliver.
You can see at the bottom left of the curve individuals are performing Document Searches … “I need the latest version of the holiday application form”
This has been by far the greatest volume of searches conducted … but it is being caught up by Subject Searches … “I want to understand more about competitor X, give me all of the documents we have.” So not just the latest competitor analysis document but perhaps news or a report on the CEO or pricing information or the last deals won or lost against them. This is clearly likely to represent greater value to the enterprise if the search is fulfilled effectively.
Finally there is Eureka Search, a very, very small part (at the moment), largely because it has been too difficult … “I don’t know what I don’t know, I don’t know what I’m looking for, but show me entities in the content we have, show me relationships between those entities and let’s see if anything interesting emerges.”
This was the world of advanced analytics, OLAPs and cubes, where you had to define the fields you want to examine ahead of time and if you wanted to add or change a field you had to re-define the cube. With RDF Triples, Graph Databases (or Triple Stores) this is crashing into the world of Search.
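The contrast with a pre-defined cube can be sketched with a toy triple store. Everything here is illustrative: the predicate names and the `match` helper are assumptions, and a real deployment would use an RDF library or a graph database rather than a Python set.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Toy facts; predicate names are hypothetical.
TRIPLES: set[Triple] = {
    ("Lunar Roving Vehicle", "hasSynonym", "Moon Buggy"),
    ("Lunar Roving Vehicle", "usedOnMission", "Apollo 15"),
    ("Lunar Roving Vehicle", "usedOnMission", "Apollo 17"),
    ("Apollo 15", "isA", "Mission"),
}


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# A new kind of relationship is just another triple: nothing has to be
# re-defined before it can be stored and queried.
TRIPLES.add(("Apollo 15", "describedIn", "crew-debrief"))

missions = match(p="usedOnMission")
about_apollo_15 = match(s="Apollo 15")
```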
Why is it so difficult then?
Of the many reasons, I have picked three.
First is the inexorable growth in content (largely unstructured content). This will contain some very interesting “stuff”. It will also contain a vast amount of garbage … but unless you read it all, you won’t know what it does contain and you won’t be able to use the interesting “stuff”
Second is that we don’t all talk about things in the same way.
In this example – from the NASA domain model you can see that whilst I might talk about the Moon Buggy (and others might talk of the Geological Rover), the official NASA term is Lunar Roving Vehicle.
Now this one is quite easy because it is famous and public, but what about the issues in an Enterprise?
I was working with a large pharma only a few years ago that didn’t have a consistent term for each Country in which they operated. Fortunately they have spent many many millions on three MDM programmes in the years since then …. And they still don’t have a consistent term for each Country in which they operate. But three MDM vendors are happy!
My final example is just a reflection of the corporate world today where content is scattered across the inside of organisations in different systems, structured and organised in different ways. As organisations become part of bigger ecosystems the problem is compounded by the different ways that other companies do things.
How do you refer to the components of a “switch rail” – in manufacturing, in engineering, in marketing, in maintenance? How do the third parties you have to use occasionally refer to them? How do you reflect changes in all of the documentation when it is “owned” by so many different people in different departments?
How do you use the information you learned 5 years ago (and locked away in a closed project) in a problem that has occurred today?
Food for thought ….
So how does this Semantic thing help??
Our Semantic platform helps by reading the content and surfacing that rich layer of metadata that I spoke of earlier.
At the heart of it is a domain model (something that describes your universe