Optimize Enterprise Content with a Powerful Search Solution

Enterprise Search
8/12/2011 – Damien Dewitte

Enterprise Search
Setting the scene

Damien Dewitte
Lead ECM consultant

Contents

search
The enterprise search promise
Some thoughts on search scenarios
Make your content “findable”
Search: How it works
The enterprise search market


While on the Intranet …


the Enterprise Search promise


The Enterprise Search Promise
IDC 2001: “The High Cost of Not Finding Information”
Cost:
Poor decisions based on faulty or poor information
Duplicated efforts within different divisions/projects
Lost sales due to customers’ inability to find products and services
Lost productivity due to employees’ inability to find information


Google (2008)


The Invisible Intranet
Using search on an intranet usually leaves a huge portion of
existing valuable information ‘invisible’, because:
Some information silos are not indexed:
– Databases with structured content
– External sources
– Isolated departmental content repositories
– Individual desktops
– Content applications ‘in the cloud’
– Digital archives
Some information is “over-secured”
Some information is trapped in proprietary file formats, which cannot
be indexed
Some information cannot be extracted as text:
– Rich media files (audio, video)
– Badly scanned documents

[Figure: an Enterprise Search Platform behind several search applications (mail search, site search, DMS search, e-commerce search, corporate search, BI search, …), indexing structured, unstructured and real-time sources:]
Legacy Data (e.g. ISAM, VSAM, IMS)
RDBMS (JDBC, ODBC, SQLNet, DW, DM)
Files (e.g. Word, Excel, pdf, images, mp3)
WWW (HTML, XML, WML, JavaScript)
Direct Push
Applications (e.g. ERM, CRM, Help Desk)
DMS (e.g. M’Soft CMS, Documentum)
eMail Systems (e.g. Notes, Exchange)
Portals (e.g. WebSphere, WebLogic)
Private Webs (e.g. news feeds, Intranets)
Message Queues (e.g. TIBCO, MQ-Series)

“There’s no reason to expect that search is going to get that
much better. The basic algorithms by which search is
done have not improved much since about 1975.
The only way to improve the situation is by enhancing
search engines with more deterministic metadata.
If you look at the victory of Google, it wasn’t because they
had better search techniques. It’s because they deployed
one key metadata value – how many pages are linked to
this one – to enhance the relevancy of their results.
The same concepts need to be applied to the enterprise.”

(Tim Bray)


Some thoughts on search scenarios


Enterprise versus web search

                     Web                         Enterprise
Content              Mainly HTML and PDF         All formats and sources, including databases and legacy systems
Security             Focus on system security    Also restricting user access to specific content
Updates              Via (scheduled) crawling    Push updates to the index (near real time)
Volume               On average: 1000 files      Potentially: > 1.000.000 “records”
Metadata management  Centrally in e.g. Web CMS   Consolidate metadata from various source systems

Enterprise versus web search
Probably the cheapest website search you can find


Structured versus unstructured

Start by filtering
Start by typing

Search versus research
Search (“I know you’re out there..”):
“Meeting minutes social collaboration project”
“Amplexor proposal for Intranet”
“Timesheets april 2009”
Research (“Life is like a box of chocolates. You never know what you gonna …”):
“Ecm and Green IT in Europe”
“average time spent on searching for content”
“Does ECM have impact on governmental decisions in Spain?”

Search based on
Example query: “Meeting minutes social collaboration project”
Information Type (Meeting minutes, Proposal, Invoice, Timesheet, …)
Document Format (PDF, DOC, PPT, e-mail, …)
Organisational Source
Projects
Products
Processes
– HR
– Compliance
– Marketing
– IT
– …
…
Publication Date, Modification date
Author

Search queries are more or less predictable (after analysis)
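
A predictable query of this kind can be answered by filtering on metadata fields rather than by free-text matching. The following is a minimal sketch of that idea; the `Record` fields and the sample records are hypothetical and only mirror the criteria listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    title: str
    info_type: str    # e.g. "Meeting minutes", "Proposal", "Timesheet"
    doc_format: str   # e.g. "PDF", "DOC"
    source: str       # organisational source, e.g. a project name
    modified: date
    author: str


def filter_records(records, **criteria):
    """Keep the records whose metadata equals every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Hypothetical sample data.
RECORDS = [
    Record("Kick-off minutes", "Meeting minutes", "DOC",
           "Social collaboration project", date(2011, 3, 1), "A. Author"),
    Record("Intranet proposal", "Proposal", "PDF",
           "Intranet", date(2011, 5, 12), "B. Author"),
    Record("Steering minutes", "Meeting minutes", "PDF",
           "Intranet", date(2011, 6, 2), "A. Author"),
]

hits = filter_records(RECORDS, info_type="Meeting minutes",
                      source="Social collaboration project")
print([r.title for r in hits])
```

Each keyword argument corresponds to one filter a user could pick in a search screen, which is why this scenario lends itself to "start by filtering".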

Research based on
Example query: “Does ECM have impact on governmental decisions in Spain?”
Entities:
People
Geographical locations
Companies & Brands
…
Source: Internal or External
Publication Date Range
Natural language search

Search queries are unpredictable. The system should be “taught” how to
interpret a query (natural language search, entity extraction from content, …).
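
One simple way to "teach" a system about entities is a lookup list (gazetteer). The sketch below assumes a tiny hypothetical gazetteer and plain token matching; real products typically rely on much richer linguistic processing.

```python
import re

# Hypothetical gazetteer: surface form (lower case) -> entity type.
GAZETTEER = {
    "spain": "location",
    "europe": "location",
    "amplexor": "company",
    "ecm": "topic",
}


def extract_entities(query):
    """Return (token, entity_type) pairs for the known entities in a query."""
    tokens = re.findall(r"\w+", query.lower())
    return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]


print(extract_entities("Does ECM have impact on governmental decisions in Spain?"))
```

The extracted pairs could then be used as filters (e.g. location = Spain) next to the remaining free-text terms.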

Metadata
What is metadata?
Information about the information:
Descriptive
Structural
Administrative

Types of metadata:
Implicit (e.g. creation date, publication date, URL, filename, file format, source
system, …)
Explicit (e.g. owner, topic, summary, expiry date, status, …)
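
Implicit metadata can be collected without any user input. As a rough sketch, assuming documents live on a file system, the standard library already exposes several of the values listed above (the `source_system` label is a hypothetical value a connector would supply; creation date is left out because it is not available portably).

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def implicit_metadata(path, source_system="file share"):
    """Derive metadata from the file itself, with no user input."""
    p = Path(path)
    stat = p.stat()
    return {
        "filename": p.name,
        "file_format": p.suffix.lstrip(".").lower(),
        "modification_date": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        "size_bytes": stat.st_size,
        "source_system": source_system,  # hypothetical label set by the connector
    }


# Usage: create a throwaway file and inspect what can be deduced from it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Proposal.PDF"
    sample.write_bytes(b"dummy")
    meta = implicit_metadata(sample)
    print(meta["filename"], meta["file_format"], meta["size_bytes"])
```

Explicit metadata such as owner or topic cannot be read off the file this way and has to come from users or from rules.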

Guiding metadata input with:
Taxonomies
Folksonomies
Ontologies


Folksonomies

http://taggalaxy.de


Ontologies
Taxonomies, representing knowledge as a set of concepts
within a domain, and the relationships between those
concepts

http://en.wikipedia.org/wiki/Geopolit

Metadata
Statement 1: “A performant Enterprise Search Engine should
not require information workers to add metadata. It should
just crawl all my information sources.”
But:
Will users understand the results displayed (title, author, …)?
How will they filter results?
Does it really help to crawl 1.000.000 records if 900.000 have become
irrelevant over time?

Metadata
Statement 2: “Google doesn’t need metadata”
Are you sure?

Metadata
So you think Google doesn’t need metadata?

Simple example of the semantic web


Metadata
Statement 3: “Adding metadata is so time-consuming my
information workers will never do it.”
Yes, but:
In a structured ECM approach, it is possible to automate much of the
metadata input, because it can be deduced from business rules
If you’re not 100% sure you will need a metadata field for a specific
purpose, then don’t create it.
Convince users about the value of the metadata fields which remain
Make it user friendly for content contributors to add metadata
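
Deducing metadata from business rules could look roughly like the sketch below. The folder convention (`/<department>/<project>/<file>`), the filename prefixes and the field names are all assumptions made for illustration.

```python
# Hypothetical business rules: each rule inspects what is already known
# about a document and returns the metadata it can deduce.
def rule_from_folder(doc):
    # Assumed convention: /<department>/<project>/<file>
    parts = doc["path"].strip("/").split("/")
    if len(parts) >= 3:
        return {"department": parts[0], "project": parts[1]}
    return {}


def rule_from_filename(doc):
    name = doc["path"].rsplit("/", 1)[-1].lower()
    if name.startswith("minutes"):
        return {"information_type": "Meeting minutes"}
    if name.startswith("timesheet"):
        return {"information_type": "Timesheet"}
    return {}


RULES = [rule_from_folder, rule_from_filename]


def auto_metadata(doc):
    """Apply every rule; values entered by a user always win."""
    deduced = {}
    for rule in RULES:
        deduced.update(rule(doc))
    return {**deduced, **doc.get("explicit", {})}


doc = {"path": "/HR/Onboarding/minutes-2011-03-01.doc",
       "explicit": {"owner": "HR manager"}}
print(auto_metadata(doc))
```

Contributors then only fill in the fields that no rule can deduce, which keeps the input form short.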


Metadata
Avoid defining metadata around the document, if it should
already be present IN the document.


Make content findable


Findability
Findability is not obtained just by implementing search
technology
AIIM.org: “Information Organization and Access (IOA) refers
to a collection of technologies to help you organize and find
information”, which includes:
enterprise search
content classification
categorization and clustering
fact and entity extraction
taxonomy creation and management
information presentation (i.e., visualization)
information governance


Findability Tips & Tricks
The more value content has, the more effort should be spent
in managing it (and making it findable)


One search interface doesn’t solve it all. Keep in mind that
specific content sources or Lines of Business might require
specialized search screens

Define specific search scopes, if your information
governance permits …


Landing Pages are still “in”!
Projects Overview Page
Knowledge base page (links to knowledge bases)
Practical Guide (categorized hyperlinks to practical information)
Tools
Forms
Filtered listings (e.g. automatic listing of all FAQ content types)

How search works


How it works

Architecture
[Diagram: content sources (web content, files and documents, multimedia, databases, custom applications) feed connectors (web crawler, file traverser, database connector, content push) into a document processing pipeline that writes the index files; the search core answers queries through query and result processing pipelines and serves vertical applications, portals, custom front-ends, mobile devices and alerts; tuning and administration span all components.]

How it works
Connect to content sources and get data
Web pages (e.g. XML, HTML, WML): Crawler
Files, documents (e.g. Word, Excel, pdf): File traverser
Database content (e.g. Oracle, DB2): Database
connectors
Applications (e.g. Sharepoint, Documentum, Exchange,
CMS/DMS): Application connectors
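
The different connector types can share one small interface. The sketch below is an assumption about how that might be modelled, not any vendor's API: every connector yields `(id, text)` pairs, shown here for a file traverser and a database connector, with throwaway sources created for the demonstration.

```python
from abc import ABC, abstractmethod
import sqlite3
import tempfile
from pathlib import Path


class Connector(ABC):
    """Common shape for every content source: yield (id, text) pairs."""

    @abstractmethod
    def fetch(self):
        ...


class FileTraverser(Connector):
    def __init__(self, root):
        self.root = Path(root)

    def fetch(self):
        # Only plain-text files here; real traversers convert Word, Excel, pdf, ...
        for path in sorted(self.root.rglob("*.txt")):
            yield str(path.relative_to(self.root)), path.read_text(encoding="utf-8")


class DatabaseConnector(Connector):
    def __init__(self, connection, query):
        self.connection = connection
        self.query = query  # must return (id, text) rows

    def fetch(self):
        for row_id, text in self.connection.execute(self.query):
            yield f"db:{row_id}", text


# Usage with throwaway sources.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "note.txt").write_text("meeting minutes", encoding="utf-8")
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (id INTEGER, description TEXT)")
    db.execute("INSERT INTO products VALUES (1, 'search appliance')")
    sources = [FileTraverser(tmp),
               DatabaseConnector(db, "SELECT id, description FROM products")]
    documents = [doc for source in sources for doc in source.fetch()]
    db.close()

print(documents)
```

Because all connectors return the same shape, the indexing step does not need to know where a document came from.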


How it works
Analyze and index content to make it searchable
Convert and process content through pre-
processing pipeline:
Lemmatization/stemming, entity extraction, taxonomy
classification
Custom logic (e.g. adding special tags)
Write content to index files
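
The steps above can be sketched as a tiny pre-processing pipeline that writes to an in-memory inverted index. The suffix-stripping `naive_stem` is a crude stand-in for real stemming or lemmatization, and the `tag:finance` rule is a hypothetical example of custom logic.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Very crude stand-in for real stemming/lemmatization."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(documents):
    """Map each processed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        terms = [naive_stem(t) for t in tokenize(text)]
        if "invoice" in terms:  # custom logic: add a special tag
            terms.append("tag:finance")
        for term in terms:
            index[term].add(doc_id)
    return index


# Hypothetical sample documents.
DOCS = {
    "d1": "Searching the intranet",
    "d2": "Intranet search tips",
    "d3": "Invoice for search appliance",
}
index = build_index(DOCS)
print(sorted(index["search"]))
print(sorted(index["tag:finance"]))
```

Because "Searching" and "search" are reduced to the same term, a query for either form finds all three documents.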

Search Engine
How It Works

Analyze query
Use query language or query API
Convert and process query through query
pipeline:
Linguistic processing
Custom logic (e.g. query term modification/addition)
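
A query pipeline can be modelled as a list of small stages applied in order. The sketch below assumes a hypothetical stop-word list and synonym table; the last stage illustrates query term addition.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "for", "in", "on"}
# Hypothetical synonym table used to add query terms.
SYNONYMS = {"holiday": ["leave", "vacation"]}


def lowercase_tokens(query):
    return re.findall(r"[a-z0-9]+", query.lower())


def drop_stopwords(terms):
    return [t for t in terms if t not in STOPWORDS]


def expand_synonyms(terms):
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


QUERY_PIPELINE = [lowercase_tokens, drop_stopwords, expand_synonyms]


def process_query(query):
    """Run the raw query string through every stage in order."""
    result = query
    for stage in QUERY_PIPELINE:
        result = stage(result)
    return result


print(process_query("Form for the holiday request"))
```

Adding or removing a stage only means editing `QUERY_PIPELINE`, which is the kind of tuning the administration layer would expose.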


How it works
Match query to content index
Query- and content adaptive matching
Exploit all information and structure in the data
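
One way to exploit structure in the data is to weight matches by the field in which they occur. This is a simplified sketch with assumed weights and sample documents, not the relevance model of any particular engine.

```python
# Assumed field weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}

# Hypothetical sample documents.
DOCS = {
    "d1": {"title": "Intranet search guide", "body": "How to search the intranet"},
    "d2": {"title": "Holiday policy", "body": "Search the HR portal for forms"},
    "d3": {"title": "Canteen menu", "body": "Soup of the day"},
}


def score(doc, query_terms):
    """Sum weighted term frequencies over all fields of one document."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc[field].lower().split()
        total += weight * sum(words.count(term) for term in query_terms)
    return total


def match(query):
    """Return ids of matching documents, best score first."""
    terms = query.lower().split()
    scored = ((score(doc, terms), doc_id) for doc_id, doc in DOCS.items())
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]


print(match("search"))
```

The document with the term in its title ranks first; documents without any match are dropped.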


How it works
Return results to user
Convert and process results through result pipeline:
Resort, filter for security, organize for dynamic drilldown
Pass results on to application (generated or through
API)
Push results to alert engine and then external
environment (e.g. mail, queue)
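
The result pipeline can be sketched as three steps on the raw hits: security filtering, re-sorting and counting facet values for drill-down. The hit structure and the group-based access model are assumptions for illustration.

```python
from collections import Counter

# Hypothetical raw hits as returned by the matching step.
HITS = [
    {"id": "d1", "score": 2.0, "type": "Proposal", "allowed_groups": {"sales"}},
    {"id": "d2", "score": 5.0, "type": "Meeting minutes", "allowed_groups": {"hr"}},
    {"id": "d3", "score": 3.5, "type": "Proposal", "allowed_groups": {"sales", "hr"}},
]


def process_results(hits, user_groups):
    """Security-trim, re-sort by score and count facet values for drill-down."""
    visible = [h for h in hits if h["allowed_groups"] & user_groups]
    ranked = sorted(visible, key=lambda h: h["score"], reverse=True)
    facets = Counter(h["type"] for h in ranked)
    return ranked, facets


ranked, facets = process_results(HITS, user_groups={"sales"})
print([h["id"] for h in ranked], dict(facets))
```

Facet counts are computed after the security filter, so a user never sees a count that reveals documents they cannot open.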

How it works
Federated Search: relies on the indexes and the relevance
algorithms of the underlying search engines
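
Because each underlying engine ranks with its own algorithm, their scores are generally not comparable. A minimal sketch of one possible merge strategy, round-robin interleaving by rank, is shown below; `engine_a` and `engine_b` are hypothetical stand-ins for calls to real engines.

```python
from itertools import zip_longest


def engine_a(query):
    # Stand-in for a call to one underlying engine; returns its own ranking.
    return ["a1", "shared", "a2"]


def engine_b(query):
    # Stand-in for a second engine with a different index and ranking.
    return ["shared", "b1"]


def federated_search(query, engines):
    """Interleave the ranked lists round-robin and drop duplicates."""
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for doc_id in tier:
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


print(federated_search("intranet", [engine_a, engine_b]))
```

The federating layer keeps no index of its own here; it only forwards the query and combines what comes back.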


the Enterprise Search market


The Enterprise Search Market
What’s the vendors’ focus?
Business Intelligence
Text-mining (linguistic support!)
E-Commerce
Image/Video: Visual Information retrieval
Audio/Video: speech recognition
eDiscovery
…


Enterprise search products can be:
Specialized: products that use search to address a need in a
specific area like customer service or to supplement business
intelligence platforms
Integrated: products that merge search capabilities with other
information management functions like content management,
collaboration or analytics; the goal of these products is to become
deeply ingrained in the technology portfolio so that the use of the tool
becomes a ubiquitous part of the information workplace
Detached: products like Google’s appliance, focused on ease of
deployment and flexibility

Forrester (September 2011) evaluated twelve
vendors/products in its Market Overview (not including open
source):
Autonomy IDOL 7 (acquired by HP)
Attivio AIE 1.3
Coveo Platform 6.5
Endeca Latitude 2 (acquired by Oracle)
Exalead CloudView 5.1
Fabasoft Mindbreeze 5.0
Google Search Appliance 6.8
IBM Content Analytics with Enterprise Search 2.2
ISYS Enterprise Server v9.7
Microsoft FAST Search for SharePoint Server 2010
Sinequa ES 7
Vivisimo Velocity 8.0

Important Trends
Social and collaborative features
Mobile support
Audio/Video
Cloud
Spatial support
Semantics/text analytics
Search Based Applications (“SBA”)

Wrap up
Search Technology platforms are mature and are available on
the market in abundance and multiple flavors.

But,
make sure you are:
Cost-effective (what’s the business case? Priorities?)
Consistent in Content classification and Governance
Continuously monitoring usage and improving relevance
Clever & Pragmatic
Creative (User interface, multi-device)

