Enterprise search promises to make all organizational content findable through a single search interface. However, simply implementing search technology is not enough - content must be well-organized and metadata-rich. Effective enterprise search requires strategies to classify content, extract metadata, govern information, and ensure search interfaces meet user needs across multiple devices. The enterprise search market offers specialized and integrated solutions from various vendors, with trends including social, mobile, and semantic search capabilities.
3. Contents
The enterprise search promise
Some thoughts on search scenarios
Make your content “findable”
Search: How it works
The enterprise search market
7. The Enterprise Search Promise
IDC 2001: “The High Cost of Not Finding Information”
Cost =
Poor decisions based on faulty or poor information
Duplicated efforts within different divisions/projects
Lost sales due to customers’ inability to find products and services
Lost productivity due to employees’ inability to find information
10. The Invisible Intranet
Using search on an intranet usually leaves a huge portion of existing valuable information “invisible”, because:
Some information silos are not indexed:
Databases with structured content
External sources
Isolated departmental content repositories
Individual desktops
Content applications “in the cloud”
Digital archives
Some information is “over-secured”
Some information is trapped in proprietary file formats, which cannot be indexed
Some information cannot be extracted as text:
Rich media files (audio, video)
Badly scanned documents
12. The Enterprise Search Promise
[Diagram: one enterprise search platform serving several search applications (mail search, site search, DMS search, e-commerce search, corporate search, BI search, …) on top of structured, unstructured and real-time sources]
Sources shown:
Legacy data (e.g. ISAM, VSAM, IMS)
RDBMS (JDBC, ODBC, SQLNet, DW, DM)
Files (e.g. Word, Excel, PDF, images, MP3)
WWW (HTML, XML, WML, JavaScript)
Direct push
Applications (e.g. ERM, CRM, Help Desk)
DMS (e.g. M’Soft CMS, Documentum)
eMail systems (e.g. Notes, Exchange)
Portals (e.g. WebSphere, WebLogic)
Private webs (e.g. news feeds, intranets)
Message queues (e.g. TIBCO, MQ-Series)
13. The Enterprise Search Promise
“There’s no reason to expect that search is going to get that much better. The basic algorithms by which search is done have not improved much since about 1975. The only way to improve the situation is by enhancing search engines with more deterministic metadata. If you look at the victory of Google, it wasn’t because they had better search techniques. It’s because they deployed one key metadata value – how many pages are linked to this one – to enhance the relevancy of their results. The same concepts need to be applied to the enterprise.”
(Tim Bray)
15. Enterprise versus web search
Aspect      Web                                    Enterprise
Content     Mainly HTML and PDF                    All formats and sources, including databases and legacy systems
Security    Focus on system security               Also restricting user access to specific content
Updates     Via (scheduled) crawling               Push updates to the index (near real time)
Volume      On average: 1,000 files                Potentially: > 1,000,000 “records”
Metadata    Centrally, in e.g. Web CMS management  Consolidate metadata from various source systems
18. Search versus research
Example queries:
“Meeting minutes social collaboration project”
“ECM and Green IT in Europe”
“Amplexor proposal for Intranet”
“average time spent on searching for content”
“Timesheets April 2009”
“Does ECM have impact on governmental decisions in Spain?”
“I know you’re out there..”
“Life is like a box of chocolates. You never know what you gonna …”
19. Search versus research
Example: “Meeting minutes social collaboration project”
Search based on
Information Type (Meeting minutes, Proposal, Invoice, Timesheet, …)
Document Format (PDF, DOC, PPT, e-mail, …)
Organisational Source
Projects
Products
Processes
– HR
– Compliance
– Marketing
– IT
– …
…
Publication Date, Modification Date
Author
Search queries are more or less predictable (after analysis)
20. Search versus research
Example: “Does ECM have impact on governmental decisions in Spain?”
Research based on
Entities:
People
Geographical locations
Companies & Brands
…
Source: Internal or External
Publication Date Range
Natural language search
Search queries are unpredictable. The system should be “taught” how to interpret a query (natural language search, entity extraction from content, …)
21. Metadata
What is metadata?
Information about the information:
Descriptive
Structural
Administrative
Types of metadata:
Implicit (e.g. creation date, publication date, URL, filename, file format, source system, …)
Explicit (e.g. owner, topic, summary, expiry date, status, …)
Guiding metadata input with:
Taxonomies
Folksonomies
Ontologies
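The difference between implicit and explicit metadata can be illustrated with a small Python sketch. It assumes a plain file-system source; the function names (`implicit_metadata`, `describe`), the field names and the default `source_system` value are hypothetical and only serve the illustration. Implicit fields are read from the file itself, explicit fields are whatever a contributor supplies.

```python
import mimetypes
import os
from datetime import datetime, timezone


def implicit_metadata(path, source_system="file-share"):
    """Derive implicit metadata from the file itself, with no user input.

    Field names and the default source_system are illustrative assumptions.
    """
    stat = os.stat(path)
    mime_type, _encoding = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": os.path.basename(path),
        "file_format": mime_type or "unknown",
        "modification_date": modified.isoformat(),
        "size_bytes": stat.st_size,
        "source_system": source_system,
    }


def describe(path, explicit):
    """Combine implicit metadata with explicit fields entered by a contributor."""
    record = implicit_metadata(path)
    record.update(explicit)  # e.g. owner, topic, summary, expiry date, status
    return record
```

A call such as `describe("minutes.pdf", {"owner": "HR", "status": "final"})` would return one record that holds both kinds of fields, which is the shape a search index typically consumes.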
24. Ontologies
An ontology represents knowledge as a set of concepts within a domain, and the relationships between those concepts
http://en.wikipedia.org/wiki/Geopolit
25. Metadata
Statement 1: “A performant Enterprise Search Engine should not require information workers to add metadata. It should just crawl all my information sources.”
But:
Will users understand the results displayed? (title, author, …)
How will they filter results?
Does it really help to crawl 1,000,000 records if 900,000 have become irrelevant over time?
29. Metadata
Statement 3: “Adding metadata is so time-consuming that my information workers will never do it.”
Yes, but:
In a structured ECM approach, it is possible to automate much of the metadata input, because it can be deduced from business rules
If you’re not 100% sure you will need a metadata field for a specific purpose, then don’t create it
Convince users of the value of the metadata fields that remain
Make it user-friendly for content contributors to add metadata
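Deducing metadata from business rules could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `derive_metadata` helper, the rule names and the folder conventions (`/hr/`, "timesheet" in the path) are hypothetical and not taken from any product. The one design point it tries to show is that rules only fill in fields a contributor has left empty.

```python
def derive_metadata(document, rules):
    """Fill in metadata that can be deduced, leaving user-entered values alone."""
    metadata = dict(document.get("metadata", {}))
    for field, rule in rules.items():
        if field in metadata:
            continue  # never overwrite what a contributor entered explicitly
        value = rule(document)
        if value is not None:
            metadata[field] = value
    return metadata


# Hypothetical business rules: each maps a document to a value, or None if
# the rule does not apply to that document.
RULES = {
    "department": lambda doc: "HR" if doc["path"].startswith("/hr/") else None,
    "information_type": lambda doc: (
        "Timesheet" if "timesheet" in doc["path"].lower() else None
    ),
    "status": lambda doc: "draft",  # default until someone sets it explicitly
}
```

With these rules, a document stored under `/hr/` with "Timesheets" in its name gets a department and an information type without any manual input, while a status already set by its author is kept.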
30. Metadata
Avoid defining metadata around the document, if it should
already be present IN the document.
32. Findability
Findability is not obtained just by implementing search technology
AIIM.org: “Information Organization and Access (IOA) refers to a collection of technologies to help you organize and find information”, which includes:
enterprise search
content classification
categorization and clustering
fact and entity extraction
taxonomy creation and management
information presentation (i.e., visualization)
information governance
33. Findability Tips & Tricks
The more value content has, the more effort should be spent
in managing it (and making it findable)
34. Findability Tips & Tricks
One search interface doesn’t solve it all. Keep in mind that specific content sources or lines of business might require specialized search screens
35. Findability Tips & Tricks
Define specific search scopes, if your information
governance permits …
36. Findability Tips & Tricks
Landing pages are still “in”!
Projects Overview Page
Knowledge base page (links to knowledge bases)
Practical Guide (categorized hyperlinks to practical information)
Tools
Forms
Filtered listings (e.g. automatic listing of all FAQ content types)
38. How it works
Architecture
[Architecture diagram; components shown:]
Content sources: web content, files and documents, multimedia, databases, custom applications
Connectors: web crawler, file traverser, database connector, content push
Document processing (content pipeline, filter)
Index files
Search: query processing (query pipeline) and result processing (result pipeline)
Delivery: vertical applications, portals, custom front-ends, mobile devices, alerts
Tuning and administration
39. How it works
Connect to content sources and get data
Web pages (e.g. XML, HTML, WML): Crawler
Files, documents (e.g. Word, Excel, PDF): File traverser
Database content (e.g. Oracle, DB2): Database connectors
Applications (e.g. SharePoint, Documentum, Exchange, CMS/DMS): Application connectors
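The connector idea above can be sketched in Python as a set of classes that share one `fetch()` method and yield records in a common shape. The class names, the record fields (`id`, `source`, `text`) and the `gather` helper are assumptions made for this sketch, not the API of any search product. The database connector relies only on the standard Python DB-API (`cursor()`, `execute()`, `fetchall()`).

```python
import os


class FileTraverser:
    """File traverser: walks a directory tree and yields one record per file."""

    def __init__(self, root):
        self.root = root

    def fetch(self):
        for folder, _dirs, files in os.walk(self.root):
            for name in sorted(files):
                path = os.path.join(folder, name)
                # Real connectors convert Word, Excel, PDF, ... to text first;
                # this sketch simply reads every file as UTF-8 text.
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    yield {"id": path, "source": "files", "text": handle.read()}


class DatabaseConnector:
    """Database connector: yields one record per (id, text) row of a query."""

    def __init__(self, connection, query, source):
        self.connection = connection  # any DB-API 2.0 connection
        self.query = query            # must select exactly two columns: id, text
        self.source = source

    def fetch(self):
        cursor = self.connection.cursor()
        cursor.execute(self.query)
        for row_id, text in cursor.fetchall():
            yield {"id": f"{self.source}:{row_id}", "source": self.source, "text": text}


def gather(connectors):
    """Pull records from every connector into one stream for indexing."""
    for connector in connectors:
        yield from connector.fetch()
```

Because every connector yields the same record shape, the indexing side does not need to know which source a record came from; adding an application connector would mean adding one more class with a `fetch()` method.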
40. How it works
Analyze and index content to make it searchable
Convert and process content through pre-processing pipeline:
Lemmatization/stemming, entity extraction, taxonomy classification
Custom logic (e.g. adding special tags)
Write content to index files
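As a rough illustration of these steps, the Python sketch below runs documents through a small pre-processing pipeline and writes the result to an in-memory inverted index (term to document ids). It is a toy under stated assumptions: the suffix-stripping `stem` function is deliberately crude and stands in for real lemmatization/stemming, `tag_confidential` is a hypothetical custom stage, and a real engine would persist index files rather than keep a dictionary in memory.

```python
import re
from collections import defaultdict

# Deliberately crude suffix stripping, for illustration only; a real pipeline
# would use a proper stemmer or lemmatizer for each language.
SUFFIXES = ("ings", "ing", "es", "s")


def stem(token):
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    return [stem(token) for token in re.findall(r"[a-z0-9]+", text.lower())]


def tag_confidential(doc):
    """Custom pipeline stage (hypothetical): add a special tag based on content."""
    if "confidential" in doc["text"].lower():
        doc["tags"].append("confidential")
    return doc


def build_index(documents, stages=(tag_confidential,)):
    index = defaultdict(set)  # term -> ids of the documents containing it
    store = {}                # id -> processed document, used to display results
    for original in documents:
        # Work on a copy so the caller's documents are not modified.
        doc = {**original, "tags": list(original.get("tags", []))}
        for stage in stages:
            doc = stage(doc)
        for term in tokenize(doc["text"]):
            index[term].add(doc["id"])
        store[doc["id"]] = doc
    return index, store
```

Indexing "Meeting minutes" and "Confidential meetings" with this sketch puts both documents under the term `meet`, which is the effect stemming is meant to have: a query for one word form finds the other.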
41. How it works
Analyze query
Use query language or query API
Convert and process query through query pipeline:
Linguistic processing
Custom logic (e.g. query term modification/addition)
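A query pipeline of this kind might be sketched in Python as a chain of small functions. The stage names, the stop-word list and the synonym table are all hypothetical; they stand in for the linguistic processing and the custom term modification/addition mentioned above. In practice the linguistic steps applied to a query usually need to mirror those applied to content at index time, otherwise terms will not match.

```python
import re

# Hypothetical linguistic resources; a real deployment would load these per language.
STOPWORDS = {"a", "an", "the", "of", "for", "on", "in"}
SYNONYMS = {"ecm": ["enterprise content management"]}


def parse(query):
    """Analyze the raw query string into lowercase terms."""
    return re.findall(r"[a-z0-9]+", query.lower())


def remove_stopwords(terms):
    """Linguistic processing: drop words that carry little meaning."""
    return [term for term in terms if term not in STOPWORDS]


def add_synonyms(terms):
    """Custom logic: add query terms right after the term they expand."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def process_query(query, stages=(remove_stopwords, add_synonyms)):
    terms = parse(query)
    for stage in stages:
        terms = stage(terms)
    return terms
```

Keeping each stage as a plain function that takes and returns a list of terms makes it easy to reorder stages or add a customer-specific one without touching the others.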
42. How it works
Match query to content index
Query- and content-adaptive matching
Exploit all information and structure in the data
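One simple way to exploit structure in the data is to weight matches by the field they occur in. The Python sketch below is a minimal illustration of that idea only, not the matching algorithm of any particular engine: the field names, the weights in `FIELD_WEIGHTS` and the plain term-count scoring are assumptions chosen to keep the example short.

```python
# Hypothetical weights: a hit in the title counts three times as much as one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def score(document, terms, weights=FIELD_WEIGHTS):
    """Sum weighted term occurrences over the structured fields of a document."""
    total = 0.0
    for field, weight in weights.items():
        words = document.get(field, "").lower().split()
        for term in terms:
            total += weight * words.count(term)
    return total


def match(documents, terms):
    """Return the ids of matching documents, best score first."""
    scored = [(score(doc, terms), doc) for doc in documents]
    hits = [(value, doc) for value, doc in scored if value > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc["id"] for _value, doc in hits]
```

The same pattern extends to other metadata, for example boosting recent documents or a trusted source system, which is where well-maintained metadata starts to pay off in relevance.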
43. How it works
Return results to user
Convert and process results through result pipeline:
Re-sort, filter for security, organize for dynamic drilldown
Pass results on to application (generated or through API)
Push results to alert engine and then external environment (e.g. mail, queue)
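The result pipeline steps can be sketched in Python as follows. The model is an assumption made for illustration: each raw result carries a `score`, an `information_type` and a set of `allowed_groups`, and access control is reduced to a set intersection with the user's groups. Facet counts are computed after the security filter so that the drilldown never reveals the existence of documents the user may not see.

```python
from collections import Counter


def filter_for_security(results, user_groups):
    """Keep only results the user may see (access control modelled as group sets)."""
    return [r for r in results if r["allowed_groups"] & user_groups]


def re_sort(results):
    """Order results by descending relevance score."""
    return sorted(results, key=lambda r: r["score"], reverse=True)


def facet_counts(results, field):
    """Count results per metadata value, to drive dynamic drilldown."""
    return Counter(r[field] for r in results)


def process_results(results, user_groups, facet_field="information_type"):
    visible = re_sort(filter_for_security(results, user_groups))
    facets = {facet_field: dict(facet_counts(visible, facet_field))}
    return {"results": visible, "facets": facets}
```

The returned dictionary is the kind of structure a front-end or an alert engine could consume: an ordered result list plus the counts needed to render filters next to it.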
47. The Enterprise Search Market
What’s the vendors’ focus?
Business Intelligence
Text-mining (linguistic support!)
E-Commerce
Image/Video: Visual Information retrieval
Audio/Video: speech recognition
eDiscovery
…
48. The Enterprise Search Market
Enterprise search products can be:
Specialized: products that use search to address a need in a specific area like customer service or to supplement business intelligence platforms
Integrated: products that merge search capabilities with other information management functions like content management, collaboration or analytics; the goal of these products is to become deeply ingrained in the technology portfolio so that the use of the tool becomes a ubiquitous part of the information workplace
Detached: products like Google’s appliance focused on ease of deployment and flexibility
49. The Enterprise Search Market
Forrester (September 2011) evaluated twelve vendors/products in its Market Overview (not including open source):
Autonomy IDOL 7 (acquired by HP)
Attivio AIE 1.3
Coveo Platform 6.5
Endeca Latitude 2 (acquired by Oracle)
Exalead CloudView 5.1
Fabasoft Mindbreeze 5.0
Google Search Appliance 6.8
IBM Content Analytics with Enterprise Search 2.2
ISYS Enterprise Server v9.7
Microsoft FAST Search for SharePoint Server 2010
Sinequa ES 7
Vivisimo Velocity 8.0
50. The Enterprise Search Market
Important Trends
Social and collaborative features
Mobile support
Audio/Video
Cloud
Spatial support
Semantics/text analytics
Search Based Applications (“SBA”)
51. Wrap up
Search technology platforms are mature and are available on the market in abundance and in multiple flavors.
But make sure you are:
Cost-effective (what’s the business case? Priorities?)
Consistent in content classification and governance
Continuously monitoring usage and improving relevance
Clever & pragmatic
Creative (user interface, multi-device)
The Open Graph Protocol enables you to integrate your Web pages into the social graph. It is currently designed for Web pages representing profiles of real-world things, such as movies, sports teams, celebrities, and restaurants. Including Open Graph tags on your Web page makes your page equivalent to a Facebook Page. This means that when a user clicks a Like button on your page, a connection is made between your page and the user.
Most companies have different types of ECM needs. The picture above shows the different types of needs in the context of DM/collaboration/search/Intranet. The picture helps in explaining that each “room” above might induce specific technical and functional requirements.
Library = document management, high level of classification and metadata; archiving might be desired; search is essential
Team Rooms = collaboration spaces, less classification, more fuzzy, but many-to-many; collaborative editing might be needed. Some content from here might be moved to the library at some time.
Conference Center = the “classic” Intranet: HR documents, trainings, presentations, ... (this does not require strict DM features, but in some cases rather WCM features)
Expert Corner = applications: FAQ, social networking, blogs & wikis, ...
Registration Area = security, profiling and personalization
Dashboard = portal functionality
Data Processing = workflow, integrations
The Forrester overview doesn’t include open-source products
SBA: software applications in which a search engine platform is used as the core infrastructure for information access and reporting