Organizations are beginning to recognize that search is not a stand-alone technology or application, but must be integrated with business processes and corporate objectives as a key infrastructure component.
Why? Providing enriched metadata to the search engine index significantly improves search applications, eDiscovery, FOIA requests, and collaboration.
In this webinar, COMPU-DATA International and Concept Searching will demonstrate their combined offering, which uses unique, language-independent technology and integrated enterprise metadata repository management to deliver intelligent, metadata-enabled search.
What you will learn about during this session:
• How our innovative technology delivers both high precision and high recall, using industry unique compound term processing
• How to accomplish federated search as content is created or ingested
• How to enable true concept based searching
• How to eliminate end user tagging
• How to integrate the combined solution with any search engine including SharePoint, the former FAST products, Google Search Appliance, IBM Vivisimo, and Solr
• How the combined solution can be extended to address records identification, protection of privacy information, migration, and text analytics with the same technology
• Benefit from industry-specific use cases:
• Developing a powerful search solution for the US Army, creating easy access to millions of records, with an integrated solution to consolidate many data sources, accessing high volumes of data
• Solving search, migration, records management, and data privacy challenges to manage the intranet for a global company which designs, manufactures, and distributes appliances to more than 70 countries
How to Get Enterprise Search Right with Automatic Metadata
1. How to Get Enterprise Search Right
Juan J. Celaya
President & CEO
COMPU-DATA International, LLC
jcelaya@cdlac.com
Twitter @conceptsearch
Ken Lemons
VP Federal Programs
Concept Searching
kenl@conceptsearching.com
2. Expert Speakers
Ken Lemons – VP Federal Programs at Concept Searching
has over 25 years’ experience in the IT industry, with a track record in
consulting, solutions delivery, sales and project management in the federal
sector. He has managed Microsoft consulting practices for several US
government integrators, latterly as VP of Business Development for Air Force
and DoD programs. Ken has provided US DoD agencies with solutions to
address a range of challenges, leveraging a combination of Microsoft and third
party technology solutions.
Juan Celaya – President and CEO at COMPU-DATA International
founded COMPU-DATA International, LLC in 1988 and has been successfully
delivering content and data integration solutions for ECM implementations
through the use of capture, administration, collaboration, retrieval technologies
and solutions. Juan has a broad Information Technology background covering
30 years of experience, with successful system implementations in Energy,
Healthcare, Pharmaceutical and Transportation companies, the US Department
of Defense and the National Nuclear Security Administration.
3. Agenda
• Introductions
• Concept Searching
• Our Approach and Technologies
• The Challenge of Search
• Products
• COMPU-DATA International
• Why Searching Raw Corporate Content is not Enough
• The Managed Metadata Environment – Your Organization’s “Controlled Vocabulary”
• CDI’s “IGT” Model – A simple approach to the Managed Metadata Environment
• Initial Setup Sample
• Rules
• Taxonomy
• Leveraging Automatically Applied Metadata to Deliver
• Improved Search Results
• Utilizing the Combined Power of
• Classification and Taxonomy Management Technology
4. • Company founded in 2002
• Product launched in 2003
• Focus on management of structured and unstructured information
• Technology Platform
• Delivered as a web service
• Automatic concept identification, content tagging, auto-classification,
taxonomy management
• Only statistical vendor that can extract conceptual metadata
• 2009, 2010, 2011, 2012, 2013 ‘100 Companies that Matter in KM’
(KMWorld Magazine) and Trend Setting product of 2009, 2010, 2011, 2012,
2013
• Authority to Operate enterprise wide US Air Force and enterprise wide
NETCON US Army
• Locations: US, UK, and South Africa
• Client base: Fortune 500/1000 organizations
• Managed Partner under Microsoft global ISV Program - ‘go to partner’
for Microsoft for auto-classification and taxonomy management
• Smart Content Framework™ for Information Governance
• Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier,
conceptClassifier for SharePoint, conceptTaxonomyWorkflow, conceptContentTypeUpdater for SharePoint
Concept Searching – The Industry Leader in
Managed Metadata Solutions
5. • Metadata driven application and enforcement of policies - conceptClassifier has been
deployed since 2003 to automatically generate metadata and use that metadata to apply and enforce
policies. Most clients are using the platform to support their information governance strategy.
• Proven, mature functionality out of the box - The platform has been deployed in numerous sites
and applications across the enterprise, including MOSS and SharePoint 2010, 2013, Office 365, Stellent,
Documentum, SQL, Oracle, File Shares, Exchange via SharePoint and across the enterprise.
Smart Content Framework™
Getting It Right
6. • Concept Searching’s statistical concept identification underpins all technologies
• Multi-word term suggestion is more valuable than single-term suggestion
• conceptClassifier will generate conceptual metadata by
extracting multi-word terms that identify ‘triple heart bypass’
as a concept as opposed to single keywords
• conceptTaxonomyManager uses statistical concept
identification to provide real-time feedback during the process
of building, testing, refining, and deploying taxonomies
• Metadata can be used by any search engine index or any
application/process that uses metadata.
Concept Searching
provides Automatic
Concept Term Extraction
Figure: each word is ambiguous on its own: Triple (Baseball, Three), Heart (Organ, Center), Bypass (Highway, Avoid)
Industry Unique Technology
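To make the multi-word idea concrete, here is a minimal sketch of extracting repeated compound terms from text. It is not Concept Searching's compound term processing, which the slides describe only as proprietary and statistical; it is a simple n-gram frequency count standing in for it. The stopword list, the function names, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for",
             "was", "is", "after"}


def candidate_terms(text, max_len=3):
    """Count every 2..max_len word sequence that contains no stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOPWORDS for w in gram):
                counts[" ".join(gram)] += 1
    return counts


def compound_terms(text, min_count=2, max_len=3):
    """Return repeated multi-word terms, longest and most frequent first."""
    counts = candidate_terms(text, max_len)
    kept = [(term, c) for term, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda tc: (-len(tc[0].split()), -tc[1], tc[0]))


# Illustrative sample text (not from the webinar).
sample = (
    "The patient was scheduled for a triple heart bypass. "
    "Recovery after a triple heart bypass takes several weeks. "
    "The highway bypass was closed."
)
terms = compound_terms(sample)
for term, count in terms:
    print(f"{count}x {term}")
```

In this sketch the three-word term ranks first, while the single mention of "highway bypass" falls below the threshold, so the medical sense of "bypass" is kept apart from the road sense.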
7. A Manual Metadata Approach Will Fail 95%+ Of The Time
Issue: Organizational Impact
• Inconsistent: Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable rendering it unusable to the organization. (IDC)
• Risky: 59% of middle managers miss valuable information every day because they can’t find it or never see it (Accenture)
• Cumbersome, expensive: Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content. (Hoovers)
• Malicious compliance: End users select first value in list. (Perspectives on Metadata, Sarah Courier)
• No perceived value for end user: What’s in it for me? End user does not see value for organization nor risks associated with litigation and non-conformance to policies. Less than 14% of end users receive training. (AIIM)
• What have you seen: Metadata will continue to be a problem due to inconsistent human behavior.
The answer to consistent metadata is an automated approach that can extract the meaning
from content eliminating manual metadata generation yet still providing the ability to manage
knowledge assets in alignment with the unique corporate knowledge infrastructure.
Manual Approach Leads to Failure
8. Learning to Search
• Searchers do not know “how to search”
• 56% constructed poor queries
• Proficiency with the machine does not translate
into proficiency with the software
• Searchers get lost in the data
• 33% had difficulty navigating/orienting search
results
• 28% had difficulty maintaining orientation on a
website
• Loss of capacity for discernment
• 36% did not go beyond the first 3 search results
• (not pages…results on page 1)
• 91% did not go beyond the first page of search
results
• 55% selected irrelevant results 1 or more times
9. • Enterprise search is a different animal from Internet search
• In the enterprise end users know the information is there
if they could only find it
• As a result, they will spend more time and aggravation
looking for that one asset and don’t want to give up
(IDC)
• Enterprise end users expect information to be found
within 4 minutes but will actually spend 2.5 hours per
day looking for information
• How do enterprise users overcome poor search results?
• Recreate information
• Use outdated or older versions of information
• Interrupt a co-worker
• Forget about finding the information
• Just don’t start the task
The Typical Search Approach
10. The Hidden Costs of Search
“There is a debilitating disconnect between the proliferation of electronic information and the
constant need to quickly and accurately find all of the information and expertise that is
essential for work every day. From top to bottom, enterprises have failed to take seriously the
high cost of being grossly inadequate at finding information, data, documents, experts. Instead
they have settled for low performance, low-return techniques to… sort of handle Search.”
Julie Hunt - Search Consultant
The cost to a 500 employee company is
$2.4 million per year in inefficiencies
and lost productivity.
Gartner Group
12. • People explore concepts – computers find keywords
• Recall versus Precision
• Location Search
• User knows what they are seeking
• Search engine must retrieve exactly and only the information required
• Discovery Search
• User does not know precisely what they are seeking
• Search engine must retrieve content that “appears” to answer the query
• Search engine must be able to accommodate both types of searching
• The hierarchy provided by a taxonomy addresses the two different search
approaches. Location based searches appear simple, but in fact are not.
• If the end user does not immediately find what they are looking for, they can use the
hierarchical structure to drill down by searching the concepts or taxonomy nodes.
• Outcome: Identify associations and relationships that are typically not obvious in
searching
• More relevant information being found more quickly
• Accessing inter-related ideas and concepts supports a fundamental change in
user focus and activity and transforms it from searching to insight and discovery
Taxonomy Navigation
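The recall versus precision trade-off behind location and discovery search can be shown with a small worked example. This is a sketch using the standard definitions (precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved). The document IDs and result sets are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: four documents actually answer the user's need.
relevant = {"doc1", "doc2", "doc3", "doc4"}

# Location-style search: a narrow result set, "exactly and only" what matched.
location_results = ["doc1", "doc2"]

# Discovery-style search: a broad result set of content that "appears" relevant.
discovery_results = ["doc1", "doc2", "doc3", "doc4",
                     "doc5", "doc6", "doc7", "doc8"]

location_pr = precision_recall(location_results, relevant)
discovery_pr = precision_recall(discovery_results, relevant)
print("location :", location_pr)
print("discovery:", discovery_pr)
```

The narrow set scores high on precision but misses half the relevant documents, and the broad set finds them all at the cost of noise. A search engine that must accommodate both types of searching has to do well on both measures.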
13. It’s Not Just About Search!!
• Data Privacy
• Records Management
• Migration
• Enterprise Content Management
• Information Governance
• Legal, eDiscovery, FOIA
• Collaboration/Social
• Text Analytics
14. conceptClassifier for SharePoint
• conceptClassifier for SharePoint
• Combination of automatic classification, taxonomy management and Concept Searching’s
APIs packaged for delivery into the SharePoint environment
• Single code base that can be deployed with SharePoint 2007, 2010, 2013, and Office 365
• Provides clients with the choice of on-premise, cloud based, or hybrid solutions to best meet
their needs
• Integrates with any search engine (SharePoint, former FAST products, Google Search
Appliance, etc.)
• Classifies content as it is created or ingested from diverse repositories within and outside of
SharePoint
• conceptTaxonomyWorkflow
• Optional component that can perform an action on a document following a classification
decision when the criteria are met
• Built with a plug-in architecture enabling the simple development of content sources
• Uses records retention codes, semantic, and security metadata associated to data assets to
identify and process the automatic application of content types
• Once documents have the appropriate content type, based on natural language and
automatically applied metadata, workflows can be initiated.
• Workflow source type works in SharePoint 2007, 2010, and 2013 as well as for all
document types, FILE document types, and HTTP document types
15. conceptClassifier for Office 365
• Runs natively and bi-directionally with the SharePoint Term Store in any
environment
• Portability – ubiquitous access to information regardless of where it resides or
how it is stored
• Provides “intelligent” migration capabilities
• Enables management of one term store for on premise and Office 365 use
• Maintains GUIDs
• Delivers enterprise class automatic document classification for all SharePoint,
FILE, and HTTP document types
• Protects records and confidential information from inadvertently being placed in
the cloud to avoid data breaches and unauthorized access to information
• Enables concept based search and retrieval integrated with Microsoft search
solutions
• Provides a method to enable text analytics from multiple data sources without
impacting on-premise server utilization
16. Preserving the World's Knowledge
Available Anytime Anywhere SM
Getting Enterprise Search Right by Using Your Own Vocabulary For Indexing and Searching
34. Taxonomy Management
Enabling the Automatic Meta-tagging and Auto-Classification of Documents and Records
Each node is a piece of metadata that gets tagged to a document or record based upon the
prevalence of a clue within the document
Manually Created Metadata associated
with the concept of “Weather”
Distribution Statement A: Approved for public release; distribution is unlimited
311 ABG/PA No. 09-488, 16 Oct 2009
35. Automatic Metadata Generation
Unique IP of Compound Term Processing enables the identification of compound terms
(not keywords) from highly relevant content that can be used to trigger the automatic
meta-tagging and auto-classification processes
Automatically Generated
Metadata associated with
the concept of “Weather”
36. Automatic Metadata Generation
Automatically generated metadata is added to original metadata for the category/folder
Outcome: more semantics that can be linked to a document or record result in information that
becomes more actionable (the document/record is now retrievable and classifiable)
Highly relevant metadata generated by
Taxonomy Manager added to original clue
set for the concept of “Weather”
37. Automatic Meta-tagging
Metatags are automatically added to the properties field of each document
enhancing the document’s value to the organization by increasing
the ability of the document to be retrieved using enterprise search
solutions that use keywords and metadata to retrieve information
38. Automatic Meta-tagging in Action
One of the Metatags for the document was “Turbulence Encounter” however when
we search for this term within the document we do not find it
Why did this happen?
39. Automatic Meta-tagging in Action
Turbulence Encounter is only one of 4 “clues”, any of which can exist within a document in order for
that document to be automatically meta-tagged with the concept of Turbulence Encounter
40. Automatic Meta-tagging in Action
When we search the document using another clue for Turbulence Encounter, “Windshear”,
we see that its existence within the document triggered the automated meta-tagging event
that resulted in the document being tagged with “Turbulence Encounter”
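The clue-driven tagging shown on these slides can be sketched as a lookup from taxonomy nodes to clue phrases. This is a minimal illustration, not the product's implementation. It assumes that any single clue is enough to trigger a tag, which matches the Windshear example but ignores any weighting or scoring the real classifier may apply. The function name and sample sentence are hypothetical, and only the two clues named on the slides are included, since the other two are not given.

```python
import re

# Clue phrases per taxonomy node. The slides mention four clues for this
# node but name only these two, so the remaining two are left out.
TAXONOMY_CLUES = {
    "Turbulence Encounter": ["turbulence encounter", "windshear"],
}


def auto_tag(text, taxonomy=TAXONOMY_CLUES):
    """Return {node: [matched clues]} for nodes with at least one clue in text.

    Assumption: one matching clue is sufficient to apply the node as a tag.
    """
    # Normalize to lowercase words separated by single spaces, then pad so
    # that clue phrases only match on whole-word boundaries.
    padded = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    tags = {}
    for node, clues in taxonomy.items():
        matched = [clue for clue in clues if f" {clue} " in padded]
        if matched:
            tags[node] = matched
    return tags


# Hypothetical document text: the tag name itself never appears in it.
document = "The crew reported severe windshear on final approach."
tags = auto_tag(document)
print(tags)
```

Returning the matched clues alongside each tag makes it easy to answer the "Why did this happen?" question: the document is tagged "Turbulence Encounter" because it contains "windshear", even though the tag text is absent.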
41. Automatic Meta-tagging
Metatags are automatically added to the properties field of each document
making the document more valuable to the organization by increasing
the ability of the document to be retrieved using enterprise search
solutions that use keywords and metadata to retrieve information
44. • US Army - Records Management and Declassification Agency (RMDA)
• Solution
• An integrated search solution to consolidate over 20 data sources and
databases into just a few repositories, with access to high data volumes
• Benefits
• Centralized data standardization
• Increased findability
• Terabyte size data store support
• Increased productivity
• Scalable
Use Case - RMDA
45. • A global company which designs, manufactures, and distributes
appliances to more than 70 countries
• Solution
• An intranet and content management solution to improve search,
sensitive/confidential information protection, and records identification
and tagging
• Benefits
• Accurate search
• Data privacy
• Effective records management
Use Case – Global Appliance Company
47. Thank You
Juan J. Celaya
President & CEO
COMPU-DATA International, LLC
jcelaya@cdlac.com
Twitter @conceptsearch
Ken Lemons
VP Federal Programs
Concept Searching
kenl@conceptsearching.com