SlideShare a Scribd company logo
1 of 1
Download to read offline
weighting influenced by parent, child and sib-

A New Concept in Search                                                                                ling can reduce taxonomy development and
                                                                                                       on-going maintenance by 66%-80%.

                                                                                                       Semantic Metadata Generation
                                            By John Challis, CEO, CTO, Concept Searching
                                                                                                           The metadata generation issue is increas-
                                                                                                       ingly a growing concern in large enterprises.

With                                                                                                   A comprehensive approach that requires more
         the exponential increase in unstruc-     those items that are relevant to the query.          than syntactic metadata and that requires end
tured information, enterprises are seeking        Recall is the retrieval of all items that are rel-   users to add rich metadata is haphazard and
new ways to improve not only the search           evant to the query. Yet most information             subjective at best. Since the suggested
and retrieval process but to identify tools to    retrieval technologies are less than 22% accu-       approach is no longer restricted to keyword
                                                  rate for both precision and recall. The ideal
manage, capitalize on and leverage their in-                                                           identification, compound-term metadata can
                                                  goal is to have them balanced. Compound
formation assets to improve organizational                                                             be automatically generated either when the
                                                  term processing has the ability to increase
performance. Moving beyond keyword                                                                     content is created or ingested. The generation
                                                  precision with no loss of recall.
identification and traditional taxonomy ap-                                                            of metadata based on concepts extracts com-
proaches, the use of compound term pro-                                                                pound terms and keywords from a document
cessing or identifying “concepts in context”                                                           or corpus of documents that are highly corre-
                                                  Managing Content
effectively addresses the issue of managing                                                            lated to a particular concept. By identifying
                                                     Taxonomy development and mainte-
unstructured content and enables organiza-                                                             the most significant patterns in any text, these
                                                  nance has traditionally been a laborious
tions to more effectively find, organize and                                                           compound terms can then be used to generate
                                                  and on-going challenge, not to mention
manage their information capital.                                                                      non-subjective metadata based on an under-
                                                  costly. The most effective approach is to
    Compound term processing automatically                                                             standing of conceptual meaning.
                                                  use rules-based categorization, providing
identifies the word patterns in unstructured                                                               Compound-term processing is a new
                                                  enterprises complete control of rules-based
text that convey the most meaning and uses                                                             approach to an old problem. Instead of
                                                  descriptors unique to their organization.
these higher order terms to improve precision                                                          identifying single keywords, compound-
                                                  Since all rules can be defined and man-
with no loss of recall. The algorithms adapt to                                                        term processing identifies multi-word
                                                  aged, error-prone results utilizing “train-
each customer’s content and they work in any                                                           terms that form a complex entity and iden-
                                                  ing” algorithms typically found in other
language regardless of vocabulary or linguis-                                                          tifies them as a concept. By forming these
                                                  approaches are eliminated.
tic style. The technology was originally                                                               compound terms and placing them in the
developed by Concept Searching in 2002 and                                                             search engine’s index, the search can be
is similar in many ways to the “phrase-based                                                           performed with a higher degree of accura-
                                                   “Precision and recall
indexing” techniques detailed in various U.S.                                                          cy because the ambiguity inherent in single
patents filed in 2004 and to which Google                                                              words is no longer a problem. As a result, a
                                                      are the two key
subsequently acquired the rights.                                                                      search for “survival rates following a triple
                                                                                                       heart bypass” will locate documents about
                                                       performance                                     this topic even if this precise phrase is not
Keyword Search
                                                                                                       contained in any document. A concept
versus Concept Search
                                                     measurements for                                  search using compound-term processing
                                                                                                       can extract the key concepts, in this case
    Knowledge workers need to identify
                                                                                                       “survival rates” and “triple heart bypass”
content in the context of what they are
                                                  information retrieval.”                              and use these concepts to select the most
seeking. The fundamental problem with
                                                                                                       relevant documents.
most enterprise search solutions, and all
                                                                                                           Compound-term processing can address
statistical search solutions, is that they are        A concept-based automatic classification
                                                                                                       many challenges facing large enterprises and
based on an index of single words. Yet            process identifies, during indexing, the cate-
                                                                                                       provide many benefits. Identification of con-
most queries are expressed in short pat-          gories each document belongs to. Each cate-
                                                                                                       cepts within a large corpus of information
terns of words and not single words in iso-       gory is identified by a unique descriptor and
                                                                                                       removes the ambiguity in search, eliminates
lation which are highly ambiguous.                is associated with key descriptive words
                                                                                                       inconsistent meta-tagging and automatic clas-
    A concept search engine can isolate the       and/or phrases held in the database. This
                                                                                                       sification and taxonomy management based
key meaning that is normally expressed as         approach enables a rapid implementation of
                                                                                                       on concept identification, simplifies develop-
proper nouns, nouns phrases and verb              a corporate taxonomy with all documents
                                                                                                       ment and on-going maintenance. T
phrases. Although linguistic products can         classified to multiple nodes at index time.
do this, their performance is highly vari-        Ideally, the taxonomy can be used to browse          John Challis has had success with several ventures
able depending upon the vocabulary and            the document collection or as a filter when          involving the management of unstructured data. In
language in use. A statistical-based, lan-        running ad hoc searches.                             1990, he founded Imagesolve International which
guage-independent concept search can                  An easy-to-use taxonomy and automatic            became the UK’s leading supplier of document image
accept queries in natural language with the       classification tool creates the framework to         processing and workflow products. In 1995 he launched
user typing words, phrases or whole sen-                                                               ImageFirst Office for BancTec in the US. He was also
                                                  classify content based on concepts to one or
                                                                                                       CTO at Smartlogik, the company behind the world’s first
tences. The system then analyzes the natu-        more nodes in the taxonomy. Features that
                                                                                                       probabilistic search engine.
ral language query to extract the keywords        enable subject-matter experts to interact with
and phrases to identify the main concepts         the taxonomy can simplify on-going mainte-           Providing advanced search, auto-classification, taxono-
and retrieve content that is highly relevant.     nance. For example: automatically generating         my and semantic metadata-tagging solutions, Concept
    Precision and recall are the two key per-     compound term clues from the document                Searching is the first and only statistical search and clas-
formance measurements for information             corpus; dynamically showing the effect               sification company that uses compound-term process-
                                                  of changes on the taxonomy; and class
retrieval. Precision is the retrieval of only                                                          ing to identify concepts within unstructured content.


                                                                                                                      KMWorld October 2008                   S17

More Related Content

More from martingarland

Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...
Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...
Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...martingarland
 
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
Expert Webinar Series:  SharePoint Governance - Managing Content SprawlExpert Webinar Series:  SharePoint Governance - Managing Content Sprawl
Expert Webinar Series: SharePoint Governance - Managing Content Sprawlmartingarland
 
Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...
Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...
Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...martingarland
 
Webinar: Business Solutions and Metadata Design
Webinar:  Business Solutions and Metadata DesignWebinar:  Business Solutions and Metadata Design
Webinar: Business Solutions and Metadata Designmartingarland
 
Webinar: Records Management in SharePoint combining Governance with Content T...
Webinar: Records Management in SharePoint combining Governance with Content T...Webinar: Records Management in SharePoint combining Governance with Content T...
Webinar: Records Management in SharePoint combining Governance with Content T...martingarland
 
Concept Searching Overview Google Vs Fast
Concept Searching Overview  Google Vs FastConcept Searching Overview  Google Vs Fast
Concept Searching Overview Google Vs Fastmartingarland
 
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...martingarland
 
Webinar: The How To Guide For Taxonomies In Share Point
Webinar: The How To Guide For Taxonomies In Share PointWebinar: The How To Guide For Taxonomies In Share Point
Webinar: The How To Guide For Taxonomies In Share Pointmartingarland
 
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...martingarland
 
Concept Searching Portal Solutions Search Engine Face Off
Concept Searching Portal Solutions Search Engine Face OffConcept Searching Portal Solutions Search Engine Face Off
Concept Searching Portal Solutions Search Engine Face Offmartingarland
 
Concept Searching Webinar Presentation
Concept Searching Webinar PresentationConcept Searching Webinar Presentation
Concept Searching Webinar Presentationmartingarland
 
ConceptClassifier for SharePoint Turbo Charging the Public Sector
ConceptClassifier for SharePoint Turbo Charging the Public SectorConceptClassifier for SharePoint Turbo Charging the Public Sector
ConceptClassifier for SharePoint Turbo Charging the Public Sectormartingarland
 
conceptClassifier For SharePoint Driving Business Value
conceptClassifier For SharePoint Driving Business ValueconceptClassifier For SharePoint Driving Business Value
conceptClassifier For SharePoint Driving Business Valuemartingarland
 
Concept Searching Webinar
Concept Searching WebinarConcept Searching Webinar
Concept Searching Webinarmartingarland
 
Concept Searching ConceptClassifier For SharePoint
Concept Searching ConceptClassifier For SharePointConcept Searching ConceptClassifier For SharePoint
Concept Searching ConceptClassifier For SharePointmartingarland
 

More from martingarland (15)

Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...
Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...
Expert Webinar Series 5: "De-mystifying Content Types - Four Key Content...
 
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
Expert Webinar Series:  SharePoint Governance - Managing Content SprawlExpert Webinar Series:  SharePoint Governance - Managing Content Sprawl
Expert Webinar Series: SharePoint Governance - Managing Content Sprawl
 
Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...
Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...
Expert Webinar Series 2: Designing Information Architecture for SharePoint: M...
 
Webinar: Business Solutions and Metadata Design
Webinar:  Business Solutions and Metadata DesignWebinar:  Business Solutions and Metadata Design
Webinar: Business Solutions and Metadata Design
 
Webinar: Records Management in SharePoint combining Governance with Content T...
Webinar: Records Management in SharePoint combining Governance with Content T...Webinar: Records Management in SharePoint combining Governance with Content T...
Webinar: Records Management in SharePoint combining Governance with Content T...
 
Concept Searching Overview Google Vs Fast
Concept Searching Overview  Google Vs FastConcept Searching Overview  Google Vs Fast
Concept Searching Overview Google Vs Fast
 
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
 
Webinar: The How To Guide For Taxonomies In Share Point
Webinar: The How To Guide For Taxonomies In Share PointWebinar: The How To Guide For Taxonomies In Share Point
Webinar: The How To Guide For Taxonomies In Share Point
 
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
Webinar - The Swiss Army Knife for SharePoint 2010 – Tagging, Term Store and ...
 
Concept Searching Portal Solutions Search Engine Face Off
Concept Searching Portal Solutions Search Engine Face OffConcept Searching Portal Solutions Search Engine Face Off
Concept Searching Portal Solutions Search Engine Face Off
 
Concept Searching Webinar Presentation
Concept Searching Webinar PresentationConcept Searching Webinar Presentation
Concept Searching Webinar Presentation
 
ConceptClassifier for SharePoint Turbo Charging the Public Sector
ConceptClassifier for SharePoint Turbo Charging the Public SectorConceptClassifier for SharePoint Turbo Charging the Public Sector
ConceptClassifier for SharePoint Turbo Charging the Public Sector
 
conceptClassifier For SharePoint Driving Business Value
conceptClassifier For SharePoint Driving Business ValueconceptClassifier For SharePoint Driving Business Value
conceptClassifier For SharePoint Driving Business Value
 
Concept Searching Webinar
Concept Searching WebinarConcept Searching Webinar
Concept Searching Webinar
 
Concept Searching ConceptClassifier For SharePoint
Concept Searching ConceptClassifier For SharePointConcept Searching ConceptClassifier For SharePoint
Concept Searching ConceptClassifier For SharePoint
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

A New Concept In Search

  • 1. weighting influenced by parent, child and sib- A New Concept in Search ling can reduce taxonomy development and on-going maintenance by 66%-80%. Semantic Metadata Generation By John Challis, CEO, CTO, Concept Searching The metadata generation issue is increas- ingly a growing concern in large enterprises. With A comprehensive approach that requires more the exponential increase in unstruc- those items that are relevant to the query. than syntactic metadata and that requires end tured information, enterprises are seeking Recall is the retrieval of all items that are rel- users to add rich metadata is haphazard and new ways to improve not only the search evant to the query. Yet most information subjective at best. Since the suggested and retrieval process but to identify tools to retrieval technologies are less than 22% accu- approach is no longer restricted to keyword rate for both precision and recall. The ideal manage, capitalize on and leverage their in- identification, compound-term metadata can goal is to have them balanced. Compound formation assets to improve organizational be automatically generated either when the term processing has the ability to increase performance. Moving beyond keyword content is created or ingested. The generation precision with no loss of recall. identification and traditional taxonomy ap- of metadata based on concepts extracts com- proaches, the use of compound term pro- pound terms and keywords from a document cessing or identifying “concepts in context” or corpus of documents that are highly corre- Managing Content effectively addresses the issue of managing lated to a particular concept. By identifying Taxonomy development and mainte- unstructured content and enables organiza- the most significant patterns in any text, these nance has traditionally been a laborious tions to more effectively find, organize and compound terms can then be used to generate and on-going challenge, not to mention manage their information capital. non-subjective metadata based on an under- costly. The most effective approach is to Compound term processing automatically standing of conceptual meaning. use rules-based categorization, providing identifies the word patterns in unstructured Compound-term processing is a new enterprises complete control of rules-based text that convey the most meaning and uses approach to an old problem. Instead of descriptors unique to their organization. these higher order terms to improve precision identifying single keywords, compound- Since all rules can be defined and man- with no loss of recall. The algorithms adapt to term processing identifies multi-word aged, error-prone results utilizing “train- each customer’s content and they work in any terms that form a complex entity and iden- ing” algorithms typically found in other language regardless of vocabulary or linguis- tifies them as a concept. By forming these approaches are eliminated. tic style. The technology was originally compound terms and placing them in the developed by Concept Searching in 2002 and search engine’s index, the search can be is similar in many ways to the “phrase-based performed with a higher degree of accura- “Precision and recall indexing” techniques detailed in various U.S. cy because the ambiguity inherent in single patents filed in 2004 and to which Google words is no longer a problem. As a result, a are the two key subsequently acquired the rights. search for “survival rates following a triple heart bypass” will locate documents about performance this topic even if this precise phrase is not Keyword Search contained in any document. A concept versus Concept Search measurements for search using compound-term processing can extract the key concepts, in this case Knowledge workers need to identify “survival rates” and “triple heart bypass” content in the context of what they are information retrieval.” and use these concepts to select the most seeking. The fundamental problem with relevant documents. most enterprise search solutions, and all Compound-term processing can address statistical search solutions, is that they are A concept-based automatic classification many challenges facing large enterprises and based on an index of single words. Yet process identifies, during indexing, the cate- provide many benefits. Identification of con- most queries are expressed in short pat- gories each document belongs to. Each cate- cepts within a large corpus of information terns of words and not single words in iso- gory is identified by a unique descriptor and removes the ambiguity in search, eliminates lation which are highly ambiguous. is associated with key descriptive words inconsistent meta-tagging and automatic clas- A concept search engine can isolate the and/or phrases held in the database. This sification and taxonomy management based key meaning that is normally expressed as approach enables a rapid implementation of on concept identification, simplifies develop- proper nouns, nouns phrases and verb a corporate taxonomy with all documents ment and on-going maintenance. T phrases. Although linguistic products can classified to multiple nodes at index time. do this, their performance is highly vari- Ideally, the taxonomy can be used to browse John Challis has had success with several ventures able depending upon the vocabulary and the document collection or as a filter when involving the management of unstructured data. In language in use. A statistical-based, lan- running ad hoc searches. 1990, he founded Imagesolve International which guage-independent concept search can An easy-to-use taxonomy and automatic became the UK’s leading supplier of document image accept queries in natural language with the classification tool creates the framework to processing and workflow products. In 1995 he launched user typing words, phrases or whole sen- ImageFirst Office for BancTec in the US. He was also classify content based on concepts to one or CTO at Smartlogik, the company behind the world’s first tences. The system then analyzes the natu- more nodes in the taxonomy. Features that probabilistic search engine. ral language query to extract the keywords enable subject-matter experts to interact with and phrases to identify the main concepts the taxonomy can simplify on-going mainte- Providing advanced search, auto-classification, taxono- and retrieve content that is highly relevant. nance. For example: automatically generating my and semantic metadata-tagging solutions, Concept Precision and recall are the two key per- compound term clues from the document Searching is the first and only statistical search and clas- formance measurements for information corpus; dynamically showing the effect sification company that uses compound-term process- of changes on the taxonomy; and class retrieval. Precision is the retrieval of only ing to identify concepts within unstructured content. KMWorld October 2008 S17