Taxonomies in Search: Leveraging Your Content Semantically
An SLA Webinar, Aug 10, 1:00pm-2:00pm EST
Marjorie Hlava, President, Access Innovations, Inc.
mhlava@accessinn.com
www.accessinn.com
Agenda
- How search works
- Measuring accuracy in search: precision, recall, relevance
- Search theoretical basis: Bayes, Boole, and the rest of the guys
- The taxonomy effect
How does search work? Many parts:
- Search software, of course
- Computer network
- Parsing of text
- Well-formed or structured text: CLEAN DATA
- Computer software (network)
- Computer hardware
- Telecommunications connection
- Training sets for statistical systems
Technical parts of search
- Search technology
- Ranking algorithms
- Query language
- Federators
- Cache
- Inverted index
- Other enhancements
- Presentation layer
My Main Frustration
1. Select hardware
2. Select software
3. Design the system
4. Try to load the data
5. Add the taxonomy
That's BACKWARDS!
Data First!
- What are you building the system for?
- Assess the data
- Do the design
- Decide what else needs to be added: taxonomy terms, other controls
- Find a system that will work with your data
Access Innovations: Complex Farm with Perfect Search
Diagram components: query federators; query servers; Search Harmony; presentation layer; deploy hub; index builders; cleanup, etc.; repository; XIS (cache); cache builders; source data
FAST Search example: Core Architectural Components
Diagram components:
- Content sources: web content; files, documents; databases; email, groupware
- Connectors: custom connector, email connector, database connector, file traverser, web crawler, content push
- Processing: pipeline, document processor, index DB, query pipeline, query processor, search server, filter server
- APIs: management API, query API, content API, Data Harmony governance API
- Front ends and outputs: administrator's dashboard, vertical applications, portals, custom front-ends, mobile devices, custom applications, results, alerts, Search Harmony
- Data Harmony: MAIstro, agent DB
Measuring accuracy in search
- Relevance
- Recall
- Precision
- Accuracy: hits, misses, noise
- Ranking
- Linguistics
- Query processing
- Results processing
- Display
- Search refinement
- Usability
- Business rules
Relevance
- How well a set of returned documents answers the information need
- "Accuracy"
- Related to the objective of the search
- Different user communities
- Information resources
- Tension between user needs and the context available
- A confidence "guesstimate"
The formulas
$$\text{Recall} = \frac{\text{number of relevant items retrieved}}{\text{number of relevant items in the collection}}$$
$$\text{Precision} = \frac{\text{number of relevant items retrieved}}{\text{number of items retrieved}}$$
Relevance: germane (precision) and pertinent (recall)
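The two formulas can be computed directly from a set of relevance judgments. This is a minimal sketch, not taken from the webinar. It assumes the relevant set has already been decided by a person, as the "Measuring Relevance" slide insists. The function name `search_accuracy` and the sample document IDs are hypothetical. The function also reports the hits, misses, and noise counts listed under "Measuring accuracy in search".

```python
def search_accuracy(retrieved, relevant):
    """Precision and recall for one query, plus the hits / misses / noise counts behind them."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant      # relevant items retrieved
    misses = relevant - retrieved    # relevant items the search did not return
    noise = retrieved - relevant     # retrieved items that are not relevant
    return {
        "precision": len(hits) / len(retrieved) if retrieved else 0.0,
        "recall": len(hits) / len(relevant) if relevant else 0.0,
        "hits": len(hits),
        "misses": len(misses),
        "noise": len(noise),
    }

# Hypothetical judgments: 10 relevant documents exist; the engine returns documents 1-8.
result = search_accuracy(retrieved=range(1, 9), relevant=[1, 2, 3, 4, 5, 6, 20, 21, 22, 23])
print(result)
# {'precision': 0.75, 'recall': 0.6, 'hits': 6, 'misses': 4, 'noise': 2}
```

Precision falls as noise grows, and recall falls as misses grow. That is why tuning for one usually costs the other.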
Measuring Relevance
- Concepts
- Context
- Age of documents
- Completeness (recall)
- Quality
Statistically determined? Nope, it is subjective: someone has to determine the rightness of each item.
A "confidence factor" is a canard!
Kinds of search
- Bayesian: FAST, Lucene, Autonomy/Verity
- Boolean: Dialog, Endeca, Perfect Search
- Ranking algorithms: Google
Search Theoretical Basis: Those Famous Guys
- Boole
- Bayes: Bayesian techniques
- Turney: Turney algorithm, enriched structured data
- Marco Dorigo: ant colony
This is only a sample of a large body of research.
George Boole and Boolean algebra
- George Boole, mathematician, 1815-1864
- Boolean algebra: an algebraic system of logic
- Operators: AND, OR, NOT, ANDNOT
- Used in Dialog, BRS, STAIRS
Boolean representation
Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and the set A XOR B (all the colored regions except the violet). The "universe" is represented by the rectangular frame.
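The Venn regions map one-to-one onto set operations over postings, which are the document IDs stored for each term. This is a minimal sketch using Python's built-in `set` type. The two terms, their postings, and the 10-document collection are hypothetical. `NOT` is taken relative to the whole collection (the "universe" frame).

```python
# Hypothetical postings: the set of document IDs in which each term occurs.
postings = {
    "taxonomy": {1, 2, 3, 5, 8},
    "thesaurus": {2, 3, 4, 9},
}
universe = set(range(1, 11))  # all documents in the collection: the "rectangular frame"

a, b = postings["taxonomy"], postings["thesaurus"]
results = {
    "A AND B": a & b,       # intersection (the violet region)
    "A OR B": a | b,        # union (all colored regions)
    "A XOR B": a ^ b,       # in exactly one set (colored regions except violet)
    "A ANDNOT B": a - b,    # in A but not in B
    "NOT A": universe - a,  # complement within the collection
}
for op, docs in results.items():
    print(f"{op:10s} {sorted(docs)}")
```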
Bayes and Bayes' Theorem
- Thomas Bayes, mathematician, 1702-1761
- Bayes' theorem uses probability inductively
- It established a mathematical basis for probabilistic inference
- WHAT? A means of calculating, from the number of times an event has and has not occurred, the probability that it will occur in future trials
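A worked instance of the theorem may help: P(A | B) = P(B | A) P(A) / P(B), with P(B) expanded over A and not-A. This is a minimal sketch in a search setting. A is "the document is relevant" and B is "the document contains the query term". The probabilities are invented for illustration, not measured.

```python
def bayes(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A | B) by Bayes' theorem, expanding P(B) over A and not-A."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: 10% of documents are relevant; the term appears in 80% of
# relevant documents and in 20% of non-relevant ones.
p_rel_given_term = bayes(p_b_given_a=0.8, p_a=0.1, p_b_given_not_a=0.2)
print(round(p_rel_given_term, 3))  # 0.308
```

Even a term that is four times likelier in relevant documents leaves the posterior at only about 31%. That is because the prior P(A) is low.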
Bayesian methods: Cautions
- A user might wish to change the distribution of probabilities.
- A user may make a novel request for information in a previously unanticipated way.
- The computational difficulty of exploring a previously unknown network.
- The quality and extent of the prior beliefs used in Bayesian inference processing.
Bayesian cautions (cont.)
- A Bayesian network is only as useful as its prior knowledge is reliable.
- An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results.
- You must choose an appropriate statistical distribution for modeling the data.
- You must have the proper distribution model to describe the data.
That is, you have to constantly train and retrain the system on the data.
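The warning about optimistic or pessimistic priors can be made concrete with the simplest Bayesian estimator. This sketch uses a Beta prior updated with counts of relevant vs. judged results (beta-binomial updating). That model is an assumption chosen for illustration, not the model used by any product named in this deck. The same evidence yields very different estimates depending on the prior.

```python
def posterior_mean(relevant: int, judged: int, prior_a: float, prior_b: float) -> float:
    """Posterior mean of a Beta(prior_a, prior_b) prior after `relevant` successes in `judged` trials."""
    return (relevant + prior_a) / (judged + prior_a + prior_b)

# The same hypothetical evidence (2 of 10 judged results relevant) under three priors.
for label, a, b in [("uniform", 1, 1), ("optimistic", 20, 2), ("pessimistic", 2, 20)]:
    print(f"{label:11s} {posterior_mean(2, 10, a, b):.3f}")
```

With only ten judgments, the optimistic prior pulls the estimate up to about 0.69, even though the data alone suggest 0.2. Retraining on more data is what eventually overwhelms a bad prior.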
Peter Turney and the Turney Algorithm
- Peter D. Turney, Canada, present day
- Learning algorithms for keyphrase extraction
- Tree induction algorithm
- Lexical semantics
- GenEx: with human input, 80% acceptable
- Extraction vs. generation
- Sentiment (semantic orientation) of words:
$$\mathrm{SO}(\text{word}) = \log_2 \frac{\mathrm{hits}(\text{word AND "excellent"}) \cdot \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\text{word AND "poor"}) \cdot \mathrm{hits}(\text{"excellent"})}$$
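The sentiment formula only needs four hit counts from a search engine. This is a minimal sketch of that arithmetic. The hit counts are hypothetical, and the optional `smoothing` constant (added to the co-occurrence counts to avoid division by zero) is an assumption of this sketch. A positive score leans toward "excellent" and a negative score toward "poor".

```python
import math

def semantic_orientation(hits_word_excellent: float, hits_word_poor: float,
                         hits_excellent: float, hits_poor: float,
                         smoothing: float = 0.0) -> float:
    """Semantic orientation of a word from hit counts, following the log2 ratio above."""
    num = (hits_word_excellent + smoothing) * hits_poor
    den = (hits_word_poor + smoothing) * hits_excellent
    return math.log2(num / den)

# Hypothetical counts: the word co-occurs with "excellent" 80 times and with "poor" 20 times,
# and "excellent" and "poor" are equally common overall.
print(semantic_orientation(80, 20, 1000, 1000))  # 2.0
```

The `hits("poor")` and `hits("excellent")` factors normalize for how common each reference word is. Only the word's relative association with the two words matters.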
Marco Dorigo and Ant Colony Optimization
- Marco Dorigo: research director for the Belgian Fonds de la Recherche Scientifique; research director of the IRIDIA lab at the Université Libre de Bruxelles
- Ant colony optimization: a metaheuristic for combinatorial optimization problems
- Swarm intelligence
- Value importance vs. heuristic importance
- Useful in search prediction
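Ant colony optimization is easiest to see on its textbook problem, a small traveling-salesman tour. The link to search prediction is not shown here. This is a minimal Ant System-style sketch. Each ant picks its next city with probability proportional to tau^alpha * eta^beta. Here tau is the learned pheromone trail and eta is the heuristic 1/distance. Presumably this alpha/beta trade-off is what the slide calls value importance vs. heuristic importance. The distance matrix and all parameter defaults are hypothetical.

```python
import itertools
import random

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def ant_colony_tsp(dist, n_ants=10, n_iter=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    """Minimal ant colony optimization for a small symmetric traveling-salesman problem.

    alpha: weight of the pheromone trail; beta: weight of the heuristic 1/distance;
    rho: evaporation rate; q: pheromone deposited per unit of inverse tour length.
    """
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on every edge, initially uniform
    eta = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                choices = sorted(unvisited)
                # Transition rule: P(i -> j) is proportional to tau^alpha * eta^beta.
                weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in choices]
                j = rng.choices(choices, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)
        for row in tau:  # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour in tours:  # deposit: shorter tours leave more pheromone
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Hypothetical symmetric distances between 5 cities.
dist = [[0, 2, 9, 10, 7],
        [2, 0, 6, 4, 3],
        [9, 6, 0, 8, 5],
        [10, 4, 8, 0, 6],
        [7, 3, 5, 6, 0]]
tour, length = ant_colony_tsp(dist)
# Brute force is feasible for 5 cities and gives the true optimum for comparison.
optimum = min(tour_length((0,) + p, dist) for p in itertools.permutations(range(1, 5)))
print(tour, length, optimum)
```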
Natural Language Processing
- Syntactic
- Semantic
- Morphological
- Phraseological
- Lemmatization (stemming)
- Statistical
- Grammatical
- Common sense
Basic areas of Automatic Language Processing (ALP)
- Auto translation
- Auto indexing
- Auto abstracting
- Artificial intelligence
- Searching
- Spell checking
- Semantic Web
- Natural Language Processing (NLP)
- Computational linguistics
Statistical Search
- Cluster analysis
- Neural networks
- Co-occurrence
- Bayesian inference
- Latent semantic
- Etc.
Inverted files and Boolean logic are basic to all search
- Searchable index
- Inverted file index
- Taxonomy
- Thesaurus
- Hierarchical display
Sample Slide for Inverted File Index Demonstration
Outline of Presentation
1. Define key terminology
2. Thesaurus tools
   - Features
   - Functions
3. Costs
   - Thesaurus construction
   - Thesaurus tools
4. Why & when
Complex Inverted File Index Example 1
(L = line number, P = word position in the line, T = title, H = heading, SH = subheading)
Term           Locations
construction   L7, P2, SH
costs          L6, P1, H
define         L2, P1, H
features       L4, P1, SH
functions      L5, P1, SH
key            L2, P2, H
outline        L1, P1, T
presentation   L1, P3, T
terminology    L2, P3, H
thesaurus      (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
tools          (1) L3, P2, H; (2) L8, P2, SH
when           L9, P3, H
why            L9, P1, H
Stop words (not indexed): of, &, 1, 2, 3, 4
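An index like the one above can be built in a few lines. This is a minimal sketch. The slide text is reconstructed from the index entries themselves, which is an assumption. `STOP_WORDS` is a hypothetical stop list chosen to match the example. Stop words still advance the word position, which is why "presentation" is P3. The stored positions also make phrase search possible. "thesaurus tools" matches only where "tools" directly follows "thesaurus".

```python
from collections import defaultdict

# The sample slide, one entry per line: (role, text). T = title, H = heading, SH = subheading.
SLIDE = [
    ("T", "Outline of Presentation"),
    ("H", "Define key terminology"),
    ("H", "Thesaurus tools"),
    ("SH", "Features"),
    ("SH", "Functions"),
    ("H", "Costs"),
    ("SH", "Thesaurus construction"),
    ("SH", "Thesaurus tools"),
    ("H", "Why & when"),
]
STOP_WORDS = {"of", "&"}  # hypothetical stop list matching the example

def build_index(lines):
    """Map each term to its postings: (line number, word position, role)."""
    index = defaultdict(list)
    for line_no, (role, text) in enumerate(lines, start=1):
        for pos, word in enumerate(text.lower().split(), start=1):
            if word not in STOP_WORDS:  # stop words keep their position but are not indexed
                index[word].append((line_no, pos, role))
    return dict(index)

def phrase_lines(index, first, second):
    """Line numbers where `second` immediately follows `first`, using stored positions."""
    starts = {(line, pos) for line, pos, _ in index.get(first, [])}
    return sorted(line for line, pos, _ in index.get(second, []) if (line, pos - 1) in starts)

index = build_index(SLIDE)
print(index["thesaurus"])                         # [(3, 1, 'H'), (7, 1, 'SH'), (8, 1, 'SH')]
print(phrase_lines(index, "thesaurus", "tools"))  # [3, 8]
```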
Word and Term Parsing
- Stemming: -ing, -ed, -es, -'s, -s', etc.
- Depluralization
- Truncation: left and right
- Wild cards: organi*ation
- Variant spellings: centre, center
- Hyphens
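Several of these parsing steps can be approximated with the Python standard library. This is a deliberately naive sketch. It is not the Porter stemmer or any product's parser, and it will mis-stem words such as "uses". The suffix list comes from the slide. The variant table and vocabulary are hypothetical. `fnmatch.fnmatchcase` treats `*` as any run of characters, so it covers the wild-card and left/right truncation cases.

```python
from fnmatch import fnmatchcase

SUFFIXES = ("'s", "s'", "ing", "ed", "es", "s")  # checked in this order
VARIANTS = {"centre": "center", "organisation": "organization"}  # hypothetical variant table

def normalize(word: str) -> str:
    """Naive normalization: lowercase, drop hyphens, map variant spellings, strip one suffix."""
    word = word.lower().replace("-", "")
    word = VARIANTS.get(word, word)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:  # keep stems of 3+ letters
            return word[: -len(suffix)]
    return word

def wildcard(pattern: str, vocabulary):
    """Expand a truncation / wild-card pattern against the index vocabulary."""
    return sorted(w for w in vocabulary if fnmatchcase(w, pattern))

vocab = ["organization", "organisation", "organ", "indexing", "indexed"]
print(wildcard("organi*ation", vocab))  # ['organisation', 'organization']
print(wildcard("index*", vocab))        # right truncation
print([normalize(w) for w in ["Indexing", "indexed", "boxes", "Centre", "e-mail"]])
```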
The taxonomy effect
- Where do the terms go?
- How are they used in search?
- What other ways can I use the taxonomy in search?
Site search
- Search of 53 crawled sites, including journals, books, web site, conference sites, etc.
- Navigation
- Bookstore search
- Search database for journals and pubs
- Search across all publications
Taxonomy-Driven Search Presentation
- BROWSE: navigate the full taxonomy "tree"
- Auto-completion using the taxonomy
- Guide the user
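Taxonomy-based auto-completion amounts to a prefix lookup over a sorted term list. This is a minimal sketch using the standard-library `bisect` module. The class name `TaxonomyAutocomplete` and the sample terms are hypothetical, and this is not a product API. Matching is case-insensitive, but the terms are returned in their display form.

```python
import bisect

class TaxonomyAutocomplete:
    """Prefix auto-completion over taxonomy terms (hypothetical class, not a product API)."""

    def __init__(self, terms):
        # Keep (lowercased key, display form) pairs sorted so a prefix range can be found by binary search.
        self._entries = sorted((t.lower(), t) for t in terms)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix: str, limit: int = 10):
        """Return up to `limit` taxonomy terms that start with `prefix`."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key, term in self._entries[start:]:
            if not key.startswith(prefix) or len(results) == limit:
                break
            results.append(term)
        return results

ac = TaxonomyAutocomplete(["Bayesian inference", "Boolean algebra", "Bayes' theorem", "Thesaurus construction"])
print(ac.complete("bay"))
```

Because only approved taxonomy terms are offered, the user is guided toward vocabulary that is actually used in indexing.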
A quick look behind the scenes
Components: database management system, thesaurus tool, indexing tool
- Validate term entry
- Block invalid terms
- Record candidates
- Establish rules for term use
- Add terms and rules
- Change terms and rules
- Delete terms and rules
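The term-entry checks above can be sketched as one validation step. It accepts preferred terms, maps equivalents to their preferred form, blocks disallowed terms, and records unknown terms as candidates for the thesaurus editor. This is a hypothetical illustration of that workflow, not the behavior of any specific thesaurus or indexing tool. `TermValidator` and the sample vocabulary are invented.

```python
from dataclasses import dataclass, field

@dataclass
class TermValidator:
    """Hypothetical sketch of thesaurus-controlled term entry (not a specific product's behavior)."""
    preferred: set                                  # approved taxonomy terms
    use_for: dict                                   # non-preferred term -> preferred term
    blocked: set = field(default_factory=set)       # terms that must never be applied
    candidates: list = field(default_factory=list)  # unknown terms recorded for editorial review

    def validate(self, term: str):
        """Return the preferred term to apply, or None if the entry is rejected."""
        if term in self.blocked:
            return None                # block invalid terms
        if term in self.preferred:
            return term                # valid entry
        if term in self.use_for:
            return self.use_for[term]  # map an equivalent to its preferred form
        self.candidates.append(term)   # record candidate for the thesaurus editor
        return None

v = TermValidator(preferred={"Information retrieval", "Taxonomies"},
                  use_for={"IR": "Information retrieval", "Taxonomy": "Taxonomies"},
                  blocked={"Misc"})
applied = [v.validate(t) for t in ["IR", "Taxonomies", "Misc", "Folksonomies"]]
print(applied)
print(v.candidates)
```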
Where does the subject metadata go?
- Apply it to the content itself
- Use the meta name field in the HTML header
- Connect search to the keywords in the SQL or other database tables
HTML Header
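Writing taxonomy terms into the HTML header means emitting a `<meta>` element with the terms, escaped properly. This is a minimal sketch using `html.escape` from the standard library. The field name `keywords` is an assumption, since a site may use another name such as a Dublin Core-style `DC.subject`. The `"; "` separator is also an assumption, chosen because taxonomy terms can themselves contain commas.

```python
from html import escape

def meta_keywords(terms, name="keywords"):
    """Render taxonomy terms as an HTML <meta> element for the document header."""
    content = "; ".join(terms)  # assumption: semicolons, since terms may contain commas
    return f'<meta name="{escape(name)}" content="{escape(content)}">'

tag = meta_keywords(["Taxonomies", "Information retrieval", "Boolean & Bayesian search"])
print(tag)
```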
RDBMS connection: taxonomy term table
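Connecting search to taxonomy terms in a relational database usually means a term table plus a join table of indexing assignments. This is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The schema, table names, and rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: documents, taxonomy terms, and a join table of indexing assignments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE term (term_id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);
    CREATE TABLE document_term (
        doc_id  INTEGER REFERENCES document(doc_id),
        term_id INTEGER REFERENCES term(term_id),
        PRIMARY KEY (doc_id, term_id)
    );
""")
conn.executemany("INSERT INTO document VALUES (?, ?)",
                 [(1, "Search basics"), (2, "Building a thesaurus"), (3, "Ranking in practice")])
conn.executemany("INSERT INTO term VALUES (?, ?)",
                 [(10, "Information retrieval"), (11, "Thesauri")])
conn.executemany("INSERT INTO document_term VALUES (?, ?)", [(1, 10), (2, 11), (3, 10)])

def documents_for_term(label):
    """Titles of documents indexed with the given taxonomy term."""
    rows = conn.execute("""
        SELECT d.title FROM document d
        JOIN document_term dt ON dt.doc_id = d.doc_id
        JOIN term t ON t.term_id = dt.term_id
        WHERE t.label = ?
        ORDER BY d.doc_id
    """, (label,))
    return [title for (title,) in rows]

print(documents_for_term("Information retrieval"))  # ['Search basics', 'Ranking in practice']
```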
Suggested taxonomy descriptors
Integrate the taxonomy to enhance findability
- Browsable categories of a directory
- Browsable faceted navigation
- Smart search for term equivalents
- Taxonomy terms (original or modified) as labels
- Navigation aids that incorporate taxonomy terms and relationships
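"Smart search for term equivalents" is commonly implemented as query expansion. A user's term is rewritten as a Boolean OR over its preferred term and all of its equivalents. This is a minimal sketch. The equivalence table is hypothetical, and the quoted-OR output is a generic query string rather than any particular engine's syntax.

```python
# Hypothetical equivalence classes: preferred term -> non-preferred equivalents ("use for" entries).
EQUIVALENTS = {
    "information retrieval": ["IR", "document retrieval"],
    "center": ["centre"],
}
# Invert so that any form of a term leads to its preferred term.
TO_PREFERRED = {v.lower(): p for p, vs in EQUIVALENTS.items() for v in vs}
TO_PREFERRED.update({p: p for p in EQUIVALENTS})

def expand(term: str) -> str:
    """Rewrite a user's term as a Boolean OR over the preferred term and its equivalents."""
    preferred = TO_PREFERRED.get(term.lower())
    if preferred is None:
        return f'"{term}"'  # not in the taxonomy: search it as entered
    forms = [preferred] + EQUIVALENTS[preferred]
    return "(" + " OR ".join(f'"{f}"' for f in forms) + ")"

print(expand("IR"))      # ("information retrieval" OR "IR" OR "document retrieval")
print(expand("search"))  # "search"
```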
More Taxonomy Enrichment
- Spelling alternatives and correction
- Related concepts
- Statistical information about the metadata
- Navigation or drill-downs
- Search refinement
- Recursive sets
- Concept linking
- Dictionary lookup (in the taxonomy glossary)
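Spelling correction against the taxonomy can start from simple string similarity. The taxonomy term list serves as the dictionary of valid spellings. This is a minimal sketch using the standard-library `difflib.get_close_matches`. The term list and the 0.75 cutoff are hypothetical choices, and production spell checkers typically use edit-distance or phonetic methods instead.

```python
import difflib

TAXONOMY_TERMS = ["thesaurus", "taxonomy", "ontology", "metadata", "indexing"]  # hypothetical

def suggest(word: str, n: int = 3, cutoff: float = 0.75):
    """'Did you mean' suggestions: taxonomy terms similar to a possibly misspelled query word."""
    word = word.lower()
    if word in TAXONOMY_TERMS:
        return [word]
    return difflib.get_close_matches(word, TAXONOMY_TERMS, n=n, cutoff=cutoff)

print(suggest("thesarus"))  # ['thesaurus']
print(suggest("taxonmy"))   # ['taxonomy']
```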
The brand is repeated in several spots and is tied to search as well.
Database Plus Search Workflow
Diagram components: raw full-text data feeds; printed source materials; data crawls on 53+ sources; XIS creation; add metadata; MAI Concept Extractor; MAI Rule Base; taxonomy (Thesaurus Master); taxonomy terms; XIS repository; SQL for e-commerce; load to Perfect Search; Search Harmony display; search
Save data to search and repositories at the same time.
Database Plus Search Workflow
Diagram components: source data (raw full-text data feeds, printed source materials, data crawls on data sources); clean and enhance data; XIS creation; add metadata; MAI Concept Extractor; MAI Rule Base; taxonomy (Thesaurus Master); taxonomy terms; XIS repository; SQL for e-commerce; search data; load to search; Search Harmony display; search

My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Taxonomies in Search: Leveraging Content Semantically

  • 1. Taxonomies in Search: An SLA Webinar, Aug 10, 1:00pm-2:00pm EST. Marjorie Hlava, President, mhlava@accessinn.com, Access Innovations, Inc., www.accessinn.com. Leveraging your content semantically
  • 2. Agenda: How search works; measuring accuracy in search (precision, recall, relevance); search theoretical basis (Bayes, Boole and the rest of the guys); the taxonomy effect
  • 3. How does search work? Many parts: search software (of course); computer network; parsing of text; well-formed or structured text (CLEAN DATA); computer software and network; computer hardware; telecommunications connection; training sets for statistical systems
  • 4. Technical parts of search: search technology; ranking algorithms; query language; federators; cache; inverted index; other enhancements; presentation layer
  • 5. My main frustration: select hardware, select software, design the system, try to load the data, add the taxonomy. That's BACKWARDS.
  • 6. Data first! What are you building the system for? Assess the data. Do the design. Decide what else needs to be added (taxonomy terms, other controls). Find a system that will work with your data.
  • 7. Access Innovations: Complex Farm with Perfect Search [diagram components: source data; cleanup, etc.; repository; XIS (cache); cache builders; index builders; deploy hub; query servers; query federators; Search Harmony presentation layer]
  • 8. FAST Search example: core architectural components [diagram components: custom, email and database connectors; file traverser; web crawler; content, query and management APIs; Data Harmony governance API; document processor; query processor; pipeline and query pipeline; search server; filter server; index DB; MAIstro agent DB; administrator's dashboard; web content; files, documents; databases; email, groupware; portals; vertical applications; custom front-ends; custom applications; mobile devices; Search Harmony; results; alerts; content push]
  • 9. Measuring accuracy in search: relevance; recall; precision; accuracy (hits, misses, noise); ranking; linguistics; query processing; results processing; display; search refinement; usability; business rules
  • 10. Relevance: how well a set of returned documents answers the information need ("accuracy"). It is related to the objective of the search, the different user communities, and the information resources, and reflects the tension between user needs and the context available. A confidence "guesstimate"
  • 11. The formulas: $\text{Recall} = \dfrac{\text{number of relevant items retrieved}}{\text{number of relevant items in the collection}}$; $\text{Precision} = \dfrac{\text{number of relevant items retrieved}}{\text{number of items retrieved}}$. Relevance: germane (precision), pertinent (recall)
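Both ratios can be computed from the set of documents a search returned and the set a person judged relevant. This is a minimal Python sketch; the document IDs are hypothetical, and the F1 score (the harmonic mean of precision and recall) is a common combined measure added here for illustration, not something the slide defines.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one query.

    retrieved: IDs of the items the search returned
    relevant:  IDs of the items in the collection judged relevant (by a person)
    """
    hits = retrieved & relevant  # relevant items that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical query: 4 items returned, 5 relevant items in the collection, 2 overlap.
p, r, f = precision_recall({"d1", "d2", "d3", "d7"}, {"d1", "d3", "d4", "d5", "d6"})
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # prints precision=0.50 recall=0.40 f1=0.44
```

Note that the `relevant` set has to come from human judgment, which is why the next slide calls relevance subjective.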
  • 12. Measuring relevance: concepts; context; age of documents; completeness (recall); quality. Statistically determined? Nope, it is subjective: someone has to determine the rightness of the item. A confidence factor = canard!
  • 13. Kinds of search: Bayesian (FAST, Lucene, Autonomy/Verity); Boolean (Dialog, Endeca, Perfect Search); ranking algorithms (Google)
  • 14. Search theoretical basis, those famous guys: Boole; Bayes (Bayesian techniques); Turney (Turney algorithm, enriched structured data); Marco Dorigo (ant colony). This is only a sample of a large body of research.
  • 15. George Boole and Boolean algebra: George Boole, mathematician, 1815-1864. Boolean algebra: an algebraic system of logic. AND, OR, NOT, ANDNOT; Dialog, BRS, STAIRS
  • 16. Boolean representation: Venn diagram showing the intersection of sets, A AND B (in violet); the union of sets, A OR B (all the colored regions); and set A XOR B (all the colored regions except the violet). The "universe" is represented by the rectangular frame.
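In a Boolean engine each term maps to the set of documents that contain it, so the regions of the Venn diagram become plain set operations. This minimal Python sketch uses a hypothetical four-document collection and a naive whitespace tokenizer (both assumptions for illustration).

```python
# Hypothetical four-document collection: document ID -> text
DOCS = {
    1: "taxonomy for search",
    2: "search engine ranking",
    3: "taxonomy and thesaurus construction",
    4: "thesaurus tools for search",
}

# Inverted index as posting sets: term -> IDs of documents containing the term
POSTINGS = {}
for doc_id, text in DOCS.items():
    for term in text.lower().split():
        POSTINGS.setdefault(term, set()).add(doc_id)

UNIVERSE = set(DOCS)  # every document: the rectangular frame in the Venn diagram


def docs(term):
    """Posting set for a term (empty set if the term is not indexed)."""
    return POSTINGS.get(term.lower(), set())


a, b = docs("search"), docs("taxonomy")
print("search AND taxonomy   ", sorted(a & b))         # intersection -> [1]
print("search OR taxonomy    ", sorted(a | b))         # union -> [1, 2, 3, 4]
print("search XOR taxonomy   ", sorted(a ^ b))         # in exactly one set -> [2, 3, 4]
print("search ANDNOT taxonomy", sorted(a - b))         # in A but not in B -> [2, 4]
print("NOT search            ", sorted(UNIVERSE - a))  # complement in the universe -> [3]
```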
  • 17. Bayes and Bayes' theorem: Thomas Bayes, mathematician, 1702-1761. Bayes' theorem uses probability inductively and established a mathematical basis for probability inference. WHAT? A means of calculating, from the number of times an event has not occurred, the probability that it will occur in future trials.
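For search and categorization, Bayes' theorem is usually written in terms of a category $C$ and a document $D$. This is the standard statement of the theorem, given here for reference:

$$P(C \mid D) = \frac{P(D \mid C)\,P(C)}{P(D)}$$

$P(C)$ is the prior belief about how common the category is, $P(D \mid C)$ is estimated from training documents, and $P(D)$ normalizes over all categories. The Bayesian cautions in this deck are about how reliable that prior and the distribution model behind $P(D \mid C)$ really are.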
  • 18. Bayesian methods, cautions: a user might wish to change the distribution of probabilities; a user may make a novel request for information in a previously unanticipated way; exploring a previously unknown network is computationally difficult; results depend on the quality and extent of the prior beliefs used in Bayesian inference processing.
  • 19. Bayesian cautions (cont.): A Bayesian network is only as useful as its prior knowledge is reliable. An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results. You must select the right statistical distribution when modeling the data, that is, have the proper distribution model to describe it. In practice, you have to train and retrain the system on the data constantly.
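To make the "train and retrain" point concrete, here is a minimal multinomial naive Bayes sketch in Python that assigns a taxonomy category to a short text. It is a textbook technique, not the algorithm of any product named in this deck; the training snippets and category names are hypothetical, and add-one (Laplace) smoothing is an assumed choice. The priors come straight from the training set, so a skewed training set skews every prediction.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, examples):
        """examples: list of (text, category) pairs used as the training set."""
        self.class_counts = Counter()            # documents per category -> priors P(C)
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.vocab = set()
        for text, label in examples:
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_docs = sum(self.class_counts.values())
        return self

    def log_score(self, text, label):
        """log P(C) + sum of log P(w | C); P(D) is the same for every C, so it is dropped."""
        score = math.log(self.class_counts[label] / self.total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)  # smoothed P(w | C)
        return score

    def predict(self, text):
        """Return the category with the highest posterior score."""
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical training set
TRAINING = [
    ("boolean query operators and or not", "search"),
    ("ranking algorithm for search results", "search"),
    ("thesaurus broader narrower related terms", "taxonomy"),
    ("controlled vocabulary and thesaurus construction", "taxonomy"),
]
model = NaiveBayes().fit(TRAINING)
print(model.predict("thesaurus terms"))  # taxonomy
print(model.predict("search ranking"))   # search
```

Retraining here simply means calling `fit` again on an updated training set; the model keeps no other state.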
  • 20. Peter Turney and the Turney algorithm: Peter D. Turney (Canada, present). Learning algorithms for keyphrase extraction; tree induction algorithm; lexical semantics; GenEx (with human input, 80% acceptable); extraction vs. generation; and sentiment of words: $\mathrm{SO}(\mathit{word}) = \log_2 \dfrac{\mathrm{hits}(\mathit{word}\ \text{AND "excellent"}) \cdot \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{word}\ \text{AND "poor"}) \cdot \mathrm{hits}(\text{"excellent"})}$
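The sentiment (semantic orientation) formula needs only four hit counts. This minimal Python sketch uses hypothetical counts; in Turney's approach they come from search-engine queries. The 0.01 added to the two word counts, which avoids division by zero and log(0), is an assumed smoothing constant.

```python
import math


def semantic_orientation(hits_word_excellent, hits_word_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """log2 of [hits(word AND "excellent") * hits("poor")]
            / [hits(word AND "poor") * hits("excellent")].

    Positive values suggest a positive sense, negative values a negative one.
    `smoothing` (an assumption) keeps a zero word count from breaking the ratio.
    """
    numerator = (hits_word_excellent + smoothing) * hits_poor
    denominator = (hits_word_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical counts: the word co-occurs with "excellent" four times as often as with "poor"
print(round(semantic_orientation(800, 200, 1_000_000, 1_000_000), 2))  # prints 2.0
```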
  • 21. Marco Dorigo and ant colony optimization: Marco Dorigo, research director for the Belgian Fonds de la Recherche Scientifique and research director of the IRIDIA lab at the Université Libre de Bruxelles. Ant colony optimization: a metaheuristic for combinatorial optimization problems; swarm intelligence; value importance vs. heuristic importance; useful in search prediction
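Ant colony optimization is easiest to see on a small routing problem. The Python sketch below is a generic textbook ACO for a tiny traveling-salesman instance, not Dorigo's reference implementation or a search-prediction system. The distance matrix and every parameter value (ant count, iterations, `alpha`, `beta`, evaporation rate) are hypothetical. `alpha` weights the learned pheromone trail and `beta` weights the distance heuristic, which is one plausible reading of "value importance vs. heuristic importance".

```python
import random


def tour_length(tour, dist):
    """Total length of a closed tour, including the edge back to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_colony_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   evaporation=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = rng.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                here = tour[-1]
                candidates = sorted(unvisited)
                # Attractiveness = pheromone^alpha * (1 / distance)^beta
                weights = [pheromone[here][j] ** alpha * (1.0 / dist[here][j]) ** beta
                           for j in candidates]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)

        # Evaporate old trails, then each ant deposits pheromone inversely
        # proportional to the length of its tour (shorter tour -> stronger trail).
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length
    return best_tour, best_len


# Hypothetical symmetric distances between five points
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
best, length = ant_colony_tsp(DIST)
print(best, length)
```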
  • 22. Natural language processing: syntactic; semantic; morphological; phraseological; lemmatization (stemming); statistical; grammatical; common sense
  • 23. Basic areas of automatic language processing (ALP): auto translation; auto indexing; auto abstracting; artificial intelligence; searching; spell checking; Semantic Web; natural language processes (NLP); computational linguistics
  • 24. Statistical search: cluster analysis; neural networks; co-occurrence; Bayesian inference; latent semantic; etc.
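Co-occurrence, one of the statistical techniques listed, can be sketched as counting how many documents contain both terms of a pair; pairs that co-occur often are candidates for "related term" links. This minimal Python sketch uses hypothetical documents and a naive whitespace tokenizer.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """For each unordered term pair, count the documents that contain both terms."""
    pairs = Counter()
    for text in docs:
        terms = sorted(set(text.lower().split()))  # each term once per document
        # Terms are sorted, so a pair is always stored as ("a", "b"), never ("b", "a")
        pairs.update(combinations(terms, 2))
    return pairs


# Hypothetical documents
DOCUMENTS = [
    "taxonomy thesaurus search",
    "taxonomy thesaurus indexing",
    "search ranking",
]
counts = cooccurrence(DOCUMENTS)
print(counts.most_common(1))  # [(('taxonomy', 'thesaurus'), 2)]
```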
  • 25. Inverted files and Boolean logic are basic to all search: searchable index; inverted file index; taxonomy; thesaurus; hierarchical display
  • 28. Complex Inverted File Index Example 1
      key: L2, P2, H
      of: Stop
      outline: L1, P1, T
      presentation: L1, P3, T
      terminology: L2, P3, H
      thesaurus: (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
      tools: (1) L3, P2, H; (2) L8, P2, SH
      when: L9, P3, H
      why: L9, P1, H
      &: Stop
      1, 2, 3, 4: Stop
      construction: L7, P2, SH
      costs: L6, P1, H
      define: L2, P1, H
      features: L4, P1, SH
      functions: L5, P1, SH
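Each entry in this example appears to record the line (L), the word's position within that line (P), and the kind of line (T, H, SH, read here as title, heading, subheading), with stop words marked rather than located. Read that way, lines 1 and 2 of the indexed outline were "Outline of Presentation" (T) and "Define key terminology" (H). The Python sketch below rebuilds that part of the index from those two lines. The field meanings and the stop list are assumptions inferred from the example, not a documented format.

```python
# Assumed stop list, taken from the entries marked "Stop" in the example
STOP_WORDS = {"of", "&", "1", "2", "3", "4"}


def build_inverted_index(lines):
    """lines: list of (text, line_type) tuples; line numbers start at 1.

    Returns term -> list of (line, position, line_type), or "Stop" for stop words.
    """
    index = {}
    for line_no, (text, line_type) in enumerate(lines, start=1):
        for pos, word in enumerate(text.lower().split(), start=1):
            if word in STOP_WORDS:
                index[word] = "Stop"  # stop words are listed but not located
            else:
                index.setdefault(word, []).append((line_no, pos, line_type))
    return index


OUTLINE = [("Outline of Presentation", "T"), ("Define key terminology", "H")]
for term, entry in sorted(build_inverted_index(OUTLINE).items()):
    print(term, entry)
```

A repeated word such as "thesaurus" simply accumulates several (line, position, type) tuples, matching the numbered entries (1), (2), (3) in the example.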
  • 29. Word and term parsing: stemming (-ing, -ed, -es, -'s, -s', etc.); depluralization; truncation, left and right; wild cards (Organi*ation); variant spellings (centre, center); hyphens
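Truncation and wild cards are commonly implemented by translating the pattern into a regular expression and matching it against the terms in the index. This is a minimal Python sketch. The `*` (any run of letters) and `?` (exactly one letter) conventions, the term list, and the crude suffix-stripping rule are all assumptions for illustration. They do not reflect a particular engine's syntax or a real stemming algorithm such as Porter's.

```python
import re

# Hypothetical index vocabulary
INDEX_TERMS = ["organization", "organisation", "organize", "center", "centre",
               "searching", "searched", "searches"]


def wildcard_to_regex(pattern):
    """'*' matches any run of letters (truncation); '?' matches exactly one letter."""
    parts = []
    for ch in pattern.lower():
        if ch == "*":
            parts.append("[a-z]*")
        elif ch == "?":
            parts.append("[a-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")  # anchored at both ends


def expand(pattern):
    """Index terms matched by a wild-card pattern."""
    rx = wildcard_to_regex(pattern)
    return [t for t in INDEX_TERMS if rx.match(t)]


def strip_suffix(word):
    """Crude suffix stripping; real stemmers handle many more cases."""
    for suffix in ("'s", "s'", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(expand("organi*ation"))  # internal wild card: ['organization', 'organisation']
print(expand("cent??"))        # variant spellings: ['center', 'centre']
print(expand("search*"))       # right truncation: ['searching', 'searched', 'searches']
print({strip_suffix(w) for w in ["searching", "searched", "searches"]})  # {'search'}
```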
  • 30. The taxonomy effect: Where do the terms go? How are they used in search? What other ways can I use the taxonomy in search?
  • 31. Site search: search of 53 crawled sites, including journals, books, web sites, conference sites, etc.; navigation; bookstore search; database search for journals and pubs; search of all publications
  • 32. Taxonomy-driven search presentation: navigate the full taxonomy "tree" (BROWSE); auto-completion using the taxonomy; guide the user
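Auto-completion from a taxonomy can be as simple as a prefix lookup over the sorted list of terms. This minimal Python sketch uses the standard-library `bisect` module; the term list is hypothetical, and a production system would typically also index non-preferred variants.

```python
import bisect

# Hypothetical taxonomy terms, normalized to lower case and sorted once
TERMS = sorted(t.lower() for t in [
    "Information retrieval", "Information science", "Inverted files",
    "Indexing", "Taxonomies", "Thesauri", "Text mining",
])


def autocomplete(prefix, limit=5):
    """Return up to `limit` taxonomy terms that start with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    start = bisect.bisect_left(TERMS, prefix)  # first term >= prefix
    results = []
    for term in TERMS[start:]:
        if not term.startswith(prefix) or len(results) == limit:
            break  # sorted order: once a term fails to match, none after it will
        results.append(term)
    return results


print(autocomplete("inf"))  # ['information retrieval', 'information science']
print(autocomplete("t"))    # ['taxonomies', 'text mining', 'thesauri']
```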
  • 41. Where does the subject metadata go? Apply it to the content itself; use the meta name field in the HTML header; connect search to the keywords in the SQL or other database tables.
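Putting taxonomy terms in the HTML header means writing them into a `meta` element that a crawler or indexer can read back. This minimal Python sketch uses only the standard library. The `keywords` meta name and the semicolon separator are assumptions, since sites use different names and conventions, and the terms are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def meta_tag(terms, name="keywords"):
    """Render taxonomy terms as an HTML <meta> element (values are HTML-escaped)."""
    content = "; ".join(terms)
    return f'<meta name="{escape(name)}" content="{escape(content)}">'


class MetaReader(HTMLParser):
    """Collect the terms from <meta name="keywords"> while parsing a page."""

    def __init__(self, name="keywords"):
        super().__init__()
        self.name = name
        self.terms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser has already unescaped attribute values
        if tag == "meta" and attrs.get("name") == self.name:
            content = attrs.get("content") or ""
            self.terms.extend(t.strip() for t in content.split(";") if t.strip())


tag = meta_tag(["Taxonomies", "Search engines", "R&D"])
reader = MetaReader()
reader.feed(f"<html><head>{tag}</head><body>...</body></html>")
print(tag)           # <meta name="keywords" content="Taxonomies; Search engines; R&amp;D">
print(reader.terms)  # ['Taxonomies', 'Search engines', 'R&D']
```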
  • 46. Integrate the taxonomy to enhance findability: browsable categories of a directory; browsable faceted navigation; smart search for term equivalents; taxonomy terms (original or modified) as labels; navigation aids that incorporate taxonomy terms and relationships
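"Smart search for term equivalents" usually means expanding a query with the equivalent (non-preferred) terms that a thesaurus maps to the same preferred term, optionally adding narrower terms. This minimal Python sketch uses hypothetical thesaurus entries and a structure loosely modeled on the usual thesaurus relationships (UF for equivalents, NT for narrower terms). It is not any particular product's format.

```python
# Hypothetical thesaurus: preferred term -> equivalents (UF) and narrower terms (NT)
THESAURUS = {
    "automobiles": {"uf": ["cars", "motor cars", "autos"], "nt": ["electric vehicles"]},
    "taxonomies": {"uf": ["classification schemes"], "nt": ["thesauri"]},
}

# Reverse map: preferred term or any equivalent -> preferred term
PREFERRED = {}
for preferred, entry in THESAURUS.items():
    PREFERRED[preferred] = preferred
    for variant in entry["uf"]:
        PREFERRED[variant] = preferred


def expand_query(term, include_narrower=False):
    """Return the preferred term, its equivalents and (optionally) its narrower terms."""
    preferred = PREFERRED.get(term.lower())
    if preferred is None:
        return [term.lower()]  # not in the thesaurus: search the term as entered
    entry = THESAURUS[preferred]
    expanded = [preferred] + entry["uf"]
    if include_narrower:
        expanded += entry["nt"]
    return expanded


print(" OR ".join(f'"{t}"' for t in expand_query("cars")))
# "automobiles" OR "cars" OR "motor cars" OR "autos"
```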
  • 47. More taxonomy enrichment: spelling alternatives and correction; related concepts; statistical information about the metadata; navigation or drill-downs; search refinement; recursive sets; concept linking; dictionary lookup (in the taxonomy glossary)
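Spelling correction against the taxonomy can be sketched with the standard library's `difflib.get_close_matches`, which ranks candidates by a similarity ratio. The vocabulary and the 0.8 cutoff are hypothetical choices, and production systems often rely on edit distance or query logs instead.

```python
import difflib

# Hypothetical taxonomy vocabulary
VOCABULARY = ["thesaurus", "taxonomy", "ontology", "metadata", "indexing"]


def suggest(query_term, cutoff=0.8):
    """Suggest taxonomy terms close to a possibly misspelled query term."""
    term = query_term.lower()
    if term in VOCABULARY:
        return [term]  # already a valid term
    return difflib.get_close_matches(term, VOCABULARY, n=3, cutoff=cutoff)


print(suggest("thesarus"))  # ['thesaurus']
print(suggest("taxonmy"))   # ['taxonomy']
```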
  • 48. Brand is repeated in several spots and tied to search as well
  • 49. Data Base Plus Search Workflow [diagram components: raw full-text data feeds; printed source materials; data crawls on 53+ sources; XIS creation; add metadata; MAI Concept Extractor; MAI Rule Base; taxonomy terms; Thesaurus Master taxonomy; XIS repository; SQL for ecommerce; load to Perfect Search; Search Harmony display]. Save data to search and repositories at the same time.
  • 50. Data Base Plus Search Workflow [diagram components: raw full-text data feeds; printed source materials; data crawls on data sources; XIS creation; XIS repository; SQL for ecommerce; add metadata; MAI Concept Extractor; MAI Rule Base; Thesaurus Master taxonomy; load to search; Search Harmony display; labeled stages: source data, taxonomy terms, search data]. Clean and enhance data.
  • 51. [Diagram components: client data (full text: HTML, PDF, data feeds, etc.); client taxonomy in SharePoint; Thesaurus Master; Machine Aided Indexer (M.A.I.™); metadata and entity extractor; automatic summarization; inline tagging; repository; search software; search presentation (browse by subject, auto-completion, broader terms, narrower terms, related terms); 90% accuracy]
  • 52. What we covered: how search works; measuring accuracy in search; search theoretical basis (Bayes, Boole and the rest of the guys); the taxonomy effect
  • 53. Do the data FIRST: What do you have? What does it need? How would you LIKE to access it? Look at the data BEFORE you create the specifications; a DTD built without data is not going to work. Then choose the system that will support your data.
  • 54. Next month, same time, same station: "Solving the Challenge of Connecting People and Author Networks," Jay Ven Eman, Ph.D., September 14. As online digital publishing continues to grow, taxonomies can be increasingly useful in connecting people with author networks, through directory creation with author disambiguation and through subject metadata tagging that makes information more useful for researchers and for community building.
  • 56. Headquartered in Albuquerque