The Fundamentals of Enterprise Search KMWorld 2009 Avi Rappoport, Search Tools Consulting www.searchtools.com consult9@searchtools.com www.searchtools.com/slides/kmw09/fundamentals-of-search.html
What’s In This Workshop Overview of enterprise search, in context  Search engine processes Robot spiders, database access Indexing Security Query parsing, retrieval, and relevance ranking Usable search interfaces.  Maintenance and Analytics Methods for choosing a  good search engine Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
About SearchTools  Avi Rappoport is a librarian (MLIS from Berkeley)  Software developer and product manager User interface designer Long-time search consultant Editor & Publisher, www.searchtools.com Search Tools Consulting Search needs analysis and recommendations Enterprise search evaluation  Outsourced search administration  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Defining Enterprise Search  Large scale web site search  Corporate sites Institutional sites Online stores Intranet search  Crossing departmental lines Opening data silos Extranets Portal Search Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Similarities to Webwide Search  Robot crawlers  HTML over HTTP Scaling to millions of items Distributed processing  Full-text indexing of content Simple query language Relevance ranking of results TF-IDF (term frequency : inverse document frequency) Familiar results list Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Differences from Web Search Limited scope A site, set of sites, extranet, or intranet Few meaningful hyperlinks PageRank and link analysis are less useful Security and access control issues Content in databases, CMSs, etc. More control Index update scheduling Some content is very valuable, other content is not No search spam
Text Search vs. Database Search Indexes multiple content sources Database fields, files, web pages, feeds... Simple search commands instead of SQL Flexible indexing and retrieval Relevance ranking (this is a major issue) Does not compete for database resources  Easy to scale separately from DBMS  New features: spellcheck, auto complete, facets Works in the real world, from eBay to Google  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search and Information Architecture  Information Architecture  The art and science of organizing information for access and use. IA work enriches search Creates order and systems Provides standard vocabulary Removes ROT (redundant, obsolete, trivial) Search supplements IA Supports user vocabularies Changes dynamically with new content Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search and Taxonomy Taxonomy creates categories Labels and metadata Improves quality of search results Additional metadata extremely valuable Search crosses categories  Bypasses ambiguous topic labels Useful for novices  Supports user vocabulary Dynamic updates for new topics Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search &  Knowledge Management KM is: “The process through which organizations generate value from their intellectual and knowledge-based assets.”   (CIO Magazine) Organizes information, processes and people  Offers collaboration and archiving tools Attempts to regularize implicit knowledge Search mostly matches words  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Two Main Types of Search  Known-item search  Short queries “Good-enough” answers Exploratory search Research - finding unknowns Scientific, legal, medical, business, sales Conceptual overviews Completeness - all possible relevant items Law enforcement Medicine Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
All people see are the search box and results list Invisible functionality  Indexes Query processing Retrieval Relevance ranking Search is a mystery  But it’s just software  Search as an Iceberg  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Elements of Search Engines  Automated tools to collect content  Specialized storage for quick retrieval Query processing and expansion  Retrieval (matching query to index content) Relevance ranking Search results interfaces  Analytics, metrics and maintenance  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Choosing Content To Index  Information sites  Consider indexing every single page Use search indexing as a discovery mechanism Online stores, catalogs  Product information: cost, color, size, materials Other: return policies, CEO’s name, jobs listing Intranets  Intranet portal and core servers  May need archive servers and search Multimedia: images, audio, video Metadata at least Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
(Near) Real Time Indexing Twitter has changed expectations Even in intranets Index must support partial updates Search engines finding limits at scale Distribute indexing and indexes Trigger index updates (push vs. pull) Continuous feed Send web service message Database trigger Update watched URLs with new links Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Indexing and Security Search can undermine “security by obscurity” One link can expose a whole set of documents Work with your security team  List areas which contain sensitive content Define words which trigger further analysis Create a process for removing sensitive data Indexing encrypted content  Search engine uses SSL client for indexing  Encrypt search results before returning Physical security on search servers Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search and Access Control  Authentication and authorization in indexing  “Basic authentication” - user name and password NT Security integration ACLs and single sign-on  Conform to security rules during indexing Keep access control info as part of document store Showing results - who can see what? Access to search engine itself Collection-level access control  Locked results as teaser for subscription Hit-level access control  Check before displaying results Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Indexing: Sources of Content Web sites  Intranets Extranets Blogs Wikis Mailing list archives & email public folders File systems & shared servers NFS, SMB, AFP, GFS, ftp, WebDAV Content Management Systems  Databases Legacy programs in silos Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Indexing: Robot Spiders  Start with base URL for all hosts  For each page, repeat  Read text into internal format Save document in cache Save words into index Extract all links and check the rules If they are new URLs, add them to the list Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
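The spider loop above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's crawler: the `fetch` callable, the `SITE` dictionary and the `example.test` URLs are hypothetical stand-ins for real HTTP requests, and the only "rule" checked is whether a link stays on an allowed host. Writing words into the index is left out.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl(base_url, fetch, allowed_hosts):
    """Breadth-first spider loop.

    `fetch(url)` is an assumed callable returning (text, links) or None.
    Returns a cache mapping each visited URL to its text.
    """
    queue = deque([base_url])      # URLs waiting to be read
    seen = {base_url}              # URLs already queued, to avoid loops
    cache = {}                     # saved copy of each document
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:           # missing or unreadable page
            continue
        text, links = page
        cache[url] = text          # a real indexer would also tokenize here
        for link in links:
            absolute = urljoin(url, link)
            # Rule check: stay on the hosts we were asked to index.
            if urlparse(absolute).netloc not in allowed_hosts:
                continue
            if absolute not in seen:   # only new URLs join the list
                seen.add(absolute)
                queue.append(absolute)
    return cache

# Hypothetical in-memory "site" standing in for HTTP fetches.
SITE = {
    "http://example.test/": ("home page", ["/a", "/b", "http://other.test/x"]),
    "http://example.test/a": ("page a", ["/b", "/"]),
    "http://example.test/b": ("page b", []),
}
pages = crawl("http://example.test/", SITE.get, {"example.test"})
```

Note that a page nobody links to would never enter the queue, which is the first of the common robot problems.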
Robot Indexing Spider
Common Problems With Robots  Pages that are not linked from anywhere  Spider disallowed by robots.txt or robots meta URLs with ? and & (all should do these now) JavaScript, forms, and interactive dynamic links Some robots can handle some of these Session IDs that change Duplicate detection Multiple views of the same data (Lotus, wikis) Symbolic links & bad redirects Multiple copies of files or directories Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Indexing: Other Data Sources RSS feeds: nice clean text File servers: SMB, file:/// etc. Content / Document Management Systems Email archives Databases via ODBC, JDBC, Oracle API Full-text content Metadata: library catalog records, yellow pages External sources using APIs (Application programming interfaces) News feeds (Reuters, AP) Twitter
Indexing: Text Files Plain text is easy RTF export format text easy to find HTML semi-structured text Content is between tags and in attributes Generated by JavaScript - hard to extract Bad HTML, especially missing </ close tags XML files (structured) Many tags are document-level Content is between tags and in attributes Complex tag hierarchy TEI (Text Encoding Initiative) & Semantic Web XQuery and XPath tools
Indexing: Binary File Formats PDF Scanned, may not have any text Bad PDF generators break words at columns “Shadow” text effect duplicates letters SWF and Flash: API may not load dynamic text Office documents Word processing files (may have hidden text from revisions) Spreadsheets (hard to know what to grab) Presentations Note: new docx, xlsx, pptx are really XML file sets CAD and project files Metadata (properties, Adobe XMP)
Indexing: Tokenizing Lowercase all characters (aka ‘folding’) Tokenizing makes words searchable Break on punctuation and spaces Recognize special words: C++ @ [TS] Typography issues: the ligature ﬆ is really “st” HTML escaped text: m&ouml;chten = möchten Special cases for structured strings Numbers, Prices, Dates N-grams - an alternate approach Break into short text patterns Takes a lot of index space
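A rough sketch of these tokenizing steps in Python, using only the standard library. The regular expression and its special-casing of a trailing `++` or `#` are illustrative assumptions; a production tokenizer would use a much longer list of special words and language-specific rules.

```python
import html
import re
import unicodedata

# One or more letters/digits, optionally followed by "++" or "#",
# so that special words such as "C++" survive. This rule is an assumption.
TOKEN_RE = re.compile(r"[^\W_]+(?:\+\+|#)?")

def tokenize(text):
    """Unescape HTML entities, fold ligatures and case, then split."""
    text = html.unescape(text)                  # m&ouml;chten -> möchten
    text = unicodedata.normalize("NFKC", text)  # ligatures -> plain letters
    text = text.lower()                         # case folding
    return TOKEN_RE.findall(text)               # break on punctuation/spaces

def char_ngrams(word, n=3):
    """The alternate approach: overlapping character n-grams."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]
```

For example, `tokenize("M&ouml;chten C++, now!")` yields the tokens `möchten`, `c++` and `now`. A six-letter word produces four trigrams for one token, which shows why n-gram indexes take more space.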
Indexing: Character Set Issues World has many charsets (aka scripts, alphabets) English has a simple alphabet: 26 letters, 10 numbers Other Roman languages: extended (ç, î, ß) Non-Roman one byte: Cyrillic, Arabic, Hebrew Asian two bytes: Chinese, Japanese, Korean Identifying character sets Unicode characters Older usage: language “code pages” HTTP header or <META http-equiv> Statistical detection techniques Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Indexing: Language Issues Text search works across languages Simple pattern-matching, query to index  Language-specific indexing improves search Tokenizing using appropriate rules Compound nouns (kindergarten) Language rules for stemming Singular version of thés is thé Language detection Trusted tags Bilingual dictionaries Statistical matches, n-grams Documents may have mixed languages… Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Indexing: Multimedia  Images, photos, drawings, sound, scores, video External metadata  File name Link text, surrounding words Internal metadata  ID3 tags for music EXIF and other digital photo information Subtitles (sometimes)  Content OCR to extract graphic text and closed captions Audio: Speech-to-text conversion, still buggy Use human judgment not just automated systems  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Inverted Index Diagram Lots of IR research shows this Better than DBMS Alphabetical list of tokens Tokens not in paragraph order, thus, inverted Each token has ID of source
Richer Index Structures Store word position (for phrase matching)  Enclosing tag or field Document metadata  Database field names Image (which attribute) Named anchor text Text markup tags (TEI, Semantic Web) Extracted entities  Personal names, companies, geo locations, dates Anchor text from incoming links Can be very descriptive Add to index as if part of the target document Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Example Inverted Index Structure For each word Document ID Position Tag name  For each document ID Title URL  Description Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
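As a small illustration of this structure, the sketch below builds an inverted index that stores a document ID and word position for each token. It assumes whitespace tokenizing and a plain dictionary of documents; tag names and the separate per-document store (title, URL, description) are omitted to keep it short.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to a list of (document ID, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            index[token].append((doc_id, position))
    # Sort tokens alphabetically: they are no longer in paragraph order,
    # which is what makes the index "inverted".
    return dict(sorted(index.items()))

# Hypothetical two-document collection.
docs = {1: "new york university", 2: "york is new"}
index = build_index(docs)
```

Here `index["new"]` holds the postings `(1, 0)` and `(2, 2)`: the word is first in document 1 and third in document 2. Stored positions are what make phrase matching possible later.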
Indexing: Stopwords Stopwords - very common terms Linguistic (a an the as he she it you new) Ubiquitous (names, copyright, click here) Consequences of excluding stopwords: Reduces the size of index files Improves recall, finds more matching documents Fails some queries: As You Like It, IT copyright policy Problems matching phrases: “New York University” Solutions vary: Index everything, pay the price in index size CommonGrams: n-grams of frequent phrases
Stopwords Problems: Example Searching wordpress.com for whatever will be Useless results ranking No matches for will be One ad gets it right External search finds over 3,000 pages on site with phrase
Indexing: Stemming Singular query should find plural words & vice versa Shoe <=> shoes, cans <=> can, geese <=> goose Statistical and probabilistic truncation rules Linguistic rules Lemmatization - stemming based on part of speech Stemming before indexing Improve recall: find all forms of a word Reduce index size Consequences of extreme stemming Short query problems Search for Ran shouldn’t match Run, Lola, Run Other options Index everything (makes indexes larger and queries slower) New idea: CommonGrams (n-grams of frequent phrases)
Indexing: Document Store Minimum ID (key for inverted index) Unique location (URL / file path / record ID) Richer document store Implicit metadata: filename, size, location Explicit metadata Title, date, keywords, author Taxonomy labels, classification, user tagging Language, character set Access control settings Full text of the document For snippets and caching
Indexing: Dealing with Duplicates Detecting duplicate documents  Exact match is fairly easy: checksums Document similarity check: harder but worth it Choosing the primary copy  Most recent (if reliable) Rules based on path or metadata New web search “canonical” tag What to do with duplicates  Remove from the index: saves space Hide in results unless requested That’s the Google way Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
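Both detection approaches can be sketched briefly. The exact-match check below uses a SHA-256 checksum; the similarity check uses word shingles and Jaccard overlap, which is one common technique chosen here for illustration rather than the method any specific engine uses. The shingle size of three words is an arbitrary assumption.

```python
import hashlib

def checksum(text):
    """Exact duplicates have identical checksums."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def similarity(a, b):
    """Jaccard overlap of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog"
near_copy = original + " today"
```

The two sample strings have different checksums, so the exact-match test misses them, while their shingle similarity is 7/8. A threshold on that score would flag them as near duplicates.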
Indexing: Document Dates  HTTP servers lie about dates  Frequent wrong settings: 1969, 2040 Dynamic pages send the current timestamp File systems lie about dates  Applications lie about dates Indexers do the best they can  Metadata (date tag, property, tag DC.date) Extract from page content Checksum to see if file has changed since last index  Consider external metadata repository Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Process Flow
Where the Queries Come From User-entered text in search fields Search navigation: moving around in results list Previous searches  May just be repeated clicks on URL Save Search feature Simplistic alerts Facet click to add a metadata filter May re-issue search with additional terms May be navigational, no text query Scripts or automated queries Dynamic links (find all pictures by this artist) Geographic information systems Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Query Processing Steps  Try to recognize the character set and language  Tokenize the text by language rules Break at spaces and punctuation Same algorithm as index tokenizer Check for operators  Internet Query Operators: + - "quotes" Boolean Operators: AND OR NOT & | ! Others: NEAR, (parentheses) Check for field names, zones, other filters  Example: title:lunch location=94703 Handle the rare natural language question Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Query Expansion Stemming Dependent on index stemming choices Good to find singular/plural forms Word similarity searching - increases recall Fuzzy matching Phonetic, soundex, sound-alike May overwhelm exact matches Synonym expansion, should be site-specific bus => coach, ATM => Air Tasking Message
Search: Retrieval, Recall & Precision  Retrieval  Finding the documents matching a particular query Recall  Finding every relevant document Precision  Finding only relevant documents Balance more recall vs. better precision Use search logs and user studies to guide choices Use precision as part of relevance ranking  Top results should be more exact matches Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
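The two measures are simple ratios once there is a set of documents judged relevant for a query. The sketch below assumes such judgments exist (for example from a user study); the document IDs are made up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four documents returned, three are actually relevant.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5])
```

Here half of the returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67). Returning more documents can raise recall but usually lowers precision, which is the balance described above.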
One-Word Text Retrieval Fast binary search in inverted index Check index updates on disk or in memory If there are distributed indexes, merge results Store the related document information in a list Document ID Term frequency in document Term positions in the document Note: The document list is not yet sorted Frequent searches may be cached “Short head” vs. “long tail”
Multi-Word Text Retrieval  Relationship between words defines results Boolean AND, + operator, find all default Only documents which contain all terms Boolean OR operator, find any default All documents with any term Boolean NOT, - operator All documents with the first term but not next term  Phrase operators, quotes Only documents with the words as a phrase Also check for zones or field filters Parentheses: use for order of processing Merge resulting lists Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
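These operators map naturally onto set operations over posting lists. The sketch below assumes a small positional index of the form token -> {document ID: [positions]}; the function names and sample data are illustrative, and zones, field filters and parentheses are not handled.

```python
def docs_with(index, term):
    """IDs of all documents containing the term."""
    return set(index.get(term, {}))

def search_and(index, terms):
    """Boolean AND: only documents which contain all terms."""
    sets = [docs_with(index, t) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, terms):
    """Boolean OR: all documents with any term."""
    return set().union(*(docs_with(index, t) for t in terms))

def search_not(index, include, exclude):
    """Boolean NOT: documents with the first term but not the next."""
    return docs_with(index, include) - docs_with(index, exclude)

def search_phrase(index, terms):
    """Phrase: the terms must appear at consecutive positions."""
    results = set()
    for doc_id in search_and(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            results.add(doc_id)
    return results

# Hypothetical index: doc 1 is "new york university", doc 2 is "york is new".
INDEX = {
    "new": {1: [0], 2: [2]},
    "york": {1: [1], 2: [0]},
    "university": {1: [2]},
}
```

With this data, an AND search for "new york" matches both documents, while the phrase search matches only document 1, because only there do the two words sit next to each other in that order.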
Relevance Ranking Algorithms Relevance  The likelihood that an item will fill an information need Based on documents in retrieval list Most common algorithm: TF:IDF (Term frequency : inverse document frequency) How often the query word is in the document? How often the word is in the index? Other relevance algorithms  Vectors and document-query similarity  Linguistic analysis and Natural Language Processing  Statistical and Bayesian analysis  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
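A minimal TF-IDF ranking sketch follows. There are many variants of the formula; this one uses term frequency divided by document length and a smoothed inverse document frequency, `log(1 + N / df)`, both of which are assumptions made for the example rather than a standard every engine follows.

```python
import math
from collections import Counter

def tf_idf_rank(query_terms, docs):
    """Return IDs of matching documents, best score first."""
    n_docs = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    # Document frequency: how many documents contain each word.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)  # how often each word is in this document
        score = 0.0
        for term in query_terms:
            if tf[term]:
                idf = math.log(1 + n_docs / df[term])  # rare words weigh more
                score += (tf[term] / len(tokens)) * idf
        if score > 0:  # keep only documents matching at least one term
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical three-document collection.
DOCS = {
    "a": "search engine search index",
    "b": "database engine",
    "c": "cooking recipes",
}
ranking = tf_idf_rank(["search", "engine"], DOCS)
```

Document "a" ranks above "b" because it contains both query words and the rarer one, "search", twice. Document "c" matches neither word and is not returned.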
Relevance Heuristics Phrase matches for multiple query terms  Logs show most multi-word searches are phrases Query terms found in special sections Title Metadata Top of document All terms matched in document  Even when not relevant, it’s transparent Old systems gave excess weight to single rare terms Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
More on Relevance  Relevance is task-specific  Results can never please all of the people More like berry-picking than like hunting Link analysis (PageRank) not very useful  Intranet and site links tend to be navigational  Situation-specific adjustments  Some areas more likely to be valuable  Current  content Local content  Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Federated Search and Relevance Send query to multiple search engines May require special syntax Response time often a factor Receive results in relevance order for each Display results, two options Separate sections for each search engine Merged single relevance rank list Works if all search indexes are similar Problems where the sources are very different Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Retrieval: Access Control Limit access to search itself User enters password or other credentials Search only accepts queries when authenticated Collection-level access control Query filter only retrieves items from allowed groups Hit-level access control Real-time check for user access on documents Start with most relevant documents Repeat until there are ten (may be slow) Display top results, include estimate of how many more Show helpful message if user can’t see any Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
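The hit-level check can be sketched as a loop over the ranked list that stops once a page is full. The `can_view` callable is a hypothetical stand-in for a real-time call to the access control system, which is the slow part in practice.

```python
def visible_results(ranked_ids, can_view, page_size=10):
    """Walk the ranked list, keeping only documents the user may see.

    Returns (page, unchecked): the visible top results and the number of
    lower-ranked documents that were never checked, which can feed an
    estimate of how many more results exist.
    """
    page = []
    checked = 0
    for doc_id in ranked_ids:      # start with the most relevant documents
        checked += 1
        if can_view(doc_id):       # assumed real-time permission check
            page.append(doc_id)
            if len(page) == page_size:
                break              # repeat only until the page is full
    return page, len(ranked_ids) - checked

# Hypothetical example: 30 ranked hits, user may see the even-numbered ones.
page, unchecked = visible_results(list(range(1, 31)), lambda d: d % 2 == 0)
```

In this example twenty permission checks are needed to fill a page of ten, and ten hits remain unchecked. An empty page is the cue to show the helpful message instead of a bare "no results".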
Search User Experience Limit user interface complexity Show the scope of the information covered Expose query expansion and contraction  Use familiar UI elements User experience goes beyond interface Index coverage Query syntax Retrieval quality and speed Relevance ranking (first ten are vital) Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Forms Interface  Balance simplicity with functionality Put a search field in the navigation bar  Location should be consistent Longer is better: short fields lead to short queries  Simple Search forms: limit options Zone or section Dates Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Field Auto-Complete Dropdown menu of matching words Base on search logs Smallish list, 7-10 Most popular Simple sort Alphabetic Price or size Complete range (preferably lowest to highest)
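A small sketch of log-based auto-complete, assuming the search log is available as a plain list of past query strings. The function name and the default limit of seven are illustrative choices.

```python
from collections import Counter

def suggest(prefix, query_log, limit=7):
    """Suggest the most popular logged queries that start with the prefix."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    counts = Counter(q.strip().lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Most popular first; alphabetical order breaks ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical search log.
LOG = ["shoes", "shoes", "shirts", "Shoes", "shorts", "shirts", "hats"]
suggestions = suggest("sh", LOG)
```

Typing "sh" suggests "shoes", then "shirts", then "shorts", in order of how often each was searched.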
Other Search Interfaces Heavily researched Natural language Must keep typing Defining a question is quite hard Interactive search Guided interviews But users want immediate results Avatars do not improve interaction
Simple vs. Advanced Search UI  Most searches are simple Short: one to three words Fewer than 10% use any operators at all (maybe 1%) Even experts prefer simple search  Will use advanced tools if simple doesn’t work   Default to simple search, link to advanced search  Those are your power users: librarians, techies Expose all possible options Don’t spend huge resources on advanced UI  Exploratory search is different Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Advanced Search Fits Sometimes eBay High motivation Complex search requirements Frequent use UX testing still required
Search Results: Page Elements Site context  General page layout, navigation links Colors and design elements Results header A search field, with the current search terms  Retrieval information - how many hits Results list in relevance order Each result item with at least a linked title Facets: dynamic links for filtering results Results footer Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Results: Good Example Full but readable content blocks Site look-and-feel Navigation Familiar search results elements
Search Results: Not-So-Good Example Site page has navigation, colors: search results should too Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Results: Visualization Fascinating to look at, great demos Star charts Topographical displays Interactive fly-throughs Hyperbolic trees  Require significant resources to run Good for exploratory & comprehensive research Finding unexpected synergies Simple search is much cheaper for casual users Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Results: Header Elements Search field, with the current query Users often edit to be more or less restrictive Number of results found A few search options Match Any Word / All Words / Exact Phrase Filter by date option (if trustworthy) Search zones Results navigation Best Bets Spelling suggestions
Search Results: Hits and Pages Show number of items matched Be accurate Do not give estimates for small numbers (Google and SharePoint are bad this way) Pagination - results list navigation Helps user calibrate content Important for exploratory search Follow web search conventions, example: < previous 1 2 3 4 ... 26 next > Be accurate
Results Headers: Examples
Search Results: “Best Bets” aka Search Suggestions, QuickLinks, KeyMatch, Recommendations Special-case links for problem queries Internal topic landing pages External sites when appropriate New and better query to search Discover problems from users, log analysis “Short head” - few very popular query terms Allocate resources to keep them current
Best Bets Example  Best Bets are very clear Would not come first in normal search results Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Results: List Sorting List of links to items matching the query Sorted by matching terms Impossible to be relevant to every query Variety of sources when possible Transparency: why these items in this order Other sort orders - make very visible By author’s last name By date By price Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Search Results: Not Enough Variety
Search Results: Weird Sort Sorted by: “Degrees away” Labels too subtle: Degree icon should be on the left side
Result Items: Elements Information foraging: show hints about items Title of document, or name of product Location: URL, file path, database ID  May need to rewrite to user-accessible URLs Hide location if it’s not meaningful Distinguishing data  Metadata:  picture, product code, author name Show match terms in context (snippets) Text before and after query term matches  Highlight the matches Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
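Showing match terms in context can be sketched as follows. The function cuts a window of text around the first match and wraps the match in highlight markers; the `<b>` markers and the window width are assumptions, and a real results page would also HTML-escape the text and handle several query terms.

```python
import re

def snippet(text, term, width=30, mark=("<b>", "</b>")):
    """Return text around the first match of `term`, with the match marked."""
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        return text[: 2 * width]   # no match: fall back to the opening text
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    body = (text[start:match.start()]
            + mark[0] + match.group(0) + mark[1]
            + text[match.end():end])
    # Ellipses signal that text was cut before or after the window.
    return ("..." if start > 0 else "") + body + ("..." if end < len(text) else "")

# Hypothetical document text.
TEXT = "Enterprise search engines build an inverted index for fast retrieval."
result = snippet(TEXT, "inverted", width=10)
```

The highlighted term keeps its original capitalization because the match itself, not the query string, is copied into the snippet.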
Results Items: Not Enough Content
Results Items: Too Much Content
Results Items: Just Right
Results Items: Additional Data Date (if reliable) Size and File type  Avoid surprising launches of Acrobat or other app. Metadata  Author, department, brand, product...  Access status: password required?  Topics and subject headings Taxonomy categories Keywords and concept tags User tags, folksonomy Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Results Items: Rich Items Example
Results: Dynamic Clustering  Uses search results text to infer topics  Groups by similarity in titles and results text Particularly good for portals and intranets Unstructured, uncontrolled text Dynamic, no preprocessing needed Can supplement categorization and taxonomies Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Results: Clustering Example
Commerce and Catalog Results  Picture or graphic if possible Important attributes  Price Color Size Compatibility Availability “Buy” button  Simplify process, save time Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Online Store Results Example
Multimedia Search  Image, audio, and video files Audio and visual similarity search still theory Show context in results  Match terms from transcript or OCR Text around image Thumbnails or keyframes Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
Multimedia Results Example
Results: Faceted Metadata Better than forms for structured text data  Exposes attributes as part of search results  Leverages metadata Topic names, taxonomy Mundane stuff: color, date, size, author...	 Choices specifically relating to search results  Dynamically generates from metadata  Preview numbers offer users confidence in clicking  Supported by extensive usability testing Used on a majority of large e-commerce sites Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
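Facets can be generated dynamically by counting metadata values across the current result set. The sketch below assumes each result is a dictionary of metadata fields; the field names and sample products are invented for illustration.

```python
from collections import Counter

def facet_counts(results, fields):
    """Count how many current results carry each value of each field."""
    counts = {field: Counter() for field in fields}
    for item in results:
        for field in fields:
            value = item.get(field)
            if value is not None:      # items may lack some metadata
                counts[field][value] += 1
    return counts

def apply_facet(results, field, value):
    """Filter the results after the user clicks a facet link."""
    return [item for item in results if item.get(field) == value]

# Hypothetical result set from an online store search.
RESULTS = [
    {"title": "A", "color": "red", "size": "M"},
    {"title": "B", "color": "red", "size": "L"},
    {"title": "C", "color": "blue"},
]
counts = facet_counts(RESULTS, ["color", "size"])
```

The count shown next to "red" is 2, and clicking it returns exactly two items. That match between preview number and outcome is what gives users confidence in clicking, and a facet value with zero results is never offered.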
Why Faceted Search is Better Than Forms
Faceted Metadata: Commerce Example
Faceted Metadata: Library Catalog
No Matches Queries: Causes  Misspellings and typing errors Scope problem: nothing for that topic Vocabulary differences  Users may be less precise, or use competitor’s terms Marketers may dominate content  Restrictive search settings  Default may only match exact phrase or all words  Access control may disallow user Software/hardware/network failures Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
No Matches Queries: Responses Track queries with no matches in logs  Use sessions, surveys & testing to find user intent  Design the no-matches page carefully  Explain what is and isn’t on the site  Provide useful navigation links Add search engine help  Synonyms Best Bets Spelling Add terms to text Add content, topic pages Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
No Matches Queries: Spelling Issues Detect and address common problems Spelling errors Typos Queries without spaces between words  Use site-specific dictionary Easy to build from search index  Never suggests any words not on the site Users familiar with did you mean....? Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
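A site-specific "did you mean" can be sketched with Python's standard `difflib` module, using the set of words in the search index as the dictionary. The cutoff value and the sample vocabulary are assumptions, and queries typed without spaces between words are not handled here.

```python
import difflib

def did_you_mean(query, site_terms, cutoff=0.75):
    """Suggest a corrected query built only from words found on the site.

    Returns None when every word is already known or nothing close exists.
    """
    vocabulary = sorted(site_terms)
    suggestion = []
    changed = False
    for word in query.lower().split():
        if word in site_terms:
            suggestion.append(word)
            continue
        # Closest indexed word, if any is similar enough.
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            suggestion.append(close[0])
            changed = True
        else:
            suggestion.append(word)
    return " ".join(suggestion) if changed else None

# Hypothetical vocabulary taken from the search index.
SITE_TERMS = {"enterprise", "search", "taxonomy", "index"}
corrected = did_you_mean("enterprize serch", SITE_TERMS)
```

Because candidates come only from the index vocabulary, the suggestion can never contain a word that is not on the site.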
Good Example of No-Matches Page
Empty Searches Users click or press “enter” in the search box Should not find all items in the index Do nothing Go to a simple search page Show an error dialog

 
SharePoint Search Secrets for Power Users & Administrators - Mike Smith
SharePoint Search Secrets for Power Users & Administrators - Mike SmithSharePoint Search Secrets for Power Users & Administrators - Mike Smith
SharePoint Search Secrets for Power Users & Administrators - Mike SmithMAX Technical Training
 
Office 365 and share point online ramp up in 60 minutes for on-premises share...
Office 365 and share point online ramp up in 60 minutes for on-premises share...Office 365 and share point online ramp up in 60 minutes for on-premises share...
Office 365 and share point online ramp up in 60 minutes for on-premises share...Nik Patel
 

Mais procurados (20)

ECS2019 - Managing Content Types in the Modern World
ECS2019 - Managing Content Types in the Modern WorldECS2019 - Managing Content Types in the Modern World
ECS2019 - Managing Content Types in the Modern World
 
14 Tips for Planning ECM Content Migration to SharePoint
14 Tips for Planning ECM Content Migration to SharePoint14 Tips for Planning ECM Content Migration to SharePoint
14 Tips for Planning ECM Content Migration to SharePoint
 
Sp24 design a share point 2013 architecture – the basics
Sp24   design a share point 2013 architecture – the basicsSp24   design a share point 2013 architecture – the basics
Sp24 design a share point 2013 architecture – the basics
 
Spsbe 18-04-15 - should i move my network folders to office 365
Spsbe   18-04-15 - should i move my network folders to office 365Spsbe   18-04-15 - should i move my network folders to office 365
Spsbe 18-04-15 - should i move my network folders to office 365
 
Understanding and Applying Cloud Hybrid Search
Understanding and Applying Cloud Hybrid SearchUnderstanding and Applying Cloud Hybrid Search
Understanding and Applying Cloud Hybrid Search
 
SharePoint 2013 'Search': What you need to Know!
SharePoint 2013 'Search': What you need to Know!SharePoint 2013 'Search': What you need to Know!
SharePoint 2013 'Search': What you need to Know!
 
2018 09-03 aOS Aachen - SharePoint demystified - Thomas Vochten
2018 09-03 aOS Aachen - SharePoint demystified - Thomas Vochten2018 09-03 aOS Aachen - SharePoint demystified - Thomas Vochten
2018 09-03 aOS Aachen - SharePoint demystified - Thomas Vochten
 
Drilling Down to the Challenges of SharePoint Taxonomy Implementation
Drilling Down to the Challenges of SharePoint Taxonomy ImplementationDrilling Down to the Challenges of SharePoint Taxonomy Implementation
Drilling Down to the Challenges of SharePoint Taxonomy Implementation
 
Take Cloud Hybrid Search to the Next Level
Take Cloud Hybrid Search to the Next LevelTake Cloud Hybrid Search to the Next Level
Take Cloud Hybrid Search to the Next Level
 
SharePoint 2013 Search Topology and Optimization
SharePoint 2013 Search Topology and OptimizationSharePoint 2013 Search Topology and Optimization
SharePoint 2013 Search Topology and Optimization
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePoint
 
search driven intranets
search driven intranetssearch driven intranets
search driven intranets
 
SharePoint 2013 Search - Whats new for End Users
SharePoint 2013 Search - Whats new for End UsersSharePoint 2013 Search - Whats new for End Users
SharePoint 2013 Search - Whats new for End Users
 
Succeeding with Hybrid SharePoint
Succeeding with Hybrid SharePointSucceeding with Hybrid SharePoint
Succeeding with Hybrid SharePoint
 
Planning and deploying_share_point_farm_in_azure_gabsg_2016
Planning and deploying_share_point_farm_in_azure_gabsg_2016Planning and deploying_share_point_farm_in_azure_gabsg_2016
Planning and deploying_share_point_farm_in_azure_gabsg_2016
 
Is BCS Dead?
Is BCS Dead?Is BCS Dead?
Is BCS Dead?
 
Search-Driven Applications with SharePoint 2013 (#SBSBE16)
Search-Driven Applications with SharePoint 2013 (#SBSBE16)Search-Driven Applications with SharePoint 2013 (#SBSBE16)
Search-Driven Applications with SharePoint 2013 (#SBSBE16)
 
Implementing BCS-Business Connectivity Services - Sharepoint 2013- Office 365
Implementing BCS-Business Connectivity Services - Sharepoint 2013- Office 365Implementing BCS-Business Connectivity Services - Sharepoint 2013- Office 365
Implementing BCS-Business Connectivity Services - Sharepoint 2013- Office 365
 
SharePoint Search Secrets for Power Users & Administrators - Mike Smith
SharePoint Search Secrets for Power Users & Administrators - Mike SmithSharePoint Search Secrets for Power Users & Administrators - Mike Smith
SharePoint Search Secrets for Power Users & Administrators - Mike Smith
 
Office 365 and share point online ramp up in 60 minutes for on-premises share...
Office 365 and share point online ramp up in 60 minutes for on-premises share...Office 365 and share point online ramp up in 60 minutes for on-premises share...
Office 365 and share point online ramp up in 60 minutes for on-premises share...
 

Semelhante a Fundamentals Of Search

SharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search WorkSharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search WorkEarley Information Science
 
Cms an overview
Cms an overviewCms an overview
Cms an overviewkmusthu
 
PoolParty Thesaurus Management Quick Overview
PoolParty Thesaurus Management Quick OverviewPoolParty Thesaurus Management Quick Overview
PoolParty Thesaurus Management Quick OverviewAndreas Blumauer
 
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...EPC Group
 
The Searchmaster's Toolbox - David Hawking, Funnelback Search
The Searchmaster's Toolbox - David Hawking, Funnelback SearchThe Searchmaster's Toolbox - David Hawking, Funnelback Search
The Searchmaster's Toolbox - David Hawking, Funnelback SearchSquiz
 
Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010bgerman
 
IBM Omnifind Enterprise Portal Seach To Improve Productivity
IBM Omnifind Enterprise   Portal Seach To Improve ProductivityIBM Omnifind Enterprise   Portal Seach To Improve Productivity
IBM Omnifind Enterprise Portal Seach To Improve ProductivityFrancis Ricalde
 
Introduction to search_marketing
Introduction to search_marketingIntroduction to search_marketing
Introduction to search_marketingBill Hunt
 
Google Search Appliance Version 2.0 Webinar - May 2012
Google Search Appliance Version 2.0 Webinar - May 2012Google Search Appliance Version 2.0 Webinar - May 2012
Google Search Appliance Version 2.0 Webinar - May 2012Fishbowl Solutions
 
Search technologies & aws cloud search
Search technologies & aws cloud searchSearch technologies & aws cloud search
Search technologies & aws cloud searchAmazon Web Services
 
SharePoint Overview
SharePoint OverviewSharePoint Overview
SharePoint OverviewAmy Phillips
 
Leverage Search and Customize to your Brand within SharePoint 2010
Leverage Search and Customize to your Brand within SharePoint 2010Leverage Search and Customize to your Brand within SharePoint 2010
Leverage Search and Customize to your Brand within SharePoint 2010Chaitu Madala
 
SharePoint 2010 - Enterprise search overview
SharePoint 2010 - Enterprise search overviewSharePoint 2010 - Enterprise search overview
SharePoint 2010 - Enterprise search overviewbarryboudreau
 
Sharepoint 2013 Overview
Sharepoint 2013 OverviewSharepoint 2013 Overview
Sharepoint 2013 OverviewTarek Yehia
 
Making IA Real: Planning an Information Architecture Strategy
Making IA Real: Planning an Information Architecture StrategyMaking IA Real: Planning an Information Architecture Strategy
Making IA Real: Planning an Information Architecture StrategyChiara Fox Ogan
 
What your IT Doesn't Know about Publishing DITA Content
What your IT Doesn't Know about Publishing DITA ContentWhat your IT Doesn't Know about Publishing DITA Content
What your IT Doesn't Know about Publishing DITA Contentctnitchie
 
Universal Search for Legal Enterprises
Universal Search for Legal EnterprisesUniversal Search for Legal Enterprises
Universal Search for Legal EnterprisesAdhereSolutions
 
#SPSPhilly search topology & optimization
#SPSPhilly search topology & optimization#SPSPhilly search topology & optimization
#SPSPhilly search topology & optimizationMike Maadarani
 

Semelhante a Fundamentals Of Search (20)

SharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search WorkSharePoint Jumpstart #2 Making Basic SharePoint Search Work
SharePoint Jumpstart #2 Making Basic SharePoint Search Work
 
Cms an overview
Cms an overviewCms an overview
Cms an overview
 
PoolParty Thesaurus Management Quick Overview
PoolParty Thesaurus Management Quick OverviewPoolParty Thesaurus Management Quick Overview
PoolParty Thesaurus Management Quick Overview
 
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
EPC Group - Comprehensive Overview of SharePoint 2010's Enterprise Search Cap...
 
The Searchmaster's Toolbox - David Hawking, Funnelback Search
The Searchmaster's Toolbox - David Hawking, Funnelback SearchThe Searchmaster's Toolbox - David Hawking, Funnelback Search
The Searchmaster's Toolbox - David Hawking, Funnelback Search
 
Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010Enterprise Search in SharePoint 2010
Enterprise Search in SharePoint 2010
 
IBM Omnifind Enterprise Portal Seach To Improve Productivity
IBM Omnifind Enterprise   Portal Seach To Improve ProductivityIBM Omnifind Enterprise   Portal Seach To Improve Productivity
IBM Omnifind Enterprise Portal Seach To Improve Productivity
 
Introduction to search_marketing
Introduction to search_marketingIntroduction to search_marketing
Introduction to search_marketing
 
Document repositories-and-metadata
Document repositories-and-metadataDocument repositories-and-metadata
Document repositories-and-metadata
 
Google Search Appliance Version 2.0 Webinar - May 2012
Google Search Appliance Version 2.0 Webinar - May 2012Google Search Appliance Version 2.0 Webinar - May 2012
Google Search Appliance Version 2.0 Webinar - May 2012
 
Search technologies & aws cloud search
Search technologies & aws cloud searchSearch technologies & aws cloud search
Search technologies & aws cloud search
 
SharePoint Overview
SharePoint OverviewSharePoint Overview
SharePoint Overview
 
Leverage Search and Customize to your Brand within SharePoint 2010
Leverage Search and Customize to your Brand within SharePoint 2010Leverage Search and Customize to your Brand within SharePoint 2010
Leverage Search and Customize to your Brand within SharePoint 2010
 
SharePoint 2010 - Enterprise search overview
SharePoint 2010 - Enterprise search overviewSharePoint 2010 - Enterprise search overview
SharePoint 2010 - Enterprise search overview
 
Sharepoint 2013 Overview
Sharepoint 2013 OverviewSharepoint 2013 Overview
Sharepoint 2013 Overview
 
Making IA Real: Planning an Information Architecture Strategy
Making IA Real: Planning an Information Architecture StrategyMaking IA Real: Planning an Information Architecture Strategy
Making IA Real: Planning an Information Architecture Strategy
 
What your IT Doesn't Know about Publishing DITA Content
What your IT Doesn't Know about Publishing DITA ContentWhat your IT Doesn't Know about Publishing DITA Content
What your IT Doesn't Know about Publishing DITA Content
 
Universal Search for Legal Enterprises
Universal Search for Legal EnterprisesUniversal Search for Legal Enterprises
Universal Search for Legal Enterprises
 
Pratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnectPratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnect
 
#SPSPhilly search topology & optimization
#SPSPhilly search topology & optimization#SPSPhilly search topology & optimization
#SPSPhilly search topology & optimization
 

Último

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Último (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Fundamentals Of Search

  • 1. The Fundamentals of Enterprise Search. KMWorld 2009. Avi Rappoport, Search Tools Consulting; www.searchtools.com; consult9@searchtools.com; www.searchtools.com/slides/kmw09/fundamentals-of-search.html
  • 2. What’s In This Workshop. Overview of enterprise search, in context; Search engine processes: Robot spiders, database access; Indexing; Security; Query parsing, retrieval, and relevance ranking; Usable search interfaces; Maintenance and Analytics; Methods for choosing a good search engine. Fundamentals of Search Engines 2009 / © Avi Rappoport, www.searchtools.com
  • 3. About SearchTools. Avi Rappoport is a librarian (MLIS from Berkeley); Software developer and product manager; User interface designer; Long-time search consultant; Editor & Publisher, www.searchtools.com. Search Tools Consulting: Search needs analysis and recommendations; Enterprise search evaluation; Outsourced search administration.
  • 4. Defining Enterprise Search. Large-scale web site search: Corporate sites; Institutional sites; Online stores. Intranet search: Crossing departmental lines; Opening data silos. Extranets; Portal Search.
  • 5. Similarities to Webwide Search. Robot crawlers: HTML over HTTP; Scaling to millions of items: Distributed processing; Full-text indexing of content; Simple query language; Relevance ranking of results: TF-IDF (term frequency : inverse document frequency); Familiar results list.
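The TF-IDF ranking mentioned on this slide can be illustrated with a minimal sketch. It assumes whitespace tokenization and one common weighting variant (length-normalized term frequency times the log of inverse document frequency); the function name `tf_idf_scores` is hypothetical, and real engines use more refined formulas.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Score each document for the query with a basic TF-IDF sum."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term appears nowhere, contributes nothing
            idf = math.log(n_docs / df[term])
            score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores
```

Terms that occur in fewer documents get a larger weight, so a rare query word separates documents more strongly than a common one.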
  • 6. Differences from Web Search. Limited scope: A site, set of sites, extranet, or intranet; Few meaningful hyperlinks: PageRank and link analysis are less useful; Security and access control issues; Content in databases, CMSs, etc.; More control: Index update scheduling; Some content is very valuable, other content is not; No search spam.
  • 7. Text Search vs. Database Search. Indexes multiple content sources: Database fields, files, web pages, feeds...; Simple search commands instead of SQL; Flexible indexing and retrieval; Relevance ranking (this is a major issue); Does not compete for database resources; Easy to scale separately from DBMS; New features: spellcheck, auto complete, facets; Works in the real world, from eBay to Google.
  • 8. Search and Information Architecture. Information Architecture: The art and science of organizing information for access and use. IA work enriches search: Creates order and systems; Provides standard vocabulary; Removes ROT (redundant, obsolete, trivial). Search supplements IA: Supports user vocabularies; Changes dynamically with new content.
  • 9. Search and Taxonomy. Taxonomy creates categories: Labels and metadata; Improves quality of search results; Additional metadata extremely valuable. Search crosses categories: Bypasses ambiguous topic labels; Useful for novices; Supports user vocabulary; Dynamic updates for new topics.
  • 10. Search & Knowledge Management. KM is: “The process through which organizations generate value from their intellectual and knowledge-based assets.” (CIO Magazine); Organizes information, processes and people; Offers collaboration and archiving tools; Attempts to regularize implicit knowledge. Search mostly matches words.
  • 11. Two Main Types of Search. Known-item search: Short queries; “Good-enough” answers. Exploratory search: Research - finding unknowns (Scientific, legal, medical, business, sales); Conceptual overviews; Completeness - all possible relevant items (Law enforcement; Medicine).
  • 12. Search as an Iceberg. All people see are the search box and results list. Invisible functionality: Indexes; Query processing; Retrieval; Relevance ranking. Search is a mystery, but it’s just software.
  • 13. Elements of Search Engines. Automated tools to collect content; Specialized storage for quick retrieval; Query processing and expansion; Retrieval (matching query to index content); Relevance ranking; Search results interfaces; Analytics, metrics and maintenance.
  • 14. Choosing Content To Index. Information sites: Consider indexing every single page; Use search indexing as a discovery mechanism. Online stores, catalogs: Product information (cost, color, size, materials); Other: return policies, CEO’s name, jobs listing. Intranets: Intranet portal and core servers; May need archive servers and search. Multimedia (images, audio, video): Metadata at least.
  • 15. (Near) Real Time Indexing. Twitter has changed expectations, even in intranets; Index must support partial updates; Search engines finding limits at scale; Distribute indexing and indexes. Trigger index updates (push vs. pull): Continuous feed; Send web service message; Database trigger; Update watched URLs with new links.
  • 16. Indexing and Security. Search can undermine “security by obscurity”: One link can expose a whole set of documents. Work with your security team: List areas which contain sensitive content; Define words which trigger further analysis; Create a process for removing sensitive data. Indexing encrypted content: Search engine uses SSL client for indexing; Encrypt search results before returning; Physical security on search servers.
  • 17. Search and Access Control. Authentication and authorization in indexing: “Basic authentication” - user name and password; NT Security integration; ACLs and single sign-on; Conform to security rules during indexing; Keep access control info as part of document store. Showing results - who can see what? Access to search engine itself; Collection-level access control (Locked results as teaser for subscription); Hit-level access control (Check before displaying results).
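Hit-level access control, as described on this slide, can be sketched as a filter applied to the results just before display. This is an illustration under stated assumptions: the `Hit` class, its `allowed_groups` field (standing in for access control info stored with the document at indexing time), and `visible_hits` are all hypothetical names, and real systems usually check against a directory or ACL service.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    title: str
    allowed_groups: frozenset  # ACL captured during indexing (assumed model)


def visible_hits(hits, user_groups):
    """Keep only hits whose ACL shares at least one group with the user."""
    user_groups = frozenset(user_groups)
    return [hit for hit in hits if hit.allowed_groups & user_groups]
```

Filtering after retrieval keeps the index simple, at the cost of sometimes returning fewer results than requested; folding the ACL into the query is the usual alternative.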
  • 18. Indexing: Sources of Content. Web sites; Intranets; Extranets; Blogs; Wikis; Mailing list archives & email public folders; File systems & shared servers (NFS, SMB, AFP, GFS, ftp, WebDAV); Content Management Systems; Databases; Legacy programs in silos.
  • 19. Indexing: Robot Spiders. Start with base URL for all hosts. For each page, repeat: Read text into internal format; Save document in cache; Save words into index; Extract all links and check the rules; If they are new URLs, add them to the list.
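The spider loop on this slide can be sketched roughly as follows. This is a minimal illustration, not a production crawler: the `fetch` callable (returning HTML text or `None`) is injected by the caller, the only "rule" checked is an assumed host whitelist, and the names `crawl` and `LinkAndTextParser` are hypothetical. It does not handle robots.txt, and text inside script or style elements is indexed like any other text.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collect href values from <a> tags and all text nodes."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_urls, fetch, allowed_hosts, max_pages=100):
    """Breadth-first crawl; returns (cache, index)."""
    queue = deque(start_urls)
    seen = set(start_urls)
    cache = {}   # url -> raw document
    index = {}   # word -> set of urls containing it
    while queue and len(cache) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(page)                      # read text into internal format
        cache[url] = page                      # save document in cache
        for word in " ".join(parser.text_parts).lower().split():
            index.setdefault(word, set()).add(url)   # save words into index
        for href in parser.links:              # extract links, check the rules
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).hostname in allowed_hosts and link not in seen:
                seen.add(link)                 # new URL: add it to the list
                queue.append(link)
    return cache, index
```

Passing `fetch` in as a parameter keeps the loop testable against an in-memory dictionary of pages before pointing it at a real HTTP client.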
  • 20. Robot Indexing Spider
  • 21. Common Problems With Robots. Pages that are not linked from anywhere; Spider disallowed by robots.txt or robots meta tag; URLs with ? and & (all robots should handle these now); JavaScript, forms, and interactive dynamic links (some robots can handle some of these); Session IDs that change. Duplicate detection: Multiple views of the same data (Lotus, wikis); Symbolic links & bad redirects; Multiple copies of files or directories.
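Two of the problems on this slide, changing session IDs and duplicate documents, are often reduced by canonicalizing URLs and fingerprinting content. The sketch below is one hedged way to do that: the parameter names in `SESSION_PARAMS` are assumptions that vary by site, and `canonical_url` and `content_fingerprint` are hypothetical helpers. An exact hash only catches identical text after whitespace and case normalization, not near-duplicates.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust for the sites being crawled.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonical_url(url):
    """Drop session parameters and fragments, sort the remaining query."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    query.sort()
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/",
         urlencode(query), "")
    )


def content_fingerprint(text):
    """Hash of the text with case and whitespace differences removed."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A crawler could skip any page whose canonical URL or fingerprint it has already stored.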
  • 22. Indexing: Other Data Sources RSS feeds: nice clean text File servers: SMB, file:/// etc. Content / Document Management Systems Email archives Databases via ODBC, JDBC, Oracle API Full-text content Metadata: library catalog records, yellow pages External sources using APIs (application programming interfaces) News feeds (Reuters, AP) Twitter
  • 23. Indexing: Text Files Plain text is easy RTF export format text easy to find HTML semi-structured text Content is between tags and in attributes Generated by JavaScript - hard to extract Bad HTML, especially missing closing tags XML files (structured) Many tags are document-level Content is between tags and in attributes Complex tag hierarchy TEI (Text Encoding Initiative) & Semantic Web XQuery and XPath tools
  • 24. Indexing: Binary File Formats PDF Scanned, may not have any text Bad PDF generators break words at columns “Shadow” text effect duplicates letters SWF and Flash: API may not load dynamic text Office documents Word processing files (may have hidden text from revisions) Spreadsheets (hard to know what to grab) Presentations Note: new docx, xlsx, pptx are really XML file sets CAD and project files Metadata (properties, Adobe XMP)
  • 25. Indexing: Tokenizing Lowercase all characters (aka ‘folding’) Tokenizing makes words searchable Break on punctuation and spaces Recognize special words: C++ @ [TS] Typography issues: the ligature ﬆ is really “st” HTML escaped text: m&ouml;chten = möchten Special cases for structured strings Numbers, Prices, Dates N-grams - an alternate approach Break into short text patterns Takes a lot of index space
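A tokenizer along these lines might look like the following sketch. The `SPECIAL` set, the punctuation list, and the function names are assumptions made for illustration. Unicode NFKC normalization is one way to fold ligatures, and `html.unescape` handles escaped entities.

```python
import html
import re
import unicodedata

# Hypothetical list of special words that must survive punctuation splitting.
SPECIAL = {"c++"}


def tokenize(text):
    """Unescape, normalize, lowercase, then break on spaces and punctuation."""
    text = html.unescape(text)                  # m&ouml;chten -> möchten
    text = unicodedata.normalize("NFKC", text)  # ligature U+FB06 -> "st"
    text = text.lower()                         # case folding
    tokens = []
    for chunk in text.split():
        stripped = chunk.strip(".,;:!?\"'()")
        if stripped in SPECIAL:
            tokens.append(stripped)             # keep "c++" whole
        else:
            tokens.extend(re.findall(r"\w+", stripped))
    return tokens


def char_ngrams(token, n=3):
    """Alternate approach: overlapping character n-grams of one token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]
```

For example, `tokenize("Wir m&ouml;chten C++, fast!")` yields four tokens, with the entity decoded and `c++` kept intact. The n-gram function shows why that approach takes more index space: a six-letter word produces four trigram entries instead of one.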
  • 26. Indexing: Character Set Issues World has many charsets (aka scripts, alphabets) English has a simple alphabet: 26 letters, 10 numbers Other Roman languages: extended (ç, î, ß) Non-Roman one byte: Cyrillic, Arabic, Hebrew Asian two bytes: Chinese, Japanese, Korean Identifying character sets Unicode characters Older usage: language “code pages” HTTP header or <META http-equiv> Statistical detection techniques
  • 27. Indexing: Language Issues Text search works across languages Simple pattern-matching, query to index Language-specific indexing improves search Tokenizing using appropriate rules Compound nouns (kindergarten) Language rules for stemming Singular version of thés is thé Language detection Trusted tags Bilingual dictionaries Statistical matches, n-grams Documents may have mixed languages…
  • 28. Indexing: Multimedia Images, photos, drawings, sound, scores, video External metadata File name Link text, surrounding words Internal metadata ID3 tags for music EXIF and other digital photo information Subtitles (sometimes) Content OCR to extract graphic text and closed captions Audio: Speech-to-text conversion, still buggy Use human judgment not just automated systems
  • 30. Lots of IR research shows this
  • 33. Tokens not in paragraph order, thus, inverted
  • 34. Each token has ID of source
  • 35. Richer Index Structures Store word position (for phrase matching) Enclosing tag or field Document metadata Database field names Image (which attribute) Named anchor text Text markup tags (TEI, Semantic Web) Extracted entities Personal names, companies, geo locations, dates Anchor text from incoming links Can be very descriptive Add to index as if part of the target document
  • 36. Example Inverted Index Structure For each word Document ID Position Tag name For each document ID Title URL Description
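The two-part structure on this slide (postings per word, plus a record per document ID) can be sketched as below. The input layout and the names `build_index`, `postings`, and `store` are illustrative assumptions, and tokenizing is reduced to lowercasing and splitting on spaces.

```python
from collections import defaultdict


def build_index(docs):
    """Build word -> [(doc_id, position, tag)] and doc_id -> document record.

    docs maps a document ID to a dict with "title", "url",
    "description", and "text" keys (an assumed input shape).
    """
    postings = defaultdict(list)
    store = {}
    for doc_id, doc in docs.items():
        # Document store: what the results page needs for each hit.
        store[doc_id] = {
            "title": doc["title"],
            "url": doc["url"],
            "description": doc["description"],
        }
        # Inverted index: each word points back to its source and location.
        for tag in ("title", "text"):
            for position, word in enumerate(doc[tag].lower().split()):
                postings[word].append((doc_id, position, tag))
    return postings, store
```

Keeping the tag name with each posting is what later allows a title match to be weighted above a body match.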
  • 37. Indexing: Stopwords Stopwords - very common terms Linguistic (a an the as he she it you new) Ubiquitous (names, copyright, click here) Consequences of excluding stopwords: Reduces the size of index files Improves recall, finds more matching documents Fails some queries As You Like It, IT copyright policy Problems matching phrases: “New York University” Solutions vary: Index everything, pay the price in index size CommonGrams: n-grams of frequent phrases
  • 40. No matches for will be
  • 41. One ad gets it right
  • 42. External search finds over 3,000 pages on site with phrase
  • 43. Indexing: Stemming Singular query should find plural words & vice versa Shoe <=> shoes, cans <=> can, geese <=> goose Statistical and probabilistic truncation rules Linguistic rules Lemmatization - stemming based on part of speech Stemming before indexing Improve recall: find all forms of a word Reduce index size Consequences of extreme stemming Short query problems Search for Ran shouldn’t match Run, Lola, Run Other options Index everything (makes indexes larger and queries slower) New idea: CommonGrams (n-grams of frequent phrases)
  • 44. Indexing: Document Store Minimum ID (key for inverted index) Unique location (URL / file path / record ID) Richer document store Implicit metadata: filename, size, location Explicit metadata Title, date, keywords, author Taxonomy labels, classification, user tagging Language, character set Access control settings Full text of the document For snippets and caching
  • 45. Indexing: Dealing with Duplicates Detecting duplicate documents Exact match is fairly easy: checksums Document similarity check: harder but worth it Choosing the primary copy Most recent (if reliable) Rules based on path or metadata New web search “canonical” tag What to do with duplicates Remove from the index: saves space Hide in results unless requested That’s the Google way
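The checksum approach to exact duplicates might be sketched as follows. SHA-256 and the whitespace and case normalization are choices made for this example, not something the slide specifies. Normalizing first catches copies that differ only in formatting, which goes slightly beyond a strict byte-for-byte match.

```python
import hashlib


def checksum(text):
    """Hash the text after collapsing whitespace and folding case."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(docs):
    """Return lists of URLs whose content hashes to the same checksum.

    docs maps a URL (or file path) to the extracted document text.
    """
    groups = {}
    for url, text in docs.items():
        groups.setdefault(checksum(text), []).append(url)
    return [urls for urls in groups.values() if len(urls) > 1]
```

Near-duplicate detection (the "document similarity check" on the slide) needs a different technique and is not covered by this sketch.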
  • 46. Indexing: Document Dates HTTP servers lie about dates Frequent wrong settings: 1969, 2040 Dynamic pages send the current timestamp File systems lie about dates Applications lie about dates Indexers do the best they can Metadata (date tag, property, tag DC.date) Extract from page content Checksum to see if file has changed since last index Consider external metadata repository
  • 47. Search Process Flow
  • 48. Where the Queries Come From User-entered text in search fields Search navigation: moving around in results list Previous searches May just be repeated clicks on URL Save Search feature Simplistic alerts Facet click to add a metadata filter May re-issue search with additional terms May be navigational, no text query Scripts or automated queries Dynamic links (find all pictures by this artist) Geographic information systems
  • 49. Query Processing Steps Try to recognize the character set and language Tokenize the text by language rules Break at spaces and punctuation Same algorithm as index tokenizer Check for operators Internet Query Operators: + - "quotes" Boolean Operators: AND OR NOT & | ! Others: NEAR, (parentheses) Check for field names, zones, other filters Example: title:lunch location=94703 Handle the rare natural language question
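As a rough sketch of the operator and field checks, the parser below recognizes only the `+`, `-`, and quote operators and `field:value` or `field=value` filters. The regular expression and the clause dictionary layout are assumptions for illustration. Boolean keywords, NEAR, and parentheses are deliberately left out.

```python
import re

# optional +/- sign, optional field name, then a quoted phrase or a bare word
CLAUSE = re.compile(r'([+-]?)(?:(\w+)[:=])?(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query string into clauses with operator, field, and term."""
    clauses = []
    for sign, field, phrase, word in CLAUSE.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue                      # skip empty quotes
        clauses.append({
            "op": sign,                   # "+" required, "-" excluded, "" neutral
            "field": field or None,       # e.g. "title", or None for any field
            "term": term,
            "is_phrase": bool(phrase),
        })
    return clauses
```

For instance, `parse_query('+pizza -anchovies "new york" title:lunch location=94703')` produces five clauses, with the last two carrying the field names from the slide's example.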
  • 50. Query Expansion Stemming Dependent on index stemming choices Good to find singular/plural forms Word similarity searching - increases recall Fuzzy matching Phonetic, soundex, sound-alike May overwhelm exact matches Synonym expansion, should be site-specific bus => coach, ATM => Air Tasking Message
  • 51. Search: Retrieval, Recall & Precision Retrieval Finding the documents matching a particular query Recall Finding every relevant document Precision Finding only relevant documents Balance more recall vs. better precision Use search logs and user studies to guide choices Use precision as part of relevance ranking Top results should be more exact matches
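In the standard information-retrieval formulation, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A small sketch (the function name and the zero fallback for empty sets are choices made here):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the engine returned
    relevant:  document IDs a human judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a query returns documents 1 to 4 and the relevant set is {2, 4, 5}, two of four results are relevant (precision 0.5) and two of three relevant documents were found (recall about 0.67).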
  • 52. One-Word Text Retrieval Fast binary search in inverted index Check index updates on disk or in memory If there are distributed indexes, merge results Store the related document information in a list Document ID Term frequency in document Term positions in the document Note: The document list is not yet sorted Frequent searches may be cached “Short head” vs. “long tail”
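A one-word lookup with binary search and a query cache could be sketched like this. The toy `TERMS` and `POSTINGS` data are invented for the example, and `functools.lru_cache` stands in for whatever cache a real engine uses.

```python
from bisect import bisect_left
from functools import lru_cache

# Hypothetical toy index: a sorted term list plus a parallel postings list.
# Each posting is (document ID, term frequency, term positions).
TERMS = ["engine", "index", "robot", "search"]
POSTINGS = [
    ((1, 1, (2,)),),
    ((2, 1, (0,)),),
    ((3, 2, (0, 4)),),
    ((1, 2, (0, 5)), (2, 1, (3,))),
]


@lru_cache(maxsize=1024)   # frequent ("short head") queries are served from cache
def lookup(term):
    """Binary-search the sorted term list and return the unsorted postings."""
    i = bisect_left(TERMS, term)
    if i < len(TERMS) and TERMS[i] == term:
        return POSTINGS[i]
    return ()
```

The postings come back in index order, not relevance order. Ranking is a separate step applied to this list.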
  • 53. Multi-Word Text Retrieval Relationship between words defines results Boolean AND, + operator, find all default Only documents which contain all terms Boolean OR operator, find any default All documents with any term Boolean NOT, - operator All documents with the first term but not next term Phrase operators, quotes Only documents with the words as a phrase Also check for zones or field filters Parentheses: use for order of processing Merge resulting lists
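These operators map naturally onto set operations over posting lists, with phrase matching using the stored word positions. The sketch below uses a made-up positional index (`INDEX`) and hypothetical function names.

```python
# Hypothetical positional index: term -> {doc_id: [positions]}.
INDEX = {
    "new": {1: [0], 2: [3]},
    "york": {1: [1], 2: [0]},
    "university": {1: [2]},
}


def docs_with(term):
    return set(INDEX.get(term, {}))


def match_all(terms):
    """Boolean AND: only documents containing every term."""
    if not terms:
        return set()
    return set.intersection(*(docs_with(t) for t in terms))


def match_any(terms):
    """Boolean OR: documents containing any term."""
    return set().union(*(docs_with(t) for t in terms))


def match_not(term, excluded):
    """Boolean NOT: documents with the first term but not the second."""
    return docs_with(term) - docs_with(excluded)


def match_phrase(terms):
    """Phrase: the terms must appear at consecutive positions."""
    results = set()
    for doc_id in match_all(terms):
        for start in INDEX[terms[0]][doc_id]:
            if all(start + i in INDEX[t][doc_id] for i, t in enumerate(terms)):
                results.add(doc_id)
                break
    return results
```

Document 2 contains both "new" and "york" but not adjacent, so it matches the AND query and fails the phrase query.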
  • 54. Relevance Ranking Algorithms Relevance The likelihood that an item will fill an information need Based on documents in retrieval list Most common algorithm: TF:IDF (Term frequency : inverse document frequency) How often the query word is in the document? How often the word is in the index? Other relevance algorithms Vectors and document-query similarity Linguistic analysis and Natural Language Processing Statistical and Bayesian analysis
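One common variant of TF-IDF scoring is sketched below. There are many formulations; this one uses length-normalized term frequency and a smoothed logarithmic IDF, both of which are choices made for the example and not necessarily what any given engine does.

```python
import math
from collections import Counter


def tf_idf_rank(query_terms, docs):
    """Rank documents for a query; returns [(doc_id, score)] best first.

    docs maps a document ID to its text. Tokenizing is simplified
    to lowercasing and splitting on whitespace.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    scores = {}
    for doc_id, words in tokenized.items():
        counts = Counter(words)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(words)            # how often in this document
                idf = math.log(1 + n_docs / df[term])     # how rare in the index
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A word that appears often in one document but in few documents overall contributes the most to that document's score.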
  • 55. Relevance Heuristics Phrase matches for multiple query terms Logs show most multi-word searches are phrases Query terms found in special sections Title Metadata Top of document All terms matched in document Even when not relevant, it’s transparent Old systems gave excess weight to single rare terms
  • 56. More on Relevance Relevance is task-specific Results can never please all of the people More like berry-picking than like hunting Link analysis (PageRank) not very useful Intranet and site links tend to be navigational Situation-specific adjustments Some areas more likely to be valuable Current content Local content
  • 57. Federated Search and Relevance Send query to multiple search engines May require special syntax Response time often a factor Receive results in relevance order for each Display results, two options Separate sections for each search engine Merged single relevance rank list Works if all search indexes are similar Problems where the sources are very different
  • 58. Retrieval: Access Control Limit access to search itself User enters password or other credentials Search only accepts queries when authenticated Collection-level access control Query filter only retrieves items from allowed groups Hit-level access control Real-time check for user access on documents Start with most relevant documents Repeat until there are ten (may be slow) Display top results, include estimate of how many more Show helpful message if user can’t see any
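The hit-level check described here (walk the ranked list, test each document, stop at ten) might look like the sketch below. The `ACL` and `USER_GROUPS` tables and the `can_read` callback are hypothetical stand-ins for a real permissions service.

```python
# Hypothetical access data: document -> allowed groups, user -> groups.
ACL = {"d1": {"staff"}, "d2": {"hr"}, "d3": {"staff", "hr"}, "d4": {"staff"}}
USER_GROUPS = {"alice": {"staff"}, "bob": set()}


def can_read(user, doc_id):
    """Stand-in for a real-time permission check."""
    return bool(ACL[doc_id] & USER_GROUPS[user])


def visible_results(ranked_doc_ids, user, check=can_read, page_size=10):
    """Filter a relevance-ranked list down to one page the user may see.

    Returns (visible IDs, number of ranked hits not yet checked).
    The second value can feed a rough "how many more" estimate.
    """
    visible = []
    checked = 0
    for doc_id in ranked_doc_ids:          # most relevant first
        checked += 1
        if check(user, doc_id):
            visible.append(doc_id)
            if len(visible) == page_size:  # stop once the page is full
                break
    return visible, len(ranked_doc_ids) - checked
```

Stopping early is what keeps this tolerable: only as many permission checks are made as it takes to fill a page. An empty first value is the cue to show the "helpful message" the slide mentions.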
  • 59. Search User Experience Limit user interface complexity Show the scope of the information covered Expose query expansion and contraction Use familiar UI elements User experience goes beyond interface Index coverage Query syntax Retrieval quality and speed Relevance ranking (first ten are vital)
  • 60. Search Forms Interface Balance simplicity with functionality Put a search field in the navigation bar Location should be consistent Longer is better: short fields lead to short queries Simple Search forms: limit options Zone or section Dates
  • 61. Search Field Auto-Complete Dropdown menu of matching words Base on search logs Smallish list, 7-10 Most popular Simple sort Alphabetic Price or size Complete range (preferably lowest to highest)
  • 62. Other Search Interfaces Heavily researched Natural language Must keep typing Defining a question is quite hard Interactive search Guided interviews But users want immediate results Avatars do not improve interaction
  • 63. Simple vs. Advanced Search UI Most searches are simple Short: one to three words Fewer than 10% use any operators at all (maybe 1%) Even experts prefer simple search Will use advanced tools if simple doesn’t work Default to simple search, link to advanced search Those are your power users: librarians, techies Expose all possible options Don’t spend huge resources on advanced UI Exploratory search is different
  • 64. Advanced Search Fits Sometimes eBay High motivation Complex search requirements Frequent use UX testing still required
  • 65. Search Results: Page Elements Site context General page layout, navigation links Colors and design elements Results header A search field, with the current search terms Retrieval information - how many hits Results list in relevance order Each result item with at least a linked title Facets: dynamic links for filtering results Results footer
  • 67. Content blocks Site look-and-feel Navigation Familiar search results elements
  • 68. Search Results: Not-So-Good Example Site page has navigation, colors: search results should too
  • 69. Search Results: Visualization Fascinating to look at, great demos Star charts Topographical displays Interactive fly-throughs Hyperbolic trees Require significant resources to run Good for exploratory & comprehensive research Finding unexpected synergies Simple search is much cheaper for casual users
  • 70. Search Results: Header Elements Search field, with the current query Users often edit to be more or less restrictive Number of results found A few search options Match Any Word / All Words / Exact Phrase Filter by date option (if trustworthy) Search zones Results navigation Best Bets Spelling suggestions
  • 71. Search Results: Hits and Pages Show number of items matched Be accurate Do not give estimates for small numbers (Google and SharePoint are bad this way) Pagination - results list navigation Helps user calibrate content Important for exploratory search Follow web search conventions, example < previous 1 2 3 4 ... 26 next > Be accurate
  • 72. Results Headers: Examples
  • 74. Best Bets Example Best Bets are very clear Would not come first in normal search results
  • 75. Search Results: List Sorting List of links to items matching the query Sorted by matching terms Impossible to be relevant to every query Variety of sources when possible Transparency: why these items in this order Other sort orders - make very visible By author’s last name By date By price
  • 76. Search Results: Not Enough Variety
  • 78. Degree icon should be on the left side
  • 79. Result Items: Elements Information foraging: show hints about items Title of document, or name of product Location: URL, file path, database ID May need to rewrite to user-accessible URLs Hide location if it’s not meaningful Distinguishing data Metadata: picture, product code, author name Show match terms in context (snippets) Text before and after query term matches Highlight the matches
  • 80. Results Items: Not Enough Content
  • 81. Results Items: Too Much Content
  • 82. Results Items: Just Right
  • 83. Results Items: Additional Data Date (if reliable) Size and File type Avoid surprising launches of Acrobat or other app. Metadata Author, department, brand, product... Access status: password required? Topics and subject headings Taxonomy categories Keywords and concept tags User tags, folksonomy
  • 84. Results Items: Rich Items Example
  • 85. Results: Dynamic Clustering Uses search results text to infer topics Groups by similarity in titles and results text Particularly good for portals and intranets Unstructured, uncontrolled text Dynamic, no preprocessing needed Can supplement categorization and taxonomies
  • 86. Results: Clustering Example
  • 87. Commerce and Catalog Results Picture or graphic if possible Important attributes Price Color Size Compatibility Availability “Buy” button Simplify process, save time
  • 88. Online Store Results Example
  • 89. Multimedia Search Image, audio, and video files Audio and visual similarity search still theory Show context in results Match terms from transcript or OCR Text around image Thumbnails or keyframes
  • 90. Multimedia Results Example
  • 91. Results: Faceted Metadata Better than forms for structured text data Exposes attributes as part of search results Leverages metadata Topic names, taxonomy Mundane stuff: color, date, size, author... Choices specifically relating to search results Dynamically generates from metadata Preview numbers offer users confidence in clicking Supported by extensive usability testing Used on a majority of large e-commerce sites
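Facet links with preview numbers can be generated directly from the metadata of the current result set. A minimal sketch, with invented field names and function names:

```python
from collections import Counter


def facet_counts(results, fields):
    """Count metadata values across the current results, per facet field.

    results is a list of dicts of metadata; fields names the facets to show.
    The counts are the "preview numbers" displayed next to each facet link.
    """
    counts = {field: Counter() for field in fields}
    for item in results:
        for field in fields:
            if field in item:
                counts[field][item[field]] += 1
    return counts


def apply_facet(results, field, value):
    """Clicking a facet link filters the results to one metadata value."""
    return [item for item in results if item.get(field) == value]
```

Because the counts come from the results themselves, a user never sees a facet value that would lead to an empty list, which is a key advantage over a fixed search form.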
  • 92. Why Faceted Search is Better Than Forms
  • 93. Faceted Metadata: Commerce Example
  • 94. Faceted Metadata: Library Catalog
  • 95. No Matches Queries: Causes Misspellings and typing errors Scope problem: nothing for that topic Vocabulary differences Users may be less precise, or use competitor’s terms Marketers may dominate content Restrictive search settings Default may only match exact phrase or all words Access control may disallow user Software/hardware/network failures
  • 96. No Matches Queries: Responses Track queries with no matches in logs Use sessions, surveys & testing to find user intent Design the no-matches page carefully Explain what is and isn’t on the site Provide useful navigation links Add search engine help Synonyms Best Bets Spelling Add terms to text Add content, topic pages
  • 97. No Matches Queries: Spelling Issues Detect and address common problems Spelling errors Typos Queries without spaces between words Use site-specific dictionary Easy to build from search index Never suggests any words not on the site Users familiar with “did you mean...?”
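A site-specific "did you mean" can be approximated with the standard library alone. This sketch uses `difflib.get_close_matches` against the set of terms in the search index. The 0.7 similarity cutoff is an arbitrary starting point, and production spell-checkers usually also weight suggestions by term frequency.

```python
import difflib


def suggest(query_word, index_terms, max_suggestions=3):
    """Suggest replacements drawn only from words that are in the index."""
    word = query_word.lower()
    if word in index_terms:
        return []                       # the word exists; nothing to suggest
    return difflib.get_close_matches(
        word, sorted(index_terms), n=max_suggestions, cutoff=0.7
    )
```

Because candidates come from the index itself, a suggestion can never lead to another no-matches page.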
  • 98. Good Example of No-Matches Page
  • 100. Search Engine Maintenance Index maintenance Obsolete content removal Check for new content Track technical problems (bad links, servers down) Search quality Re-run test suite Compare with original results Add new test queries Track user feedback, surveys Use metrics and log analysis to catch trends
  • 101. Metrics for Search Engines Server uptime Errors: how often and how serious Index Size on disc and in memory Number of entries Number and type of indexing errors Search traffic Queries per minute (60 qpm is common) Average clicks on results items per query Average next-page views per query Number and percent of no-match queries
  • 102. Search Log Analysis Most frequent query terms Short head: a few very popular terms Long tail of unique queries Lots of junk: URLs, spam, gibberish Frequent query terms not matched - fix somehow More esoteric analysis - need a lot of data Frequent query terms with low click-through Frequent query terms with high “next page” clicks Raw logs Import into database for ad-hoc reports Session analysis can be enlightening
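The basic reports listed here can be computed from a simple list of log entries. The sketch assumes each entry is a `(query, hit_count)` pair. Real log formats vary, and the field layout and function name are invented for the example.

```python
from collections import Counter


def summarize_log(entries, top_n=3):
    """Summarize (query, hit_count) log entries into basic search metrics."""
    frequency = Counter()
    no_match = Counter()
    for query, hits in entries:
        normalized = " ".join(query.lower().split())   # fold case and spacing
        frequency[normalized] += 1
        if hits == 0:
            no_match[normalized] += 1
    total = sum(frequency.values())
    return {
        "top_queries": frequency.most_common(top_n),        # the short head
        "top_no_match": no_match.most_common(top_n),        # fix these first
        "no_match_percent": 100.0 * sum(no_match.values()) / total if total else 0.0,
        "queries_seen_once": sum(1 for c in frequency.values() if c == 1),  # long tail
    }
```

Sorting no-match queries by frequency gives a natural priority list for adding synonyms, Best Bets, or new content.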
  • 103. Choosing a Search Engine Find specific information needs Analyze content Source and formats Rough number of pages / records / items Define platform, API, language requirements Buy (or use open source), don’t build User surveys show problems with home-grown Choose & compare likely candidates Gathering, indexing, retrieval, relevance features Scaling Administration tools Continuing development, support, user groups Price
  • 105. Content Inventory Work with Information Architects Use existing taxonomies and catalogs Learn what you have Simple static HTML pages Other formats: PDF, Office documents (which version) CMS, document management, publishing systems Databases and legacy systems Multimedia audio and video files Identify more and less valuable data Some content should be in archives
  • 106. Search Engine Deployment Types Software Controlled by local IT Flexible installation Open-source - several high quality packages Search Appliances Server hardware/software combinations Require very little technical attention Check development and backup server pricing Remote Search Services (SaaS) Index using robot spiders or remote access Query goes to service, results go back to user Low network, hosting, IT load
  • 107. Scaling Search to Millions & Billions What are the largest installations for each? Talk to them before committing Cache frequent queries Add query servers, automated load balancing Indexing at scale Indexing on dedicated servers Deal with new calls for near-real-time indexing Distribute multiple clones of indexes Segment indexes, parallel lookups, merge results
  • 108. Testing Search Indexing Choose 3-4 good candidates Index as much content as possible Watch the robot, track errors Try to index tricky data sources Compare coverage among them Test index scaling Make a really big index based on expected use Speed of add / update / delete Responsiveness during big update
  • 109. Evaluating Search Results Create a query test suite Use existing search logs if possible Short, long, unusual, common (check cache) Simple and complex queries Spelling, typing and vocabulary errors Many matches, few matches, no matches Perform searches against the test engines Save results pages as HTML for later checking Analyze differences among them Retrieval (and indexing): what’s found? Relevance: are the top results good ones?
  • 110. Search: Not a Black Box Simple search solves many enterprise problems Dynamic access to local content Familiar interface, expectations User vocabulary Understand the real information needs Index the right stuff Work with content providers and IAs Link to specialty research engines Learn from users over time, make it better

Editor's Notes

  1. http://www.slideshare.net/bdelacretaz/beyond-fulltext-searches-with-lucene-and-solr Great book: “Search User Interfaces” by Marti Hearst