Faceted search using Solr and Ontopia
1. Faceted search using Solr and Ontopia 2009-11-03 Geir Ove Grønmo, grove@bouvet.no
2. Agenda: Short introductions to Solr and Ontopia. What is faceted search? An integration of the two: a prototype. Demos.
3. Apache Solr: A search engine implemented as an HTTP service on top of Apache Lucene. Searching and indexing (no web crawling). Adds support for faceted search (and more). Sharding and replication. Distributed search. Excellent interoperability (i.e. not really Java-specific). Next release: Solr 1.4. Open source: http://lucene.apache.org/solr/, Apache License 2.0.
4. Ontopia: A Topic Maps toolkit for data representation, persistence and querying, and for application development. Written in Java. Next release: Ontopia 5.1. Open source: http://code.google.com/p/ontopia/, Apache License 2.0.
5. Where the meat is... Solr: fast textual search and faceted search support. Ontopia: rich semantic data and structured search. User interface design: providing a useful interface to the user.
6. But first, what is faceted search? A technique for refining search results. It integrates textual search and navigation. It allows concept composition, for example: slow + expensive + red + used + car; article + in English + about salmon; people + aged 20-30 + SQL expert; punk rock songs + < 1 minute + in Norwegian + released 1980-1982. It supports exploration and learning. It never leads to zero results, because only facet values that match at least one document are offered.
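Concept composition can be pictured as successive filtering: each selected facet value narrows the current result set, so "slow + expensive + red + used + car" is four constraints applied in turn. The sketch below assumes a hypothetical in-memory list of car records with made-up field names (speed, price, color, condition); it illustrates the idea and is not how Solr evaluates filters internally.

```python
# Hypothetical sample data: each car is a dict of facet field -> value.
CARS = [
    {"id": "c1", "speed": "slow", "price": "expensive", "color": "red", "condition": "used"},
    {"id": "c2", "speed": "slow", "price": "cheap", "color": "red", "condition": "used"},
    {"id": "c3", "speed": "fast", "price": "expensive", "color": "red", "condition": "new"},
    {"id": "c4", "speed": "slow", "price": "expensive", "color": "blue", "condition": "used"},
]


def refine(docs, constraints):
    """Keep only documents matching every (field, value) constraint.

    Each constraint composes with the previous ones, so adding one
    can only shrink (never grow) the result set.
    """
    result = list(docs)
    for field, value in constraints:
        result = [d for d in result if d.get(field) == value]
    return result


steps = [("speed", "slow"), ("price", "expensive"), ("color", "red"), ("condition", "used")]
for i in range(1, len(steps) + 1):
    ids = [d["id"] for d in refine(CARS, steps[:i])]
    print(" + ".join(value for _, value in steps[:i]), "->", ids)
```

Because the user only picks among values that still occur in the current result set, each step keeps at least one document.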
8. How is it done? Given a starting set (usually all documents, or the result of filling in the search input box), do the following: count the number of hits matching each value of each facet field. Which fields to facet on is defined at query time.
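The counting step can be sketched in a few lines, together with the query string a client might send to Solr's select handler. `facet`, `facet.field` (repeatable, one per field) and `facet.mincount` are standard Solr faceting parameters; the sample documents and field names are hypothetical, and the Solr host and handler path are deliberately left out.

```python
from collections import Counter
from urllib.parse import urlencode


def count_facets(docs, facet_fields):
    """For each requested facet field, count how many documents carry each value.

    Multi-valued fields (lists) contribute one count per distinct value.
    """
    counts = {}
    for field in facet_fields:
        counter = Counter()
        for doc in docs:
            value = doc.get(field)
            if value is None:
                continue
            values = value if isinstance(value, list) else [value]
            counter.update(set(values))
        counts[field] = counter
    return counts


def solr_facet_query(q, facet_fields, mincount=1):
    """Build the query string for a Solr select request that facets on the
    given fields; the fields are chosen per request, i.e. at query time."""
    params = [("q", q), ("facet", "true"), ("facet.mincount", str(mincount))]
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)


# Hypothetical starting set, e.g. the hits for a text query.
DOCS = [
    {"id": "1", "type": "article", "language": "english", "subject": ["salmon", "fishing"]},
    {"id": "2", "type": "article", "language": "norwegian", "subject": ["salmon"]},
    {"id": "3", "type": "song", "language": "norwegian"},
]

print(count_facets(DOCS, ["type", "subject"]))
print(solr_facet_query("salmon", ["type", "language"]))
```

Setting `facet.mincount=1` asks Solr to omit values with zero hits, which is what keeps the interface from offering dead ends.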
13. Facet types: Standard facets: a list of facet values. Hierarchical facets: a taxonomy of facet values. Range/query facets: dates, prices, alphabet buckets, intervals (lower and upper bounds).
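The non-standard facet types differ mainly in how a value is mapped to a bucket. This sketch assumes intervals that include the lower bound and exclude the upper one, taxonomy paths written as slash-separated strings with one path per document, and first-letter buckets for the alphabet case; all three conventions are illustrative choices, not Solr's.

```python
from bisect import bisect_right
from collections import Counter


def range_facet(values, bounds):
    """Count values into intervals [bounds[i], bounds[i+1]).

    Values below the first bound or at/above the last bound are ignored.
    """
    counts = Counter()
    for v in values:
        if v < bounds[0] or v >= bounds[-1]:
            continue
        i = bisect_right(bounds, v) - 1
        counts[(bounds[i], bounds[i + 1])] += 1
    return counts


def hierarchical_facet(paths):
    """Count documents at every level of a taxonomy.

    Assumes one path per document; a path like "Europe/Norway/Oslo"
    also counts toward its ancestors "Europe" and "Europe/Norway".
    """
    counts = Counter()
    for path in paths:
        parts = path.split("/")
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts


def alphabet_facet(titles):
    """Bucket titles by their upper-cased first character."""
    return Counter(t[:1].upper() for t in titles if t)


print(range_facet([50, 100, 250, 999, 1000], [0, 100, 500, 1000]))
print(hierarchical_facet(["Europe/Norway/Oslo", "Europe/Norway/Bergen", "Europe/Sweden"]))
```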
18. User interface considerations: Single select: link or radio button. Multi select: checkboxes. Decide which operator (AND/OR) to use within a facet and between facets. How many facet values to display, given limited screen real estate. How to provide an intuitive undo operation.
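One common choice for multi-select checkboxes is OR within a facet and AND between facets. The sketch below assumes that convention and a hypothetical song collection; undo then amounts to removing a value from the selection set.

```python
def matches(doc, selections):
    """Return True if doc satisfies the user's facet selections.

    selections maps a facet field to the set of values the user ticked.
    Values within one facet are OR-ed (multi-select checkboxes);
    different facets are AND-ed together.
    """
    for field, chosen in selections.items():
        if not chosen:
            continue  # nothing ticked for this facet: no restriction
        value = doc.get(field)
        values = set(value) if isinstance(value, list) else {value}
        if values.isdisjoint(chosen):
            return False
    return True


# Hypothetical data.
SONGS = [
    {"id": "s1", "genre": "punk", "language": "norwegian"},
    {"id": "s2", "genre": "rock", "language": "english"},
    {"id": "s3", "genre": "punk", "language": "english"},
    {"id": "s4", "genre": "pop", "language": "norwegian"},
]

selection = {"genre": {"punk", "rock"}, "language": {"norwegian"}}
print([s["id"] for s in SONGS if matches(s, selection)])

# "Undo": clear the language checkbox and the result widens again.
selection["language"].discard("norwegian")
print([s["id"] for s in SONGS if matches(s, selection)])
```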
20. Scoring: Some types of documents should be ranked higher than others. Solr lets one boost the default score per document and per field. The total score of a document depends on: the boost and score of its fields, adjusted by how relevant each field is to the actual query; and the boost of the document.
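To make the interplay of field boosts and document boosts concrete, here is a deliberately simplified toy model. It is not Lucene's actual similarity formula, which also involves term frequencies, inverse document frequencies and length normalization. The share of query terms found in a field stands in for "how relevant the field is to the query"; the field names and boost values are hypothetical.

```python
def toy_score(doc, query_terms, field_boosts, doc_boost=1.0):
    """Toy relevance score: NOT Lucene's actual formula.

    For each boosted field, the share of query terms that occur in it
    is multiplied by the field's boost; the sum is then scaled by the
    document boost.
    """
    terms = [t.lower() for t in query_terms]
    if not terms:
        return 0.0
    total = 0.0
    for field, boost in field_boosts.items():
        words = set(str(doc.get(field, "")).lower().split())
        share = sum(1 for t in terms if t in words) / len(terms)
        total += boost * share
    return doc_boost * total


BOOSTS = {"title": 2.0, "body": 1.0}  # a title match counts double (hypothetical)
a = {"title": "salmon fishing", "body": "river"}
b = {"title": "rivers", "body": "salmon in rivers"}

print(toy_score(a, ["salmon"], BOOSTS))                 # title match
print(toy_score(b, ["salmon"], BOOSTS, doc_boost=3.0))  # body match, boosted document
```

The second call shows how a document boost can lift a weaker field match above a stronger one, which is the point of ranking some document types higher.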
21. Sorting: How to sort the list of facets? By relevance. How to sort the values of each facet? By number of hits, or alphabetically. How to sort the search result? By relevance, alphabetically, or by date.
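The sort orders on the slide map onto ordinary sort keys. This sketch assumes facet counts as a value-to-hits mapping and hits as dicts with hypothetical `score`, `title` and `date` keys; breaking count ties alphabetically and listing newest dates first are illustrative choices.

```python
from datetime import date


def sort_facet_values(counts, by="count"):
    """Order (value, hits) pairs by hit count (ties broken alphabetically)
    or purely alphabetically (case-insensitive)."""
    items = list(counts.items())
    if by == "count":
        return sorted(items, key=lambda kv: (-kv[1], kv[0]))
    if by == "alpha":
        return sorted(items, key=lambda kv: kv[0].lower())
    raise ValueError(f"unknown sort order: {by!r}")


def sort_results(results, by="relevance"):
    """Sort search hits by relevance (highest score first), title, or date (newest first)."""
    if by == "relevance":
        return sorted(results, key=lambda r: r["score"], reverse=True)
    if by == "alpha":
        return sorted(results, key=lambda r: r["title"].lower())
    if by == "date":
        return sorted(results, key=lambda r: r["date"], reverse=True)
    raise ValueError(f"unknown sort order: {by!r}")


FACET = {"punk": 4, "rock": 7, "Pop": 4}
HITS = [
    {"title": "b", "score": 0.4, "date": date(1981, 5, 1)},
    {"title": "a", "score": 0.9, "date": date(1980, 1, 1)},
]
print(sort_facet_values(FACET))
print([h["title"] for h in sort_results(HITS, by="date")])
```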
23. Why not use Ontopia only? You can, but it is not optimized for this use case. It lets you implement faceted search, but it will be too slow. The reasons are: all the expensive processing has to happen at runtime rather than at indexing time; it involves a lot of traversal; it relies on the underlying full-text search engine; and search has limited cacheability.
24. Trade-offs: Considerations: search performance, indexing performance, consistency. Ontopia: no indexing overhead; results always up-to-date. Solr: very fast search; indexing overhead; the index must be kept up-to-date regularly.
25. Solr: the data model. An index contains documents. Documents have fields. A field can have multiple values: { "id": "1234", "title": "Structure and Interpretation of Computer Programs", "authors": ["Harold Abelson", "Gerald Jay Sussman"] }
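One way to get such a document into Solr is its XML update message, where a multi-valued field is written as one `<field>` element per value, all sharing the same `name`. The field must also be declared with `multiValued="true"` in the schema. The sketch below serializes the document from the slide; posting it to the update handler is not shown.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(docs):
    """Serialize documents to Solr's <add><doc><field .../></doc></add> format.

    A multi-valued field (a Python list) becomes one <field> element per
    value, all with the same name attribute.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field_el = ET.SubElement(doc_el, "field", name=name)
                field_el.text = str(v)
    return ET.tostring(add, encoding="unicode")


BOOK = {
    "id": "1234",
    "title": "Structure and Interpretation of Computer Programs",
    "authors": ["Harold Abelson", "Gerald Jay Sussman"],
}
print(to_solr_add_xml([BOOK]))
```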
26. Ontopia: the data model. A topic map contains topics and information about them: identities, names, associations to other topics, and occurrences (read: non-association properties).
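For contrast with Solr's flat documents, here is a much simplified Python model of a topic map. It is a hypothetical illustration only; Ontopia's real API is in Java and considerably richer. Identities, names and occurrences live on the topic, while associations are separate objects that connect topics through roles, so related data is reached by traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    identity: str                                    # e.g. a subject identifier URI
    names: list = field(default_factory=list)
    occurrences: dict = field(default_factory=dict)  # property type -> list of values


@dataclass
class Association:
    assoc_type: str
    roles: dict                                      # role type -> Topic


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)       # identity -> Topic
    associations: list = field(default_factory=list)

    def add_topic(self, topic):
        self.topics[topic.identity] = topic
        return topic

    def associate(self, assoc_type, **roles):
        self.associations.append(Association(assoc_type, roles))

    def related(self, topic, assoc_type, from_role, to_role):
        """Topics playing to_role in assoc_type associations where topic plays from_role."""
        return [a.roles[to_role] for a in self.associations
                if a.assoc_type == assoc_type and a.roles.get(from_role) is topic]


# Hypothetical identities and type names.
tm = TopicMap()
book = tm.add_topic(Topic("ex:sicp", ["Structure and Interpretation of Computer Programs"],
                          {"isbn": ["0262011530"]}))
for ident, name in [("ex:abelson", "Harold Abelson"), ("ex:sussman", "Gerald Jay Sussman")]:
    tm.associate("written-by", work=book, author=tm.add_topic(Topic(ident, [name])))

print([t.names[0] for t in tm.related(book, "written-by", "work", "author")])
```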
27. Integrating Solr and Ontopia. Proposed solution: Solr indexes constructed from Ontopia queries. For each document type, create a query that extracts data from the topic map into fields in documents. Then do faceting on selected fields. Use-case specific: the schema definition should be project-specific (to some degree). Perform a full index or an incremental reindex.
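A query that joins a topic with several related topics typically returns one row per combination, so the same topic shows up in many rows. The sketch below folds such flat rows into one document per topic, collecting the columns that should become multi-valued Solr fields. The row format and column names are hypothetical stand-ins for whatever the project-specific query actually returns.

```python
def rows_to_documents(rows, id_column, multi_valued):
    """Fold flat query result rows into one Solr-style document per topic.

    Columns listed in multi_valued are collected into de-duplicated lists;
    other columns keep the first non-empty value seen.
    """
    docs = {}
    for row in rows:
        doc_id = row[id_column]
        doc = docs.setdefault(doc_id, {"id": doc_id})
        for column, value in row.items():
            if column == id_column or value is None:
                continue
            if column in multi_valued:
                values = doc.setdefault(column, [])
                if value not in values:
                    values.append(value)
            else:
                doc.setdefault(column, value)
    return list(docs.values())


# Hypothetical rows, one per (book, author) combination.
ROWS = [
    {"book": "b1", "title": "SICP", "author": "Harold Abelson"},
    {"book": "b1", "title": "SICP", "author": "Gerald Jay Sussman"},
    {"book": "b2", "title": "Faceted Search", "author": "Daniel Tunkelang"},
]
print(rows_to_documents(ROWS, id_column="book", multi_valued={"author"}))
```

Keeping one such function per document type matches the slide's point that the schema is project-specific: only the query and the list of multi-valued columns change.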
36. Ideas for the future: The faceted search user interface in Ontopoly could be made declarative. Incremental reindexing requires tracking changes, usually done with a timestamp: implement a last-modified field in Ontopoly. Add an optional fourth column for score boost (a float between 0 and 1)? Ontopia extensions for interacting with Solr: a JSP tag library and tolog predicates.
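The two concrete ideas, timestamp-based incremental reindexing and an optional boost column, could look roughly like this. The sketch assumes each topic record carries a hypothetical `last_modified` datetime (the field the slide proposes adding to Ontopoly) and that the boost column may be empty; values are clamped into the proposed range of 0 to 1.

```python
from datetime import datetime


def select_for_reindex(topics, last_indexed):
    """Return ids of topics changed since the previous indexing run.

    Relies on a hypothetical 'last_modified' timestamp per topic;
    topics without one are reindexed to be safe.
    """
    return [t["id"] for t in topics
            if t.get("last_modified") is None or t["last_modified"] > last_indexed]


def parse_boost(cell, default=1.0):
    """Parse the optional boost column, clamped into [0, 1]; empty means default."""
    if cell is None or str(cell).strip() == "":
        return default
    return min(1.0, max(0.0, float(cell)))


TOPICS = [
    {"id": "t1", "last_modified": datetime(2009, 10, 1)},
    {"id": "t2", "last_modified": datetime(2009, 11, 2)},
    {"id": "t3"},
]
print(select_for_reindex(TOPICS, last_indexed=datetime(2009, 11, 1)))
print(parse_boost("0.25"), parse_boost(""), parse_boost("7"))
```

A timestamp only reveals topics that still exist, so deletions would need separate tracking, for example a log of removed ids.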
37. More demos: Epicurious recipe search (http://www.epicurious.com/tools/searchresults?search=); Flickr photo search with hierarchical facets (http://people.csail.mit.edu/dfhuynh/projects/hierarchical-facets/test.html); a collection of faceted navigation examples (http://www.flickr.com/photos/morville/collections/72157603789246885/).
38. More information: "3 Quick Design Patterns for Better Faceted Search" (http://www.thingsontop.com/3-quick-patterns-better-facet-design-889.html); "How to Make a Faceted Classification and Put It On the Web" (http://www.miskatonic.org/library/facet-web-howto.html); book: Faceted Search (Synthesis Lectures on Information Concepts, Retrieval, and Services), Daniel Tunkelang.
39. Structured, semantics-rich data is easier to find when using faceted search.