The Mosaic search engine is a prototype of a bibliographic search engine with personalisation facilities, produced as part of the JISC-funded Mosaic Project.
Mosaic Search Engine
1. The Mosaic Search Engine Mark van Harmelen, Hedtek Ltd, markvanharmelen@gmail.com, hedtek.com
2. Aim Provide a proof of concept that: users can have personalised search results according to their place and stage of studies; users can adopt other personas or points of view to explore academic resources; we can exploit ‘mass’ attention data as revealed by library circulation information. So far only working with ISBN-identified books.
3. [Diagram. Labels: HEI circulation data; anonymise (at each HEI); reading lists; build Solr index; partial Copac records annotated with use and reading list data; Solr; front-end.]
4. Anonymisation Level 1: current prototype, enables faceting. Level 2: with extra information, enables “people who borrowed this also borrowed” and “people who borrowed this went on to borrow”. Anonymisation utility provided. DPA compliant; can also use fair processing agreements.
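The two anonymisation levels on this slide can be illustrated with a minimal sketch. It is not the anonymisation utility the project provided: the record layout, the function names and the use of a keyed hash for Level 2 are all assumptions made for illustration, and the sketch says nothing about whether either output meets DPA requirements.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical raw loan records: (user_id, isbn, institution, course, level, year).
RAW_LOANS = [
    ("u1", "9780131103627", "HEI-A", "Computer Science", 1, 2008),
    ("u2", "9780131103627", "HEI-A", "Computer Science", 1, 2008),
    ("u1", "9780262033848", "HEI-A", "Computer Science", 1, 2008),
]


def anonymise_level1(loans):
    """Level 1 (assumed): drop borrower identifiers and keep aggregate use counts.

    The result is enough to facet on institution, course, level and year.
    """
    counts = Counter(
        (isbn, inst, course, level, year)
        for _user, isbn, inst, course, level, year in loans
    )
    return [
        {"isbn": isbn, "institution": inst, "course": course,
         "level": level, "year": year, "count": n}
        for (isbn, inst, course, level, year), n in sorted(counts.items())
    ]


def anonymise_level2(loans, secret_key: bytes):
    """Level 2 (assumed): replace each borrower id with a keyed-hash pseudonym.

    Loans by the same borrower stay linkable, which is the extra information
    needed for "people who borrowed this also borrowed".
    """
    records = []
    for user, isbn, inst, course, level, year in loans:
        pseudonym = hmac.new(secret_key, user.encode("utf-8"),
                             hashlib.sha256).hexdigest()[:16]
        records.append({"borrower": pseudonym, "isbn": isbn,
                        "institution": inst, "course": course,
                        "level": level, "year": year})
    return records
```

Level 1 output carries no per-borrower information at all; Level 2 output keeps a stable pseudonym per borrower, so the key would have to stay inside the contributing institution.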
5. Augmenting Solr’s index Solr’s search index is loaded with items and any associated use information. Use information is: institution, course, progression level, year of use, and count of uses in that year. Use information enables faceting. Reading list information is also added to items.
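As a rough illustration of the augmentation described on this slide, the sketch below builds one index document per item and use record. The field names (`use_count`, `reading_lists` and so on), the document id scheme and the choice of one document per use record are assumptions, not the project's actual schema.

```python
import json


def build_index_docs(item, uses, reading_lists):
    """Return one document per (item, use record), ready for indexing.

    item:          dict with hypothetical keys "isbn" and "title"
    uses:          list of dicts with the use information named on the slide:
                   institution, course, level, year, count
    reading_lists: list of reading list names the item appears on
    """
    docs = []
    for use in uses:
        docs.append({
            # Hypothetical id scheme: unique per item and use record.
            "id": "{isbn}-{institution}-{course}-{level}-{year}".format(
                isbn=item["isbn"], **use),
            "isbn": item["isbn"],
            "title": item["title"],
            "institution": use["institution"],
            "course": use["course"],
            "level": use["level"],
            "year": use["year"],
            "use_count": use["count"],
            "reading_lists": list(reading_lists),
        })
    return docs


def to_update_payload(docs):
    """Serialise the documents as a JSON array of documents."""
    return json.dumps(docs, ensure_ascii=False)
```

Keeping institution, course, level and year as plain fields on each document is what makes them available as facets at query time.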
7. Narrowing and broadening Thoughts (NB, ‘thoughts’) about narrowing of choice led to two features that broaden choice. We don’t believe that the Mosaic demo in itself narrows choice when used for browsing. Broadening features: a ‘More like this’ link; reading lists.
8. The Harry Potter ‘problem’ and scale The Harry Potter ‘problem’: balderdash! We can control this using Library of Congress subject categories and Dewey Decimal shelfmarks. Paul Miller raises questions of scale. Dave Pattern has shown successful use of use data at a single (small) institution. We want to leverage reasonably large scale: 3.5-4M students in HE, over, say, the last five years.
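One way the Dewey Decimal control mentioned on this slide might work is to keep only suggestions that share a leading Dewey class with the item being viewed. The sketch below is an assumption about the mechanism, not the Mosaic implementation; the `dewey` key and the function name are hypothetical.

```python
def filter_by_dewey(seed_shelfmark, candidates, digits=1):
    """Keep candidates whose Dewey shelfmark starts with the same digits as the seed.

    digits=1 compares the main class (e.g. 0xx vs 8xx); a larger value
    narrows the comparison to divisions or sections.
    """
    seed_class = seed_shelfmark.strip()[:digits]
    return [c for c in candidates
            if c["dewey"].strip()[:digits] == seed_class]


# Hypothetical co-borrowed items for a programming text shelved at 005.133.
suggestions = filter_by_dewey(
    "005.133",
    [
        {"title": "An algorithms text", "dewey": "005.1"},
        {"title": "A very popular novel", "dewey": "823.914"},
    ],
)
```

Here the widely borrowed novel in the 800s is dropped because it does not share the seed item's main class, so general popularity alone does not push it into the suggestions.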
9. User context and attention It has been relatively simple to parameterise an open source search engine with user context: institution, course, progression level, academic year. This is only part of the user context; we can add: location; attention data, e.g., search history; further social search information.
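Parameterising a search with user context can be sketched as building Solr request parameters from a context record. `q`, `rows`, `fq`, `facet` and `facet.field` are standard Solr query parameters; the field names and the function names are assumptions, and the sketch only builds the query string rather than contacting a server.

```python
from urllib.parse import urlencode

# Context dimensions named on the slide; the Solr field names are assumed.
CONTEXT_FIELDS = ("institution", "course", "level", "year")


def quote_value(value) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def build_search_params(query, context, rows=10):
    """Return Solr parameters as a list of (name, value) pairs.

    Each context dimension is requested as a facet; dimensions present in
    `context` are also applied as filter queries.
    """
    params = [("q", query), ("rows", str(rows)), ("facet", "true")]
    for field in CONTEXT_FIELDS:
        params.append(("facet.field", field))
        value = context.get(field)
        if value is not None:
            params.append(("fq", f"{field}:{quote_value(value)}"))
    return params


def build_query_string(query, context, rows=10):
    """URL-encode the parameters for a select request."""
    return urlencode(build_search_params(query, context, rows))
```

Adopting another persona or point of view then amounts to passing a different `context` dictionary, for example a different course or progression level, with the same query.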
10. Disclaimer The next slide is independent of any decisions on a pure data approach. Could be a pure data approach in there. Or maybe not.
11.
12. Mosaic search. Personalised/point-of-view search. Massively parallel search for blindingly fast response times. Data mining for library ‘stewardship’. We have prototypes for the first two, and we’re about to start experimenting with parallel search using Hadoop+Lucene.
13. Building institutional contributions Propose union-cat-local: search in the local library; Mosaic-like search utilises local loan data if it is available. Two ways to encourage library contribution of loan data (thoughts in progress). Narrow: libraries which contribute loan data to the pool get Mosaic search over the pool. Broad: offer the contextual/PoV search everywhere; users will agitate if they don’t see local data.
14. This is a Just Do It moment A national union catalogue with contextual search and local library interfaces. Relatively cheap to do. Potentially massive gains for learners, teachers and researchers. Portends the development of shared services across the library domain and large cost savings. Doesn’t preclude, and is agnostic on, an open data approach. Could incorporate a pure data service approach and/or a centralised service.