Hello everyone. My name is Antoine Isaac. I work for the Free University in Amsterdam, on the EuropeanaConnect project. Today, I'm going to talk about plans to use formalized data to enhance search processes in Europeana.
Europeana is rooted in the European Commission's initiative on the information society. It aims at better connecting citizens to their cultural heritage, as curated by libraries, museums, archives, and so on. It is about giving access to digitized content and the rich knowledge that is associated with it, as produced by these institutions. It is a large, pan-European effort: dozens of institutions are now involved. From 2 million objects in the first beta release, Europeana is expected to give access to 10 million objects next year.
What does it look like? So if you go to europeana.eu, you will currently find yourself on that page.
There is a search box where you can type the query you are interested in. For instance, Da Vinci.
For that query, you get a bunch of documents that match it. This is a pretty simple, text-like search, based on the metadata associated with the objects collected by Europeana. What you get is a raw set of documents.
But you can refine this result, first by using some facets like country, date, etc.
For example, we can select country.
We have a list of countries that appear in the metadata for the current results.
We can, for example, click on UK, and we get a smaller set of results.
These results are actually stuff coming from UK institutions, not, for example, stuff *about* the UK.
It is possible to use advanced search to work around this issue to some extent. We can ask, for instance, for works by Da Vinci about the UK. And naturally we'll find no results.
All this is great for a start. But many people in the project share the feeling that there can be more to it, especially with respect to advanced search and to helping users orient themselves in such a big information space. One of the main options foreseen is the use of more semantics in the search process. The idea is to enhance access to Europeana content using query expansion mechanisms or clustering of results: things that have already been tested in a number of projects. Ideally, a better search would be able to exploit different types of relations between the entities appearing in the information space: distinguishing different links such as "located in", "is born in" or "created" can be very useful for search. Some inference can also be beneficial here: for instance, if one queries for UK, it could be handy to find items that are only related to London in their description. The point is that we know there are already quite rich descriptions available in the metadata. What is required is to make that information properly machine-accessible, and to design the tools to exploit it.
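To make the UK and London example a bit more concrete, here is a minimal sketch of that kind of inference. It is only an illustration: the place hierarchy, the item identifiers and the function names are all made up for the example and do not come from Europeana.

```python
# Hypothetical toy data: neither the hierarchy nor the items come from Europeana.
LOCATED_IN = {"London": "England", "England": "UK", "Paris": "France"}

ITEMS = {
    "item1": {"subject_place": "London"},
    "item2": {"subject_place": "UK"},
    "item3": {"subject_place": "Paris"},
}


def is_within(place, region):
    """Follow "located in" links upward from place until region is reached."""
    seen = set()
    while place is not None and place not in seen:
        if place == region:
            return True
        seen.add(place)  # guard against accidental cycles in the data
        place = LOCATED_IN.get(place)
    return False


def search_about(region):
    """Return items whose subject place is the region or lies inside it."""
    return sorted(
        item_id
        for item_id, meta in ITEMS.items()
        if is_within(meta["subject_place"], region)
    )


print(search_about("UK"))  # the London item is found alongside the UK item
```

A plain string match on "UK" would only return the second item; following the "located in" links is what brings in the London one.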
And that's where formal, linked data can come into play:
By allowing us to build and exploit a kind of semantic layer on top of the items collected by Europeana. That semantic layer (a concept introduced in the context of the Europeana v1.0 project) would serve as an interface between the users' query needs and the item descriptions.
This has actually already been investigated in the context of the Europeana Thought Lab prototype, which I'm going to demonstrate now. The Thought Lab has been developed at the Free University and the CWI in Amsterdam. It is a kind of mini-Europeana, as it works on just three collections. But it relies on formalized data that has been semantically aligned, which allows us to experiment with some new features. The Thought Lab starts just like the normal portal: we have a search box, in which we can enter textual queries.
The first difference is the autocompletion that is activated while typing in the search box. The tool returns the elements that are known in the information space and match the query. For example, if I type Egypt, I get a number of artefact names, location names and other concepts.
If I select the location Egypte, I get a number of results, but they are clustered: the first two are items that show Egypte.
The reason for which they are here can be seen in their metadata:
They have a matching subject. But it is important to notice that this is a true matching of URIs, not just simple string matching. In fact, Egypte (with an e) is the label of the concept of Egypt, which comes from a controlled vocabulary of the Rijksmuseum in Amsterdam. We can see above that when we selected Egypt in the first query step, we selected a resource with a URI. This actually explains the second cluster, about Egypt without an e: here are the works that are described using a concept for Egypt from another vocabulary, which we know to be equivalent to the Rijksmuseum one that is in our query.
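As a rough sketch of what matching on URIs plus equivalence links means, and not of how the Thought Lab actually implements it, consider the following. The URIs, the item identifiers and the function names are hypothetical and use example.org on purpose.

```python
# Hypothetical URIs and items, invented for illustration only.
EGYPTE_A = "http://example.org/vocab-a/egypte"  # stands in for the first vocabulary
EGYPT_B = "http://example.org/vocab-b/egypt"    # stands in for the second vocabulary

# Equivalence links between concepts of the two vocabularies.
EQUIVALENT = [(EGYPTE_A, EGYPT_B)]

# Each item is described with the URI of its subject, not with a string.
ITEMS = {
    "item1": EGYPTE_A,
    "item2": EGYPT_B,
    "item3": "http://example.org/vocab-a/egyptian-blue",  # similar text, other concept
}


def equivalents(uri):
    """Return uri plus every URI reachable through equivalence links."""
    found = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for a, b in EQUIVALENT:
            # Equivalence is symmetric, so each link is read in both directions.
            for source, target in ((a, b), (b, a)):
                if source == current and target not in found:
                    found.add(target)
                    frontier.append(target)
    return found


def cluster_by_subject(query_uri):
    """Build one cluster of items per concept equivalent to the query concept."""
    return {
        uri: sorted(item for item, subject in ITEMS.items() if subject == uri)
        for uri in sorted(equivalents(query_uri))
    }


print(cluster_by_subject(EGYPTE_A))
```

The third item is left out even though its URI contains the text "egypt", because the comparison is made on whole URIs and not on substrings.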
The third cluster seems a bit stranger: a more specific Egypt?
In fact these items, as can be seen from their metadata,
Have a subject which is a specific place in Egypt
And the system knows, from the description of that place in the Rijksmuseum vocabularies
That it is a more specific concept than Egypt. Hence its appearing in that cluster.
Other clusters show how different paths in the information space can be followed to lead to even less anticipated results. For example, works created by persons who died in Egypt.
Here are items that, as seen from their metadata,
were created by someone (here, a French photographer) who died in Egypt. It is interesting to note that here, if I click on the creator (Gustave Le Gray),
The information on the death place is not found in the original resource's description
Actually, this knowledge comes from the fact that the resource standing for Le Gray in the first vocabulary has been linked (we say "matched") to the same person as represented in another vocabulary
Where it is described
As having died in a place
For which we also have information
And which is in fact Cairo, which is more specific than Egypt. Quite a long path, but it brings some serendipity, which can be beneficial for addressing more complex user needs.
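The path just described (item, creator, matched person, death place, broader place) can be sketched as a walk over a small set of triples. This is only a toy under stated assumptions: the identifiers, the predicate names and the helper functions are hypothetical, and only the kinds of relations mirror what was shown in the demo.

```python
# Hypothetical triples: identifiers and predicate names are made up.
TRIPLES = [
    ("item:photo1", "creator", "vocabA:le-gray"),
    ("vocabA:le-gray", "match", "vocabB:le-gray"),
    ("vocabB:le-gray", "death_place", "place:cairo"),
    ("place:cairo", "broader", "place:egypt"),
    ("item:photo2", "creator", "vocabA:someone-else"),
]


def objects(subject, predicate):
    """Return all objects linked to subject through predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(start, path):
    """Return the nodes reached from start by following the predicates in order."""
    nodes = [start]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(node, predicate)]
    return nodes


# The chain of relations described in the talk, as a fixed sequence.
PATH = ["creator", "match", "death_place", "broader"]


def items_by_creators_who_died_in(place):
    """Return items whose creator, via a matched resource, died inside place."""
    items = sorted({s for s, p, _ in TRIPLES if p == "creator"})
    return [item for item in items if place in follow(item, PATH)]


print(items_by_creators_who_died_in("place:egypt"))
```

The second item drops out because its creator has no match in the other vocabulary, so the path stops there. A real system would presumably allow paths of varying length rather than one fixed sequence.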