Développer la clé de voûte du web de données culturelles et scientifiques / Developing a backbone for the web of cultural and scientific data.
Jürgen Kett, head of the Library Standards department and responsible for the GND, Deutsche Nationalbibliothek
Journées ABES 2018
2. The idea of linked data, or of the semantic web, is to combine data in a meaningful way beyond the boundaries of
systems and domains. The idea is not new. In fact, as librarians we have a long tradition of interlinking, sharing and
re-using data. Perhaps that is one of the reasons why libraries embraced the movement very quickly.
The difference from our tradition is that we were using our own standards, formats and protocols, which were hard to
understand. Now it seemed that with some minor transformations we could plug directly into the world. It was too
tempting. We had just been treated as dinosaurs, and suddenly we found ourselves setting the tone in future
technologies. Since then we have learned that it is not that easy. It takes much more time and effort to build a
living cultural semantic web. It is not enough to just change the cover; we need to revise the way we work
and the environments we use. One important step towards this goal is to develop standard ways of
interlinking cultural collections. The existing concept of authority data is a perfect way to do this.
3. Our goal is to make the GND, which is still mainly used by libraries, a cross-domain community project. This project
will address perhaps the most important task of our time: building bridges.
4. The bridges the GND provides are components of the Semantic Web: they connect data collections. The GND is the
central authority file of the German-speaking world. It is used most intensively in Germany, Austria and
Switzerland, but also in South Tyrol, Liechtenstein and Luxembourg.
5. It contains records for persons, corporate bodies, works, conferences, geographical names and subject headings.
6. In order to make the data available to a broad user base and to maximize its dissemination, the data is available
under a free license. Offering different formats is also important for distribution. The most used format is still
MARC21, but increasing use outside the library domain has made RDF and JSON-based formats more
important.
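To illustrate what offering several formats means in practice, the following Python sketch serializes one record both
as JSON and as RDF-like triples. The record, its field names, the example.org identifiers and the helper functions are
all assumptions made for illustration; they are not taken from the GND data model or from any DNB service.

```python
import json

# Hypothetical, simplified person record. Field names and identifiers are
# illustrative only and do not reflect the actual GND data model.
record = {
    "id": "https://example.org/authority/0000001",
    "type": "Person",
    "preferredName": "Doe, Jane",
    "variantNames": ["Doe, J."],
}


def to_json(rec):
    """Serialize the record as JSON text."""
    return json.dumps(rec, ensure_ascii=False, sort_keys=True)


def to_triples(rec, vocab="https://example.org/vocab#"):
    """Flatten the record into (subject, predicate, object) triples."""
    subject = rec["id"]
    triples = []
    for key, value in rec.items():
        if key == "id":
            continue  # the identifier becomes the subject, not a property
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, vocab + key, v))
    return triples


def to_ntriples(rec):
    """Render the triples as N-Triples-style lines with plain string literals."""
    lines = []
    for s, p, o in to_triples(rec):
        literal = json.dumps(o, ensure_ascii=False)  # adds quotes and escapes
        lines.append(f"<{s}> <{p}> {literal} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_json(record))
    print(to_ntriples(record))
```

The same content is carried in both outputs; only the serialization differs, which is why a consumer outside the
library domain can pick whichever form fits its tools.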
7. Currently the GND contains about sixteen million records. The majority of these records (three-quarters) describe
persons. As you can see in the diagram, a big portion of them are “Names of Persons”. These records are of lower
quality and are not precise. One challenge is to improve as much of this data as possible.
With the introduction of the FRBR-based RDA in the German-speaking world, the number of works in the GND
will grow very fast.
8. The main application of the GND database was initially the standardization of search entries and the re-use of
cataloging data. But with the success of the World Wide Web, its potential as a tool for semantically linking
publications in meaningful machine-readable ways has become increasingly important. Over the years our partners
integrated all their local authority data into the GND database. More and more collections and datasets referred to
it. Step by step the GND became a data network. You can roughly distinguish the three kinds of links that are listed
here.
9. The most important feature of the GND is its very co-operative character. The GND is carried by a lot of partners,
most of them national libraries and library centers. Hundreds of libraries and other institutions are connected via
these partners. We all jointly maintain the GND. We have jointly built it over the years, defined the rules for it, and
specified interfaces and business processes. We argued heavily about strategic decisions. And whenever old local
remnants (forgotten, so-called authority data) had to be integrated, we were regularly annoyed about each other's
data garbage. In short: we have learned to manage an authority file together.
10. Over the last couple of years our user community has grown. That is a great opportunity, but our organisation
and systems do not scale well enough. The biggest challenge is the integration of new communities (e.g. museums,
archives, historic preservation, portals).
11. But it is also clear that the complexity of the system is increasing due to the different perspectives.
12. In order to meet these challenges, we have set up an initiative. It covers the revision of the organization
(responsibilities, rights and obligations), the rules (cataloging rules, workflows and rights management), the system
environment (tools and interfaces for editing, analyzing and visualizing) and technical infrastructure. As a basis, we
set up a new organizational structure, agreed on common strategic goals and principles and defined a work
program.
13. Before the start of the initiative, the GND had no formal organizational structure. Although it is great in principle
when things work without the usual formalities, this situation had some disadvantages. The roles and
responsibilities (especially with regard to strategic issues) were not clear. It was obvious that a binding structure
was needed to handle the fundamental modernization of the GND.
14. The new organizational structure needed to be scalable: it must be able to grow as new partners from different
domains join. In 2017 we agreed on the new organizational model shown here. The GND Committee is
responsible for strategic decisions and oversees the operation of the GND. The GND office provides the common
core system environment (GND platform) and services - including quality control services. The office also oversees
the development of rules, formats and the GND platform. The so-called agencies are mainly responsible for the
management of user groups (participants). They represent the interests of their participants in the GND Committee.
Agencies coordinate change requests and their implementation, and provide support and training. They are also
responsible for the quality of all the data provided by their participants. This means that if problems arise with a
record, they are obliged to take care of it.
15. With the founding of the GND cooperative, we also agreed on common principles.
Some of the principles contradict each other, which creates tension. They already show what
the fundamental challenges of the coming years are. For example, in future the GND should be designed
consistently across domains and take into account the requirements of non-library institutions. On the other hand, we
commit ourselves to consistency (unambiguity) and a uniformly high data quality (binding rules, trusted quality).
16. The final step to get the development going was to set up a work program with six Action Fields.
17. Our goals are impossible to achieve without additional resources and political support. Fortunately, our vision has
wide support in the whole scientific and cultural community.
It is a shared vision. The Deutsche Forschungsgemeinschaft (DFG) gives us the opportunity to obtain substantial
funds for essential development-intensive parts of the program.
18. Numerous cultural institutions contribute to this from their own resources. We have also been able to build strategic
partnerships with other communities.
19. These are the ongoing and scheduled projects of the work program. The project “GND4C”, which is to lay
important foundations for opening up the GND, forms something like the centerpiece. There are also numerous
activities outside of projects.
21. The basic idea of Action Field 1 is that new partners join together to form interest groups. In a second step, these
stakeholders should either be assigned to an existing agency or create a new agency of their own. Of course, this is
a gradual process and there is a lot to learn about each other. In order to enable a constant dialogue,
representatives of the interest groups are involved in the committees of the GND from the beginning. It is clear that
the policy of cooperation needs to be further developed in view of the needs of these new partners. Important
preparatory work has already been done in setting up the German Digital Library (DDB), a kind of national sibling
of Europeana. The structures created there should be reused and strengthened.
22. Action Field 2 addresses the already mentioned contradiction between unambiguity and uniformity on the one hand
and community-specific demands on the other. The rules and the data model should better meet the
needs of the new partners, yet they should also promote cross-domain collaboration. Through years of
collaboration with special interest groups, we have learned that it is useless to force everyone into a single model.
In practice, existing data fields then get reinterpreted. Unfortunately, this leads to incompatibility, misunderstandings
and, as a rule, to the creation of duplicates and to inefficient and frustrating “edit battles”. Therefore, we plan to
introduce modular extensions for specific stakeholders (GND-PLUS). The rules and fields within these extensions
are set by the stakeholders. The fields are protected. They are not obligatory for the other groups, but the
information can be used by everyone. Many library-specific changes will be made in a GND-PLUS space.
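One way to picture the GND-PLUS idea is a core record with namespaced extension fields that only the owning
stakeholder group may write, while everyone may read them. The sketch below is a minimal illustration under that
assumption; the class names, namespaces and group names are hypothetical and do not describe an existing GND
implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A core record plus optional extension fields grouped by namespace."""
    core: dict
    extensions: dict = field(default_factory=dict)  # namespace -> {key: value}


class ExtensionRegistry:
    """Tracks which stakeholder group owns which extension namespace."""

    def __init__(self):
        self._owners = {}  # namespace -> owning group

    def register(self, namespace, owner):
        self._owners[namespace] = owner

    def write(self, record, namespace, group, key, value):
        """Only the owning group may write into its own namespace."""
        owner = self._owners.get(namespace)
        if owner is None:
            raise KeyError(f"unknown extension namespace: {namespace}")
        if owner != group:
            raise PermissionError(f"{group} may not edit fields owned by {owner}")
        record.extensions.setdefault(namespace, {})[key] = value

    def read(self, record, namespace, key, default=None):
        """Reading is open to every group; missing fields yield the default."""
        return record.extensions.get(namespace, {}).get(key, default)


if __name__ == "__main__":
    registry = ExtensionRegistry()
    registry.register("museum", owner="museums")  # hypothetical namespace
    rec = Record(core={"preferredName": "Doe, Jane"})
    registry.write(rec, "museum", "museums", "objectRole", "painter")
    print(registry.read(rec, "museum", "objectRole"))
```

Because the extension fields are optional and kept apart from the core, a group that ignores a namespace still sees a
valid record, and no group can reinterpret another group's fields.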
23. Action Field 3, “Import and data mining”, is about creating tools that allow efficient data analysis and data integration.
New partners come with previously unconnected datasets and collections. We need tools and workflows to
support the integration, better tools for quality control and better support for adding internal and external links.
24. We also need to improve access to the GND network. Currently the underlying network of interconnected
collections and datasets is not really visible to users. Our goal is to provide a central entry point to that network.
This entry point will not be a giant integrated portal, but a signpost to connected datasets, collections and services.
It will also provide means to explore the GND.
25. Action Field 5 is necessary because the data exchange infrastructure is unstable and will not survive further growth.
Currently many partners mirror the complete database locally. The number of updates that can be processed depends
on the processing speed of the slowest connected system. We have to pour a swimming pool full of changes cup by cup
through a tiny tube to prevent overflows. The main difficulty with this work package is that it also has to deal with
the local systems.
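The bottleneck can be made concrete with a small calculation. The sketch below assumes a lock-step model in which a
change counts as delivered only once every mirror has applied it, so the overall rate equals the rate of the slowest
mirror. The function names and all numbers are hypothetical.

```python
def lockstep_throughput(mirror_rates):
    """Overall updates per second when every mirror must keep up.

    mirror_rates: updates per second each mirroring system can apply.
    Under the lock-step assumption the slowest mirror sets the pace.
    """
    if not mirror_rates:
        raise ValueError("at least one mirror is required")
    return min(mirror_rates)


def time_to_drain(pending_changes, mirror_rates):
    """Seconds needed to push a backlog of changes to all mirrors."""
    return pending_changes / lockstep_throughput(mirror_rates)


if __name__ == "__main__":
    rates = [500, 200, 5]  # hypothetical mirrors, updates per second
    print(lockstep_throughput(rates))
    print(time_to_drain(1_000_000, rates))
```

In this model, adding faster mirrors never helps; only speeding up the slowest system or decoupling it from the others
changes the result.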
26. Last but not least, Action Field 6, “Collaboration”, is about reaching out to new user groups and usage scenarios.
We want other user groups to use the GND as a tool in their environment. In particular, authors and publishers should
discover the GND for themselves and immediately claim their publications with a personal unique identifier. We are
still in the early stages of this and are currently starting a project together with important representatives from the
publishing industry. In order to reach scientists and universities, we started a cooperation project around ORCID
with various project partners (ORCID DE).
27. We also want to make the GND more attractive to software developers and research projects. Therefore, we plan
to develop a lightweight API for the GND, as well as build a registry for existing projects and tools.
28. Even more important is the cooperation with the Wikimedia projects "Wikipedia" and "Wikibase". There has been a
successful cooperation with Wikipedia for about ten years. Wikipedians connect articles to the GND and make
suggestions for correction. Our goal is to bring the Wikidata community even closer to the GND and vice versa. We
also plan to evaluate the re-use of Wikibase, the software running on Wikidata.
29. In the end, it's about skillfully complementing each other's strengths. We hope that through our initiative we will
make a difference for other, similar projects.
30. If you look at the parallels between our two endeavors, the idea of a European authority data system comes up. If
our data hubs are built on similarly flexible concepts and on the same software base, then in a second or
third step it will not be difficult for us to interlink these hubs. We should just try it.