Wikidata, a target for Europeana’s semantic strategy
Valentine Charles, Hugo Manguinhas, Antoine Isaac: Europeana
Vladimir Alexiev: Ontotext Corp
GLAM Wiki 2015, Den Haag
Europeana.eu, Europe’s cultural heritage portal
40M objects from 2,200 galleries, museums, archives and libraries
Europeana has many data challenges: diversity
 Aggregates metadata from the cultural heritage sector in Europe
• Large number of references to places, agents, concepts, time
 Metadata in more than 30 languages
 From all EU countries
Europeana’s priority 1: Improve data quality
 Europeana Data Model (EDM), a framework for richer data
• Re-uses several existing Semantic Web-based models
Dublin Core, OAI-ORE, SKOS, CIDOC-CRM…
• EDM gives support for contextual resources (semantic layer)
 Rely on vocabularies to solve a problem of data interlinking
• Encourage data providers to contribute their own vocabularies
and benefit from data links made at data providers’ level
Vocabularies currently provided to Europeana
Europeana also manages its own vocabularies
Europeana performs automatic enrichment
based on vocabularies
Goal: Contextualization which reaches
outside the scope of a particular platform
Automatic enrichment process in Europeana
 Analysis
• Selection of metadata fields in resource descriptions
• Selection of potential rules to match
 Linking
• Matching the values of the metadata fields to values of the contextual resources
• Adding contextual links
 Augmentation
• Selecting the values from the contextual resource
• Augmentation of the search index with the labels from the vocabulary
Enrichment Types and Current Vocabularies
Enrichment type   Target vocabulary   Source metadata fields
Places            GeoNames            dcterms:spatial, dc:coverage
Concepts          GEMET, DBpedia      dc:subject, dc:type
Agents            DBpedia             dc:creator, dc:contributor
Time              Semium Time         dc:date, dc:coverage, dcterms:temporal, edm:year
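The three enrichment steps (analysis, linking, augmentation) and the field table can be sketched in Python as follows. This is only a toy illustration: the vocabulary, its example.org URI, the function names and the exact-match rule are assumptions, not Europeana's actual implementation.

```python
# Source fields per enrichment type, copied from the table above.
SOURCE_FIELDS = {
    "Places": ["dcterms:spatial", "dc:coverage"],
    "Concepts": ["dc:subject", "dc:type"],
    "Agents": ["dc:creator", "dc:contributor"],
    "Time": ["dc:date", "dc:coverage", "dcterms:temporal", "edm:year"],
}

# Hypothetical in-memory vocabulary: contextual resource URI -> labels by language.
TOY_VOCABULARY = {
    "http://example.org/agent/cranach": {
        "en": "Lucas Cranach the Elder",
        "de": "Lucas Cranach der Ältere",
    },
}


def analyse(record, enrichment_type):
    """Analysis: pick the (field, value) pairs that are eligible for matching."""
    return [
        (field, value)
        for field in SOURCE_FIELDS[enrichment_type]
        for value in record.get(field, [])
    ]


def link(candidates, vocabulary):
    """Linking: match values to vocabulary labels (assumed rule: exact, case-insensitive)."""
    index = {
        label.casefold(): uri
        for uri, labels in vocabulary.items()
        for label in labels.values()
    }
    return [
        (field, index[value.casefold()])
        for field, value in candidates
        if value.casefold() in index
    ]


def augment(links, vocabulary):
    """Augmentation: gather every label of the linked resources for the search index."""
    labels = set()
    for _, uri in links:
        labels.update(vocabulary[uri].values())
    return sorted(labels)


record = {"dc:creator": ["lucas cranach the elder"], "dc:subject": ["portrait"]}
links = link(analyse(record, "Agents"), TOY_VOCABULARY)
search_terms = augment(links, TOY_VOCABULARY)
```

Once the record is linked, a search for the German label can find it even though the original metadata only carried the English name.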
Europeana enrichment - an example
How does Wikidata fit into Europeana’s semantic strategy?
Wikipedia's Relevance for Cultural Heritage
 Authority Lists and Thesauri have central importance in CH
 Wikipedia being "the sum of all knowledge" has broader reach than
any institutional authority list
 Only large-scale aggregations like VIAF (35 institutions) and LCSH
(about 10 libraries around LoC) are comparable
 While some facts are inaccurate and disputable, Wikipedia has a
great role as a source of stable URLs on all kinds of topics
How Big is Wikidata?
 Name data sources for semantic enrichment (Europeana Creative
D2.4) gives DBpedia and Wikidata stats
 Wikidata: 3y old, 14M items, 209M edits
 2.7M humans, 5k families, 22k literary characters
 215k organizations
 66k creative orgs (bands, radio/TV stations, newspapers…)
 30k educational institutions
 20k non-profit orgs
  13k GLAM orgs: 0.5k galleries, 1k libraries, 0.2k archives, 9k museums
 500k creative works
  110k heritage sites and monuments
 40k family names, 20k first names
Is this big enough?
 Wikidata: 2.7M humans, 215k organizations, 800k places, 500k works
 VIAF: 35M personal names, 5.4M orgs/conferences, 410k places,
1.7M works
 GeoNames: 9M places
 Only 1.1M persons are coreferenced, see
Authority Addicts: The New Frontier of Authority Control on Wikidata
 VIAF is much bigger, but Wikidata is still very important for GLAM:
 Wikidata is active in Authority Control and Coreferencing
 (VIAF) Moving to Wikidata: will get 1M persons/orgs, and many
multilingual names (see next)
 Authority Files have barely more than names & dates; Wikipedia
often has a lot more info
Wikidata Multilingual Coverage
 Wikidata/DBpedia has huge multilingual coverage
 Each entity is represented in 2.11 Wikipedias on average (see
Europeana food and drink classification scheme, EFD D2.2)
 But popular entities are present in many more (up to 180); and
even in one Wikipedia there are many languages
 E.g. Lucas Cranach in Wikidata: 57 lang tags, representing 44
languages and 13 language variants
 Languages are consistently marked
 Important for semantic enrichment (Named Entity Recognition)
  Even though language labels in Europeana are not consistent ☹
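The distinction between languages and language variants can be made concrete by grouping language tags on their primary subtag. The sketch below uses a handful of made-up tags, not the actual labels of the Cranach item, and the function name is hypothetical.

```python
from collections import defaultdict


def group_language_tags(tags):
    """Group language tags by primary subtag, e.g. 'de-at' and 'de-ch' under 'de'."""
    groups = defaultdict(list)
    for tag in tags:
        primary = tag.split("-")[0].lower()
        groups[primary].append(tag)
    return dict(groups)


# Illustrative tags only.
sample_tags = ["de", "de-at", "de-ch", "en", "en-gb", "fr", "zh", "zh-hans"]

groups = group_language_tags(sample_tags)
base_languages = len(groups)                               # distinct primary subtags
variants = sum(1 for tag in sample_tags if "-" in tag)     # tags carrying a variant subtag
```

With consistently marked tags, an enrichment tool can choose to match on a specific variant or fall back to the base language.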
Name Variants for Lucas Cranach
 Wikidata and VIAF each have 70 variants and
dominate the "Wikipedia tradition" and
"Library tradition" datasets respectively (see
Name data sources for semantic enrichment)
 Only 5 variants are in common (see
Interactive Venn diagram)
 Excellent complementarity. VIAF has more
variants, Wikidata more multilingual names
 VIAF's move to sync to Wikidata will narrow
the gap
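The overlap between two sets of name variants can be computed with plain set operations after a light normalisation. The sketch below is an assumption about how such a comparison might be done, not the method behind the Venn diagram, and the sample names are invented and far fewer than the roughly 70 variants per source.

```python
import unicodedata


def normalise(name):
    """Fold case, strip diacritics and collapse whitespace before comparing names."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def compare_variants(names_a, names_b):
    """Return the shared and source-specific variants of two name lists."""
    a = {normalise(n) for n in names_a}
    b = {normalise(n) for n in names_b}
    return {"common": a & b, "only_a": a - b, "only_b": b - a}


# Invented sample lists, for illustration only.
wikidata_names = ["Lucas Cranach der Ältere", "Lucas Cranach the Elder", "Лукас Кранах Старший"]
viaf_names = ["Cranach, Lucas, 1472-1553", "Lucas Cranach the elder", "Cranach, Lukas"]

result = compare_variants(wikidata_names, viaf_names)
```

A small intersection with large source-specific remainders is exactly the complementarity the slide describes.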
Wikidata is connected to other vocabularies
 Europeana prefers using pivot vocabularies
• that are connected to many other vocabularies
• It is key to avoid duplication and redundancy
 Wikidata has a lot of coreferences to other vocabularies that
can be used to create extra links and extract missing data
• https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control
• https://twitter.com/hashtag/coreferencing: shots and news
• Please tweet!
VIAF-Wikidata Coreferences for Lucas Cranach
 Can be leveraged to fill the gaps, e.g. bring RKDartists into VIAF
VIAF      id in VIAF                      Wikidata          id in Wikidata
viafID    49268177                        VIAF              49268177
BAV       ADV10197613
BNC       .a10853637
BNE       XX907273
BNF       cb12176451h                     BNF               12176451h
DNB       118522582                       GND               118522582
ISNI      0000000121319721                ISNI              0000 0001 2131 9721
JPG       500115364                       ULAN              500115364
LC        n50020861                       LCCN              n50020861
LNB       LNC10-000002573
NDL       00436834
NKC       jn20000700335
NLA       000035031951
NLI       000035532,001445575,001448179
NLP       a16828161
NTA       068435312                       NTA PPN           068435312
NUKAT     vtls000190728
SELIBR    182422
SUDOC     028710010
WKP       Lucas_Cranach_the_Elder         Many Wikipedias
IMAGINE   T7238,T267474                   Cantic            a10853637
                                          Commons Creator   Lucas Cranach (I)
                                          Commons category  Lucas Cranach d. Ä.
                                          Freebase          /m/0kqp0
                                          RKDartists        18978
                                          SIMBAD            CRANACH, Lucas the Elder
                                          Your Paintings    lucas-the-elder-cranach
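Finding the gaps between the two sides amounts to comparing identifiers after normalising their notation. The sketch below uses a subset of the rows above; the code-to-label mapping and the normalisation rule (drop spaces, drop a leading "cb") are assumptions made for this example.

```python
# Assumed mapping from VIAF source codes to Wikidata identifier labels,
# read off the rows where both columns are filled.
VIAF_TO_WIKIDATA = {
    "BNF": "BNF", "DNB": "GND", "ISNI": "ISNI",
    "JPG": "ULAN", "LC": "LCCN", "NTA": "NTA PPN",
}

# Subset of the table above.
viaf_cluster = {
    "BNF": "cb12176451h", "DNB": "118522582", "ISNI": "0000000121319721",
    "JPG": "500115364", "LC": "n50020861", "NTA": "068435312",
    "SUDOC": "028710010",
}
wikidata_item = {
    "BNF": "12176451h", "GND": "118522582", "ISNI": "0000 0001 2131 9721",
    "ULAN": "500115364", "LCCN": "n50020861", "NTA PPN": "068435312",
    "RKDartists": "18978",
}


def canonical(value):
    """Assumed normalisation: remove spaces and a leading 'cb' (as in the BNF row)."""
    compact = value.replace(" ", "")
    return compact[2:] if compact.startswith("cb") else compact


def compare(viaf, wikidata):
    """Split identifiers into agreeing, conflicting and one-sided groups."""
    reverse = {label: code for code, label in VIAF_TO_WIKIDATA.items()}
    agree, conflict = [], []
    for code, label in VIAF_TO_WIKIDATA.items():
        if code in viaf and label in wikidata:
            same = canonical(viaf[code]) == canonical(wikidata[label])
            (agree if same else conflict).append(code)
    viaf_only = [c for c in viaf if VIAF_TO_WIKIDATA.get(c) not in wikidata]
    wikidata_only = [l for l in wikidata if reverse.get(l) not in viaf]
    return agree, conflict, viaf_only, wikidata_only


agree, conflict, viaf_only, wikidata_only = compare(viaf_cluster, wikidata_item)
```

The one-sided groups are the candidates for filling gaps, such as carrying the RKDartists id over to VIAF.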
Wikidata Coreferencing (1)
 Excellent Mix-n-Match tool by Magnus Manske. 54 catalogs loaded!!
 Decent auto-matching and excellent crowd-sourcing features
Wikidata Coreferencing (2)
 Excellent Authority Control navbox in Wikipedia
 E.g. matching British Museum person-institution thesaurus
(currently not coreferenced to anything: high value to BM)
Europeana Food and Drink
 How do you define such a wide area as Food and Drink,
which is so pervasive in everyday life and culture?
 Europeana food and drink classification scheme (EFD D2.2,
or presentation) studies ~20 datasets for relevance to FD
 Concludes that Wikipedia is our playing ground, and we
should try to use Wikipedia Categories to delineate the topic
• AGROVOC has 32k concepts but on production/science
• Wikipedia/DBpedia has 6.6k proper Foods (with infoboxes and
ingredients)
• But I estimate 0.6-1.2M things relevant to FD in all Wikipedias
 Background image: 2 levels of Food_and_drink cat hierarchy
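Delineating a topic through categories comes down to a depth-limited walk of the category graph. The sketch below runs on a tiny invented graph (the real hierarchy is far larger); the category names below the root and the function name are hypothetical.

```python
from collections import deque

# Hypothetical fragment of a category graph: parent -> direct subcategories.
SUBCATEGORIES = {
    "Food_and_drink": ["Foods", "Drinks"],
    "Foods": ["Cheese", "Bread"],
    "Drinks": ["Wine"],
    "Wine": ["Wine_regions"],
}


def descendants(root, max_depth):
    """Breadth-first walk returning {category: depth}, at most max_depth levels below root."""
    seen = {root: 0}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        if seen[category] == max_depth:
            continue  # do not expand beyond the depth limit
        for child in SUBCATEGORIES.get(category, []):
            if child not in seen:  # also guards against cycles in the graph
                seen[child] = seen[category] + 1
                queue.append(child)
    return seen


two_levels = descendants("Food_and_drink", 2)
```

The depth limit matters because category hierarchies drift off topic quickly as one descends.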
Wikidata is Easily Accessible
 It is important for Europeana that the data is:
• Technically available:
• Data dump preferably as Linked Data (RDF)
• SPARQL end-point or other query mechanism (e.g. WDQ)
• Properly documented and structured
• Wikidata has an excellent Property Proposal process
• Wikidata integrity constraints are excellent
• In contrast, there is no Class creation process, so the classes are quite a
mess (16k, of which 2/3 have fewer than 5 instances)
• Data templates should be made more visible and be used as
references
• Open access
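As an illustration of the query mechanism, the sketch below builds a SPARQL request that looks up an item by its VIAF id (Wikidata property P214). It assumes the Wikidata Query Service endpoint, which went live after this talk, and it only constructs the URL without sending it; the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the present-day Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def item_by_viaf_query(viaf_id):
    """Return SPARQL selecting the item(s) whose VIAF ID (P214) equals viaf_id."""
    if not viaf_id.isdigit():
        raise ValueError("VIAF ids are expected to be numeric")
    return 'SELECT ?item WHERE { ?item wdt:P214 "%s" . }' % viaf_id


def request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = request_url(item_by_viaf_query("49268177"))
```

The same lookup pattern works for any external identifier property, which is what makes Wikidata usable as a pivot vocabulary.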
Wikidata Property Integrity Constraints
 E.g. ULAN id constraints help to find records to merge / split
 E.g. Communist Party of the Russian Federation has 5 LCNAF ids,
what's up? Is it so popular with the Library of Congress?
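The two checks behind such findings can be sketched as follows: an item carrying several ids for one property (a "single value" violation) and an id shared by several items (a "unique value" violation). The item and id values are placeholders, and the function is an illustration rather than Wikidata's own constraint checker.

```python
from collections import defaultdict


def check_identifier_constraints(statements):
    """statements: (item, identifier) pairs for a single identifier property.

    Returns two dicts: items with more than one identifier, and identifiers
    used by more than one item.
    """
    ids_per_item = defaultdict(set)
    items_per_id = defaultdict(set)
    for item, ident in statements:
        ids_per_item[item].add(ident)
        items_per_id[ident].add(item)
    multi_valued = {i: sorted(v) for i, v in ids_per_item.items() if len(v) > 1}
    shared = {k: sorted(v) for k, v in items_per_id.items() if len(v) > 1}
    return multi_valued, shared


# Placeholder data, not real Wikidata statements.
statements = [("Q_A", "id1"), ("Q_A", "id2"), ("Q_B", "id3"), ("Q_C", "id3")]
multi_valued, shared = check_identifier_constraints(statements)
```

An item with several ids hints at duplicate authority records or a conflated item to split, while a shared id hints at duplicate items to merge.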
How Wikidata will be used by Europeana
 Semantic Enrichment of Europeana data with additional
information
• With a specific focus on entities such as persons and concepts
 Linking Europeana objects with Wikidata
• Approach similar to
https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings
• But would be extended to the whole Europeana dataset
• Links would be added in the Europeana data
 Structure (data template) for CH objects (e.g. paintings) still
not very rich on Wikidata, e.g. Measurements not there
• Improvements are made all the time, but see next
Wikidata Items as Linking Hubs
 Still, they're great as stable URLs
 Providing the basic info (who, when, where, what)
 And acting as coreferencing hubs
 I don't expect Wikidata CH objects to ever be described in the full
richness & complexity of professional art research. E.g. see
British Museum Mapping to CIDOC CRM
Wikidata and DBpedia
 Wikidata and DBpedia are the two structured representations of
Wikipedia
 Wikidata: initially populated from Wikipedia, manually curated, will
master structured data for Wikipedia. Synchronized through an
assortment of bots
 Data is fairly accurate but data depth is still small
 DBpedia: automatically extracted from Wikipedia, live update,
one-way extraction only.
 Data reach is deep, but there are many problems in ontology and
individual mappings, especially for non-English. E.g. United
Nations is extracted as "Country". See DBpedia Ontology and
Mapping Problems.
Should they be together?
GLAMs should add to Wikipedia or Wikidata!
 EFD project. Swiecenie Koszyczek, "blessing of the baskets", a
colorful Polish tradition
 There's no article in pl.wikipedia.org, so we can't relate such
artifacts to anything
 Content partner's museum staff have no time to make a proper
Wikipedia article
 But adding a Wikidata item is quick & easy
 Appropriate categories (Easter Traditions, Easter-related Foods) will
put it in context
Thank you
Valentine Charles, valentine.charles@europeana.eu
Vladimir Alexiev, vladimir.alexiev@ontotext.com
Hugo Manguinhas, hugo.manguinhas@europeana.eu

Mais conteúdo relacionado

Mais procurados

Europeana bergen may2010_dovwiner
Europeana bergen may2010_dovwinerEuropeana bergen may2010_dovwiner
Europeana bergen may2010_dovwiner
Dov Winer
 

Mais procurados (20)

A portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data caseA portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data case
 
AAC Education Session
AAC Education Session AAC Education Session
AAC Education Session
 
Semantic Web, Linked Data: the Europeana case(s)
Semantic Web, Linked Data: the Europeana case(s)Semantic Web, Linked Data: the Europeana case(s)
Semantic Web, Linked Data: the Europeana case(s)
 
Europeana DSI - LT-Accelerate 14
Europeana DSI -  LT-Accelerate 14Europeana DSI -  LT-Accelerate 14
Europeana DSI - LT-Accelerate 14
 
Data modelling at Europeana and DM2E - SMW13
Data modelling at Europeana and DM2E - SMW13Data modelling at Europeana and DM2E - SMW13
Data modelling at Europeana and DM2E - SMW13
 
Multilingual challenges for accessing digitized culture online - Riga Summit 15
Multilingual challenges for accessing digitized culture online - Riga Summit 15Multilingual challenges for accessing digitized culture online - Riga Summit 15
Multilingual challenges for accessing digitized culture online - Riga Summit 15
 
Europeana @ NISO Bibliographic Roadmap Meeting
Europeana @ NISO Bibliographic Roadmap MeetingEuropeana @ NISO Bibliographic Roadmap Meeting
Europeana @ NISO Bibliographic Roadmap Meeting
 
Challenges for the Language Technology Industry
Challenges for the Language Technology IndustryChallenges for the Language Technology Industry
Challenges for the Language Technology Industry
 
EuropeanaTech update - Europeana AGM 2015
EuropeanaTech update - Europeana AGM 2015EuropeanaTech update - Europeana AGM 2015
EuropeanaTech update - Europeana AGM 2015
 
Enriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpediaEnriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpedia
 
Linked Data for EuropeanaCultural Heritage: the Europeana approach
Linked Data for EuropeanaCultural Heritage: the Europeana approachLinked Data for EuropeanaCultural Heritage: the Europeana approach
Linked Data for EuropeanaCultural Heritage: the Europeana approach
 
EIFL 2014 - Linked Open Data
EIFL 2014 - Linked Open DataEIFL 2014 - Linked Open Data
EIFL 2014 - Linked Open Data
 
Semantic Interoperability at Europeana - MultilingualDSIs2018
Semantic Interoperability at Europeana - MultilingualDSIs2018Semantic Interoperability at Europeana - MultilingualDSIs2018
Semantic Interoperability at Europeana - MultilingualDSIs2018
 
Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013
 
Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018
 
EDM - American Art Collaborative LOD Meeting
EDM - American Art Collaborative LOD MeetingEDM - American Art Collaborative LOD Meeting
EDM - American Art Collaborative LOD Meeting
 
Europeana and open data
Europeana and open dataEuropeana and open data
Europeana and open data
 
Europeana bergen may2010_dovwiner
Europeana bergen may2010_dovwinerEuropeana bergen may2010_dovwiner
Europeana bergen may2010_dovwiner
 
Achieving Interoperability between the CARARE Schema for Monuments and Sites ...
Achieving Interoperability between the CARARE Schema for Monuments and Sites ...Achieving Interoperability between the CARARE Schema for Monuments and Sites ...
Achieving Interoperability between the CARARE Schema for Monuments and Sites ...
 
Archaeology in Europeana’s publishing framework
Archaeology in Europeana’s publishing frameworkArchaeology in Europeana’s publishing framework
Archaeology in Europeana’s publishing framework
 

Destaque

Networked books and networked reading
Networked books and networked readingNetworked books and networked reading
Networked books and networked reading
Camille Hartsell
 
ALA2009_Andy Weissberg (Bowker)
ALA2009_Andy Weissberg (Bowker)ALA2009_Andy Weissberg (Bowker)
ALA2009_Andy Weissberg (Bowker)
bisg
 

Destaque (20)

NISO Annual Report of 2012 Activities
NISO Annual Report of 2012 ActivitiesNISO Annual Report of 2012 Activities
NISO Annual Report of 2012 Activities
 
Finding media illustrating events
Finding media illustrating eventsFinding media illustrating events
Finding media illustrating events
 
Implementing the Media Fragments URI Specification
Implementing the Media Fragments URI SpecificationImplementing the Media Fragments URI Specification
Implementing the Media Fragments URI Specification
 
Networked books and networked reading
Networked books and networked readingNetworked books and networked reading
Networked books and networked reading
 
Interlinking Multimedia: How to Apply Linked Data Principles to Multimedia F...
Interlinking Multimedia: How to Apply Linked Data Principles to Multimedia F...Interlinking Multimedia: How to Apply Linked Data Principles to Multimedia F...
Interlinking Multimedia: How to Apply Linked Data Principles to Multimedia F...
 
NISO's Standards Update & Annual Membership Meeting
NISO's Standards Update & Annual Membership MeetingNISO's Standards Update & Annual Membership Meeting
NISO's Standards Update & Annual Membership Meeting
 
ALA2009_Andy Weissberg (Bowker)
ALA2009_Andy Weissberg (Bowker)ALA2009_Andy Weissberg (Bowker)
ALA2009_Andy Weissberg (Bowker)
 
Europeana and RDF data validation
Europeana and RDF data validationEuropeana and RDF data validation
Europeana and RDF data validation
 
Progress Report on Government Linked Data Worldwide
Progress Report on Government Linked Data WorldwideProgress Report on Government Linked Data Worldwide
Progress Report on Government Linked Data Worldwide
 
Uncork Your Licenses!
Uncork Your Licenses!Uncork Your Licenses!
Uncork Your Licenses!
 
Uncork Your Licenses
Uncork Your LicensesUncork Your Licenses
Uncork Your Licenses
 
Expanding the Possible: What’s New and Upcoming in Standards and Technologies...
Expanding the Possible: What’s New and Upcoming in Standards and Technologies...Expanding the Possible: What’s New and Upcoming in Standards and Technologies...
Expanding the Possible: What’s New and Upcoming in Standards and Technologies...
 
Carpenter Library Assessment Conference Presentation
Carpenter Library Assessment Conference PresentationCarpenter Library Assessment Conference Presentation
Carpenter Library Assessment Conference Presentation
 
NISO's Altmetrics Initiative
NISO's Altmetrics InitiativeNISO's Altmetrics Initiative
NISO's Altmetrics Initiative
 
Intro D2D Paper ER&L Feb 2015
Intro D2D Paper ER&L Feb 2015Intro D2D Paper ER&L Feb 2015
Intro D2D Paper ER&L Feb 2015
 
Lagace Presentation on the NISO Open Access Metadata and Indicators Project a...
Lagace Presentation on the NISO Open Access Metadata and Indicators Project a...Lagace Presentation on the NISO Open Access Metadata and Indicators Project a...
Lagace Presentation on the NISO Open Access Metadata and Indicators Project a...
 
Carpenter Update on NISO Altmetrics Initiative at CNI Fall meeting in Washing...
Carpenter Update on NISO Altmetrics Initiative at CNI Fall meeting in Washing...Carpenter Update on NISO Altmetrics Initiative at CNI Fall meeting in Washing...
Carpenter Update on NISO Altmetrics Initiative at CNI Fall meeting in Washing...
 
Uncork Your Licenses!
Uncork Your Licenses! Uncork Your Licenses!
Uncork Your Licenses!
 
The Infrastructure for Alternative Metrics
The Infrastructure for Alternative MetricsThe Infrastructure for Alternative Metrics
The Infrastructure for Alternative Metrics
 
ER&L SUSHI ALI Feb 2015
ER&L SUSHI ALI Feb 2015ER&L SUSHI ALI Feb 2015
ER&L SUSHI ALI Feb 2015
 

Semelhante a Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015

Contributing to the global commons: Repositories and Wikimedia
Contributing to the global commons: Repositories and WikimediaContributing to the global commons: Repositories and Wikimedia
Contributing to the global commons: Repositories and Wikimedia
Nick Sheppard
 
Semantic Web and Cultural Heritage Collections
Semantic Web and Cultural Heritage CollectionsSemantic Web and Cultural Heritage Collections
Semantic Web and Cultural Heritage Collections
RyanRM
 

Semelhante a Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015 (20)

Links, languages and semantics: linked data approaches in The European Libra...
Links, languages and semantics: linked data approaches in The European Libra...Links, languages and semantics: linked data approaches in The European Libra...
Links, languages and semantics: linked data approaches in The European Libra...
 
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
 
When Semantics support Multilingual Access to Digital Cultural Heritage - the...
When Semantics support Multilingual Access to Digital Cultural Heritage - the...When Semantics support Multilingual Access to Digital Cultural Heritage - the...
When Semantics support Multilingual Access to Digital Cultural Heritage - the...
 
Contributing to the global commons: Repositories and Wikimedia
Contributing to the global commons: Repositories and WikimediaContributing to the global commons: Repositories and Wikimedia
Contributing to the global commons: Repositories and Wikimedia
 
Wikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsWikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization Systems
 
The Future of Libraries and Wikipedia
The Future of Libraries and WikipediaThe Future of Libraries and Wikipedia
The Future of Libraries and Wikipedia
 
Europeana as a Linked Data (Quality) case
Europeana as a Linked Data (Quality) caseEuropeana as a Linked Data (Quality) case
Europeana as a Linked Data (Quality) case
 
Alexandria winer20100623
Alexandria winer20100623Alexandria winer20100623
Alexandria winer20100623
 
Open Culture - How Wiki loves art and data - Packed
 Open Culture - How Wiki loves art and data - Packed Open Culture - How Wiki loves art and data - Packed
Open Culture - How Wiki loves art and data - Packed
 
Linked Open Data Publications through Wikidata & Persistent Identification...
Linked Open Data  Publications through  Wikidata &  Persistent Identification...Linked Open Data  Publications through  Wikidata &  Persistent Identification...
Linked Open Data Publications through Wikidata & Persistent Identification...
 
Linked Open Data Publications through Wikidata & Persistent Identification in...
Linked Open Data Publications through Wikidata & Persistent Identification in...Linked Open Data Publications through Wikidata & Persistent Identification in...
Linked Open Data Publications through Wikidata & Persistent Identification in...
 
Tim Hill
Tim HillTim Hill
Tim Hill
 
Eun lre brussels_winer20100616
Eun lre brussels_winer20100616Eun lre brussels_winer20100616
Eun lre brussels_winer20100616
 
Entity Management at Europeana - DCMI 2021
Entity Management at Europeana - DCMI 2021Entity Management at Europeana - DCMI 2021
Entity Management at Europeana - DCMI 2021
 
Semantic Web and Cultural Heritage Collections
Semantic Web and Cultural Heritage CollectionsSemantic Web and Cultural Heritage Collections
Semantic Web and Cultural Heritage Collections
 
Mapping the European(a) metadata landscape
Mapping the European(a) metadata landscapeMapping the European(a) metadata landscape
Mapping the European(a) metadata landscape
 
LOD/LAM Presentation
LOD/LAM PresentationLOD/LAM Presentation
LOD/LAM Presentation
 
Presentation on The European Library
Presentation on The European LibraryPresentation on The European Library
Presentation on The European Library
 
High and Lows of Library Linked Data
High and Lows of Library Linked DataHigh and Lows of Library Linked Data
High and Lows of Library Linked Data
 
OCLC Linked Data Progress
OCLC Linked Data ProgressOCLC Linked Data Progress
OCLC Linked Data Progress
 

Mais de Antoine Isaac

Mais de Antoine Isaac (16)

Addressing multilingual challenges at Europeana: An update - DCMI 2021
Addressing multilingual challenges at Europeana: An update - DCMI 2021Addressing multilingual challenges at Europeana: An update - DCMI 2021
Addressing multilingual challenges at Europeana: An update - DCMI 2021
 
Le Cadre de publication d'Europeana
Le Cadre de publication d'EuropeanaLe Cadre de publication d'Europeana
Le Cadre de publication d'Europeana
 
The Europeana Data Model Principles, community and innovation
The Europeana Data Model  Principles, community and innovationThe Europeana Data Model  Principles, community and innovation
The Europeana Data Model Principles, community and innovation
 
Metadata aggregation of IIIF Resources at Europeana: status and plans
Metadata aggregation of IIIF Resources at Europeana: status and plansMetadata aggregation of IIIF Resources at Europeana: status and plans
Metadata aggregation of IIIF Resources at Europeana: status and plans
 
IIIF and the Europeana mission
IIIF and the Europeana missionIIIF and the Europeana mission
IIIF and the Europeana mission
 
Multilingual challenges and ongoing work to tackle them at Europeana
Multilingual challenges and ongoing work to tackle them at EuropeanaMultilingual challenges and ongoing work to tackle them at Europeana
Multilingual challenges and ongoing work to tackle them at Europeana
 
Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...Lightweight rights modeling and linked data publication for online cultural h...
Lightweight rights modeling and linked data publication for online cultural h...
 
The Europeana Data Model - TPDL2018
The Europeana Data Model - TPDL2018The Europeana Data Model - TPDL2018
The Europeana Data Model - TPDL2018
 
Europeana et IIIF
Europeana et IIIFEuropeana et IIIF
Europeana et IIIF
 
Data scale and diversity issues at Europeana
Data scale and diversity issues at EuropeanaData scale and diversity issues at Europeana
Data scale and diversity issues at Europeana
 
Isaac - W3C Data on the Web Best Practices - Data Vocabularies
Isaac - W3C Data on the Web Best Practices - Data VocabulariesIsaac - W3C Data on the Web Best Practices - Data Vocabularies
Isaac - W3C Data on the Web Best Practices - Data Vocabularies
 
Europeana APIs
Europeana APIsEuropeana APIs
Europeana APIs
 
Modelling and exchanging annotations
Modelling and exchanging annotationsModelling and exchanging annotations
Modelling and exchanging annotations
 
Modelling annotations for Europeana and related projects - DARIAH-EU WS
Modelling annotations for Europeana and related projects - DARIAH-EU WSModelling annotations for Europeana and related projects - DARIAH-EU WS
Modelling annotations for Europeana and related projects - DARIAH-EU WS
 
Classification schemes, thesauri and other Knowledge Organization Systems - a...
Classification schemes, thesauri and other Knowledge Organization Systems - a...Classification schemes, thesauri and other Knowledge Organization Systems - a...
Classification schemes, thesauri and other Knowledge Organization Systems - a...
 
Enrichment and Europeana
Enrichment and EuropeanaEnrichment and Europeana
Enrichment and Europeana
 

Último

Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 

Último (20)

Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE

Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015

  • 1. Wikidata, a target for Europeana’s semantic strategy
      Valentine Charles, Hugo Manguinhas, Antoine Isaac: Europeana
      Vladimir Alexiev: Ontotext Corp
      GLAM Wiki 2015, Den Haag
  • 2. Europeana.eu, Europe’s cultural heritage portal: 40M objects from 2,200 galleries, museums, archives and libraries
  • 3. Europeana has many data challenges: diversity
      - Aggregates metadata from the cultural heritage sector in Europe
          - Large number of references to places, agents, concepts, time
  • 4. Europeana has many data challenges: diversity
      - Metadata in more than 30 languages
      - From all EU countries
  • 5. Europeana’s priority 1: Improve data quality
      - Europeana Data Model (EDM), a framework for richer data
          - Re-uses several existing Semantic Web-based models: Dublin Core, OAI-ORE, SKOS, CIDOC-CRM…
          - EDM gives support for contextual resources (semantic layer)
      - Rely on vocabularies to solve a problem of data interlinking
          - Encourage data providers to contribute their own vocabularies and benefit from data links made at data providers’ level
  • 7. Europeana also manages its own vocabularies
  • 8. Europeana performs automatic enrichment based on vocabularies
      Goal: Contextualization which reaches outside the scope of a particular platform
  • 9. Automatic enrichment process in Europeana
      - Analysis
          - Selection of metadata fields in resource descriptions
          - Selection of potential rules to match
      - Linking
          - Matching the values of the metadata fields to values of the contextual resources
          - Adding contextual links
      - Augmentation
          - Selecting the values from the contextual resource
          - Augmentation of the search index with the labels from the vocabulary
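
The three stages on this slide (analysis, linking, augmentation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Europeana's actual implementation: the `Record` class, the function names, the example.org URIs and the case-insensitive exact-match rule are all hypothetical.

```python
from dataclasses import dataclass, field

# Toy contextual resources: URI -> labels per language.
# URIs and labels are made up for illustration only.
VOCABULARY = {
    "http://example.org/place/paris": {"en": "Paris", "it": "Parigi"},
    "http://example.org/place/vienna": {"en": "Vienna", "de": "Wien"},
}


@dataclass
class Record:
    metadata: dict                                   # field name -> list of literal values
    links: list = field(default_factory=list)        # contextual links (URIs)
    index_terms: set = field(default_factory=set)    # labels added to the search index


def analyse(record, fields):
    """Analysis: collect the values of the selected metadata fields."""
    return [value for name in fields for value in record.metadata.get(name, [])]


def link(values):
    """Linking: match field values against vocabulary labels.

    Assumption: a simple case-insensitive exact match stands in for
    whatever matching rules a real enrichment service would apply.
    """
    wanted = {value.casefold() for value in values}
    return [
        uri
        for uri, labels in VOCABULARY.items()
        if wanted & {label.casefold() for label in labels.values()}
    ]


def augment(record, uris):
    """Augmentation: store the links and add all labels to the search index."""
    for uri in uris:
        if uri not in record.links:
            record.links.append(uri)
        record.index_terms.update(VOCABULARY[uri].values())
    return record


def enrich(record, fields):
    return augment(record, link(analyse(record, fields)))


record = enrich(Record({"dcterms:spatial": ["Wien"]}), ["dcterms:spatial"])
print(record.links, sorted(record.index_terms))
```

Because the labels of the matched resource are added to the index, a record that only says "Wien" becomes findable under "Vienna" as well, which is the point of the augmentation step.
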
  • 10. Enrichment Types and Current Vocabularies
      Enrichment type | Target vocabulary | Source metadata fields
      Places | GeoNames | dcterms:spatial, dc:coverage
      Concepts | GEMET, DBpedia | dc:subject, dc:type
      Agents | DBpedia | dc:creator, dc:contributor
      Time | Semium Time | dc:date, dc:coverage, dcterms:temporal, edm:year
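
The table on this slide can be read as a small rule set. The sketch below encodes it as a Python dictionary and looks up which enrichment types read a given metadata field. The names `ENRICHMENT_RULES` and `targets_for_field` are hypothetical; only the table values come from the slide.

```python
# Enrichment rules as listed on the slide:
# enrichment type -> target vocabularies and source metadata fields.
ENRICHMENT_RULES = {
    "Places": {
        "vocabularies": ["GeoNames"],
        "fields": ["dcterms:spatial", "dc:coverage"],
    },
    "Concepts": {
        "vocabularies": ["GEMET", "DBpedia"],
        "fields": ["dc:subject", "dc:type"],
    },
    "Agents": {
        "vocabularies": ["DBpedia"],
        "fields": ["dc:creator", "dc:contributor"],
    },
    "Time": {
        "vocabularies": ["Semium Time"],
        "fields": ["dc:date", "dc:coverage", "dcterms:temporal", "edm:year"],
    },
}


def targets_for_field(field_name):
    """Return {enrichment type: vocabularies} for every rule reading this field."""
    return {
        enrichment_type: rule["vocabularies"]
        for enrichment_type, rule in ENRICHMENT_RULES.items()
        if field_name in rule["fields"]
    }


print(targets_for_field("dc:coverage"))
```

Note that `dc:coverage` feeds both the Places and the Time rules, so a value in that field may be matched against two different vocabularies.
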
  • 12. How does Wikidata fit into Europeana’s semantic strategy?
  • 13. Wikipedia's Relevance for Cultural Heritage
      - Authority Lists and Thesauri have central importance in CH
      - Wikipedia being "the sum of all knowledge" has broader reach than any institutional authority list
      - Only large-scale aggregations like VIAF (35 institutions) and LCSH (about 10 libraries around LoC) are comparable
      - While some facts are inaccurate and disputable, Wikipedia has a great role as a source of stable URLs on all kinds of topics
  • 14. How Big is Wikidata?
      - Name data sources for semantic enrichment (Europeana Creative D2.4) gives DBpedia and Wikidata stats
      - Wikidata: 3 years old, 14M items, 209M edits
      - 2.7M humans, 5k families, 22k literary characters
      - 215k organizations
      - 66k creative orgs (bands, radio/TV stations, newspapers…)
      - 30k educational institutions
      - 20k non-profit orgs
      - 13k GLAM orgs: 0.5k galleries, 1k libraries, 0.2k archives, 9k museums
      - 500k creative works
      - 110k heritage sites and monuments
      - 40k family names, 20k first names
  • 15. Is this big enough?
      - Wikidata: 2.7M humans, 215k organizations, 800k places, 500k works
      - VIAF: 35M personal names, 5.4M orgs/conferences, 410k places, 1.7M works
      - GeoNames: 9M places
      - Only 1.1M persons are coreferenced, see Authority Addicts: The New Frontier of Authority Control on Wikidata
      - VIAF is much bigger, but Wikidata is still very important for GLAM:
          - Wikidata is active in Authority Control and Coreferencing
          - (VIAF) Moving to Wikidata: will get 1M persons/orgs, and many multilingual names (see next)
          - Authority Files have barely more than names & dates; Wikipedia often has a lot more info
  • 16. Wikidata Multilingual Coverage
      - Wikidata/DBpedia has huge multilingual coverage
      - Each entity is represented in 2.11 Wikipedias on average (see Europeana food and drink classification scheme, EFD D2.2)
      - But popular entities are present in many more (up to 180); and even in one Wikipedia there are many languages
      - E.g. Lucas Cranach in Wikidata: 57 lang tags, representing 44 languages and 13 language variants
      - Languages are consistently marked
      - Important for semantic enrichment (Named Entity Recognition)
      - Even though language labels in Europeana are not consistent
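
The split between languages and language variants mentioned on this slide can be illustrated with a small sketch. It assumes that a tag with a subtag (such as `de-ch`) counts as a variant and a bare tag (such as `de`) as a language. The tag list is a made-up sample, not the actual 57 tags of the Lucas Cranach item.

```python
def split_language_tags(tags):
    """Partition language tags into plain languages and variants.

    Assumption: a tag containing a hyphen (e.g. "pt-br") is a variant,
    a tag without one (e.g. "pt") is a language.
    """
    languages = sorted({tag for tag in tags if "-" not in tag})
    variants = sorted({tag for tag in tags if "-" in tag})
    return languages, variants


# Made-up sample of label language tags for one item.
TAGS = ["de", "en", "fr", "pt", "de-ch", "en-gb", "pt-br"]

languages, variants = split_language_tags(TAGS)
print(len(TAGS), len(languages), len(variants))  # prints: 7 4 3
```
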
  • 17. Name Variants for Lucas Cranach
      - Wikidata and VIAF each have 70 variants and dominate the "Wikipedia tradition" and "Library tradition" datasets respectively (see Name data sources for semantic enrichment)
      - Only 5 variants are in common (see Interactive Venn diagram)
      - Excellent complementarity. VIAF has more variants, Wikidata more multilingual names
      - VIAF's move to sync to Wikidata will narrow the gap
  • 18. Wikidata is connected to other vocabularies
      - Europeana prefers using pivot vocabularies
          - that are connected to many other vocabularies
          - It is key to avoid duplication and redundancy
      - Wikidata has a lot of coreferences to other vocabularies that can be used to create extra links and extract missing data
          - https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control
          - https://twitter.com/hashtag/coreferencing: shots and news
          - Please tweet!
  • 19. VIAF-Wikidata Coreferences for Lucas Cranach
      - Can be leveraged to fill the gaps, e.g. bring RKDartists into VIAF
      VIAF | id in VIAF | Wikidata | id in Wikidata
      viafID | 49268177 | VIAF | 49268177
      BAV | ADV10197613 | |
      BNC | .a10853637 | |
      BNE | XX907273 | |
      BNF | cb12176451h | BNF | 12176451h
      DNB | 118522582 | GND | 118522582
      ISNI | 0000000121319721 | ISNI | 0000 0001 2131 9721
      JPG | 500115364 | ULAN | 500115364
      LC | n50020861 | LCCN | n50020861
      LNB | LNC10-000002573 | |
      NDL | 00436834 | |
      NKC | jn20000700335 | |
      NLA | 000035031951 | |
      NLI | 000035532,001445575,001448179 | |
      NLP | a16828161 | |
      NTA | 068435312 | NTA PPN | 068435312
      NUKAT | vtls000190728 | |
      SELIBR | 182422 | |
      SUDOC | 028710010 | |
      WKP | Lucas_Cranach_the_Elder | Many Wikipedias |
      IMAGINE | T7238,T267474 | |
      | | Cantic | a10853637
      | | Commons Creator | Lucas Cranach (I)
      | | Commons category | Lucas Cranach d. Ä.
      | | Freebase | /m/0kqp0
      | | RKDartists | 18978
      | | SIMBAD | CRANACH, Lucas the Elder
      | | Your Paintings | lucas-the-elder-cranach
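
As a rough illustration of how such coreferences could be used to "fill the gaps", the sketch below compares a few of the identifiers from this slide after light normalisation and reports which authorities are known only on the Wikidata side. The pairing of VIAF source codes with Wikidata labels is read off the slide. The function names and the normalisation rules (dropping spaces, stripping a leading `cb` from BNF ids) are assumptions made for this example.

```python
# A subset of the identifiers shown on the slide.
VIAF_IDS = {
    "DNB": "118522582",
    "ISNI": "0000000121319721",
    "BNF": "cb12176451h",
    "JPG": "500115364",
    "LC": "n50020861",
}
WIKIDATA_IDS = {
    "GND": "118522582",
    "ISNI": "0000 0001 2131 9721",
    "BNF": "12176451h",
    "ULAN": "500115364",
    "LCCN": "n50020861",
    "RKDartists": "18978",
    "Freebase": "/m/0kqp0",
}

# VIAF source code -> Wikidata label for the same authority, as paired on the slide.
SAME_AUTHORITY = {"DNB": "GND", "ISNI": "ISNI", "BNF": "BNF", "JPG": "ULAN", "LC": "LCCN"}


def normalise(authority, value):
    """Assumed normalisation: drop spaces; strip the 'cb' prefix of BNF ids."""
    value = value.replace(" ", "")
    if authority == "BNF" and value.startswith("cb"):
        value = value[2:]
    return value


def compare(viaf, wikidata):
    """Return (agreeing, conflicting, only_in_wikidata) authority labels."""
    agreeing, conflicting = [], []
    for viaf_code, wd_label in SAME_AUTHORITY.items():
        if viaf_code in viaf and wd_label in wikidata:
            same = normalise(wd_label, viaf[viaf_code]) == normalise(wd_label, wikidata[wd_label])
            (agreeing if same else conflicting).append(wd_label)
    covered = set(SAME_AUTHORITY.values())
    only_in_wikidata = sorted(label for label in wikidata if label not in covered)
    return agreeing, conflicting, only_in_wikidata


agreeing, conflicting, only_in_wikidata = compare(VIAF_IDS, WIKIDATA_IDS)
print(only_in_wikidata)  # candidates to bring into VIAF
```

With this sample, all five shared authorities agree once formatting differences are removed, and RKDartists and Freebase show up as identifiers that only Wikidata holds.
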
  • 20. Wikidata Coreferencing (1)
      - Excellent Mix-n-Match tool by Magnus Manske. 54 catalogs loaded!!
      - Decent auto-matching and excellent crowd-sourcing features
  • 21. Wikidata Coreferencing (2)
      - Excellent Authority Control navbox in Wikipedia
      - E.g. matching British Museum person-institution thesaurus (currently not coreferenced to anything: high value to BM)
  • 22. Europeana Food and Drink
      - How do you define such a wide area as Food and Drink, which is so pervasive in everyday life and culture?
      - Europeana food and drink classification scheme (EFD D2.2, or presentation) studies ~20 datasets for relevance to FD
      - Concludes that Wikipedia is our playing ground, and we should try to use Wikipedia Categories to delineate the topic
          - AGROVOC has 32k concepts but on production/science
          - Wikipedia/DBpedia has 6.6k proper Foods (with infoboxes and ingredients)
          - But I estimate 0.6-1.2M things relevant to FD in all Wikipedias
      - Background image: 2 levels of Food_and_drink cat hierarchy
  • 23. Wikidata is Easily Accessible
      - It is important for Europeana to have the data
          - Technically available:
              - Data dump preferably as Linked Data (RDF)
              - SPARQL end-point or other query mechanism (e.g. WDQ)
          - Properly documented and structured
              - Wikidata has an excellent Property Proposal process
              - Wikidata integrity constraints are excellent
              - In contrast, no Class creation process, so the classes are quite a mess (16k, of which 2/3 have fewer than 5 instances)
              - Data templates should be made more visible and be used as references
          - Open access
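
To make the "SPARQL end-point" point concrete, here is a minimal sketch that builds a query looking up a Wikidata item by its VIAF identifier (property P214). It assumes the present-day Wikidata Query Service endpoint, which the slides do not name. The function names are hypothetical, and the code only constructs the query and request URL without sending anything.

```python
from urllib.parse import urlencode

# Assumption: the present-day Wikidata Query Service, not mentioned on the slide.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def items_by_viaf_query(viaf_id):
    """Build a SPARQL query for items whose VIAF ID (P214) equals viaf_id."""
    if not viaf_id.isdigit():
        raise ValueError("VIAF ids are expected to be numeric")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f'  ?item wdt:P214 "{viaf_id}" .\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = items_by_viaf_query("49268177")  # VIAF id for Lucas Cranach from slide 19
print(query)
print(request_url(query))
```

The resulting URL could be fetched with any HTTP client. Validating the identifier before interpolating it keeps arbitrary text out of the query string.
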
  • 24. Wikidata Property Integrity Constraints
      - E.g. ULAN id constraints help to find records to merge / split
      - E.g. Communist Party of the Russian Federation has 5 LCNAF id's, what's up? Is it so popular with the Library of Congress?
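
The two kinds of check hinted at on this slide can be sketched over a toy list of statements: an identifier shared by several items suggests records to merge, and an item carrying several identifiers of the same kind suggests a record to split or clean up. This is an illustration only, not how Wikidata implements its constraint reports. The function names, item names and identifier values are placeholders.

```python
from collections import defaultdict

# (item, property, value) statements. All identifier values are placeholders.
STATEMENTS = [
    ("Painter A", "ULAN id", "id-001"),
    ("Painter A (duplicate)", "ULAN id", "id-001"),
    ("Painter B", "ULAN id", "id-002"),
    ("Political party", "LCNAF id", "id-101"),
    ("Political party", "LCNAF id", "id-102"),
]


def multiple_values(statements, prop):
    """Items holding more than one value for prop (split or clean-up candidates)."""
    values = defaultdict(set)
    for item, p, value in statements:
        if p == prop:
            values[item].add(value)
    return {item: sorted(v) for item, v in values.items() if len(v) > 1}


def shared_values(statements, prop):
    """Values of prop that occur on more than one item (merge candidates)."""
    items = defaultdict(set)
    for item, p, value in statements:
        if p == prop:
            items[value].add(item)
    return {value: sorted(i) for value, i in items.items() if len(i) > 1}


print(multiple_values(STATEMENTS, "LCNAF id"))
print(shared_values(STATEMENTS, "ULAN id"))
```
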
  • 25. How Wikidata will be used by Europeana
      - Semantic Enrichment of Europeana data with additional information
          - With a specific focus on entities such as persons and concepts
      - Linking Europeana objects with Wikidata
          - Approach similar to https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings
          - But would be extended to the whole Europeana dataset
          - Links would be added in the Europeana data
      - Structure (data template) for CH objects (e.g. paintings) still not very rich on Wikidata, e.g. Measurements not there
          - Improvements are made all the time, but see next
  • 26. Wikidata Items as Linking Hubs
      - Still, they're great as stable URLs
      - Providing the basic info (who, when, where, what)
      - And acting as coreferencing hubs
      - I don't expect Wikidata CH objects to ever be described in the full richness & complexity of professional art research. E.g. see British Museum Mapping to CIDOC CRM
  • 27. Wikidata and DBpedia
      - Wikidata and DBpedia are the two structured representations of Wikipedia
      - Wikidata: initially populated from Wikipedia, manually curated, will master structured data for Wikipedia. Synchronized through an assortment of bots
          - Data is fairly accurate but data depth is still small
      - DBpedia: automatically extracted from Wikipedia, live update, one-way extraction only.
          - Data reach is deep, but there are many problems in ontology and individual mappings, especially for non-English. E.g. United Nations is extracted as "Country". See DBpedia Ontology and Mapping Problems.
      - Should they be together?
  • 28. GLAMs should add to Wikipedia or Wikidata!
      - EFD project. Swiecenie Koszyczek, "blessing of the baskets", a colorful Polish tradition
      - There's no article in pl.wikipedia.org, so we can't relate such artifacts to anything
      - Content partner's museum staff have no time to make a proper Wikipedia article
      - But adding a Wikidata item is quick & easy
      - Appropriate categories (Easter Traditions, Easter-related Foods) will put it in context
  • 29. Thank you
      Valentine Charles, valentine.charles@europeana.eu
      Vladimir Alexiev, vladimir.alexiev@ontotext.com
      Hugo Manguinhas, hugo.manguinhas@europeana.eu

Editor's Notes

  1. Take advantage of this rich data to improve other types of services, such as auto-completion. Two categories: global; produced by projects. See list on the wiki.
  2. In the linked environment, enrichment often refers to adding new information at the semantic level to the data about certain resources. It is the creation of new links between the enriched resources and another data resource, such as controlled vocabularies and authority files. The goal is contextualization of metadata and embedding the resources in context outside the scope of the platform