Turning Data into Knowledge – profiling and interlinking Web datasets 
Stefan Dietze 
L3S Research Center 
- KESW2014 - 
30/09/14 
Introduction
Recent work on Linked Data exploration/discovery/search
• Entity interlinking & dataset interlinking recommendation
• Dataset profiling
• Data consistency & conflicts
Research areas
• Web science, Information Retrieval, Semantic Web & Linked Data, data & knowledge integration (mapping, classification, interlinking)
• Application domains: education/TEL, Web archiving, …
Some projects
http://www.l3s.de/
• See also: http://purl.org/dietze
Linked Data is awesome, but...
“HTTP-accessibility” (SPARQL, URI-dereferencing)
“Structure” & “Semantics” (=> shared/linked vocabularies)
“Interlinked”
“Persistent”
Hm, really?
…why are there so few datasets actually used?
Data reuse and in-links focused on trusted “reference graphs” such as DBpedia, Freebase etc.
Long tail of LD datasets which are neither reused nor linked to (LOD Cloud alone 300+ datasets, 50 bn triples)
Explanations?
Linked data is more diverse (and messy) than we think 
SPARQL endpoint availability over time [Buil-Aranda et al 2013] 
Accessibility of datasets? 
Less than 50% of all SPARQL endpoints are actually responsive at any given point in time [Buil-Aranda2013]
“THE” SPARQL protocol? No, but many variants & subsets 
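A minimal sketch of how such an availability figure could be measured, assuming that "responsive" simply means the endpoint answers a trivial ASK query with HTTP 200. This is not the methodology of [Buil-Aranda2013]; the function names `http_probe` and `availability` and the example.org URLs are hypothetical.

```python
import http.client
import urllib.parse
import urllib.request
from typing import Callable, Iterable


def http_probe(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers a trivial ASK query with HTTP 200."""
    params = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    try:
        with urllib.request.urlopen(f"{endpoint}?{params}", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Timeouts, DNS failures, HTTP errors and malformed URLs all count as "down".
        return False


def availability(endpoints: Iterable[str],
                 probe: Callable[[str], bool] = http_probe) -> float:
    """Fraction of endpoints for which the probe reports a response."""
    endpoints = list(endpoints)
    if not endpoints:
        return 0.0
    return sum(1 for url in endpoints if probe(url)) / len(endpoints)


# Offline usage with a fake probe (hypothetical endpoints, no network access).
fake_up = {"http://example.org/a/sparql"}
sample = ["http://example.org/a/sparql", "http://example.org/b/sparql",
          "http://example.org/c/sparql", "http://example.org/d/sparql"]
ratio = availability(sample, probe=lambda url: url in fake_up)
```

Passing the probe as a parameter keeps the ratio computation testable without network access; repeating the measurement on a schedule would give availability over time.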
“Semantics”, links, quality? 
…data accuracy (e.g. DBpedia)? [Paulheim2013]
…vocabulary reuse? [D’AquinWebSci13] 
…schema compliance (RDFS, schemas) [HoganJWS2012] 
SPARQL Web-Querying Infrastructure: Ready for Action?, Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, Pierre-Yves Vandenbussche, International Semantic Web Conference 2013 (ISWC2013).
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013. 
Type Inference on Noisy RDF Data, Paulheim H., Bizer, C. Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp 510-525 
An empirical survey of Linked Data conformance. Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., Journal of Web Semantics 14, 2012
What about data consistency? 
Analyzing Relative Incompleteness of Movie Descriptions in the Web of Data: A Case Study, Yuan, W., Demidova, E., Dietze, S., Zhu, X., International Semantic Web Conference 2014 (ISWC2014) 
Too many/diverse datasets, too little knowledge 
Topics? Which datasets are useful & trustworthy for case XY (e.g. “learning about the solar system”)? Which topics are covered?
Types? Which datasets describe statistics, videos, slides, publications etc.?
Quality? Currentness, dynamics, accessibility/reliability, data quantity & quality?
Data mapping, linking and profiling
[Figure: a dataset catalog/registry holding dataset metadata. Vocabularies shown: BIBO, AIISO, FOAF. Types shown: po:Programme, yov:Video, bibo:Film. Topic entities shown: db:Astronomy, db:Astro. Objects.]
BBC Programme: <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>
Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
Schema mappings [WebSci13]
Entity & dataset disambiguation & linking [ESWC13]
Topic profile extraction [WWW13, ESWC14]
Schemas/vocabularies on the Web: XKCD 927 
https://xkcd.com/927/ 
schemas & vocabularies 
Schema assessment and mapping 
Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties)
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013. 
[Figure: co-occurrence graph of data types; labelled examples: po:Programme, sioc:Item, yov:Video]
Schema assessment and mapping 
Co-occurrence after mapping into most frequent schemas (201 frequent types mapped into 79 classes)
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013. 
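A toy sketch of why mapping types into frequent schemas increases overlap between datasets. The dataset names, the types assigned to them and the mapping table are illustrative assumptions, not figures from [WebSci13]; `map_types` and `overlap` are hypothetical helper names.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical sample: the types each dataset uses.
dataset_types: Dict[str, Set[str]] = {
    "bbc": {"po:Programme"},
    "yovisto": {"yov:Video"},
    "dblp": {"foaf:Document"},
}

# Hypothetical mapping of dataset-specific types onto frequently used ones.
type_mapping: Dict[str, str] = {
    "po:Programme": "bibo:Film",
    "yov:Video": "bibo:Film",
    "foaf:Document": "bibo:Document",
}


def map_types(types: Set[str], mapping: Dict[str, str]) -> Set[str]:
    """Replace each type by its mapped target; unmapped types are kept."""
    return {mapping.get(t, t) for t in types}


def overlap(types_by_dataset: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of shared types for every pair of datasets."""
    return {
        (a, b): len(types_by_dataset[a] & types_by_dataset[b])
        for a, b in combinations(sorted(types_by_dataset), 2)
    }


before = overlap(dataset_types)
after = overlap({name: map_types(types, type_mapping)
                 for name, types in dataset_types.items()})
```

In this sample the BBC and Yovisto datasets share no type before mapping and one type afterwards, which is what makes cross-dataset queries by type possible.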
[Figure: co-occurrence graph after mapping; labelled examples: bibo:Film, bibo:Document, foaf:Document, sioc:Item, po:Programme, yov:Video]
Application: LinkedUp Data Catalog in a nutshell
• RDF (VoID) dataset catalog: browse & query distributed datasets
• Federated queries using type mappings
• Live information about endpoint accessibility
http://data.linkededucation.org/linkedup/catalog/ 
http://datahub.io/group/linked-education 
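One way a federated query over type mappings could look, sketched with the SPARQL 1.1 `SERVICE` keyword. The catalog's actual query mechanism is not described on the slide; the function name, endpoint URLs and type IRIs below are hypothetical placeholders.

```python
from typing import Mapping


def federated_type_query(sources: Mapping[str, str], limit: int = 10) -> str:
    """Build one SPARQL 1.1 query that asks each endpoint for its local type.

    `sources` maps an endpoint URL to the type IRI which the (assumed)
    schema mapping treats as equivalent to the type the user asked for.
    """
    blocks = [
        f"  {{ SERVICE <{endpoint}> {{ ?resource a <{type_iri}> }} }}"
        for endpoint, type_iri in sources.items()
    ]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT DISTINCT ?resource WHERE {{\n{body}\n}} LIMIT {limit}"


# Hypothetical endpoints and type IRIs, for illustration only.
query = federated_type_query({
    "http://example.org/bbc/sparql": "http://example.org/vocab/Programme",
    "http://example.org/yovisto/sparql": "http://example.org/vocab/Video",
})
print(query)
```

The type mapping does the real work here: the user asks for one type and each endpoint is queried with the type it actually uses.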
DBpedia categories
Towards profiling: dataset disambiguation/linking
[Figure: the BBC Programme (po:Programme) and Yovisto Video (yov:Video) example records with candidate topic links to db:Astro. Objects and db:CartoonCharacters, each marked “?”]
Relatedness of entities, meaningfulness of paths? [ESWC13]
Extraction of “topics” & relatedness of datasets [ESWC14]
[Figure: the BBC Programme (po:Programme) and Yovisto Video (yov:Video) example records]
Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl., ESWC 2013 - 10th Extended Semantic Web Conference, (May 2013). 
[Figure: entity graph with the nodes db:Pluto (Dwarf Planet), db:Astronomical Objects, db:Sun and db:Astronomy]
Computation of connectivity scores between entities 
Combination of (i) a semantic (graph-based) connectivity score (SCS) with (ii) a Web co-occurrence-based measure (CBM) (similar to NGD)
For (i): adaptation of the Katz index from SNA for (linked) data graphs (considering the number and lengths of paths over transversal properties)
SCS = 0.32 
CBM = 0.24 
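The two ingredients can be sketched as follows. This is a plain truncated Katz index and the standard Normalized Google Distance, not the exact SCS and CBM formulas of [ESWC13]; the damping factor, the length cut-off, the toy graph and the function names are assumptions made for illustration.

```python
import math
from typing import Dict, List


def katz_score(graph: Dict[str, List[str]], source: str, target: str,
               beta: float = 0.5, max_len: int = 4) -> float:
    """Truncated Katz index: sum over l of beta**l * (number of walks of length l)."""
    counts = {source: 1}  # walks of the current length from source to each node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt: Dict[str, int] = {}
        for node, n_walks in counts.items():
            for neighbour in graph.get(node, ()):
                nxt[neighbour] = nxt.get(neighbour, 0) + n_walks
        counts = nxt
        score += (beta ** length) * counts.get(target, 0)
    return score


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from hit counts fx, fy, joint count fxy, index size n."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Toy undirected graph loosely based on the slide's example entities.
toy_graph = {
    "db:Pluto": ["db:Astronomical_objects"],
    "db:Astronomical_objects": ["db:Pluto", "db:Sun"],
    "db:Sun": ["db:Astronomical_objects"],
}
connectivity = katz_score(toy_graph, "db:Pluto", "db:Sun", beta=0.5, max_len=2)
```

Shorter connections contribute more because each extra hop multiplies the contribution by `beta`; the co-occurrence measure complements this with evidence from outside the graph.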
Dataset disambiguation/linking
Entity linking: evaluation 
• Evaluation based on USA Today News items (80,000 entity pairs)
• Manually created gold standard (1,000 entity pairs)
• Baseline: Explicit Semantic Analysis (ESA) => CBM/SCS: “relatedness”; ESA: “similarity”
Precision/Recall/F1 for SCS, CBM, ESA. 
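For reference, the three measures can be computed against a gold standard of entity pairs as sketched below. The pair identifiers are made up and the function name `precision_recall_f1` is hypothetical; the slide's actual scores are not reproduced here.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted entity pairs against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up pairs for illustration.
gold_pairs = {("e1", "e2"), ("e3", "e4")}
predicted_pairs = {("e1", "e2"), ("e5", "e6")}
p, r, f1 = precision_recall_f1(predicted_pairs, gold_pairs)
```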
Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl., ESWC 2013 - 10th Extended Semantic Web Conference, (May 2013).
“SCS Connector” demo
http://lod2.inf.puc-rio.br/scs/SemConnectivities 
SCS Connector – Quantifying and Visualising Semantic Paths between Entity Pairs, Nunes, B. P., Herrera, J. E. T., Taibi, D., Lopes, G. R., Casanova, M. A., Dietze, S., Demo Paper at 11th Extended Semantic Web Conference (ESWC2014), Heraklion, Crete, Greece (2014). *BEST ESWC2014 DEMO AWARD*
[Figure: dataset catalog/registry and dataset metadata for the Yovisto Video example (yov:Video, dc:title “Pluto & the Dwarf Planets”), shown with db:Astronomy and db:Astro. Objects]
Extracting representative (DBpedia) categories (“topic profile”) & entities for arbitrary datasets
Sounds easy? But how to do that for 300+ datasets with < 50 bn triples?
Scalability vs representativeness: sampling & ranking for good scalability/accuracy balance [ESWC2014] (applied to all responsive LOD datasets)
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
Dataset profiling: what‘s the data about? 
[Figure label: db:Pluto (Dwarf Planet)]
Efficient dataset profiling: method 
1. Sampling of resource instances (random sampling, weighted sampling, resource centrality sampling)
2. Entity and topic extraction (NER via DBpedia Spotlight, category mapping and expansion)
3. Normalisation and ranking (using graphical models such as PageRank with Priors, HITS with Priors and K-Step Markov)
Result: weighted dataset-topic profile graph 
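Step 3 can be illustrated with a small PageRank-with-Priors iteration over a dataset-entity-category graph. This is a generic sketch, not the implementation of [ESWC2014]; the graph, the back-probability `beta`, the iteration count and the function name are assumptions, and sampling (step 1) and entity extraction (step 2) are taken as already done.

```python
from typing import Dict, List


def pagerank_with_priors(graph: Dict[str, List[str]], priors: Dict[str, float],
                         beta: float = 0.3, iterations: int = 50) -> Dict[str, float]:
    """Iterate rank = (1 - beta) * (rank spread over out-links) + beta * prior.

    Assumes every node has at least one out-link and every link target
    is itself a key of `graph`.
    """
    total = sum(priors.values())
    prior = {node: priors.get(node, 0.0) / total for node in graph}
    rank = dict(prior)
    for _ in range(iterations):
        nxt = {node: beta * prior[node] for node in graph}
        for node, neighbours in graph.items():
            share = (1.0 - beta) * rank[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        rank = nxt
    return rank


# Toy graph: one dataset, two extracted entities, two categories (illustrative).
toy = {
    "dataset:yovisto": ["dbp:Pluto", "dbp:Sun"],
    "dbp:Pluto": ["dataset:yovisto", "cat:Astronomical_objects", "cat:Dwarf_planets"],
    "dbp:Sun": ["dataset:yovisto", "cat:Astronomical_objects"],
    "cat:Astronomical_objects": ["dbp:Pluto", "dbp:Sun"],
    "cat:Dwarf_planets": ["dbp:Pluto"],
}
ranks = pagerank_with_priors(toy, priors={"dataset:yovisto": 1.0})
topic_profile = sorted((n for n in ranks if n.startswith("cat:")),
                       key=ranks.get, reverse=True)
```

Putting all prior mass on the dataset node biases the ranking towards categories that are well connected to that dataset, which yields a per-dataset weighting of topics.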
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
Dataset profiling: exploring LOD datasets/topics in a nutshell 
http://data-observatory.org/lod-profiles/ 
Automatic extraction of dataset “topics” [ESWC2014] => RDF/VoID dataset profiles
Visualisation & exploration of dataset-topic graph (datasets, topics, relationships) 
Includes all (responsive) datasets of LOD Cloud 
Dataset profiling: evaluation 
[Figure: NDCG (averaged over all datasets)]
Datasets & Ground Truth
Yovisto, Oxpoints, LAK Dataset, Semantic Web Dogfood
Crowd-sourced topic indicators from datasets (keywords, tags)
Manual mapping to entities & category extraction (ranking according to frequency)
Baselines
1) LDA, 2) tf/idf (applied to entire datasets)
Topic extraction according to our approach, weighting/ranking based on term weight
Measure
NDCG @ rank l
Performance (time/NDCG) for different sampling strategies/sizes etc 
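A short sketch of NDCG at rank l, using the linear-gain variant and assuming the ranked list contains all judged topics, so the ideal ordering is obtained by sorting the same relevance values. The relevance grades are made up; the slide does not say which NDCG variant was used.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Made-up relevance grades of the topics returned for one dataset, in ranked order.
ranked_relevances = [1, 2, 3]
score_at_3 = ndcg(ranked_relevances, 3)
```

Averaging `ndcg` over all evaluated datasets gives a single figure per method and rank.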
What (dataset) do these categories have in common?
dbp:Category:1955_births 
dbp:Category:People_from_London 
dbp:Category:Buzzwords 
dbp:Category:Semantic_Web 
dbp:Category:Web_Services 
dbp:Category:HTTP 
dbp:Category:Unitarian_Universalists 
dbp:Category:World_Wide_Web 
dbp:Category:Royal_Medal_winners 
Diversity of category profile for a single publication 
Berners-Lee, Tim; Hendler, James; Lassila, Ora (2001). "The Semantic Web". Scientific American Magazine.
[Figure: a DBLP record with foaf:Person and foaf:Document resources, linked to dbp:Tim_Berners-Lee and dbp:Semantic_Web, and their first-level categories (dcterms:subject):]
dbp:Category:1955_births
dbp:Category:People_from_London
dbp:Category:Buzzwords
dbp:Category:Semantic_Web
dbp:Category:Web_Services
dbp:Category:HTTP
dbp:Category:Unitarian_Universalists
dbp:Category:World_Wide_Web
dbp:Category:Royal_Medal_winners
http://data-observatory.org/led-explorer/ 
Type-specific views on datasets/categories
“Document” (foaf:Document)
“Person” (foaf:Person)
“Course” (aiiso:Course)
Currently applied to datasets in LinkedUp Catalog only (as schema mappings already available here) 
Type-specific exploration of dataset categories 
Exploring type-specific topic profiles of datasets: a demo for educational linked data, Taibi, D., Dietze, S., Fetahu, B., Fulantelli, G., Demo at International Semantic Web Conference 2014 (ISWC2014) 
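The idea of a type-specific view can be sketched as grouping categories by the type of the resource they were extracted from. The annotation pairs and the assignment of categories to entities below are illustrative assumptions, and `type_specific_profile` is a hypothetical name, not the demo's code.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def type_specific_profile(annotations: Iterable[Tuple[str, str]],
                          subjects: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count categories separately for each resource type.

    `annotations` holds (resource type, linked entity) pairs;
    `subjects` maps an entity to its dcterms:subject categories.
    """
    profile: Dict[str, Counter] = defaultdict(Counter)
    for rdf_type, entity in annotations:
        profile[rdf_type].update(subjects.get(entity, ()))
    return profile


# Illustrative data loosely following the publication example.
annotations = [
    ("foaf:Person", "dbp:Tim_Berners-Lee"),
    ("foaf:Document", "dbp:Semantic_Web"),
]
subjects = {
    "dbp:Tim_Berners-Lee": ["dbp:Category:1955_births", "dbp:Category:People_from_London"],
    "dbp:Semantic_Web": ["dbp:Category:Semantic_Web", "dbp:Category:Buzzwords"],
}
views = type_specific_profile(annotations, subjects)
```

Splitting by type keeps biographical categories of authors out of the “Document” view, so each view only shows topics relevant to that kind of resource.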
data.l3s.de – the L3S DataHub
KEYSTONE & PROFILES 2014
KEYSTONE: semantic keyword-based search on structured data sources (2013-2017)
http://www.keystone-cost.eu/
Research network focused on distributed search and dataset profiling, related to Semantic Web, databases, etc.
Open to new members (beyond Europe)
PROFILES2014 - Dataset PROFIling & fEderated Search for Linked Data
http://www.keystone-cost.eu/profiles
Workshop co-located with ESWC2014
IJSWIS Special Issue on … LD search & profiling
http://www.ijswis.org/?q=node/51/
Deadline 8 December 2014
Summing up 
Summary 
Increasing amounts of data => require knowledge about nature and relationships of datasets 
Profiling: scalable methods for extracting dataset metadata 
Interlinking: connectivity of entities or datasets
What about LD evolution?
In RDF graphs (e.g. LOD Cloud), “all” nodes are connected
Impact of evolution on preservation, linking and enrichment?
Which parts of datasets to preserve (entity “neighbourhood”)? => semantic relatedness/relevance/entity retrieval
Link correctness in evolving LD? 
…. 
Спасибо! Thank You! 
See also (general)
• http://purl.org/dietze
• http://linkedup-project.eu
• http://duraark.eu
• http://data.l3s.de
See also (data)
• http://data.l3s.de
• http://data.linkededucation.org
• http://lak.linkededucation.org
Acknowledgements
Besnik Fetahu (L3S)
Elena Demidova (L3S)
Bernardo Pereira Nunes (PUC Rio)
Marco Casanova (PUC Rio)
Luiz Andre Paes Leme (PUC Rio)
Giseli Lopes (PUC Rio)
Davide Taibi (CNR, IT)
Mathieu d’Aquin (Open University, UK)
and many more…

Mais conteúdo relacionado

Mais procurados

Online Learning and Linked Data: An Introduction
Online Learning and Linked Data: An IntroductionOnline Learning and Linked Data: An Introduction
Online Learning and Linked Data: An Introduction
EUCLID project
 
Open Data & Education Seminar, ITMO, St Petersburg, March 2014
Open Data & Education Seminar, ITMO, St Petersburg, March 2014Open Data & Education Seminar, ITMO, St Petersburg, March 2014
Open Data & Education Seminar, ITMO, St Petersburg, March 2014
Stefan Dietze
 
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedWWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
Stefan Dietze
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Stefan Dietze
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
Open Data Dialog 2013 - Linked Data in Education
Open Data Dialog 2013 - Linked Data in EducationOpen Data Dialog 2013 - Linked Data in Education
Open Data Dialog 2013 - Linked Data in Education
Stefan Dietze
 
Creating knowledge out of interlinked data
Creating knowledge out of interlinked dataCreating knowledge out of interlinked data
Creating knowledge out of interlinked data
Sören Auer
 
20130805 Activating Linked Open Data in Libraries Archives and Museums
20130805 Activating Linked Open Data in Libraries Archives and Museums20130805 Activating Linked Open Data in Libraries Archives and Museums
20130805 Activating Linked Open Data in Libraries Archives and Museums
andrea huang
 

Mais procurados (20)

Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the WebRetrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
 
Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)
Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)
Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)
 
Online Learning and Linked Data: An Introduction
Online Learning and Linked Data: An IntroductionOnline Learning and Linked Data: An Introduction
Online Learning and Linked Data: An Introduction
 
Open Data & Education Seminar, ITMO, St Petersburg, March 2014
Open Data & Education Seminar, ITMO, St Petersburg, March 2014Open Data & Education Seminar, ITMO, St Petersburg, March 2014
Open Data & Education Seminar, ITMO, St Petersburg, March 2014
 
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons LearnedWWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
WWW2014 Tutorial: Online Learning & Linked Data - Lessons Learned
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
 
Mining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the WebMining and Understanding Activities and Resources on the Web
Mining and Understanding Activities and Resources on the Web
 
B2: Open Up: Open Data in the Public Sector
B2: Open Up: Open Data in the Public SectorB2: Open Up: Open Data in the Public Sector
B2: Open Up: Open Data in the Public Sector
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
Open Education Challenge 2014: exploiting Linked Data in Educational Applicat...
 
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific EndeavourBeyond Meta-Data: Nano-Publications Recording Scientific Endeavour
Beyond Meta-Data: Nano-Publications Recording Scientific Endeavour
 
Open Data Dialog 2013 - Linked Data in Education
Open Data Dialog 2013 - Linked Data in EducationOpen Data Dialog 2013 - Linked Data in Education
Open Data Dialog 2013 - Linked Data in Education
 
Interpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning AnalyticsInterpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning Analytics
 
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the WebBeyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
Beyond Linked Data - Exploiting Entity-Centric Knowledge on the Web
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...
 
Why should semantic technologies pay more attention to privacy... and vice-ve...
Why should semantic technologies pay more attention to privacy... and vice-ve...Why should semantic technologies pay more attention to privacy... and vice-ve...
Why should semantic technologies pay more attention to privacy... and vice-ve...
 
Creating knowledge out of interlinked data
Creating knowledge out of interlinked dataCreating knowledge out of interlinked data
Creating knowledge out of interlinked data
 
20130805 Activating Linked Open Data in Libraries Archives and Museums
20130805 Activating Linked Open Data in Libraries Archives and Museums20130805 Activating Linked Open Data in Libraries Archives and Museums
20130805 Activating Linked Open Data in Libraries Archives and Museums
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
 

Destaque

A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
lindlar
 
Quality criteria for architectural 3D data in usage and preservation processes
Quality criteria for architectural 3D data in usage and preservation processesQuality criteria for architectural 3D data in usage and preservation processes
Quality criteria for architectural 3D data in usage and preservation processes
lindlar
 
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Lena Lindbäck
 

Destaque (13)

A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
A Domain-driven Approach to Digital Curation and Preservation of 3D Architect...
 
Quality criteria for architectural 3D data in usage and preservation processes
Quality criteria for architectural 3D data in usage and preservation processesQuality criteria for architectural 3D data in usage and preservation processes
Quality criteria for architectural 3D data in usage and preservation processes
 
DURAARK at IGeLU 2014
DURAARK at IGeLU 2014DURAARK at IGeLU 2014
DURAARK at IGeLU 2014
 
DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...
DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...
DURAARK presentation CIB W78 "Applications of IT in AEC" conference Beijing 2...
 
Grapp2014 presentation
Grapp2014 presentationGrapp2014 presentation
Grapp2014 presentation
 
Towards preservation of semantically enriched architectural knowledge
Towards preservation of semantically enriched architectural knowledgeTowards preservation of semantically enriched architectural knowledge
Towards preservation of semantically enriched architectural knowledge
 
Presentation nokobit
Presentation nokobitPresentation nokobit
Presentation nokobit
 
DURAARK at Bibliotheksymposium Wildau
DURAARK at Bibliotheksymposium WildauDURAARK at Bibliotheksymposium Wildau
DURAARK at Bibliotheksymposium Wildau
 
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
DURAARK presentation at DEDICATE final seminar, October 21st 2013, Michelle L...
 
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic ProfilesA Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles
 
DURAARK at AUdS 2015
DURAARK at AUdS 2015DURAARK at AUdS 2015
DURAARK at AUdS 2015
 
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
 
Preservation of 3 d objects of buildings
Preservation of 3 d objects of buildingsPreservation of 3 d objects of buildings
Preservation of 3 d objects of buildings
 

Semelhante a Turning Data into Knowledge (KESW2014 Keynote)

LinkedUp - Linked Data Europe Workshop 2014
LinkedUp - Linked Data Europe Workshop 2014LinkedUp - Linked Data Europe Workshop 2014
LinkedUp - Linked Data Europe Workshop 2014
Stefan Dietze
 
A semantic framework and software design to enable the transparent integratio...
A semantic framework and software design to enable the transparent integratio...A semantic framework and software design to enable the transparent integratio...
A semantic framework and software design to enable the transparent integratio...
Patricia Tavares Boralli
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
emmanuel_jamin
 

Semelhante a Turning Data into Knowledge (KESW2014 Keynote) (20)

Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
 
From Data to Knowledge - Profiling & Interlinking Web Datasets
From Data to Knowledge - Profiling & Interlinking Web DatasetsFrom Data to Knowledge - Profiling & Interlinking Web Datasets
From Data to Knowledge - Profiling & Interlinking Web Datasets
 
Towards research data knowledge graphs
Towards research data knowledge graphsTowards research data knowledge graphs
Towards research data knowledge graphs
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
 
LinkedUp - Linked Data Europe Workshop 2014
LinkedUp - Linked Data Europe Workshop 2014LinkedUp - Linked Data Europe Workshop 2014
LinkedUp - Linked Data Europe Workshop 2014
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
BrightTALK - Semantic AI
BrightTALK - Semantic AI BrightTALK - Semantic AI
BrightTALK - Semantic AI
 
Big Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningBig Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday Learning
 
OSFair2017 training | Explore, model, analyze and visualize systematic resear...
OSFair2017 training | Explore, model, analyze and visualize systematic resear...OSFair2017 training | Explore, model, analyze and visualize systematic resear...
OSFair2017 training | Explore, model, analyze and visualize systematic resear...
 
A semantic framework and software design to enable the transparent integratio...
A semantic framework and software design to enable the transparent integratio...A semantic framework and software design to enable the transparent integratio...
A semantic framework and software design to enable the transparent integratio...
 
Dc 2014 baierer-droege
Dc 2014 baierer-droegeDc 2014 baierer-droege
Dc 2014 baierer-droege
 
Linked Data vs Open Educational Resources
Linked Data vs Open Educational ResourcesLinked Data vs Open Educational Resources
Linked Data vs Open Educational Resources
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
Beyond research data infrastructures: exploiting artificial & crowd intellige...
Beyond research data infrastructures: exploiting artificial & crowd intellige...Beyond research data infrastructures: exploiting artificial & crowd intellige...
Beyond research data infrastructures: exploiting artificial & crowd intellige...
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Information Extraction and Linked Data Cloud
Information Extraction and Linked Data CloudInformation Extraction and Linked Data Cloud
Information Extraction and Linked Data Cloud
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentation
 

Mais de Stefan Dietze

Mais de Stefan Dietze (11)

AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...AI in between online and offline discourse - and what has ChatGPT to do with ...
AI in between online and offline discourse - and what has ChatGPT to do with ...
 
An interdisciplinary journey with the SAL spaceship – results and challenges ...
An interdisciplinary journey with the SAL spaceship – results and challenges ...An interdisciplinary journey with the SAL spaceship – results and challenges ...
An interdisciplinary journey with the SAL spaceship – results and challenges ...
 
Research Knowledge Graphs at NFDI4DS & GESIS
Research Knowledge Graphs at NFDI4DS & GESISResearch Knowledge Graphs at NFDI4DS & GESIS
Research Knowledge Graphs at NFDI4DS & GESIS
 
Research Knowledge Graphs at GESIS & NFDI4DataScience
Research Knowledge Graphs at GESIS & NFDI4DataScienceResearch Knowledge Graphs at GESIS & NFDI4DataScience
Research Knowledge Graphs at GESIS & NFDI4DataScience
 
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
Human-in-the-loop: the Web as Foundation for interdisciplinary Data Science M...
 
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
Human-in-the-Loop: das Web als Grundlage interdisziplinärer Data Science Meth...
 
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
From Web Data to Knowledge: on the Complementarity of Human and Artificial In...
 
Using AI to understand everyday learning on the Web
Using AI to understand everyday learning on the WebUsing AI to understand everyday learning on the Web
Using AI to understand everyday learning on the Web
 
Analysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online ActivitiesAnalysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online Activities
 
Towards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the WebTowards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the Web
 
Dietze linked data-vr-es
Dietze linked data-vr-esDietze linked data-vr-es
Dietze linked data-vr-es
 

Turning Data into Knowledge (KESW2014 Keynote)

  • 1. Turning Data into Knowledge: profiling and interlinking Web datasets. Stefan Dietze, L3S Research Center. KESW2014, 30/09/14.
  • 2. Introduction. Recent work on Linked Data exploration/discovery/search: entity interlinking & dataset interlinking recommendation; dataset profiling; data consistency & conflicts. Research areas: Web science, Information Retrieval, Semantic Web & Linked Data, data & knowledge integration (mapping, classification, interlinking). Application domains: education/TEL, Web archiving, … Some projects. http://www.l3s.de/ See also: http://purl.org/dietze
  • 3. Linked Data is awesome, but... "HTTP-accessibility" (SPARQL, URI-dereferencing); "Structure" & "Semantics" (=> shared/linked vocabularies); "Interlinked"; "Persistent". Hm, really? …why are there so few datasets actually used? Data reuse and in-links focus on trusted "reference graphs" such as DBpedia, Freebase etc. There is a long tail of LD datasets which are neither reused nor linked to (LOD Cloud alone: 300+ datasets, 50 bn triples). Explanations?
  • 4. Linked data is more diverse (and messy) than we think. Accessibility of datasets? Less than 50% of all SPARQL endpoints are actually responsive at a given point in time [Buil-Aranda2013] (figure: SPARQL endpoint availability over time [Buil-Aranda et al 2013]). "THE" SPARQL protocol? No, but many variants & subsets. "Semantics", links, quality? …data accuracy (e.g. DBpedia)? [Paulheim2013] …vocabulary reuse? [D’AquinWebSci13] …schema compliance (RDFS, schemas)? [HoganJWS2012] References: SPARQL Web-Querying Infrastructure: Ready for Action?, Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, Pierre-Yves Vandenbussche, International Semantic Web Conference 2013 (ISWC2013). Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013. Type Inference on Noisy RDF Data, Paulheim, H., Bizer, C., Semantic Web, ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp 510-525. An empirical survey of Linked Data conformance, Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., Journal of Web Semantics 14, 2012.
  • 5. What about data consistency? Analyzing Relative Incompleteness of Movie Descriptions in the Web of Data: A Case Study, Yuan, W., Demidova, E., Dietze, S., Zhu, X., International Semantic Web Conference 2014 (ISWC2014).
  • 6. Too many/diverse datasets, too little knowledge. Topics? Which datasets are useful & trustworthy for case XY (e.g. "learning about the solar system")? Which topics are covered? Types? Which datasets describe statistics, videos, slides, publications etc.? Quality? Currentness, dynamics, accessibility/reliability, data quantity & quality?
  • 7. Data mapping, linking and profiling. Diagram: a Dataset Catalog/Registry with Dataset Metadata (BIBO, AAISO, FOAF; topics db:Astronomy, db:Astro. Objects) contains datasets such as BBC Programme (po:Programme) and Yovisto Video (yov:Video). Entity & dataset disambiguation & linking [ESWC13]. Topic profile extraction [WWW13, ESWC14]. Schema mappings [WebSci13] (bibo:Film). Example records: BBC Programme: <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>; Yovisto Video: <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
  • 8. Schemas/vocabularies on the Web: XKCD 927 (https://xkcd.com/927/), applied to schemas & vocabularies.
  • 9. Schema assessment and mapping. Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties). Example types: po:Programme, sioc:Item, yov:Video (?). Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • 10. Schema assessment and mapping. Co-occurrence of data types (in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties). Co-occurrence after mapping into the most frequent schemas (201 frequent types mapped into 79 classes). Example types: bibo:Film, bibo:Document, po:Programme, sioc:Item, foaf:Document, yov:Video, typeX. Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
  • 11. Application: LinkedUp Data Catalog in a nutshell. RDF (VoID) dataset catalog: browse & query distributed datasets; federated queries using type mappings; live information about endpoint accessibility. http://data.linkededucation.org/linkedup/catalog/ http://datahub.io/group/linked-education (figure label: DBpedia categories)
  • 12. Towards profiling: dataset disambiguation/linking. Relatedness of entities, meaningfulness of paths? [ESWC13] Extraction of "topics" & relatedness of datasets [ESWC14]. Candidate categories: db:Astro. Objects? db:CartoonCharacters? Example records: BBC Programme (po:Programme): <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>; Yovisto Video (yov:Video): <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…>
  • 13. Dataset disambiguation/linking. Computation of connectivity scores between entities: combination of (i) a semantic (graph-based) connectivity score (SCS) with (ii) a Web co-occurrence-based measure (CBM), similar to NGD (Normalized Google Distance). For (i): adaptation of the Katz index from SNA (social network analysis) for (linked) data graphs, considering path number and path lengths of transversal properties. Example entities: db:Pluto (Dwarf Planet), db:Astronomical Objects, db:Sun, db:Astronomy; SCS = 0.32, CBM = 0.24. Example records: BBC Programme (po:Programme): <po:Programme …> <po:Series>Wonders of the Solar System</.> <po:Actor>Brian Cox</…> </po:Programme…>; Yovisto Video (yov:Video): <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…> Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013, 10th Extended Semantic Web Conference (May 2013).
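The two kinds of score named on slide 13 can be illustrated with a minimal Python sketch. It is not the formula from the ESWC 2013 paper: it shows a plain Katz-style score (paths between two entities, damped by length) and the standard Normalized Google Distance. The toy graph, the damping factor `beta`, the cut-off `max_len`, and the hit counts are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def count_paths(graph, source, target, max_len):
    """Count simple paths from source to target, grouped by length in edges."""
    counts = defaultdict(int)

    def dfs(node, visited, length):
        if node == target:
            counts[length] += 1
            return
        if length == max_len:
            return
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                dfs(nxt, visited | {nxt}, length + 1)

    dfs(source, {source}, 0)
    return dict(counts)


def katz_score(graph, source, target, beta=0.5, max_len=4):
    """Katz-style score: each path of length l contributes beta ** l."""
    paths = count_paths(graph, source, target, max_len)
    return sum(beta ** length * n for length, n in paths.items())


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts fx, fy, fxy and index size n."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical undirected toy graph (adjacency lists), not taken from DBpedia.
toy_graph = {
    "Pluto": ["Astronomical_objects", "Sun"],
    "Sun": ["Pluto", "Astronomical_objects"],
    "Astronomical_objects": ["Pluto", "Sun", "Astronomy"],
    "Astronomy": ["Astronomical_objects"],
}

print(katz_score(toy_graph, "Pluto", "Astronomy"))
print(ngd(fx=100, fy=100, fxy=10, n=10_000))
```

More and shorter paths raise the Katz-style score, while a smaller NGD means the two terms co-occur more often relative to their individual frequencies.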
  • 14. Entity linking: evaluation. Evaluation based on USA Today news items (80,000 entity pairs). Manually created gold standard (1,000 entity pairs). Baseline: Explicit Semantic Analysis (ESA) => CBM/SCS: "relatedness"; ESA: "similarity". Figure: Precision/Recall/F1 for SCS, CBM, ESA. Combining a co-occurrence-based and a semantic measure for entity linking, B. P. Nunes, S. Dietze, M.A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl, ESWC 2013, 10th Extended Semantic Web Conference (May 2013).
  • 15. "SCS Connector" demo: http://lod2.inf.puc-rio.br/scs/SemConnectivities SCS Connector: Quantifying and Visualising Semantic Paths between Entity Pairs, Nunes, B. P., Herrera, J. E. T., Taibi, D., Lopes, G. R., Casanova, M. A., Dietze, S., Demo Paper at 11th Extended Semantic Web Conference (ESWC2014), Heraklion, Crete, Greece (2014). *BEST ESWC2014 DEMO AWARD*
  • 16. Dataset profiling: what's the data about? Extracting representative (DBpedia) categories ("topic profile") & entities for arbitrary datasets. Sounds easy? But how to do that for 300+ datasets with < 50 bn triples? Scalability vs representativeness: sampling & ranking for a good scalability/accuracy balance [ESWC2014] (applied to all responsive LOD datasets). Diagram: Dataset Catalog/Registry, Dataset Metadata, db:Astronomy, db:Astro. Objects, db:Pluto (Dwarf Planet); Yovisto Video (yov:Video): <yo:Video …> <dc:title>Pluto & the Dwarf Planets</dc:title> … </yo:Video…> A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
  • 17. Efficient dataset profiling: method. 1. Sampling of resource instances (random sampling, weighted sampling, resource centrality sampling). 2. Entity and topic extraction (NER via DBpedia Spotlight, category mapping and expansion). 3. Normalisation and ranking (using graphical models such as PageRank with Priors, HITS with Priors and K-Step Markov). Result: weighted dataset-topic profile graph. A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles, Fetahu, B., Dietze, S., Nunes, B. P., Casanova, M. A., Nejdl, W., 11th Extended Semantic Web Conference (ESWC2014), Crete, Greece (2014).
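The ranking step on slide 17 names PageRank with Priors as one of the options. As a rough illustration only, the following Python sketch runs a prior-biased PageRank over a tiny dataset → entity → category graph. The graph, the node names, the restart probability `back_prob`, and the iteration count are assumptions made for the example and are not taken from the ESWC2014 paper; the function also assumes at least one positive prior weight.

```python
def pagerank_with_priors(edges, priors, back_prob=0.15, iterations=50):
    """Power iteration where the random walk restarts according to `priors`."""
    nodes = sorted({n for edge in edges for n in edge} | set(priors))
    total = sum(priors.values())  # assumed > 0
    prior = {n: priors.get(n, 0.0) / total for n in nodes}

    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = dict(prior)
    for _ in range(iterations):
        nxt = {n: back_prob * prior[n] for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                share = (1 - back_prob) * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its mass back to the prior distribution.
                for m in nodes:
                    nxt[m] += (1 - back_prob) * rank[n] * prior[m]
        rank = nxt
    return rank


# Hypothetical toy profile graph: one dataset, two sampled entities, two categories.
edges = [
    ("dataset", "Pluto"),
    ("dataset", "Sun"),
    ("Pluto", "Category:Dwarf_planets"),
    ("Pluto", "Category:Astronomical_objects"),
    ("Sun", "Category:Astronomical_objects"),
]
scores = pagerank_with_priors(edges, priors={"dataset": 1.0})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Putting all prior mass on the dataset node makes the scores dataset-specific: categories reached through more of the dataset's sampled entities end up ranked higher in its profile.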
  • 18. Dataset profiling: exploring LOD datasets/topics in a nutshell. http://data-observatory.org/lod-profiles/ Automatic extraction of dataset "topics" [ESWC2014] => RDF/VoID dataset profiles. Visualisation & exploration of the dataset-topic graph (datasets, topics, relationships). Includes all (responsive) datasets of the LOD Cloud.
  • 19. Dataset profiling: evaluation. Datasets & ground truth: Yovisto, Oxpoints, LAK Dataset, Semantic Web Dogfood; crowd-sourced topic indicators from datasets (keywords, tags); manual mapping to entities & category extraction (ranking according to frequency). Baselines: 1) LDA, 2) tf/idf (applied to entire datasets); topic extraction according to our approach, weighting/ranking based on term weight. Measure: NDCG @ rank l. Performance (time/NDCG) for different sampling strategies/sizes etc. Figure: NDCG (averaged over all datasets).
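For readers unfamiliar with the measure on slide 19, here is a small Python sketch of NDCG at rank l. It uses the common linear-gain, log2-discount variant and computes the ideal ordering from the judged list itself; whether the evaluation used exactly this variant is an assumption, and the relevance grades below are made up.

```python
import math


def dcg_at(relevances, l):
    """Discounted cumulative gain over the first l positions."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:l]))


def ndcg_at(relevances, l):
    """DCG at rank l divided by the DCG of the ideally ordered list."""
    ideal = dcg_at(sorted(relevances, reverse=True), l)
    return dcg_at(relevances, l) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of the topics returned for one dataset, in ranked order.
ranked_topic_relevance = [1, 3, 0, 2]
print(ndcg_at(ranked_topic_relevance, 3))
```

A perfectly ordered ranking scores 1.0; averaging the per-dataset values gives a figure comparable to the "averaged over all datasets" plot mentioned on the slide.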
  • 20. What (dataset) do these categories have in common? dbp:Category:1955_births, dbp:Category:People_from_London, dbp:Category:Buzzwords, dbp:Category:Semantic_Web, dbp:Category:Web_Services, dbp:Category:HTTP, dbp:Category:Unitarian_Universalists, dbp:Category:World_Wide_Web, dbp:Category:Royal_Medal_winners
  • 21. Diversity of category profile for a single publication (DBLP). Berners-Lee, Tim; Hendler, James; Lassila, Ora (2001). "The Semantic Web". Scientific American Magazine. Types: foaf:Person, foaf:Document. Entities: dbp:Tim_Berners-Lee, dbp:Semantic_Web. First-level categories (dcterms:subject): dbp:Category:1955_births, dbp:Category:People_from_London, dbp:Category:Buzzwords, dbp:Category:Semantic_Web, dbp:Category:Web_Services, dbp:Category:HTTP, dbp:Category:Unitarian_Universalists, dbp:Category:World_Wide_Web, dbp:Category:Royal_Medal_winners
  • 22. Type-specific exploration of dataset categories. http://data-observatory.org/led-explorer/ Type-specific views on datasets/categories: "Document" (foaf:document), "Person" (foaf:person), "Course" (aaiso:course). Currently applied to datasets in the LinkedUp Catalog only (as schema mappings are already available there). Exploring type-specific topic profiles of datasets: a demo for educational linked data, Taibi, D., Dietze, S., Fetahu, B., Fulantelli, G., Demo at International Semantic Web Conference 2014 (ISWC2014).
  • 23. data.l3s.de: the L3S DataHub
  • 24. KEYSTONE & PROFILES 2014. KEYSTONE (http://www.keystone-cost.eu/): semantic keyword-based search on structured data sources (2013-2017). Research network focused on distributed search, dataset profiling, to Semantic Web, Databases, etc. Open to new members (beyond Europe). PROFILES2014 (http://www.keystone-cost.eu/profiles): Dataset PROFIling & fEderated Search for Linked Data, workshop co-located with ESWC2014. IJSWIS Special Issue on … LD search & profiling (http://www.ijswis.org/?q=node/51/), deadline 8 December 2014.
  • 25. Summing up. Summary: increasing amounts of data require knowledge about the nature and relationships of datasets. Profiling: scalable methods for extracting dataset metadata. Interlinking: connectivity of entities or datasets. What about LD evolution? In RDF graphs (e.g. LOD Cloud), "all" nodes are connected. Impact of evolution on preservation, linking and enrichment? Which parts of datasets to preserve (entity "neighbourhood")? => semantic relatedness/relevance/entity retrieval. Link correctness in evolving LD? …
  • 26. Спасибо! Thank You! See also (general): http://purl.org/dietze, http://linkedup-project.eu, http://duraark.eu, http://data.l3s.de See also (data): http://data.l3s.de, http://data.linkededucation.org, http://lak.linkededucation.org Acknowledgements: Besnik Fetahu (L3S), Elena Demidova (L3S), Bernardo Pereira Nunes (PUC Rio), Marco Casanova (PUC Rio), Luiz Andre Paes Leme (PUC Rio), Giseli Lopes (PUC Rio), Davide Taibi (CNR, IT), Mathieu d’Aquin (Open University, UK) and many more…