Motivation
Data on the Web
18/06/13 Lile 2013 – Rio de Janeiro
[Figure: eye-catching opener illustrating the growth and diversity of Web data]
Towards Integration of Web Data into a
coherent Educational Data Graph
LILE 2013: 3rd International Workshop on Learning and Education with the Web of Data
14 May 2013, Rio de Janeiro, Brazil
Davide Taibi – Besnik Fetahu – Stefan Dietze
(CNR – ITD, IT) (L3S Research Center, DE)
Outline
• Linked Open Data serving data-intensive applications
• Heterogeneity of datasets and schemas
• Is Linked Open Data really that easy to use, and what is it all about?
– Interlinking of datasets only at a superficial level
– Different schemas for similar resource classes across datasets
– Unstructured resource descriptions
– Best-case scenario: very abstract topic definitions
– Difficult to query for a subset of resources and datasets for a specific topic
• Our approach
– Schema level integration
– Enhanced dataset & resource descriptions
– Instance level integration
– Scalable annotation extraction
– Clustering and correlation of datasets
Introduction
• Large amounts of publicly available Linked Open Data of educational relevance
• Difficulties in providing large-scale integration
• Dataset and resource description annotation
• Clustering and dataset interlinking
Educational Data
Steps towards a Linked Education Data Graph
Schema Level Integration
http://data.linkededucation.org/ns/linked-education.rdf
Schema Level Integration
http://data.linkededucation.org/ns/linked-education.rdf
LinkedUniversities Dataset
Schema Level Integration
• VoID-based schema:
– http://data.linkededucation.org/ns/linked-education.rdf
– Dataset cataloging and classification
– Mappings (types, properties)
• Datasets:
– LinkedUniversities Dataset
– mEducator
– Europeana
– DBLP-L3S
– BBC programmes
– ACM publications
• Imported resources for clustering experiments:
– 6 million distinct resources
– 97 million RDF triples
– 21.6 GB of data
• SPARQL endpoint:
– http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf
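The "Mappings (types, properties)" item amounts to a lookup from source-specific predicates and classes to one shared vocabulary. The Python sketch below illustrates that idea under stated assumptions: `PROPERTY_MAP`, `TYPE_MAP` and the `example.org` class URIs are hypothetical choices built from Dublin Core and RDFS terms. They do not reproduce the mappings actually defined in linked-education.rdf.

```python
# Hypothetical mappings: source-specific URI -> shared-schema URI.
# Illustrative only; NOT the mappings defined in linked-education.rdf.
DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

PROPERTY_MAP = {
    DC + "title": DCT + "title",
    RDFS_LABEL: DCT + "title",
    DC + "description": DCT + "description",
}
TYPE_MAP = {
    "http://example.org/source#Course": "http://example.org/shared#LearningResource",
}


def map_triple(s, p, o):
    """Rewrite a triple's predicate, or for rdf:type its class, to the shared schema."""
    if p == RDF_TYPE:
        return (s, p, TYPE_MAP.get(o, o))
    return (s, PROPERTY_MAP.get(p, p), o)


def integrate(triples):
    """Map every (subject, predicate, object) triple; unmapped terms pass through unchanged."""
    return [map_triple(s, p, o) for s, p, o in triples]
```

Because unmapped predicates and classes are kept as they are, nothing is lost during integration. Data without a mapping remains queryable under its original URI.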
Instance-level integration
<http://dbpedia.org/page/Gravitation>
<http://dbpedia.org/page/Strong>
<http://dbpedia.org/page/Dense>
• DBpedia Spotlight as the NER & NED tool
• Annotation of unstructured content
• Selective and scalable annotation
• Annotation of tokens of different sizes
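A single DBpedia Spotlight annotation call can be sketched as follows. The endpoint URL, the `text`/`confidence` parameters and the JSON keys (`Resources`, `@URI`, `@surfaceForm`, `@types`) are assumptions based on the present-day public Spotlight REST service. This is not the client the authors used in 2013, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public endpoint of the DBpedia Spotlight REST service (not given on the slides).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text, confidence=0.5):
    """Encode the text and a confidence threshold as query parameters."""
    return SPOTLIGHT_URL + "?" + urlencode({"text": text, "confidence": confidence})


def parse_annotations(reply):
    """Turn a Spotlight JSON reply into (surface form, DBpedia URI, types) tuples.

    Spotlight omits the "Resources" key when no entity is found.
    """
    return [
        (r["@surfaceForm"], r["@URI"], [t for t in r.get("@types", "").split(",") if t])
        for r in reply.get("Resources", [])
    ]


def annotate(text, confidence=0.5):
    """Send one annotation request (requires network access)."""
    req = Request(build_annotate_url(text, confidence), headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return parse_annotations(json.load(resp))
```

Request construction and reply parsing are kept separate from the network call, so they can be checked offline. Each distinct text chunk also maps to exactly one HTTP request, which matters for the scalability measures reported later.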
Instance-level integration
Characteristics of enrichments
• Disambiguation
• Acronym detection (e.g. “dns”, “gmt”)
• Synonym detection (e.g. “globe”, “earth”)
• Context detection (e.g. “apple” the fruit vs. “apple” the computer company)
<http://dbpedia.org/page/Gravitation>
Correlation and Clustering
[Figure: example resource network with annotation nodes Gravitation, Equations, Earth]
• Annotations are used to build a network of resources, in which two resources are connected by an edge when they share annotations.
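The resource network can be built efficiently from an inverted index (entity → resources), which avoids comparing every pair of resources. The sketch below makes two assumptions: an edge's weight is the number of shared annotations, and the resource IDs and entity names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_resource_network(annotations):
    """Return {(r1, r2): number of shared entities} for every pair sharing at least one.

    annotations: dict mapping a resource ID to the entity URIs extracted for it.
    """
    # Inverted index: entity -> set of resources annotated with it.
    index = defaultdict(set)
    for res, entities in annotations.items():
        for entity in set(entities):
            index[entity].add(res)

    # Every pair of resources sharing an entity gets one unit of edge weight per entity.
    edges = defaultdict(int)
    for resources in index.values():
        for a, b in combinations(sorted(resources), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

Only resources that actually co-occur under some entity are paired. For sparse annotations, this keeps the work far below the quadratic all-pairs comparison.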
Correlation and Clustering
• Methods used for clustering
• Based on the shared enrichments
• Naïve
• Based on the ef-irf (Enrichment Frequency-Inverse Resource Frequency) index
• Jaccard
• Cosine
Different thresholds were used to generate clusters
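The slides name the ef-irf index but give no formula. The sketch below therefore assumes a tf-idf analogue: ef(e, r) = occurrences of entity e in resource r divided by the number of enrichments of r, and irf(e) = log(N / n_e), where N is the number of resources and n_e the number annotated with e. Clusters are taken as connected components of pairs whose similarity (cosine on ef-irf vectors, or Jaccard on entity sets) meets a threshold. This single-linkage grouping is a swapped-in assumption, not necessarily the authors' clustering procedure.

```python
import math
from collections import Counter


def ef_irf_vectors(annotations):
    """Weight each entity per resource by ef * irf (assumed tf-idf analogue)."""
    n = len(annotations)
    resource_freq = Counter()  # number of resources in which each entity occurs
    for entities in annotations.values():
        resource_freq.update(set(entities))
    vectors = {}
    for res, entities in annotations.items():
        counts = Counter(entities)
        total = sum(counts.values())
        vectors[res] = {e: (c / total) * math.log(n / resource_freq[e]) for e, c in counts.items()}
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(e, 0.0) for e, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def threshold_clusters(resources, sim, threshold):
    """Union-find: resources linked by sim >= threshold end up in the same cluster."""
    parent = {r: r for r in resources}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rs = list(parent)
    for i in range(len(rs)):
        for j in range(i + 1, len(rs)):
            if sim(rs[i], rs[j]) >= threshold:
                parent[find(rs[i])] = find(rs[j])
    groups = {}
    for r in rs:
        groups.setdefault(find(r), set()).add(r)
    return sorted(sorted(g) for g in groups.values())
```

The two measures can disagree at the same threshold. An entity shared by many resources gets a low irf weight, so it contributes little to the cosine score, while Jaccard counts it like any other shared entity.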
Evaluation
Three evaluation stages:
• Quantitative & qualitative
– Assess annotation accuracy of the exhaustive and scalable approaches
– Measure standard precision/recall metrics
– 250 resources per dataset used for assessment
• Performance
– Scalability gains
Quantitative Evaluation
Context             #Resources   #Annotations   #Entity types
ACM                 249          200            239
mEducator           250          495            355
BBC                 250          1364           769
LinkedUniversities  243          166            283
DBLP                250          295            161
Europeana           249          938            672
Total               1491         3458           937
• The number of extracted entities grows with the length of a resource's textual description
• Long texts yield up to 87 distinct entities and more than 200 entity-type associations
Qualitative Evaluation
• Human evaluators measured annotation accuracy
• 2000 annotations were assessed for each approach (exhaustive and scalable)
• The exhaustive approach was assessed by 32 evaluators, with an average of 63 tasks per user; the scalable approach by 23 evaluators, with an average of 87 completed tasks per user
              Precision   Recall
Exhaustive    0.82        0.429
Scalable      0.77        0.687
∆[S-E]        -0.05       +0.26
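The following check confirms that the ∆ row equals Scalable minus Exhaustive (0.77 − 0.82 = −0.05; 0.687 − 0.429 ≈ +0.26). The sketch also computes F1, the harmonic mean of precision and recall. F1 is a derived figure that the slides do not report.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Precision/recall values as reported in the table above.
REPORTED = {"Exhaustive": (0.82, 0.429), "Scalable": (0.77, 0.687)}

delta_p = REPORTED["Scalable"][0] - REPORTED["Exhaustive"][0]  # -0.05
delta_r = REPORTED["Scalable"][1] - REPORTED["Exhaustive"][1]  # +0.258, shown as +0.26

for name, (p, r) in REPORTED.items():
    print(f"{name}: F1 = {f1(p, r):.2f}")
# Exhaustive: F1 = 0.56
# Scalable: F1 = 0.73
```

On this combined measure, the scalable approach's gain in recall outweighs its small loss in precision.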
Performance Evaluation
Size k   No filtering   Filtered: resource level   Filtered: dataset level
1        53089          24850                      7464
2        51346          17919                      13281
3        49603          11800                      9607
4        47871          7793                       6432
5        46153          5184                       4289
6        44480          3529                       2922
• Reduction of the textual content analyzed in the annotation phase:
– Keeping only terms tagged {NN, NNP, NNPS} reduces the amount of text by almost 40%
– Across token sizes, the reduction reaches up to 86%
• Reduced complexity of the NER task performed by DBpedia Spotlight:
– Fewer HTTP requests
– Similar chunks of text are not annotated repeatedly
• Significant gains in execution time: 3.5 hours vs. 20 minutes
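The filtering steps above can be sketched as follows. The sketch assumes text that is already POS-tagged with Penn Treebank tags (any tagger would do), and it reads "token size k" as runs of k consecutive noun tokens. The three counts only loosely mirror the performance table's columns: annotating every chunk, deduplicating within each resource, and deduplicating across the whole dataset. These definitions and the helper names are hypothetical, not the authors' exact procedure.

```python
NOUN_TAGS = {"NN", "NNP", "NNPS"}


def noun_chunks(tagged, k):
    """Return every k-gram of consecutive noun tokens in a POS-tagged text.

    tagged: list of (word, tag) pairs.
    """
    chunks, run = [], []
    for word, tag in list(tagged) + [("", "END")]:  # sentinel flushes the last run
        if tag in NOUN_TAGS:
            run.append(word)
        else:
            chunks.extend(" ".join(run[i:i + k]) for i in range(len(run) - k + 1))
            run = []
    return chunks


def request_counts(tagged_resources, k):
    """Annotation requests needed without deduplication, per resource, and per dataset."""
    per_resource = [noun_chunks(t, k) for t in tagged_resources.values()]
    no_dedup = sum(len(c) for c in per_resource)
    resource_level = sum(len(set(c)) for c in per_resource)
    dataset_level = len(set().union(*per_resource))
    return no_dedup, resource_level, dataset_level
```

At the dataset level, each distinct chunk is sent to the annotator only once and its result is reused by every resource containing it. This is the source of the savings in HTTP requests.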
Conclusion
• Large-scale educational data graph
• Well-interlinked datasets at schema and instance level
• Enhanced dataset and resource descriptions
• Scalable annotation procedure
• EF-IRF clustering approach
• Clusters and correlated datasets
Thank you!
Questions?
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Towards Integration of Web Data into a coherent Educational Data Graph

  • 1. Motivation: Data on the Web
    18/06/13 Lile 2013 – Rio de Janeiro
    Some eye-catching opener illustrating growth and/or diversity of web data
    Towards Integration of Web Data into a coherent Educational Data Graph
    LILE 2013: 3rd International Workshop on Learning and Education with the Web of Data
    14 May 2013, Rio de Janeiro, Brazil
    Davide Taibi (CNR – ITD, IT), Besnik Fetahu and Stefan Dietze (L3S Research Center, DE)
  • 2. Outline
    • Linked Open Data serving data-intensive applications
    • Heterogeneity of datasets and schemas
    • Is it really that easy to use Linked Open Data, and what are the datasets about?
      – Datasets are interlinked only at a superficial level
      – Different schemas for similar resource classes across datasets
      – Unstructured resource descriptions
      – Best-case scenario: very abstract topic definitions
      – Difficult to query for the subset of resources and datasets relevant to a specific topic
    • Our approach
      – Schema-level integration
      – Enhanced dataset and resource descriptions
      – Instance-level integration
      – Scalable annotation extraction
      – Clustering and correlation of datasets
  • 3. Introduction
    • Large amounts of publicly available Linked Open Data of educational relevance
    • Difficulties in providing large-scale integration
    • Dataset and resource description annotation
    • Clustering and dataset interlinking
    Figure: Educational Data
  • 4. Steps towards a Linked Education Data Graph
  • 5. Schema Level Integration
    http://data.linkededucation.org/ns/linked-education.rdf
  • 6. Schema Level Integration
    http://data.linkededucation.org/ns/linked-education.rdf
    Figure: LinkedUniversities Dataset
  • 7. Schema Level Integration
    • VoID-based schema:
      – http://data.linkededucation.org/ns/linked-education.rdf
      – Dataset cataloging and classification
      – Mappings (types, properties)
    • Datasets:
      – LinkedUniversities Dataset
      – mEducator
      – Europeana
      – DBLP-L3S
      – BBC programmes
      – ACM publications
    • Imported resources for clustering experiments:
      – 6 million distinct resources
      – 97 million RDF triples
      – 21.6 GB of data
    • SPARQL endpoint:
      – http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf
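The slide lists a VoID-based schema and a SPARQL endpoint. As a minimal sketch, assuming the repository types each catalogued dataset as `void:Dataset` and gives it a `dcterms:title` (the title property is an assumption, not read from the schema file), the query below lists the datasets, and the hypothetical helper `sparql_get_url` encodes it as a SPARQL 1.1 Protocol GET request. The endpoint URL is copied from the slide; it is an OpenRDF Workbench address that may no longer be online, and the protocol endpoint itself may live at a different path.

```python
from urllib.parse import urlencode

# Endpoint as printed on the slide (an OpenRDF Workbench address); it may be
# offline, and the SPARQL protocol endpoint itself may sit at another path.
ENDPOINT = (
    "http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/"
    "repositories/linked-learning-rdf"
)

# List every dataset typed as void:Dataset, with its title when present.
# Using dcterms:title for the title is an assumption about the data.
VOID_DATASETS_QUERY = """\
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset dcterms:title ?title }
}"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request (parameter 'query')."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, VOID_DATASETS_QUERY)
```

Sending the request (for example with `urllib.request` and an `Accept: application/sparql-results+json` header) is left out, since the endpoint may not respond.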
  • 8. Instance-level integration
    • DBpedia Spotlight as the named entity recognition and disambiguation (NER & NED) tool
    • Annotation of unstructured content
    • Selective & scalable annotation
    • Annotation of tokens of different sizes
    Figure: example annotations <http://dbpedia.org/page/Gravitation> <http://dbpedia.org/page/Strong> <http://dbpedia.org/page/Dense>
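A minimal sketch of calling DBpedia Spotlight for one resource description, assuming the public REST service at `https://api.dbpedia-spotlight.org/en/annotate` rather than whatever instance the authors used. The helper names are hypothetical, and the JSON layout (a `Resources` list whose items carry `@surfaceForm` and `@URI`) is assumed from Spotlight's public output; check it against the live service before relying on it. Spotlight identifies entities with `http://dbpedia.org/resource/...` URIs; the `/page/` URLs on the slide are their HTML views.

```python
import json
from urllib.parse import urlencode

# Public DBpedia Spotlight service (assumption: the authors may have used
# a different or self-hosted instance).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def spotlight_request_url(text: str, confidence: float = 0.5) -> str:
    """GET URL for /annotate; send it with the header 'Accept: application/json'."""
    return SPOTLIGHT_URL + "?" + urlencode({"text": text, "confidence": confidence})


def parse_annotations(payload: str) -> list[tuple[str, str]]:
    """Extract (surface form, DBpedia URI) pairs from a JSON annotate response.

    Assumes a top-level "Resources" list whose items carry "@surfaceForm" and
    "@URI"; the list is treated as empty when the key is missing.
    """
    data = json.loads(payload)
    return [(r["@surfaceForm"], r["@URI"]) for r in data.get("Resources", [])]
```

Parsing a hand-written response such as `{"Resources": [{"@URI": "http://dbpedia.org/resource/Gravitation", "@surfaceForm": "gravitation"}]}` yields `[("gravitation", "http://dbpedia.org/resource/Gravitation")]`.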
  • 9. Instance-level integration
    Characteristics of enrichments:
    • Disambiguation
    • Acronym detection (e.g. “dns”, “gmt”)
    • Synonym detection (e.g. “globe”, “earth”)
    • Context detection (e.g. “apple” the fruit vs. “apple” the computer)
    Figure: example annotation <http://dbpedia.org/page/Gravitation>
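Spotlight performs the disambiguation itself. To illustrate what context detection means for the “apple” example, here is a deliberately simplified, Lesk-style stand-in, not Spotlight's actual method: each candidate resource carries a few context words, and the candidate with the largest overlap with the surrounding sentence wins. The sense inventory, its word lists, and the `disambiguate` helper are hypothetical.

```python
from typing import Optional

# Hypothetical sense inventory: candidate DBpedia resources for a surface form,
# each with a few hand-picked context words (illustration only).
SENSES = {
    "apple": {
        "http://dbpedia.org/resource/Apple": {"fruit", "tree", "orchard", "juice", "eat"},
        "http://dbpedia.org/resource/Apple_Inc.": {"computer", "iphone", "mac", "software", "company"},
    },
}


def disambiguate(surface: str, sentence: str) -> Optional[str]:
    """Return the candidate whose context words overlap most with the sentence."""
    candidates = SENSES.get(surface.lower())
    if not candidates:
        return None
    context = set(sentence.lower().split())
    # On a tie, max() keeps the first candidate in dictionary order.
    return max(candidates, key=lambda uri: len(candidates[uri] & context))
```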
  • 10. Correlation and Clustering
    • Annotations are used to construct a network of resources, in which an edge links two resources that share annotations.
    Figure labels: Gravitation, Equations, Earth
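A minimal sketch of the network construction, assuming each resource's annotations are available as a set of DBpedia URIs. An inverted index from annotation to resources means only resources that actually share an annotation are paired. The edge weight here is the number of shared annotations, which is one plausible choice rather than a definition given on the slide; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_resource_network(annotations: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Link every pair of resources that share at least one annotation.

    `annotations` maps a resource id to the set of annotation URIs attached to it.
    Returns {(resource_a, resource_b): number_of_shared_annotations}, with a < b.
    """
    # Inverted index: annotation URI -> resources carrying it, so only
    # co-occurring resources are compared instead of all pairs.
    index: dict[str, list[str]] = defaultdict(list)
    for resource, uris in annotations.items():
        for uri in uris:
            index[uri].append(resource)

    edges: dict[tuple[str, str], int] = defaultdict(int)
    for resources in index.values():
        # Sorting gives each unordered pair a single, stable key.
        for a, b in combinations(sorted(resources), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

For example, two resources both annotated with Gravitation and Earth get an edge of weight 2, while a third resource that shares only Equation with the second gets an edge of weight 1.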
  • 11. Correlation and Clustering
    • Methods used for clustering:
      – Naïve: based on the shared enrichments
      – Based on the ef-irf (Enrichment Frequency-Inverse Resource Frequency) index: Jaccard and cosine similarity
    • Different thresholds were used to generate clusters
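The slide names the ef-irf index but does not define it. The sketch below assumes the tf-idf analogy its name suggests: ef(e, r) is the share of resource r's enrichments that are e, and irf(e) = log(N / n_e), with N resources of which n_e carry e. Resources are compared with cosine similarity and with a weighted Jaccard (sum of minima over sum of maxima of the weights, which reduces to ordinary set Jaccard for 0/1 weights), and clusters are the connected components of the graph that keeps only pairs at or above a threshold. All of these choices, and the function names, are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable

Weights = dict[str, float]


def ef_irf(resources: dict[str, list[str]]) -> dict[str, Weights]:
    """Assumed form: ef(e, r) = count(e in r) / len(r); irf(e) = log(N / n_e)."""
    n = len(resources)
    doc_freq = Counter(e for enrichments in resources.values() for e in set(enrichments))
    weights = {}
    for r, enrichments in resources.items():
        counts = Counter(enrichments)
        total = len(enrichments)
        weights[r] = {e: (c / total) * math.log(n / doc_freq[e]) for e, c in counts.items()}
    return weights


def cosine(u: Weights, v: Weights) -> float:
    dot = sum(w * v[e] for e, w in u.items() if e in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_jaccard(u: Weights, v: Weights) -> float:
    keys = u.keys() | v.keys()
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def threshold_clusters(
    weights: dict[str, Weights],
    sim: Callable[[Weights, Weights], float],
    threshold: float,
) -> list[set[str]]:
    """Connected components over pairs whose similarity is >= threshold (union-find)."""
    parent = {r: r for r in weights}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(weights)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if sim(weights[a], weights[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for r in ids:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())
```

Because raising the threshold only removes edges, a higher threshold can split clusters but never merge them, which is what makes sweeping several thresholds informative.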
  • 12. Evaluation
    Three evaluation stages:
    • Quantitative & qualitative
      – Assess annotation accuracy for the exhaustive and scalable approaches
      – Measure standard precision/recall metrics
      – 250 resources per dataset used for the assessment
    • Performance
      – Gains in terms of scalability
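For reference, the standard metrics named on the slide as a small helper (the function name is hypothetical). Here a true positive would be an annotation judged correct, a false positive one judged wrong, and a false negative an entity the approach missed; how the authors counted missed entities is not stated on the slides.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Empty denominators give 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```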
  • 13. Quantitative Evaluation
    Dataset               #Resources  #Annotations  #Entity Types
    ACM                          249           200            239
    mEducator                    250           495            355
    BBC                          250          1364            769
    LinkedUniversities           243           166            283
    DBLP                         250           295            161
    Europeana                    249           938            672
    Total                       1491          3458            937
    • The number of extracted entities is related to the length of a resource's textual description
    • For long texts, up to 87 distinct entities and more than 200 entity-type associations are extracted
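The slide relates the number of extracted entities to description length. The table does not report description lengths, but its counts do show how unevenly annotations are spread across datasets. This sketch copies the counts from the table (the helper name is hypothetical), computes annotations per resource, and checks the totals row.

```python
# Counts copied from the quantitative-evaluation table on this slide.
TABLE = {
    # dataset: (#resources, #annotations)
    "ACM": (249, 200),
    "mEducator": (250, 495),
    "BBC": (250, 1364),
    "LinkedUniversities": (243, 166),
    "DBLP": (250, 295),
    "Europeana": (249, 938),
}


def annotations_per_resource(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average number of annotations extracted per evaluated resource."""
    return {name: ann / res for name, (res, ann) in table.items()}


total_resources = sum(res for res, _ in TABLE.values())
total_annotations = sum(ann for _, ann in TABLE.values())
density = annotations_per_resource(TABLE)
```

BBC comes out highest at about 5.5 annotations per resource, and LinkedUniversities lowest at under 0.7.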
  • 14. Qualitative Evaluation
    • Human evaluators measured annotation accuracy
    • 2000 annotations were assessed for each approach (exhaustive and scalable)
    • The exhaustive approach was assessed by 32 evaluators (on average 63 tasks per user), the scalable approach by 23 evaluators (on average 87 tasks per user)
    Approach                    Precision  Recall
    Exhaustive                       0.82   0.429
    Scalable                         0.77   0.687
    ∆ (Scalable − Exhaustive)       -0.05   +0.26
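Worked arithmetic for the last row of the table: the ∆ values equal the scalable score minus the exhaustive score, so the scalable approach gives up 5 points of precision for roughly 26 points of recall.

```latex
\begin{aligned}
\Delta P &= P_{\text{scalable}} - P_{\text{exhaustive}} = 0.77 - 0.82 = -0.05 \\
\Delta R &= R_{\text{scalable}} - R_{\text{exhaustive}} = 0.687 - 0.429 = 0.258 \approx +0.26
\end{aligned}
```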
  • 15. Performance Evaluation
    Token size k  No filtering  Filtered: resource level  Filtered: dataset level
               1         53089                     24850                     7464
               2         51346                     17919                    13281
               3         49603                     11800                     9607
               4         47871                      7793                     6432
               5         46153                      5184                     4289
               6         44480                      3529                     2922
    • Reduction of the textual content to be analyzed in the annotation phase:
      – Keeping only terms tagged {NN, NNP, NNPS} reduces the amount of text by almost 40%
      – For various token sizes, the reduction goes up to 86%
    • Reduced complexity of the NER task performed by DBpedia Spotlight:
      – Fewer HTTP requests
      – Similar chunks of text are not annotated repeatedly
    • Significant gains in execution time: 3.5 hours vs. 20 minutes
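A minimal sketch of the selective-annotation idea, under several assumptions: the text is already POS-tagged with Penn Treebank tags (the tagger is not named on the slides), a size-k chunk is k consecutive tokens and is kept only if every token carries one of the tags listed on the slide (so for k = 1 it is exactly that tag filter; NNS is omitted because the slide omits it), and the three table columns correspond to sending every chunk, de-duplicating per resource, and de-duplicating across the whole dataset. The function names and this reading of the columns are hypothetical. De-duplicating at dataset level means each distinct chunk is sent to Spotlight once and its result reused, which is where a saving in HTTP requests would come from.

```python
# Penn Treebank noun tags kept by the filter, as listed on the slide.
NOUN_TAGS = {"NN", "NNP", "NNPS"}

Tagged = list[tuple[str, str]]  # (token, POS tag)


def windows(tagged: Tagged, k: int, nouns_only: bool) -> list[str]:
    """All runs of k consecutive tokens; optionally only runs made entirely of noun tags."""
    out = []
    for i in range(len(tagged) - k + 1):
        window = tagged[i:i + k]
        if nouns_only and any(tag not in NOUN_TAGS for _, tag in window):
            continue
        out.append(" ".join(tok.lower() for tok, _ in window))
    return out


def requests_needed(dataset: dict[str, Tagged], k: int, level: str) -> int:
    """Count annotation calls under three hypothetical strategies.

    "none":     every window of every resource is sent.
    "resource": noun-only windows, de-duplicated within each resource.
    "dataset":  noun-only windows, de-duplicated across the whole dataset.
    """
    if level == "none":
        return sum(len(windows(t, k, nouns_only=False)) for t in dataset.values())
    if level == "resource":
        return sum(len(set(windows(t, k, nouns_only=True))) for t in dataset.values())
    if level == "dataset":
        seen: set[str] = set()
        for t in dataset.values():
            seen.update(windows(t, k, nouns_only=True))
        return len(seen)
    raise ValueError(f"unknown level: {level}")
```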
  • 16. Conclusion
    • Large-scale educational data graph
    • Well-interlinked datasets at the schema and instance levels
    • Enhanced dataset and resource descriptions
    • Scalable annotation procedure
    • EF-IRF clustering approach
    • Clusters and correlated datasets
  • 17. Thank you! Questions?

Editor's Notes

  1. http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf/summary
     Previous work: LinkedEducation 0.5+
     - VoID-based schema: URL etc. (dataset description and classification, alignments of types and properties)
     - Datasets: list (= subset of current linked education datasets)
     - But also imported resources for clustering experiments
     - Size: 6 million triples etc...
     - SPARQL endpoint, initial clustering results