HDL
Towards a Harmonized Dataset
Model for Open Data Portals
Ahmad Assaf, Raphaël Troncy and Aline Senart
@ahmadaassaf
PROFILES 15 – 2nd International Workshop on Dataset PROFIling & fEderated Search for Linked Data, 1st June 2015
Open Data/Linked Open Data
• Open Data (OD) is data that can be easily discovered, accessed, reused and redistributed by anyone [Davies et al. 2014]
• Open Data should be placed in the public domain under liberal terms of use and be available in electronic formats that are non-proprietary and machine-readable.
• Linked Open Data (LOD) refers to semantically rich, linked and machine-readable open data.
• Open Data has major benefits for citizens, businesses, societies and governments.
Metadata
Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage information resources. It supports:
• Data discovery, exploration and reuse
• Organization & identification
• Archiving & preservation
Data Portals/Data Management Systems
• Data Portals (Catalogs) are the entry points to discover published datasets
• Data Portals are curated collections of dataset metadata that provide a set of discovery and integration services.
• Data Portals can be public like datahub.io and publicdata.eu, or private like enigma.io and quandl.com
• Portals are built on top of Data Management Systems (DMS) like CKAN, DKAN and Socrata
Why a Harmonized Model?
• Exploring/discovering datasets for (re)use
• Defining a “minimal” set of information needed to build a “profile”
• Building tools that will automatically generate/validate metadata models
Dataset Models - DCAT
• The Data Catalog Vocabulary (DCAT)✝ is a W3C recommendation to facilitate interoperability between data catalogs on the web
• DCAT is an RDF vocabulary with three main classes: dcat:Catalog, dcat:Dataset and dcat:Distribution
• DCAT Profiles [extensions built upon DCAT]
  • DCAT-AP✝✝ defines a minimal set of properties that should be included in a dataset's profile by specifying mandatory and optional properties
  • The Asset Description Metadata Schema (ADMS)✝✝✝ is used to semantically describe assets (code lists, taxonomies, vocabularies)
✝ http://w3.org/TR/vocab-dcat/
✝✝ https://joinup.ec.europa.eu/asset/dcat_application_profile/description
✝✝✝ http://www.w3.org/TR/vocab-adms/
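To make the three DCAT classes concrete, the following minimal sketch nests a dcat:Distribution inside a dcat:Dataset inside a dcat:Catalog and prints the result as a JSON-LD-style document. The helper functions, URIs and titles are hypothetical placeholders chosen for illustration; only the DCAT and Dublin Core terms come from the vocabularies themselves.

```python
import json

# Namespace prefixes for the JSON-LD context.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def make_distribution(uri, download_url):
    """A dcat:Distribution: one concrete, downloadable form of a dataset."""
    return {
        "@id": uri,
        "@type": "dcat:Distribution",
        "dcat:downloadURL": {"@id": download_url},
    }


def make_dataset(uri, title, distributions):
    """A dcat:Dataset: the abstract dataset, linked to its distributions."""
    return {
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": distributions,
    }


def make_catalog(uri, title, datasets):
    """A dcat:Catalog: the portal-level collection of dataset descriptions."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:dataset": datasets,
    }


# All URIs below are made-up examples.
distribution = make_distribution(
    "http://example.org/dataset/1/csv", "http://example.org/files/data.csv"
)
dataset = make_dataset("http://example.org/dataset/1", "Example dataset", [distribution])
catalog = make_catalog("http://example.org/catalog", "Example portal", [dataset])

print(json.dumps(catalog, indent=2))
```

The nesting mirrors how a portal is typically read: the catalog lists datasets, and each dataset lists the files or endpoints through which it can be obtained.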
Dataset Models - VoID✝
• RDF vocabulary for interlinked datasets
• In addition to describing datasets, VoID describes the links between datasets
• VoID defines three main classes: void:Dataset, void:Linkset and void:subset
• A linkset in VoID is a subclass of a dataset, used for storing triples to express the interlinking relationship between datasets
✝ http://www.w3.org/TR/void/
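As a rough illustration of how a linkset is described, the sketch below builds the triples for one void:Linkset as plain Python tuples. The function name and the example URIs are assumptions made for this sketch; the VoID terms used (void:Linkset, void:subjectsTarget, void:objectsTarget, void:linkPredicate, void:triples) are taken from the VoID vocabulary.

```python
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def describe_linkset(linkset_uri, source_uri, target_uri, link_predicate, n_triples):
    """Return (subject, predicate, object) tuples describing one void:Linkset.

    source_uri is the dataset hosting the subjects of the links,
    target_uri is the dataset hosting their objects.
    """
    return [
        (linkset_uri, RDF_TYPE, VOID + "Linkset"),
        (linkset_uri, VOID + "subjectsTarget", source_uri),
        (linkset_uri, VOID + "objectsTarget", target_uri),
        (linkset_uri, VOID + "linkPredicate", link_predicate),
        (linkset_uri, VOID + "triples", n_triples),
    ]


# Hypothetical datasets A and B connected by 1200 owl:sameAs links.
description = describe_linkset(
    "http://example.org/void#A-to-B",
    "http://example.org/void#A",
    "http://example.org/void#B",
    OWL_SAME_AS,
    1200,
)

for triple in description:
    print(triple)
```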
Dataset Models – CKAN✝/DKAN✝✝
• The data model describes a set of entities (dataset, resource, group, tag)
• Allows additional information to be added via arbitrary key/value “extra” fields
• The core metadata is a restricted set of fields, provided as a JSON file
• Supports Linked Data and RDF by providing a complete and functional mapping of its model to LD formats
• CKAN supports descriptions of vocabularies
• DKAN is a Drupal-based DMS
✝ http://ckan.org/
✝✝ http://demo.getdkan.com/
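The split between core metadata and free-form extras can be sketched as follows. The JSON record is a made-up example; its shape (a list of key/value objects under "extras") is an assumption modeled on how CKAN-style portals commonly expose datasets and has not been checked against a specific CKAN version. The flatten_extras helper is hypothetical.

```python
import json

# Made-up dataset record; field names follow the entities named above.
package_json = """
{
  "name": "example-dataset",
  "title": "Example Dataset",
  "maintainer_email": "data@example.org",
  "resources": [
    {"name": "data.csv", "format": "CSV", "url": "http://example.org/data.csv"}
  ],
  "tags": [{"name": "transport"}],
  "extras": [
    {"key": "spatial-reference-system", "value": "EPSG:4326"},
    {"key": "frequency-of-update", "value": "monthly"}
  ]
}
"""


def flatten_extras(package):
    """Turn the list of key/value extras into a plain dictionary."""
    return {extra["key"]: extra["value"] for extra in package.get("extras", [])}


package = json.loads(package_json)
extras = flatten_extras(package)

print(package["title"])  # core field
print(extras)            # publisher-defined additions
```

Because extras are arbitrary, two portals can store the same fact under different keys, which is part of what a harmonized model has to reconcile.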
Dataset Models - Continued
Project Open Data (POD)✝✝✝
• Online collection of best practices and case studies to help data publishers
• POD data model is based on DCAT
• Similarly to DCAT-AP, POD defines three types of metadata elements: Required, Required-If and Expanded (optional)
• Metadata extensions using elements from the “Expanded” fields
Socrata✝
• Commercial platform to streamline data publishing, management, analysis and reuse.
• The model is designed specifically to represent tabular data
• The model covers a basic set of metadata properties and has good support for geospatial data
Schema.org✝✝
• A collection of schemas used to mark up HTML pages with structured data
• Covers many domains. We are interested in the Dataset schema, although we also use various properties from schemas like organizations, authors, etc.
✝ http://socrata.com/
✝✝ http://schema.org/
✝✝✝ https://project-open-data.cio.gov/
Ballmer effect, anyone?
https://xkcd.com/323/
Metadata Classification – Information Groups
• Organization: clustering or curation solely based on associations with specific administration parties
• Resource: actual raw data that can be downloaded or accessed directly, e.g. JSON, CSV, SPARQL endpoint
• Tag: descriptive knowledge about the dataset contents and structure. This can range from simple textual tags to semantically rich controlled terms
• Group: organizational units that share common semantics. They can be seen as a cluster or curation based on shared themes/categories
Metadata Classification – Information Types
Dataset metadata is classified into the following information types:
• General Information: title, description, id
• Ownership Information: author, maintainer_email
• Provenance Information: version, creation_date, update_date
• Access Information: URL, license_title, license_id
• Geospatial Information: bbox, layers
• Temporal Information: coverage_from, coverage_to
• Statistical Information: max_value, uniques, average
• Quality Information: rating, availability, freshness
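The information types listed on this slide can be expressed as a simple lookup that sorts the fields of a metadata record into types. This is a minimal sketch: the field names are the examples from the slide, while the classify_fields function, the case-insensitive matching and the "unclassified" bucket are assumptions added for illustration.

```python
# Example fields per information type, as listed on the slide.
INFORMATION_TYPES = {
    "general": {"title", "description", "id"},
    "ownership": {"author", "maintainer_email"},
    "provenance": {"version", "creation_date", "update_date"},
    "access": {"url", "license_title", "license_id"},
    "geospatial": {"bbox", "layers"},
    "temporal": {"coverage_from", "coverage_to"},
    "statistical": {"max_value", "uniques", "average"},
    "quality": {"rating", "availability", "freshness"},
}


def classify_fields(record):
    """Group the field names of a metadata record by information type."""
    grouped = {}
    for field in record:
        for info_type, known_fields in INFORMATION_TYPES.items():
            if field.lower() in known_fields:
                grouped.setdefault(info_type, []).append(field)
                break
        else:
            # No information type claimed this field.
            grouped.setdefault("unclassified", []).append(field)
    return grouped


# Made-up record for illustration.
record = {"title": "Bus stops", "author": "City Hall", "URL": "http://example.org", "foo": 1}
print(classify_fields(record))
```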
Harmonization Process
• Examine the model or vocabulary specification and documentation
• Examine existing datasets using these models
• Examine the source code for DMS
1. Map the information groups [resource, tag, group, organization]
2. Map the information types [general, ownership, provenance, etc.]
Mapping Information Types
Example: how the maintainer e-mail is expressed in each model
Model       Mapping
CKAN        maintainer_email
DKAN        maintainer_email
POD         ContactPoint -> hasEmail
Schema.org  CreativeWork:producer -> Person:email
VoID        void:Dataset -> dct:creator -> foaf:Person:givenName
DCAT        dcat:Dataset -> dct:creator -> foaf:Person:givenName
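One way to put such a mapping to work is a per-model lookup path that is followed through a nested record. The sketch below covers only the JSON-style models (CKAN, DKAN, POD), using the labels exactly as they appear on the slide; the record shapes, the resolve helper and the function names are assumptions for illustration, and the RDF-based models would need graph traversal instead.

```python
# Path to the maintainer e-mail per model, using the slide's labels.
MAINTAINER_EMAIL_PATHS = {
    "CKAN": ["maintainer_email"],
    "DKAN": ["maintainer_email"],
    "POD": ["ContactPoint", "hasEmail"],
}


def resolve(record, path):
    """Follow a list of keys through nested dictionaries; None if any step is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def maintainer_email(record, model):
    """Return the maintainer e-mail of a record expressed in the given model."""
    return resolve(record, MAINTAINER_EMAIL_PATHS[model])


# Made-up records for illustration.
ckan_record = {"title": "Bus stops", "maintainer_email": "data@example.org"}
pod_record = {"title": "Bus stops", "ContactPoint": {"hasEmail": "mailto:data@example.org"}}

print(maintainer_email(ckan_record, "CKAN"))
print(maintainer_email(pod_record, "POD"))
```

Keeping the paths in data rather than in code means that adding another model, or another harmonized field, is a matter of adding a table entry.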
Extra Information
• Examining the models, we noticed an abundance of information filled in “extras” fields
• Using Roomba, we generated aggregation reports to inspect those extras on LOD Cloud✝ and OpenAfrica✝✝
  1. extras>value:extras>name (extra field names and values)
  2. resources>resource_type:resources>name (types describing resources)
• 53% of the datasets in OpenAfrica have additional geospatial information attached (spatial-reference-system, spatial harvester, bbox-east-long, bbox-north-long, bbox-south-long, bbox-west-long)
• 16% of the datasets have additional provenance and ownership information (frequency-of-update, dataset-reference-date)
✝ http://datahub.io/group/lodcloud
✝✝ http://africaopendata.org/
https://github.com/ahmadassaf/opendata-checker/tree/master/model
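The kind of aggregation behind these percentages can be sketched in a few lines. This is not Roomba's implementation: the extras_report function, the key/value record shape and the four sample datasets are assumptions made up for illustration, and only the geospatial key names are taken from the slide.

```python
from collections import Counter

# Geospatial extras keys named on the slide.
GEO_KEYS = {
    "spatial-reference-system",
    "spatial harvester",
    "bbox-east-long",
    "bbox-north-long",
    "bbox-south-long",
    "bbox-west-long",
}


def extras_report(datasets):
    """Count how often each extras key occurs and the share of datasets with geospatial extras."""
    key_counts = Counter()
    with_geo = 0
    for dataset in datasets:
        keys = {extra["key"] for extra in dataset.get("extras", [])}
        key_counts.update(keys)  # each key counted once per dataset
        if keys & GEO_KEYS:
            with_geo += 1
    geo_share = 100.0 * with_geo / len(datasets) if datasets else 0.0
    return key_counts, geo_share


# Made-up sample portal with four datasets.
sample = [
    {"extras": [{"key": "bbox-east-long", "value": "31.2"}, {"key": "bbox-west-long", "value": "29.8"}]},
    {"extras": [{"key": "spatial-reference-system", "value": "EPSG:4326"}]},
    {"extras": [{"key": "frequency-of-update", "value": "monthly"}]},
    {"extras": []},
]

counts, geo_share = extras_report(sample)
print(counts.most_common())
print(geo_share)
```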
https://xkcd.com/927/
Questions?
Ahmad Assaf
http://ahmadassaf.com/
@ahmadaassaf
http://github.com/ahmadassaf


Editor's Notes

  1. An asset is something that can be opened and read using familiar desktop software, as opposed to raw data that needs to be processed.
  2. The interlinking is modelled by a linkset (void:Linkset). A linkset in VoID is a subclass of a dataset, used for storing triples to express the interlinking relationship between datasets. In each interlinking triple, the subject is a resource hosted in one dataset and the object is a resource hosted in another dataset. This modelling enables a flexible and powerful way to talk in great detail about the interlinking between two datasets, such as how many links exist, which kinds of links (e.g. owl:sameAs or foaf:knows) are present, or who claims these statements.
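The statistics mentioned in note 2 (how many links exist and of which kind) can be derived from the interlinking triples themselves. The sketch below is a minimal illustration that treats a triple as a link when its subject belongs to one dataset and its object to another, deciding membership by URI prefix; the prefix test, the linkset_stats function and the sample triples are all assumptions made for this example.

```python
from collections import Counter


def linkset_stats(triples, source_prefix, target_prefix):
    """Summarize links whose subject is in the source dataset and object in the target dataset."""
    by_predicate = Counter(
        predicate
        for subject, predicate, obj in triples
        if subject.startswith(source_prefix) and obj.startswith(target_prefix)
    )
    return {"total_links": sum(by_predicate.values()), "by_predicate": dict(by_predicate)}


# Made-up triples: three cross-dataset links and one triple internal to dataset A.
triples = [
    ("http://a.example/r1", "owl:sameAs", "http://b.example/x1"),
    ("http://a.example/r2", "owl:sameAs", "http://b.example/x2"),
    ("http://a.example/r1", "foaf:knows", "http://b.example/x3"),
    ("http://a.example/r1", "rdfs:seeAlso", "http://a.example/r9"),
]

stats = linkset_stats(triples, "http://a.example/", "http://b.example/")
print(stats)
```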