Linked Statistical Data:
does it actually pay off?
Keynote at
3rd International Workshop on Semantic Statistics
(SemStats 2015)
11/10/2015
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
Disclaimers
• I am convinced of the potential benefits of joining Semantics and Statistics
• I will provide a very practical point of view
• From my own experience working with semantics and statistics
• I may not have followed all recent advances
• I am here to learn as well
• I will be provocative
• Food for thought…
Structure of the talk
Part I. Some RDF Data Cube datasets that I have created and (simple) applications on top of them
Part II. Lessons learned, reflections and a view towards the future
Why did they call me?
• I did not participate actively in the W3C RDF Data Cube discussions…
• I have not submitted any papers to SemStats
• I have not participated in the yearly challenges
• Even if I always say “I have to do it…”
• So what? Let’s look at what I have done in this area…
A few places where I have worked on data cubes
From the lab…
…to the market
Map4RDF
Map4RDF-iOS
Visualisation tools from the lab…
• Map4RDF and Map4RDF-iOS
• http://oeg-upm.github.io/map4rdf/
• Visualisation tools originally created for geographical Linked Data
• Faceted browsing
• Map-based visualisation
• Data inspection
• Data curation
• Extra features: bounding boxes, route planning, etc.
• And extended to RDF Data Cube-related data
Visualisation tools from the lab…
• https://youtu.be/us8wsG8HfKg
Geomarketing at Localidata
• https://youtu.be/DyLk3jInfkI
RDF Data Cube visualisations at Localidata
• https://youtu.be/aPqg_eoLVt4
Statistical data in Aragón
• Early work already available…
• http://opendata.aragon.es/
• Land use
• Recycling
• Lodging
• Work in progress now
with a list of 1940 reports
BBVA challenge
• https://youtu.be/sqfSsGQ3De8
Data Cube in the UNE 178301:2015 standard
• UNE 178301:2015
• Standard on Open Data for Smart Cities
• Organised by
• AENOR CTN 178 group
• Government and Mobility
• Government
• Open Data (led by Localidata)
• Formed by
• Several cities
• Private companies
• Nation-wide organisations
W3C RDF Data Cube proposed as the vocabulary for publishing open data about population
Structure of the talk
Part I. Some RDF Data Cube datasets that I have created and (simple) applications on top of them
Part II. Lessons learned, reflections and a view towards the future
Linked Statistical Data:
The Good, The Bad and the Ugly
Note: Not sure about the license of this image
Linked Statistical Data: The Good
• URIs everywhere
• Easier treatment and linking
• Effective visualisations
• Especially map-based
• They allow breaking the data silos of a statistical office (even when micro-data, macro-data and indicators are published)
• Easier cross-dataset querying
• E.g., give me the statistics about recycling of places with more than 5000 inhabitants and ruled by political party X.
• See my talk at the COLD workshop tomorrow to learn more
• A simpler way for outsiders to access SDMX/PC-Axis/TSV data
• Non-statisticians who know a bit of SPARQL and don’t dislike SKOS (XKOS)
The Good: URIs everywhere and ontologies
• What do the columns mean?
• unit PER
• geo\time FI
• What are their units of measurement?
• All these should be attached to a methodology page, but this is not always the case (e.g. Eurostat)
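To make the contrast concrete: Eurostat's bulk TSV downloads typically begin with a header such as "unit,geo\time" followed by one column per period, so codes like PER or FI arrive with no meaning attached. The Python sketch below (standard library only) turns one such row into RDF Data Cube observations in Turtle, giving every code its own URI that could then carry a label and a definition. The http://example.org/ URIs, the dataset name and the sample value are hypothetical, and the handling of ":" for missing values and of trailing flags is an assumption about the TSV layout, not a complete Eurostat parser.

```python
# Sketch: Eurostat-style TSV row -> RDF Data Cube observations in Turtle.
# Every URI under http://example.org/ is a hypothetical placeholder.
BASE = "http://example.org/"
PREFIXES = "@prefix qb: <http://purl.org/linked-data/cube#> .\n\n"


def parse_header(header_line):
    """Split the header into dimension names and time-period labels."""
    first, *periods = header_line.split("\t")
    dims = first.split(",")
    # The last field looks like 'geo\time': 'geo' is a dimension, while
    # 'time' names the axis that is spread across the remaining columns.
    dims[-1] = dims[-1].split("\\")[0]
    return dims, [p.strip() for p in periods]


def row_to_turtle(header_line, row_line, dataset="ds1"):
    """Emit one qb:Observation per non-missing cell of a data row."""
    dims, periods = parse_header(header_line)
    first, *cells = row_line.split("\t")
    codes = first.split(",")
    blocks = []
    for period, cell in zip(periods, cells):
        value = cell.strip().split(" ")[0]  # drop trailing flags, e.g. '12.5 p'
        if value in ("", ":"):              # ':' marks a missing value
            continue
        obs = f"<{BASE}obs/{dataset}/{'-'.join(codes)}/{period}>"
        lines = [f"{obs} a qb:Observation ;",
                 f"    qb:dataSet <{BASE}dataset/{dataset}> ;"]
        for dim, code in zip(dims, codes):
            # Each short code becomes a URI instead of an opaque string.
            lines.append(f"    <{BASE}dimension/{dim}> <{BASE}code/{dim}/{code}> ;")
        lines.append(f'    <{BASE}dimension/time> "{period}" ;')
        lines.append(f"    <{BASE}measure/obsValue> {value} .")  # assumes a numeric cell
        blocks.append("\n".join(lines))
    return PREFIXES + "\n\n".join(blocks)
```

For example, row_to_turtle("unit,geo\\time\t2014 \t2013 ", "PER,FI\t123\t: ") yields a single observation for 2014 (the 2013 cell is missing), with unit and geo pointing to code URIs.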
The Good: a single language for cross-dataset querying
• Get the municipalities with an area smaller than 50 square kilometres that have some land dedicated to airports, together with the number of hectares it covers, largest first
PREFIX aragodef: <http://opendata.aragon.es/def/Aragopedia#>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT ?municipio ?ha
WHERE {
  # Land-use observations (UsoSuelo dataset) and their hectares of airport land
  ?x a qb:Observation .
  ?x qb:dataSet <http://opendata.aragon.es/recurso/DataSet/UsoSuelo> .
  ?x aragodef:hectareasAeropuertos ?ha .
  # The municipality each observation refers to, and its total area
  ?x aragodef:refArea ?municipio .
  ?municipio a dbpedia:Municipality .
  ?municipio aragodef:areaTotal ?area .
  # Keep small municipalities that have some airport land
  FILTER(?area < 50 && ?ha != 0)
} ORDER BY DESC(?ha)
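A query like this can be sent to any SPARQL 1.1 endpoint over plain HTTP. Below is a minimal Python sketch using only the standard library, following the SPARQL 1.1 Protocol (a GET request with a "query" parameter) and the SPARQL JSON results format. ENDPOINT is a placeholder, since the slide does not name the endpoint that serves the Aragón data.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: the slide does not say which endpoint serves the Aragón data.
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint, query):
    """SPARQL 1.1 Protocol query via GET, asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results):
    """Flatten SPARQL JSON results into {variable: value} dicts (unbound variables omitted)."""
    names = results["head"]["vars"]
    return [{n: b[n]["value"] for n in names if n in b}
            for b in results["results"]["bindings"]]


def run(endpoint, query, timeout=30):
    """Send the query and return its rows (performs a network call)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return rows(json.load(resp))
```

With the query above stored in query_text, run(ENDPOINT, query_text) would return one dict per result with the keys "municipio" and "ha". Very long queries can exceed URL length limits; the protocol's POST form avoids that.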
Linked Statistical Data: The Bad
• RDF Data Cube datasets are too large
• Rather simple datasets easily reach 1 GB in Turtle
• Obviously, they can always be converted to HDT, compressed, etc.
• RDF Data Cube lacks a simple property telling us how the values of a dimension can be aggregated
• Can the values of dimension X in this dataset be aggregated by AVG, SUM, or something else?
• What about performance in general-purpose triple stores?
• Should analytical queries be run directly on RDF Data Cube data?
• Challenge and opportunity for improved data structures
• See also the work of Benedikt Kämpgen
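The aggregation gap can be illustrated with a small Python sketch. It assumes a hypothetical per-dimension hint (something like an ex:aggregator annotation; no such term exists in the W3C vocabulary) and refuses to roll up a dimension that has none. Why it matters: a resident count can be summed across districts, but summing it across years counts the same people several times, so a publisher might declare AVG for the time dimension instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical: RDF Data Cube defines no such property. Imagine a publisher
# annotating each dimension with something like ex:aggregator "SUM".
AGGREGATORS = {"SUM": sum, "AVG": mean, "MIN": min, "MAX": max}


def roll_up(observations, drop_dim, measure, hints):
    """Aggregate `drop_dim` away using the aggregator declared for it.

    observations: list of dicts {dimension: value, ..., measure: number}
    hints: {dimension: aggregator name}, the information the slide says is missing.
    """
    name = hints.get(drop_dim)
    if name not in AGGREGATORS:
        # Without a declared aggregator, defaulting to SUM could silently give wrong totals.
        raise ValueError(f"no aggregator declared for dimension {drop_dim!r}")
    groups = defaultdict(list)
    for obs in observations:
        # Group by all remaining dimensions (everything except the dropped one and the measure).
        key = tuple(sorted((d, v) for d, v in obs.items() if d not in (drop_dim, measure)))
        groups[key].append(obs[measure])
    return {key: AGGREGATORS[name](values) for key, values in groups.items()}
```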
Linked Statistical Data: The Ugly
• Generating (and validating) Data Structure Definitions is time-consuming and error-prone
• People in the audience, how do you do it?
• Manual/ad-hoc transformations (e.g., OpenRefine, Kettle) into RDF Data Cube may lead to errors when the results are loaded in CubeViz, OpenCube, etc.
• How can I run tests?
• We need simple services for developers to use
• An easy-to-understand REST API
• And Linked Data for observations
• Does it make sense?
Ugly things can always become pretty
Welcome to our Linked Statistical Data beauty center
Generating data structure definitions (I)
Generating data structure definitions (II)
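One way to make DSD generation less error-prone is to derive it from a short list of component properties instead of writing it by hand. This Python sketch emits a qb:DataStructureDefinition in Turtle using qb:component, qb:dimension, qb:measure and qb:order. It is not the tool shown on the slides, and the example URIs are hypothetical.

```python
QB = "http://purl.org/linked-data/cube#"


def dsd_turtle(dsd_uri, dimensions, measures):
    """Return a Turtle qb:DataStructureDefinition for the given component properties.

    dimensions, measures: lists of property URIs; dimensions get qb:order
    values following list order.
    """
    if not dimensions or not measures:
        raise ValueError("a DSD needs at least one dimension and one measure")
    components = [f"[ qb:dimension <{d}> ; qb:order {i} ]"
                  for i, d in enumerate(dimensions, start=1)]
    components += [f"[ qb:measure <{m}> ]" for m in measures]
    lines = [f"@prefix qb: <{QB}> .", "",
             f"<{dsd_uri}> a qb:DataStructureDefinition ;",
             "    qb:component",
             # One blank-node component per property, as a comma-separated object list.
             "        " + " ,\n        ".join(components) + " .", ""]
    # Declare the component properties with their Data Cube types.
    lines += [f"<{d}> a qb:DimensionProperty ." for d in dimensions]
    lines += [f"<{m}> a qb:MeasureProperty ." for m in measures]
    return "\n".join(lines) + "\n"
```

Attributes such as a unit of measure would be added the same way, as further components using qb:attribute.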
Tests and validators for RDF Data Cube datasets
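The W3C RDF Data Cube Recommendation defines integrity constraints for well-formed cubes, expressed there as SPARQL ASK queries. As a sketch of what an automated test can look like, the Python code below re-implements four of them over an in-memory list of (subject, predicate, object) string triples: IC-1 (exactly one qb:dataSet per observation), IC-2 (exactly one qb:structure per dataset), IC-11 (every dimension in the DSD has a value) and IC-12 (no two observations of a dataset share all dimension values). It is simplified (for example, it only follows qb:dimension, not qb:componentProperty) and is no substitute for running the normative queries.

```python
from collections import defaultdict

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def check(triples):
    """Return violations of IC-1, IC-2, IC-11 and IC-12 (simplified)."""
    idx = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        idx[s][p].append(o)

    def objs(s, p):
        # Read-only lookup that does not create empty entries.
        return idx.get(s, {}).get(p, [])

    errors, seen = [], {}
    observations = [s for s in idx if QB + "Observation" in objs(s, RDF_TYPE)]
    for obs in sorted(observations):
        datasets = objs(obs, QB + "dataSet")
        if len(datasets) != 1:                       # IC-1: exactly one dataset
            errors.append(f"IC-1 {obs}: {len(datasets)} qb:dataSet values")
            continue
        ds = datasets[0]
        dsds = objs(ds, QB + "structure")
        if len(dsds) != 1:                           # IC-2: exactly one DSD
            errors.append(f"IC-2 {ds}: {len(dsds)} qb:structure values")
            continue
        dims = sorted(d for comp in objs(dsds[0], QB + "component")
                      for d in objs(comp, QB + "dimension"))
        missing = [d for d in dims if not objs(obs, d)]
        if missing:                                  # IC-11: all dimensions present
            errors.append(f"IC-11 {obs}: missing {missing}")
            continue
        key = (ds, tuple(tuple(sorted(objs(obs, d))) for d in dims))
        if key in seen:                              # IC-12: no duplicate observations
            errors.append(f"IC-12 {obs}: duplicates {seen[key]}")
        else:
            seen[key] = obs
    return errors
```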
Simple APIs to make use of RDF Data Cube (I)
• Get servers
• http://stats.linkeddata.es/services/getServers
• Get available datasets from a server or all servers
• http://stats.linkeddata.es/services/getStatistics?Server=http://sandbox.linkeddata.es/sparql
• http://stats.linkeddata.es/services/getStatistics?Server=ALL
• Get available datasets from a geo resource
• http://stats.linkeddata.es/services/getStatistics?Server=http://localidata.oeg-upm.net/sparql&URI=http://datos.localidata.com/recurso/territorio/Provincia/Madrid/Municipio/madrid/Distrito/09/Seccion/043
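When calling these services from code, it is safer to percent-encode the nested URIs than to paste them raw into the query string. The slides already do this by hand for "#" (written as %23). Below is a small Python sketch. The base path and parameter names (Server, URI, Statistic, Dimension, DimensionY, aggr) come from the slides. The sketch does not cover whether the service is still online or what it returns.

```python
from urllib.parse import urlencode

# Base path and parameter names are taken from the slides; availability and
# response format of the service are not documented here.
BASE = "http://stats.linkeddata.es/services/"


def service_url(operation, **params):
    """Build e.g. getStatistics?Server=...&URI=... with percent-encoded values."""
    return BASE + operation + ("?" + urlencode(params) if params else "")
```

For instance, service_url("getStatistics", Server="http://localidata.oeg-upm.net/sparql", URI="http://datos.localidata.com/recurso/territorio/Provincia/Madrid/Municipio/madrid/Distrito/09/Seccion/043") builds the geo-resource request above, with ":" and "/" inside the values encoded as %3A and %2F. Standard servers decode these transparently.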
Simple APIs to make use of RDF Data Cube (II)
• Get dimensions from server, resource and dataset
• http://stats.linkeddata.es/services/getDimensions?Server=http://localidata.oeg-upm.net/sparql&Statistic=http://datos.localidata.com/recurso/CityStats/Provincia/Madrid/Poblacion/2012/12
• Get values for the X axis
• http://stats.linkeddata.es/services/getStatisticsXValues?Server=http://localidata.oeg-upm.net/sparql&Statistic=http://datos.localidata.com/recurso/CityStats/Provincia/Madrid/Poblacion/2012/12&Dimension=http://datos.localidata.com/def/CityStats/dimension%23refPaisNacionalidad
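The slides do not show how getDimensions and getStatisticsXValues are implemented. Assuming the Statistic parameter identifies a qb:DataSet, their results could be obtained with SPARQL along the lines generated by this Python sketch. Dimensions come from the dataset's Data Structure Definition, and X-axis values are the distinct values of one dimension across its observations. The function names are illustrative.

```python
QB_PREFIX = "PREFIX qb: <http://purl.org/linked-data/cube#>\n"


def iri(uri):
    """Wrap a URI as a SPARQL IRI reference, rejecting characters that would break it."""
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in uri):
        raise ValueError(f"not usable as an IRI reference: {uri!r}")
    return f"<{uri}>"


def dimensions_query(dataset_uri):
    """Dimensions declared in the dataset's Data Structure Definition."""
    return (QB_PREFIX
            + "SELECT DISTINCT ?dimension WHERE {\n"
            + f"  {iri(dataset_uri)} qb:structure ?dsd .\n"
            + "  ?dsd qb:component ?component .\n"
            + "  ?component qb:dimension ?dimension .\n"
            + "}")


def x_values_query(dataset_uri, dimension_uri):
    """Distinct values that one dimension takes across the dataset's observations."""
    return (QB_PREFIX
            + "SELECT DISTINCT ?value WHERE {\n"
            + f"  ?obs qb:dataSet {iri(dataset_uri)} ;\n"
            + f"       {iri(dimension_uri)} ?value .\n"
            + "}\nORDER BY ?value")
```

Note that %23 in the slide URLs decodes to "#", so the dimension above would be passed as http://datos.localidata.com/def/CityStats/dimension#refPaisNacionalidad.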
Simple APIs to make use of RDF Data Cube (III)
• Get values for the X and Y axes, aggregation: SUM
• http://stats.linkeddata.es/services/getStatisticsValues?Server=http://localidata.oeg-upm.net/sparql&Statistic=http://datos.localidata.com/recurso/CityStats/Provincia/Madrid/Poblacion/2012/12&Dimension=http://datos.localidata.com/def/CityStats/dimension%23refPaisNacionalidad&URI=http://datos.localidata.com/recurso/territorio/Provincia/Madrid/Municipio/madrid/Distrito/09/Seccion/043&DimensionY=http://datos.localidata.com/def/CityStats/stats%23numeroHabitantes&aggr=SUM
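An aggregated X/Y request like this one maps naturally onto a SPARQL 1.1 GROUP BY query. This Python sketch builds such a query. It is a guess at the shape of the computation, not the service's code. The slide does not say which property links an observation to the area given in the URI parameter, so the sketch accepts arbitrary fixed (dimension, value) pairs instead of assuming one.

```python
AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def iri(uri):
    """Wrap a URI as a SPARQL IRI reference, rejecting characters that would break it."""
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in uri):
        raise ValueError(f"not usable as an IRI reference: {uri!r}")
    return f"<{uri}>"


def aggregate_query(dataset_uri, x_dim, y_measure, aggr="SUM", fixed=None):
    """Group the dataset's observations by x_dim and aggregate y_measure.

    fixed: optional {dimension URI: value URI} pairs every observation must match.
    """
    aggr = aggr.upper()
    if aggr not in AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggr}")
    patterns = [f"?obs qb:dataSet {iri(dataset_uri)}",
                f"?obs {iri(x_dim)} ?x",
                f"?obs {iri(y_measure)} ?y"]
    for dim, value in (fixed or {}).items():
        patterns.append(f"?obs {iri(dim)} {iri(value)}")
    body = " .\n  ".join(patterns)
    return ("PREFIX qb: <http://purl.org/linked-data/cube#>\n"
            f"SELECT ?x ({aggr}(?y) AS ?value) WHERE {{\n  {body} .\n}}\n"
            "GROUP BY ?x\nORDER BY ?x")
```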
Linked Data for RDF Data Cube
• ELDA profiles for datasets and observations
My wish list to make this guy even prettier
• A (SKOS/XKOS) codelist finder
• RAMON (http://ec.europa.eu/eurostat/ramon/) for SKOS
• Given the values of a dimension, tell me which codelists I may want to use
• Specifying applicable aggregators for dimensions
• SDMX/PC-Axis connectors to automate transformations
• I hate starting from CSVs
• JSON-stat converter in OpenCube?
• Optimised operators and data structures to deal with queries
• A paper at the COLD workshop on this topic: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
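The codelist finder on the wish list can be prototyped as a ranking problem: given the codes seen in a dimension, score each candidate codelist by the share of those codes it contains, breaking ties by Jaccard similarity. In this Python sketch the codelists are toy sets of skos:notation strings. A real finder would harvest them from SKOS concept schemes (for instance a SKOS version of RAMON, as wished above). The function name and the scoring are illustrative choices.

```python
def rank_codelists(values, codelists, top=3):
    """Rank candidate codelists for a dimension by how well they cover its values.

    values: codes seen in the dimension (e.g. a CSV column).
    codelists: {codelist URI: set of skos:notation strings} (toy data here).
    Returns [(uri, coverage, jaccard)], best first.
    """
    seen = {v.strip().upper() for v in values if v.strip()}
    if not seen:
        return []
    scored = []
    for uri, codes in codelists.items():
        norm = {c.upper() for c in codes}
        hits = len(seen & norm)
        coverage = hits / len(seen)          # share of observed codes the list explains
        jaccard = hits / len(seen | norm)    # penalises very large, loosely related lists
        scored.append((uri, round(coverage, 3), round(jaccard, 3)))
    scored.sort(key=lambda t: (-t[1], -t[2], t[0]))
    return scored[:top]
```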
Linked Statistical Data:
does it actually pay off?
or… The Good, The Bad and The
not-so Ugly

Linked Statistical Data: does it actually pay off?

  • 1. Linked Statistical Data: does it actually pay off? Keynote at 3rd International Workshop on Semantic Statistics (SemStats 2015) 11/10/2015 Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho
  • 2. Disclaimers • I am convinced about the potential benefits of joining Semantics and Statistics • I will provide a very practical point of view • From my own experience in working with semantics and statistics • I may have not followed all recent advances • I am here to learn as well • I will be provocative • Food for thought…
  • 3. Structure of the talk Part I. Some RDF Data Cube datasets that I have created and (simple) applications on top of them Part II. Lessons learned, reflections and a view towards the future
  • 4. Structure of the talk Part I. Some RDF Data Cube datasets that I have created and (simple) applications on top of them Part II. Lessons learned, reflections and a view towards the future
  • 5. Why did they call me? • I did not participate actively on the W3C RDF Data Cube discussions… • I have not submitted any papers to SemStats • I have not participated in the yearly challenges • Even if I always say “I have to do it…”. • So what? Let’s look into what I have done in this area…
  • 6. A few places where I have worked on data cubes • From the lab… • …to the market • Map4RDF • Map4RDF-iOS
  • 7. Visualisation tools from the lab… • Map4RDF and Map4RDF-iOS • http://oeg-upm.github.io/map4rdf/ • Visualisation tools originally created for Geographical Linked Data • Faceted browsing • Map-based visualisation • Data inspection • Data curation • Extra features: bounding boxes, route planning, etc. • And extended to RDF Data Cube-related data
  • 8. Visualisation tools from the lab… • https://youtu.be/us8wsG8HfKg
  • 9. Geomarketing at Localidata • https://youtu.be/DyLk3jInfkI
  • 10. RDF Data Cube visualisations at Localidata • https://youtu.be/aPqg_eoLVt4
  • 11. Statistical data in Aragón • Early work already available… • http://opendata.aragon.es/ • Land use • Recycling • Lodging • Work in progress now with a list of 1940 reports
  • 13. Data Cube in the standard UNE 178301:2015 • UNE 178301:2015 • Standard on Open Data for Smart Cities • Organised by • AENOR CTN 178 group • Government and Mobility • Government • Open Data (led by Localidata) • Formed by • Several cities • Private companies • Nation-wide organisations • W3C RDF Data Cube proposed as the vocabulary for publishing open data about population
  • 14. Structure of the talk Part I. Some RDF Data Cube datasets that I have created and (simple) applications on top of them Part II. Lessons learned, reflections and a view towards the future
  • 15. Linked Statistical Data: The Good, The Bad and the Ugly • Note: Not sure about the license of this image
  • 16. Linked Statistical Data: The Good • URIs everywhere • Easier treatment and linking • Effective visualisations • Especially map-based • They allow breaking the data silos of a statistical office (even when micro-data, macro-data and indicators are published) • Easier cross-dataset querying • E.g., give me the statistics about recycling of places with more than 5000 inhabitants and ruled by political party X. • See my talk at the COLD workshop tomorrow to learn more • A simpler way of accessing SDMX/PC-Axis/TSV data for outsiders • Non-statisticians who know a bit of SPARQL and don’t dislike SKOS (XKOS)
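A minimal sketch of why shared URIs make cross-dataset querying easier. When two datasets identify an area with the same URI, joining them is a dictionary lookup instead of a reconciliation of names and codes. The function name, the observation fields `refArea` and `value`, and the shape of the inputs are all hypothetical. The "ruled by political party X" condition would just be one more lookup keyed by the same area URI.

```python
from typing import Dict, List


def cross_dataset_filter(
    recycling: List[Dict], population: List[Dict], min_inhabitants: int
) -> List[Dict]:
    """Keep recycling observations for areas with more than `min_inhabitants`.

    Hypothetical input shape: each observation is a dict whose `refArea`
    holds the URI of the area it describes and whose `value` holds the figure.
    """
    # Index the population dataset by area URI. No fuzzy name matching is
    # needed because both datasets use the same identifier for a place.
    inhabitants = {obs["refArea"]: obs["value"] for obs in population}
    # Areas absent from the population dataset are treated as not qualifying.
    return [
        obs
        for obs in recycling
        if inhabitants.get(obs["refArea"], 0) > min_inhabitants
    ]
```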
  • 17. The Good: URIs everywhere and ontologies • What do the columns mean? • unit PER • geo\time FI • What are their units of measurement? • All these should be attached to a methodology page, but this is not always the case (e.g. Eurostat)
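The column names on this slide resemble Eurostat's tab-separated bulk files. In those files the first header cell packs several dimension names together: in `unit,geo\time`, `unit` and `geo` label the rows and `time` labels the columns. The sketch below unpacks that layout into one record per cell. It assumes that `:` marks a missing value and that a flag such as `p` follows the number after a space. This is our reading of the format, not an official parser. Note that even after parsing, nothing says what `PER` means or which unit applies, which is the gap URIs and ontologies fill.

```python
from typing import Dict, List


def parse_eurostat_tsv(text: str) -> List[Dict[str, object]]:
    """Turn a Eurostat-style TSV table into one record per non-missing cell."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    # First header cell looks like "unit,geo\time": row dimensions before the
    # backslash, the column dimension (usually time) after it.
    row_part, col_dim = header[0].split("\\")
    row_dims = row_part.split(",")
    periods = [cell.strip() for cell in header[1:]]
    records: List[Dict[str, object]] = []
    for line in lines[1:]:
        cells = line.split("\t")
        keys = cells[0].split(",")  # e.g. "PER,FI" -> ["PER", "FI"]
        for period, raw in zip(periods, cells[1:]):
            # Keep the number and drop any trailing flag (assumed format).
            token = raw.strip().split(" ")[0]
            if token in ("", ":"):  # assumed markers for a missing value
                continue
            record: Dict[str, object] = dict(zip(row_dims, keys))
            record[col_dim.strip()] = period
            record["value"] = float(token)
            records.append(record)
    return records
```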
  • 18. The Good: a single language for cross-dataset querying • Get municipalities and the number of hectares dedicated to airports for those municipalities with an area smaller than 50 square kilometers:
    PREFIX aragodef: <http://opendata.aragon.es/def/Aragopedia#>
    PREFIX dbpedia: <http://dbpedia.org/ontology/>
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT DISTINCT ?municipio ?ha
    WHERE {
      ?x a qb:Observation .
      ?x qb:dataSet <http://opendata.aragon.es/recurso/DataSet/UsoSuelo> .
      ?x aragodef:hectareasAeropuertos ?ha .
      ?x aragodef:refArea ?municipio .
      ?municipio a dbpedia:Municipality .
      ?municipio aragodef:areaTotal ?area .
      FILTER(?area < 50 && ?ha != 0)
    }
    ORDER BY DESC(?ha)
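To run a query like this from a program, the W3C SPARQL 1.1 Protocol lets a client send it as the `query` parameter of an HTTP GET and request the standard JSON results format. The standard-library sketch below does this. The endpoint URL in the test is a placeholder: we do not assume the address of the Aragón SPARQL endpoint. Both helper names are our own.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the endpoint for the SPARQL 1.1 Query Results JSON format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_sparql_json(payload: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON results document into one dict per solution."""
    document = json.loads(payload)
    variables = document["head"]["vars"]
    return [
        # Unbound variables are absent from a binding, so they are skipped.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in document["results"]["bindings"]
    ]
```

Sending the request would be `urllib.request.urlopen(request)`. Passing the decoded response body to `rows_from_sparql_json` gives one dictionary per result row, keyed by `municipio` and `ha` for the query above.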
  • 19. Linked Statistical Data: The Bad • RDF Data Cube datasets are too large • Rather simple datasets easily go up to 1 GB in Turtle • Of course, they can always be HDT-ed, compressed, etc. • RDF Data Cube lacks a simple property to let us know how to aggregate the values of a dimension • Can the values of dimension X in this dataset be aggregated by AVG, SUM, or something else? • What about performance in general-purpose triple stores? • Should analytical queries be run on RDF Data Cube data? • Challenge and opportunity for improved data structures • See also the work of Benedikt Kämpgen
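A small illustration of why an aggregation hint matters. Summing population counts across the census sections of a district is meaningful, but summing a rate is not. Even a plain average of rates is only an approximation unless it is weighted by population. For example, take two sections with 100 and 300 inhabitants and rates of 10 and 20. The unweighted average is 15, while the population-weighted one is (100·10 + 300·20) / 400 = 17.5. RDF Data Cube has no standard property for this, so the sketch takes the aggregator as an explicit argument. The function, the `AGGREGATORS` table and the observation fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical aggregator table. RDF Data Cube itself does not say which of
# these is valid for a given measure, so the caller has to choose.
AGGREGATORS: Dict[str, Callable] = {"SUM": sum, "AVG": mean, "MIN": min, "MAX": max}


def roll_up(
    observations: List[dict], group_by: str, measure: str, aggregator: str
) -> Dict[str, float]:
    """Aggregate `measure` over all observations sharing the same `group_by` value."""
    combine = AGGREGATORS[aggregator]
    groups: Dict[str, list] = defaultdict(list)
    for obs in observations:
        groups[obs[group_by]].append(obs[measure])
    return {key: combine(values) for key, values in groups.items()}
```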
  • 20. Linked Statistical Data: The Ugly • Generating (and validating) Data Structure Definitions is time-consuming and error-prone • People in the audience, how do you do it? • Manual/ad-hoc transformations (e.g., OpenRefine, Kettle) into RDF Data Cube may lead to errors when loading the data into CubeViz, OpenCube, etc. • How can I run tests? • We need simple services for developers to use • Easy-to-understand REST API • And Linked Data for observations • Does it make sense?
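One way to reduce the tedium of writing Data Structure Definitions is to generate the Turtle from a list of dimension and measure properties. The sketch below uses only terms from the RDF Data Cube vocabulary: `qb:DataStructureDefinition`, `qb:component`, `qb:dimension`, `qb:measure`, `qb:order`, `qb:DimensionProperty` and `qb:MeasureProperty`. The function name is hypothetical, and the output still needs validating.

```python
from typing import List

QB = "http://purl.org/linked-data/cube#"


def generate_dsd_turtle(dsd_uri: str, dimensions: List[str], measures: List[str]) -> str:
    """Emit a minimal qb:DataStructureDefinition in Turtle (hypothetical helper)."""
    if not measures:
        raise ValueError("a data structure definition needs at least one measure")
    lines = [f"@prefix qb: <{QB}> .", ""]
    # Declare the component properties themselves.
    lines += [f"<{d}> a qb:DimensionProperty ." for d in dimensions]
    lines += [f"<{m}> a qb:MeasureProperty ." for m in measures]
    # One blank-node component specification per dimension (ordered) and measure.
    components = [
        f"[ qb:dimension <{d}> ; qb:order {i} ]"
        for i, d in enumerate(dimensions, start=1)
    ]
    components += [f"[ qb:measure <{m}> ]" for m in measures]
    lines += ["", f"<{dsd_uri}> a qb:DataStructureDefinition ;"]
    lines.append("    qb:component " + " ,\n        ".join(components) + " .")
    return "\n".join(lines) + "\n"
```

As a usage example, you could pass the dimension and measure URIs from the CityStats API calls in these slides. These are http://datos.localidata.com/def/CityStats/dimension#refPaisNacionalidad and http://datos.localidata.com/def/CityStats/stats#numeroHabitantes, with `%23` decoded to `#`. You would pair them with a DSD URI of your own choosing.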
  • 21. Ugly things can always become pretty • Welcome to our Linked Statistical Data beauty center
  • 22. Generating data structure definitions (I)
  • 23. Generating data structure definitions (II)
  • 24. Tests and validators for RDF Data Cube datasets
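A test suite for Data Cube datasets can start from the well-formedness integrity constraints in the W3C RDF Data Cube recommendation, which are published as SPARQL ASK queries. The sketch below re-implements three of them over an in-memory list of triples, so it needs no triple store:

- every observation belongs to exactly one dataset (IC-1);
- every observation has a value for each dimension in its DSD (IC-11);
- no two observations in a dataset share all dimension values (IC-12).

The triple representation and the function name are our own. A real pipeline would more likely run the specification's SPARQL queries directly.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
Triple = Tuple[str, str, str]


def validate_cube(triples: List[Triple]) -> List[str]:
    """Check IC-1, IC-11 and IC-12 style constraints; return error messages."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)

    def objects(s: str, p: str) -> List[str]:
        return index.get((s, p), [])

    observations = sorted(
        {s for s, p, o in triples if p == RDF_TYPE and o == QB + "Observation"}
    )
    errors: List[str] = []
    seen: Dict[tuple, str] = {}
    for obs in observations:
        datasets = objects(obs, QB + "dataSet")
        if len(datasets) != 1:  # IC-1: exactly one dataset per observation
            errors.append(f"{obs}: expected one qb:dataSet, found {len(datasets)}")
            continue
        dataset = datasets[0]
        # Dimensions declared by the dataset's DSD via its component specs.
        dimensions = [
            dim
            for dsd in objects(dataset, QB + "structure")
            for comp in objects(dsd, QB + "component")
            for dim in objects(comp, QB + "dimension")
        ]
        missing = [dim for dim in dimensions if not objects(obs, dim)]
        for dim in missing:  # IC-11: all dimensions required
            errors.append(f"{obs}: missing value for dimension {dim}")
        if missing:
            continue
        key = (dataset, tuple((d, tuple(sorted(objects(obs, d)))) for d in sorted(dimensions)))
        if key in seen:  # IC-12: no duplicate observations
            errors.append(f"{obs}: duplicates {seen[key]}")
        else:
            seen[key] = obs
    return errors
```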
  • 25. Simple APIs to make use of RDF Data Cube (I) • Get servers • http://stats.linkeddata.es/services/getServers • Get available datasets from a server or all servers • http://stats.linkeddata.es/services/getStatistics?Server=http://sandbox.linkeddata.es/sparql • http://stats.linkeddata.es/services/getStatistics?Server=ALL • Get available datasets from a geo resource • http://stats.linkeddata.es/services/getStatistics?Server=http://localidata.oeg-upm.net/sparql&URI=http://datos.localidata.com/recurso/territorio/Provincia/Madrid/Municipio/madrid/Distrito/09/Seccion/043
  • 26. Simple APIs to make use of RDF Data Cube (II) • Get dimensions from server, resource and dataset • http://stats.linkeddata.es/services/getDimensions?Server=http://localidata.oeg-upm.net/sparql&Statistic=http://datos.localidata.com/recurso/CityStats/Provincia/Madrid/Poblacion/2012/12 • Get values for X axis • http://stats.linkeddata.es/services/getStatisticsXValues?Server=http://localidata.oeg-upm.net/sparql&Statistic=http://datos.localidata.com/recurso/CityStats/Provincia/Madrid/Poblacion/2012/12&Dimension=http://datos.localidata.com/def/CityStats/dimension%23refPaisNacionalidad
  • 27. Simple APIs to make use of RDF Data Cube (III) • Get values for the X and Y axis, aggregation: SUM • http://stats.linkeddata.es/services/getStatisticsValues?Server=http://localidata.oeg-upm.net/sparql&Statistic=http://datos.localidata.com/recurso/CityStats/Provincia/Madrid/Poblacion/2012/12&Dimension=http://datos.localidata.com/def/CityStats/dimension%23refPaisNacionalidad&URI=http://datos.localidata.com/recurso/territorio/Provincia/Madrid/Municipio/madrid/Distrito/09/Seccion/043&DimensionY=http://datos.localidata.com/def/CityStats/stats%23numeroHabitantes&aggr=SUM
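These services take full URIs as query parameters, so building the URLs by hand is error-prone (note the `#` that has to travel as `%23`). A small helper can do the encoding. It percent-encodes everything except `:` and `/`, which reproduces the URLs on these slides. The helper name is our own and the parameter names are copied from the slides. We have not checked whether the service is still online, so the sketch only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

SERVICES_BASE = "http://stats.linkeddata.es/services/"


def build_service_url(operation: str, **params: str) -> str:
    """Build a service URL such as getDimensions?Server=...&Statistic=...

    Values are percent-encoded except for ':' and '/', so '#' becomes '%23'.
    Parameters keep the order in which they are passed.
    """
    query = urlencode(params, safe=":/")
    return f"{SERVICES_BASE}{operation}?{query}" if query else SERVICES_BASE + operation
```

For the SUM request above, you would pass `Server`, `Statistic`, `Dimension`, `URI`, `DimensionY` and `aggr="SUM"` in that order.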
  • 28. Linked Data for RDF Data Cube • ELDA profiles for datasets and observations
  • 29. My wish list to make this guy even prettier • A (SKOS/XKOS) codelist finder • RAMON (http://ec.europa.eu/eurostat/ramon/) for SKOS • Given the values for this dimension, tell me which codelists I may want to make use of • Specifying applicable aggregators for dimensions • SDMX/PC-Axis connectors to automate transformations • I hate starting from CSVs • JSON-stat converter in OpenCube? • Optimised operators and data structures to deal with queries • A paper at the COLD workshop talking about this: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
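The first wish, a codelist finder, could start as something as simple as a ranking. It would rank candidate SKOS/XKOS code lists by how many of a dimension's observed values appear among their codes (for instance their `skos:notation` values). The coverage score, the function name and any code-list names are hypothetical. A real service would also need to match labels and handle hierarchies.

```python
from typing import Dict, Iterable, List, Tuple


def rank_codelists(
    values: Iterable[str], codelists: Dict[str, Iterable[str]]
) -> List[Tuple[str, float]]:
    """Rank code lists by the fraction of observed dimension values they contain."""
    # Normalise case and whitespace; real codes may need richer matching.
    observed = {v.strip().upper() for v in values if v.strip()}
    if not observed:
        return []
    scored = []
    for name, codes in codelists.items():
        known = {c.strip().upper() for c in codes}
        scored.append((name, len(observed & known) / len(observed)))
    # Highest coverage first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```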
  • 30. Linked Statistical Data: does it actually pay off? or… The Good, The Bad and The not-so Ugly

Editor's Notes

  1. The release of the W3C RDF Data Cube recommendation was a significant milestone towards improving the maturity of the area of Linked Statistical Data. Many Data Cube-based datasets have been released since then, and tools for generating and exploiting such datasets have also appeared. While the benefits of using RDF Data Cube and generating Linked Data in this area seem clear, there are still many challenges associated with generating and exploiting such data. In this talk we will reflect on them, based on our experience in generating and exploiting this type of data, and hopefully provoke some discussion about what the next steps should be.