2. Open Statistics – Agenda
Statistics Belgium => Open Data
• Statistics Belgium
• Open Data Start
• Statbel Open Data Portal
• Statistics Belgium in the EU
Open Data => Linked Open Data
• 5*****?
• RDF
• LOD
• Semantic Web
• Ontologies for statisticians
• LOD in the NSIs
• RDF@statbel
Questions
Contact
3. Open Statistics – Statistics Belgium 1
Statistics Belgium?
– National Statistical Institute (formerly NIS)
• largest producer of official statistics in Belgium
What do we do?
– Collect data: administrative sources (registers) or surveys
– Process and analyse data:
• common methodology, definitions (national, European)
– Publish data
• => +/- 400 releases per year on Statbel
4. Open Statistics – Statistics Belgium 2
One of the core tasks consists in making all produced statistics
available to everyone (European Statistics Code of Practice)
– Website Statbel since 1997
– Free (re-)use => cite the source
– ‘open by default’
+/- 100 statistics
– The main fields covered are population, society, work,
economy, real estate, construction, mobility and transport.
– Census
5. Open Statistics – Open Data Start?
Why?
• 2nd PSI Directive
• Belgian Federal Open Data strategy 2015
• Digital agenda (EU)
• Eurostat => EU Open Data Portal
• Crossroads Bank for Enterprises (KBO) company register
• Users
Benefits
6. Open Statistics – Statbel Open Data Portal 1
Open Data Portal on the Statbel website since Q4 2015 : www.statbel.fgov.be/opendata
– Population & Census
– Labour market & living conditions
• Fiscal statistics on income
– Environment
– Prices
• CPI
– Tools
• Geography
• Codes and Classifications
7. Open Statistics – Statbel Open Data Portal 2
+/- 110 datasets
Formats
• XLSX: Excel pivot tables
• CSV, TXT: R, SAS, …, PostgreSQL
• GML, SHP: QGIS, ArcGIS, …
• JSON, XML, CSV, XLSX: be.STAT => dynamic databank of Statbel
Special care
– Privacy
– Continuity
Goal : 1 new dataset/month
– Next : population, households, real estate
8. Open Statistics – Statistics Belgium in the EU
European Statistical System = Eurostat + NSIs
– Key provider of public open data
– Draft Open Data Strategy (Feb 2017)
Statistics Belgium
• Statbel.fgov.be/opendata
Eurostat
• Key contributor to the open
data portals
EU Open Data Portal
• Data.europa.eu/euodp
Belgium
• Data.gov.be
European Data Portal
• www.europeandataportal.eu
(Diagram: the portals are connected by metadata harvesting.)
9. Open Statistics – 5***** ?
Statistics Belgium => Open Data
Statbel: Current situation
Statbel: Ambition
11. Open Statistics – RDF - Uniform resource identifier URI
Use URIs to identify things, so that people can point at your
stuff
– A URI identifies a concept.
– Example of a URI for the municipality of Rixensart:
http://vocab.belgif.be/refnis/25091#id
– In general, a URI is associated with a web page that documents the
concept. For Rixensart:
http://vocab.belgif.be/refnis/25091
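A minimal sketch of the relation between the two addresses above, using only the Python standard library: dropping the fragment of the concept URI gives the URL of the page that documents it. The variable names are illustrative.

```python
from urllib.parse import urldefrag

# URI that identifies the concept "municipality of Rixensart" (from the slide)
concept_uri = "http://vocab.belgif.be/refnis/25091#id"

# Removing the fragment ("#id") leaves the URL of the documentation page
document_url, fragment = urldefrag(concept_uri)

print(document_url)  # http://vocab.belgif.be/refnis/25091
print(fragment)      # id
```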
12. Open Statistics – Resource description framework (RDF)
RDF files store triples of the form “subject-predicate-object”, where
– subjects are URIs.
– predicates are URIs.
– objects are URIs or literals.
Example (nomenclature):
<http://vocab.belgif.be/refnis/25091#id>
<http://www.w3.org/2004/02/skos/core#prefLabel> "Rixensart"@fr .
There are "standard vocabularies" (rules for forming triples). SKOS is one
of them.
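A minimal sketch of the triple as a data structure, assuming plain Python strings stand in for URIs and literals. The `Triple` class and `to_ntriples` helper are hypothetical names for illustration, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # always a URI
    predicate: str  # always a URI
    obj: str        # a URI, or a literal already written as "text"@lang


def to_ntriples(t: Triple) -> str:
    """Write one triple as a single N-Triples line."""
    # Assumption: anything starting with a double quote is a literal,
    # everything else is a URI and gets angle brackets.
    obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# The nomenclature example from the slide
label = Triple(
    "http://vocab.belgif.be/refnis/25091#id",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    '"Rixensart"@fr',
)
line = to_ntriples(label)
print(line)
```

In practice an RDF library would handle escaping and datatypes; the sketch only shows the subject-predicate-object shape.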
13. Open Statistics – Resource description framework (RDF)
It’s possible to use "prefixes" to "abbreviate" URIs in RDF files
Example:
@prefix refnis: <http://vocab.belgif.be/refnis/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# In Turtle, "#" inside a prefixed name must be escaped as "\#".
refnis:25091\#id skos:prefLabel "Rixensart"@fr .
refnis:25091\#id skos:broader refnis:25000\#id .
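What a prefix does can be sketched in a few lines: the part before the colon is looked up and replaced by its full URI. The `expand` helper is a hypothetical name, and the inputs are plain Python strings rather than Turtle syntax.

```python
# Prefix table taken from the example above
PREFIXES = {
    "refnis": "http://vocab.belgif.be/refnis/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(name: str) -> str:
    """Expand 'prefix:local' into a full URI using PREFIXES."""
    prefix, local = name.split(":", 1)  # split on the first colon only
    return PREFIXES[prefix] + local


print(expand("refnis:25091#id"))
print(expand("skos:prefLabel"))
```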
14. Open Statistics – Resource description framework (RDF)
Sample RDF file to describe a study (metadata):
– ddi:Study_1 a disco:Study .
– ddi:Study_1 dcterms:title "National Population and Housing Census, 1980"@en .
– ddi:Study_1 dcterms:identifier "ARG_1980_PHC_v01_A_IPUMS" .
This description uses the DDI-RDF vocabulary (Disco):
– DDI-RDF is “A vocabulary for publishing metadata about data sets
(research and survey data) into the Web of Linked Data”
– Described here : http://rdf-vocabulary.ddialliance.org/discovery.html
15. Open Statistics – Resource description framework (RDF)
RDF = forming triples
There are several syntaxes for writing them:
– Turtle,
– N-Triples,
– RDF/XML,
– …
17. Open Statistics – Linked Open Data (LOD)
It’s possible to link several RDF sources. This is referred to as Linked
Open Data (LOD).
Examples of LOD sites to link to:
– DBpedia
– Wikidata
– GeoNames
A simple way to link to another database is to re-use its URIs
18. Open Statistics – Linked Open Data (LOD)
Example of LOD (nomenclature):
@prefix refnis: <http://vocab.belgif.be/refnis/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# "#" inside a prefixed name is escaped as "\#" in Turtle.
refnis:25091\#id skos:prefLabel "Rixensart"@fr .
refnis:25091\#id skos:broader refnis:25000\#id .
refnis:25091\#id skos:exactMatch <http://sws.geonames.org/2787990> .
refnis:25091\#id skos:exactMatch <http://www.wikidata.org/entity/Q630478> .
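A minimal sketch of how a consumer might use these links, assuming the triples above are held as plain Python tuples. The `exact_matches` helper is a hypothetical name; it lists the external URIs declared equivalent to a given concept.

```python
from urllib.parse import urlparse

SKOS = "http://www.w3.org/2004/02/skos/core#"
RIXENSART = "http://vocab.belgif.be/refnis/25091#id"

# The four triples of the example, as (subject, predicate, object)
triples = [
    (RIXENSART, SKOS + "prefLabel", '"Rixensart"@fr'),
    (RIXENSART, SKOS + "broader", "http://vocab.belgif.be/refnis/25000#id"),
    (RIXENSART, SKOS + "exactMatch", "http://sws.geonames.org/2787990"),
    (RIXENSART, SKOS + "exactMatch", "http://www.wikidata.org/entity/Q630478"),
]


def exact_matches(subject, triples):
    """Return the external URIs linked to `subject` via skos:exactMatch."""
    return [o for s, p, o in triples if s == subject and p == SKOS + "exactMatch"]


links = exact_matches(RIXENSART, triples)
# Host names show which external LOD sources the concept is linked to
hosts = sorted(urlparse(uri).netloc for uri in links)
print(hosts)
```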
20. Open Statistics – Semantic web
All the "subject-predicate-object" statements of the different LOD sources
form a giant "knowledge graph" whose size increases rapidly
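A minimal sketch of the idea: merging two sources is a set union of their triples, and a shared URI joins them into one graph. The second source and its property are HYPOTHETICAL placeholders (example.org), not real Wikidata content.

```python
from collections import defaultdict, deque

SKOS = "http://www.w3.org/2004/02/skos/core#"
RIXENSART = "http://vocab.belgif.be/refnis/25091#id"
WIKIDATA = "http://www.wikidata.org/entity/Q630478"

# Source 1: a link from the Statbel nomenclature example
statbel = {(RIXENSART, SKOS + "exactMatch", WIKIDATA)}
# Source 2: HYPOTHETICAL triple standing in for statements from another source
other = {(WIKIDATA, "http://example.org/hypotheticalProperty",
          "http://example.org/hypotheticalThing")}

# Merging sources = union of triple sets; index the result by subject
graph = defaultdict(set)
for s, p, o in statbel | other:
    graph[s].add(o)


def reachable(start):
    """Breadth-first walk over the merged graph, returning all nodes reached."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(reachable(RIXENSART)))
```

Starting from the Statbel URI, the walk reaches nodes that only the second source describes, which is the point of linking.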
22. Open Statistics – Ontologies for statisticians
Standard vocabularies
23. Open Statistics – Standard vocabularies
Classifications
– SKOS: Classifications (nomenclatures)
– XKOS: SKOS extension (for NACE, …)
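A sketch of how a classification hierarchy could be exposed with skos:broader. Assumptions: codes use the dotted NACE style in which each level adds one digit (01, 01.1, 01.11), section letters are not handled, and the base URI is a HYPOTHETICAL placeholder.

```python
BASE = "http://example.org/nacebel/"  # HYPOTHETICAL base URI, not an official one
BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def parent_code(code):
    """Parent of a dotted code, or None for a 2-digit division."""
    if len(code) <= 2:
        return None
    # Drop the last digit, then any dot left dangling ("01." -> "01")
    return code[:-1].rstrip(".")


def broader_triples(codes):
    """One N-Triples line per code that has a parent."""
    lines = []
    for code in codes:
        parent = parent_code(code)
        if parent is not None:
            lines.append(f"<{BASE}{code}> <{BROADER}> <{BASE}{parent}> .")
    return lines


for line in broader_triples(["01", "01.1", "01.11"]):
    print(line)
```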
Document a list of files (catalog)
– DCAT
– StatDCAT-AP
– GeoDCAT-AP
24. Open Statistics – Standard vocabularies
Metadata:
– Dublin Core
– DDI-RDF
Data:
– RDF Data Cube Vocabulary
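A sketch of what a single statistical observation might look like in the RDF Data Cube vocabulary, written out as Turtle. The example.org URIs, the year and the value are HYPOTHETICAL placeholders, not real figures. A complete cube also needs a data structure definition, which is omitted here.

```python
def observation_turtle(obs_uri, dataset_uri, area_uri, year, value):
    """Return a Turtle document describing one qb:Observation."""
    return "\n".join([
        "@prefix qb: <http://purl.org/linked-data/cube#> .",
        "@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .",
        "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{obs_uri}> a qb:Observation ;",
        f"    qb:dataSet <{dataset_uri}> ;",
        f"    sdmx-dimension:refArea <{area_uri}> ;",          # where
        f'    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;',  # when
        f'    sdmx-measure:obsValue "{value}"^^xsd:integer .',  # measured value
    ])


turtle = observation_turtle(
    obs_uri="http://example.org/obs/1",              # hypothetical
    dataset_uri="http://example.org/dataset/pop",    # hypothetical
    area_uri="http://vocab.belgif.be/refnis/25091#id",
    year=2017,      # placeholder year
    value=12345,    # placeholder value, NOT a real population figure
)
print(turtle)
```

Re-using the REFNIS URI as the area dimension ties the observation back to the nomenclature.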
25. Open Statistics – Standard vocabularies
Other interesting vocabularies recommended by Eurostat
– The Organization Ontology
– The PROV ontology
– Time Ontology in OWL
– Dublin Core
– ISA Core Vocabularies in RDF (Person, Public Organisation,
Business, Public Service, Location)
– Vocabulary of Interlinked Datasets (VoID)
26. Open Statistics – Nomenclatures
Some nomenclatures, "controlled vocabularies" & thesauri
recommended by Eurostat:
– INSPIRE code lists
– EuroVoc thesaurus
– Named Authority Lists (NAL)
27. Open Statistics – LOD in the NSIs
Some NSIs already have LOD:
– Insee: Some code tables + legal population
– Istat
– ONS + Geoportal UK
– Census 2011 in Ireland
28. Open Statistics – RDF@Statbel
What to publish as LOD?
Priorities for publication as LOD:
– Nomenclatures (create URIs for NACEBEL, REFNIS, … +
create files that expose hierarchies, …)
– Catalog of the data (to let the ‘machines’ all over the world
know that our datasets are available in CSV, …)
– Metadata
– A selection of datasets (For example: legal population of
municipalities)
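The catalog priority above can be sketched with DCAT: a few triples tell a machine that a dataset exists and has a CSV download. All example.org URIs, the "/csv" URI pattern and the title are HYPOTHETICAL. The helper assumes the title contains no characters that need escaping.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CSV_MEDIA_TYPE = "https://www.iana.org/assignments/media-types/text/csv"


def catalog_entry(dataset_uri, title, csv_url):
    """N-Triples lines describing a dataset and its CSV distribution."""
    dist_uri = dataset_uri + "/csv"  # hypothetical URI pattern for the distribution
    return [
        f"<{dataset_uri}> <{RDF_TYPE}> <{DCAT}Dataset> .",
        f'<{dataset_uri}> <{DCT}title> "{title}"@en .',
        f"<{dataset_uri}> <{DCAT}distribution> <{dist_uri}> .",
        f"<{dist_uri}> <{RDF_TYPE}> <{DCAT}Distribution> .",
        f"<{dist_uri}> <{DCAT}downloadURL> <{csv_url}> .",
        f"<{dist_uri}> <{DCAT}mediaType> <{CSV_MEDIA_TYPE}> .",
    ]


entry = catalog_entry(
    "http://example.org/dataset/population",   # hypothetical dataset URI
    "Population by municipality",              # hypothetical title
    "http://example.org/files/population.csv", # hypothetical download URL
)
print("\n".join(entry))
```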