The document summarizes an ENGAGE workshop discussing barriers to open data reuse in Europe and findings from the ENGAGE project. Key points include:
1) Metadata was identified as a major barrier, with needs for a rich format to facilitate discovery and addressing issues like multilinguality, data ownership, and formats.
2) ENGAGE aims to provide a single access point and tools for researchers and citizens to discover, browse, download, visualize, and submit diverse public sector data sources.
3) ENGAGE 2.0 adds additional functionalities like dataset extension, conversion, cleansing, and crowdsourcing derived datasets while maintaining provenance information.
ENGAGE Workshop at OpenDataWeek2013
1. ENGAGE Workshop, June 27th, 2013, Marseille
Accelerating data re-use:
an example of an e-infrastructure at European level
Valerie Brasse
euroCRIS / IS4RI
Strasbourg, France
Slides reproduced from presentations by ENGAGE members
2. Agenda
The ENGAGE project, an introduction
The ENGAGE 2.0 platform, released in Beta since April 2013
Open data for re-use in Europe, some barriers to overcome?
Findings from the ENGAGE project
Discussion
Your suggestions to overcome the barriers
3. ENGAGE Project Information
Acronym: ENGAGE
Title: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
Contract no: RI-283700
Project type: CP-CSA
Start date: 01/06/2011
Duration: 36 months
Partners: 9
Programme: Framework Programme 7 (2007-2013), Research Infrastructures
Website: http://www.engage-project.eu
Platform: http://www.engagedata.eu
Project participants: NTUA (GR, Coordinator), TU-DELFT (NL), MIC-GR (GR), IBM-ISRAEL (IL), INTRASOFT (LU), STFC (UK), FhG-FOKUS (DE), AEGEAN (GR), EUROCRIS (NL)
4. Public Sector Information
• Data produced by governmental organisations – typically referring to datasets
• Examples: geospatial, demographic, statistical, environmental, public safety, financial data
• Growing international movement: open access to PSI datasets in a way that facilitates reuse
• Opening up PSI datasets can potentially lead to substantial economic gains [1]
[1] Vickery, G. (2011): Review of recent studies on PSI re-use and related market developments.
5. Overview of ENGAGE objectives
• Development and use of a data infrastructure, incorporating distributed and diverse public sector information (PSI) resources
• Capable of supporting scientific collaboration and research, particularly for the Social Science and Humanities (SSH) scientific communities
• Empowering the deployment of open governmental data towards citizens
Simply put, ENGAGE is a door for researchers that leads them to the world of Open Government Data. Through the ENGAGE platform, researchers and citizens will be able to search, browse, download, visualise and submit diverse and distributed Public Sector datasets from EU countries.
6. ENGAGE Two-way Scenario
Public Sector Information Collection
Data Curation
Archival
Data Search and Retrieval
Advanced Data Services
Delivering Open Data Needs and guidelines to Public Sector Organisations
• Public Sector Organisations
• Open data initiations
• Pre-processing
• Anonymisation
• Harmonisation
• Annotation
• Linking
• Cloud and Grid Infrastructure
• Platform Independence and Interoperability
• Open and intuitive access to the data collection
• Context-specific search
• Visualisation (inc. combined views)
• Context-specific formatting
• Collaboration tools
• Public Sector Organisations
• ENGAGE and eInfrastructures
• ENGAGE
• Society
• Policy
• Research Communities
• Policy makers
New Problems – new Challenges
Search Data Needs
New Service Definition for open data
Utilisation of existing Infrastructures
Needs for Governmental data Provision
7. ENGAGE provides a single point of access to PSI sources as well as relevant tools in order to cover the needs of researchers and citizens
Public data sources (Unstructured / “Semi-structured”):
• Ministries / local public agencies websites
• Publicdata.eu
• National Statistical Offices
ENGAGE traverses across distributed and diverse public sector information resources
8. ENGAGE aims to embrace the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models.
ENGAGE: not an isolated data silo but a vital part of the Global Data Space.
9. ENGAGE will enable EU Researchers / Citizens to
• Discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language
• Upload curated, enhanced or extended versions of existing datasets, originally published by public agencies, in order to address various formats, standards and scientific purposes in a crowdsourcing manner
• Acquire the datasets
• Visualize properly structured datasets in data tables, maps and charts
Additionally
• Utilize ENGAGE Application Programming Interfaces (APIs) for searching and acquiring the datasets
• Rate the quality of datasets on various dimensions
• Request additional datasets or information on existing datasets from the Public Agencies
• View usage statistics
• View publications and other material linked to datasets
10. Public Agencies will be able to
• Utilize the ENGAGE infrastructure (interface and APIs) to publish governmental data
• Register and link their datasets within the ENGAGE infrastructure
• Receive feedback on the quality of their datasets
• Review the opinions or requests of citizens and researchers
• View the applications, publications and other datasets uploaded by scientists that are linked to their original published datasets
12. Moving from low structured, low value datasets to highly structured and / or derived datasets
Public data sources: Unstructured / Semi-structured / Structured
JSON
Conversion, Data Enrichment, Metadata Enrichment, Cleansing, “Snapshots”
Low Re-Use Value / Quality structure / metadata
Discovery and Context Metadata
High Re-Use Value / Quality structure / metadata
ENGAGE Crowdsourcing
14. ENGAGE 2.0
• On top of ENGAGE basic functions (catalog, search, visualizations, API)
Researchers / Citizens / Journalists:
• Extend other datasets (official or already extended - derived datasets)
• Conversions (e.g. HTML/PDF to XLS, PDF to RDF)
• Data Cleansing (e.g. duplicate records, empty rows, errors)
• Metadata Enrichment (missing metadata, Linked Data Enablers!)
• Data Enrichment (enrich datasets with more information)
• Snapshots of real-time data (e.g. Diavgeia_decisions_10_2012_to_12_2012.xls)
• Mash-ups / Interlinking (e.g. Combine Election results to UV radiation levels!)
• View the version tree of official – derived datasets (clean solution - easy to understand and manage the contributions / versions)
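The cleansing and version-tree functions listed above can be sketched as follows. This is a minimal illustration, not ENGAGE's implementation: the `DatasetVersion` class, the `cleanse` and `lineage` helpers and the sample rows are all hypothetical. It shows one way a derived dataset could be produced by removing empty and duplicate rows while keeping a link back to the official dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetVersion:
    """One node in a version tree of official and derived datasets (hypothetical)."""
    name: str
    rows: list
    parent: Optional["DatasetVersion"] = None  # None marks an official dataset
    operation: str = "official"                # how this version was derived


def cleanse(source: DatasetVersion) -> DatasetVersion:
    """Drop empty rows and exact duplicate rows, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in source.rows:
        # A row counts as empty when every cell is None or blank.
        if all(cell is None or str(cell).strip() == "" for cell in row):
            continue
        key = tuple(row)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    # The derived version records its parent, so provenance is preserved.
    return DatasetVersion(
        name=source.name + "_cleansed",
        rows=cleaned,
        parent=source,
        operation="cleansing",
    )


def lineage(version: DatasetVersion) -> list:
    """Walk parent links from a derived dataset back to the official one."""
    chain = []
    node = version
    while node is not None:
        chain.append(node.name)
        node = node.parent
    return chain


# Illustrative data only.
official = DatasetVersion(
    name="election_results",
    rows=[
        ["Athens", "PartyA", 120],
        ["Athens", "PartyA", 120],  # duplicate record
        ["", None, ""],             # empty row
        ["Patras", "PartyB", 80],
    ],
)
derived = cleanse(official)
print(derived.rows)
print(lineage(derived))
```

Keeping the parent reference on every derived version is what makes a version tree possible: each contribution stays traceable to the official dataset it started from.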
15. ENGAGE 2.0
Researchers / Citizens / Journalists:
• Data Requests
• Looking for a dataset (e.g. I can’t find it elsewhere. Does it exist?)
• Looking for a curation / conversion / enrichment (e.g. I am looking for the election results in Greece in XLS.)
• Looking for data verification (e.g. Do you think this dataset is valid?)
• Freedom of Information Requests
• Integration of tools
• Google Refine
• ScraperWiki
• Visualizations
16. ENGAGE 2.0
Data Providers:
• Maintainers of Official Datasets
• Work as a group
• Bring the community which works on their data closer to them / direct communication
• See and take advantage of ENGAGE Data Curation Community work (e.g. cleansing, better formats)
• Easy to see / gather all the Applications that are based on their official datasets
• See the impact of their datasets
• Understand which datasets have RE-USE value for users
• Community help in the process of digitisation and opening of current or older Public Data (history dimension)
17. Search for a dataset...
...use your own language
30. Functionalities of ENGAGE open data e-infrastructure
• Contribution of ENGAGE over existing infrastructures:
1. Service for researchers and citizens
2. Metadata specification and content organisation (embracement of the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models)
3. Automation in data entry and curation
4. Crowdsourcing and interaction with and between users of the platform
5. Data curation tools and services
6. Dataset visualisation possibilities
7. Multilinguality
8. User help and training
31. Value Proposition through individual tools
• Search in diverse and dispersed data sources in the EU supported by ENGAGE
• Transform your datasets, keeping the valuable information, with the ENGAGE external tools (Open Refine, ScraperWiki etc.)
• See your results through visualisation tools
• Structure your data according to your needs – control all the levels of your dataset (data, metadata, format)
• Refine existing datasets by metadata enrichment
32. Value Proposition through collaboration
• Create your community(ies) with members of mutual interests
• Each community will be able to increase the value of its datasets by applying its own perspectives based on its unique needs
• Upload your work and share it with your community
• Find other datasets, valuable for your work, uploaded by your community (Collaborate / Exchange / Ask / Provide)
• Combine their results with yours – make new datasets
33. [Architecture diagram] Layers: User Interface; Gateways and integrated tools; ENGAGE Core Components; Storage Components.
Labels in the diagram: HTML / jQuery, Translate, Elastic Search, CKAN API, ScraperWiki API, Open Refine, Django, Wiki, Amazon S3, Python / Django Framework, Heroku, PostgreSQL, Virtuoso, Apache Solr, CERIF.
34. Performing scenarios
Scenario 1: Searching, downloading, extending / visualizing / curating / linking and uploading interesting datasets
Scenario 2: Getting information about other open data websites and comparing them via the ENGAGE website
Scenario 3: Getting information about manuals, APIs and tutorials (training)
37. From V1 evaluation
Asking for:
– More of the specific datasets that users are looking for
– Better performing advanced search functionality
– More / more open dataset formats
– More tools for visualization
– More metadata
– More metadata in the language that the user understands
– Better understandable metadata
– Easy to find metadata
– Information about the quality of the datasets
– Ability to rate and post comments on datasets
➢ Metadata are very important in solving many problems + Multilinguality + Dataset formats
38. Challenges of data sourcing
• Great diversity and variety of datasets in terms of:
– File format
– Encoding
– License
– Language
– Metadata standard (discovery level)
– Metadata standard (data / domain level)
• Some PSI sites (even new ones) do not provide an API
• Most sites provide an API only for discovery
• Linked Data potential still not achieved (IT-savvy users / researchers only)
• Live querying of other portals' datasets has issues:
– Schema Mapping
– Performance
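As an illustration of a discovery-only API, the sketch below builds a dataset search request for a portal that runs CKAN, whose Action API offers a `package_search` endpoint. That a given PSI portal runs CKAN is an assumption, the portal address is a placeholder, and no request is actually sent.

```python
from urllib.parse import urlencode


def ckan_search_url(portal_base: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API package_search URL (no request is sent)."""
    params = urlencode({"q": query, "rows": rows})
    return portal_base.rstrip("/") + "/api/3/action/package_search?" + params


# "https://portal.example.org" is a placeholder, not a real portal.
url = ckan_search_url("https://portal.example.org/", "election results", rows=5)
print(url)
```

A call like this returns metadata about matching datasets, not the data itself, which is the limitation the slide points at: the files still have to be fetched and mapped separately.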
39. Barriers to overcome?
Metadata
Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
How to have it filled: human vs extraction
Tracking of data ownership and provenance, for trust and security
Dataset formats: from pdf/csv toward LOD/rdf
Multilinguality (metadata and data)
Licences
Many “open” licences: CC and national licences
Curation: what licence for a merged and enriched A + B, depending on A and B licences?
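The open question about the licence of a merged dataset A + B can be made concrete with a small sketch. It is a deliberate simplification under stated assumptions: each licence is reduced to a set of condition flags, and the merged dataset is assumed to inherit every condition of both sources. The flag names and the `merged_conditions` helper are hypothetical, share-alike clauses and national licences are not modelled, and real compatibility depends on the legal text of each licence.

```python
# Conditions attached to a few Creative Commons licences, modelled as plain flags.
LICENCE_CONDITIONS = {
    "CC0": frozenset(),
    "CC BY": frozenset({"attribution"}),
    "CC BY-NC": frozenset({"attribution", "non-commercial"}),
}


def merged_conditions(licence_a: str, licence_b: str) -> frozenset:
    """Union of the conditions of A and B: the merged dataset inherits both."""
    return LICENCE_CONDITIONS[licence_a] | LICENCE_CONDITIONS[licence_b]


print(sorted(merged_conditions("CC0", "CC BY-NC")))
print(sorted(merged_conditions("CC BY", "CC0")))
```

Under this model, enriching a CC0 dataset with a CC BY-NC one yields a result that can only be reused non-commercially and with attribution, which shows how a single restrictive source constrains the whole derived dataset.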
42. Rich contextual metadata is important
• Captures context, purpose, provenance, coverage, etc.
• Allows the user to:
• Discover a dataset
• Evaluate utility and re-use potential
• Reuse it!
• Enables advanced services
• Sophisticated search/discovery and navigation, mining, visualisation, reporting
11th International Conference on Current Research Information Systems (CRIS 2012), Prague, 6-9 June 2012
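A short sketch of what a contextual, multilingual metadata record might look like. The property names follow DCAT and Dublin Core terms (`dct:title`, `dct:license`, `dcat:keyword`, `dcat:distribution`), but the record is a loosely modelled Python dictionary rather than valid JSON-LD, all values are made up, and the `title_for` helper is hypothetical.

```python
# A discovery-level metadata record with language-tagged titles.
# Property names follow DCAT / Dublin Core terms; the values are made up.
record = {
    "@type": "dcat:Dataset",
    "dct:title": {
        "en": "Election results 2012",
        "fr": "Résultats des élections 2012",
    },
    "dct:publisher": "Example Ministry",
    "dct:license": "CC BY",
    "dcat:keyword": ["elections", "statistics"],
    "dcat:distribution": [{"dct:format": "CSV"}],
}


def title_for(record: dict, preferred: list, fallback: str = "en") -> str:
    """Return the title in the first preferred language that is available."""
    titles = record["dct:title"]
    for lang in preferred:
        if lang in titles:
            return titles[lang]
    return titles[fallback]


print(title_for(record, ["el", "fr"]))  # Greek is missing, French is used
print(title_for(record, ["de"]))        # falls back to English
```

Storing titles per language lets a portal answer a search in the user's own language when a translation exists and degrade gracefully when it does not.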
43. Mapping considerations
• Need a canonical form to reduce n(n-1) conversions to n
– PSI data has several different metadata ‘standards’
• Canonical form must be able to ingest or generate the other metadata ‘standards’
– Implies it has to be richer than the others
• Syntax (structure) and semantics
• Support multiple semantics over canonical syntax
• Canonical form must support whatever architecture is used
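The canonical-form argument can be sketched as a hub-and-spoke conversion: each format supplies one mapping to and from the canonical record, and any pair of formats is converted through it. The canonical field names (`title`, `description`, `licence`) and all function names are hypothetical; the CKAN fields (`title`, `notes`, `license_id`) and the Dublin Core terms are used only as sample source formats, and the sample record is made up.

```python
# Each format supplies one mapping (a pair of functions) to and from a
# canonical record. Canonical field names are hypothetical.
def ckan_to_canonical(r: dict) -> dict:
    return {"title": r["title"], "description": r["notes"], "licence": r["license_id"]}


def canonical_to_ckan(c: dict) -> dict:
    return {"title": c["title"], "notes": c["description"], "license_id": c["licence"]}


def dcat_to_canonical(r: dict) -> dict:
    return {
        "title": r["dct:title"],
        "description": r["dct:description"],
        "licence": r["dct:license"],
    }


def canonical_to_dcat(c: dict) -> dict:
    return {
        "dct:title": c["title"],
        "dct:description": c["description"],
        "dct:license": c["licence"],
    }


MAPPINGS = {
    "ckan": (ckan_to_canonical, canonical_to_ckan),
    "dcat": (dcat_to_canonical, canonical_to_dcat),
}


def convert(record: dict, source: str, target: str) -> dict:
    """Convert between any two registered formats via the canonical form."""
    to_canonical, _ = MAPPINGS[source]
    _, from_canonical = MAPPINGS[target]
    return from_canonical(to_canonical(record))


def pairwise_converters(n: int) -> int:
    """Direct converters needed between n formats: n(n-1)."""
    return n * (n - 1)


def canonical_mappings(n: int) -> int:
    """Mappings needed when every format maps to one canonical form: n."""
    return n


ckan_record = {"title": "Election results", "notes": "Results by region", "license_id": "cc-by"}
print(convert(ckan_record, "ckan", "dcat"))
print(pairwise_converters(6), canonical_mappings(6))
```

Adding a new metadata format then means writing one mapping against the canonical form instead of one converter per existing format, which only works if the canonical form is at least as rich as every format it has to ingest or generate.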
50. CERIF: Common European Research Information Format – maintained by euroCRIS
From http://cerifsupport.org/2013/04/02/data-in-cerif/ , B. Joerg
59. Open licenses landscape – per country
Country | Portal | Licence
France | Data.gouv.fr | Licence Ouverte
United Kingdom | Data.gov.uk | Open Government Licence
Italy | Dati.gov.it | Creative Commons Attribuzione - Non commerciale 2.5 Italia (CC BY-NC 2.5)
Germany | Govdata.de | Datenlizenz Deutschland – Namensnennung; Datenlizenz Deutschland – Namensnennung – nicht kommerziell
Norway | Data.norge.no | Norsk lisens for offentlige data (NLOD)
Netherlands | Data.overheid.nl | No specific common licence but a recommendation for the agencies publishing data through the portal to use the framework of the Open Government Act, and to apply Creative Commons Zero or Public Domain if any licence is desired at all
Spain | Datos.gob.es | No specific licence but two parts in extensive legal notes that cover data re-use and are based on different pieces of Spanish national legislation
Belgium | Data.gov.be | No specific common licence. Each public service or government institution determines the terms and conditions governing access to and use of its data published through the portal.
From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information.
Conference for eDemocracy & Open Government
61. Open license content – an example
Regulation components of data.gouv.fr open licence
From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information.
Conference for eDemocracy & Open Government
62. Conclusion: Participants' suggestions
The ENGAGE platform and its features are of interest for promoting data re-use, provided the following points are addressed:
1. Have a clear view of the positioning of ENGAGE in the Open Data ecosystem, including the added value / differences with respect to HOMER and other Open Data-related EC projects
2. Ensure ENGAGE sustainability
3. For ENGAGE in particular, and for EC projects in general, the developed software should be required to be open source in order to ensure its sustainability
4. Success stories related to the use of ENGAGE should be promoted, for example demonstrating the savings in time for researchers
5. The educational side should be strong, with the inclusion of basic information on Linked Data, video tutorials,...