Are you interested in finding out how your organisation can comply with the new European Commission Directive on Open Data and the Re-use of Public Sector Information (also known as the ‘Open Data Directive’)? The Open Data Directive entered into force on 16 July 2019 and will be transposed into national law in July 2021.
https://digital-strategy.ec.europa.eu/en/policies/open-data
In this presentation, we look at how an organisation can get started with Open Data publishing by answering three questions: What data do we manage? Which data should we publish as Open Data? How can we make that data available as Open Data?
Presented as part of the webinar 'It’s time to Open - Preparing for new Open Data and Reuse of PSI Directive'.
https://www.eventbrite.co.uk/e/its-time-to-open-preparing-for-new-open-data-and-reuse-of-psi-directive-tickets-143034131939#
3. Directive on Open Data and Re-Use of Public Sector Information
• Stimulate the publication of dynamic data and the uptake of Application Programming Interfaces (APIs);
• Limit the exceptions that currently allow public bodies to charge more than the marginal costs of dissemination for data re-use;
• Extend the scope of the directive to include data held by public undertakings, under a specific set of rules, and research data resulting from public funding;
• Strengthen the transparency requirements for agreements involving public sector information between public and private parties, thereby avoiding exclusive deals; and
• Require the publication of a list of high-value datasets to be provided free of charge.
4. Where do I start?
1. What data do we manage?
2. Which data should we publish as Open Data?
3. How can we make data available as Open Data?
6. What data do we manage?
We don’t manage any data
It’s already published on our website
It’s too big
The data is of poor quality
It’s not very interesting
The data is domain-specific
People will misinterpret the data
It’s available in reports
7. What data do we manage?
https://www.tusla.ie/publications/
8. Identify Datasets through a Data Audit
A Data Audit helps Public Bodies compile a comprehensive inventory of the datasets they manage, and classify whether each dataset is suitable for data sharing or for publication as Open Data.
(It is also useful for GDPR compliance and for the Public Service Data Catalogue.)
https://datacatalogue.gov.ie/
9. Identify Datasets through a Data Audit
• Name of the dataset
• Description of the dataset
• Dataset manager(s)
• Link to dataset
• Classification
• Format
• Publication frequency
• Demand for this dataset
• General Comments
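The audit fields listed above map naturally onto one record per dataset. The following is a minimal sketch in Python, not a prescribed template: the field names, the classification labels ("open", "restricted") and the two example entries are all invented placeholders for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class AuditEntry:
    """One row of a data audit. Field names are illustrative assumptions."""
    name: str
    description: str
    managers: str
    link: str
    classification: str  # assumed labels, e.g. "open" or "restricted"
    data_format: str
    publication_frequency: str
    demand: str
    comments: str = ""


def audit_to_csv(entries):
    """Serialise audit entries to CSV text, one row per dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(AuditEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def open_data_candidates(entries):
    """Return the entries classified as suitable for Open Data publication."""
    return [entry for entry in entries if entry.classification == "open"]


# Placeholder entries; a real audit would list the organisation's own datasets.
example_audit = [
    AuditEntry(
        name="Service locations",
        description="Addresses and opening hours of public offices",
        managers="Customer Services",
        link="https://example.org/service-locations",
        classification="open",
        data_format="XLSX",
        publication_frequency="Quarterly",
        demand="Frequent requests from the general public",
    ),
    AuditEntry(
        name="Case files",
        description="Individual case records",
        managers="Operations",
        link="https://example.org/internal/case-files",
        classification="restricted",
        data_format="Database",
        publication_frequency="Not published",
        demand="Internal only",
        comments="Contains personal data",
    ),
]

print(audit_to_csv(example_audit))
print([entry.name for entry in open_data_candidates(example_audit)])
```

Keeping the inventory in a machine-readable form like this makes it easy to filter for publication candidates and to reuse the same audit for other purposes.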
13. Which data should we publish as Open Data?
• High-Value
• Sustainable
• Low-Hanging Fruit
14. High-Value Datasets
• Defined as documents the re-use of which is associated with important benefits for society and the economy.
• They have high commercial potential and can speed up the emergence of value-added EU-wide information products. They will also serve as key data sources for the development of Artificial Intelligence.
• They are subject to a separate set of rules ensuring their availability free of charge, in machine-readable formats, provided via APIs and, where relevant, as bulk download.
15. High-Value Datasets (HVDs)
• Geospatial
– e.g. national and local maps, postcodes
• Statistics
– e.g. demographic and economic indicators
• Companies and company ownership
– e.g. business registers and registration identifiers
• Mobility
– e.g. road signs and inland waterways
• Earth observation and environment
– e.g. energy consumption and satellite images
• Meteorological
– e.g. in situ data from instruments and weather forecasts
16. HVDs – Is there a Demand for this information?
➢ From colleagues
➢ From other Public Bodies
➢ From researchers
➢ From the general public
➢ Via Freedom of Information (FOI) requests
➢ Via Parliamentary Questions (PQs)
17. Sustainable Dataset
Can we continue to support the publication of this dataset on an ongoing basis?
• Are individuals a bottleneck?
• Is the publication process repeatable?
• Can publishing be automated?
• Can a harvester be built to automate publication?
19. Low-Hanging Fruit
Start by publishing the easier, non-controversial datasets, e.g. service information or data that is already available in non-machine-readable formats.
People will misinterpret the data
I don’t mind, but someone else might
We’ll get spam
The data is of poor quality
20. Which data should we publish as Open Data?
• High-Value
• Sustainable
26. Where is your data currently stored/managed?
• Spreadsheets on your computer
• On a shared workspace
• On a file server
• In a specific system
• On your website
27. Make data available online in open, machine-readable formats
General
• CSV
• JSON
• XML
• ODF
• RDF
• TSV
Geospatial
• GeoJSON
• GML
• KML
• WKT
• LAS
• IFC
• Shapefile
• WMS
Domain-Specific
• PX
• JSON-stat
• NetCDF
• BUFR
• Datex II
• GTFS
• HDF5
• GRIB
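Moving data between the general formats listed above is often a small step. As a minimal sketch, the Python snippet below converts CSV text with a header row into JSON using only the standard library. The function name `csv_to_json` and the sample rows are invented for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects.

    Every value is kept as a string; type conversion is left to the publisher.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Placeholder data standing in for a spreadsheet exported as CSV.
sample_csv = "office,county,opening_hours\nMain Office,Dublin,09:00-17:00\n"

print(csv_to_json(sample_csv))
```

The same pattern works for a file on disk by reading its contents first. TSV can be handled by passing `delimiter="\t"` to `csv.DictReader`.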
29. Building an Application Programming Interface (API)
✓ Integration
✓ Flexibility
✓ Best option for dynamic data
✓ Application development
❖ Upfront effort
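To give a sense of the upfront effort involved, here is a minimal sketch of a read-only JSON API built with Python's standard-library WSGI tools. It is an illustration under stated assumptions, not a production design: the `/offices` path, the `county` query parameter and the records in `DATASET` are all hypothetical.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Invented placeholder records standing in for a real dataset.
DATASET = [
    {"office": "Main Office", "county": "Dublin"},
    {"office": "Regional Office", "county": "Cork"},
]


def filter_records(records, county=None):
    """Return all records, or only those whose county matches (case-insensitive)."""
    if county is None:
        return list(records)
    return [r for r in records if r["county"].lower() == county.lower()]


def app(environ, start_response):
    """WSGI application serving the dataset as JSON at the assumed path /offices."""
    if environ.get("PATH_INFO") != "/offices":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    query = parse_qs(environ.get("QUERY_STRING", ""))
    county = query.get("county", [None])[0]
    body = json.dumps(filter_records(DATASET, county)).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the API locally until interrupted."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/offices?county=Cork` would return only the matching records. A real API would read from the live data source rather than a list in memory, which is what makes this the best option for dynamic data.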
31. Manual Publication of Data
✓ Good for once-off publication
✓ Suitable for linking to an API endpoint
❖ Not sustainable in long-term
32. Be harvested by
✓ Automated
✓ Keeps data up-to-date
✓ Harvester built by Derilinx
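The idea behind harvesting can be sketched as a periodic sync between a publisher's list of datasets and a portal's catalogue. The Python below is a conceptual illustration only: it is not the harvester referred to above, and the `harvest` function, the `id` and `modified` keys and the sample records are all assumptions.

```python
def harvest(source_datasets, portal_catalogue):
    """Sync portal_catalogue (a dict keyed by dataset id) with the source list.

    New datasets are added, changed ones are replaced, and datasets that no
    longer appear at the source are removed. Returns a summary of the changes.
    """
    summary = {"created": 0, "updated": 0, "deleted": 0}
    seen = set()
    for dataset in source_datasets:
        dataset_id = dataset["id"]
        seen.add(dataset_id)
        existing = portal_catalogue.get(dataset_id)
        if existing is None:
            portal_catalogue[dataset_id] = dict(dataset)
            summary["created"] += 1
        elif existing["modified"] != dataset["modified"]:
            portal_catalogue[dataset_id] = dict(dataset)
            summary["updated"] += 1
    # Remove anything the publisher has withdrawn since the last run.
    for dataset_id in list(portal_catalogue):
        if dataset_id not in seen:
            del portal_catalogue[dataset_id]
            summary["deleted"] += 1
    return summary


# Placeholder records: what the portal holds now, and what the publisher lists.
portal = {
    "offices": {"id": "offices", "title": "Office locations", "modified": "2021-01-10"},
    "old-report": {"id": "old-report", "title": "Old report", "modified": "2019-05-01"},
}
publisher = [
    {"id": "offices", "title": "Office locations", "modified": "2021-03-01"},
    {"id": "services", "title": "Service directory", "modified": "2021-03-01"},
]

print(harvest(publisher, portal))
```

Because the sync runs without manual steps, the catalogue stays up to date as long as the publisher keeps its own list current.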
33. Where do I start?
1. What data do we manage?
– Carry out a data audit
2. Which data should we publish as Open Data?
– High-value / sustainable / low-hanging fruit
3. How can we make data available as Open Data?
– Publish online in open, machine-readable formats
4. How to be discoverable on ?
– Manually / via harvester