How to Publish Open Data
1. Digital Enterprise Research Institute www.deri.ie
How to publish Open Data
Richard Cyganiak
Opening Up Government Data – Galway, 8 Nov 2011
Stefan.Decker@deri.org
http://www.StefanDecker.org/
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
2. TimBL’s 5-star plan for open data
★ Make your stuff available on the Web
★★ Make it available as structured data
(e.g., an Excel sheet instead of an image scan of a table)
★★★ Use a non-proprietary format
(e.g., a CSV file instead of an Excel sheet)
★★★★ Use a linked data format
(i.e., URIs to identify things, and RDF to represent data)
★★★★★ Link your data to other people’s data to provide
context
Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
6. Five-shamrock scheme
1. Publish data on the web
2. Publish data in a machine-processable format
3. Use an open standard format
7. Five-shamrock scheme
1. Publish data on the web
2. Publish data in a machine-processable format
3. Use an open standard format
4. Publish under an open license
8. Five-shamrock scheme
1. Publish data on the web
2. Publish data in a machine-processable format
3. Use an open standard format
4. Publish under an open license
5. List your data in a data catalog
10. Why?
The web is where people look for it first
Google can index it
Fewer phone calls and emails (and FoI requests) to
answer
11. Lots of data is already there
Databases
Reports
Spreadsheets
Maps
13. Why?
Allow others to do their own processing, analysis and
visualisation of your data
New services, new ideas
14. Examples
CSO Quarterly National Household Survey
http://cso.ie/qnhs/calendar_quarters_qnhs.htm
EPA enforcement files and ScraperWiki
http://www.epa.ie/whatwedo/enforce/lic/info/
https://views.scraperwiki.com/run/irish-epa-visuals/
Galway and Fingal planning applications
http://lab.linkeddata.deri.ie/2010/planning-apps/
Getting the data: 210 lines of code vs. 30 lines of code
15. Symptom: screenscraping
People use tools like ScraperWiki to get at data that isn't
machine-readable
https://scraperwiki.com/tags/ireland
Scraping is not the right way of doing this
Expensive
Brittle
Strain on computing resources
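By contrast, data published as CSV can be read with a few lines of standard-library code. The following is a minimal sketch, not taken from the talk: the sample rows, the column names and the helper `count_by_column` are all hypothetical.

```python
import csv
import io

# Hypothetical CSV content, standing in for a file a public body might publish.
SAMPLE_CSV = """reference,authority,status
11/123,Galway,Granted
11/456,Fingal,Refused
11/789,Galway,Pending
"""

def count_by_column(csv_text, column):
    """Count how many rows share each distinct value of one column."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts

counts = count_by_column(SAMPLE_CSV, "authority")
print(counts)
```

No HTML parsing is involved, so a change to the layout of the publishing website would not break this code.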
16. Formats
Good: MS Excel, CSV, XML, JSON, Microdata
Not so good: Pure websites, MS Word
Bad: PDF
Really bad: Only charts/maps without numbers
17. Good practices
Publish in multiple formats, at least one machine-readable
Publish Excel files alongside large PDF reports
Publish CSV alongside database-backed web
applications
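Publishing CSV alongside a database-backed application can be as simple as dumping a table. The sketch below assumes an SQLite database purely for illustration; the table `licences`, its columns and the helper `export_table_to_csv` are hypothetical.

```python
import csv
import io
import sqlite3

def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, preceded by a header row."""
    # The table name is interpolated into the SQL, so it must come from
    # trusted code and never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)

# Hypothetical in-memory database standing in for the application's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licences (id INTEGER, holder TEXT)")
conn.executemany(
    "INSERT INTO licences VALUES (?, ?)",
    [(1, "Acme Ltd"), (2, "Widget, Inc")],
)

buffer = io.StringIO()
export_table_to_csv(conn, "licences", buffer)
csv_output = buffer.getvalue()
print(csv_output)
```

The `csv` module quotes values that contain commas, which is easy to get wrong when building CSV by hand.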
19. Why?
Not all formats are created equal
Some formats bring many tools and applications that
people can already use
20. Quick tour of formats
CSV – Comma-Separated Values
More open (and simpler) alternative to Excel format
Can be opened in and exported from Excel, Google
Spreadsheets, Google Refine, …
KML – Keyhole Markup Language
Simple format for presenting geographic data
Can be opened in Google Maps
RSS – Really Simple Syndication
Notifications of updates of any kind
Can be opened in RSS readers and many email clients
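To give a feel for how simple KML is, here is a minimal sketch that turns a list of named points into a KML document using only the Python standard library. The helper `points_to_kml` and the sample point are hypothetical. Note that KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def points_to_kml(points):
    """Build a KML document from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

# Hypothetical sample point with approximate coordinates for Galway.
kml_text = points_to_kml([("Galway", 53.27, -9.05)])
print(kml_text)
```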
21. Developer-oriented formats
XML – Extensible Markup Language
W3C (World Wide Web Consortium) standard, 1998
established, reliable, ubiquitous
JSON – JavaScript Object Notation
IETF (Internet Engineering Task Force) standard, 2006
great for web APIs
very simple; very fashionable right now
RDF – Resource Description Framework
W3C standard, 2004
great for data integration
steeper learning curve
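As a rough illustration of how little is needed to offer JSON next to CSV, the sketch below converts CSV rows into a JSON array with the Python standard library. The sample rows, the column names and the helper `csv_to_json` are made up for this example.

```python
import csv
import io
import json

# Hypothetical CSV content; the values are placeholders, not real records.
SAMPLE_CSV = "school,roll_number\nExample National School,00001A\n"

def csv_to_json(csv_text):
    """Convert CSV text into a JSON array with one object per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

json_text = csv_to_json(SAMPLE_CSV)
print(json_text)
```

All values come out as strings, because CSV carries no type information. A real API would probably convert numbers and dates explicitly.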
22. Also: standard classifications
Within your data, use the same categories as everybody
else
CSO
http://www.cso.ie/surveysandmethodologies/classifications_stan.htm
StatCentral list of classifications
http://www.statcentral.ie/classifications.asp
23. Also: standard identifiers
Example: School roll numbers
Department of Education publishes an Excel file with all school
roll numbers
Can be used to search for the same school on other websites,
in school evaluation reports, etc.
Example: Ordnance Survey UK geo identifiers
Uses URIs (web addresses) as identifiers
http://data.ordnancesurvey.co.uk/doc/7000000000037256
Great for use in RDF
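Shared identifiers are what make datasets from different publishers combinable. The sketch below joins two small, made-up datasets on a common roll number. The records, the field names and the helper `join_on_identifier` are hypothetical and do not reflect the actual Department of Education file.

```python
def join_on_identifier(left, right, key):
    """Merge rows from `left` with rows from `right` that share the same `key` value."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Hypothetical data from two different publishers using the same identifier.
schools = [
    {"roll_number": "00001A", "name": "Example National School"},
    {"roll_number": "00002B", "name": "Sample Community College"},
]
reports = [
    {"roll_number": "00002B", "report_year": "2011"},
]

joined = join_on_identifier(schools, reports, "roll_number")
print(joined)
```

Without a shared identifier, the same join would have to match on school names, which vary in spelling between sources.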
24. Linked Open Data Cloud
25. Summary
Prefer open, widely used standards
But: also prefer what you know best
Support multiple formats for different audiences where it
makes sense
Great: CSV, KML, RSS, XML, JSON
27. Why?
Regulates what others can and cannot do with the data
For re-users, uncertainty about rights is a major concern
A good way to ensure that your organisation gets
acknowledged
You need some non-discriminatory policy for giving
rights to the data anyway (PSI directive)
28. Complex topic
Destroying a potential income stream?
Content licenses vs database licenses
Mixing and compatibility of licenses
Wikipedia, OpenStreetMap
29. Irish PSI License
Created in response to PSI Directive
Available at http://psi.gov.ie/
Problems: Documents may not be used “for the principal
purpose of advertising or promoting a particular product
or service”
Can't be combined with Wikipedia or OpenStreetMap
Not an open license according to Open Definition
http://opendefinition.org/
31. License features
You're allowed to do pretty much anything, provided
you…
Attribution (“By”) – give credit
ShareAlike (“SA”) – adapted data must be published in
the same way
32. Does Open Data have to be free?
Many would say yes
A matter of terminology and definitions
Either way there is nothing wrong with charging for
certain data
33. Data protection
Personal information is not open data
Freedom of Information legislation
http://foi.gov.ie/
34. Summary
Stating an explicit license is important
Irish PSI License: It's readily available, but not “open
enough” for some applications
Open Data Commons licenses with various constraints
36. Why?
So that people know it exists
This is how the world learns about available data
This is how you learn what they do and need
37. Some key information about a dataset
What data is being published?
What's the license?
When was the data collected?
When will it be updated, if at all?
How was/is this data collected?
What was/is the data used for?
Contact person?
Where to give feedback?
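The questions above can be captured as a small machine-readable record that sits next to the dataset. The sketch below is one possible shape, not a catalog standard: the field names, the sample values and the helper `missing_fields` are assumptions made for illustration.

```python
import json

# Hypothetical field names, one per question on the slide.
KEY_FIELDS = [
    "title", "license", "collected", "update_schedule",
    "collection_method", "purpose", "contact", "feedback",
]

def missing_fields(record):
    """Return the key fields that are absent or empty in a metadata record."""
    return [field for field in KEY_FIELDS if not record.get(field)]

# Made-up example record; the feedback channel is deliberately left out.
record = {
    "title": "Example planning applications",
    "license": "Example open license",
    "collected": "2011",
    "update_schedule": "quarterly",
    "collection_method": "Exported from the planning system",
    "purpose": "Processing planning applications",
    "contact": "opendata@example.org",
}

missing = missing_fields(record)
print(json.dumps(record, indent=2))
print("Missing:", missing)
```

A check like this can run before publication, so that no dataset goes out without a license or a contact.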
38. How to do this in practice?
Have a simple page on your website
Use an open community data catalog
Set up your own catalog
Use a national Irish data catalog???
39. Open community catalogs
The Data Hub
http://thedatahub.org
Irish CKAN
http://ie.ckan.net
40. Set up your own catalog
Requires a budget
Roll your own software?
data.fingal.ie
Use open source, e.g., CKAN?
data.gov.uk
Berlin Open Data
…
41. National Irish data catalog?
CSO's StatCentral?
Marine Institute's ISDE?
Who publishes the catalog in other countries?
UK: Cabinet Office
US: White House
Australia: Dept of Finance and Deregulation
New Zealand: Dept of Internal Affairs
42. Summary
Data catalogs make it easy to find data
Basic metadata, how to give feedback etc
Important: How often are datasets accessed?
“Request a dataset” feature
Also: Open Data Ireland Google Group
http://groups.google.com/group/open-data-ireland
43. Five-shamrock scheme
1. Publish data on the web
2. Publish data in a machine-processable format
3. Use an open standard format
4. Publish under an open license
5. List your data in a data catalog