Digital Enterprise Research Institute                                         www.deri.ie




                                  How to publish Open Data

                                       Richard Cyganiak
                        Opening Up Government Data – Galway, 8 Nov 2011




 Stefan.Decker@deri.org
 http://www.StefanDecker.org/

 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
TimBL’s 5-star plan for open data
       ★ Make your stuff available on the Web
       ★★ Make it available as structured data
                                        (e.g., an Excel sheet instead of an image scan of a table)

       ★★★ Use a non-proprietary format
                                        (e.g., a CSV file instead of an Excel sheet)

       ★★★★ Use a linked data format
                                        (i.e., URIs to identify things, and RDF to represent data)

       ★★★★★ Link your data to other people’s data to provide context

   Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
Five-shamrock scheme
                   1. Publish data on the web

                   2. Publish data in a machine-processable format

                   3. Use an open standard format

                   4. Publish under an open license

                   5. List your data in a data catalog
                    1. Publish data on the web
Why?
            The web is where people look for it first
            Google can index it
            Fewer phone calls and emails (and FoI requests) to answer
Lots of data is already there
            Databases
            Reports
            Spreadsheets
            Maps
                    2. Publish data in a machine-processable format
Why?
            Allow others to do their own processing, analysis and
             visualisation of your data
            New services, new ideas
Examples
            CSO Quarterly National Household Survey
                   http://cso.ie/qnhs/calendar_quarters_qnhs.htm
            EPA enforcement files and ScraperWiki
                   http://www.epa.ie/whatwedo/enforce/lic/info/
                   https://views.scraperwiki.com/run/irish-epa-visuals/
            Galway and Fingal planning applications
                   http://lab.linkeddata.deri.ie/2010/planning-apps/
                   Getting the data: 210 lines of code vs. 30 lines of code
Symptom: screenscraping
            People use tools like ScraperWiki to get at data that isn't
             machine-readable
                   https://scraperwiki.com/tags/ireland
            Scraping is not the right way of doing this
                   Expensive
                   Brittle
                   Strain on computing resources
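
By contrast, when the same data is published as CSV, reading it needs only a standard library and no HTML parsing. A minimal sketch, assuming a hypothetical planning-applications file; the column names and rows are invented for illustration and do not come from any real dataset:

```python
import csv
import io

# Hypothetical sample data; a real publisher would serve this as a .csv file.
SAMPLE_CSV = """reference,council,status
11/1001,Galway,Granted
11/1002,Fingal,Refused
11/1003,Galway,Pending
"""


def count_by_column(csv_text, column):
    """Count rows per distinct value found in the given column."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by_column(SAMPLE_CSV, "council"))
```

Nothing here depends on the layout of a web page, so the code keeps working when the publisher redesigns the site.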
Formats
            Good: MS Excel, CSV, XML, JSON, Microdata
            Not so good: Pure websites, MS Word
            Bad: PDF
            Really bad: Only charts/maps without numbers
Good practices
            Publish in multiple formats, at least one machine-readable
            Publish Excel files alongside large PDF reports
            Publish CSV alongside database-backed web
             applications
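
Publishing CSV alongside a database-backed application can be as small as dumping a query result. A sketch under stated assumptions: the database is SQLite, and the `licences` table and its contents are hypothetical stand-ins for whatever the application actually stores:

```python
import csv
import io
import sqlite3


def table_to_csv(conn, query):
    """Run a query and return its result as CSV text with a header row."""
    cursor = conn.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


def demo_connection():
    """Build an in-memory database with a hypothetical table for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE licences (id INTEGER, holder TEXT)")
    conn.executemany(
        "INSERT INTO licences VALUES (?, ?)",
        [(1, "Acme Ltd"), (2, "Example, Inc.")],
    )
    return conn


if __name__ == "__main__":
    print(table_to_csv(demo_connection(), "SELECT id, holder FROM licences ORDER BY id"))
```

The `csv` module quotes values that contain commas, which is easy to get wrong when building the lines by hand.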
         3. Use an open standard format
Why?
            Not all formats are created equal
            Some formats bring many tools and applications that
             people can already use
Quick tour of formats
            CSV – Comma-Separated Values
                   More open (and simpler) alternative to Excel format
                   Can be opened in and exported from Excel, Google
                    Spreadsheets, Google Refine, …
            KML – Keyhole Markup Language
                   Simple format for presenting geographic data
                   Can be opened in Google Maps
            RSS – Really Simple Syndication
                   Notifications of updates of any kind
                   Can be opened in RSS readers and many email clients
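
To give a feel for how simple KML is, here is a sketch that turns a list of named points into a KML document using only the Python standard library. The function name and the sample point (approximate coordinates for Galway) are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemarks_to_kml(places):
    """Build a KML document from (name, longitude, latitude) tuples."""
    # Serialize the KML namespace as the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in places:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(placemarks_to_kml([("Galway", -9.05, 53.27)]))
```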
Developer-oriented formats
            XML – Extensible Markup Language
                   W3C (World Wide Web Consortium) standard, 1998
                   established, reliable, ubiquitous
            JSON – JavaScript Object Notation
                   Described in IETF (Internet Engineering Task Force) RFC 4627, 2006
                   great for web APIs
                   very simple; very fashionable right now
            RDF – Resource Description Framework
                   W3C standard, 2004
                   great for data integration
                   steeper learning curve
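
The same record can be offered in more than one developer-oriented format with little extra work. A minimal sketch using the Python standard library; the record, its field names, and the `dataset` element name are hypothetical:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration.
RECORD = {"id": "ds-001", "title": "Example dataset", "county": "Galway"}


def to_json(record):
    """Serialize the record as a JSON object with keys in sorted order."""
    return json.dumps(record, sort_keys=True)


def to_xml(record, root_tag="dataset"):
    """Serialize the record as XML, one child element per field."""
    root = ET.Element(root_tag)
    for key in sorted(record):
        ET.SubElement(root, key).text = str(record[key])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(RECORD))
    print(to_xml(RECORD))
```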
Also: standard classifications
            Within your data, use the same categories as everybody
             else
            CSO
                   http://www.cso.ie/surveysandmethodologies/classifications_stan.htm
            StatCentral list of classifications
                   http://www.statcentral.ie/classifications.asp
Also: standard identifiers
            Example: School roll numbers
                   Department of Education publishes an Excel file with all school
                    roll numbers
                   Can be used to Google the same school on other websites,
                    school evaluation reports etc
            Example: Ordnance Survey UK geo identifiers
                   Uses URIs (web addresses) as identifiers
                   http://data.ordnancesurvey.co.uk/doc/7000000000037256
                   Great for use in RDF
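
Shared identifiers are what make it possible to combine datasets from different publishers. A sketch, assuming two hypothetical lists of records that both carry a `roll_number` field; the roll numbers and school names are made up:

```python
def join_on_identifier(left, right, key):
    """Merge records from two lists of dicts that share the same identifier."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Fields from the right-hand record are added to the left-hand one.
            merged.append({**row, **match})
    return merged


# Hypothetical data from two different publishers.
SCHOOLS = [
    {"roll_number": "00001A", "name": "Example National School"},
    {"roll_number": "00002B", "name": "Sample College"},
]
REPORTS = [{"roll_number": "00002B", "evaluation_year": 2010}]

if __name__ == "__main__":
    print(join_on_identifier(SCHOOLS, REPORTS, "roll_number"))
```

Without a common identifier, the same join would have to match on school names, which tend to be spelled differently from one source to the next.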
Linked Open Data Cloud
Summary
            Prefer open, widely used standards
            But: also prefer what you know best
            Support multiple formats for different audiences where it
             makes sense
            Great: CSV, KML, RSS, XML, JSON
                    4. Publish under an open license
Why?
            Regulates what others can and cannot do with the data
            For re-users, uncertainty about rights is a major concern
            A good way to ensure that your organisation gets
             acknowledged
            You need some non-discriminatory policy for giving
             rights to the data anyway (PSI directive)
Complex topic
            Destroying a potential income stream?
            Content licenses vs database licenses
            Mixing and compatibility of licenses
                   Wikipedia, OpenStreetMap
Irish PSI License
            Created in response to PSI Directive
            Available at http://psi.gov.ie/
            Problems: Documents may not be used “for the principal
             purpose of advertising or promoting a particular product
             or service”
                   Can't be combined with Wikipedia or OpenStreetMap
            Not an open license according to Open Definition
                   http://opendefinition.org/
Open database licenses
                                        http://opendefinition.org/licenses/
License features
            You're allowed to do pretty much anything, provided
             you…
            Attribution (“By”) – give credit
            ShareAlike (“SA”) – adapted data must be published in
             the same way
Does Open Data have to be free?
            Many would say yes
            A matter of terminology and definitions
            Either way there is nothing wrong with charging for
             certain data
Data protection
            Personal information is not open data
            Freedom of Information legislation
                   http://foi.gov.ie/
Summary
            Stating an explicit license is important
            Irish PSI License: It's readily available, but not “open
             enough” for some applications
            Open Data Commons licenses with various constraints
                    5. List your data in a data catalog
Why?
            So that people know it exists
            This is how the world learns about available data
            This is how you learn what they do and need
Some key information about a dataset
        What data is being published?
        What's the license?
        When was the data collected?
        When will it be updated, if at all?
        How was/is this data collected?
        What was/is the data used for?
        Contact person?
        Where to give feedback?
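
The questions above map naturally onto a small metadata record per dataset. A sketch of one possible shape, with a check for unanswered questions; the field names, values, and URL are illustrative assumptions and do not follow any particular catalog standard:

```python
import json

# Hypothetical field names, one per question on the slide.
REQUIRED_FIELDS = [
    "title",
    "license",
    "collected",
    "update_frequency",
    "collection_method",
    "purpose",
    "contact",
    "feedback_url",
]


def missing_fields(entry):
    """Return the required fields that are absent or empty in the entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]


# Hypothetical catalog entry; all values are placeholders.
EXAMPLE_ENTRY = {
    "title": "Example dataset",
    "license": "Example open license",
    "collected": "2011",
    "update_frequency": "annually",
    "collection_method": "survey",
    "purpose": "internal reporting",
    "contact": "data@example.org",
    "feedback_url": "http://example.org/feedback",
}

if __name__ == "__main__":
    print(json.dumps(EXAMPLE_ENTRY, indent=2))
    print("Missing:", missing_fields(EXAMPLE_ENTRY))
```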
How to do this in practice?
            Have a simple page on your website
            Use an open community data catalog
            Set up your own catalog
            Use a national Irish data catalog???
Open community catalogs
            The Data Hub
                   http://thedatahub.org
            Irish CKAN
                   http://ie.ckan.net
Set up your own catalog
            Requires a budget
            Roll your own software?
                   data.fingal.ie
            Use open source, e.g., CKAN?
                   data.gov.uk
                   Berlin Open Data
                   …
National Irish data catalog?
            CSO's StatCentral?
            Marine Institute's ISDE?
            Who publishes the catalog in other countries?
                   UK: Cabinet Office
                   US: White House
                   Australia: Dept of Finance and Deregulation
                   New Zealand: Dept of Internal Affairs
Summary
        Data catalogs make it easy to find data
        Basic metadata, how to give feedback etc
        Important: How often are datasets accessed?
        “Request a dataset” feature
        Also: Open Data Ireland Google Group
                   http://groups.google.com/group/open-data-ireland
Five-shamrock scheme
                   1. Publish data on the web

                   2. Publish data in a machine-processable format

                   3. Use an open standard format

                   4. Publish under an open license

                   5. List your data in a data catalog

How to Publish Open Data

  • 1. Digital Enterprise Research Institute www.deri.ie How to publish Open Data Richard Cyganiak Opening Up Government Data – Galway, 8 Nov 2011 Stefan.Decker@deri.org http://www.StefanDecker.org/ Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • 2. TimBL’s 5-star plan for open data
      ★ Make your stuff available on the Web
      ★★ Make it available as structured data (e.g., an Excel sheet instead of image scan of a table)
      ★★★ Use a non-proprietary format (e.g., a CSV file instead of an Excel sheet)
      ★★★★ Use linked data format (i.e., URIs to identify things, and RDF to represent data)
      ★★★★★ Link your data to other people’s data to provide context
      Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
  • 3-8. Five-shamrock scheme
      1. Publish data on the web
      2. Publish data in a machine-processable format
      3. Use an open standard format
      4. Publish under an open license
      5. List your data in a data catalog
  • 9. Step 1: Publish data on the web
  • 10. Why?
      • The web is where people look for it first
      • Google can index it
      • Fewer phone calls and emails (and FoI requests) to answer
  • 11. Lots of data is already there
      • Databases
      • Reports
      • Spreadsheets
      • Maps
  • 12. Step 2: Publish data in a machine-processable format
  • 13. Why?
      • Allow others to do their own processing, analysis and visualisation of your data
      • New services, new ideas
  • 14. Examples
      • CSO Quarterly National Household Survey
          • http://cso.ie/qnhs/calendar_quarters_qnhs.htm
      • EPA enforcement files and ScraperWiki
          • http://www.epa.ie/whatwedo/enforce/lic/info/
          • https://views.scraperwiki.com/run/irish-epa-visuals/
      • Galway and Fingal planning applications
          • http://lab.linkeddata.deri.ie/2010/planning-apps/
          • Getting the data: 210 lines of code vs. 30 lines of code
  • 15. Symptom: screenscraping
      • People use tools like ScraperWiki to get at data that isn't machine-readable
          • https://scraperwiki.com/tags/ireland
      • Scraping is not the right way of doing this
          • Expensive
          • Brittle
          • Strain on computing resources
  • 16. Formats
      • Good: MS Excel, CSV, XML, JSON, Microdata
      • Not so good: Pure websites, MS Word
      • Bad: PDF
      • Really bad: Only charts/maps without numbers
  • 17. Good practices
      • Publish in multiple formats, at least one machine-readable
      • Publish Excel files alongside large PDF reports
      • Publish CSV alongside database-backed web applications
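As a rough illustration of publishing CSV alongside a database-backed web application, the sketch below dumps one table to CSV using only the Python standard library. The table name, columns and rows are hypothetical sample data, and an in-memory SQLite database stands in for whatever database the application really uses.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, header row first.

    `table` is interpolated into the SQL, so it must come from trusted
    code, never from user input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical sample data standing in for the application's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (ref TEXT, authority TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?, ?)",
    [("A-001", "Galway", "granted"), ("A-002", "Fingal", "pending")],
)

buffer = io.StringIO()
export_table_to_csv(conn, "applications", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real deployment the same function could write to a file served next to the web application, regenerated whenever the data changes.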
  • 18. Step 3: Use an open standard format
  • 19. Why?
      • Not all formats are created equal
      • Some formats bring many tools and applications that people can already use
  • 20. Quick tour of formats
      • CSV – Comma-Separated Values
          • More open (and simpler) alternative to Excel format
          • Can be opened in and exported from Excel, Google Spreadsheets, Google Refine, …
      • KML – Keyhole Markup Language
          • Simple format for presenting geographic data
          • Can be opened in Google Maps
      • RSS – Really Simple Syndication
          • Notifications of updates of any kind
          • Can be opened in RSS readers and many email clients
  • 21. Developer-oriented formats
      • XML – Extensible Markup Language
          • W3C (World Wide Web Consortium) standard, 1998
          • established, reliable, ubiquitous
      • JSON – JavaScript Object Notation
          • IETF (Internet Engineering Task Force) standard, 2006
          • great for web APIs
          • very simple; very fashionable right now
      • RDF – Resource Description Framework
          • W3C standard, 2004
          • great for data integration
          • steeper learning curve
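To make the comparison of developer-oriented formats concrete, here is a small sketch that serialises one record as JSON and as XML with the Python standard library, then parses both back. The record itself (roll number, school name, county) is invented for illustration; RDF is left out because the standard library has no RDF support.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names and values are illustrative only.
record = {
    "roll_number": "00001A",
    "name": "Example National School",
    "county": "Galway",
}

# JSON: a single call, which is part of why it suits web APIs.
json_text = json.dumps(record, sort_keys=True)

# XML: one child element per field under a <school> root.
school = ET.Element("school")
for key, value in record.items():
    ET.SubElement(school, key).text = value
xml_text = ET.tostring(school, encoding="unicode")

# Both serialisations can be parsed back into the same record.
from_json = json.loads(json_text)
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(json_text)
print(xml_text)
```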
  • 22. Also: standard classifications
      • Within your data, use the same categories as everybody else
      • CSO
          • http://www.cso.ie/surveysandmethodologies/classifications_stan.htm
      • StatCentral list of classifications
          • http://www.statcentral.ie/classifications.asp
  • 23. Also: standard identifiers
      • Example: School roll numbers
          • Department of Education publishes an Excel file with all school roll numbers
          • Can be used to Google the same school on other websites, school evaluation reports etc
      • Example: Ordnance Survey UK geo identifiers
          • Uses URIs (web addresses) as identifiers
          • http://data.ordnancesurvey.co.uk/doc/7000000000037256
          • Great for use in RDF
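One way to see why standard identifiers matter: when two publishers use the same school roll number, their datasets can be combined with a simple key lookup. The sketch below joins two small CSV extracts on a shared `roll_number` column. Both extracts, their column names and all values are hypothetical.

```python
import csv
import io

# Hypothetical extracts from two different publishers.
schools_csv = (
    "roll_number,name\n"
    "00001A,Example School One\n"
    "00002B,Example School Two\n"
)
reports_csv = (
    "roll_number,report_year\n"
    "00002B,2010\n"
    "00001A,2011\n"
    "00003C,2011\n"
)


def index_by(csv_text, key):
    """Parse CSV text and index its rows by the value of column `key`."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


schools = index_by(schools_csv, "roll_number")
reports = index_by(reports_csv, "roll_number")

# Keep only identifiers present in both datasets and merge their rows.
joined = [
    {**schools[roll], **reports[roll]}
    for roll in sorted(schools.keys() & reports.keys())
]

for row in joined:
    print(row)
```

Without a shared identifier, the same join would need fuzzy matching on school names, which is far more error-prone.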
  • 24. Linked Open Data Cloud
  • 25. Summary
      • Prefer open, widely used standards
      • But: also prefer what you know best
      • Support multiple formats for different audiences where it makes sense
      • Great: CSV, KML, RSS, XML, JSON
  • 26. Step 4: Publish under an open license
  • 27. Why?
      • Regulates what others can and cannot do with the data
      • For re-users, uncertainty about rights is a major concern
      • A good way to ensure that your organisation gets acknowledged
      • You need some non-discriminatory policy for giving rights to the data anyway (PSI directive)
  • 28. Complex topic
      • Destroying a potential income stream?
      • Content licenses vs database licenses
      • Mixing and compatibility of licenses
          • Wikipedia, OpenStreetMap
  • 29. Irish PSI License
      • Created in response to PSI Directive
      • Available at http://psi.gov.ie/
      • Problems: Documents may not be used “for the principal purpose of advertising or promoting a particular product or service”
          • Can't be combined with Wikipedia or OpenStreetMap
          • Not an open license according to Open Definition
          • http://opendefinition.org/
  • 30. Open database licenses
      http://opendefinition.org/licenses/
  • 31. License features
      • You're allowed to do pretty much anything, provided you…
          • Attribution (“By”) – give credit
          • ShareAlike (“SA”) – adapted data must be published in the same way
  • 32. Does Open Data have to be free?
      • Many would say yes
      • A matter of terminology and definitions
      • Either way there is nothing wrong with charging for certain data
  • 33. Data protection
      • Personal information is not open data
      • Freedom of Information legislation
          • http://foi.gov.ie/
  • 34. Summary
      • Stating an explicit license is important
      • Irish PSI License: It's readily available, but not “open enough” for some applications
      • Open Data Commons licenses with various constraints
  • 35. Step 5: List your data in a data catalog
  • 36. Why?
      • So that people know it exists
      • This is how the world learns about available data
      • This is how you learn what they do and need
  • 37. Some key information about a dataset
      • What data is being published?
      • What's the license?
      • When was the data collected?
      • When will it be updated, if at all?
      • How was/is this data collected?
      • What was/is the data used for?
      • Contact person?
      • Where to give feedback?
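The questions above map naturally onto a small metadata record. The sketch below is one possible shape for such a record, with a check that flags unanswered questions before the entry is published. The field names, the example values and the example.org addresses are all assumptions made for illustration and do not follow any particular catalog's schema.

```python
import json

# One field per question on the slide; the names are illustrative.
REQUIRED_FIELDS = [
    "title",              # What data is being published?
    "license",            # What's the license?
    "collected",          # When was the data collected?
    "update_frequency",   # When will it be updated, if at all?
    "collection_method",  # How was/is this data collected?
    "purpose",            # What was/is the data used for?
    "contact",            # Contact person?
    "feedback_url",       # Where to give feedback?
]


def missing_fields(record):
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical dataset description.
dataset = {
    "title": "Planning applications",
    "license": "Open Data Commons Attribution License",
    "collected": "2010-01-01/2010-12-31",
    "update_frequency": "monthly",
    "collection_method": "Exported from the planning register",
    "purpose": "Processing of planning applications",
    "contact": "opendata@example.org",
    "feedback_url": "http://example.org/feedback",
}

problems = missing_fields(dataset)
metadata_json = json.dumps(dataset, indent=2)
print(metadata_json if not problems else f"Missing: {problems}")
```

A record like this can be shown on a simple web page or submitted to a data catalog.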
  • 38. How to do this in practice?
      • Have a simple page on your website
      • Use an open community data catalog
      • Set up your own catalog
      • Use a national Irish data catalog???
  • 39. Open community catalogs
      • The Data Hub
          • http://thedatahub.org
      • Irish CKAN
          • http://ie.ckan.net
  • 40. Set up your own catalog
      • Requires a budget
      • Roll your own software?
          • data.fingal.ie
      • Use open source, e.g., CKAN?
          • data.gov.uk
          • Berlin Open Data
          • …
  • 41. National Irish data catalog?
      • CSO's StatCentral?
      • Marine Institute's ISDE?
      • Who publishes the catalog in other countries?
          • UK: Cabinet Office
          • US: White House
          • Australia: Dept of Finance and Deregulation
          • New Zealand: Dept of Internal Affairs
  • 42. Summary
      • Data catalogs make it easy to find data
      • Basic metadata, how to give feedback etc
      • Important: How often are datasets accessed?
      • “Request a dataset” feature
      • Also: Open Data Ireland Google Group
          • http://groups.google.com/group/open-data-ireland
  • 43. Five-shamrock scheme
      1. Publish data on the web
      2. Publish data in a machine-processable format
      3. Use an open standard format
      4. Publish under an open license
      5. List your data in a data catalog