SlideShare a Scribd company logo
1 of 19
Download to read offline
Johann Höchtl, Danube University Krems, Austria
Institutionalising Open Data Quality:
Processes, Standards, Tools
Open Data Quality: from Theory to Practice
2
1. Assess data quality
3
What is Data Quality?
http://opendata.stackexchange.com/questions/613/what-are-the-data-quality-measures-for-open-data
4
A: Measures towards Trust
1.Establish quantitative measures
2.Provide statistics
3.Show-case lighthouse projects and business use
Trust
Quantity research
Evaluation Community involvement /
management
5
2. Solve current problems
6
● Inconsistent encoding
– Microsoft Excel caused data problems even when used […]
UTF-8
– Data contaminated with characters incomprehensible to
UTF-8; ill-formatted following UTF-8; flipped erratically
between other character formats; used US ASCII standard,
ISO-8859 standard and a similar non-ISO encoding
● Inconsistent dates, file names, data fields
– Data were regularly formatted with commas; changed its
filename convention; omitted or added data fields; changed
the way it formatted dates
http://www.computerweekly.com/news/2240227682/Poor-data-quality-hindering-government-open-data-transparency-programme
Mundane problems -
Encodings & Formats
7
Mundane problems –
Broken Links
http://thomaslevine.com/!/data-catalog-dead-links/
http://openstate.eu/2014/06/nederlands-nauwelijks-nieuwe-Datasets-op-data-overheid-nl/
City of Vienna – Resource check
8
B: Measures towards Open Data
Quality: Process Domain
● Data publication must be made an integral, well- defined
and standardized part of daily procedures and routines
– A. Zuiderwijk, M. Janssen, S. Choenni, and R. Meijer, “Design principles for improving the process of publishing open data,”
Transforming Government: People, Process and Policy, vol. 8, no. 2, pp. 185–204, 2014.
● Process model in which open data serves as a facilitator
towards open government
– G. Lee and Y. H. Kwak, “An Open Government Implementation Model: Moving to Increased Public Engagement,” IBM Center
for The Business of Government, Jan. 2011 [Online]. Available:
http://www.businessofgovernment.org/sites/default/files/An%20Open%20Government%20Implementation%20Model.pdf
● Establish a Chief Data Officer
– Y. Lee, “A cubic framework for the chief data officer : succeeding in a world of big data,” 2014.
9
B: Measures towards Open Data
Quality: Standards Domain
● Data on the Web
– Data on the Web Best Practices Working Group Charter
http://www.w3.org/2013/05/odbp-charter.html
– Encodings: UTF8
● File formats
– CSV: CSV on the Web Working Group
http://www.w3.org/2013/csvw/wiki/Main_Page
– Frictionless open Data: CSV Files (OKFN guidance document)
http://data.okfn.org/doc/csv
● Data entities
– Geo-Data: Spatial Data on the Web Working Group Charter
http://www.w3.org/2015/spatial/charter
– Date & Time: ISO 8601 http://www.w3.org/TR/NOTE-datetime
10
B: Measures towards Open Data
Quality: Tools Domain
● Identify Problems
● Curate File Formats & Encodings
https://github.com/ckan/ideas-and-roadmap/issues/65
T. Levine, “How can we figure out what is inside thousands of spreadsheets?,”
CEUR workshop proceedings, vol. 1209, pp. 34–38, Jul. 2014.
http://ceur-ws.org/Vol-1209/paper_12.pdf
11
Measures Towards Open Data Quality
Processes
Tools
ISOStandards
12
Open Data Quality at the
European Open Data Portal
● A.6. Mechanisms for probing broken links
The portal infrastructure will include a mechanism for
systematically probing for broken links. […] The contractor will
define and implement a communication protocol to alert the
owner of the resource.
● A.8. Mechanism allowing data linking
When RDF, * record a link between datasets that use the same
URIs; * propose a mapping between URIs that are likely to
denote the same entities
● B.6. User feedback mechanism
Allowing visitors […] suggestions for improvements in the data
quality
13
Open Data Quality in Austria
● Cooperation OGD Austria represents administration
open data portal operators
– Defines standards and procedures
– Aligned with International, European and D-A-CH efforts
● Institutionalising effort by
Sub-Working Group of Cooperation OGD Austria
Licenses
Metadata
Open Documents
Quality
Linked Data
Cooperation
14
2
check
Portal betrieben von Provider
Data producer
Data consumer
publishes
obtains
Data
references
produces
Monitor
3
checks
checks
1
4
provides
Community-Portal
Data portal
operated by
improves
5
delivers
improvesinformes
Open Data Quality
Integration Framework
1.Quality processes and
procedure models to assess
and publish data
2.Contributions of the Open
Data users
3.Quality checks when entering
(meta-)data descriptions at
the data portal
4.Monitoring of data quality
over time
5.Community-driven data portal
with user-generated content,
e.g. enrich metadata,
alternative data formats, etc.
Partners
Project duration: 30 months
o Start: October 2015
o End: March 2018
Semantic Web Company Danube University Krems Vienna University of Economics and
Business
Improving Data Quality in Open Data
ADEQUATe
• Data contaminated
with characters
incomprehensible to
UTF-8; flipped
erratically between
other character
formats;
• Data were regularly
formatted with
commas; changed its
filename convention;
A D E Q U A T e
D a t a Q u a l i t y D o m a i n s & C h a l l e n g e s
Address Data Quality in
three domains:
1. Tools
2. Standards
3. Processes
HowWhy
Easy to check
Availability, conformance,
processability, timeliness,
representational-consistency;
interlinking, conformance to
vocabularies, provenance
(Linked Data)
Hard to assess
Completeness, consistency,
accuracy, credibility, relevance
What
ADEQUATe: GOALS
Improving Data Quality on Open Data
1. Quality measures
2. Evolution monitor
3. Quality improvement through
o Algorithms
o Data linkage
o Crowdsourcing
Use cases
ADEQUATe: GOALS
M24
Refinements
M18
Quality improvements Use case connection
M12
Quality monitoring framework Data linkage
M8
Architecture Blueprint
M6
Quality metrics Requirements
ADEQUATe: GOALSADEQUATe: Milestones
19
Donau-Universität Krems.
Die Universität für Weiterbildung.
Johann Höchtl
Center for E-Governance
Johann.hoechtl@donau-uni.ac.at
@myprivate42
at.linkedin.com/in/johannhoechtl
CC-BY 3.0

More Related Content

Viewers also liked

Martin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of QualityMartin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of Quality
Nuffield Trust
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
emmanuel_jamin
 
Rigor and relevance ppt
Rigor and relevance pptRigor and relevance ppt
Rigor and relevance ppt
deborahsutton
 

Viewers also liked (20)

Enterprise Data World Webinars: Data Quality for Data Modelers
Enterprise Data World Webinars: Data Quality for Data ModelersEnterprise Data World Webinars: Data Quality for Data Modelers
Enterprise Data World Webinars: Data Quality for Data Modelers
 
Applied semantic technology and linked data
Applied semantic technology and linked dataApplied semantic technology and linked data
Applied semantic technology and linked data
 
Martin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of QualityMartin Bardsley: Quality In Austerity-Indicators of Quality
Martin Bardsley: Quality In Austerity-Indicators of Quality
 
Query-Driven Management of Linked Data Quality
Query-Driven Management of Linked Data QualityQuery-Driven Management of Linked Data Quality
Query-Driven Management of Linked Data Quality
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
Sieve - Data Quality and Fusion - LWDM2012
Sieve - Data Quality and Fusion - LWDM2012Sieve - Data Quality and Fusion - LWDM2012
Sieve - Data Quality and Fusion - LWDM2012
 
Linked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and LuzzuLinked Data Quality Assessment – daQ and Luzzu
Linked Data Quality Assessment – daQ and Luzzu
 
Open Data for Social Good
Open Data for Social GoodOpen Data for Social Good
Open Data for Social Good
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
Quality Metrics for Linked Open Data
Quality Metrics for  Linked Open Data Quality Metrics for  Linked Open Data
Quality Metrics for Linked Open Data
 
Rigor and relevance ppt
Rigor and relevance pptRigor and relevance ppt
Rigor and relevance ppt
 
Data Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing ConcernData Quality: A Raising Data Warehousing Concern
Data Quality: A Raising Data Warehousing Concern
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
Linked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A SurveyLinked Data Quality Assessment: A Survey
Linked Data Quality Assessment: A Survey
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Ensuring data quality
Ensuring data qualityEnsuring data quality
Ensuring data quality
 
Open data quality
Open data qualityOpen data quality
Open data quality
 
Data Quality Presentation
Data Quality PresentationData Quality Presentation
Data Quality Presentation
 
Linked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and ExamplesLinked Open Data Principles, Technologies and Examples
Linked Open Data Principles, Technologies and Examples
 
Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...
 

Similar to Institutionalising open data quality - Processes Standards, Tools

Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
madynav
 
U mass data-project proposal-noapp
U mass data-project proposal-noappU mass data-project proposal-noapp
U mass data-project proposal-noapp
Daisy LaFlamme
 

Similar to Institutionalising open data quality - Processes Standards, Tools (20)

ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15Prcn 2019 stage 1264-question-presentation_poster file_id-15
Prcn 2019 stage 1264-question-presentation_poster file_id-15
 
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
Allotrope Foundation & OSTHUS at SmartLab Exchange 2015: Update on the Allotr...
 
ALIGNED Data Curation Methods and Tools
ALIGNED Data Curation Methods and ToolsALIGNED Data Curation Methods and Tools
ALIGNED Data Curation Methods and Tools
 
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
 
U mass data-project proposal-noapp
U mass data-project proposal-noappU mass data-project proposal-noapp
U mass data-project proposal-noapp
 
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...
 
WEBINAR: Open Research Data in Horizon 2020
WEBINAR: Open Research Data in Horizon 2020WEBINAR: Open Research Data in Horizon 2020
WEBINAR: Open Research Data in Horizon 2020
 
H2020 Open Data Pilot
H2020 Open Data PilotH2020 Open Data Pilot
H2020 Open Data Pilot
 
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...
 
Open Research Data & H2020
Open Research Data & H2020Open Research Data & H2020
Open Research Data & H2020
 
HKU Data Curation MLIM7350 Student Project: Data Curation Workshop
HKU Data Curation MLIM7350 Student Project: Data Curation WorkshopHKU Data Curation MLIM7350 Student Project: Data Curation Workshop
HKU Data Curation MLIM7350 Student Project: Data Curation Workshop
 
H2020 data pilot openaire
H2020 data pilot openaireH2020 data pilot openaire
H2020 data pilot openaire
 
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018
 
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATResearch Data Management: An Introductory Webinar from OpenAIRE and EUDAT
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT
 

More from Johann Höchtl

Smart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open DataSmart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open Data
Johann Höchtl
 
DCAT-Application Profile for Data Providers
DCAT-Application Profile for Data ProvidersDCAT-Application Profile for Data Providers
DCAT-Application Profile for Data Providers
Johann Höchtl
 

More from Johann Höchtl (20)

Homomorphic encryption on Blockchain Principles
Homomorphic encryption on Blockchain PrinciplesHomomorphic encryption on Blockchain Principles
Homomorphic encryption on Blockchain Principles
 
Performance-indicator based policy-making in Austria
Performance-indicator based policy-making in AustriaPerformance-indicator based policy-making in Austria
Performance-indicator based policy-making in Austria
 
Datenqualität auf Offenen Datenportalen
Datenqualität auf Offenen DatenportalenDatenqualität auf Offenen Datenportalen
Datenqualität auf Offenen Datenportalen
 
ADV FIWARE Workshop starring Docker and Virtualisation
ADV FIWARE Workshop starring Docker and VirtualisationADV FIWARE Workshop starring Docker and Virtualisation
ADV FIWARE Workshop starring Docker and Virtualisation
 
Yound Coders Festival
Yound Coders FestivalYound Coders Festival
Yound Coders Festival
 
Sind wir schon da?!
Sind wir schon da?!Sind wir schon da?!
Sind wir schon da?!
 
Offener Haushalt – Transparenz in öffentlichen Haushalten
Offener Haushalt – Transparenz in öffentlichen HaushaltenOffener Haushalt – Transparenz in öffentlichen Haushalten
Offener Haushalt – Transparenz in öffentlichen Haushalten
 
Datenqualität von Datenportalen
Datenqualität von DatenportalenDatenqualität von Datenportalen
Datenqualität von Datenportalen
 
Open Government Data & offene Wirtschaftsdaten - Two of a Kind?
Open Government Data & offene Wirtschaftsdaten - Two of a Kind?Open Government Data & offene Wirtschaftsdaten - Two of a Kind?
Open Government Data & offene Wirtschaftsdaten - Two of a Kind?
 
Elektronische Literaturverwaltung mit Zotero
Elektronische Literaturverwaltung mit ZoteroElektronische Literaturverwaltung mit Zotero
Elektronische Literaturverwaltung mit Zotero
 
The Case of opendataportal.at
The Case of opendataportal.atThe Case of opendataportal.at
The Case of opendataportal.at
 
From E-Government to Open Government
From E-Government to Open GovernmentFrom E-Government to Open Government
From E-Government to Open Government
 
Smart Cities and Smart ICT
Smart Cities and Smart ICTSmart Cities and Smart ICT
Smart Cities and Smart ICT
 
Evaluation of Open Government Data Implementation of City of Vienna
Evaluation of Open Government Data Implementation of City of ViennaEvaluation of Open Government Data Implementation of City of Vienna
Evaluation of Open Government Data Implementation of City of Vienna
 
Costs of Closed Science
Costs of Closed ScienceCosts of Closed Science
Costs of Closed Science
 
Smart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open DataSmart Cities, Smart Regions and the Role of Open Data
Smart Cities, Smart Regions and the Role of Open Data
 
OGD for Culture and Art
OGD for Culture and ArtOGD for Culture and Art
OGD for Culture and Art
 
Evaluierung der Open Government Data Umsetzung der Stadt Wien - Auszug
Evaluierung der Open Government Data Umsetzung der Stadt Wien - AuszugEvaluierung der Open Government Data Umsetzung der Stadt Wien - Auszug
Evaluierung der Open Government Data Umsetzung der Stadt Wien - Auszug
 
Open Government Data DCAT Application Profile
Open Government Data DCAT Application ProfileOpen Government Data DCAT Application Profile
Open Government Data DCAT Application Profile
 
DCAT-Application Profile for Data Providers
DCAT-Application Profile for Data ProvidersDCAT-Application Profile for Data Providers
DCAT-Application Profile for Data Providers
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Institutionalising open data quality - Processes Standards, Tools

  • 1. Johann Höchtl, Danube University Krems, Austria Institutionalising Open Data Quality: Processes, Standards, Tools Open Data Quality: from Theory to Practice
  • 3. 3 What is Data Quality? http://opendata.stackexchange.com/questions/613/what-are-the-data-quality-measures-for-open-data
  • 4. 4 A: Measures towards Trust 1.Establish quantitative measures 2.Provide statistics 3.Show-case lighthouse projects and business use Trust Quantity research Evaluation Community involvement / management
  • 6. 6 ● Inconsistent encoding – Microsoft Excel caused data problems even when used […] UTF-8 – Data contaminated with characters incomprehensible to UTF-8; ill-formatted following UTF-8; flipped erratically between other character formats; used US ASCII standard, ISO-8859 standard and a similar non-ISO encoding ● Inconsistent dates, file names, data fields – Data were regularly formatted with commas; changed its filename convention; omitted or added data fields; changed the way it formatted dates http://www.computerweekly.com/news/2240227682/Poor-data-quality-hindering-government-open-data-transparency-programme Mundane problems - Encodings & Formats
  • 7. 7 Mundane problems – Broken Links http://thomaslevine.com/!/data-catalog-dead-links/ http://openstate.eu/2014/06/nederlands-nauwelijks-nieuwe-Datasets-op-data-overheid-nl/ City of Vienna – Resource check
  • 8. 8 B: Measures towards Open Data Quality: Process Domain ● Data publication must be made an integral, well- defined and standardized part of daily procedures and routines – A. Zuiderwijk, M. Janssen, S. Choenni, and R. Meijer, “Design principles for improving the process of publishing open data,” Transforming Government: People, Process and Policy, vol. 8, no. 2, pp. 185–204, 2014. ● Process model in which open data serves as a facilitator towards open government – G. Lee and Y. H. Kwak, “An Open Government Implementation Model: Moving to Increased Public Engagement,” IBM Center for The Business of Government, Jan. 2011 [Online]. Available: http://www.businessofgovernment.org/sites/default/files/An%20Open%20Government%20Implementation%20Model.pdf ● Establish a Chief Data Officer – Y. Lee, “A cubic framework for the chief data officer : succeeding in a world of big data,” 2014.
  • 9. 9 B: Measures towards Open Data Quality: Standards Domain ● Data on the Web – Data on the Web Best Practices Working Group Charter http://www.w3.org/2013/05/odbp-charter.html – Encodings: UTF8 ● File formats – CSV: CSV on the Web Working Group http://www.w3.org/2013/csvw/wiki/Main_Page – Frictionless open Data: CSV Files (OKFN guidance document) http://data.okfn.org/doc/csv ● Data entities – Geo-Data: Spatial Data on the Web Working Group Charter http://www.w3.org/2015/spatial/charter – Date & Time: ISO 8601 http://www.w3.org/TR/NOTE-datetime
  • 10. 10 B: Measures towards Open Data Quality: Tools Domain ● Identify Problems ● Curate File Formats & Encodings https://github.com/ckan/ideas-and-roadmap/issues/65 T. Levine, “How can we figure out what is inside thousands of spreadsheets?,” CEUR workshop proceedings, vol. 1209, pp. 34–38, Jul. 2014. http://ceur-ws.org/Vol-1209/paper_12.pdf
  • 11. 11 Measures Towards Open Data Quality Processes Tools ISOStandards
  • 12. 12 Open Data Quality at the European Open Data Portal ● A.6. Mechanisms for probing broken links The portal infrastructure will include a mechanism for systematically probing for broken links. […] The contractor will define and implement a communication protocol to alert the owner of the resource. ● A.8. Mechanism allowing data linking When RDF, * record a link between datasets that use the same URIs; * propose a mapping between URIs that are likely to denote the same entities ● B.6. User feedback mechanism Allowing visitors […] suggestions for improvements in the data quality
  • 13. 13 Open Data Quality in Austria ● Cooperation OGD Austria represents administration open data portal operators – Defines standards and procedures – Aligned with International, European and D-A-CH efforts ● Institutionalising effort by Sub-Working Group of Cooperation OGD Austria Licenses Metadata Open Documents Quality Linked Data Cooperation
  • 14. 14 2 check Portal betrieben von Provider Data producer Data consumer publishes obtains Data references produces Monitor 3 checks checks 1 4 provides Community-Portal Data portal operated by improves 5 delivers improvesinformes Open Data Quality Integration Framework 1.Quality processes and procedure models to assess and publish data 2.Contributions of the Open Data users 3.Quality checks when entering (meta-)data descriptions at the data portal 4.Monitoring of data quality over time 5.Community-driven data portal with user-generated content, e.g. enrich metadata, alternative data formats, etc.
  • 15. Partners Project duration: 30 months o Start: October 2015 o End: March 2018 Semantic Web Company Danube University Krems Vienna University of Economics and Business Improving Data Quality in Open Data ADEQUATe
  • 16. • Data contaminated with characters incomprehensible to UTF-8; flipped erratically between other character formats; • Data were regularly formatted with commas; changed its filename convention; A D E Q U A T e D a t a Q u a l i t y D o m a i n s & C h a l l e n g e s Address Data Quality in three domains: 1. Tools 2. Standards 3. Processes HowWhy Easy to check Availability, conformance, processability, timeliness, representational-consistency; interlinking, conformance to vocabularies, provenance (Linked Data) Hard to assess Completeness, consistency, accuracy, credibility, relevance What
  • 17. ADEQUATe: GOALS Improving Data Quality on Open Data 1. Quality measures 2. Evolution monitor 3. Quality improvement through o Algorithms o Data linkage o Crowdsourcing Use cases ADEQUATe: GOALS
  • 18. M24 Refinements M18 Quality improvements Use case connection M12 Quality monitoring framework Data linkage M8 Architecture Blueprint M6 Quality metrics Requirements ADEQUATe: GOALSADEQUATe: Milestones
  • 19. 19 Donau-Universität Krems. Die Universität für Weiterbildung. Johann Höchtl Center for E-Governance Johann.hoechtl@donau-uni.ac.at @myprivate42 at.linkedin.com/in/johannhoechtl CC-BY 3.0