2. What?
An Open (Govt.) Data Monitoring Tool
– Metadata Quality and Consistency
– Benchmarking: Who fixed what and how fast?
– Is the data still there?
3. Why?
● Dangling URLs leading into Nirvana
– Data is meant to stay
● (Meta-)data needs to be consistent in order to be useful
● Tendency to publish without monitoring
– Metadata decoupled from data
– Question of responsibility
4. How?
● Watcher
– Get all metadata from the CKAN data portal (legacy API calls)
– Analyse metadata and URLs
– Write results into a staging database (SQL)
– Watch for new / changed datasets
● Analyser
– Perform analysis on the staging area (partly long-running and tedious), write results into Redis
● Who has released the most data? EASY!
● Who uploaded which datasets, and when?
● Who fixed the most mistakes during the last week?
● Who has the longest-outstanding bugs?
● Which datasets are no longer available?
5. How? ctd.
● Presentation
– Build a fancy display from the Redis results
– Data drill-down
– What else?
6. Architecture
● Heroku PaaS
● PostgreSQL data store
● Redis for ephemeral data
● Application logic in Go
● Front-end using Bootstrap & AngularJS
7. What's there
● Machine-readable metadata spec:
http://htmlpreview.github.io/?https://github.com/the42/ogdat/blob/master/ppogdatspec/ogdat_s
(automated conversion process from PDF [sic!])
● Watcher: stable
● Analyser: work in progress
● Presentation layer: HELP!
8. Show me and I believe
● Uhm … nothing fancy yet
● Business logic & server processes
● Source: https://github.com/the42/ogdat/
9. Lessons learned
● There are many (minor) issues with metadata
● Heroku is easy to get going with
● Go, as a novel language, is easy to develop in
– Built-in concurrency features come in handy when checking e.g. URLs in parallel
● The CKAN API at data.gv.at is not that fast and times out
10. Contact
Johann Höchtl
johann.hoechtl@gmail.com
@myprivate42
http://www.slideshare.net/jhoechtl/
https://www.facebook.com/myprivate42