15. ELIS – Multimedia Lab
I) Thou shalt … Know your Data
• Make the Scientists aware of …
• Understand your data assets:
• How was it collected/generated?
• How is it formatted? Is formatting consistent?
• How is it stored?
• Are there missing values? If so, which ones, why?
• Where/how can you process it?
• Are there duplicated values, codes?
• Over 80% of time is spent cleaning data
• Missing metadata
• Provenance & versioning
• How can experiments be re-done?
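A minimal sketch of how the questions above about missing and duplicated values might be checked for a CSV file, using only the Python standard library. The function name `profile_rows`, the list of tokens treated as "missing", and the three sample rows are illustrative assumptions, not part of the original talk.

```python
import csv
import io
from collections import Counter


def profile_rows(rows, missing_tokens=("", "NA", "N/A", "null")):
    """Count missing values per column and fully duplicated rows.

    `missing_tokens` is an assumed convention; adapt it to your data.
    """
    missing = Counter()
    seen = Counter()
    total = 0
    for row in rows:
        total += 1
        # A row counts as a duplicate when every column matches an earlier row.
        seen[tuple(sorted(row.items()))] += 1
        for column, value in row.items():
            if value is None or value.strip() in missing_tokens:
                missing[column] += 1
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {"rows": total, "missing": dict(missing), "duplicate_rows": duplicates}


# Made-up sample data, for illustration only.
SAMPLE = """street,postcode,district
Veldstraat,9000,Gent
Korenmarkt,,Gent
Veldstraat,9000,Gent
"""

report = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(report)
```

A report like this only answers "which values are missing or duplicated"; the "why" still has to come from whoever collected the data.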
22.
15’ Open Data Publishing Framework
e.g.
data.gent.be
opendata.antwerpen.be
23.
Publishes 2 to 5 Star Data
tdt/core
tdt/input
triple store
24.
RESTful API for Developers
triple store
core
RESTful data adapter
CSV
XLS
JSON
XML
SPARQL endpoint
...
e.g. datatank.gent.be/Grondgebied/Straten
or data.irail.be/NMBS/Stations
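As a rough sketch of how a developer might address one resource in several output formats, the snippet below builds request URLs from the example paths above. It assumes that the adapter picks the output format from a file-style extension on the resource path and that the sites are served over plain `http`; both are assumptions, and the helper `resource_url` is hypothetical. No request is actually sent.

```python
from urllib.parse import urlsplit, urlunsplit


def resource_url(resource, fmt=None, scheme="http"):
    """Return a URL for a published resource.

    Assumption: appending ".<fmt>" to the path selects the output format.
    """
    parts = urlsplit(f"{scheme}://{resource}")
    path = parts.path.rstrip("/")
    if fmt:
        path = f"{path}.{fmt.lower()}"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Print one URL per requested format for the same resource.
for fmt in ("json", "xml", "csv"):
    print(resource_url("data.irail.be/NMBS/Stations", fmt))
```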
33.
II) Thou shalt … Provide Tooling
• Make the Scientists aware of …
• This is not a one-size-fits-all process
• Many Open Source Software & libraries
• Consider security and privacy issues
• Find an IT-partner within your organization
45.
III) Thou shalt … Embrace Open Access by Default
• Make the Scientists & the Management Board aware of …
• The importance of having an end-to-end (E2E) metadata workflow in
place: among other things, have the Authors and Organizations of
each publication rigidly formalised (i.e. formalise the
metadata at the source)
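One way to picture "formalising the metadata at the source" is a record type that refuses to treat authors and organizations as free text. The sketch below is illustrative only: the class names, the identifier fields (`author_id`, `org_id`) and the placeholder values are assumptions, not a schema from the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Organization:
    org_id: str  # hypothetical stable identifier for the organization
    name: str


@dataclass(frozen=True)
class Author:
    author_id: str  # hypothetical stable identifier for the author
    name: str
    affiliation: Organization


@dataclass
class Publication:
    title: str
    authors: list = field(default_factory=list)

    def problems(self):
        """List metadata gaps that should be fixed before publishing."""
        issues = []
        if not self.title.strip():
            issues.append("missing title")
        if not self.authors:
            issues.append("no authors")
        for author in self.authors:
            if not author.author_id.strip():
                issues.append(f"author without identifier: {author.name}")
            if not author.affiliation.org_id.strip():
                issues.append(
                    f"organization without identifier: {author.affiliation.name}"
                )
        return issues


# Placeholder records, for illustration only.
lab = Organization(org_id="org-001", name="Example Lab")
ok = Publication("Example title", [Author("author-001", "A. Author", lab)])
bad = Publication(
    "Example title", [Author("", "B. Author", Organization("", "Unknown Org"))]
)
print(ok.problems())
print(bad.problems())
```

Running such a check when a publication is first registered, rather than during later clean-up, is what keeps the workflow end-to-end.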
63.
QUESTIONS?
dr. Erik Mannens
erik.mannens@ugent.be
@erikmannens
Thoughts?
64.
Credits
• O’Reilly Strata Conference – Keep Your Data Science
Efforts from Derailing (Sean Murphy – Data Community
DC)
• LDOW2013 – Ranking Universities Using Linked Open
Data (Rouzbeh Meymandpour – University of Sydney)
• Did not have time to check all licenses of the Flickr
photos – in my defense, I did not kill anyone