Comsode tools - pushing data to open ecosystem

Description of Open Data Node and Methodology for publication of open data - slightly focused on libraries.
Author: Jindřich Mynarz

  1. COMSODE tools: Pushing data to the open ecosystem
     Jindřich Mynarz, EEA.sk
     ELAG 2015, Stockholm, June 9, 2015
  2. The gist of the talk
     To save legacy library data and satisfy internal and external requirements on your data you need ETL.
     “Libraries have to focus on making their data infrastructure more efficient if they want to keep up with the ever changing needs of their audience and invest in sustainable service development.” (Lukas Koster, source)
  3. Building tools to publish & reuse open data
     EU FP7 project (2013–2015)
     Project partners:
     ● University of Milano-Bicocca, Italy
     ● Charles University in Prague, Czech Republic
     ● EEA, Czech Republic and Slovakia
     ● ADDSEN, Slovakia
     ● Spinque, the Netherlands
     ● Ministry of Interior of the Slovak Republic
  4. Legacy library data
     Save the data?
     ● …or let it go?
     ● What’s the cost of recovering the legacy?
     ● To save legacy data you need automation ⇒ ETL
     ● Unfortunately, paraphrasing Tolstoy, “tidy datasets are all alike but every messy dataset is messy in its own way.” (source)
  5. Confusion of tongues
     ● MARC used to be (or still is?) the lingua franca. What's next?
     ● Many data formats required to be supported
     ● MARC→Web impedance mismatch
     ● Export & import in systems integration
  6. Open Data Node: “(Linked) open data plumbing”
     ● Open Data Node (ODN) is a platform for publishing (open) data & automating internal data flows that enables progressive enhancement of data.
     ● Main product of the COMSODE project
     ● Free, open source, modular, integrated (e.g., single sign-on)
  7. Open Data Node networks
     ● Data replication (e.g., local copy of name authority file)
     ● Data synchronization (e.g., periodical harvesting of incremental updates via OAI-PMH)
     ● Data distribution (e.g., shared cataloguing)
  8. Open Data Node workflow
     1. Catalogue your internal data
     2. Create a data processing pipeline for the datasets to be published
     3. Schedule the pipeline to be run to publish the data
  9. Internal catalogue
     ● Map out the data you have or external data you use; both open and closed.
     ● If data cannot be found, it is as if it did not exist, so make data discoverable and provide it with descriptive metadata (DCAT-AP).
     ● Based on CKAN.
  10. ● An extensible ETL tool with native RDF support for automating repetitive data exchange and transformation tasks.
     ● Allows you to define, execute, monitor, debug (examine intermediate data), schedule, and share (import/export) data transformations.
     ● Open source, dual-licensed to enable commercial extensions
  11. Extract-Transform-Load pipeline
     Data flow of an ETL process in UnifiedViews is defined as a pipeline composed of data processing units.
  12. Data processing units
     Extractors
     ● Download file
     ● Load from SQL database
     ● SPARQL endpoint extractor
     Transformers
     ● Zip/unzip
     ● Find/replace
     ● Parse and serialize RDF
     ● SPARQL Update
     ● XSLT
     ● ISO 2709 to MARCXML
     ● SPARQL SELECT to CSV
     Loaders
     ● Files upload
     ● Load to Virtuoso
     ● Load to SQL database
     + Quality Assessment
  13. Public catalogue
     ● Public interface that enables users to discover & access your data.
     ● Links to data dumps, APIs (REST API, SPARQL endpoint), and applications based on the data.
     ● Provides metadata, such as licence, dataset maintainer’s contact, or last update date.
     ● Based on CKAN.
  14. COMSODE methodology
     ● Guidelines on how to use ODN for those with little open data experience
     ● Defines phases, practices, roles, and artifacts.
     ● Phases:
       a. Development of open data publication plan
       b. Preparation of publication
       c. Realization of publication
       d. Archiving
     http://opendatanode.org/product/methodology-for-od-publishing
  15. Open Data Node in use
     ● Reality check
       ○ Eating our own dog food
       ○ Testing the ODN’s versatility
     ● 150 datasets transformed by COMSODE partners
     ● Supporting 10 pilot projects, including:
       ○ eDemokracia: Slovak nation-wide e-government project
       ○ Czech Trade Inspection Authority
       ○ Slovak Environment Agency
       ○ Slovak National Library
  16. Slovak National Library COMSODE pilot
  17. Demo time!
  18. Impact
     ● Improve your internal & external data flows.
     ● Libraries are required to publish data by the EU directive on the re-use of public sector information.
       ○ If you release MARC, is the cost of access to the data marginal?
     ● Insiders have access, yet outsiders often have more experience to build value upon the data.
  19. In conclusion
     ♫ The pipelines, the pipelines are calling... ♫
     To save legacy library data and satisfy internal and external requirements on your data you need ETL.
     http://opendatanode.org
     Image credits from the Noun Project: Database by Dmitry Baranovskiy, Counter by Sergey Demushkin, Ventil by Sergey Demushkin, Spider Web by Denis, Scroll by EliRatus, Chest by Victor Escorsin, Pipes by Christopher T. Howlett, Adoption by Luis Prado, Plumber by Luis Prado, Filter by Muneer A.Safiah, Lock by Alex Auda Samora, Lego by Jon Trillana, Atom by Mister Pixel
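
Slides 11 and 12 describe an ETL data flow as a pipeline composed of data processing units (extractors, transformers, loaders). The idea can be illustrated with a minimal Python sketch. This is not the UnifiedViews API: every name below (`extract_sample`, `trim_titles`, `make_list_loader`, `run_pipeline`) and the sample records are hypothetical, chosen only to show how units chain together.

```python
from typing import Callable, Iterable, List

# Assumption: a data processing unit (DPU) is modelled as a function
# that takes a stream of records and returns a stream of records.
Record = dict
DPU = Callable[[Iterable[Record]], Iterable[Record]]


def extract_sample(_: Iterable[Record]) -> Iterable[Record]:
    """Extractor: stands in for a unit such as 'Download file'.

    It ignores its input and emits hard-coded sample records.
    """
    yield {"title": "  Anna Karenina ", "author": "Tolstoy, Leo"}
    yield {"title": "War and Peace", "author": "Tolstoy, Leo"}


def trim_titles(records: Iterable[Record]) -> Iterable[Record]:
    """Transformer: strips stray whitespace from the title field."""
    for record in records:
        yield {**record, "title": record["title"].strip()}


def make_list_loader(sink: List[Record]) -> DPU:
    """Loader factory: the returned unit appends each record to `sink`,
    which stands in for a database or file upload target."""
    def load(records: Iterable[Record]) -> Iterable[Record]:
        for record in records:
            sink.append(record)
            yield record
    return load


def run_pipeline(dpus: List[DPU]) -> List[Record]:
    """Feed the output of each unit into the next one, in order."""
    data: Iterable[Record] = []
    for dpu in dpus:
        data = dpu(data)
    return list(data)  # consuming the stream drives the whole pipeline


published: List[Record] = []
result = run_pipeline([extract_sample, trim_titles, make_list_loader(published)])
print([record["title"] for record in published])
```

Because each unit only sees a stream of records, units can be reordered or swapped without touching the others, which is the property the slides attribute to pipelines built from data processing units.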

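
Slide 7 mentions periodical harvesting of incremental updates via OAI-PMH. As a rough sketch of what an incremental request looks like, the snippet below builds a `ListRecords` URL using the protocol's standard `verb`, `metadataPrefix`, `from`, and `resumptionToken` arguments. The endpoint `https://example.org/oai`, the date, and the helper name `list_records_url` are assumptions for illustration; they do not come from the talk or from Open Data Node.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper).

    `from_date` restricts the harvest to records changed since that date,
    which is what makes the harvest incremental.
    """
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # together with the verb only, to fetch the next page of results.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint and date, for illustration only.
url = list_records_url("https://example.org/oai", from_date="2015-06-01")
print(url)
```

A scheduled job would remember the date of its last successful run and pass it as `from_date` on the next run, following resumption tokens until the repository returns none.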