
Introduction to the FP7 CODE project @ BDBC

The FP7 CODE project will be presented at the Big Data Benchmarking Community call. This high-level overview introduces CODE's vision and shows the progress after six months.



  1. Big Data Benchmarking Community Call. CODE: Commercial Empowered Linked Open Data Ecosystem in Research. Presented by Florian Stegmaier, University of Passau, 2012-10-04
  2. Some basic facts...
     • Budget: 2.4M €, funded by the European Commission
     • Started in May 2012 with a runtime of 2 years
  3. Current situation
     • Data is being produced at an immense rate:
        • Evaluation campaigns (e.g., CLEF campaign)
        • Benchmarking communities (e.g., TPC)
        • Researchers (e.g., proceedings, journals or slides)
     Most data remains unstructured, and sophisticated access methods are missing!
  4. Why is this a problem?!
     • From a data perspective...
        • ...what is the quality of the data?
        • ...how to deal with missing values?
     • From a user perspective...
        • ...how can I compare this data? Baseline?
        • ...are there contradicting facts?
     The semantics of documents must be unleashed to make them accessible and processable!
  5. The long way to knowledge...
  6. Step 1: Analyze data
     • Analysis of documents has to:
        • Find structural elements (TOC, images, etc.)
        • Extract facts and numerical measures
        • Disambiguate facts (from "string" to "object")
     • Automatic annotation is defective or incomplete
        • Crowdsourced annotation of documents
        • Marketplace offers revenue for expert knowledge
  7. Step 2: Lift and extend data
     • Extracted and disambiguated data will be lifted into the Linked Data cloud
        • Interlink with existing data in the cloud
        • Enrich data with provenance information (to improve quality estimations)
        • Perform OLAP queries on data cubes (e.g., time series)
     Enriched and aggregated data is exposed as a Linked Data endpoint.
  8. Step 3: Interact with data
     • Query wizard will focus on:
        • Excel-based interaction possibilities
        • On-the-fly creation of statistical analyses on a federated dataset
     • Marketplace encourages users to interact with data
     Non-IT (but maybe domain) experts are able to create visual analytics as well as new data cubes.
  9. ...does all this actually work? Current analysis of PDFs is able to discover a basic table of contents, reading direction, as well as specific objects.
  10. ...how are the users involved? One possible way to engage users in annotating data is the Mendeley Desktop (early stage).
  11. ...what about lifting data? Basic triplification chain established to lift table-based data into a Semantic Web compatible data cube.
  12. ...how can I find data? The first prototype of the query wizard is able to show and interact with retrieved data in an Excel-like manner.
  13. ...and the marketplace? The data can be exposed in several ways, just like in mind maps, to help get things structured. (Example shows biggerplate.)
  14. Thank you for your attention! http://www.code-research.eu/ https://www.facebook.com/CODEresearchEU #CODEresearchEU (Twitter). Thanks to Michael Granitzer, Christin Seifert, Kai Schlegel and Sebastian Bayerl for supporting me with input, figures and slide templates, and last but not least our consortium for the prototype screenshots ;)
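
To make the "lift" step (slides 7 and 11) more concrete, here is a minimal sketch of how table-based data could be turned into data-cube-style triples and then rolled up OLAP-style. It is not the CODE project's actual triplification chain, which the slides do not describe in detail. The `qb:Observation` and `qb:dataSet` terms come from the W3C RDF Data Cube vocabulary; the `http://example.org/code/` namespace, the function names `triplify` and `roll_up`, the column names and the sample values are all assumptions made up for illustration.

```python
from collections import defaultdict

QB = "http://purl.org/linked-data/cube#"  # W3C RDF Data Cube vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/code/"  # hypothetical namespace, not from the slides


def triplify(rows, dataset_id, dimensions, measure):
    """Turn table rows (dicts) into (subject, predicate, object) triples."""
    dataset = EX + "dataset/" + dataset_id
    triples = []
    for i, row in enumerate(rows):
        obs = f"{dataset}/obs/{i}"  # one observation per table row
        triples.append((obs, RDF_TYPE, QB + "Observation"))
        triples.append((obs, QB + "dataSet", dataset))
        for dim in dimensions:
            triples.append((obs, EX + "dimension/" + dim, row[dim]))
        triples.append((obs, EX + "measure/" + measure, row[measure]))
    return triples


def roll_up(triples, dimension, measure):
    """OLAP-style roll-up: sum the measure grouped by one dimension."""
    dim_pred = EX + "dimension/" + dimension
    measure_pred = EX + "measure/" + measure
    dims, values = {}, {}
    for subject, predicate, obj in triples:
        if predicate == dim_pred:
            dims[subject] = obj
        elif predicate == measure_pred:
            values[subject] = obj
    totals = defaultdict(float)
    for subject, value in values.items():
        totals[dims[subject]] += value
    return dict(totals)


# Made-up sample table; the values carry no real-world meaning.
rows = [
    {"year": 2010, "system": "A", "runs": 3},
    {"year": 2010, "system": "B", "runs": 5},
    {"year": 2011, "system": "A", "runs": 4},
]
triples = triplify(rows, "demo", ["year", "system"], "runs")
totals = roll_up(triples, "year", "runs")
print(totals)  # {2010: 8.0, 2011: 4.0}
```

In a real setting the triples would be loaded into a triple store and queried with SPARQL rather than aggregated in memory; the plain tuples here only keep the sketch free of dependencies.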