What Factors Influence the Design of a Linked Data Generation Algorithm?

Generating Linked Data remains a complicated and intensive engineering process. While different factors determine how a Linked Data generation algorithm is designed, potential alternatives for each factor are currently not considered when the tools' underlying algorithms are designed. Certain design patterns are frequently applied across different tools, covering certain alternatives of a few of these factors, whereas other alternatives are never explored. Consequently, for certain use cases there are no adequate Linked Data generation tools, or tools with inadequate and inefficient algorithms are chosen. In this position paper, we identify such factors, based on our experience, and present a preliminary list. These factors could be considered when a Linked Data generation algorithm is designed or a tool is chosen. We investigated which factors are covered by widely known Linked Data generation tools and concluded that only certain design patterns are frequently encountered. By these means, we aim to point out that Linked Data generation goes above and beyond bare implementations, and that its algorithms need to be thoroughly and systematically studied and exploited.

  1. What Factors Influence the Design of a Linked Data Generation Algorithm? Anastasia Dimou, Pieter Heyvaert, Ben De Meester, Ruben Verborgh (imec.be, IDLab.technology). Anastasia.Dimou@ugent.be, @natadimou, 23/04/2018
  2. Semantic Web technologies for intelligent software agents
  3. Intelligent software agents: not enough Linked Data; limited interaction with heterogeneous data
  4. Human agents do not put effort into providing Linked Data until there are software agents that use it
  5. Semantic Web and Linked Data: a vicious cycle
  6. Facilitate Linked Data generation by reducing data and semantic heterogeneity and by increasing Linked Data quality
  7. How successful are we, after all, with the design of our Linked Data generation algorithms?
  8. Outline: discuss the state of the art (SotA); reflect on factors; outline observations
  9. Outline: discuss the state of the art (SotA); reflect on factors; outline observations
  10. Data owner: a custom implementation turns "my data" into Linked Data
  11. Data owner: format-specific solutions for multiple data sources (DB, CSV, XML, JSON), each producing its own Linked Data
  12. Data owner: DB, CSV, XML, JSON to Linked Data; separation of rules definition and execution (R2RML)
  13. R2RML implementations: Ultrawrap; Ontop (https://github.com/ontop/ontop); Morph (https://github.com/oeg-upm/morph-rdb); R2RMLParser (https://github.com/antidot/db2triples)
  14. R2RML implementations: Ultrawrap (Sequeda et al. Ultrawrap: SPARQL execution on relational data. JWS 2013); Ontop (Rodriguez-Muro et al. Efficient SPARQL-to-SQL with R2RML mappings. JWS 2015); Morph (Priyatna et al. Formalisation and Experiences of R2RML-based SPARQL to SQL query translation using Morph. WWW2014); R2RMLParser (Konstantinou et al. An Approach for the Incremental Export of Relational Databases into RDF Graphs. IJAIT 2015)
  15. R2RML implementations, as in slide 14, highlighting their algorithms: SPARQL-to-SQL translation/rewriting (Ultrawrap, Ontop, Morph) and incremental RDF graph generation (R2RMLParser)
  16. Data owner: DB, CSV, XML, JSON to Linked Data; separation of rules definition and execution (R2RML)
  17. Data owner: DB, CSV, XML, JSON to Linked Data with uniform declaration and execution
  18. R2RML extensions for heterogeneous data sources (key ideas: any data source, Logical Source, non-relational DBs, Nested Relational Model): RML (Dimou et al. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. LDOW2014); D2RML (Chortaras et al. D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs. LDOW2018); xR2RML (Michel et al. Translation of relational and non-relational databases into RDF with xR2RML. WEBIST2015); KR2RML (Slepicka et al. KR2RML: An Alternative Interpretation of R2RML for Heterogeneous Sources. COLD2015)
  19. Other languages for heterogeneous data sources (key ideas: parallel XML & DB pipelines; SPARQL-based mapping language; direct mapping + SPARQL CONSTRUCT; importing data streams into a relational DB): XSPARQL (Akhtar et al. XSPARQL: Traveling between the XML and RDF Worlds – and Avoiding the XSLT Pilgrimage. ESWC2008); SPARQL-Generate (Lefrancois et al. A SPARQL extension for generating RDF from heterogeneous formats. ESWC2017); Datalift (Scharffe et al. Enabling Linked Data Publication with the Datalift Platform. AAAI2012); Morph-streams (Morph-streams: SPARQLStream OBDA in action. SR4LD2014). I'm sorry if I didn't mention your language!
  20. R2RML extensions for heterogeneous data sources (RML, D2RML, xR2RML, KR2RML) and other languages for heterogeneous data sources (XSPARQL, SPARQL-Generate, Datalift, Morph-streams), with the references of slides 18 and 19. I'm sorry if I didn't mention your language!
  21. Speculations on languages: We have diverse languages to define Linked Data generation rules! We look into languages with respect to their expressivity OR the effort to edit them OR ..., BUT how systematic are our evaluations? Do we focus too much on languages and forget about implementations?
  22. Data owner: DB, CSV, XML, JSON to Linked Data
  23. R2RML extensions and other implementations for heterogeneous data sources: RML (https://github.com/RMLio/RML-Mapper, https://github.com/carml/carml); D2RML; xR2RML (https://github.com/frmichel/morph-xr2rml); KR2RML (https://github.com/usc-isi-i2/Web-Karma/tree/master/karma-common/src/main/java/edu/isi/karma/kr2rml); XSPARQL (https://github.com/semantalytics/xsparql); SPARQL-Generate (https://github.com/thesmartenergy/sparql-generate); Datalift (https://gforge.inria.fr/scm/?group_id=2935); Morph-streams (https://github.com/jpcik/morph-streams). I'm sorry if I don't mention your implementation!
  24. Speculations on languages and their implementations: Each language has its own implementations :-( We start having more implementations per language :-) BUT is the gain due to the language or to the implementation? How much do implementations differ from each other? Should we look into languages with respect to the implementations they allow, OR the automation they allow on top of them, OR ...?
  25. Speculations on implementations (input data): We start dealing with heterogeneity :-) data formats (DBs, CSV, XML, JSON), interfaces (DBs, Web APIs, ...), Big Data, BUT we mainly focus on static data :-( Do we cover all cases? What about streaming data? Can we generate a Linked Data stream from a raw data stream?
  26. Speculations on implementations (input data complexity): We start dealing with data retrieval :-) streaming static Big Data instead of loading it in memory; NOT just importing data into a DB, BUT a Nested Relational Model; incremental Linked Data generation. BUT is this all we can do with respect to data retrieval?
  27. Speculations on implementations (query complexity): We explored algorithms related to SPARQL-to-SQL translation/rewriting, BUT how mature is our SPARQL-to-SQL translation/rewriting? MORE: should we also focus on optimizing rule execution? Execution planning? Others?
  28. What factors influence the design of your Linked Data generation algorithm?
  29. Outline: discuss the state of the art (SotA); reflect on factors; outline observations
  30. Factors: purpose, materialization, diversity, dynamicity, direction, location, driving force, trigger, complexity
  31. Factors: purpose, materialization, diversity, dynamicity, direction, location, driving force, trigger, complexity
  32. Purpose: production-driven (the data owner generates Linked Data from raw data) vs consumption-driven (the data consumer generates Linked Data from raw data)
  33. Factors: purpose (production vs consumption-driven), materialization, diversity, dynamicity, direction, location, driving force, trigger, complexity
  34. Materialization: dumping vs on-the-fly (in both cases, the data owner generates Linked Data from raw data)
  35. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity, dynamicity, direction, location, driving force, trigger, complexity
  36. Diversity: homogeneity (e.g., several CSV files to Linked Data) vs heterogeneity (e.g., table, CSV, XML, JSON to Linked Data)
  37. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity, direction, location, driving force, trigger, complexity
  38. Dynamicity: static vs dynamic
  39. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity (static vs dynamic), direction, location, driving force, trigger, complexity
  40. Direction: target-centric vs source-centric (diagrams: table, CSV, XML, JSON sources mapped to Linked Data)
  41. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity (static vs dynamic), direction (target vs source-centric), location, driving force, trigger, complexity
  42. Location: in-situ vs remote (diagrams: table, CSV, XML, JSON sources turned into Linked Data)
  43. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity (static vs dynamic), direction (target vs source-centric), location (in-situ vs remote), driving force, trigger, complexity
  44. Driving force: mapping-driven vs data-driven (diagrams: raw data, rules and Linked Data, with steps 1 and 2 in opposite order)
  45. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity (static vs dynamic), direction (target vs source-centric), location (in-situ vs remote), driving force (mapping vs data-driven), trigger, complexity
  46. Trigger: on-demand vs real-time (diagrams: raw data, rules and Linked Data, involving the data owner and the data consumer)
  47. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity (static vs dynamic), direction (target vs source-centric), location (in-situ vs remote), driving force (mapping vs data-driven), trigger (on-demand vs real-time), complexity
  48. Complexity: data/rules complexity and query translation/rewriting (diagrams: raw data, rules and Linked Data)
  49. Factors: purpose (production vs consumption-driven), materialization (dumping vs on-the-fly), diversity (homogeneity vs heterogeneity), dynamicity (static vs dynamic), direction (target vs source-centric), location (in-situ vs remote), driving force (mapping vs data-driven), trigger (on-demand vs real-time), complexity (data/rules, SPARQL-to-SQL translation)
  50. Outline: discuss the state of the art (SotA); reflect on factors; outline observations
  51. Conclusions: consumption-driven tools support both static and dynamic data, but only homogeneous data; they are used for production purposes, yet are NOT optimized for production; they are optimized for SPARQL-to-SQL translation/rewriting. Production-driven tools support heterogeneous BUT static data; they turn to dynamic generation from otherwise static data to deal with Big Data (latest trend); there is NO optimization for complexity.
  52. Conclusions: NO tools support a data-driven approach, real-time data, or (data/rules) complexity. Certain design patterns are frequently applied, covering certain alternatives of a few factors, whereas other alternatives are never explored. There are NO adequate tools for Linked Data generation in certain cases, OR inadequate and inefficient tools are chosen as the best alternatives.
  53. We still lack an in-depth understanding of the complexity and of the many degrees of freedom in designing algorithms to generate Linked Data
  54. This prevents agents from effortlessly generating Linked Data and profiting from Semantic Web technologies
  55. Should we perhaps reconsider the directions we take for Linked Data generation?
  56. Once we explore how Linked Data is best generated in different cases, intelligent software agents will have enough Linked Data to work with, but until then we should not take Linked Data availability for granted
  57. What Factors Influence the Design of a Linked Data Generation Algorithm? Anastasia Dimou, Pieter Heyvaert, Ben De Meester, Ruben Verborgh (imec.be, IDLab.technology). Anastasia.Dimou@ugent.be, @natadimou, 23/04/2018
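
Slides 34 and 35 distinguish materialization by dumping from on-the-fly generation. A minimal Python sketch of the difference, assuming a hypothetical rule format (a subject IRI template, a predicate IRI and a source column) and `http://example.org/` IRIs: `dump` generates every triple up front, while `on_the_fly` runs, at query time and against the current raw data, only the rules whose predicate can match a triple pattern (with `None` as a wildcard). It illustrates the trade-off only; it is not how any of the cited tools implement it, and objects are plain strings rather than typed RDF terms.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Record = Dict[str, str]
Triple = Tuple[str, str, str]
Rule = Tuple[str, str, str]  # hypothetical: (subject template, predicate, column)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
RULES: List[Rule] = [
    ("http://example.org/person/{id}", FOAF_NAME, "name"),
    ("http://example.org/person/{id}", FOAF_MBOX, "email"),
]


def generate(rules: List[Rule], records: List[Record]) -> Iterator[Triple]:
    """Lazily apply every rule to every record that has the rule's column."""
    for record in records:
        for template, predicate, column in rules:
            if column in record:
                yield (template.format(**record), predicate, record[column])


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A pattern position matches if it is a wildcard or equal to the triple's term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def dump(rules: List[Rule], records: List[Record]) -> List[Triple]:
    """Dumping: materialize all triples once; later queries run on this snapshot."""
    return list(generate(rules, records))


def on_the_fly(rules: List[Rule], records: List[Record], pattern: Pattern) -> List[Triple]:
    """On-the-fly: at query time, execute only the rules whose predicate can match."""
    wanted = pattern[1]
    relevant = [r for r in rules if wanted is None or r[1] == wanted]
    return [t for t in generate(relevant, records) if matches(t, pattern)]


records = [{"id": "1", "name": "Alice", "email": "mailto:alice@example.org"}]
snapshot = dump(RULES, records)
records.append({"id": "2", "name": "Bob"})  # the raw data changes after dumping
names = on_the_fly(RULES, records, (None, FOAF_NAME, None))
```

After the update, `names` describes both Alice and Bob, while `snapshot` still only describes Alice: dumping pays the generation cost once but goes stale when the data is dynamic (slides 38 and 39), whereas on-the-fly generation stays current at the cost of regenerating for every query.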
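
Slides 44 and 45 contrast a mapping-driven with a data-driven driving force, and slide 52 notes that no tools support the data-driven approach. The Python sketch below is one possible reading of that distinction, not the authors' algorithm: the rule format (a subject IRI template, a predicate IRI and a source column) and the `http://example.org/` IRIs are hypothetical, and objects are kept as plain strings rather than typed RDF terms.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]
Triple = Tuple[str, str, str]
# Hypothetical rule shape: (subject IRI template, predicate IRI, source column).
Rule = Tuple[str, str, str]

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"

RULES: List[Rule] = [
    ("http://example.org/person/{id}", FOAF_NAME, "name"),
    ("http://example.org/person/{id}", FOAF_MBOX, "email"),
]


def apply_rule(rule: Rule, record: Record) -> List[Triple]:
    """Apply one rule to one record; a record without the column yields nothing."""
    template, predicate, column = rule
    if column not in record:
        return []
    return [(template.format(**record), predicate, record[column])]


def mapping_driven(rules: List[Rule], records: List[Record]) -> List[Triple]:
    """Outer loop over rules: every rule scans the whole data source again."""
    triples: List[Triple] = []
    for rule in rules:
        for record in records:
            triples.extend(apply_rule(rule, record))
    return triples


def data_driven(rules: List[Rule], records: Iterable[Record]) -> List[Triple]:
    """Outer loop over records: each record is read once and every rule fires on it."""
    triples: List[Triple] = []
    for record in records:
        for rule in rules:
            triples.extend(apply_rule(rule, record))
    return triples


records = [
    {"id": "1", "name": "Alice", "email": "mailto:alice@example.org"},
    {"id": "2", "name": "Bob"},
]
# Both strategies produce the same triples, only in a different order.
assert set(mapping_driven(RULES, records)) == set(data_driven(RULES, records))
```

The resulting triples are identical; what differs is the access pattern. The mapping-driven loop needs a source it can re-scan once per rule, whereas the data-driven loop consumes each record exactly once, which is what a single pass over a stream would require.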
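
Slide 25 asks whether a Linked Data stream can be generated from a raw data stream, and slides 46 and 47 contrast on-demand with real-time triggers. A hedged sketch of the simplest case is a Python generator that emits triples as each record arrives, so it can run over an unbounded input without storing it. The sensor stream, the `http://example.org/observation/` IRIs and the rule format are hypothetical; `sosa:hasSimpleResult` is taken from the W3C SOSA vocabulary.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]
Triple = Tuple[str, str, str]
Rule = Tuple[str, str, str]  # hypothetical: (subject template, predicate, column)

SOSA_RESULT = "http://www.w3.org/ns/sosa/hasSimpleResult"
RULES: List[Rule] = [("http://example.org/observation/{id}", SOSA_RESULT, "value")]


def sensor_readings() -> Iterator[Record]:
    """Hypothetical unbounded raw data stream: one reading per tick."""
    for tick in itertools.count():
        yield {"id": str(tick), "value": str(20 + tick % 5)}


def stream_generate(rules: List[Rule], stream: Iterable[Record]) -> Iterator[Triple]:
    """Real-time trigger: each incoming record is mapped as soon as it arrives."""
    for record in stream:
        for template, predicate, column in rules:
            if column in record:
                yield (template.format(**record), predicate, record[column])


# Consume only the first three triples of the (infinite) Linked Data stream.
first = list(itertools.islice(stream_generate(RULES, sensor_readings()), 3))
```

Because nothing is buffered, memory use does not grow with the stream; an on-demand trigger would instead run the same mapping over a stored batch when a consumer asks for it. Stateful operations, such as joins across records or time windows, are outside the scope of this sketch.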
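
Slides 15 and 26 mention incremental Linked Data generation. One generic way to sketch it, which is an assumption here and not the approach of R2RMLParser or any other cited tool, is to fingerprint each source record and regenerate triples only for records that are new, changed or deleted since the previous run, returning the triples to add and to remove. The class name `IncrementalGenerator`, the `id` key column, the rule format and the example IRIs are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]
Triple = Tuple[str, str, str]
Rule = Tuple[str, str, str]  # hypothetical: (subject template, predicate, column)

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
RULES: List[Rule] = [
    ("http://example.org/person/{id}", FOAF_NAME, "name"),
    ("http://example.org/person/{id}", FOAF_MBOX, "email"),
]


def fingerprint(record: Record) -> str:
    """Stable hash of a record's content, independent of key order."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


class IncrementalGenerator:
    """Hypothetical helper that keeps per-record state between runs."""

    def __init__(self, rules: List[Rule], key: str = "id") -> None:
        self.rules = rules
        self.key = key
        self.fingerprints: Dict[str, str] = {}
        self.triples: Dict[str, Set[Triple]] = {}

    def _map(self, record: Record) -> Set[Triple]:
        return {(t.format(**record), p, record[c]) for t, p, c in self.rules if c in record}

    def update(self, records: List[Record]) -> Tuple[Set[Triple], Set[Triple]]:
        """Process a new snapshot of the source; return (added, removed) triples."""
        added: Set[Triple] = set()
        removed: Set[Triple] = set()
        seen: Set[str] = set()
        for record in records:
            k = record[self.key]
            seen.add(k)
            digest = fingerprint(record)
            if self.fingerprints.get(k) == digest:
                continue  # unchanged record: skip regeneration entirely
            new, old = self._map(record), self.triples.get(k, set())
            added |= new - old
            removed |= old - new
            self.fingerprints[k], self.triples[k] = digest, new
        for k in list(self.fingerprints):
            if k not in seen:  # record deleted from the source
                removed |= self.triples.pop(k)
                del self.fingerprints[k]
        return added, removed


gen = IncrementalGenerator(RULES)
alice = {"id": "1", "name": "Alice", "email": "mailto:alice@example.org"}
first_added, _ = gen.update([alice, {"id": "2", "name": "Bob"}])
added, removed = gen.update([alice, {"id": "2", "name": "Robert"}])
```

On the second run only Bob's record is remapped, so the change set is a single added and a single removed triple instead of a full regeneration; the price is keeping one fingerprint and one triple set per source record.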
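
Slides 15 and 27 point to SPARQL-to-SQL translation/rewriting as the main algorithmic focus of the R2RML implementations (Ultrawrap, Ontop, Morph). A toy version of the idea, restricted to a single triple pattern `?s <predicate> ?o` over one table, is sketched below with Python's built-in `sqlite3`. The mapping dictionary is a hypothetical, R2RML-inspired stand-in, not R2RML syntax, and real rewriters handle joins, filters and query optimization, all of which this sketch ignores.

```python
import sqlite3
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"

# Hypothetical R2RML-inspired mapping: one table, a subject template over the
# key column, and one column per predicate.
MAPPING: Dict = {
    "table": "person",
    "key": "id",
    "subject_template": "http://example.org/person/{}",
    "predicates": {FOAF_NAME: "name", FOAF_MBOX: "email"},
}


def rewrite(predicate: str, mapping: Dict = MAPPING) -> str:
    """Rewrite the triple pattern (?s <predicate> ?o) into a single SQL query."""
    column = mapping["predicates"][predicate]
    key, table = mapping["key"], mapping["table"]
    # Identifiers come from the trusted mapping, never from user input.
    return f"SELECT {key}, {column} FROM {table} WHERE {column} IS NOT NULL ORDER BY {key}"


def answer(conn: sqlite3.Connection, predicate: str, mapping: Dict = MAPPING) -> List[Triple]:
    """Evaluate the rewritten query and turn only the matching rows into triples."""
    rows = conn.execute(rewrite(predicate, mapping)).fetchall()
    return [(mapping["subject_template"].format(key), predicate, value) for key, value in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Alice", "mailto:alice@example.org"), (2, "Bob", None)],
)
mboxes = answer(conn, FOAF_MBOX)
```

No triple is materialized in advance: the relational database evaluates the rewritten query and only its answers become triples, which is the consumption-driven, on-the-fly pattern that slide 51 associates with these tools.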
