UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management
T. Knap, P. Škoda, J. Klímek, M. Nečaský
http://xrg.cz | knap@ksi.mff.cuni.cz
XML and Web Engineering Research Group
Faculty of Mathematics and Physics
Charles University in Prague, Czech Republic
Dateso 2015
Agenda
- UnifiedViews
  - Basic concepts, impact
  - Areas of ongoing and future work
UnifiedViews
Basic Concepts, Impact
UnifiedViews
- An Extract-Transform-Load (ETL) framework with a UI that allows users to define, execute, monitor, debug, schedule, and share RDF data processing tasks
- UnifiedViews differs from other ETL frameworks by natively supporting the processing of RDF data
A Pipeline
- Every data processing task in UnifiedViews is modelled as a pipeline
- Every pipeline consists of one or more DPUs (data processing units) and arrows depicting the data flow between them
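Arrows point from the DPU that produces data to the DPU that consumes it. If a pipeline has no cycles, it is therefore a directed acyclic graph of DPUs, and any valid execution order is a topological order of that graph. The following minimal Python sketch assumes a pipeline is given as a list of DPU names plus (source, target) arrows. The DPU names and the `execution_order` function are hypothetical and do not reflect the actual UnifiedViews engine, which is Java-based. The sketch uses Kahn's algorithm.

```python
from collections import deque


def execution_order(dpus, edges):
    """Return DPU names in an order that respects every data-flow arrow.

    dpus  -- list of DPU names (hypothetical identifiers)
    edges -- list of (source, target) pairs, one per arrow in the pipeline
    Raises ValueError if the arrows form a cycle.
    """
    successors = {d: [] for d in dpus}
    in_degree = {d: 0 for d in dpus}
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Start with DPUs that have no incoming arrows (typically extractors).
    ready = deque(d for d in dpus if in_degree[d] == 0)
    order = []
    while ready:
        dpu = ready.popleft()
        order.append(dpu)
        for nxt in successors[dpu]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:  # all inputs of nxt are now available
                ready.append(nxt)

    if len(order) != len(dpus):
        raise ValueError("pipeline contains a cycle")
    return order


# A hypothetical pipeline: two extractors feed a transformer, which feeds a loader.
dpus = ["extract-csv", "extract-rdf", "sparql-update", "load-triplestore"]
edges = [
    ("extract-csv", "sparql-update"),
    ("extract-rdf", "sparql-update"),
    ("sparql-update", "load-triplestore"),
]
print(execution_order(dpus, edges))
```

DPUs that do not depend on each other (here, the two extractors) could also run in parallel. The queue order above is just one valid choice.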
A Data Processing Unit (DPU)
- A plugin that encapsulates a certain piece of functionality, typically operating on RDF data
- Users may prepare custom plugins
- Every DPU has its inputs, outputs, business logic, and configuration
- E.g., a DPU may apply a SPARQL Update query to the input RDF data and produce output RDF data
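The following minimal Python sketch makes the four parts of a DPU concrete. It assumes RDF data is represented as a set of (subject, predicate, object) string tuples. The `DPU` base class, its `execute` method, the `PredicateFilter` example, and the `"input"`/`"output"` names are all hypothetical. Real UnifiedViews DPUs are OSGi bundles written in Java.

```python
from abc import ABC, abstractmethod

Triple = tuple[str, str, str]       # (subject, predicate, object)
DataUnits = dict[str, set[Triple]]  # named inputs or outputs of a DPU


class DPU(ABC):
    """Hypothetical DPU shape: configuration plus business logic from inputs to outputs."""

    def __init__(self, config: dict):
        self.config = config  # supplied by the pipeline designer

    @abstractmethod
    def execute(self, inputs: DataUnits) -> DataUnits:
        """Business logic: read the named inputs and return the named outputs."""


class PredicateFilter(DPU):
    """Example DPU: keeps only triples whose predicate is listed in config["keep"]."""

    def execute(self, inputs: DataUnits) -> DataUnits:
        keep = set(self.config["keep"])
        return {"output": {t for t in inputs["input"] if t[1] in keep}}


sample = {
    ("ex:city1", "rdfs:label", '"City 1"'),
    ("ex:city1", "ex:internalId", '"42"'),
}
result = PredicateFilter({"keep": ["rdfs:label"]}).execute({"input": sample})
print(result)
```

A DPU that applies a SPARQL Update query would have the same shape. Its configuration would hold the query text, and a triple store would perform the update.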
Key Features
- Web administration interface:
  - Define and manage pipelines
  - Validate, execute, monitor, and debug pipelines
  - Schedule tasks and set up notifications about pipeline executions
  - Define and manage DPUs
  - Debug the inputs to and outputs from DPUs
  - Share pipelines and DPUs
  - Get notifications about the result of a pipeline execution
  - Multi-user environment
- Engine running the tasks:
  - Ensures that the DPUs of a pipeline are executed in the proper order
  - May send notifications about the result of the pipeline execution
- Core DPUs for working with RDF data
- An easy way to extend UnifiedViews with your own DPUs:
  - Every DPU is an OSGi bundle; as a result, two DPUs that use different versions of the same library can coexist in the framework
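Three engine duties listed above can be sketched in a few lines of Python: running DPUs in order, keeping DPU outputs available for debugging, and notifying about the result. The sketch rests on simplifying assumptions:

- The pipeline is linear and the order is already known.
- Each DPU is a plain function from a set of triples to a set of triples.
- Notifications go to a callback rather than, for example, e-mail.

All names are hypothetical.

```python
from typing import Callable

Triple = tuple[str, str, str]
Step = Callable[[set[Triple]], set[Triple]]  # hypothetical: one DPU as a function
Notifier = Callable[[str, str], None]        # hypothetical: (status, message) -> None


def run_pipeline(steps: list[tuple[str, Step]], notify: Notifier) -> dict[str, set[Triple]]:
    """Execute DPUs in the given order, keep a copy of every DPU's output for
    debugging, and report the overall result through `notify`."""
    debug_outputs: dict[str, set[Triple]] = {}
    data: set[Triple] = set()
    for name, step in steps:
        try:
            data = step(data)
        except Exception as exc:  # a failing DPU stops the whole execution
            notify("FAILED", f"{name}: {exc}")
            return debug_outputs
        debug_outputs[name] = set(data)  # snapshot, so later DPUs cannot alter it
    notify("SUCCESS", f"{len(steps)} DPUs executed")
    return debug_outputs


messages: list[tuple[str, str]] = []
steps = [
    ("extract", lambda _: {("ex:a", "rdfs:label", '"A"')}),
    ("add-type", lambda d: d | {(s, "rdf:type", "ex:Thing") for s, _, _ in d}),
]
outputs = run_pipeline(steps, lambda status, msg: messages.append((status, msg)))
print(messages)
```

The returned snapshots correspond to "debug inputs to/outputs from DPUs": the input of each DPU is the snapshot of its predecessor's output.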
Impact of UnifiedViews
- Projects
  • OpenData.cz initiative
  • INTLIB (2012-2014) – TaCR project
  • LOD2 (2011-2014) – EU FP7 project
    • UnifiedViews integrated into the LOD2 stack
  • COMSODE (2013-2015) – EU FP7 project
    • Open Data Node contains UnifiedViews
  • YourDataStories (2015+) – H2020 project
  • TenForce, Belgium
- Also commercial projects
  • Semantic Web Company (Austria)
  • EEA s.r.o. (Slovakia)
UnifiedViews
Ongoing and Future Work
Automatic Schema Alignment and Object Linkage
- Object Linkage:
  - Motivation: If various datasets use the same identifiers for the same real-world objects (cities, countries), the level of data integration increases and the cost of ad-hoc application integration is reduced
  - Goal: To automatically discover that certain columns in the processed tabular data represent certain types of data (e.g., cities, countries) and to automatically map the values in these columns to Linked Data URIs taken from the preferred dataset for the given type of data
- Schema Alignment:
  - Motivation: Increase the understandability of the data and simplify its reuse by various applications by using common vocabularies
  - Goal: To automatically suggest mappings of the RDF vocabulary terms used (e.g., predicates) to well-known RDF terms (e.g., predicates)
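Both goals rely on matching processed data against reference lists. The Python sketch below shows deliberately simple versions of each:

- A column is guessed to contain cities if most of its values occur in a codelist.
- Matched values are then replaced by URIs.
- Predicates are aligned by string similarity of their local names, using `difflib.SequenceMatcher`.

Several parts are hypothetical: the codelist, the `example.org` URIs, the 0.8 and 0.6 thresholds, and all function names. Exact codelist lookup stands in for dedicated link-discovery rules (e.g., Silk). Name similarity is only one of many possible schema matching techniques.

```python
from difflib import SequenceMatcher

# --- Object linkage (hypothetical codelist; URIs are placeholders) ---
CITY_URIS = {
    "Prague": "http://example.org/city/prague",
    "Brno": "http://example.org/city/brno",
    "Ostrava": "http://example.org/city/ostrava",
}


def column_has_type(values, codelist, threshold=0.8):
    """Guess the column type: True if at least `threshold` of the non-empty
    values occur in the codelist for that type."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return False
    hits = sum(1 for v in values if v in codelist)
    return hits / len(values) >= threshold


def link_column(values, codelist):
    """Replace known values by URIs; keep unknown values unchanged."""
    return [codelist.get(v.strip(), v) for v in values]


# --- Schema alignment (hypothetical list of well-known predicates) ---
WELL_KNOWN = [
    "http://xmlns.com/foaf/0.1/name",
    "http://purl.org/dc/terms/title",
    "http://www.w3.org/2000/01/rdf-schema#label",
]


def local_name(uri):
    """Part of the URI after the last '/' or '#', lower-cased."""
    return uri.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1].lower()


def suggest_mappings(predicate, candidates=WELL_KNOWN, min_ratio=0.6):
    """Rank well-known predicates by similarity of their local names."""
    name = local_name(predicate)
    scored = [(SequenceMatcher(None, name, local_name(c)).ratio(), c) for c in candidates]
    return [c for ratio, c in sorted(scored, reverse=True) if ratio >= min_ratio]


print(column_has_type(["Prague", "Brno", "Ostrava"], CITY_URIS))
print(link_column(["Prague", "Unknown"], CITY_URIS))
print(suggest_mappings("http://example.org/vocab/docTitle"))
```

Because the identification is probabilistic, suggestions like these would be shown to the task designer for confirmation rather than applied silently.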
Simplicity of Use
- Hiding SPARQL Queries
  - Goal: To provide a set of DPUs for executing typical SPARQL query operations on RDF data
- Autocompleting Terms from Well-known Vocabularies
  - Goal: To suggest and autocomplete vocabulary terms from well-known Linked Data vocabularies
    • Vocabulary autocomplete-aware controls (text boxes)
    • Description of the term, formal definition, recommended usage
- Wizards for Simple Definition of Data Processing Tasks
  - Motivation: Defining data processing tasks typically requires detailed knowledge of the DPUs that are available in the deployed UnifiedViews instance
  - Goal: Step-by-step guides for defining new tasks of typical types, e.g., extracting and publishing tabular data
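A "hidden SPARQL" DPU can expose only a few form fields and generate the query itself. Autocompletion over a sorted list of terms can be done with binary search. The Python sketch below assumes one typical operation, renaming a predicate with a SPARQL 1.1 Update `DELETE`/`INSERT`/`WHERE` request, and uses a small hard-coded term list. The function names and the term list are hypothetical. The IRI check only rejects characters that SPARQL's `IRIREF` grammar disallows inside `<...>`.

```python
import bisect

# Characters (besides controls and space) that SPARQL's IRIREF does not allow.
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')


def _check_iri(iri: str) -> None:
    if any(ch in _FORBIDDEN_IRI_CHARS or ord(ch) <= 0x20 for ch in iri):
        raise ValueError(f"not usable as an IRI: {iri!r}")


def rename_predicate_update(old: str, new: str) -> str:
    """Build a SPARQL 1.1 Update that replaces predicate `old` with `new`.

    The user only supplies two IRIs (e.g., in a form); the query text stays hidden.
    """
    _check_iri(old)
    _check_iri(new)
    return (
        f"DELETE {{ ?s <{old}> ?o }}\n"
        f"INSERT {{ ?s <{new}> ?o }}\n"
        f"WHERE {{ ?s <{old}> ?o }}"
    )


# Hypothetical terms from well-known vocabularies, kept sorted for bisect.
TERMS = sorted([
    "dcterms:created", "dcterms:creator", "dcterms:title",
    "foaf:knows", "foaf:name", "rdfs:comment", "rdfs:label",
])


def autocomplete(prefix: str, terms: list[str] = TERMS, limit: int = 5) -> list[str]:
    """Return up to `limit` terms starting with `prefix` (binary search, then scan)."""
    start = bisect.bisect_left(terms, prefix)
    result: list[str] = []
    for term in terms[start:]:
        if len(result) == limit or not term.startswith(prefix):
            break
        result.append(term)
    return result


print(rename_predicate_update("http://example.org/old", "http://example.org/new"))
print(autocomplete("dcterms:cr"))
```

A production control would also show each term's description and recommended usage, which this sketch omits.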
Sustainability and Quality
- Sustainable RDF Data Processing
  - Goal: To allow the task designer to define, for each DPU, a set of SPARQL queries that test whether the output data produced by that DPU satisfies certain conditions; if possible, to automate the creation of such queries
- Assessing the Quality of Produced Data, Recommendation of Cleansing DPUs
  - Motivation: The task designer should be informed about any problems in the data, e.g., w.r.t. the syntactic/semantic accuracy of the produced Linked Data or the completeness of the published datasets
  - Goal: A set of DPUs for assessing the quality of the data and for cleansing it
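Output tests and quality metrics of this kind can both be phrased as checks over a DPU's output. The Python sketch below assumes triples are string tuples with prefixed names. It implements one test, "every instance of a class has a given property", which passes exactly when the SPARQL `ASK` query quoted in the docstring returns false. It also implements one completeness metric. The function names, and the choice to report 1.0 for an empty class, are assumptions and not part of UnifiedViews.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object); a simplification


def subjects_of_type(data: set[Triple], rdf_class: str) -> set[str]:
    return {s for s, p, o in data if p == "rdf:type" and o == rdf_class}


def every_instance_has(data: set[Triple], rdf_class: str, predicate: str) -> bool:
    """Output test for a DPU: True if no instance of `rdf_class` lacks `predicate`.

    For rdf_class="ex:City" and predicate="rdfs:label" this passes exactly when
    the following SPARQL query returns false (prefix declarations omitted):
        ASK { ?s a ex:City . FILTER NOT EXISTS { ?s rdfs:label ?v } }
    """
    having = {s for s, p, _ in data if p == predicate}
    return subjects_of_type(data, rdf_class) <= having


def completeness(data: set[Triple], rdf_class: str, predicate: str) -> float:
    """Quality metric: share of `rdf_class` instances with at least one `predicate` value.

    An empty class is reported as complete (1.0); this is a design choice.
    """
    instances = subjects_of_type(data, rdf_class)
    if not instances:
        return 1.0
    having = {s for s, p, _ in data if p == predicate and s in instances}
    return len(having) / len(instances)


output = {
    ("ex:a", "rdf:type", "ex:City"), ("ex:a", "rdfs:label", '"A"'),
    ("ex:b", "rdf:type", "ex:City"),
}
print(every_instance_has(output, "ex:City", "rdfs:label"))
print(completeness(output, "ex:City", "rdfs:label"))
```

A failed test could stop the pipeline, while a low completeness score could instead trigger a recommendation of a cleansing DPU.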
Conclusions
Summary
- UnifiedViews – ETL tool for RDF data processing
  - Basic concepts, impact
  - Areas of ongoing and future work
Would you like to try UnifiedViews?
- UnifiedViews is available under an open source license
  - GPLv3 + LGPLv3
- Hosted on GitHub
  - Repository: https://github.com/UnifiedView
- Current version: UnifiedViews 2.0.1
- More info:
  - unifiedviews.eu
Thank You!
How to contribute?
- Guidelines for contributors:
  - https://grips.semantic-web.at/display/UDDOC/Guidelines+for+Contributors
Join the UnifiedViews Team

Speaker Notes

  1. Example task. It may employ custom plugins (data processing units, DPUs) created by users. General problem with RDF data processing: consumers have to write most of the logic to define, execute, monitor, schedule, and share RDF data processing tasks.
  2. Online platform for data exploitation focused on the financial flows that are critical for transparency, collaboration, and participation.
  3. To realise 1), it is first necessary to identify that certain columns contain certain types of values. Such identification is always probabilistic and is typically based on comparing the name of the column with the list of names of RDF classes and/or on matching sample data from the considered column against known codelists, such as a list of Czech cities; experiments are needed to decide on the particular algorithm for identifying types in the input data. The second step to realise 1) is to apply predefined Silk~\cite{DBLP:conf/www/VolzBGK09} rules for the identified type of data within the column of the input tabular data. To realise 2), various schema matching techniques have to be evaluated experimentally~\cite{Rahm:2001:SAA:767149.767154}.
  4. Evolution of DPUs (done): proper handling of version migrations.