Using SPARQL and SPIN for
 Data Quality Management
   on the Semantic Web
  Christian Fürber / Martin Hepp
   christian@fuerber.com, mhepp@computer.org

              Presentation @ BIS
                 May 4th 2010
Vision of the Semantic Web

   Publishing data on the web in a meaningful way for more automation,
   better integration, and higher reusability of data.

   © Hanspeter Graf / www.pixelio.de


C. Fürber, M. Hepp: Using SPARQL and SPIN for Data
Quality Management on the Semantic Web                                             2
Growth of Data: Well on Track…

   • Retrieving information
   • Building smart SemWeb apps

   Reference: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html

…but what if the published data were of poor quality?

   Get a giant camcorder from Amazon!




Using Poor Data is Costly
   Without quality checks your SemWeb Apps will
   take this data seriously and…

                                         …get an oversized shipping
                                         package with expensive postage,

…and waste transportation capacity.


Is there any way to avoid data quality disasters?

Yes, if we know about data quality problems before anything bad happens!

   A giant camcorder on the road!
The Impact of Poor Data Quality

• Poor Decisions
• Failed Business Processes
• Failed Projects
• Higher Costs
• Missed Revenues
• Lower Product / Service Quality
• Lower Stakeholder Satisfaction
• Fatal Disasters


Data Quality is a Key Bottleneck of the Semantic Web

<vocab:location rdf:about="http://www.stockdbdemo2.com/stockdblocation/1">
  <vocab:location_ZIP></vocab:location_ZIP>
  <vocab:location_STREETNO></vocab:location_STREETNO>
  <vocab:location_COUNTRY>France</vocab:location_COUNTRY>
  <vocab:location_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</vocab:location_ID>
  <vocab:location_STREET>8489 Strong St.</vocab:location_STREET>
  <vocab:location_STATE>NV</vocab:location_STATE>
  <rdfs:label>location #1</rdfs:label>
  <vocab:location_CITY>Las Vegas</vocab:location_CITY>
</vocab:location>

Problems annotated in this example:
• Missing literal values (the empty location_ZIP and location_STREETNO)
• Functional dependency violation (city "Las Vegas" combined with country "France")
• Syntax violation
• Unique value violation

Our Approach

   Identification of data quality problems on the instance level of
   Semantic Web sources, solely with Semantic Web technologies.

   • Integration advantages
   • Access to SemWeb data may be useful for data quality management (DQM).

Proposed Architecture

   Query Layer:          SPARQL + SPIN
   Ontology Layer:       Domain Ontology, SPIN, OBDQM
   Data Sources Layer:   RDB, Knowledge Base, Linked Data Cloud


Defining Data Quality Rules with SPARQL (1)

   • Define what is allowed and negate it.
   • Define what is not allowed.

   Negations and regular expressions save manual effort.
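The two rule styles named on this slide can be sketched outside SPARQL as well. The following is only an illustration in plain Python, not the authors' queries: the sample values and both patterns are assumptions chosen for the example.

```python
import re

# Hypothetical sample values for a property such as a street name.
values = ["Strong St.", "Main Street", "8489 Strong St.", "B@d Road"]

# Style 1: define what is allowed (letters, dots, spaces) and negate it.
ALLOWED = re.compile(r"^[A-Za-z. ]*$")
violations_by_negation = [v for v in values if not ALLOWED.match(v)]

# Style 2: define what is not allowed (here: any digit) and match it directly.
FORBIDDEN = re.compile(r"[0-9]")
violations_by_forbidden = [v for v in values if FORBIDDEN.search(v)]

print(violations_by_negation)
print(violations_by_forbidden)
```

The negated whitelist catches every value outside the allowed alphabet with a single pattern, while the second style only catches the cases that were listed explicitly.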
Defining Data Quality Rules with SPARQL (2)

   The city "Las Vegas" must be in the country "USA".

  # Checking functional dependency: city "Las Vegas" requires country "USA"
  CONSTRUCT {
      _:b0 a spin:ConstraintViolation .
      _:b0 spin:violationRoot ?this .
      _:b0 spin:violationPath vocab:location_COUNTRY .
  }
  WHERE {
      ?this vocab:location_CITY "Las Vegas" .
      FILTER (!spl:hasValue(?this, vocab:location_COUNTRY, "USA")) .
  }
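To make the logic of this rule concrete, here is a minimal sketch in plain Python over a list of triples. It is an illustration only: the `has_value` helper is a rough stand-in for `spl:hasValue`, the short subject names and the second location are hypothetical, and no SPARQL engine is involved.

```python
# Triples as (subject, predicate, object); location/2 is a hypothetical clean record.
triples = [
    ("location/1", "vocab:location_CITY", "Las Vegas"),
    ("location/1", "vocab:location_COUNTRY", "France"),
    ("location/2", "vocab:location_CITY", "Las Vegas"),
    ("location/2", "vocab:location_COUNTRY", "USA"),
]


def has_value(triples, subject, predicate, value):
    """Rough stand-in for spl:hasValue: True if the exact triple exists."""
    return (subject, predicate, value) in triples


def check_city_country(triples, city, country):
    """Report every subject that has the given city but not the given country."""
    violations = []
    for subject, predicate, obj in triples:
        if predicate == "vocab:location_CITY" and obj == city:
            if not has_value(triples, subject, "vocab:location_COUNTRY", country):
                violations.append(
                    {"violationRoot": subject,
                     "violationPath": "vocab:location_COUNTRY"}
                )
    return violations


print(check_city_country(triples, "Las Vegas", "USA"))
```

Only `location/1` is reported, because it combines the city "Las Vegas" with a country other than "USA".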


Defining Data Quality Rules with SPARQL (3)

   High reusability of data quality rules through SPIN's SPARQL query templates.

  # Checking functional dependency of {?arg4} with {?arg2}
  CONSTRUCT {
      _:b0 a spin:ConstraintViolation .
      _:b0 spin:violationRoot ?this .
      _:b0 spin:violationPath ?arg3 .
  }
  WHERE {
      ?this ?arg1 ?arg2 .
      FILTER (!spl:hasValue(?this, ?arg3, ?arg4)) .
  }
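How one template yields many concrete rules can be sketched with simple text substitution. This is a simplification for illustration: SPIN binds the template arguments as variables rather than splicing text, and the `instantiate` helper below is hypothetical.

```python
# The template from the slide, with ?arg1..?arg4 turned into format placeholders.
TEMPLATE = """CONSTRUCT {{
    _:b0 a spin:ConstraintViolation .
    _:b0 spin:violationRoot ?this .
    _:b0 spin:violationPath {arg3} .
}}
WHERE {{
    ?this {arg1} {arg2} .
    FILTER (!spl:hasValue(?this, {arg3}, {arg4})) .
}}"""


def instantiate(arg1, arg2, arg3, arg4):
    """Fill the four template arguments to obtain one concrete rule."""
    return TEMPLATE.format(arg1=arg1, arg2=arg2, arg3=arg3, arg4=arg4)


# Reproduces the "Las Vegas must be in USA" rule from the previous slide.
query = instantiate(
    "vocab:location_CITY", '"Las Vegas"',
    "vocab:location_COUNTRY", '"USA"',
)
print(query)
```

Calling `instantiate` with a different property and value pair gives another functional dependency rule without rewriting the query body.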
Enforced DQ-Rules with SPIN

   Application: http://www.topquadrant.com/products/TB_Composer.html#free


More Data Quality Rule Templates (1)

  Data Quality Problem                 SPARQL Query Template
  Missing literal values               ASK WHERE {
                                           ?this ?arg1 "" .
                                       }

  Out of range value (lower limit)     ASK WHERE {
                                           ?this ?arg1 ?value .
                                           FILTER (?value < ?arg2) .
                                       }

  Out of range value (upper limit)     ASK WHERE {
                                           ?this ?arg1 ?value .
                                           FILTER (?value > ?arg2) .
                                       }

  [Diagram labels: Global Ontology; RDB; RDB; Knowledge Base]
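The three templates above each reduce to a simple predicate over a property's values. A minimal sketch in plain Python, assuming triples held as tuples; the `vocab:product_PRICE` property, its values, and the limits are hypothetical and serve only to exercise the range checks.

```python
triples = [
    ("location/1", "vocab:location_ZIP", ""),
    ("location/1", "vocab:location_ID", 1),
    ("product/7", "vocab:product_PRICE", -5),    # hypothetical
    ("product/8", "vocab:product_PRICE", 120),   # hypothetical
]


def missing_literal(triples, prop):
    """Subjects whose value for prop is the empty string."""
    return [s for s, p, o in triples if p == prop and o == ""]


def below_lower_limit(triples, prop, limit):
    """Subjects whose value for prop is smaller than limit."""
    return [s for s, p, o in triples if p == prop and o < limit]


def above_upper_limit(triples, prop, limit):
    """Subjects whose value for prop is greater than limit."""
    return [s for s, p, o in triples if p == prop and o > limit]


print(missing_literal(triples, "vocab:location_ZIP"))
print(below_lower_limit(triples, "vocab:product_PRICE", 0))
print(above_upper_limit(triples, "vocab:product_PRICE", 100))
```

As with the ASK templates, a non-empty result means the data quality problem is present for the listed subjects.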
More Data Quality Rule Templates (2)

  Data Quality Problem                 SPARQL Query Template
  Syntax violation                     ASK WHERE {
  (only letters, commas, dots,             ?this ?arg1 ?value .
  and spaces allowed)                      FILTER (!regex(str(?value), "^([A-Za-z,. ])*$")) .
                                       }

  Unique value violation               CONSTRUCT {
                                           _:b0 a spin:ConstraintViolation .
                                           _:b0 spin:violationRoot ?a .
                                           _:b0 spin:violationPath ?arg1 .
                                       }
                                       WHERE {
                                           ?a ?arg1 ?uniqueValue .
                                           ?b ?arg1 ?uniqueValue .
                                           FILTER (?a != ?b)
                                       }

  [Diagram labels: Global Ontology; RDB; RDB; Knowledge Base]
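The syntax and uniqueness templates can be mirrored in a few lines of plain Python. This is a sketch under assumptions: the regular expression is copied from the slide, while the misspelled city and the duplicate ID record are hypothetical test data.

```python
import re
from collections import defaultdict

triples = [
    ("location/1", "vocab:location_CITY", "Las Vegas"),
    ("location/2", "vocab:location_CITY", "L4s Vegas"),   # hypothetical typo
    ("location/1", "vocab:location_ID", "1"),
    ("location/3", "vocab:location_ID", "1"),             # hypothetical duplicate
]

# Same pattern as in the syntax violation template.
PATTERN = re.compile(r"^([A-Za-z,. ])*$")


def syntax_violations(triples, prop):
    """Subjects whose value for prop does not match the allowed pattern."""
    return [s for s, p, o in triples if p == prop and not PATTERN.match(str(o))]


def unique_value_violations(triples, prop):
    """Subjects that share a value of prop with at least one other subject."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_value[o].add(s)
    return sorted(
        s
        for subjects in subjects_by_value.values()
        if len(subjects) > 1
        for s in subjects
    )


print(syntax_violations(triples, "vocab:location_CITY"))
print(unique_value_violations(triples, "vocab:location_ID"))
```

Grouping subjects by value avoids the pairwise comparison that the `?a != ?b` filter expresses, but it reports the same subjects.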
Contributions

• Domain-independent SPARQL query
  templates for data quality problem identification
• Queries are highly reusable
• Architecture enables the use of Linked Data
• Methodology for data quality management of
  Semantic Web data
• First approach on how to apply SPIN for DQM

Limitations & Open Issues
• Knowing the problem does not mean we can
  solve it
• Homonym / Synonym handling
• Incomplete knowledge may cause constraint
  violations of clean instances
• Current approach focuses on literal values
• Scalability on large data sets


Ongoing Extensions
• Extension to a broader set of data quality problems
• Enabling synonym handling and homonym tolerance
• Enhancement of performance
• Calculation of information quality scores
• Integration of Linked Data as trusted reference for
  data quality management
• Evaluate the quality of popular Semantic Web data sets
  on instance level (e.g. Geonames & DBpedia)
• Extension for (semi-)automated data cleansing

Christian Fuerber
       Researcher
       E-Business & Web Science Research Group

                     Werner-Heisenberg-Weg 39
                     85577 Neubiberg
                     Germany

                     skype            c.fuerber
                     email            christian@fuerber.com
                     web              http://www.unibw.de/ebusiness
                     homepage         http://www.fuerber.com




Paper is available at http://bit.ly/bYes0V

References & Links
     LOD-Cloud:
       http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html

     D2RQ:
       http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/

     SPIN:
        http://spinrdf.org/

     TopBraid Composer Free Edition:
        http://www.topquadrant.com/products/TB_Composer.html#free




Mais conteúdo relacionado

Semelhante a Using SPARQL and SPIN for Data Quality Management on the Semantic Web

Large Scale Geospatial Indexing and Analysis on Apache Spark
Large Scale Geospatial Indexing and Analysis on Apache SparkLarge Scale Geospatial Indexing and Analysis on Apache Spark
Large Scale Geospatial Indexing and Analysis on Apache SparkDatabricks
 
Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect togetherIsheeta Sanghi
 
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoIasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoCodecamp Romania
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detectionhadooparchbook
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of MusicLars Albertsson
 
Rapid mobile phone based surveys (Scott Chaplowe, IFRC)
Rapid mobile phone based surveys (Scott Chaplowe, IFRC)Rapid mobile phone based surveys (Scott Chaplowe, IFRC)
Rapid mobile phone based surveys (Scott Chaplowe, IFRC)ALNAP
 
How Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryHow Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryAlex Meadows
 
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018Kelley Robinson
 
SAP_BODS-Data Migration Consultant
SAP_BODS-Data Migration ConsultantSAP_BODS-Data Migration Consultant
SAP_BODS-Data Migration Consultantguru dev
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaEdureka!
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Edureka!
 
LODOP - Multi-Query Optimization for Linked Data Profiling Queries
LODOP - Multi-Query Optimization for Linked Data Profiling QueriesLODOP - Multi-Query Optimization for Linked Data Profiling Queries
LODOP - Multi-Query Optimization for Linked Data Profiling QueriesAnja Jentzsch
 
Progress Report 20091009
Progress Report 20091009Progress Report 20091009
Progress Report 20091009xoanon
 
Kenan Blevins Resume SAP
Kenan Blevins Resume SAPKenan Blevins Resume SAP
Kenan Blevins Resume SAPKenan Blevins
 
Semantic Web-based E-Commerce: The GoodRelations Ontology
Semantic Web-based E-Commerce: The GoodRelations OntologySemantic Web-based E-Commerce: The GoodRelations Ontology
Semantic Web-based E-Commerce: The GoodRelations OntologyMartin Hepp
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open PlatformJongwook Woo
 
Analyzing Pwned Passwords with Spark and Scala
Analyzing Pwned Passwords with Spark and ScalaAnalyzing Pwned Passwords with Spark and Scala
Analyzing Pwned Passwords with Spark and ScalaKelley Robinson
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"Fabien Gandon
 

Semelhante a Using SPARQL and SPIN for Data Quality Management on the Semantic Web (20)

Big data with java
Big data with javaBig data with java
Big data with java
 
Large Scale Geospatial Indexing and Analysis on Apache Spark
Large Scale Geospatial Indexing and Analysis on Apache SparkLarge Scale Geospatial Indexing and Analysis on Apache Spark
Large Scale Geospatial Indexing and Analysis on Apache Spark
 
Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect together
 
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoIasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
 
Hadoop Application Architectures - Fraud Detection
Hadoop Application Architectures - Fraud  DetectionHadoop Application Architectures - Fraud  Detection
Hadoop Application Architectures - Fraud Detection
 
Data Infrastructure for a World of Music
Data Infrastructure for a World of MusicData Infrastructure for a World of Music
Data Infrastructure for a World of Music
 
Rapid mobile phone based surveys (Scott Chaplowe, IFRC)
Rapid mobile phone based surveys (Scott Chaplowe, IFRC)Rapid mobile phone based surveys (Scott Chaplowe, IFRC)
Rapid mobile phone based surveys (Scott Chaplowe, IFRC)
 
How Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryHow Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information Discovery
 
STI Summit 2011 - Social semantics
STI Summit 2011 - Social semanticsSTI Summit 2011 - Social semantics
STI Summit 2011 - Social semantics
 
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018
Analyzing Pwned Passwords with Spark - OWASP Meetup July 2018
 
SAP_BODS-Data Migration Consultant
SAP_BODS-Data Migration ConsultantSAP_BODS-Data Migration Consultant
SAP_BODS-Data Migration Consultant
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
LODOP - Multi-Query Optimization for Linked Data Profiling Queries
LODOP - Multi-Query Optimization for Linked Data Profiling QueriesLODOP - Multi-Query Optimization for Linked Data Profiling Queries
LODOP - Multi-Query Optimization for Linked Data Profiling Queries
 
Progress Report 20091009
Progress Report 20091009Progress Report 20091009
Progress Report 20091009
 
Kenan Blevins Resume SAP
Kenan Blevins Resume SAPKenan Blevins Resume SAP
Kenan Blevins Resume SAP
 
Semantic Web-based E-Commerce: The GoodRelations Ontology
Semantic Web-based E-Commerce: The GoodRelations OntologySemantic Web-based E-Commerce: The GoodRelations Ontology
Semantic Web-based E-Commerce: The GoodRelations Ontology
 
Big Data Trend with Open Platform
Big Data Trend with Open PlatformBig Data Trend with Open Platform
Big Data Trend with Open Platform
 
Analyzing Pwned Passwords with Spark and Scala
Analyzing Pwned Passwords with Spark and ScalaAnalyzing Pwned Passwords with Spark and Scala
Analyzing Pwned Passwords with Spark and Scala
 
ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 

Último

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 

Último (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Six Myths about Ontologies: The Basics of Formal Ontology

Using SPARQL and SPIN for Data Quality Management on the Semantic Web

  • 1. Using SPARQL and SPIN for Data Quality Management on the Semantic Web Christian Fürber / Martin Hepp christian@fuerber.com, mhepp@computer.org Presentation @ BIS May 4th 2010
  • 2. Vision of the Semantic Web Publishing data on the web in a meaningful way for more automation, better integration, and higher reusability of data. © Hanspeter Graf / www.pixelio.de C. Fürber, M. Hepp: Using SPARQL and SPIN for Data Quality Management on the Semantic Web 2
  • 3. Growth of Data: Well on Track… Retrieving information. Building smart SemWeb apps. Reference: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html
  • 4. …but what if the published data was of poor quality? Get a giant camcorder from amazon!
  • 5. Using Poor Data is Costly: Without quality checks, your SemWeb apps will take this data seriously and… get an oversized shipping package with expensive postage… and waste transportation capacity.
  • 6. Is there any way to avoid data quality disasters? Yes, if we know about data quality problems before anything bad happens! A giant camcorder on the road!
  • 7. The Impact of Poor Data Quality
      • Higher Costs
      • Missed Revenues
      • Poor Decisions
      • Lower Product / Service Quality
      • Failed Business Processes
      • Failed Projects
      • Lower Stakeholder Satisfaction
      • Fatal Disasters
  • 8. Data Quality is a Key Bottleneck of the Semantic Web
      <vocab:location rdf:about="http://www.stockdbdemo2.com/stockdblocation/1">
        <vocab:location_ZIP></vocab:location_ZIP>
        <vocab:location_STREETNO></vocab:location_STREETNO>
        <vocab:location_COUNTRY>France</vocab:location_COUNTRY>
        <vocab:location_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</vocab:location_ID>
        <vocab:location_STREET>8489 Strong St.</vocab:location_STREET>
        <vocab:location_STATE>NV</vocab:location_STATE>
        <rdfs:label>location #1</rdfs:label>
        <vocab:location_CITY>Las Vegas</vocab:location_CITY>
      </vocab:location>
      Problems annotated on the example: unique value violation, missing literal values, functional dependency violation, syntax violation.
  • 9. Our Approach
      <vocab:location rdf:about="http://www.stockdbdemo2.com/stockdblocation/1">
        <vocab:location_ZIP></vocab:location_ZIP>
        <vocab:location_STREETNO></vocab:location_STREETNO>
        <vocab:location_COUNTRY>France</vocab:location_COUNTRY>
        <vocab:location_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1</vocab:location_ID>
        <vocab:location_STREET>8489 Strong St.</vocab:location_STREET>
        <vocab:location_STATE>NV</vocab:location_STATE>
        <rdfs:label>location #1</rdfs:label>
        <vocab:location_CITY>Las Vegas</vocab:location_CITY>
      </vocab:location>
      Identification of data quality problems on the instance level of Semantic Web sources, solely with Semantic Web technologies.
      • Integration advantages
      • Access to SemWeb data may be useful for data quality management (DQM)
  • 10. Proposed Architecture
      • Query Layer: SPARQL + SPIN
      • Ontology Layer: Domain Ontology, SPIN, OBDQM
      • Data Sources Layer: Knowledge Base, Linked Data Cloud, RDB
  • 11. Defining Data Quality Rules with SPARQL (1): Either define what is allowed and negate it, or define what is not allowed. Negations and regular expressions save manual effort.
  • 12. Defining Data Quality Rules with SPARQL (2): The city "Las Vegas" must be in the country "USA".
      # Checking functional dependency of {?arg4} with {?arg2}
      CONSTRUCT {
        _:b0 a spin:ConstraintViolation .
        _:b0 spin:violationRoot ?this .
        _:b0 spin:violationPath vocab:location_COUNTRY .
      }
      WHERE {
        ?this vocab:location_CITY "Las Vegas" .
        FILTER (!spl:hasValue(?this, vocab:location_COUNTRY, "USA")) .
      }
  • 13. Defining Data Quality Rules with SPARQL (3): High reusability of data quality rules through SPIN's SPARQL query templates.
      # Checking functional dependency of {?arg4} with {?arg2}
      CONSTRUCT {
        _:b0 a spin:ConstraintViolation .
        _:b0 spin:violationRoot ?this .
        _:b0 spin:violationPath ?arg3 .
      }
      WHERE {
        ?this ?arg1 ?arg2 .
        FILTER (!spl:hasValue(?this, ?arg3, ?arg4)) .
      }
  • 14. Enforced DQ-Rules with SPIN. Application: http://www.topquadrant.com/products/TB_Composer.html#free
  • 15. More Data Quality Rule Templates (1)
      Data Quality Problem: SPARQL Query Template
      • Missing literal values: ASK WHERE { ?this ?arg1 "" . }
      • Out of range value (lower limit): ASK WHERE { ?this ?arg1 ?value . FILTER (?value < ?arg2) . }
      • Out of range value (upper limit): ASK WHERE { ?this ?arg1 ?value . FILTER (?value > ?arg2) . }
      Diagram labels: Global Ontology; RDB; RDB; Knowledge Base
  • 16. More Data Quality Rule Templates (2)
      Data Quality Problem: SPARQL Query Template
      • Syntax violation (only letters and dots allowed):
          ASK WHERE {
            ?this ?arg1 ?value .
            FILTER (!regex(str(?value), "^([A-Za-z,. ])*$"))
          }
      • Unique value violation:
          CONSTRUCT {
            _:b0 a spin:ConstraintViolation .
            _:b0 spin:violationRoot ?a .
            _:b0 spin:violationPath ?arg1 .
          }
          WHERE {
            ?a ?arg1 ?uniqueValue .
            ?b ?arg1 ?uniqueValue .
            FILTER (?a != ?b)
          }
      Diagram labels: Global Ontology; RDB; RDB; Knowledge Base
  • 17. Contributions
      • Domain-independent SPARQL query templates for data quality problem identification
      • Queries are highly reusable
      • Architecture enables the use of Linked Data
      • Methodology for data quality management of Semantic Web data
      • First approach on how to apply SPIN for DQM
  • 18. Limitations & Open Issues
      • Knowing the problem does not mean we can solve it
      • Homonym / synonym handling
      • Incomplete knowledge may cause constraint violations of clean instances
      • Current approach focuses on literal values
      • Scalability on large data sets
  • 19. Ongoing Extensions
      • Extension to a broader set of data quality problems
      • Enabling synonym handling and homonym tolerance
      • Enhancement of performance
      • Calculation of information quality scores
      • Integration of Linked Data as a trusted reference for data quality management
      • Evaluation of the quality of popular Semantic Web data sets on the instance level (e.g. Geonames & DBpedia)
      • Extension for (semi-)automated data cleansing
  • 20. Christian Fuerber, Researcher, E-Business & Web Science Research Group, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
      Skype: c.fuerber
      Email: christian@fuerber.com
      Web: http://www.unibw.de/ebusiness
      Homepage: http://www.fuerber.com
      Paper is available at http://bit.ly/bYes0V
  • 21. References & Links
      • LOD-Cloud: http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2009-03-05.html
      • D2RQ: http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
      • SPIN: http://spinrdf.org/
      • TopBraid Composer Free Edition: http://www.topquadrant.com/products/TB_Composer.html#free
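
For readers without a SPIN engine, the functional dependency rule from slide 12 can probably be approximated in plain SPARQL 1.1. The query below is a minimal sketch, not the authors' implementation. It assumes that spl:hasValue simply tests whether the given triple is present, and it uses a hypothetical namespace for the vocab: prefix, since the slides never show what that prefix expands to.

```sparql
# Hypothetical namespace: the slides do not state the vocab: prefix URI.
PREFIX vocab: <http://example.org/vocab/>

# List every location whose city is "Las Vegas" but which has
# no country value "USA" (the violation described on slide 12).
SELECT ?location
WHERE {
  ?location vocab:location_CITY "Las Vegas" .
  FILTER NOT EXISTS { ?location vocab:location_COUNTRY "USA" }
}
```

Unlike the SPIN constraint, this query only lists the offending resources. It does not create spin:ConstraintViolation instances, so the violation root and path would have to be reported by the calling application.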