Using Semantic Web Resources for Data Quality Management
Christian Fürber and Martin Hepp
christian@fuerber.com, mhepp@computer.org

Presentation at the 17th International Conference on Knowledge Engineering and Knowledge Management, October 10-15, 2010, Lisbon, Portugal

Purpose of Data
[Figure: DATA (shown as a binary stream) at the center, surrounded by four purposes: Measurement, Information & Knowledge, Automation, Decisions]

C. Fürber, M. Hepp:                                          2
Using SemWeb Resources for DQM
Data Quality in Practice
Reference: http://www.heise.de/newsticker/meldung/Comdirect-Bank-macht-Kunden-zu-Billiardaeren-996088.html

The Web of Messy Data?
Retrieved from http://dbpedia.org/sparql on July 20th
Which one is the correct population?

The Web of Messy Data?
Retrieved from http://dbpedia.org/sparql on July 20th
Places with negative population?!?

Risk of Failure
[Figure: DATA (shown as a binary stream) at the center, surrounded by Measurement, Information & Knowledge, Automation, Decisions]

Data Quality Problem Types
• Inconsistent duplicates
• Invalid characters
• Missing classification
• Incorrect reference
• Approximate duplicates
• Character alignment violation
• Word transpositions
• Invalid substrings
• Mistyping / Misspelling errors
• Cardinality violation
• Missing values
• Referential integrity violation
• Misfielded values
• Unique value violation
• False values
• Functional dependency violation
• Out of range values
• Imprecise values
• Existence of homonyms
• Meaningless values
• Incorrect classification
• Existence of synonyms
• Contradictory relationships
• Outdated conceptual elements
• Untyped literals
• Outdated values

Reference: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Goals
• Use Semantic Web data to identify data quality problems at the instance level
• Support the Data Quality Management (DQM) process

Total Data Quality Management for and based on the Semantic Web
[Figure: DQ cycle with four phases: Define, Measure, Analyze, Improve]
• Define: Define what's good and/or what's poor data quality
• Measure: Develop and apply SPARQL queries based on the DQ definition

Reference: Richard Wang (1998)

How can the Semantic Web support Data Quality Management?

Availability of FREE Data Quality Knowledge, e.g. for the identification of…
• Legal value violations
• Functional dependency violations

Using Trusted References
[Figure: DQ-Constraints compare a tested knowledge base with a trusted reference]
• Tested Knowledgebase (local:Location): Las Vegas, France
• Trusted Reference (tref:Location): Las Vegas, USA

Basic Architecture




Basic Characteristics of SPIN
http://spinrdf.org/
• Allows definition of generalized SPARQL query templates
• Constraint checking based on SPARQL
• Definition of inferencing rules via SPARQL
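
The idea of a generalized query template can be sketched outside SPIN with plain string substitution. This is only an illustration, not SPIN's own template mechanism: the placeholder names and the helper `legal_value_query` are assumptions, and the query shape is borrowed from the legal value constraint in this deck.

```python
from string import Template

# Hypothetical template: the $-placeholders are illustrative names,
# not part of SPIN or of the dq-constraints library.
LEGAL_VALUE_TEMPLATE = Template("""\
SELECT ?s
WHERE {
    ?s a $tested_class .
    ?s $tested_property ?value .
    OPTIONAL {
        ?s2 a $trusted_class .
        ?s2 $trusted_property ?value1 .
        FILTER(str(?value1) = str(?value))
    }
    FILTER(!bound(?value1))
}
""")


def legal_value_query(tested_class, tested_property, trusted_class, trusted_property):
    """Fill the template with concrete classes and properties."""
    return LEGAL_VALUE_TEMPLATE.substitute(
        tested_class=tested_class,
        tested_property=tested_property,
        trusted_class=trusted_class,
        trusted_property=trusted_property,
    )


if __name__ == "__main__":
    print(legal_value_query("vcard:Address", "vcard:country-name",
                            "tref:Location", "tref:country"))
```

One template can then be reused for any class and property pair instead of writing a new query per constraint.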



Generic Data Quality Constraints
Library for Easy DQ-Definition
• Mandatory properties & literals
• Legal values*
• Legal value ranges
• Functional dependencies*
• Legal syntaxes
• Uniqueness

* Designed to use trusted references

Available at http://semwebquality.org/ontologies/dq-constraints#
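
The constraint types that do not need a trusted reference can be pictured with a small sketch over plain records. It is not the library's implementation: the records, property names, and function names below are all hypothetical, and each function returns the positions of violating rows.

```python
import re
from collections import Counter

# Hypothetical records standing in for instances of a tested knowledge base.
records = [
    {"id": "a1", "city": "Lisbon", "population": 545000, "zip": "1000-001"},
    {"id": "a2", "city": "Nowhere", "population": -5, "zip": "1000-002"},
    {"id": "a3", "city": None, "population": 1200, "zip": "ABC"},
    {"id": "a1", "city": "Porto", "population": 237000, "zip": "4000-001"},
]


def missing_mandatory(rows, prop):
    """Mandatory property check: rows where prop is absent or empty."""
    return [i for i, r in enumerate(rows) if r.get(prop) in (None, "")]


def out_of_range(rows, prop, low, high):
    """Legal value range check: rows whose value lies outside [low, high]."""
    return [i for i, r in enumerate(rows)
            if r.get(prop) is not None and not (low <= r[prop] <= high)]


def bad_syntax(rows, prop, pattern):
    """Legal syntax check: rows whose value does not fully match the pattern."""
    rx = re.compile(pattern)
    return [i for i, r in enumerate(rows)
            if r.get(prop) is not None and not rx.fullmatch(r[prop])]


def duplicates(rows, prop):
    """Uniqueness check: rows whose value occurs more than once."""
    counts = Counter(r.get(prop) for r in rows)
    return [i for i, r in enumerate(rows) if counts[r.get(prop)] > 1]


if __name__ == "__main__":
    print(missing_mandatory(records, "city"))
    print(out_of_range(records, "population", 0, 10**9))
    print(bad_syntax(records, "zip", r"\d{4}-\d{3}"))
    print(duplicates(records, "id"))
```

The negative population in the second record is the kind of value the range check is meant to catch.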
Definition of Data Quality Constraints based on SPIN

Constraint checking in Practice




Legal Value Constraints
Return all instances of class vcard:Address whose value for property vcard:country-name has no matching value in property tref:country.

SELECT ?s
WHERE {
    ?s a vcard:Address .
    ?s vcard:country-name ?value .
    # Look for a trusted country with the same string value
    OPTIONAL {
        ?s2 a tref:Location .
        ?s2 tref:country ?value1 .
        FILTER(str(?value1) = str(?value))
    }
    # Keep only addresses for which no trusted match was found
    FILTER(!bound(?value1))
}
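
The logic of the query above can be mirrored in a few lines of Python, as a sketch under stated assumptions: the address list and the set of trusted country names are made-up sample data, and `legal_value_violations` is a hypothetical helper name.

```python
# Hypothetical tested data: (subject, vcard:country-name value)
addresses = [
    ("addr1", "Germany"),
    ("addr2", "Germny"),    # misspelled on purpose
    ("addr3", "Portugal"),
]

# Hypothetical trusted reference values for tref:country
trusted_countries = {"Germany", "Portugal", "France"}


def legal_value_violations(pairs, legal_values):
    """Return subjects whose value has no string-equal match in legal_values.

    Comparing via str() mirrors FILTER(str(?value1) = str(?value)).
    """
    legal = {str(v) for v in legal_values}
    return [subject for subject, value in pairs if str(value) not in legal]


if __name__ == "__main__":
    print(legal_value_violations(addresses, trusted_countries))
```

As in the SPARQL version, a subject is reported only when no trusted value matches, so the result depends entirely on how complete the trusted reference is.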
Functional Dependency Constraints
Return all locations whose vcard:ADR node has a city-country combination without a matching pair in instances of gn:Location.

SELECT ?s
WHERE {
    ?s a gr:LocationOfSalesOrServiceProvisioning .
    ?s vcard:ADR ?node .
    ?node vcard:city ?value1 .
    ?node vcard:country ?value2 .
    # SPARQL 1.1 requires the FILTER keyword before NOT EXISTS
    FILTER NOT EXISTS {
        ?s2 a gn:Location .
        ?s2 gn:asciiname ?value1 .
        ?s2 gn:country ?value2 .
    }
}
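
The same check can be sketched in Python with the Las Vegas example from the trusted-references slide. The sample rows, the set of trusted pairs, and the helper name `fd_violations` are assumptions for illustration only.

```python
# Hypothetical tested data: (subject, city, country)
tested = [
    ("shop1", "Las Vegas", "France"),   # wrong country for this city
    ("shop2", "Lisbon", "Portugal"),
]

# Hypothetical trusted reference: known (city, country) pairs
trusted_pairs = {
    ("Las Vegas", "USA"),
    ("Lisbon", "Portugal"),
}


def fd_violations(rows, pairs):
    """Return subjects whose (city, country) pair is absent from pairs.

    This mirrors FILTER NOT EXISTS over the trusted reference.
    """
    return [subject for subject, city, country in rows
            if (city, country) not in pairs]


if __name__ == "__main__":
    print(fd_violations(tested, trusted_pairs))
```

A city that is missing from the trusted reference altogether is reported too, which is one way such constraints "close the world".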



Acquisition of Semantic Web Sources for DQM
(1) Replication of relevant knowledge bases
(2) On the fly via federated SPARQL queries:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
    # The default prefix ":" is not declared on the slide
    ?s1 :location_CITY ?city .
    OPTIONAL {
        SERVICE <http://dbpedia.org/sparql> {
            ?s2 a dbo:City .
            ?s2 rdfs:label ?city .
            FILTER(lang(?city) = "en")
        }
    }
    # Keep only cities without a match in DBpedia
    FILTER(!bound(?s2))
}
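
Where the local store does not support SERVICE, an on-the-fly lookup could instead be issued per value over the SPARQL protocol. The sketch below only builds the request URL and sends nothing; the ASK query shape and the helper names are assumptions, and only the endpoint and the dbo prefix come from the slide.

```python
from urllib.parse import urlencode, urlparse, parse_qs

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide


def city_ask_query(city):
    """Build an ASK query: is there a dbo:City with this English label?"""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = city.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'ASK {{ ?s a dbo:City ; rdfs:label "{escaped}"@en }}'
    )


def request_url(city, endpoint=DBPEDIA_ENDPOINT):
    """SPARQL protocol GET URL; the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": city_ask_query(city)})


if __name__ == "__main__":
    print(request_url("Lisbon"))
```

One request per city trades the convenience of a single federated query for many small round trips, so replication may still be preferable for large data sets.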

Limitations
• High degree of uncertainty about the quality of Semantic Web resources
• Risk of data quality problem proliferation
• Lack of Semantic Web resources for certain domains
• Flexible design of RDF and structural heterogeneity complicate the definition of generic DQ constraints
• Scalability on large data sets
• DQ constraints close the world

Contributions
• Data quality control for Semantic Web data
• Identification of potential inconsistencies between Semantic Web resources
• Reduction of effort for the definition of functional dependency rules and legal value rules
• Reuse of shared data quality rules on a Web scale

Future Work
• Semantic Web information quality assessment framework (SWIQA) with computation of KPIs
• Analysis and identification of useful "trusted references" based on SWIQA
• Application to multi-source master data of information systems
• Evaluation on large data sets

Data Quality Constraints Library for SPIN @ http://semwebquality.org/ontologies/dq-constraints#

Christian Fürber
Researcher
E-Business & Web Science Research Group

Werner-Heisenberg-Weg 39
85577 Neubiberg
Germany

skype: c.fuerber
email: christian@fuerber.com
web: http://www.unibw.de/ebusiness
homepage: http://www.fuerber.com
twitter: http://www.twitter.com/cfuerber

Paper available at http://bit.ly/c5v6TM

Using Semantic Web Resources for Data Quality Management

  • 1. Using Semantic Web Resources for Data Quality Management Christian Fürber and Martin Hepp christian@fuerber.com, mhepp@computer.org Presentation at the 17th International Conference on Knowledge Engineering and Knowledge Management, October 10-15, 2010, Lisbon, Portugal
  • 2. Purpose of Data Measurement Information & Knowledge 101010101 010101010 DATA 101010101 001010101 Automation 001010101 Decisions C. Fürber, M. Hepp: 2 Using SemWeb Resources for DQM
  • 3. Data Quality in Practice Reference: http://www.heise.de/newsticker/meldung/Comdirect-Bank-macht-Kunden-zu-Billiardaeren-996088.html C. Fürber, M. Hepp: 3 Using SemWeb Resources for DQM
  • 4. The Web of Messy Data? Retrieved from http://dbpedia.org/sparql on July 20th Which one is the correct population? C. Fürber, M. Hepp: 4 Using SemWeb Resources for DQM
  • 5. The Web of Messy Data? Retrieved from http://dbpedia.org/sparql on July 20th Places with negative population?!? C. Fürber, M. Hepp: 5 Using SemWeb Resources for DQM
  • 6. Risk of Failure Measurement Information & Knowledge 101010101 010101010 DATA 101010101 001010101 Automation 001010101 Decisions C. Fürber, M. Hepp: 6 Using SemWeb Resources for DQM
  • 7. Data Quality Problem Types Inconsistent duplicates Invalid characters Missing classification Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ Incorrect reference Approximate duplicates Reference: Linking Open Data cloud diagram, by Character alignment violation Word transpositions Invalid substrings Mistyping / Misspelling errors Cardinality violation Missing values Referential integrity violation Misfielded values Unique value violation False values Functional Dependency Out of range values Violation Imprecise values Existence of Homonyms Meaningless values Incorrect classification Existence of Synonyms Contradictory relationships Outdated conceptual elements Untyped literals Outdated values C. Fürber, M. Hepp: 7 Using SemWeb Resources for DQM
  • 8. Goals • Use Semantic Web data to identify data quality problems on instance level • Support Data Quality Management (DQM) process C. Fürber, M. Hepp: 8 Using SemWeb Resources for DQM
  • 9. Total Data Quality Management for and based on the Semantic Web Develop and Define what‘s apply SPARQL good and / or queries based what‘s poor Define Measure on DQ- data quality Definition DQ Improve Analyze Reference: Richard Wang (1998) C. Fürber, M. Hepp: 9 Using SemWeb Resources for DQM
  • 10. How can the Semantic Web support Data Quality Management?
      Availability of FREE data quality knowledge, e.g. for the identification of…
      • Legal value violations
      • Functional dependency violations
  • 11. Using Trusted References [diagram: DQ-Constraints check local:Location in the tested knowledge base (Las Vegas, France) against tref:Location in the trusted reference (Las Vegas, USA) and flag the pair Las Vegas, France]
  • 12. Basic Architecture
  • 13. Basic Characteristics of SPIN (http://spinrdf.org/)
      • Allows definition of generalized SPARQL query templates
      • Constraint checking based on SPARQL
      • Definition of inferencing rules via SPARQL
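The idea of a generalized query template can be illustrated with a minimal Python sketch. This is only an analogy: SPIN itself stores templates as RDF, whereas the sketch below uses plain string substitution. The names `LEGAL_VALUE_TEMPLATE` and `legal_value_query` are hypothetical, and the query text follows the legal value constraint shown on slide 17.

```python
from string import Template

# Hypothetical template: the four $placeholders play the role of template arguments.
LEGAL_VALUE_TEMPLATE = Template("""\
SELECT ?s
WHERE {
  ?s a $tested_class .
  ?s $tested_property ?value .
  OPTIONAL {
    ?s2 a $trusted_class .
    ?s2 $trusted_property ?value1 .
    FILTER(str(?value1) = str(?value))
  } .
  FILTER(!bound(?value1))
}
""")


def legal_value_query(tested_class, tested_property, trusted_class, trusted_property):
    """Fill the template with concrete classes and properties (prefixed names)."""
    return LEGAL_VALUE_TEMPLATE.substitute(
        tested_class=tested_class,
        tested_property=tested_property,
        trusted_class=trusted_class,
        trusted_property=trusted_property,
    )


query = legal_value_query(
    "vcard:Address", "vcard:country-name", "tref:Location", "tref:country"
)
```

The same template can then be reused for any other class and property pair without rewriting the query body.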
  • 14. Generic Data Quality Constraints Library for Easy DQ-Definition
      • Mandatory properties & literals
      • Legal values*
      • Legal value ranges
      • Functional dependencies*
      • Legal syntaxes
      • Uniqueness
      * Designed to use trusted references
      Available @ http://semwebquality.org/ontologies/dq-constraints#
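The constraint types that need no trusted reference can be sketched in a few lines of Python. This is an illustration of what each check means, not the library's SPIN implementation; the records, property names, and function names are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"id": "p1", "name": "Place A", "population": 1200, "postal_code": "12345"},
    {"id": "p2", "name": "Place B", "population": -5, "postal_code": "1234X"},
    {"id": "p3", "name": None, "population": 300, "postal_code": "54321"},
    {"id": "p1", "name": "Place D", "population": 80, "postal_code": "99999"},
]


def missing_mandatory(rows, prop):
    """Mandatory property: ids of rows where the property is absent or empty."""
    return [r["id"] for r in rows if r.get(prop) in (None, "")]


def out_of_range(rows, prop, low, high):
    """Legal value range: ids of rows whose value lies outside [low, high]."""
    return [r["id"] for r in rows
            if r.get(prop) is not None and not (low <= r[prop] <= high)]


def illegal_syntax(rows, prop, pattern):
    """Legal syntax: ids of rows whose value does not fully match the pattern."""
    regex = re.compile(pattern)
    return [r["id"] for r in rows
            if r.get(prop) is not None and not regex.fullmatch(r[prop])]


def non_unique(rows, prop):
    """Uniqueness: values of the property that occur more than once."""
    counts = Counter(r[prop] for r in rows if r.get(prop) is not None)
    return sorted(value for value, n in counts.items() if n > 1)
```

For example, `out_of_range(records, "population", 0, 10**9)` picks out the record with a negative population, the kind of problem shown on slide 5.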
  • 15. Definition of Data Quality Constraints based on SPIN
  • 16. Constraint checking in Practice
  • 17. Legal Value Constraints
      Return all instances of class vcard:Address that do not have a matching value for property vcard:country-name in property tref:country.
      SELECT ?s
      WHERE {
        ?s a vcard:Address .
        ?s vcard:country-name ?value .
        OPTIONAL {
          ?s2 a tref:Location .
          ?s2 tref:country ?value1 .
          FILTER(str(?value1) = str(?value))
        } .
        FILTER(!bound(?value1))
      }
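The logic of the legal value query can be mirrored in plain Python, as a rough sketch under simplifying assumptions: each tested address has exactly one country name, and the trusted reference is reduced to a set of strings. The sample data and the function name `legal_value_violations` are hypothetical.

```python
# Hypothetical tested data: address id -> country name.
tested_addresses = {
    "addr1": "Germany",
    "addr2": "Portugal",
    "addr3": "Germny",  # misspelled on purpose
}

# Hypothetical trusted reference: the set of legal country names.
trusted_countries = {"Germany", "Portugal", "France"}


def legal_value_violations(tested, trusted_values):
    """Return ids whose value has no string-equal match in the trusted set.

    The comparison on str(...) mirrors FILTER(str(?value1) = str(?value)).
    """
    trusted = {str(v) for v in trusted_values}
    return sorted(s for s, value in tested.items() if str(value) not in trusted)


violations = legal_value_violations(tested_addresses, trusted_countries)
```

As in the query, a value counts as a violation only because the trusted reference lacks it, so the result is only as reliable as the reference itself.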
  • 18. Functional Dependency Constraints
      Return all instances of vcard:ADR with city-country combinations that do not have a matching pair in instances of gn:Location.
      SELECT ?s
      WHERE {
        ?s a gr:LocationOfSalesOrServiceProvisioning .
        ?s vcard:ADR ?node .
        ?node vcard:city ?value1 .
        ?node vcard:country ?value2 .
        FILTER NOT EXISTS {
          ?s2 a gn:Location .
          ?s2 gn:asciiname ?value1 .
          ?s2 gn:country ?value2 .
        }
      }
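The functional dependency check boils down to looking up each tested city-country pair in the set of pairs known to the trusted reference. A minimal Python sketch, assuming both sides use the same spelling for cities and countries; the sample data and the function name `dependency_violations` are hypothetical, with the Las Vegas pair taken from slide 11.

```python
# Hypothetical tested data: location id -> (city, country).
tested_locations = {
    "loc1": ("Las Vegas", "USA"),
    "loc2": ("Las Vegas", "France"),
    "loc3": ("Lisbon", "Portugal"),
}

# Hypothetical trusted reference: the set of known (city, country) pairs.
trusted_pairs = {("Las Vegas", "USA"), ("Lisbon", "Portugal")}


def dependency_violations(tested, trusted):
    """Return ids whose (city, country) pair has no match in the trusted set."""
    return sorted(s for s, pair in tested.items() if pair not in trusted)


violations = dependency_violations(tested_locations, trusted_pairs)
```

Both values are tested together as one pair, so a city and a country that are each legal on their own can still be reported as an invalid combination.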
  • 19. Acquisition of Semantic Web Sources for DQM
      (1) Replication of relevant knowledge bases
      (2) On the fly via federated SPARQL queries:
      PREFIX dbo: <http://dbpedia.org/ontology/>
      SELECT * WHERE {
        ?s1 :location_CITY ?city .
        OPTIONAL {
          SERVICE <http://dbpedia.org/sparql> {
            ?s2 a dbo:City .
            ?s2 rdfs:label ?city .
            FILTER (lang(?city) = "en") .
          }
        }
        FILTER(!bound(?s2))
      }
  • 20. Limitations
      • High degree of uncertainty about the quality of Semantic Web resources
      • Risk of data quality problem proliferation
      • Lack of Semantic Web resources for certain domains
      • Flexible design of RDF and structural heterogeneity complicate the definition of generic DQ constraints
      • Scalability on large data sets
      • DQ constraints close the world
  • 21. Contributions
      • Data quality control for Semantic Web data
      • Identification of potential inconsistencies between Semantic Web resources
      • Reduction of effort for the definition of functional dependency rules and legal value rules
      • Reuse of shared data quality rules on a Web scale
  • 22. Future Work
      • Semantic Web information quality assessment framework (SWIQA) with computation of KPIs
      • Analysis and identification of useful "trusted references" based on SWIQA
      • Application to multi-source master data of information systems
      • Evaluation on large data sets
  • 23. Data Quality Constraints Library for SPIN @ http://semwebquality.org/ontologies/dq-constraints#
      Christian Fürber, Researcher
      E-Business & Web Science Research Group
      Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
      skype: c.fuerber
      email: christian@fuerber.com
      web: http://www.unibw.de/ebusiness
      homepage: http://www.fuerber.com
      twitter: http://www.twitter.com/cfuerber
      Paper available at http://bit.ly/c5v6TM