SlideShare a Scribd company logo
1 of 7
Validation: requirements and
approaches
Dave Reynolds, Epimorphics Ltd
@der42
Validation requirements
based on experiences with data.gov.uk Linked Data
 Most current Linked Data in data.gov.uk is:
 described using a range of vocabularies and documentation
 validated , if at all, by publisher using internal/ad hoc tooling
 Emerging requirement for shared validation approach:
 to enable interoperability
 so publishers know the shape of data required
 publishing tools can e.g. auto-populate forms
 consuming tools know what to expect
 Key requirements:
 declarative – easily inspectable by tools
 declared – can locate the structure definition for a data set
 accessible to mortals
A spread of requirements
 regular data
 statistics, financial, environmental measurements, ...
 irregular data
 organizational structure, strategic plans, ...
 controlled terms
 code lists, regulated entities, geographic regions, ...
Regular data
 use Data Cube vocabulary
 http://www.w3.org/TR/vocab-data-cube/
 meets the requirements:
 declarative specification of structure - Data Structure Definition (DSD)
 declared: all observations link to DataSet link to DSD
 fairly understandable:
:complianceDsd a qb:DataStructureDefinition;
rdfs:label "complianceDsd"@en;
qb:component [qb:dimension :bathingWater],
[qb:dimension :samplingPoint],
[qb:dimension :sampleYear],
[qb:measure :complianceClassification],
[qb:attribute :inYearDetail];
qb:sliceKey :complianceByYearKey,
:complianceBySamplingPointKey .
But how to validate a data cube?
 Specification now defines “well-formed” cubes
 closed world notion of compliance with DSD
 integrity constraints specified by a set of SPARQL queries
 Lessons:
 SPARQL was sufficient to express all the required ICs
 some of the queries are convoluted and non-obvious
 at least one is quadratically slow unless optimizer is magic
 Useful compromise
 SPARQL doesn’t meet requirements of inspectable and
understandable
 but tools and humans can operate at the DSD level
Irregular data
 typically mix-and-match range of vocabularies
 declare usage via void:vocabulary
 target users find OWL impenetrable
 requirement for “vocabulary profiles”
 closed-world constraints on properties (cardinalities, ranges)
 expressivity of closed-world OWL would be sufficient
 but need a presentation layer to simplify authoring and
consumption – OSLC resource shapes?
 discovery mechanism
Controlled terms
 the other 80% of the problem
 common resource shapes the easy part
 interoperability means re-using terms for things in the domain
 sets of controlled terms (URI sets, code lists etc)
 can be very large
 often managed by third parties independent of data publisher
and vocabulary definer
 can be dynamic
 typically handled by some form of registry
 governed, closed-world, lists of approved terms at point in time
 implication
 need ability to validate against external services such as
registries

More Related Content

What's hot

How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?Jian Qin
 
The Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South AfricaThe Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South AfricaCrossref
 
ePADD: Opening the world of email research through NLP -- nlp4arc, 2017
ePADD: Opening the world of email research through NLP -- nlp4arc, 2017ePADD: Opening the world of email research through NLP -- nlp4arc, 2017
ePADD: Opening the world of email research through NLP -- nlp4arc, 2017Josh Schneider
 
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Laurent Alquier
 
Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...Valeria Pesce
 
Data discovery through federated dataset catalogs
Data discovery through federated dataset catalogsData discovery through federated dataset catalogs
Data discovery through federated dataset catalogsValeria Pesce
 
Dataset description using the W3C HCLS standard
Dataset description using the W3C HCLS standardDataset description using the W3C HCLS standard
Dataset description using the W3C HCLS standardmhaendel
 
Karma is a tool! Managing your Data
Karma is a tool! Managing your DataKarma is a tool! Managing your Data
Karma is a tool! Managing your DataVioleta Ilik
 
Integrating with others: Stable VIVO URIs for local authority records; linkin...
Integrating with others: Stable VIVO URIs for local authority records; linkin...Integrating with others: Stable VIVO URIs for local authority records; linkin...
Integrating with others: Stable VIVO URIs for local authority records; linkin...Violeta Ilik
 
New Metadata Developments
New Metadata Developments New Metadata Developments
New Metadata Developments Crossref
 
The Case for Stable VIVO URIs
The Case for Stable VIVO URIsThe Case for Stable VIVO URIs
The Case for Stable VIVO URIsVioleta Ilik
 
ORCID Use Cases from the CDL
ORCID Use Cases from the CDLORCID Use Cases from the CDL
ORCID Use Cases from the CDLLisa Schiff
 
OAI Metadata: Why and How
OAI Metadata: Why and HowOAI Metadata: Why and How
OAI Metadata: Why and HowJenn Riley
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceAlasdair Gray
 
Karma Data Modeling
Karma Data ModelingKarma Data Modeling
Karma Data ModelingVioleta Ilik
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceMerce Crosas
 
10. Participation Reports
10. Participation Reports10. Participation Reports
10. Participation ReportsCrossref
 

What's hot (20)

How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
 
The Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South AfricaThe Reach of Crossref metadata - Crossref LIVE South Africa
The Reach of Crossref metadata - Crossref LIVE South Africa
 
ePADD: Opening the world of email research through NLP -- nlp4arc, 2017
ePADD: Opening the world of email research through NLP -- nlp4arc, 2017ePADD: Opening the world of email research through NLP -- nlp4arc, 2017
ePADD: Opening the world of email research through NLP -- nlp4arc, 2017
 
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
 
Distributed Registries
Distributed RegistriesDistributed Registries
Distributed Registries
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
 
Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...
 
Data discovery through federated dataset catalogs
Data discovery through federated dataset catalogsData discovery through federated dataset catalogs
Data discovery through federated dataset catalogs
 
Dataset description using the W3C HCLS standard
Dataset description using the W3C HCLS standardDataset description using the W3C HCLS standard
Dataset description using the W3C HCLS standard
 
Karma is a tool! Managing your Data
Karma is a tool! Managing your DataKarma is a tool! Managing your Data
Karma is a tool! Managing your Data
 
Integrating with others: Stable VIVO URIs for local authority records; linkin...
Integrating with others: Stable VIVO URIs for local authority records; linkin...Integrating with others: Stable VIVO URIs for local authority records; linkin...
Integrating with others: Stable VIVO URIs for local authority records; linkin...
 
New Metadata Developments
New Metadata Developments New Metadata Developments
New Metadata Developments
 
The Case for Stable VIVO URIs
The Case for Stable VIVO URIsThe Case for Stable VIVO URIs
The Case for Stable VIVO URIs
 
ORCID Use Cases from the CDL
ORCID Use Cases from the CDLORCID Use Cases from the CDL
ORCID Use Cases from the CDL
 
OAI Metadata: Why and How
OAI Metadata: Why and HowOAI Metadata: Why and How
OAI Metadata: Why and How
 
Resource Browser
Resource BrowserResource Browser
Resource Browser
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformance
 
Karma Data Modeling
Karma Data ModelingKarma Data Modeling
Karma Data Modeling
 
The DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with ConfidenceThe DataTags System: Sharing Sensitive Data with Confidence
The DataTags System: Sharing Sensitive Data with Confidence
 
10. Participation Reports
10. Participation Reports10. Participation Reports
10. Participation Reports
 

Viewers also liked

Registry Technical Training
Registry Technical TrainingRegistry Technical Training
Registry Technical TrainingDave Reynolds
 
Ukgovld registry-webinar-v3
Ukgovld registry-webinar-v3Ukgovld registry-webinar-v3
Ukgovld registry-webinar-v3Dave Reynolds
 
Maximizing benefit of Open Data through Linked Data Services
Maximizing benefit of Open Data through Linked Data ServicesMaximizing benefit of Open Data through Linked Data Services
Maximizing benefit of Open Data through Linked Data ServicesDave Reynolds
 
Using linked data for dataset publication
Using linked data for dataset publicationUsing linked data for dataset publication
Using linked data for dataset publicationDave Reynolds
 
Industrialized Linked Data
Industrialized Linked DataIndustrialized Linked Data
Industrialized Linked DataDave Reynolds
 
Ukgovld registry-intro
Ukgovld registry-introUkgovld registry-intro
Ukgovld registry-introDave Reynolds
 
Linked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech LondonLinked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech LondonDave Reynolds
 
Resilient Linked Data
Resilient Linked DataResilient Linked Data
Resilient Linked DataDave Reynolds
 

Viewers also liked (9)

Registry Technical Training
Registry Technical TrainingRegistry Technical Training
Registry Technical Training
 
Ukgovld registry-webinar-v3
Ukgovld registry-webinar-v3Ukgovld registry-webinar-v3
Ukgovld registry-webinar-v3
 
Maximizing benefit of Open Data through Linked Data Services
Maximizing benefit of Open Data through Linked Data ServicesMaximizing benefit of Open Data through Linked Data Services
Maximizing benefit of Open Data through Linked Data Services
 
Using linked data for dataset publication
Using linked data for dataset publicationUsing linked data for dataset publication
Using linked data for dataset publication
 
Industrialized Linked Data
Industrialized Linked DataIndustrialized Linked Data
Industrialized Linked Data
 
Ukgovld registry-intro
Ukgovld registry-introUkgovld registry-intro
Ukgovld registry-intro
 
Registry webinar
Registry webinarRegistry webinar
Registry webinar
 
Linked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech LondonLinked Data Hypercubes - Semtech London
Linked Data Hypercubes - Semtech London
 
Resilient Linked Data
Resilient Linked DataResilient Linked Data
Resilient Linked Data
 

Similar to Validation: Requirements and approaches

Chris Oliver: RDA: Designed for Current and Future Environments
Chris Oliver: RDA: Designed for Current and Future EnvironmentsChris Oliver: RDA: Designed for Current and Future Environments
Chris Oliver: RDA: Designed for Current and Future EnvironmentsALATechSource
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesValeria Pesce
 
Knowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentKnowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentManjulaPatel
 
Linked Open Data in the World of Patents
Linked Open Data in the World of Patents Linked Open Data in the World of Patents
Linked Open Data in the World of Patents Dr. Haxel Consult
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the HaystackAdrian Stevenson
 
Metadata: Digital Humanties
Metadata: Digital HumantiesMetadata: Digital Humanties
Metadata: Digital HumantiesMatthew Miguez
 
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...eMadrid network
 
Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane
 
Database Concepts 101
Database Concepts 101Database Concepts 101
Database Concepts 101Amit Garg
 
Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataTom Plasterer
 
Semantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanPeter Berger
 
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web María Poveda Villalón
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Dr.-Ing. Thomas Hartmann
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesValeria Pesce
 
Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Julie Allinson
 

Similar to Validation: Requirements and approaches (20)

Chris Oliver: RDA: Designed for Current and Future Environments
Chris Oliver: RDA: Designed for Current and Future EnvironmentsChris Oliver: RDA: Designed for Current and Future Environments
Chris Oliver: RDA: Designed for Current and Future Environments
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
 
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria PesceHow to Describe a Dataset. Interoperability Issues, by Valeria Pesce
How to Describe a Dataset. Interoperability Issues, by Valeria Pesce
 
Knowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentKnowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents Environment
 
Linked Open Data in the World of Patents
Linked Open Data in the World of Patents Linked Open Data in the World of Patents
Linked Open Data in the World of Patents
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
A Case for linked Data for Medical Devices in the IVD Market
A Case for linked Data for Medical Devices in the IVD MarketA Case for linked Data for Medical Devices in the IVD Market
A Case for linked Data for Medical Devices in the IVD Market
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the Haystack
 
Metadata: Digital Humanties
Metadata: Digital HumantiesMetadata: Digital Humanties
Metadata: Digital Humanties
 
LeVan, "Search Web Services"
LeVan, "Search Web Services"LeVan, "Search Web Services"
LeVan, "Search Web Services"
 
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Estand...
 
Linked Data
Linked DataLinked Data
Linked Data
 
Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation
 
Database Concepts 101
Database Concepts 101Database Concepts 101
Database Concepts 101
 
Dataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* DataDataset Catalogs as a Foundation for FAIR* Data
Dataset Catalogs as a Foundation for FAIR* Data
 
Semantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David Newman
 
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
Detecting Good Practices and Pitfalls when Publishing Vocabularies on the Web
 
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
Doctoral Examination at the Karlsruhe Institute of Technology (08.07.2016)
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
 
Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29Swap For Dummies Rsp 2007 11 29
Swap For Dummies Rsp 2007 11 29
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Validation: Requirements and approaches

  • 1. Validation: requirements and approaches Dave Reynolds, Epimorphics Ltd @der42
  • 2. Validation requirements based on experiences with data.gov.uk Linked Data  Most current Linked Data in data.gov.uk is:  described using a range of vocabularies and documentation  validated , if at all, by publisher using internal/ad hoc tooling  Emerging requirement for shared validation approach:  to enable interoperability  so publishers know the shape of data required  publishing tools can e.g. auto-populate forms  consuming tools know what to expect  Key requirements:  declarative – easily inspectable by tools  declared – can locate the structure definition for a data set  accessible to mortals
  • 3. A spread of requirements  regular data  statistics, financial, environmental measurements, ...  irregular data  organizational structure, strategic plans, ...  controlled terms  code lists, regulated entities, geographic regions, ...
  • 4. Regular data  use Data Cube vocabulary  http://www.w3.org/TR/vocab-data-cube/  meets the requirements:  declarative specification of structure - Data Structure Definition (DSD)  declared: all observations link to DataSet link to DSD  fairly understandable: :complianceDsd a qb:DataStructureDefinition; rdfs:label "complianceDsd"@en; qb:component [qb:dimension :bathingWater], [qb:dimension :samplingPoint], [qb:dimension :sampleYear], [qb:measure :complianceClassification], [qb:attribute :inYearDetail]; qb:sliceKey :complianceByYearKey, :complianceBySamplingPointKey .
  • 5. But how to validate a data cube?  Specification now defines “well-formed” cubes  closed world notion of compliance with DSD  integrity constraints specified by a set of SPARQL queries  Lessons:  SPARQL was sufficient to express all the required ICs  some of the queries are convoluted and non-obvious  at least one is quadratically slow unless optimizer is magic  Useful compromise  SPARQL doesn’t meet requirements of inspectable and understandable  but tools and humans can operate at the DSD level
  • 6. Irregular data  typically mix-and-match range of vocabularies  declare usage via void:vocabulary  target users find OWL impenetrable  requirement for “vocabulary profiles”  closed-world constraints on properties (cardinalities, ranges)  expressivity of closed-world OWL would be sufficient  but need a presentation layer to simplify authoring and consumption – OSLC resource shapes?  discovery mechanism
  • 7. Controlled terms  the other 80% of the problem  common resource shapes the easy part  interoperability means re-using terms for things in the domain  sets of controlled terms (URI sets, code lists etc)  can be very large  often managed by third parties independent of data publisher and vocabulary definer  can be dynamic  typically handled by some form of registry  governed, closed-world, lists of approved terms at point in time  implication  need ability to validate against external services such as registries