Using OWL in
Closed World Applications

         Evren Sirin, CTO
        Clark & Parsia, LLC
      evren@clarkparsia.com
Who are we?
• Clark & Parsia is a semantic software startup 
  – HQ in Washington, DC & office in Boston
• Provides software development and integration
  services
• Specializing in Semantic Web, web services, and
  advanced AI technologies for federal and
  enterprise customers 
                 http://clarkparsia.com/
                 Twitter: @candp
Some Applications
• Customer and product data
  – Find which customer would be interested in buying a
    certain product
• System and component descriptions
  – Configure components to build a desired system
• Workforce and employee data
  – Locate employees with desired expertise
• Patient history and drug data
  – Detect and prevent potentially harmful drug interactions

Common Theme
• There is data and lots of it!
• Adding semantics to the data helps a lot
  – Sometimes simple taxonomies, but other times,
    complex ontologies
• We have complete knowledge about the domain
• Errors in the data cause problems
  – Failures in applications, errors in decision making,
    potential loss of revenue, security vulnerabilities, etc.


Data Validation
• Fundamental data management problem
  – Verify data integrity and correctness 
  – Enforce validity of updates 
• Relevant in many scenarios
  – Storing data for stand-alone applications
  – Exchanging data in distributed settings
• Solved (to some degree) in RDBMSs
  – Harder to achieve as data semantics increase and/or
    more expressive integrity conditions are required
Disclaimer
• Data validity is not important for every use case
  – Invalid data may be fine for an application
  – Invalidity may even be a requirement
• Focus of this talk is cases where data consistency
  and integrity are crucial




Roadmap for an App
• How to build one of these applications?
  – Represent data as RDF triples
     • First step for accomplishing data integration and analysis
  – Enrich data with more semantics (RDFS, OWL)
     • Infer implicit information from explicit assertions
  – Ensure data validity
     • Detect errors in the data
  – Do something cool with the data
     • Obviously...

Reasoning Example
• Input ontology
      # Schema: every manager is an employee
      Manager subClassOf Employee
      # Instance data: Person0853 is a manager
      Person0853 type Manager
• Output inferences
      # Person0853 is an employee
      Person0853 type Employee
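
The inference step above can be sketched in a few lines of plain Java. This is only an illustration, not how an OWL reasoner such as Pellet works internally: the `SubclassInference` class and its `Triple` record are hypothetical names, and the only rule applied is "if x has type C and C subClassOf D, then x has type D".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SubclassInference {
    // Hypothetical triple representation using the slide's abbreviated names.
    record Triple(String s, String p, String o) {}

    // Returns the input triples plus every "type" triple implied by subClassOf.
    static Set<Triple> infer(List<Triple> input) {
        Set<Triple> all = new HashSet<>(input);
        Deque<Triple> pending = new ArrayDeque<>();
        for (Triple t : input) {
            if (t.p().equals("type")) pending.add(t);
        }
        while (!pending.isEmpty()) {
            Triple typed = pending.poll();
            for (Triple axiom : input) {
                if (axiom.p().equals("subClassOf") && axiom.s().equals(typed.o())) {
                    Triple derived = new Triple(typed.s(), "type", axiom.o());
                    // Only newly derived triples are queued, so cycles terminate.
                    if (all.add(derived)) pending.add(derived);
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<Triple> input = List.of(
            new Triple("Manager", "subClassOf", "Employee"),
            new Triple("Person0853", "type", "Manager"));
        // prints "true": Person0853 type Employee is inferred
        System.out.println(infer(input).contains(
            new Triple("Person0853", "type", "Employee")));
    }
}
```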
Validating RDF Data
• Common misunderstanding
  – RDFS/OWL is to RDF what XML Schema is to XML
  – Describe integrity conditions in RDFS or OWL
     • Typing constraints - RDFS domain/range
     • Participation constraints - OWL someValuesFrom restrictions
     • Uniqueness constraints - OWL cardinality restrictions
  – Use a reasoner to find inconsistencies
• Problem: Open World Assumption


Closed vs. Open World
• Two different views on truth:
   – CWA: Any statement that is not known to be true is false
   – OWA: A statement is false only if it is known to be false
• Used in different contexts
   – Databases use CWA because (typically) they contain 
     complete information
   – Ontologies use OWA because (typically) they don't...
     that is, they contain incomplete information
• Data validation results differ significantly when
  using CWA instead of OWA
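
The two views on truth can be contrasted with a small sketch. It assumes statements are plain strings and ignores inference entirely; `WorldAssumptions` and its `Truth` enum are hypothetical names used only for illustration.

```java
import java.util.Set;

final class WorldAssumptions {
    enum Truth { TRUE, FALSE, UNKNOWN }

    // Closed world: any statement that is not known to be true is false.
    static Truth closedWorld(Set<String> knownTrue, String statement) {
        return knownTrue.contains(statement) ? Truth.TRUE : Truth.FALSE;
    }

    // Open world: a statement is false only if it is known to be false.
    static Truth openWorld(Set<String> knownTrue, Set<String> knownFalse, String statement) {
        if (knownTrue.contains(statement)) return Truth.TRUE;
        if (knownFalse.contains(statement)) return Truth.FALSE;
        return Truth.UNKNOWN;
    }

    public static void main(String[] args) {
        Set<String> knownTrue = Set.of("Person085 supervises Person173");
        Set<String> knownFalse = Set.of();
        String query = "Person085 type Manager";
        System.out.println(closedWorld(knownTrue, query));           // FALSE
        System.out.println(openWorld(knownTrue, knownFalse, query)); // UNKNOWN
    }
}
```

The same query gets a definite "false" under CWA but stays undetermined under OWA, which is why the two assumptions give different validation results.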
Typing Constraint
 • Only managers can supervise employees
 • Input ontology
    o   supervises domain Manager
    o   Person085 supervises Person173


                  OWA                         CWA
 Consistent       true                        false
 Reason           Infer that                  Assume that
                  Person085 type Manager      Person085 type not Manager
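
A closed-world reading of the domain axiom can be sketched as a simple lookup: every subject of `supervises` must already be typed as `Manager` in the data. The class and method names below are hypothetical, and the sketch checks only asserted triples (no inference).

```java
import java.util.ArrayList;
import java.util.List;

final class TypingConstraintCheck {
    record Triple(String s, String p, String o) {}

    // Closed-world reading of "property domain cls": each subject that uses
    // the property must have an explicit "type cls" triple in the data.
    static List<String> domainViolations(List<Triple> data, String property, String cls) {
        List<String> violations = new ArrayList<>();
        for (Triple t : data) {
            if (t.p().equals(property)
                    && !data.contains(new Triple(t.s(), "type", cls))
                    && !violations.contains(t.s())) {
                violations.add(t.s());
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(new Triple("Person085", "supervises", "Person173"));
        // prints "[Person085]": Person085 is not known to be a Manager
        System.out.println(domainViolations(data, "supervises", "Manager"));
    }
}
```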
Participation Constraint
• Each supervisor must supervise at least
  one employee
• Input axioms
  o   Supervisor subClassOf supervises some Employee
  o   Person085 type Supervisor

                      OWA                           CWA
Consistent            true                          false
Reason                Infer that                    Assume that
                      Person085 supervises _:b      Person085 supervises _:b
                      _:b type Employee             does not exist
Uniqueness Constraint
 • Employees can have at most one supervisor
 • Input axioms
    o   supervises InverseFunctional
    o   Person085 supervises Person173
    o   Person632 supervises Person173


                      OWA                           CWA
Consistent            true                          false
Reason                Infer that                    Assume that
                      Person085 sameAs Person632    Person085 sameAs Person632
                                                    does not hold
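
The closed-world reading of the uniqueness constraint can be sketched as a grouping step. The sketch assumes that distinct names denote distinct individuals (no `sameAs` statements are considered); `UniquenessConstraintCheck` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class UniquenessConstraintCheck {
    record Triple(String s, String p, String o) {}

    // Closed-world reading of an inverse functional property: an object that
    // has two or more differently named subjects is reported as a violation.
    static Map<String, Set<String>> violations(List<Triple> data, String property) {
        Map<String, Set<String>> subjectsByObject = new TreeMap<>();
        for (Triple t : data) {
            if (t.p().equals(property)) {
                subjectsByObject.computeIfAbsent(t.o(), k -> new TreeSet<>()).add(t.s());
            }
        }
        // Keep only the objects with more than one subject.
        subjectsByObject.values().removeIf(subjects -> subjects.size() < 2);
        return subjectsByObject;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("Person085", "supervises", "Person173"),
            new Triple("Person632", "supervises", "Person173"));
        // prints "{Person173=[Person085, Person632]}"
        System.out.println(violations(data, "supervises"));
    }
}
```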
Workarounds for CW
• Manually close the world
  – Declare all individuals different from each other
  – Count existing property values and add a max
    cardinality restriction
  – Make all disjointness statements explicit and add
    negated types to individuals
• Drawbacks
  – Can be computationally expensive
  – Likely to be error-prone
Problem Summary
• Definitions in an OWL schema may have two
  purposes
  – Infer new statements
  – Check if existing statements are valid
• Using OWA for validation is undesirable
  – Not always but in many cases
• In a problem domain we may have:
  – Complete knowledge about some parts of the domain
  – Incomplete knowledge about the other parts
Integrity Constraint Solution
• We defined an alternative semantics for OWL
  – Integrity Constraint (IC) semantics use CWA
  – Can be combined with regular inference axioms
• Ontology developer chooses which axioms will
  be interpreted with...
  – OWA - regular OWL axiom, or
  – CWA - integrity constraint
IC Extension
• Syntax specification
  – How do we syntactically say an axiom is an IC and
    not a regular OWL axiom?
• Semantics specification
  – How do we exactly interpret an IC?
• Validation algorithm
  – Given the semantics how do we check for IC
    violations?
IC Syntax
• Similar approach to using owl:imports
• Define a new annotation property in a new
  namespace

         Ont1 owl:imports Ont2
         Ont1 ic:imports IC1

• Backward compatible, requires minimal changes
  in tools
IC Semantics
• OWL semantics based on model theory
  – Similar to First Order Logic
  – Formal, precise, and unambiguous
• IC semantics specification
  – Extends OWL model theory
  – Change a couple of basic definitions, everything else
    follows
• Details published in technical papers
  – We are submitting a W3C member submission soon
Use Case: SKOS
• Simple Knowledge Organization System (SKOS)
• SKOS provides a model for expressing the basic
  structure and content of concept schemes
  – Thesauri, classification schemes, subject heading lists,
    taxonomies, folksonomies, etc.
• SKOS data model specification
  – Informal (Text): http://www.w3.org/TR/skos-reference/
  – Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf


SKOS Example
# skos-reference.ttl: SKOS reference ontology that contains inference rules
skos:broaderTransitive Transitive
skos:broader subPropertyOf skos:broaderTransitive

# skos-constraints.ttl: constraints from SKOS reference expressed as ICs
skos:related propertyDisjointWith skos:broaderTransitive

# skos-invalid.ttl: SKOS data that violates the SKOS data model
[] a owl:Ontology ; owl:imports skos-reference.ttl ;
                     ic:imports skos-constraints.ttl .

A skos:broader B ; skos:related C .
B skos:broader C .
Explanation
VIOLATION: A violates related propertyDisjointWith broaderTransitive
   INFERRED: A related C
      ASSERTED: A related C
   INFERRED: A broaderTransitive C
      ASSERTED: A broader B
      ASSERTED: B broader C
      ASSERTED: broader subPropertyOf broaderTransitive
      ASSERTED: broaderTransitive Transitive
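
The reasoning behind this explanation can be sketched as two steps: compute the transitive closure of the asserted `broader` pairs (the inferred `broaderTransitive` facts), then intersect it with the `related` pairs. This is an illustrative sketch with hypothetical names, not the validator's actual algorithm, and it ignores the symmetry of `skos:related`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SkosDisjointnessCheck {
    record Pair(String from, String to) {}

    // Transitive closure of the asserted broader pairs. Since broader is a
    // sub-property of the transitive broaderTransitive, every pair in the
    // closure is an inferred broaderTransitive fact.
    static Set<Pair> broaderTransitive(Set<Pair> broader) {
        Set<Pair> closure = new HashSet<>(broader);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Pair a : List.copyOf(closure)) {
                for (Pair b : List.copyOf(closure)) {
                    if (a.to().equals(b.from())
                            && closure.add(new Pair(a.from(), b.to()))) {
                        changed = true;
                    }
                }
            }
        }
        return closure;
    }

    // Pairs that are both related and broaderTransitive violate the constraint.
    static Set<Pair> violations(Set<Pair> broader, Set<Pair> related) {
        Set<Pair> result = new HashSet<>(related);
        result.retainAll(broaderTransitive(broader));
        return result;
    }

    public static void main(String[] args) {
        Set<Pair> broader = Set.of(new Pair("A", "B"), new Pair("B", "C"));
        Set<Pair> related = Set.of(new Pair("A", "C"));
        // prints "[Pair[from=A, to=C]]"
        System.out.println(violations(broader, related));
    }
}
```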



Another SKOS Example
# skos-xl.ttl: SKOS-XL ontology with a cardinality restriction
skosxl:Label subClassOf
                 skosxl:literalForm cardinality 1

# skos-data.ttl: SKOS data that violates the SKOS data model
[] a owl:Ontology ; owl:imports skos-xl.ttl .

A skosxl:labelRelation LabelA .
LabelA type skosxl:Label .


            Result: Consistent (restriction read as a regular OWL axiom)
Another SKOS Example
# skos-xl.ttl: SKOS-XL ontology with a cardinality restriction
skosxl:Label subClassOf
                 skosxl:literalForm cardinality 1

# skos-data.ttl: SKOS data that violates the SKOS data model
[] a owl:Ontology ; owl:imports skos-xl.ttl ;
                     ic:imports skos-xl.ttl .

A skosxl:labelRelation LabelA .
LabelA type skosxl:Label .


            Result: IC Violation (restriction read as an integrity constraint)
Linked Data Application
• Large amounts of instance data
• Validate before publishing/consuming LOD
• Instance data + Inference axioms + Constraints
  – Infer new facts using inference axioms with OWA
  – Validate data using constraints with CWA
  – Inference axioms and constraints are both expressed
    in OWL



Validation Algorithm
• An automated translation algorithm
• Automatically maps an OWL IC to ...
  – A SPARQL query, or
  – A RIF rule
• Many different implementation possibilities
• Off-the-shelf tools can be used for IC validation
SPARQL Translation
Supervisor subClassOf supervises some Employee



       # Abbreviated names; prefixes omitted
       SELECT * WHERE {
          ?x type Supervisor .
          FILTER NOT EXISTS {
             ?x supervises ?y .
             ?y type Employee .
          }
       }
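
What the translated query computes can be sketched without a SPARQL engine. The code below is an in-memory stand-in written under the assumption that only asserted triples are considered; `ParticipationConstraintCheck` is a hypothetical name, and a real deployment would run the query against a SPARQL endpoint or reasoner instead.

```java
import java.util.ArrayList;
import java.util.List;

final class ParticipationConstraintCheck {
    record Triple(String s, String p, String o) {}

    // Mirrors the query: select every ?x typed Supervisor for which no ?y
    // exists with "?x supervises ?y" and "?y type Employee".
    static List<String> violations(List<Triple> data) {
        List<String> result = new ArrayList<>();
        for (Triple typed : data) {
            if (!typed.p().equals("type") || !typed.o().equals("Supervisor")) continue;
            String x = typed.s();
            boolean exists = data.stream().anyMatch(t ->
                t.s().equals(x)
                    && t.p().equals("supervises")
                    && data.contains(new Triple(t.o(), "type", "Employee")));
            if (!exists) result.add(x);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(new Triple("Person085", "type", "Supervisor"));
        // prints "[Person085]": no supervised employee is known
        System.out.println(violations(data));
    }
}
```

Each binding returned by the query is one constraint violation; an empty result means the data satisfies the constraint.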
RIF Translation
Supervisor subClassOf supervises some Employee



       Forall ?x ?y (
         invalid() :- And (
            ?x[type -> Supervisor]
            Naf And (
               ?x[supervises -> ?y]
               ?y[type -> Employee] )))
Solution Summary
• Separate ICs from regular OWL axioms
  – No new syntax
  – Import-based mechanism
• Alternative semantics for ICs
  – Extends OWL model theory
  – Provides the meanings of ICs formally
• Validation algorithm
  – Translate ICs to another formalism
  – SPARQL or RIF engines can be used
Performance
• Using ICs can improve performance!
• Expressive OWL reasoning is not easy
• Profiles of OWL defined for tractable reasoning
  – OWL 2 QL, OWL 2 EL, OWL 2 RL
  – Less expressive but more efficient
• Modeling some OWL axioms as ICs may reduce
  the overall expressivity


Prototype
• Pellet IC validator
  –   Translates ICs into SPARQL queries automatically
  –   Executes SPARQL queries with Pellet
  –   Query results show constraint violations
  –   Automatically explain constraint violations
• Free download
  – http://clarkparsia.com/pellet/icv



Code Example
// create an inferencing model using Pellet reasoner
Reasoner r = PelletReasonerFactory.theInstance().create();
InfModel dataModel = ModelFactory.createInfModel(r, ModelFactory.createDefaultModel());

// load the schema and instance data to Pellet
dataModel.read( "file:data.rdf" );
dataModel.read( "file:schema.owl" );

// Create the IC validator and associate it with the dataset
JenaICValidator validator = new JenaICValidator(dataModel);

// Load the constraints into the IC validator
validator.getConstraints().read("file:constraints.owl");

// Get the constraint violations
Iterator<ConstraintViolation> violations =
                                      validator.getViolations();
Next Steps
• W3C Member submission for IC semantics
• Robust IC validator implementation
  – Incremental validation
  – Multi-threaded validation
• Support for IC editing
• Integration with PelletDb
  – Scalable reasoning + validation


References
• Evren Sirin, Michael Smith, Evan Wallace
  Opening, Closing Worlds - On Integrity Constraints
  OWL: Experiences and Directions Workshop
  (OWLED '08), October 2008.
• Evren Sirin, Jiao Tao
  Towards Integrity Constraints in OWL
  OWL: Experiences and Directions Workshop
  (OWLED '09), October 2009.
• Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness
  Integrity Constraints in OWL
  To appear, The 24th AAAI Conference on Artificial
  Intelligence (AAAI '10), July 2010.
Questions

More Related Content

What's hot

Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Lucidworks
 
Jena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaJena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for Java
Aleksander Pohl
 

What's hot (19)

Infrastructure Provisioning in the context of organization
Infrastructure Provisioning in the context of organizationInfrastructure Provisioning in the context of organization
Infrastructure Provisioning in the context of organization
 
Learning sparql 2012 12
Learning sparql 2012 12Learning sparql 2012 12
Learning sparql 2012 12
 
Eclipse JNoSQL: One API to Many NoSQL Databases - BYOL [HOL5998]
Eclipse JNoSQL: One API to Many NoSQL Databases - BYOL [HOL5998]Eclipse JNoSQL: One API to Many NoSQL Databases - BYOL [HOL5998]
Eclipse JNoSQL: One API to Many NoSQL Databases - BYOL [HOL5998]
 
Jakarta EE Meets NoSQL in the Cloud Age [DEV6109]
Jakarta EE Meets NoSQL in the Cloud Age [DEV6109]Jakarta EE Meets NoSQL in the Cloud Age [DEV6109]
Jakarta EE Meets NoSQL in the Cloud Age [DEV6109]
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
Managed Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty ImagesManaged Search: Presented by Jacob Graves, Getty Images
Managed Search: Presented by Jacob Graves, Getty Images
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Neural Architectures for Named Entity Recognition
Neural Architectures for Named Entity RecognitionNeural Architectures for Named Entity Recognition
Neural Architectures for Named Entity Recognition
 
Eventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real WorldEventually Elasticsearch: Eventual Consistency in the Real World
Eventually Elasticsearch: Eventual Consistency in the Real World
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ...
 
Jena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for JavaJena – A Semantic Web Framework for Java
Jena – A Semantic Web Framework for Java
 
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid DynamicsApproaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
Approaching Join Index: Presented by Mikhail Khludnev, Grid Dynamics
 
Semantic Integration with Apache Jena and Stanbol
Semantic Integration with Apache Jena and StanbolSemantic Integration with Apache Jena and Stanbol
Semantic Integration with Apache Jena and Stanbol
 
Epita pres
Epita presEpita pres
Epita pres
 
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
Customizing Ranking Models for Enterprise Search: Presented by Ammar Haris & ...
 
Composable Futures with Akka 2.0
Composable Futures with Akka 2.0Composable Futures with Akka 2.0
Composable Futures with Akka 2.0
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Building a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsBuilding a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patents
 

Similar to Sem tech 2010_integrity_constraints

Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
The Hive
 
Active Record PowerPoint
Active Record PowerPointActive Record PowerPoint
Active Record PowerPoint
Elizabeth Cruz
 
Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)
Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)
Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)
Source Conference
 
SOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXf
SOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXfSOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXf
SOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXf
Chris Gates
 
Hacking Oracle Web Applications With Metasploit
Hacking Oracle Web Applications With MetasploitHacking Oracle Web Applications With Metasploit
Hacking Oracle Web Applications With Metasploit
Chris Gates
 

Similar to Sem tech 2010_integrity_constraints (20)

Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
Agile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics ApplicationsAgile Data: Building Hadoop Analytics Applications
Agile Data: Building Hadoop Analytics Applications
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
 
PHP - Introduction to Advanced SQL
PHP - Introduction to Advanced SQLPHP - Introduction to Advanced SQL
PHP - Introduction to Advanced SQL
 
Active Record PowerPoint
Active Record PowerPointActive Record PowerPoint
Active Record PowerPoint
 
Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)
Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)
Chris Gates - Attacking Oracle Web Applications With Metasploit (and wXf)
 
SOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXf
SOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXfSOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXf
SOURCE Boston --Attacking Oracle Web Applications with Metasploit & wXf
 
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
Building a Lightweight Discovery Interface for China's Patents@NYC Solr/Lucen...
 
Oracle Database Interview Questions -PART 1 | sql, plsql, dbms, scenario base...
Oracle Database Interview Questions -PART 1 | sql, plsql, dbms, scenario base...Oracle Database Interview Questions -PART 1 | sql, plsql, dbms, scenario base...
Oracle Database Interview Questions -PART 1 | sql, plsql, dbms, scenario base...
 
Introduction to Machine Learning for Oracle Database Professionals
Introduction to Machine Learning for Oracle Database ProfessionalsIntroduction to Machine Learning for Oracle Database Professionals
Introduction to Machine Learning for Oracle Database Professionals
 
SRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWSSRV318_Research at PNNL Powered by AWS
SRV318_Research at PNNL Powered by AWS
 
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
Research at PNNL: Powered by AWS - SRV318 - re:Invent 2017
 
Agile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsAgile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics Applications
 
Webinar 5-reasons-object-storage.pptx
Webinar 5-reasons-object-storage.pptxWebinar 5-reasons-object-storage.pptx
Webinar 5-reasons-object-storage.pptx
 
Introduction to SoapUI day 1
Introduction to SoapUI day 1Introduction to SoapUI day 1
Introduction to SoapUI day 1
 
Soap UI - Getting started
Soap UI - Getting startedSoap UI - Getting started
Soap UI - Getting started
 
What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?What Drove Wordnik Non-Relational?
What Drove Wordnik Non-Relational?
 
Hacking oracle using metasploit
Hacking oracle using metasploitHacking oracle using metasploit
Hacking oracle using metasploit
 
Hacking Oracle Web Applications With Metasploit
Hacking Oracle Web Applications With MetasploitHacking Oracle Web Applications With Metasploit
Hacking Oracle Web Applications With Metasploit
 

More from Clark & Parsia LLC

Validating Linked Data with OWL
Validating Linked Data with OWLValidating Linked Data with OWL
Validating Linked Data with OWL
Clark & Parsia LLC
 
PelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise SemanticsPelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise Semantics
Clark & Parsia LLC
 
Automated Planning as a Semantic Technology
Automated Planning as a Semantic TechnologyAutomated Planning as a Semantic Technology
Automated Planning as a Semantic Technology
Clark & Parsia LLC
 

More from Clark & Parsia LLC (10)

Stardog Linked Data Catalog
Stardog Linked Data CatalogStardog Linked Data Catalog
Stardog Linked Data Catalog
 
Stardog 1.1: Easier, Smarter, Faster RDF Database
Stardog 1.1: Easier, Smarter, Faster RDF DatabaseStardog 1.1: Easier, Smarter, Faster RDF Database
Stardog 1.1: Easier, Smarter, Faster RDF Database
 
Stardog talk-dc-march-17
Stardog talk-dc-march-17Stardog talk-dc-march-17
Stardog talk-dc-march-17
 
Validating Linked Data with OWL
Validating Linked Data with OWLValidating Linked Data with OWL
Validating Linked Data with OWL
 
Terp: An OWL-friendly SPARQL
Terp: An OWL-friendly SPARQLTerp: An OWL-friendly SPARQL
Terp: An OWL-friendly SPARQL
 
PelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic TechnologiesPelletServer: REST and Semantic Technologies
PelletServer: REST and Semantic Technologies
 
PelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise SemanticsPelletDb: Scalable Reasoning for Enterprise Semantics
PelletDb: Scalable Reasoning for Enterprise Semantics
 
Automated Planning as a Semantic Technology
Automated Planning as a Semantic TechnologyAutomated Planning as a Semantic Technology
Automated Planning as a Semantic Technology
 
Empire: JPA for RDF & SPARQL
Empire: JPA for RDF & SPARQLEmpire: JPA for RDF & SPARQL
Empire: JPA for RDF & SPARQL
 
SemTech 2010: Pelorus Platform
SemTech 2010: Pelorus PlatformSemTech 2010: Pelorus Platform
SemTech 2010: Pelorus Platform
 

Sem tech 2010_integrity_constraints

  • 1. Using OWL in Closed World Applications Evren Sirin, CTO Clark & Parsia, LLC evren@clarkparsia.com
  • 2. Who are we? • Clark & Parsia is a semantic software startup  – HQ in Washington, DC & office in Boston • Provides software development and integration services • Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers  http://clarkparsia.com/ Twitter: @candp 2
  • 3. Some Applications • Customer and product data – Find which customer would be interested in buying a certain product • System and component descriptions – Configure components to build a desired system • Workforce and employee data – Locate employees with desired expertise • Patient history and drug data – Detect and prevent potentially harmful drug interactions 3
  • 4. Common Theme • There is data and lots of it! • Adding semantics to the data helps a lot – Some times simple taxonomies, but other times, complex ontologies • We have complete knowledge about the domain • Errors in the data cause problems – Failures in applications, errors in decision making, potential loss of revenue, security vulnerabilities, etc. 4
  • 5. Data Validation • Fundamental data management problem – Verify data integrity and correctness  – Enforce validity of updates  • Relevant in many scenarios – Storing data for stand-alone applications – Exchanging data in distributed settings • Solved (to some degree) in RDBMSs – Harder to achieve as data semantics increase and/or more expressive integrity conditions are required 5
  • 6. Disclaimer • Data validity not important for every use case – Invalid data may be fine for an application – Invalidity may even be a requirement • Focus of this talk is cases where data consistency and integrity are crucial 6
  • 7. Roadmap for an App • How to build one of these applications? – Represent data as RDF triples • First step for accomplishing data integration and analysis – Enrich data with more semantics (RDFS, OWL) • Infer implicit information from explicit assertions – Ensure data validity • Detect errors in the data – Do something cool with the data • Obviously... 7
  • 8. Reasoning Example • Input ontology # Every manager is an employee Manager subClassOf Employee # Person0853 is a manager Person0853 type Manager • Output inferences # Person0853 is an employee Person0853 type Employee
  • 9. Reasoning Example • Input ontology # Every manager is an employee Schema Manager subClassOf Employee # Person0853 is a manager Person0853 type Manager • Output inferences # Person0853 is an employee Person0853 type Employee
  • 10. Reasoning Example • Input ontology # Every manager is an employee Schema Manager subClassOf Employee # Person0853 is a manager Person0853 type Manager Instance data • Output inferences # Person0853 is an employee Person0853 type Employee
  • 11. Validating RDF Data • Common misunderstanding – RDFS/OWL is to RDF what XML Schema is to XML – Describe integrity conditions in RDFS or OWL • Typing constraints - RDFS domain/range • Participation constraints - OWL some values restrictions • Uniqueness constraints - OWL cardinality restriction – Use a reasoner to find inconsistencies • Problem: Open World Assumption 9
  • 12. Closed vs. Open World • Two different views on truth: – CWA: Any statement that is not known to be true is false – OWA: A statement is false only if it is known to be false • Used in different contexts – Databases use CWA because (typically) they contain  complete information – Ontologies use OWA because (typically) they don't... that is, they contain incomplete information • Data validation results significantly different when using CWA instead of OWA 10
  • 13. Typing Constraint • Only managers can supervise employees • Input ontology o supervises domain Manager o Person085 supervises Person173 OWA CWA  Consistent true false Infer that Assume that  Reason Person085 type Manager Person085 type not Manager
  • 14. Participation Constraint • Each supervisor must supervise at least one employee • Input axioms o Supervisor subClassOf supervises some Employee o Person085 type Supervisor OWA CWA Consistent true false Infer that Assume that Reason Person085 supervises _:b Person085 supervises _:b _:b type Employee does not exist
  • 15. Uniqueness Constraint • Employees can have at most one supervisor • Input axioms o supervises InverseFunctional o Person085 supervises Person173 o Person632 supervises Person173 OWA CWA Consistent true false Assume that Infer that Reason Person085 sameAs Person632 Person085 sameAs Person632 does not hold
  • 16. Workarounds for CW • Manually close the world – Declare all individuals different from each other – Count existing property values and add a max cardinality restriction – Make all disjointness statements explicit and add negated types to individuals • Drawbacks – Can be computationally expensive – Likely to be error-prone
  • 17. Problem Summary • Definitions in an OWL schema may have two purposes – Infer new statements – Check if existing statements are valid • Using OWA for validation is undesirable – Not always but in many cases • In a problem domain we may have: – Complete knowledge about some parts of the domain – Incomplete knowledge about the other parts
  • 18. Integrity Constraint Solution • We defined an alternative semantics for OWL – Integrity Constraint (IC) semantics use CWA – Can be combined with regular inference axioms • Ontology developer chooses which axioms will be interpreted with... – OWA - regular OWL axiom, or – CWA - integrity constraint
  • 19. IC Extension • Syntax specification – How do we syntactically say an axiom is an IC and not a regular OWL axiom? • Semantics specification – How do we exactly interpret an IC? • Validation algorithm – Given the semantics how do we check for IC violations?
  • 20. IC Syntax • Similar approach to using owl:imports • Define a new annotation property in a new namespace Ont1 owl:imports Ont2 Ont1 ic:imports IC1 • Backward compatible, requires minimum change in tools
  • 21. IC Semantics • OWL semantics based on model theory – Similar to First Order Logic – Formal, precise, and unambiguous • IC semantics specification – Extends OWL model theory – Change couple basic definitions, everything else follows • Details published in technical papers – We are submitting a W3C member submission soon
  • 22. Use Case: SKOS • Simple Knowledge Organization System (SKOS) • SKOS provides a model for expressing the basic structure and content of concept schemes – Thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, etc. • SKOS data model specification – Informal (Text): http://www.w3.org/TR/skos-reference/ – Formal (OWL): http://www.w3.org/2004/02/skos/core.rdf 20
  • 23. SKOS Example
    # skos-reference.ttl: SKOS reference ontology that contains inference rules
    skos:broaderTransitive Transitive
    skos:broader subPropertyOf skos:broaderTransitive
    # skos-constraints.ttl: constraints from the SKOS reference expressed as ICs
    skos:related propertyDisjointWith skos:broaderTransitive
    # skos-invalid.ttl: SKOS data that violates the SKOS data model
    [] a owl:Ontology ; owl:imports skos-reference.ttl ;
                        ic:imports skos-constraints.ttl .
    A skos:broader B ; skos:related C .
    B skos:broader C .
  • 24. Explanation
    VIOLATION: A violates related propertyDisjointWith broaderTransitive
      INFERRED: A related C
        ASSERTED: A related C
      INFERRED: A broaderTransitive C
        ASSERTED: A broader B
        ASSERTED: B broader C
        ASSERTED: broader subPropertyOf broaderTransitive
        ASSERTED: broaderTransitive Transitive
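The reasoning behind this violation can be sketched in plain Java. This is an illustrative assumption, not how Pellet computes it: `SkosDisjointCheck` and its methods are hypothetical names, the data is held in in-memory maps, and the sketch ignores that skos:related is symmetric. It first derives broaderTransitive by following skos:broader edges, then reports every pair that is also asserted as related.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; maps go from a concept to its directly asserted targets.
class SkosDisjointCheck {
    // Everything reachable from start via one or more skos:broader edges.
    static Set<String> broaderTransitive(Map<String, Set<String>> broader, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(broader.getOrDefault(start, Set.of()));
        while (!todo.isEmpty()) {
            String next = todo.pop();
            if (seen.add(next)) {
                todo.addAll(broader.getOrDefault(next, Set.of()));
            }
        }
        return seen;
    }

    // Pairs "x y" for which both x related y and x broaderTransitive y hold.
    static List<String> violations(Map<String, Set<String>> broader,
                                   Map<String, Set<String>> related) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : related.entrySet()) {
            Set<String> ancestors = broaderTransitive(broader, e.getKey());
            for (String target : e.getValue()) {
                if (ancestors.contains(target)) {
                    result.add(e.getKey() + " " + target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The data from the SKOS example: A broader B, B broader C, A related C.
        Map<String, Set<String>> broader = Map.of("A", Set.of("B"), "B", Set.of("C"));
        Map<String, Set<String>> related = Map.of("A", Set.of("C"));
        System.out.println(violations(broader, related));
    }
}
```

The inference step (transitive closure) runs under the usual open-world reading, while the final membership test is the closed-world part: it only looks at what is actually known.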
  • 25. Another SKOS Example
    # skos-xl.ttl: SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1
    # skos-data.ttl: SKOS data that violates the SKOS data model
    [] a owl:Ontology ; owl:imports skos-xl.ttl .
    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .
    Result: Consistent (under OWA the missing literalForm value is assumed to exist but be unknown)
  • 26. Another SKOS Example
    # skos-xl.ttl: SKOS-XL ontology with a cardinality restriction
    skosxl:Label subClassOf skosxl:literalForm cardinality 1
    # skos-data.ttl: SKOS data that violates the SKOS data model
    [] a owl:Ontology ; owl:imports skos-xl.ttl ;
                        ic:imports skos-xl.ttl .
    A skosxl:labelRelation LabelA .
    LabelA type skosxl:Label .
    Result: IC Violation (with the same axiom imported as an IC, the missing literalForm value is an error)
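The closed-world reading of the cardinality restriction can be sketched as a simple count over known values. This is a simplification under stated assumptions: `ExactlyOneCheck` is a hypothetical name, only explicitly listed values are counted, and distinct strings are treated as distinct values. The actual IC semantics are defined model-theoretically and also take inferred facts into account.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "class subClassOf property cardinality 1" read as a constraint.
class ExactlyOneCheck {
    // Returns the class members whose number of known property values is not exactly 1.
    static List<String> violations(Set<String> classMembers,
                                   Map<String, Set<String>> knownValues) {
        List<String> result = new ArrayList<>();
        for (String member : classMembers) {
            int count = knownValues.getOrDefault(member, Set.of()).size();
            if (count != 1) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // LabelA is a skosxl:Label with no skosxl:literalForm value in the data.
        System.out.println(violations(Set.of("LabelA"), Map.of()));
    }
}
```

An open-world reasoner given the same data reports no problem, because a model in which LabelA has some unnamed literal form still satisfies the axiom.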
  • 27. Linked Data Application • Large amounts of instance data • Validate before publishing/consuming LOD • Instance data + Inference axioms + Constraints – Infer new facts using inference axioms with OWA – Validate data using constraints with CWA – Inference axioms and constraints are both expressed in OWL
  • 28. Validation Algorithm • An automated translation algorithm • Automatically maps an OWL IC to ... – A SPARQL query, or – A RIF rule • Many different implementation possibilities • Off-the-shelf tools can be used for IC validation
  • 29. SPARQL Translation
    Supervisor subClassOf supervises some Employee
    # Prefix declarations omitted; each result row is a violation
    SELECT * WHERE {
      ?x a :Supervisor .
      FILTER NOT EXISTS {
        ?x :supervises ?y .
        ?y a :Employee .
      }
    }
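For this one axiom shape, the translation step can be sketched as a string template. `SomeValuesFromToSparql` and `translate` are hypothetical names, not the Pellet ICV API, and the sketch covers only "C subClassOf p some D"; a real translator has to handle every OWL construct. It uses the SPARQL 1.1 `FILTER NOT EXISTS` form and assumes the arguments are already valid prefixed names.

```java
// Hypothetical sketch covering a single axiom pattern.
class SomeValuesFromToSparql {
    // Builds a query selecting members of subClass that have no known
    // property value of type fillerClass; every result row is a violation.
    static String translate(String subClass, String property, String fillerClass) {
        return "SELECT ?x WHERE { "
                + "?x a " + subClass + " . "
                + "FILTER NOT EXISTS { ?x " + property + " ?y . ?y a " + fillerClass + " . } }";
    }

    public static void main(String[] args) {
        System.out.println(translate(":Supervisor", ":supervises", ":Employee"));
    }
}
```

An empty result set means the constraint holds, so any SPARQL engine that supports negation can act as the validator for this pattern.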
  • 30. RIF Translation
    Supervisor subClassOf supervises some Employee
    Forall ?x ?y (
      invalid() :- And (
        ?x[type -> Supervisor]
        Naf And (
          ?x[supervises -> ?y]
          ?y[type -> Employee] )))
  • 31. Solution Summary • Separate ICs from regular OWL axioms – No new syntax – Import-based mechanism • Alternative semantics for ICs – Extends OWL model theory – Provides the meaning of ICs formally • Validation algorithm – Translate ICs to another formalism – SPARQL or RIF engines can be used
  • 32. Performance • Using ICs can improve performance! • Expressive OWL reasoning is not easy • Profiles of OWL defined for tractable reasoning – OWL 2 QL, OWL 2 EL, OWL 2 RL – Less expressive but more efficient • Modeling some OWL axioms as ICs may reduce the overall expressivity
  • 33. Prototype • Pellet IC validator – Translates ICs into SPARQL queries automatically – Executes SPARQL queries with Pellet – Query results show constraint violations – Automatically explain constraint violations • Free download – http://clarkparsia.com/pellet/icv
  • 34. Code Example
    // Create an inferencing model using the Pellet reasoner
    // (r is assumed to come from Pellet's Jena bindings)
    Reasoner r = PelletReasonerFactory.theInstance().create();
    InfModel dataModel = ModelFactory.createInfModel(r, ModelFactory.createDefaultModel());
    // Load the schema and instance data into Pellet
    dataModel.read( "file:data.rdf" );
    dataModel.read( "file:schema.owl" );
    // Create the IC validator and associate it with the dataset
    JenaICValidator validator = new JenaICValidator(dataModel);
    // Load the constraints into the IC validator
    validator.getConstraints().read("file:constraints.owl");
    // Get the constraint violations
    Iterator<ConstraintViolation> violations = validator.getViolations();
  • 35. Next Steps • W3C Member submission for IC semantics • Robust IC validator implementation – Incremental validation – Multi-threaded validation • Support for IC editing • Integration with PelletDb – Scalable reasoning + validation
  • 36. References • Evren Sirin, Michael Smith, Evan Wallace. Opening, Closing Worlds - On Integrity Constraints. OWL: Experiences and Directions Workshop (OWLED '08), October 2008. • Evren Sirin, Jiao Tao. Towards Integrity Constraints in OWL. OWL: Experiences and Directions Workshop (OWLED '09), October 2009. • Jiao Tao, Evren Sirin, Jie Bao, Deborah L. McGuinness. Integrity Constraints in OWL. To appear, The 24th AAAI Conference on Artificial Intelligence (AAAI '10), July 2010.