On Implementing
Probabilistic Relational Models



           2010, October 29th
  Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com)
Content
  ■ Purpose
  ■ Contextualization
    • E/R
    • PRM
    • Link Uncertainty
  ■ A Java implementation
    • UnBBayes-PRM*




*Project page: http://sourceforge.net/projects/unbbayes/
Purpose

Objectives
  ■ What is this presentation for?
    – Overview of PRM and its underlying concepts
    – Overview of extensions of PRM
      • Link uncertainty
    – To present a simple implementation of PRM
      • UnBBayes-PRM
Motivations
  ■ E/R models are heavily used
    – Most commercial databases are based on E/R models
  ■ PRM allows E/R with uncertainty
    – PRM is compatible with optimizations of BN and E/R
  ■ Implementations of PRM are rare
Target
  ■ For whom is this presentation intended?
    – People interested in PRM
      • E.g. database architects willing to incorporate probabilistic reasoning
      • People looking for a BN extension with the expressiveness of relational calculus
    – People looking for a PRM tool
      • E.g. developers looking for a sample implementation
      • Learners willing to practice with PRM
  We assume you have basic knowledge of Bayesian Networks.
Contextualization

What is PRM?
  BN + E/R = PRM
What is E/R?
  ■ E/R = Entity-Relationship
  ■ Abstract conceptual representation of data
    – Often used to model relational databases
      • E.g. Oracle, MySQL, PostgreSQL...
  ■ Entities = “nouns”
    – A set of elements in a domain
  ■ Relationships = “verbs”
    – Capture how 2 or more entities are related
  ■ Attributes = “characteristics”
  Attributes hold the actual data content.
What is E/R?
  ■ Constraints
    – Cardinality
      • 1-1, 1-many, many-1, many-many
    – Primary Key (PK):
      • minimal set of attributes that uniquely identifies each instance
    – Foreign Key (FK):
      • Attributes that refer to other attributes (PK)
        – Used to implement relationships
    – Allowed values
    – Etc.
What is E/R?
  ■ E/R can be represented as a set of tables
    – Entities → tables
    – Attributes → columns
    – Values of attributes → content of a cell
    – 1-1 and 1-many (many-1) relationships → FK
    – Many-many relationships → table + FK
  ■ Problem
    – Classic E/R models do not handle uncertainty
  UnBBayes-PRM sees E/R as a set of tables.
So, what is PRM?
  ■ Probabilistic Relational Models
    – Template for a probability distribution over a database (E/R model)
      • Compact graphical probabilistic model
        – well-defined semantics
      • Natural domain modeling
        – objects, properties, relations...
      • Attributes can depend on attributes of related entities
      • Generalization over a variety of situations
So, what is PRM?
  ■ PRM's learning algorithms
    – Capture relationships in Bayesian learning algorithms
      • There's no need to “flatten” the database
  ■ PRMs are composed of:
    – Relational schema,
    – Relational skeleton,
    – Probability distribution.
  Machine learning is a major concern in PRM.
Schema
  ■ Static part
    – Entities + Relationships + Attributes
    – PK, FK, possible (allowed) values...
  [Diagram: entity Person with attribute BloodType and two relationships from Person to Person, hasFather and hasMother]
  Person
    ID: PK
    Father: FK to Person
    Mother: FK to Person
    BloodType: any of {A, B, AB, O}
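As a rough illustration of the static part, the Person schema can be held in memory as a list of column descriptions. This is a minimal Java sketch, not UnBBayes-PRM's API: `Column`, `PersonSchema` and `respectsAllowedValues` are hypothetical names. It only checks the allowed-values constraint, and it accepts NULL because a skeleton may leave attributes unfilled.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical description of one column of the Person table: it may be the PK,
// an FK to another table (fkTarget != null), and/or restrict its allowed values.
record Column(String name, boolean primaryKey, String fkTarget, Set<String> allowedValues) {}

final class PersonSchema {
    // The Person schema from the slide: ID (PK), Father/Mother (FKs to Person), BloodType.
    static final List<Column> COLUMNS = List.of(
            new Column("ID", true, null, null),
            new Column("Father", false, "Person", null),
            new Column("Mother", false, "Person", null),
            new Column("BloodType", false, null, Set.of("A", "B", "AB", "O")));

    // True if every non-NULL value respects its column's allowed values.
    // A missing key (NULL) is accepted, since skeletons may leave attributes unfilled.
    static boolean respectsAllowedValues(Map<String, String> row) {
        for (Column c : COLUMNS) {
            String value = row.get(c.name());
            if (value != null && c.allowedValues() != null && !c.allowedValues().contains(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(respectsAllowedValues(Map.of("ID", "George", "BloodType", "AB"))); // true
        System.out.println(respectsAllowedValues(Map.of("ID", "George", "BloodType", "C")));  // false
    }
}
```

In a real DBMS this check would be delegated to the CHECK constraint; the sketch just makes the constraint explicit.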
Skeleton
  ■ Dynamic part
    – Instantiation of a schema
    – Actual objects
      • Attributes are filled with some values
  ID          Father      Mother   BloodType
  Augustine   NULL        NULL     O
  Mary        NULL        NULL     A
  George      Augustine   Mary     NULL
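Seen as a table, the skeleton is a set of rows keyed by PK. The hypothetical `PersonSkeleton` class below is a minimal sketch (not UnBBayes-PRM code) showing how an FK value is followed to read an attribute of the referenced object; NULL is modeled as Java `null`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical skeleton for the Person table: one row per object, keyed by PK.
final class PersonSkeleton {
    final Map<String, Map<String, String>> rows = new HashMap<>();

    void add(String id, String father, String mother, String bloodType) {
        Map<String, String> row = new HashMap<>(); // HashMap accepts null (NULL) values
        row.put("Father", father);
        row.put("Mother", mother);
        row.put("BloodType", bloodType);
        rows.put(id, row);
    }

    // Follows the FK column `fkColumn` of object `id` and returns `attribute` of the
    // referenced object, or null if the FK is NULL or points to no known object.
    String viaForeignKey(String id, String fkColumn, String attribute) {
        String target = rows.get(id).get(fkColumn);
        Map<String, String> referenced = target == null ? null : rows.get(target);
        return referenced == null ? null : referenced.get(attribute);
    }

    public static void main(String[] args) {
        PersonSkeleton s = new PersonSkeleton();
        s.add("Augustine", null, null, "O");
        s.add("Mary", null, null, "A");
        s.add("George", "Augustine", "Mary", null);
        System.out.println(s.viaForeignKey("George", "Father", "BloodType")); // O
        System.out.println(s.viaForeignKey("George", "Mother", "BloodType")); // A
    }
}
```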
PRM's structure
  ■ Schema + probabilistic dependencies
  ■ Each attribute has path expressions describing its parents
    – Path expression = slot chain
      • List of FKs
    – If the slot chain contains a 1-many relationship, the number of parents is unknown
  ■ Conditional Probability Distribution (CPD)
    – Conditional Probability Table (CPT)
    – Functions + parameters
  (Slot chain = empty) := no parents | parents reside in the same table
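Resolving a slot chain amounts to following each FK in turn. In this minimal sketch (hypothetical names `SlotChains`, `resolve`, `inverse`; the table is a map from object ID to its column values), following FKs forward yields at most one object per step, while following an FK backwards, through the 1-many side, can yield any number of objects. That is why the number of parents is unknown at schema level. An empty chain returns the object itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical slot-chain helpers over a table represented as
// object ID -> (column name -> value); a missing column means NULL.
final class SlotChains {
    // Follows each FK of the chain forward (many-1): at most one object per step.
    // An empty chain returns the object itself (parents in the same table).
    static List<String> resolve(Map<String, Map<String, String>> table, String id, List<String> chain) {
        List<String> current = List.of(id);
        for (String fk : chain) {
            List<String> next = new ArrayList<>();
            for (String obj : current) {
                Map<String, String> row = table.get(obj);
                String target = row == null ? null : row.get(fk);
                if (target != null) next.add(target);
            }
            current = next;
        }
        return current;
    }

    // Follows an FK backwards (1-many): every object whose `fk` references `id`.
    static List<String> inverse(Map<String, Map<String, String>> table, String id, String fk) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : table.entrySet()) {
            if (id.equals(e.getValue().get(fk))) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Sibling" is a hypothetical second child of Augustine, added for illustration.
        Map<String, Map<String, String>> person = Map.of(
                "Augustine", Map.of(),
                "Mary", Map.of(),
                "George", Map.of("Father", "Augustine", "Mother", "Mary"),
                "Sibling", Map.of("Father", "Augustine"));
        System.out.println(resolve(person, "George", List.of("Father")));           // [Augustine]
        System.out.println(resolve(person, "George", List.of("Father", "Father"))); // []
        System.out.println(inverse(person, "Augustine", "Father").size());          // 2
    }
}
```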
PRM's structure
  Schema: Person (PK, Father = FK1, Mother = FK2, BloodType)
    – Edge from the BloodType of the object referenced by FK1
    – Edge from the BloodType of the object referenced by FK2
  Instantiation: John Doe and Jane Doe are the parents of Me
  CPD of BloodType
    Father     A      A      A      ...
    Mother     A      B      AB     ...
    A          75%    25%    50%    ...
    B          0%     25%    25%    ...
    AB         0%     25%    25%    ...
    O          25%    25%    0%     ...
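The three CPT columns shown on the slide can be stored as a map from a parent configuration to a distribution over {A, B, AB, O}. This minimal sketch (hypothetical class `BloodTypeCpt`) covers only those three columns; the columns elided as "..." are not reproduced.

```java
import java.util.Map;

// P(BloodType | Father, Mother) for the three columns listed on the slide,
// keyed by "Father,Mother". The remaining columns ("...") are omitted.
final class BloodTypeCpt {
    static final Map<String, Map<String, Double>> CPT = Map.of(
            "A,A",  Map.of("A", 0.75, "B", 0.00, "AB", 0.00, "O", 0.25),
            "A,B",  Map.of("A", 0.25, "B", 0.25, "AB", 0.25, "O", 0.25),
            "A,AB", Map.of("A", 0.50, "B", 0.25, "AB", 0.25, "O", 0.00));

    static double probability(String child, String father, String mother) {
        Map<String, Double> column = CPT.get(father + "," + mother);
        if (column == null) {
            throw new IllegalArgumentException("column not listed: " + father + "," + mother);
        }
        return column.get(child);
    }

    public static void main(String[] args) {
        System.out.println(probability("O", "A", "B")); // 0.25
        // Every column of a CPT is a distribution, so it must sum to 1.
        for (Map<String, Double> column : CPT.values()) {
            double sum = column.values().stream().mapToDouble(Double::doubleValue).sum();
            System.out.println(Math.abs(sum - 1.0) < 1e-9); // true
        }
    }
}
```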
CPD with aggregation
  ■ How do we declare the CPD if the number of parents is unknown?
  ■ Approach 1: special-purpose scripts
    – E.g. UnBBayes-MEBN's CPD scripts
      • A set of IF-THEN-ELSE statements
  ■ Approach 2: aggregation
    – E.g. Mode, Max, Min, Average...
      • Equivalent to an intermediate “deterministic” node
  UnBBayes-PRM uses approach 2.
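Aggregation collapses a variable number of parent values into a single value, so the child's CPD only has to be declared over that one aggregated value. The sketch below (hypothetical class `Aggregation`) implements mode, min and max; the ordered domain `low < medium < high` is an illustrative assumption, needed because min and max require an order.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of aggregate functions over a variable number of parent values.
// The ordered domain below is a hypothetical example; min/max need an order.
final class Aggregation {
    static final List<String> DOMAIN = List.of("low", "medium", "high");

    // Most frequent value; ties go to the value that reached the top count first.
    static String mode(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (String v : values) {
            int c = counts.merge(v, 1, Integer::sum);
            if (best == null || c > counts.get(best)) best = v;
        }
        return best;
    }

    static String min(List<String> values) {
        return values.stream().min(Comparator.comparingInt(DOMAIN::indexOf)).orElse(null);
    }

    static String max(List<String> values) {
        return values.stream().max(Comparator.comparingInt(DOMAIN::indexOf)).orElse(null);
    }

    public static void main(String[] args) {
        // Values of the same attribute in a variable number of related objects.
        List<String> parents = List.of("low", "high", "high", "medium");
        System.out.println(mode(parents)); // high
        System.out.println(min(parents));  // low
        System.out.println(max(parents));  // high
    }
}
```

Because each function is deterministic, it behaves like the intermediate "deterministic" node mentioned above.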
Inference
  ■ Instantiation of a BN from the skeleton
  ■ Descriptive attributes become random variables
  ■ Once the BN is generated, further inference is done as in a normal BN (evidence propagation)
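Instantiating the BN can be sketched as: create one node per descriptive attribute of every object (named `object.attribute`), then follow each schema-level dependency through its FK to add edges. This minimal sketch (hypothetical `Grounding`, `Dependency`, `ground`) handles only single-FK slot chains and is not UnBBayes-PRM's actual algorithm; known values such as Augustine's BloodType would then be entered as evidence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of building a ground BN (node -> list of parent nodes) from a skeleton.
final class Grounding {
    // Schema-level dependency: childAttr depends on parentAttr of the object referenced by fk.
    record Dependency(String childAttr, String fk, String parentAttr) {}

    static Map<String, List<String>> ground(Map<String, Map<String, String>> skeleton,
                                            List<String> attributes, List<Dependency> deps) {
        Map<String, List<String>> parents = new LinkedHashMap<>();
        // One random variable per descriptive attribute of every object.
        for (String obj : skeleton.keySet())
            for (String attr : attributes)
                parents.put(obj + "." + attr, new ArrayList<>());
        // One edge per dependency whose FK is filled in and points to a known object.
        for (String obj : skeleton.keySet()) {
            for (Dependency d : deps) {
                String target = skeleton.get(obj).get(d.fk());
                if (target != null && skeleton.containsKey(target))
                    parents.get(obj + "." + d.childAttr()).add(target + "." + d.parentAttr());
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> skeleton = Map.of(
                "Augustine", Map.of(),
                "Mary", Map.of(),
                "George", Map.of("Father", "Augustine", "Mother", "Mary"));
        List<Dependency> deps = List.of(
                new Dependency("BloodType", "Father", "BloodType"),
                new Dependency("BloodType", "Mother", "BloodType"));
        Map<String, List<String>> bn = ground(skeleton, List.of("BloodType"), deps);
        System.out.println(bn.get("George.BloodType")); // [Augustine.BloodType, Mary.BloodType]
    }
}
```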
Does the instantiated BN have cycles?
  ■ Case 1: check at PRM schema level
    – Schema has no cycle → instances have no cycle
  ■ Case 2: schema contains cycles, but the instantiated BN does not
    – E.g. Person.BloodType depends on Person.BloodType (a cycle at schema level), but in the instantiated BN the edges only go from the BloodType of Augustine (Father) and of Mary (Mother) to the BloodType of George Washington
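For Case 2, the check has to run on the instantiated BN itself. A standard way is a depth-first search that marks nodes currently on the stack; reaching such a node again means a cycle. This minimal sketch (hypothetical class `CycleCheck`) takes the BN as a map from each node to its parents and shows the schema-level self-loop being cyclic while the instantiated network is not.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Depth-first cycle detection on a graph given as node -> list of parents.
final class CycleCheck {
    static boolean hasCycle(Map<String, List<String>> parents) {
        Map<String, Integer> state = new HashMap<>(); // 0 = unvisited, 1 = on stack, 2 = done
        for (String node : parents.keySet())
            if (visit(node, parents, state)) return true;
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> parents, Map<String, Integer> state) {
        int s = state.getOrDefault(node, 0);
        if (s == 1) return true;  // node is already on the current path: cycle
        if (s == 2) return false; // node fully explored earlier
        state.put(node, 1);
        for (String p : parents.getOrDefault(node, List.of()))
            if (visit(p, parents, state)) return true;
        state.put(node, 2);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> schemaLevel = Map.of("Person.BloodType", List.of("Person.BloodType"));
        Map<String, List<String>> instance = Map.of(
                "Augustine.BloodType", List.of(),
                "Mary.BloodType", List.of(),
                "George.BloodType", List.of("Augustine.BloodType", "Mary.BloodType"));
        System.out.println(hasCycle(schemaLevel)); // true
        System.out.println(hasCycle(instance));    // false
    }
}
```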
Extension: link uncertainty
  ■ So far we have only discussed distributions over the attributes of the objects in a model
    – Only the values of the attributes were uncertain
  ■ Uncertainty over the relational structure of the domain has not been addressed yet
    – Structure uncertainty
      • Values of FKs are uncertain
        – Slot chains are uncertain
  ■ Reference uncertainty & existence uncertainty
  Note: link uncertainty is not implemented in UnBBayes-PRM.
Reference uncertainty
  ■ Slots' (FK) values become random variables
    – Problem
      • Unknown number of possible values
        – It's difficult to declare the CPD at schema level
    – Solution
      • Create partitions based on “other attributes”
        – Assuming that ordinal attributes have a known number of possible values
Reference uncertainty
  ■ Without partitions: Entity1 (PK, FKToEntity2) → Entity2 (PK, BooleanAttrib)
    – FKToEntity2 links to a single instance of Entity2, based on the current value of its PK
    – Possible values: PKs of Entity2 (unknown number)
  ■ With partitions: Entity1 (PK, FKToEntity2, Selector) → Entity2 (PK, BooleanAttrib)
    – FKToEntity2 links to a set (partition) of instances of Entity2, based on the current value of BooleanAttrib
    – Possible values: 2 (true/false), one per partition
  We can now specify the parents of FKs and their CPD.
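With partitions, the probability that the FK points at a particular Entity2 instance can be computed from the selector alone. This minimal sketch (hypothetical class `ReferenceUncertainty`) assumes, as is common in formulations of reference uncertainty, that once the selector picks a partition the FK points uniformly at one of its members, so P(FKToEntity2 = e) = P(Selector = BooleanAttrib(e)) / |partition of e|. The instances and the selector probability are made-up values.

```java
import java.util.Map;

// Sketch of reference uncertainty with a Boolean selector over partitions of Entity2.
final class ReferenceUncertainty {
    // Probability that FKToEntity2 references `target`, assuming a uniform choice
    // among the members of the partition chosen by the selector.
    static double probabilityOfReference(Map<String, Boolean> booleanAttribOfEntity2,
                                         double pSelectorTrue, String target) {
        boolean partition = booleanAttribOfEntity2.get(target);
        long partitionSize = booleanAttribOfEntity2.values().stream()
                .filter(v -> v == partition)
                .count();
        double pSelector = partition ? pSelectorTrue : 1.0 - pSelectorTrue;
        return pSelector / partitionSize;
    }

    public static void main(String[] args) {
        // Hypothetical Entity2 instances: e1 and e2 in the "true" partition, e3 in "false".
        Map<String, Boolean> entity2 = Map.of("e1", true, "e2", true, "e3", false);
        double pTrue = 0.6; // hypothetical P(Selector = true), e.g. read from the selector's CPD
        double total = 0.0;
        for (String e : entity2.keySet()) {
            double p = probabilityOfReference(entity2, pTrue, e);
            System.out.println(e + ": " + p);
            total += p;
        }
        System.out.println("sum = " + total); // approximately 1
    }
}
```

The CPD is declared over the selector, whose number of values (2) is known at schema level, instead of over the FK itself.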
Reference uncertainty: instantiating the BN
  ■ Edge types:
    – I: within a single object
    – II: between objects
    – III: from FKs of a slot chain
    – IV: from partition attributes to selectors
    – V: from selectors to FKs
  Extracted from Probabilistic Relational Models (Getoor et al., SRL07)
Existence uncertainty
  ■ Creation of a Boolean attribute “Exists” in tables
    – Technically, entities also contain “Exists”
      • But we assume instances (objects) of entities “do exist” if they were instantiated
        – So, this mechanism is mainly for relationships
    – Because “Exists” is not an FK, we can use it as a normal random variable
      • No major changes to BN instantiation
  Objects are related to every possible object, with a probability from 0% to 100%.
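Relating every object to every possible object means that a relationship table gets one candidate tuple, and therefore one Boolean "Exists" variable, per possible pair. This minimal sketch assumes a hypothetical many-many relationship `Knows` between distinct Person objects; `ExistenceUncertainty`, `Candidate` and `existsVariable` are illustrative names, not UnBBayes-PRM code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of existence uncertainty: one "Exists" variable per candidate tuple
// of a hypothetical many-many relationship Knows(from, to) between Person objects.
final class ExistenceUncertainty {
    record Candidate(String from, String to) {
        String existsVariable() { return "Knows(" + from + "," + to + ").Exists"; }
    }

    // Every ordered pair of distinct objects is a candidate tuple.
    static List<Candidate> candidates(List<String> objects) {
        List<Candidate> out = new ArrayList<>();
        for (String a : objects)
            for (String b : objects)
                if (!a.equals(b)) out.add(new Candidate(a, b));
        return out;
    }

    public static void main(String[] args) {
        for (Candidate c : candidates(List.of("Augustine", "Mary", "George")))
            System.out.println(c.existsVariable()); // 6 variables, e.g. Knows(Augustine,Mary).Exists
    }
}
```

Each of these variables then enters the instantiated BN like any other Boolean attribute; note that the number of candidates grows quadratically with the number of objects.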
A Java Implementation

UnBBayes-PRM
  ■ Open-source Java software
    – GUI & inference engine
  ■ Features
    – Edit schema and skeleton as tables
    – Edit probabilistic dependencies as CPTs
    – Edit constraints (PK, FK and allowed values)
    – Generate a BN from the skeleton
    – Save/load projects from file
  ■ Developed as a plug-in for UnBBayes
    – Alpha version (for internal use)
  Project page: http://sourceforge.net/projects/unbbayes/
UnBBayes-PRM
  A plug-in descriptor is the main, and minimal, content of a plug-in.
UnBBayes-PRM - I/O
  /* Table and PK declaration */
  CREATE TABLE "Person" (
      "id"        VARCHAR2(300) not null,
      "Father"    VARCHAR2(300),
      "Mother"    VARCHAR2(300),
      "BloodType" VARCHAR2(300)
  );
  ALTER TABLE "Person" ADD CONSTRAINT PK_Person
      PRIMARY KEY ("id");
  /* Possible values */
  ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
      CHECK ("BloodType" IN ('A', 'B', 'AB', 'O'));
  /* Foreign keys (relationships) */
  ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
      FOREIGN KEY ("Father") REFERENCES "Person" ("id");
  ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
      FOREIGN KEY ("Mother") REFERENCES "Person" ("id");
  PRM is currently stored as a SQL script. This is a temporary solution.
UnBBayes-PRM - I/O
  ■ Dependencies are stored as in-table comments
      COMMENT ON COLUMN "Person"."BloodType" IS 'Person.BloodType()[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 (...) }';
  ■ Basic format:
    – <listOfParents>;{<listOfProbabilities>}
  ■ <listOfParents> := comma-separated list of
    – <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
      • <listOfForeignKeys> represents a slot chain
  This is also a temporary solution.
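The comment format can be read back with a couple of regular expressions. This is a hypothetical sketch (`DependencyComment`, `parents`, `probabilities` are illustrative names, not the actual UnBBayes-PRM parser), written only from the format and example on this slide: it assumes the FK list sits in square brackets, as in the example, and that FKs inside it are separated by commas or whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "<listOfParents>;{<listOfProbabilities>}", where each parent is
// <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>].
final class DependencyComment {
    record Parent(String table, String column, String aggregate, List<String> slotChain) {}

    private static final Pattern PARENT =
            Pattern.compile("(\\w+)\\.(\\w+)\\((\\w*)\\)\\[([^\\]]*)\\]");

    static List<Parent> parents(String comment) {
        String list = comment.substring(0, comment.indexOf(';'));
        Matcher m = PARENT.matcher(list);
        List<Parent> out = new ArrayList<>();
        while (m.find()) {
            List<String> chain = new ArrayList<>();
            for (String fk : m.group(4).trim().split("[\\s,]+"))
                if (!fk.isEmpty()) chain.add(fk); // skip the empty token of an empty chain
            out.add(new Parent(m.group(1), m.group(2), m.group(3), chain));
        }
        return out;
    }

    static double[] probabilities(String comment) {
        String body = comment.substring(comment.indexOf('{') + 1, comment.lastIndexOf('}')).trim();
        return body.isEmpty() ? new double[0]
                : Arrays.stream(body.split("\\s+")).mapToDouble(Double::parseDouble).toArray();
    }

    public static void main(String[] args) {
        // Only the eight numbers visible on the slide; a complete CPT needs one column per parent combination.
        String c = "Person.BloodType()[ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; "
                + "{ 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 }";
        System.out.println(parents(c));
        System.out.println(probabilities(c).length); // 8
    }
}
```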
UnBBayes-PRM: limitations
  ■ No support for link uncertainty
    – But existence uncertainty can be “simulated”
  ■ Only 1 attribute as PK
  ■ Only String types allowed
    – Thus, no sequences are allowed
  ■ No marginalization
    – Cannot delete dependencies
      • We must re-create the attribute or edit the SQL script
UnBBayes-PRM: limitations
  ■ 2 edges (dependencies) to the same attribute are not allowed
    – Even using different slot chains
  ■ Only 3 aggregation functions:
    – mode, min, max
  ■ No machine learning
  ■ No direct access to an actual database (yet)
    – Only by means of a SQL script
Conclusion

UnBBayes-PRM: (possible) future work
  ■ Add extension points for plug-ins
  ■ Integration with DBMS
    – Constraints/rules can be delegated to the DBMS
      • Some of the current limitations could then be fixed automatically
  ■ Implement machine learning and link uncertainty
  ■ Edit E/R models as diagrams
  ■ PRM → MSBN compilation
  DBMS = DataBase Management System
UnBBayes-PRM: (possible) future work
  ■ Implement Dynamic PRM
    – Dynamic BN + E/R
  ■ Integration with PROXIMITY¹
    – RDN - Relational Dependency Network
      • Generalization of BN + E/R + Relational Markov Network
  ¹A Java open-source tool from the University of Massachusetts Amherst
Finally
  ■ PRM looks practical
    – Uncertainty on relational data
      • Immediate applicability in databases
    – Advanced DBMSs can add advanced features
  ■ Machine learning seems to be PRM's major concern
    – It was not addressed by this presentation
Finally
  ■ PRM cannot specify advanced rules and constraints on conditional probabilities
    – Some conditions must be fulfilled “manually”
    – Some may be fulfilled by DBMS features
  ■ UnBBayes-PRM provides an editor and an inference engine for basic PRM
Questions?




Project page: http://sourceforge.net/projects/unbbayes/

Mais conteúdo relacionado

Mais procurados

Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & Reasoning
Sajid Marwat
 

Mais procurados (7)

What is knowledge representation and reasoning ?
What is knowledge representation and reasoning ?What is knowledge representation and reasoning ?
What is knowledge representation and reasoning ?
 
artificial intelligence
artificial intelligenceartificial intelligence
artificial intelligence
 
Knowledge Representation & Reasoning
Knowledge Representation & ReasoningKnowledge Representation & Reasoning
Knowledge Representation & Reasoning
 
A Fuzzy Valid-Time Model for Relational Databases Within the Hibernate Framework
A Fuzzy Valid-Time Model for Relational Databases Within the Hibernate FrameworkA Fuzzy Valid-Time Model for Relational Databases Within the Hibernate Framework
A Fuzzy Valid-Time Model for Relational Databases Within the Hibernate Framework
 
Knowledge representation
Knowledge representationKnowledge representation
Knowledge representation
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
REPRESENTATION OF DECLARATIVE KNOWLEDGE
REPRESENTATION OF DECLARATIVE KNOWLEDGEREPRESENTATION OF DECLARATIVE KNOWLEDGE
REPRESENTATION OF DECLARATIVE KNOWLEDGE
 

Destaque

Geomodelling, resource & reserve estimation using mining software
Geomodelling, resource & reserve estimation using mining softwareGeomodelling, resource & reserve estimation using mining software
Geomodelling, resource & reserve estimation using mining software
Chandra Bose
 

Destaque (6)

UnBBayes Plugin Framework
UnBBayes Plugin FrameworkUnBBayes Plugin Framework
UnBBayes Plugin Framework
 
UnBBayes Overview
UnBBayes OverviewUnBBayes Overview
UnBBayes Overview
 
Spe prms
Spe prmsSpe prms
Spe prms
 
Petroleum resources reserves
Petroleum resources reservesPetroleum resources reserves
Petroleum resources reserves
 
What are my 3P Reserves? Haas Petroleum Engineering Services
What are my 3P Reserves? Haas Petroleum Engineering ServicesWhat are my 3P Reserves? Haas Petroleum Engineering Services
What are my 3P Reserves? Haas Petroleum Engineering Services
 
Geomodelling, resource & reserve estimation using mining software
Geomodelling, resource & reserve estimation using mining softwareGeomodelling, resource & reserve estimation using mining software
Geomodelling, resource & reserve estimation using mining software
 

Mais de Rommel Carvalho

Proposta de Modelo de Classificação de Riscos de Contratos Públicos
Proposta de Modelo de Classificação de Riscos de Contratos PúblicosProposta de Modelo de Classificação de Riscos de Contratos Públicos
Proposta de Modelo de Classificação de Riscos de Contratos Públicos
Rommel Carvalho
 
Categorização de achados em auditorias de TI com modelos supervisionados e nã...
Categorização de achados em auditorias de TI com modelos supervisionados e nã...Categorização de achados em auditorias de TI com modelos supervisionados e nã...
Categorização de achados em auditorias de TI com modelos supervisionados e nã...
Rommel Carvalho
 
Filiação partidária e risco de corrupção de servidores públicos federais
Filiação partidária e risco de corrupção de servidores públicos federaisFiliação partidária e risco de corrupção de servidores públicos federais
Filiação partidária e risco de corrupção de servidores públicos federais
Rommel Carvalho
 
Uso de mineração de dados e textos para cálculo de preços de referência em co...
Uso de mineração de dados e textos para cálculo de preços de referência em co...Uso de mineração de dados e textos para cálculo de preços de referência em co...
Uso de mineração de dados e textos para cálculo de preços de referência em co...
Rommel Carvalho
 
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...
Rommel Carvalho
 
URSW 2013 - UMP-ST plug-in
URSW 2013 - UMP-ST plug-inURSW 2013 - UMP-ST plug-in
URSW 2013 - UMP-ST plug-in
Rommel Carvalho
 
Dados Abertos Governamentais
Dados Abertos GovernamentaisDados Abertos Governamentais
Dados Abertos Governamentais
Rommel Carvalho
 
Modeling a Probabilistic Ontology for Maritime Domain Awareness
Modeling a Probabilistic Ontology for Maritime Domain AwarenessModeling a Probabilistic Ontology for Maritime Domain Awareness
Modeling a Probabilistic Ontology for Maritime Domain Awareness
Rommel Carvalho
 
Probabilistic Ontology: Representation and Modeling Methodology
Probabilistic Ontology: Representation and Modeling MethodologyProbabilistic Ontology: Representation and Modeling Methodology
Probabilistic Ontology: Representation and Modeling Methodology
Rommel Carvalho
 

Mais de Rommel Carvalho (20)

Ouvidoria de Balcão vs Ouvidoria Digital: Desafios na Era Big Data
Ouvidoria de Balcão vs Ouvidoria Digital: Desafios na Era Big DataOuvidoria de Balcão vs Ouvidoria Digital: Desafios na Era Big Data
Ouvidoria de Balcão vs Ouvidoria Digital: Desafios na Era Big Data
 
Como transformar servidores em cientistas de dados e diminuir a distância ent...
Como transformar servidores em cientistas de dados e diminuir a distância ent...Como transformar servidores em cientistas de dados e diminuir a distância ent...
Como transformar servidores em cientistas de dados e diminuir a distância ent...
 
Proposta de Modelo de Classificação de Riscos de Contratos Públicos
Proposta de Modelo de Classificação de Riscos de Contratos PúblicosProposta de Modelo de Classificação de Riscos de Contratos Públicos
Proposta de Modelo de Classificação de Riscos de Contratos Públicos
 
Categorização de achados em auditorias de TI com modelos supervisionados e nã...
Categorização de achados em auditorias de TI com modelos supervisionados e nã...Categorização de achados em auditorias de TI com modelos supervisionados e nã...
Categorização de achados em auditorias de TI com modelos supervisionados e nã...
 
Mapeamento de risco de corrupção na administração pública federal
Mapeamento de risco de corrupção na administração pública federalMapeamento de risco de corrupção na administração pública federal
Mapeamento de risco de corrupção na administração pública federal
 
Ciência de Dados no Combate à Corrupção
Ciência de Dados no Combate à CorrupçãoCiência de Dados no Combate à Corrupção
Ciência de Dados no Combate à Corrupção
 
Aplicação de técnicas de mineração de textos para classificação automática de...
Aplicação de técnicas de mineração de textos para classificação automática de...Aplicação de técnicas de mineração de textos para classificação automática de...
Aplicação de técnicas de mineração de textos para classificação automática de...
 
Filiação partidária e risco de corrupção de servidores públicos federais
Filiação partidária e risco de corrupção de servidores públicos federaisFiliação partidária e risco de corrupção de servidores públicos federais
Filiação partidária e risco de corrupção de servidores públicos federais
 
Uso de mineração de dados e textos para cálculo de preços de referência em co...
Uso de mineração de dados e textos para cálculo de preços de referência em co...Uso de mineração de dados e textos para cálculo de preços de referência em co...
Uso de mineração de dados e textos para cálculo de preços de referência em co...
 
Detecção preventiva de fracionamento de compras
Detecção preventiva de fracionamento de comprasDetecção preventiva de fracionamento de compras
Detecção preventiva de fracionamento de compras
 
Identificação automática de tipos de pedidos mais frequentes da LAI
Identificação automática de tipos de pedidos mais frequentes da LAIIdentificação automática de tipos de pedidos mais frequentes da LAI
Identificação automática de tipos de pedidos mais frequentes da LAI
 
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...
BMAW 2014 - Using Bayesian Networks to Identify and Prevent Split Purchases i...
 
A GUI for MLN
A GUI for MLNA GUI for MLN
A GUI for MLN
 
URSW 2013 - UMP-ST plug-in
URSW 2013 - UMP-ST plug-inURSW 2013 - UMP-ST plug-in
URSW 2013 - UMP-ST plug-in
 
Integração do Portal da Copa @ Comissão CMA do Senado Federal
Integração do Portal da Copa @ Comissão CMA do Senado FederalIntegração do Portal da Copa @ Comissão CMA do Senado Federal
Integração do Portal da Copa @ Comissão CMA do Senado Federal
 
Dados Abertos Governamentais
Dados Abertos GovernamentaisDados Abertos Governamentais
Dados Abertos Governamentais
 
Modeling a Probabilistic Ontology for Maritime Domain Awareness
Modeling a Probabilistic Ontology for Maritime Domain AwarenessModeling a Probabilistic Ontology for Maritime Domain Awareness
Modeling a Probabilistic Ontology for Maritime Domain Awareness
 
Probabilistic Ontology: Representation and Modeling Methodology
Probabilistic Ontology: Representation and Modeling MethodologyProbabilistic Ontology: Representation and Modeling Methodology
Probabilistic Ontology: Representation and Modeling Methodology
 
SWRL-F - A Fuzzy Logic Extension of the Semantic Web Rule Language
SWRL-F - A Fuzzy Logic Extension of the Semantic Web Rule LanguageSWRL-F - A Fuzzy Logic Extension of the Semantic Web Rule Language
SWRL-F - A Fuzzy Logic Extension of the Semantic Web Rule Language
 
Default Logics for Plausible Reasoning with Controversial Axioms
Default Logics for Plausible Reasoning with Controversial AxiomsDefault Logics for Plausible Reasoning with Controversial Axioms
Default Logics for Plausible Reasoning with Controversial Axioms
 

UnBBayes-PRM - On Implementing Probabilistic Relational Models

  • 1. On Implementing Probabilistic Relational Models 2010, October 29th Contact: Shou Matsumoto (cardialfly@[yahoo|gmail].com)
  • 2. Content Purpose Contextualization •E/R •PRM •Link Uncertainty A Java implementation •UnBBayes-PRM* *Project page: http://sourceforge.net/projects/unbbayes/
  • 3. Objectives  What is this presentation for? – Overview of PRM and its underlying concepts Purpose – Overview of extensions of PRM • Link uncertainty – To present a simple implementation of PRM 3 • UnBBayes-PRM
  • 4. Motivations  E/R models are heavily used – Most of commercial databases are based on E/R models Purpose  PRM allows E/R with uncertainty – PRM is compatible with optimizations of BN and E/R  Implementations of PRM are rare 4
  • 5. Target  For whom is this presentation intended? – People interested on PRM • E.g. Database architects willing to incorporate Purpose probabilistic reasoning • People looking for a BN extension with the expressiveness of relational calculus – People looking for a PRM tool • E.g. Developers looking for a sample implementation • Learners willing to exercise PRM 5 We assume you have basic knowledge about Bayesian Networks
  • 6. What is PRM? Contextualization BN + E/R = = PRM PRM 6
  • 7. What is E/R?  E/R = Entity-Relationship Abstract conceptual representation of data Contextualization  – Often used in relational database models • E.g. Oracle, MySQL, PostgreSQL...  Entities = “nouns” – A set of elements in a domain  Relationships = “verbs” – Captures how 2 or more entities are related  Attributes = “characteristics” 7 Attributes holds actual data content.
  • 8. What is E/R?  Constraints Contextualization – Cardinality • 1-1, 1-many, many-1, many-many – Primary Key (PK): • minimal set of uniquely identifying attributes – Foreign Key (FK): • Attributes that refers to other attributes (PK) – This is used to conduct relationships – Allowed values – Etc. 8
  • 9. What is E/R?  E/R can be represented as a set of Tables Contextualization – Entities → tables – Attributes → columns – Values of attributes → content of a cell – 1-1 and 1-many (many-1) relationships → FK – Many-many relationships → table + FK  Problem – Classic E/R models do not handle uncertainty 9 UnBBayes-PRM sees E/R as a set of tables.
  • 10. So, what is PRM?
      ▪ Probabilistic Relational Models
          – A template for a probability distribution over a database (E/R model)
              • Compact graphical probabilistic model with well-defined semantics
              • Natural domain modeling: objects, properties, relations...
              • Attributes can depend on attributes of related entities
              • Generalizes over a variety of situations
  • 11. So, what is PRM?
      ▪ PRM's learning algorithms
          – Bayesian learning algorithms take the relationships into account
              • There is no need to "flatten" the database
      ▪ A PRM is composed of:
          – a relational schema,
          – a relational skeleton,
          – a probability distribution.
      Note: machine learning is a major concern in PRM.
  • 12. Schema
      ▪ Static part
          – Entities + relationships + attributes
          – PK, FK, possible (allowed) values...
      ▪ Example: entity Person, related to itself through hasFather and hasMother
          – ID: PK
          – Father: FK to Person
          – Mother: FK to Person
          – BloodType: any of {A, B, AB, O}
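The Person schema above can be written down as plain data: the PK, the two self-referencing FKs, and the allowed values of BloodType. This is a minimal Java sketch under assumptions. The class and method names are hypothetical and are not UnBBayes-PRM's API. Java `null` stands in for SQL NULL, and FK targets are left to be validated against a skeleton.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory form of the Person schema (not UnBBayes-PRM's actual classes).
class PersonSchema {
    static final String PRIMARY_KEY = "ID";

    // FK column -> referenced table: hasFather and hasMother both point back to Person
    static final Map<String, String> FOREIGN_KEYS = Map.of("Father", "Person", "Mother", "Person");

    // Descriptive attribute -> allowed (possible) values
    static final Map<String, List<String>> DOMAINS = Map.of("BloodType", List.of("A", "B", "AB", "O"));

    /** Checks a single cell against the schema; null plays the role of SQL NULL. */
    static boolean isAllowed(String column, String value) {
        if (column.equals(PRIMARY_KEY)) {
            return value != null;                // a PK can never be NULL
        }
        if (value == null) {
            return true;                         // unknown value, e.g. an unobserved BloodType
        }
        if (FOREIGN_KEYS.containsKey(column)) {
            return true;                         // FK targets can only be checked against a skeleton
        }
        List<String> domain = DOMAINS.get(column);
        return domain != null && domain.contains(value);   // unknown columns are rejected
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("BloodType", "AB"));
        System.out.println(isAllowed("BloodType", "C"));
    }
}
```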
  • 13. Skeleton
      ▪ Dynamic part
          – Instantiation of a schema
          – Actual objects
              • Attributes are filled with values
      ▪ Example:
          – ID: Augustine; Father: NULL; Mother: NULL; BloodType: O
          – ID: Mary; Father: NULL; Mother: NULL; BloodType: A
          – ID: George; Father: Augustine; Mother: Mary; BloodType: NULL
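The skeleton can be sketched the same way. This hypothetical Java class does three things:
- It stores the three objects from the slide, keyed by PK.
- It checks that every non-NULL FK refers to an existing PK.
- It lists the objects whose BloodType is still NULL. Those values are unobserved.

The map-of-maps layout is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PersonSkeleton {
    /** One Person row; null plays the role of SQL NULL (HashMap accepts null values, Map.of does not). */
    static Map<String, String> row(String father, String mother, String bloodType) {
        Map<String, String> r = new HashMap<>();
        r.put("Father", father);
        r.put("Mother", mother);
        r.put("BloodType", bloodType);
        return r;
    }

    /** The skeleton from the slide, keyed by PK (ID). */
    static Map<String, Map<String, String>> skeleton() {
        Map<String, Map<String, String>> s = new LinkedHashMap<>();
        s.put("Augustine", row(null, null, "O"));
        s.put("Mary", row(null, null, "A"));
        s.put("George", row("Augustine", "Mary", null));
        return s;
    }

    /** "ID.FK" for every FK value that matches no PK (referential-integrity violations). */
    static List<String> danglingReferences(Map<String, Map<String, String>> s) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : s.entrySet()) {
            for (String fk : List.of("Father", "Mother")) {
                String target = e.getValue().get(fk);
                if (target != null && !s.containsKey(target)) {
                    bad.add(e.getKey() + "." + fk);
                }
            }
        }
        return bad;
    }

    /** Objects whose BloodType is NULL: their value is unobserved in the skeleton. */
    static List<String> unobserved(Map<String, Map<String, String>> s) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : s.entrySet()) {
            if (e.getValue().get("BloodType") == null) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> s = skeleton();
        System.out.println(danglingReferences(s));
        System.out.println(unobserved(s));
    }
}
```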
  • 14. PRM's structure
      ▪ Schema + probabilistic dependencies
      ▪ Each attribute has path expressions describing its parents
          – Path expression = slot chain
              • A list of FKs
          – If the slot chain contains a 1-many relationship, the number of parents is unknown
      ▪ Conditional Probability Distribution (CPD)
          – Conditional Probability Table (CPT)
          – Functions + parameters
      Note: an empty slot chain means either no parents, or parents that reside in the same table.
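A slot chain can be followed as a sequence of FK lookups. The step notation in this hypothetical Java sketch works as follows:
- A plain step such as `Father` follows an FK forward and reaches at most one object.
- A step prefixed with `~` follows the FK backwards. The `~` notation is invented here.

Following an FK backwards is the 1-many direction, where the number of objects reached, and hence of parents, depends on the skeleton. The extra person `Anne` is made-up data.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SlotChainSketch {
    // Person rows: ID -> (FK column -> referenced ID). "Anne" is a hypothetical extra child.
    static final Map<String, Map<String, String>> PERSON = Map.of(
            "Augustine", Map.of(),
            "Mary", Map.of(),
            "George", Map.of("Father", "Augustine", "Mother", "Mary"),
            "Anne", Map.of("Father", "Augustine", "Mother", "Mary"));

    /**
     * Follows a slot chain (a list of FK steps) from a set of objects.
     * "Father" follows the FK forward: at most one object per step.
     * "~Father" (hypothetical notation) follows it backwards, the 1-many direction,
     * so the number of objects reached depends on the skeleton.
     * An empty chain returns the starting objects themselves.
     */
    static Set<String> follow(Set<String> start, List<String> chain) {
        Set<String> current = new TreeSet<>(start);
        for (String step : chain) {
            boolean inverse = step.startsWith("~");
            String fk = inverse ? step.substring(1) : step;
            Set<String> next = new TreeSet<>();
            for (String id : current) {
                if (inverse) {
                    // every row whose FK points at the current object
                    for (Map.Entry<String, Map<String, String>> row : PERSON.entrySet()) {
                        if (id.equals(row.getValue().get(fk))) {
                            next.add(row.getKey());
                        }
                    }
                } else {
                    String target = PERSON.get(id).get(fk);   // null means FK is NULL
                    if (target != null) {
                        next.add(target);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(follow(Set.of("George"), List.of("Father")));
        System.out.println(follow(Set.of("Augustine"), List.of("~Father")));
    }
}
```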
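  • 15. PRM's structure
      ▪ Schema level: Person with PK, Father (FK1), Mother (FK2) and BloodType
          – Edge from the BloodType of the object referenced by FK1 to BloodType
          – Edge from the BloodType of the object referenced by FK2 to BloodType
      ▪ Instantiation: the Person "Me", with John Doe and Jane Doe as the objects referenced by FK1 and FK2
          – Edge from the BloodType of the object referenced by FK1 (John Doe) to Me's BloodType
          – Edge from the BloodType of the object referenced by FK2 (Jane Doe) to Me's BloodType
      ▪ CPD of BloodType (the same table appears at schema level and in the instantiation):

            Father:          A      A      A     ...
            Mother:          A      B      AB    ...
            BloodType = A    75%    25%    50%   ...
            BloodType = B     0%    25%    25%   ...
            BloodType = AB    0%    25%    25%   ...
            BloodType = O    25%    25%     0%   ...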
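The CPT above can be stored column by column: one distribution over the child's values for each combination of the parents' values. This Java sketch hard-codes only the three columns shown on the slide. The rest are elided there, so they are left out here too. Because a PRM is a template, the same table serves every Person object. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

class BloodTypeCpt {
    // Order of the child's values inside each column (the rows of the slide's table)
    static final List<String> VALUES = List.of("A", "B", "AB", "O");

    // Column key "father,mother" -> P(BloodType | Father.BloodType, Mother.BloodType).
    // Only the three columns visible on the slide are filled in.
    static final Map<String, double[]> CPT = Map.of(
            "A,A", new double[] {0.75, 0.00, 0.00, 0.25},
            "A,B", new double[] {0.25, 0.25, 0.25, 0.25},
            "A,AB", new double[] {0.50, 0.25, 0.25, 0.00});

    /** Looks up P(child's BloodType | father's BloodType, mother's BloodType). */
    static double probability(String child, String father, String mother) {
        double[] column = CPT.get(father + "," + mother);
        if (column == null) {
            throw new IllegalArgumentException("column not given on the slide: " + father + "," + mother);
        }
        return column[VALUES.indexOf(child)];
    }

    /** Every column of a CPT must be a probability distribution over the child's values. */
    static boolean columnsSumToOne() {
        for (double[] column : CPT.values()) {
            double sum = 0.0;
            for (double p : column) {
                sum += p;
            }
            if (Math.abs(sum - 1.0) > 1e-9) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(probability("A", "A", "A"));
        System.out.println(columnsSumToOne());
    }
}
```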
  • 16. CPD with aggregation
      ▪ How do we declare the CPD if the number of parents is unknown?
      ▪ Approach 1: special-purpose scripts
          – E.g. UnBBayes-MEBN's CPD scripts
              • A set of IF-THEN-ELSE statements
      ▪ Approach 2: aggregation
          – E.g. mode, max, min, average...
              • Equivalent to an intermediate "deterministic" node
      Note: UnBBayes-PRM uses approach 2.
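Approach 2 amounts to a deterministic function that collapses any number of parent values into a single value, which then selects a column of an ordinary CPT. This hypothetical Java helper implements mode, min and max, the three functions UnBBayes-PRM supports. Two choices are assumptions of this sketch, not documented behavior:
- Values are ordered by their position in the attribute's list of possible values.
- Mode ties go to the value listed earlier.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class AggregationSketch {
    enum Aggregate { MODE, MIN, MAX }

    /**
     * Collapses the values of any number of parents into one value of the same domain.
     * MIN/MAX order values by their position in `domain` (an assumption of this sketch);
     * MODE breaks ties in favour of the value listed first in `domain`.
     */
    static String aggregate(Aggregate function, List<String> parentValues, List<String> domain) {
        if (parentValues.isEmpty()) {
            throw new IllegalArgumentException("no parent values to aggregate");
        }
        switch (function) {
            case MIN:
                return parentValues.stream().min(Comparator.comparingInt(domain::indexOf)).get();
            case MAX:
                return parentValues.stream().max(Comparator.comparingInt(domain::indexOf)).get();
            default: // MODE
                String best = null;
                int bestCount = -1;
                for (String value : domain) {
                    int count = Collections.frequency(parentValues, value);
                    if (count > bestCount) {
                        best = value;
                        bestCount = count;
                    }
                }
                return best;
        }
    }

    public static void main(String[] args) {
        List<String> levels = List.of("low", "medium", "high");   // hypothetical ordinal domain
        List<String> parents = List.of("high", "low", "high");    // hypothetical parent values
        System.out.println(aggregate(Aggregate.MODE, parents, levels));
        System.out.println(aggregate(Aggregate.MIN, parents, levels));
    }
}
```

The aggregated value plays the role of a single parent. This is why the slide describes aggregation as an intermediate deterministic node.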
  • 17. Inference
      ▪ A BN is instantiated from the skeleton
      ▪ Descriptive attributes become random variables
      ▪ Once the BN is generated, further inference is done as in a normal BN (evidence propagation)
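Instantiation creates one node per (object, descriptive attribute) pair and one edge per FK that the slot chain resolves. This hypothetical Java sketch does that for the Person example, where BloodType depends on BloodType via Father and via Mother. The slides do not describe how UnBBayes-PRM treats NULL FKs, such as objects with no recorded parents. Here such edges are simply omitted, so those nodes would need a prior instead of the two-parent CPT.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroundNetworkSketch {
    /** Skeleton rows: ID -> {Father, Mother}; null plays the role of SQL NULL. */
    static Map<String, String[]> skeleton() {
        Map<String, String[]> rows = new LinkedHashMap<>();
        rows.put("Augustine", new String[] {null, null});
        rows.put("Mary", new String[] {null, null});
        rows.put("George", new String[] {"Augustine", "Mary"});
        return rows;
    }

    /**
     * One random variable "ID.BloodType" per object. The schema-level dependency
     * (BloodType via FK Father, BloodType via FK Mother) becomes one edge per non-NULL FK.
     * Returns node -> list of parent nodes.
     */
    static Map<String, List<String>> instantiate(Map<String, String[]> rows) {
        Map<String, List<String>> parents = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> row : rows.entrySet()) {
            List<String> ps = new ArrayList<>();
            for (String referenced : row.getValue()) {
                if (referenced != null) {
                    ps.add(referenced + ".BloodType");
                }
            }
            parents.put(row.getKey() + ".BloodType", ps);
        }
        return parents;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> node : instantiate(skeleton()).entrySet()) {
            System.out.println(node.getKey() + " <- " + node.getValue());
        }
    }
}
```

The observed values in the skeleton, Augustine's O and Mary's A, would then be entered as evidence before propagation.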
  • 18. Does the instantiated BN have cycles?
      ▪ Case 1: check at the PRM schema level
          – If the schema has no cycle, the instances have no cycle
      ▪ Case 2: the schema contains cycles, but the instantiated BN does not
          – Schema: Person.BloodType depends on Person.BloodType (Father) and on Person.BloodType (Mother)
          – Instances: the BloodType of George Washington depends on the BloodType of Augustine (Father) and of Mary (Mother)
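Case 2 can be checked on the instantiated network itself. Below is a minimal sketch using Kahn's topological-sort algorithm, a standard technique and not necessarily what UnBBayes-PRM does. Graphs are given as node → parents maps, and the class name is hypothetical. The schema-level graph, where Person.BloodType depends on Person.BloodType, is cyclic. The ground network for Augustine, Mary and George is not.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CycleCheckSketch {
    /** Kahn's algorithm on a graph given as node -> parents; true if there is no directed cycle. */
    static boolean isAcyclic(Map<String, List<String>> parents) {
        Map<String, Integer> pendingParents = new HashMap<>();
        Map<String, List<String>> children = new HashMap<>();
        for (Map.Entry<String, List<String>> e : parents.entrySet()) {
            pendingParents.put(e.getKey(), e.getValue().size());
            for (String p : e.getValue()) {
                pendingParents.putIfAbsent(p, 0);   // parent not declared as a node: treat as a root
                children.computeIfAbsent(p, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();   // nodes whose parents have all been removed
        for (Map.Entry<String, Integer> e : pendingParents.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        int removed = 0;
        while (!ready.isEmpty()) {
            String node = ready.poll();
            removed++;
            for (String child : children.getOrDefault(node, List.of())) {
                if (pendingParents.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        return removed == pendingParents.size();    // leftover nodes lie on (or behind) a cycle
    }

    public static void main(String[] args) {
        Map<String, List<String>> schema = Map.of(
                "Person.BloodType", List.of("Person.BloodType", "Person.BloodType"));
        Map<String, List<String>> ground = Map.of(
                "Augustine.BloodType", List.of(),
                "Mary.BloodType", List.of(),
                "George.BloodType", List.of("Augustine.BloodType", "Mary.BloodType"));
        System.out.println(isAcyclic(schema));
        System.out.println(isAcyclic(ground));
    }
}
```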
  • 19. Extension: link uncertainty
      ▪ So far, we have only discussed distributions over the attributes of the objects in a model
          – Only the values of the attributes were uncertain
      ▪ Uncertainty over the relational structure of the domain has not been addressed yet
          – Structure uncertainty
              • The values of FKs are uncertain, so slot chains are uncertain
      ▪ Two approaches: reference uncertainty & existence uncertainty
      Note: link uncertainty is not implemented in UnBBayes-PRM.
  • 20. Reference uncertainty
      ▪ The values of slots (FKs) become random variables
          – Problem
              • Unknown number of possible values
                  – It is difficult to declare the CPD at schema level
          – Solution
              • Create partitions based on "other attributes"
                  – Assuming these attributes have a known number of possible values
  • 21. Reference uncertainty
      ▪ Without partitions: Entity1 (PK, FKToEntity2) refers to Entity2 (PK, BooleanAttrib)
          – Possible values of FKToEntity2: the PKs of Entity2 (unknown number)
          – Links to a single instance of Entity2, based on the current value of PK
      ▪ With partitions: Entity1 (PK, FKToEntity2, Selector) refers to Entity2 (PK, BooleanAttrib)
          – Possible values of Selector: 2 (true/false)
          – Links to a set (partition) of instances of Entity2, based on the current value of BooleanAttrib
      Note: we can now specify the parents and the CPD of FKs.
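Under the partition approach, the probability of each concrete FK value follows from the Selector's distribution. This Java sketch rests on one assumption: once the Selector picks a partition, the referenced object is drawn uniformly from that partition. The class name, the `pSelectorTrue` parameter and the example data are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class ReferenceUncertaintySketch {
    /**
     * P(FKToEntity2 = o) for every Entity2 object o.
     * booleanAttrib: Entity2 PK -> value of BooleanAttrib (this defines the two partitions).
     * pSelectorTrue: P(Selector = true), e.g. taken from the Selector's CPD (hypothetical input).
     * Assumption: after the Selector picks a partition, o is drawn uniformly from it.
     * If a partition is empty, the Selector's mass on it is lost (the result sums to less than 1).
     */
    static Map<String, Double> fkDistribution(Map<String, Boolean> booleanAttrib, double pSelectorTrue) {
        long trueCount = booleanAttrib.values().stream().filter(b -> b).count();
        long falseCount = booleanAttrib.size() - trueCount;
        Map<String, Double> distribution = new TreeMap<>();
        for (Map.Entry<String, Boolean> e : booleanAttrib.entrySet()) {
            boolean inTrue = e.getValue();
            double pPartition = inTrue ? pSelectorTrue : 1.0 - pSelectorTrue;
            long partitionSize = inTrue ? trueCount : falseCount;
            distribution.put(e.getKey(), pPartition / partitionSize);
        }
        return distribution;
    }

    public static void main(String[] args) {
        // Hypothetical Entity2 objects and their BooleanAttrib values
        Map<String, Boolean> entity2 = Map.of("e1", true, "e2", true, "e3", false);
        System.out.println(fkDistribution(entity2, 0.6));
    }
}
```

Only a two-valued CPD for the Selector has to be declared at schema level, however many Entity2 objects the skeleton contains.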
  • 22. Reference uncertainty: instantiating the BN
      ▪ Edge types:
          – I: within a single object
          – II: between objects
          – III: from the FKs of a slot chain
          – IV: from partition attributes to selectors
          – V: from selectors to FKs
      Extracted from Probabilistic Relational Models (Getoor et al., SRL07)
  • 23. Existence uncertainty
      ▪ A Boolean attribute "Exists" is created in tables
          – Technically, entities could also contain "Exists"
              • But we assume that instances (objects) of entities exist if they were instantiated
          – So this mechanism is mainly for relationships
          – Because "Exists" is not an FK, we can use it as a normal random variable
              • No major changes to BN instantiation
      Note: objects are related to every possible object, with a probability between 0% and 100%.
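Existence uncertainty can be sketched as one Boolean `Exists` variable per candidate pair of objects. P(Exists = true) is read from a CPT over attributes of both ends. Everything in this Java sketch is hypothetical: a Student/Course registration relationship with made-up attribute values and probabilities. The expected number of relationship tuples is simply the sum of those probabilities, by linearity of expectation.

```java
import java.util.Map;
import java.util.TreeMap;

class ExistenceUncertaintySketch {
    /**
     * One Exists variable per candidate pair (left object, right object).
     * cpt maps "leftValue,rightValue" -> P(Exists = true | both attribute values).
     */
    static Map<String, Double> existsProbabilities(Map<String, String> left, Map<String, String> right,
                                                   Map<String, Double> cpt) {
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, String> a : left.entrySet()) {
            for (Map.Entry<String, String> b : right.entrySet()) {
                Double p = cpt.get(a.getValue() + "," + b.getValue());
                if (p == null) {
                    throw new IllegalArgumentException("no CPT entry for " + a.getValue() + "," + b.getValue());
                }
                result.put(a.getKey() + "-" + b.getKey(), p);
            }
        }
        return result;
    }

    /** Expected number of existing relationship tuples (linearity of expectation). */
    static double expectedLinks(Map<String, Double> probabilities) {
        double sum = 0.0;
        for (double p : probabilities.values()) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, String> students = Map.of("s1", "high", "s2", "low");   // Intelligence
        Map<String, String> courses = Map.of("c1", "hard", "c2", "easy");   // Difficulty
        Map<String, Double> cpt = Map.of("high,hard", 0.8, "high,easy", 0.5,
                                         "low,hard", 0.2, "low,easy", 0.6);
        Map<String, Double> exists = existsProbabilities(students, courses, cpt);
        System.out.println(exists);
        System.out.println(expectedLinks(exists));
    }
}
```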
  • 24. UnBBayes-PRM (A Java Implementation)
      ▪ Open-source Java software
          – GUI & inference engine
      ▪ Features
          – Edit schema and skeleton as tables
          – Edit probabilistic dependencies as CPTs
          – Edit constraints (PK, FK and allowed values)
          – Generate a BN from the skeleton
          – Save/load projects from file
      ▪ Developed as a plug-in for UnBBayes
          – Alpha version (for internal use)
      Project page: http://sourceforge.net/projects/unbbayes/
  • 25. UnBBayes-PRM
      Note: a plug-in descriptor is the main, minimal content of a plug-in.
  • 26. UnBBayes-PRM
      Note: a plug-in descriptor is the main, minimal content of a plug-in.
  • 27. UnBBayes-PRM
  • 28. UnBBayes-PRM: I/O
      ▪ The PRM is currently stored as a SQL script (a temporary solution):

          /* Table and PK declaration */
          CREATE TABLE "Person" (
              "id"        VARCHAR2(300) NOT NULL,
              "Father"    VARCHAR2(300),
              "Mother"    VARCHAR2(300),
              "BloodType" VARCHAR2(300)
          );
          ALTER TABLE "Person" ADD CONSTRAINT PK_Person PRIMARY KEY ("id");

          /* Possible values */
          ALTER TABLE "Person" ADD CONSTRAINT CK_BloodType
              CHECK ("BloodType" IN ('A', 'B', 'AB', 'O'));

          /* Foreign keys (relationships) */
          ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Father
              FOREIGN KEY ("Father") REFERENCES "Person" ("id");
          ALTER TABLE "Person" ADD CONSTRAINT FK_Person_Mother
              FOREIGN KEY ("Mother") REFERENCES "Person" ("id");
  • 29. UnBBayes-PRM: I/O
      ▪ Dependencies are stored as in-table (column) comments:

          COMMENT ON COLUMN Person.BloodType IS 'Person.BloodType() [ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 0.25 0.25 0.25 0.25 (...) }';

      ▪ Basic format:
          – <listOfParents>;{<listOfProbabilities>}
          – In this example, the first four numbers (0.75 0.0 0.0 0.25) match the Father = A, Mother = A column of the BloodType CPT, and the next four match the Father = A, Mother = B column
      ▪ <listOfParents> := comma-separated list of
          – <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
              • <listOfForeignKeys> represents a slot chain
      Note: this is also a temporary solution.
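The comment format above can be read back with a small parser. This hypothetical Java sketch is not UnBBayes-PRM's own parser. It assumes two things:
- Each parent is written as `Class.Column(aggregate)[slot chain]`, exactly as in the example, with square brackets around the FKs.
- The probabilities are whitespace-separated numbers inside the braces after the `;`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DependencyCommentParser {
    // <parentClass>.<parentColumn>(<aggregateFunction>)[<listOfForeignKeys>]
    private static final Pattern PARENT =
            Pattern.compile("(\\w+)\\.(\\w+)\\s*\\(\\s*(\\w*)\\s*\\)\\s*\\[([^\\]]*)\\]");

    /** One String[] per parent: {class, column, aggregate function ("" if none), slot chain}. */
    static List<String[]> parents(String comment) {
        String parentList = comment.substring(0, comment.indexOf(';'));
        List<String[]> result = new ArrayList<>();
        Matcher m = PARENT.matcher(parentList);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4).trim()});
        }
        return result;
    }

    /** The whitespace-separated numbers between the braces that follow the ';'. */
    static double[] probabilities(String comment) {
        int open = comment.indexOf('{', comment.indexOf(';'));
        String body = comment.substring(open + 1, comment.lastIndexOf('}'));
        return Arrays.stream(body.trim().split("\\s+")).mapToDouble(Double::parseDouble).toArray();
    }

    public static void main(String[] args) {
        // Hypothetical complete comment (the slide's own example is truncated with "(...)")
        String c = "Person.BloodType() [ FK_Person_Father ] , Person.BloodType()[ FK_Person_Mother ] ; { 0.75 0.0 0.0 0.25 }";
        for (String[] p : parents(c)) {
            System.out.println(String.join(" | ", p));
        }
        System.out.println(Arrays.toString(probabilities(c)));
    }
}
```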
  • 30. UnBBayes-PRM: limitations
      ▪ No support for link uncertainty
          – But existence uncertainty can be "simulated"
      ▪ Only 1 attribute as PK
      ▪ Only String types are allowed
          – Thus, no sequences are allowed
      ▪ No marginalization
          – Dependencies cannot be deleted
              • We must re-create the attribute or edit the SQL script
  • 31. UnBBayes-PRM: limitations
      ▪ Two edges (dependencies) to the same attribute are not allowed
          – Even when using different slot chains
      ▪ Only 3 aggregation functions:
          – mode, min, max
      ▪ No machine learning
      ▪ No direct access to an actual database (yet)
          – Only by means of a SQL script
  • 32. UnBBayes-PRM: (possible) future work (Conclusion)
      ▪ Add extension points for plug-ins
      ▪ Integration with a DBMS
          – Constraints/rules can be delegated to the DBMS
              • Some of the limitations may then be fixed automatically
      ▪ Implement machine learning and link uncertainty
      ▪ Edit E/R models as diagrams
      ▪ PRM → MSBN compilation
      Note: DBMS = DataBase Management System
  • 33. UnBBayes-PRM: (possible) future work
      ▪ Implement Dynamic PRM
          – Dynamic BN + E/R
      ▪ Integration with PROXIMITY¹
          – RDN: Relational Dependency Network
              • Generalization of BN + E/R + Relational Markov Network
      ¹A Java open-source tool from the University of Massachusetts Amherst
  • 34. Finally
      ▪ PRM looks practical
          – Uncertainty on relational data
              • Immediate applicability in databases
          – Advanced DBMSs can add advanced features
      ▪ Machine learning seems to be PRM's major concern
          – It was not addressed by this presentation
  • 35. Finally
      ▪ PRM cannot specify advanced rules and constraints on conditional probabilities
          – Some conditions must be fulfilled "manually"
          – Some may be fulfilled by DBMS features
      ▪ UnBBayes-PRM provides an editor and an inference engine for basic PRM