SlideShare a Scribd company logo
1 of 30
Download to read offline
Type inference through
         the analysis of
        Wikipedia links
                              Andrea Giovanni Nuzzolese
                                              nuzzoles@cs.unibo.it
                                              Aldo Gangemi
                                               aldo.gangemi@cnr.it
                                          Valentina Presutti
                                           valentina.presutti@cnr.it
                                           Paolo Ciancarini
                                             ciancarini@cs.unibo.it
  stlab.istc.cnr.it
                      16 April 2012 - Lyon, France - LDOW 2012
Outline

• Motivations
• Materials
• Applied methods
• Results
• Conclusions




     stlab.istc.cnr.it
                            2
Motivations
                                       ✦   Only a subset of the DBpedia
                                           resources is typed with the
     Resources used in wikilinks
             relations:
                                           DBpedia ontology (DBPO)
            15,944,381                 ✦   The typing procedure is top-
                                           down.
                                       ✦   Is the DBPO complete with
                                           respect to the DBpedia
                                           domain?
Resources having a                     ✦   How good and homogeneous
   DBPO type:
    1,518,697
                                           is the granularity of DBPO
                                           types?



 stlab.istc.cnr.it
                                   3
Materials
Wikilink triples with typed
      subject/object:
        16,745,830                                   DBpedia 3.6



                                               Dataset                    # of triples

                Wikilinks                   wikilink triples              107,892,317
                 triples:
               107,892,317
                                  infobox mapping-based “data” triples     9,357,273


                                            rdfs:label triples             7,972,225


                                            rdf:type triples               6,173,940

       DBpedia ontology:
          272 classes            infobox mapping-based “object” triples    4,251,239




          stlab.istc.cnr.it
                                    4
What we did
• Wikilinks of a DBpedia resource convey knowledge that
 can be used for classifying it.
• Classification methods
 ✦   Inductive learning: k-Nearest Neighbor algorithm
 ✦   Abductive classification based on EKPs [1] and homotypes used as
     background knowledge
• The methods were performed on
     Resources having a                    Resources used in wikilinks
        DBPO type:                                 relations:                                                   Sample of
         1,518,697                                15,944,381                                                untyped resources:
                                                                                                                  1,000



                             [1] A. G. Nuzzolese, A. Gangemi,V. Presutti, and P. Ciancarini. Encyclopedic Knowledge Patterns from Wikipedia Links.
                                 In L. Aroyo, N. Noy, and C. Welty, editors, Proceedings of the 10th International Semantic Web Conference (ISWC2011),
                                 pages 520-536. Springer, 2011.
         stlab.istc.cnr.it
                                                              5
Inductive classification
• We designed two inductive classification
  experiments based on the k-NN algorithm
 ✦ on 272 features, i.e., all the classes in the DBPO

 ✦ on 27 features, i.e., the top-level classes in the DBPO

   hierarchy
• For each experiment we built a labeled feature
  space model as training set by using a randomly
  sampled 20% of typed resources
 ✦ the algorithms were tested on the remaining 80% of typed

   resources


       stlab.istc.cnr.it
                                5
                                6
Building the training set for K-Nearest
                  Neighbor algorithm


dbpo:wikiPageWikiLink         dbpedia:Apple_Inc.                             dbpedia:NeXT




                                                   dbpedia:Steve_Jobs




                                dbpedia:Forbes                         dbpedia:Cupertino,_California


                                       Mammal    Scientist   Company      Drug      City    Magazine   Class
              dbpedia:Steve_Jobs
                                                                                                        ...
                        stlab.istc.cnr.it
                                                             7
Building the training set for K-Nearest
                  Neighbor algorithm
                                                   dbpo:Organisation

dbpo:wikiPageWikiLink         dbpedia:Apple_Inc.                          dbpedia:NeXT
      rdf:type




                                                   dbpedia:Steve_Jobs

            dbpo:Magazine                                                                 dbpo:City



                                dbpedia:Forbes         dbpo:Persondbpedia:Cupertino,_California


                                       Mammal    Scientist   Company    Drug    City     Magazine      Class
                 dbpedia:Steve_Jobs                                                                 dbpo:Person
                                                                                                        ...
                        stlab.istc.cnr.it
                                                             7
Building the training set for K-Nearest
                  Neighbor algorithm
                                                  dbpo:Organisation

dbpo:wikiPageWikiLink           dbpo:Magazine                                 dbpo:City
      rdf:type

     kp:linksTo


                                                  dbpedia:Steve_Jobs




                                       Mammal   Scientist   Company    Drug       City    Magazine      Class
                  dbpedia:Steve_Jobs        0      0           1        0          1         1       dbpo:Person
                                                                                                         ...
                        stlab.istc.cnr.it
                                                            7
Building the training set for K-Nearest
                  Neighbor algorithm
                                                    dbpo:Organisation

dbpo:wikiPageWikiLink           dbpo:Magazine                                   dbpo:City
      rdf:type

     kp:linksTo


                                                    dbpedia:Steve_Jobs




                                       Mammal     Scientist   Company    Drug       City    Magazine      Class
                  dbpedia:Steve_Jobs        0        0           1        0          1         1       dbpo:Person
                          ...               ...       ...         ...     ...        ...      ...          ...
                        stlab.istc.cnr.it
                                                              7
Building the training set for K-Nearest
                  Neighbor algorithm
                                                 dbpo:Organisation

dbpo:wikiPageWikiLink            dbpo:Magazine                        dbpo:City
      rdf:type

     kp:linksTo


                                                 dbpedia:Steve_Jobs




                  ✦     Precision using all DBPO types as features: 31.65%
                  ✦     Precision using the top-level of DBPO as features: 40.27%
                         stlab.istc.cnr.it
                                                         7
Abductive classification with
           EKPs

• EKPs
 ✦   A EKP of a certain entity
     type is a small vocabulary
     that captures the core
     types used for describing
     such entity type as it
     emerges from the
     Wikipedia crowds


                      visit aemoo.org for an exploratory tool based on EKPs
        stlab.istc.cnr.it
                                         8
How can we infer
  the type of
“Galileo Galilei”?




                                http://www.aemoo.org
        stlab.istc.cnr.it
                            9
How can we infer                We know its path types
  the type of
“Galileo Galilei”?




                                         http://www.aemoo.org
        stlab.istc.cnr.it
                            9
We compare the path types involving
We have 231 EKPs            “Galileo Galilei” as subject with EKPs in
                            order to identify the most similar, which
                                     is the "Scientist" EKP.




                                                      http://www.aemoo.org
        stlab.istc.cnr.it
                                   9
The inferred type for
    the resource
  “Galileo Galiei” is
 the class “Scientist”




                                  http://www.aemoo.org
          stlab.istc.cnr.it
                              9
Distinctive weakness of some
            EKPs

                                            ✦ The distinctive weakness
                                              seems due to wide
                                              overlaps among some
                                              EKPs
                                            ✦ Systematic ambiguity of

                                              the 4 largest classes


   ✦   Precision and recall on all DBPO types both 44.4%
   ✦   Precision and recall on the top-level of DBPO hierarchy: 36.5% and 79.5%
       stlab.istc.cnr.it
                                    10
Homotype-based abductive
     classification
• Homotypes are wikilinks that have the same
 type on both the subject and the object of the
 triple
  dbpo:Philosopher dbpedia:Immanuel_Kant                    dbpedia:Plato              dbpo:Philosopher
                rdf:type            dbpo:wikiPageWikiLink                   rdf:type




• We have observed how the homotype is usually
 the most frequent (or in the top 3) wikilink type
• Given an untyped entity, we hypothesize that
 the most frequent type involved in its ingoing/
 outgoing wikilinks detects its homotype, hence
 it indicates its type 11
      stlab.istc.cnr.it
Homotype-based abductive
     classification
                           s




  stlab.istc.cnr.it
                      12
Homotype-based abductive
     classification
                           s




  stlab.istc.cnr.it
                      12
Results on classifying already
      typed resources




   stlab.istc.cnr.it
                       13
Results on untyped resources
 • Results on a sample of 1,000 untyped resources
  are much less satisfactory

       With EKPs


                                  With Homotypes




     stlab.istc.cnr.it
                         14
Why? [1]

• Typed entities: 2:3 typed wikilinks ratio

• Untyped entities: 1:3 typed wikilinks ratio

• Link structure for untyped entities is not rich
 enough




     stlab.istc.cnr.it
                            15
Why? [2]

• DBPO does not provide a complete set of
  classes for correctly typing DBpedia resources

dbpedia:List_of_FIFA_World_Cup_finals      Collection
   dbpedia:Computer_Science      ScientificDiscipline
          dbpedia:Counterattack    Plan
        dbpedia:Eros(concept)    Concept
  dbpedia:Gentlemen’s_agreement      Agreement


      stlab.istc.cnr.it
                             16
Conclusions
• We have investigated different approaches for
 typing DBpedia resources based on the data set
 of wikilinks

• Results are acceptable in the test set, but
 extensive untypedness in output links, and poor
 DBPO coverage severely compromise automatic
 typing for untyped resources

• We have analyzed possible causes deriving from
 some bias in DBpedia
                              17
     stlab.istc.cnr.it
Future work
• Yago could be helpful but
 ✦ there is a lack of mapping between YAGO and DBPO
 ✦ it has larger coverage and only an overlap with DBPO

 ✦ the granularity of its categories is finer, and not easily

   reusable, because the top level is very large




      stlab.istc.cnr.it
                               18
Thank you
           Andrea Nuzzolese
                    -
           STLab, ISTC-CNR
                    &
Dipartimento di Scienze dell’Informazione
         University of Bologna
                  Italy


 stlab.istc.cnr.it
                        19
Path popularity
             nRes(dbpo:MusicalArtist)
 dbpo:MusicalArtist

           Anthony_Kiedis         Michael_Jackson
rdf:type
                                                                      Jackie_Jackson
                            rdf:type
                                        rdf:type
                                                                        rdf:type
              Chad_Smith
                            dbpo:MusicalArtist
                                                                   dbpo:MusicalArtist


Number of resources having type dbpo:MusicalArtist
                                                                            rdf:type



                                                                      John_Lennon
                                       Paul_McCartney
                                                        rdf:type
                                                                    dbpo:MusicalArtist
 PREFIX dbpo : http://dbpedia.org/ontology/

     stlab.istc.cnr.it
Path popularity
                       nSubjectRes(Pi,j)
                                                 Jackson_5
           Dave_Grohl               Michael_Jackson

                                                                      Jackie_Jackson
                              Nirvana


                               Si             dbpo:wikiPageWikiLink            Oj
                        dbpo:MusicalArtist                                  dbpo:Band

   Foo Fighters                                   Beatles
    Number of distinct resources
that participate in a path as subjects
                                                                      John_Lennon
                                     Paul_McCartney


   PREFIX dbpo : http://dbpedia.org/ontology/

         stlab.istc.cnr.it
Path popularity
               pathPopularity(Pi,j,Si)
                                                Jackson_5
        Dave_Grohl                 Michael_Jackson

                                                                   Jackie_Jackson
                              Nirvana

                                    Madonna
                                                      Prince
                              Charlie_Parker                   Keith_Jarrett

Foo Fighters                                     Beatles
   nSubjectRes(Pi,j)/nRes(Si)

                                                                   John_Lennon
                                     Paul_McCartney
          stlab.istc.cnr.it

More Related Content

Similar to Type inference through the analysis of Wikipedia links

Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseHilmar Lapp
 
Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Dag Endresen
 
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...Andrea Nuzzolese
 
Triplifier talk
Triplifier talkTriplifier talk
Triplifier talkJohn Deck
 
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Fabio Benedetti
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Databricks
 
NRNB Annual Report 2017
NRNB Annual Report 2017NRNB Annual Report 2017
NRNB Annual Report 2017Alexander Pico
 
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Agepetermurrayrust
 
Life Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataLife Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataMaori Ito
 
Semantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaSemantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaFabrizio Orlandi
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainChristophe Debruyne
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018Alexander Pico
 
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...Syed Ahmad Chan Bukhari, PhD
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)Hajime Sasaki
 
Deep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationDeep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationHPCC Systems
 
UMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimensionUMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimensionmatthewturk
 

Similar to Type inference through the analysis of Wikipedia links (20)

Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
 
Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...Knowledge Organization System (KOS) for biodiversity information resources, G...
Knowledge Organization System (KOS) for biodiversity information resources, G...
 
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...
Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and U...
 
Triplifier talk
Triplifier talkTriplifier talk
Triplifier talk
 
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...Context Semantic Analysis: a knowledge-based technique for computing inter-do...
Context Semantic Analysis: a knowledge-based technique for computing inter-do...
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
 
NRNB Annual Report 2017
NRNB Annual Report 2017NRNB Annual Report 2017
NRNB Annual Report 2017
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
4V - WP3 Progress Report (TIN2013-46238)
4V - WP3 Progress Report (TIN2013-46238)4V - WP3 Progress Report (TIN2013-46238)
4V - WP3 Progress Report (TIN2013-46238)
 
Life Science Database Cross Search and Metadata
Life Science Database Cross Search and MetadataLife Science Database Cross Search and Metadata
Life Science Database Cross Search and Metadata
 
Semantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in WikipediaSemantic Representation of Provenance in Wikipedia
Semantic Representation of Provenance in Wikipedia
 
Neo4j and bioinformatics
Neo4j and bioinformaticsNeo4j and bioinformatics
Neo4j and bioinformatics
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
 
NRNB Annual Report 2018
NRNB Annual Report 2018NRNB Annual Report 2018
NRNB Annual Report 2018
 
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
BioNLP-SADI: A Suite of interoperable BioNLP Semantic Web Services based on S...
 
論文サーベイ(Sasaki)
論文サーベイ(Sasaki)論文サーベイ(Sasaki)
論文サーベイ(Sasaki)
 
Deep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationDeep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text Classification
 
UMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimensionUMich CI Days: Scaling a code in the human dimension
UMich CI Days: Scaling a code in the human dimension
 

More from Andrea Nuzzolese

Aemoo: Linked Data Exploration based on Knowledge Patterns
Aemoo: Linked Data Exploration based on Knowledge PatternsAemoo: Linked Data Exploration based on Knowledge Patterns
Aemoo: Linked Data Exploration based on Knowledge PatternsAndrea Nuzzolese
 
Conference Linked Data: the ScholarlyData project
Conference Linked Data: the ScholarlyData projectConference Linked Data: the ScholarlyData project
Conference Linked Data: the ScholarlyData projectAndrea Nuzzolese
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DLAndrea Nuzzolese
 
Evaluating citation functions in CiTO: cognitive issues
Evaluating citation functions in CiTO: cognitive issuesEvaluating citation functions in CiTO: cognitive issues
Evaluating citation functions in CiTO: cognitive issuesAndrea Nuzzolese
 
Knowledge Patterns for the Web: extraction, transformation, and reuse
Knowledge Patterns for the Web: extraction, transformation, and reuseKnowledge Patterns for the Web: extraction, transformation, and reuse
Knowledge Patterns for the Web: extraction, transformation, and reuseAndrea Nuzzolese
 
Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsAndrea Nuzzolese
 
Knowledge Representation and Reasoning with Apache Stanbol
Knowledge Representation and Reasoning with Apache StanbolKnowledge Representation and Reasoning with Apache Stanbol
Knowledge Representation and Reasoning with Apache StanbolAndrea Nuzzolese
 
Gathering Lexical Linked Data and Knowledge Patterns from FrameNet
Gathering Lexical Linked Data and Knowledge Patterns from FrameNetGathering Lexical Linked Data and Knowledge Patterns from FrameNet
Gathering Lexical Linked Data and Knowledge Patterns from FrameNetAndrea Nuzzolese
 
Aemoo: exploratory search based on knowledge patterns over the Semantic Web
Aemoo:  exploratory search based on knowledge patterns over the Semantic WebAemoo:  exploratory search based on knowledge patterns over the Semantic Web
Aemoo: exploratory search based on knowledge patterns over the Semantic WebAndrea Nuzzolese
 

More from Andrea Nuzzolese (12)

Aemoo: Linked Data Exploration based on Knowledge Patterns
Aemoo: Linked Data Exploration based on Knowledge PatternsAemoo: Linked Data Exploration based on Knowledge Patterns
Aemoo: Linked Data Exploration based on Knowledge Patterns
 
Conference Linked Data: the ScholarlyData project
Conference Linked Data: the ScholarlyData projectConference Linked Data: the ScholarlyData project
Conference Linked Data: the ScholarlyData project
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DL
 
Oke
OkeOke
Oke
 
Sheldon challenge
Sheldon challengeSheldon challenge
Sheldon challenge
 
Evaluating citation functions in CiTO: cognitive issues
Evaluating citation functions in CiTO: cognitive issuesEvaluating citation functions in CiTO: cognitive issues
Evaluating citation functions in CiTO: cognitive issues
 
Knowledge Patterns for the Web: extraction, transformation, and reuse
Knowledge Patterns for the Web: extraction, transformation, and reuseKnowledge Patterns for the Web: extraction, transformation, and reuse
Knowledge Patterns for the Web: extraction, transformation, and reuse
 
Loditaly2014 new
Loditaly2014 newLoditaly2014 new
Loditaly2014 new
 
Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citations
 
Knowledge Representation and Reasoning with Apache Stanbol
Knowledge Representation and Reasoning with Apache StanbolKnowledge Representation and Reasoning with Apache Stanbol
Knowledge Representation and Reasoning with Apache Stanbol
 
Gathering Lexical Linked Data and Knowledge Patterns from FrameNet
Gathering Lexical Linked Data and Knowledge Patterns from FrameNetGathering Lexical Linked Data and Knowledge Patterns from FrameNet
Gathering Lexical Linked Data and Knowledge Patterns from FrameNet
 
Aemoo: exploratory search based on knowledge patterns over the Semantic Web
Aemoo:  exploratory search based on knowledge patterns over the Semantic WebAemoo:  exploratory search based on knowledge patterns over the Semantic Web
Aemoo: exploratory search based on knowledge patterns over the Semantic Web
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Type inference through the analysis of Wikipedia links

  • 1. Type inference through the analysis of Wikipedia links Andrea Giovanni Nuzzolese nuzzoles@cs.unibo.it Aldo Gangemi aldo.gangemi@cnr.it Valentina Presutti valentina.presutti@cnr.it Paolo Ciancarini ciancarini@cs.unibo.it stlab.istc.cnr.it 16 April 2012 - Lyon, France - LDOW 2012
  • 2. Outline • Motivations • Materials • Applied methods • Results • Conclusions stlab.istc.cnr.it 2
  • 3. Motivations ✦ Only a subset of the DBpedia resources is typed with the Resources used in wikilinks relations: DBpedia ontology (DBPO) 15,944,381 ✦ The typing procedure is top- down. ✦ Is the DBPO complete with respect to the DBpedia domain? Resources having a ✦ How good and homogeneous DBPO type: 1,518,697 is the granularity of DBPO types? stlab.istc.cnr.it 3
  • 4. Materials Wikilink triples with typed subject/object: 16,745,830 DBpedia 3.6 Dataset # of triples Wikilinks wikilink triples 107,892,317 triples: 107,892,317 infobox mapping-based “data” triples 9,357,273 rdfs:label triples 7,972,225 rdf:type triples 6,173,940 DBpedia ontology: 272 classes infobox mapping-based “object” triples 4,251,239 stlab.istc.cnr.it 4
  • 5. What we did • Wikilinks of a DBpedia resource convey knowledge that can be used for classifying it. • Classification methods ✦ Inductive learning: k-Nearest Neighbor algorithm ✦ Abductive classification based on EKPs [1] and homotypes used as background knowledge • The methods were performed on Resources having a Resources used in wikilinks DBPO type: relations: Sample of 1,518,697 15,944,381 untyped resources: 1,000 [1] A. G. Nuzzolese, A. Gangemi,V. Presutti, and P. Ciancarini. Encyclopedic Knowledge Patterns from Wikipedia Links. In L. Aroyo, N. Noy, and C. Welty, editors, Proceedings of the 10th International Semantic Web Conference (ISWC2011), pages 520-536. Springer, 2011. stlab.istc.cnr.it 5
  • 6. Inductive classification • We designed two inductive classification experiments based on the k-NN algorithm ✦ on 272 features, i.e., all the classes in the DBPO ✦ on 27 features, i.e., the top-level classes in the DBPO hierarchy • For each experiment we built a labeled feature space model as training set by using a randomly sampled 20% of typed resources ✦ the algorithms were tested on the remaining 80% of typed resources stlab.istc.cnr.it 5 6
  • 7. Building the training set for K-Nearest Neighbor algorithm dbpo:wikiPageWikiLink dbpedia:Apple_Inc. dbpedia:NeXT dbpedia:Steve_Jobs dbpedia:Forbes dbpedia:Cupertino,_California Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs ... stlab.istc.cnr.it 7
  • 8. Building the training set for K-Nearest Neighbor algorithm dbpo:Organisation dbpo:wikiPageWikiLink dbpedia:Apple_Inc. dbpedia:NeXT rdf:type dbpedia:Steve_Jobs dbpo:Magazine dbpo:City dbpedia:Forbes dbpo:Persondbpedia:Cupertino,_California Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs dbpo:Person ... stlab.istc.cnr.it 7
  • 9. Building the training set for K-Nearest Neighbor algorithm dbpo:Organisation dbpo:wikiPageWikiLink dbpo:Magazine dbpo:City rdf:type kp:linksTo dbpedia:Steve_Jobs Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs 0 0 1 0 1 1 dbpo:Person ... stlab.istc.cnr.it 7
  • 10. Building the training set for K-Nearest Neighbor algorithm dbpo:Organisation dbpo:wikiPageWikiLink dbpo:Magazine dbpo:City rdf:type kp:linksTo dbpedia:Steve_Jobs Mammal Scientist Company Drug City Magazine Class dbpedia:Steve_Jobs 0 0 1 0 1 1 dbpo:Person ... ... ... ... ... ... ... ... stlab.istc.cnr.it 7
  • 11. Building the training set for K-Nearest Neighbor algorithm dbpo:Organisation dbpo:wikiPageWikiLink dbpo:Magazine dbpo:City rdf:type kp:linksTo dbpedia:Steve_Jobs ✦ Precision using all DBPO types as features: 31.65% ✦ Precision using the top-level of DBPO as features: 40.27% stlab.istc.cnr.it 7
  • 12. Abductive classification with EKPs • EKPs ✦ A EKP of a certain entity type is a small vocabulary that captures the core types used for describing such entity type as it emerges from the Wikipedia crowds visit aemoo.org for an exploratory tool based on EKPs stlab.istc.cnr.it 8
  • 13. How can we infer the type of “Galileo Galilei”? http://www.aemoo.org stlab.istc.cnr.it 9
  • 14. How can we infer We know its path types the type of “Galileo Galilei”? http://www.aemoo.org stlab.istc.cnr.it 9
  • 15. We compare the path types involving We have 231 EKPs “Galileo Galilei” as subject with EKPs in order to identify the most similar, which is the "Scientist" EKP. http://www.aemoo.org stlab.istc.cnr.it 9
  • 16. The inferred type for the resource “Galileo Galiei” is the class “Scientist” http://www.aemoo.org stlab.istc.cnr.it 9
  • 17. Distinctive weakness of some EKPs ✦ The distinctive weakness seems due to wide overlaps among some EKPs ✦ Systematic ambiguity of the 4 largest classes ✦ Precision and recall on all DBPO types both 44.4% ✦ Precision and recall on the top-level of DBPO hierarchy: 36.5% and 79.5% stlab.istc.cnr.it 10
  • 18. Homotype-based abductive classification • Homotypes are wikilinks that have the same type on both the subject and the object of the triple dbpo:Philosopher dbpedia:Immanuel_Kant dbpedia:Plato dbpo:Philosopher rdf:type dbpo:wikiPageWikiLink rdf:type • We have observed how the homotype is usually the most frequent (or in the top 3) wikilink type • Given an untyped entity, we hypothesize that the most frequent type involved in its ingoing/ outgoing wikilinks detects its homotype, hence it indicates its type 11 stlab.istc.cnr.it
  • 19. Homotype-based abductive classification s stlab.istc.cnr.it 12
  • 20. Homotype-based abductive classification s stlab.istc.cnr.it 12
  • 21. Results on classifying already typed resources stlab.istc.cnr.it 13
  • 22. Results on untyped resources • Results on a sample of 1,000 untyped resources are much less satisfactory With EKPs With Homotypes stlab.istc.cnr.it 14
  • 23. Why? [1] • Typed entities: 2:3 typed wikilinks ratio • Untyped entities: 1:3 typed wikilinks ratio • Link structure for untyped entities is not rich enough stlab.istc.cnr.it 15
  • 24. Why? [2] • DBPO does not provide a complete set of classes for correctly typing DBpedia resources dbpedia:List_of_FIFA_World_Cup_finals Collection dbpedia:Computer_Science ScientificDiscipline dbpedia:Counterattack Plan dbpedia:Eros(concept) Concept dbpedia:Gentlemen’s_agreement Agreement stlab.istc.cnr.it 16
  • 25. Conclusions • We have investigated different approaches for typing DBpedia resources based on the data set of wikilinks • Results are acceptable in the test set, but extensive untypedness in output links, and poor DBPO coverage severely compromise automatic typing for untyped resources • We have analyzed possible causes deriving from some bias in DBpedia 17 stlab.istc.cnr.it
  • 26. Future work • Yago could be helpful but ✦ there is a lack of mapping between YAGO and DBPO ✦ it has larger coverage and only an overlap with DBPO ✦ the granularity of its categories is finer, and not easily reusable, because the top level is very large stlab.istc.cnr.it 18
  • 27. Thank you Andrea Nuzzolese - STLab, ISTC-CNR & Dipartimento di Scienze dell’Informazione University of Bologna Italy stlab.istc.cnr.it 19
  • 28. Path popularity nRes(dbpo:MusicalArtist) dbpo:MusicalArtist Anthony_Kiedis Michael_Jackson rdf:type Jackie_Jackson rdf:type rdf:type rdf:type Chad_Smith dbpo:MusicalArtist dbpo:MusicalArtist Number of resources having type dbpo:MusicalArtist rdf:type John_Lennon Paul_McCartney rdf:type dbpo:MusicalArtist PREFIX dbpo : http://dbpedia.org/ontology/ stlab.istc.cnr.it
  • 29. Path popularity nSubjectRes(Pi,j) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Si dbpo:wikiPageWikiLink Oj dbpo:MusicalArtist dbpo:Band Foo Fighters Beatles Number of distinct resources that participate in a path as subjects John_Lennon Paul_McCartney PREFIX dbpo : http://dbpedia.org/ontology/ stlab.istc.cnr.it
  • 30. Path popularity pathPopularity(Pi,j,Si) Jackson_5 Dave_Grohl Michael_Jackson Jackie_Jackson Nirvana Madonna Prince Charlie_Parker Keith_Jarrett Foo Fighters Beatles nSubjectRes(Pi,j)/nRes(Si) John_Lennon Paul_McCartney stlab.istc.cnr.it