Similarity on DBpedia
UIMR




PhD student: Samantha Lam
Supervisor: Conor Hayes
Similarity

How similar are the following films?

(Unsatisfactory) answer: it depends!




DBpedia Graph

Films - nodes - on DBpedia.

Some things about DBpedia:

    Big, rich, dense knowledge base
     → 3.77 million nodes, 400 million edges (English edition)

    Lots of prior work (as we shall see...)

    But very heterogeneous - vocabularies, categories

                           It is a graph




Similarity in general

   Cognitive science - Tversky (1977) - psychology - feature-based similarity.
       E.g. for a film: genre, language, director
   Modelling of human thought and semantic relations: how do we
   relate things to each other? (Collins & Quillian, 1969)




Semantic

The notion of semantic networks is derived from the hierarchical
semantic memory model [Collins & Quillian, 1969]




Semantic Similarity

Different techniques:
    Word frequency: Latent Semantic Analysis (doesn’t actually
    use semantic net structure)
    Rada (1989) - average shortest path length
    Resnik (1999) - information content of the least common subsumer (lcs)

Unfortunately...
    Word frequency is not applicable here
    These measures often assume a hierarchical/tree structure of the
    taxonomy/ontology (both Rada and Resnik assume the
    taxonomy is an is-a hierarchy)
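
As a rough illustration of the path-length idea behind Rada (1989), the sketch below computes the shortest path between two concepts in a small is-a taxonomy. The taxonomy, its concept names, and the helper functions are all hypothetical; this is only the single-pair building block, not the full averaged measure.

```python
from collections import deque

def build_undirected(edges):
    """Turn (child, parent) is-a pairs into an undirected adjacency map."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph

def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# Hypothetical toy taxonomy (assumption, not DBpedia data).
IS_A = [
    ("anime film", "animated film"),
    ("animated film", "film"),
    ("science fiction film", "film"),
    ("cyberpunk film", "science fiction film"),
]
taxonomy = build_undirected(IS_A)
distance = shortest_path_length(taxonomy, "anime film", "cyberpunk film")
```

A shorter distance is read as higher similarity. The approach relies on every edge meaning the same thing (is-a), which is the assumption that does not carry over to DBpedia.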




Semantic Similarity

Remember, DBpedia is not as ‘neat’:




(Image source: http://www.visualdataweb.org/relfinder/)

On DBpedia/Wikipedia

Recent applications:

    Gabrilovich & Markovitch (2007) - express text as a weighted
    vector of Wikipedia articles, Explicit Semantic Analysis (ESA)

    Milne & Witten (2008) - the Wikipedia Link-based Measure -
    similarity of neighbours

    Passant (2010) - Linked Data Semantic Distance ← uses paths!

    Mirizzi et al. (2012) - use DBpedia for movie recommendation
    with a Vector Space Model




Similarity

Important:
    Properties can be related to each other

    (Figure legend: node types, e.g. film, director; link types, e.g. influenced, collaborated with)




Network Similarity

Social Network Analysis
    Established field - notions of influence, centrality, rank etc.
    Often applied to small networks




           Note: Ranking is often based on similarity



Network Similarity

Homogeneous network measures:

    PageRank - Brin & Page (1998) - random surfer with
    teleportation
    SimRank - Jeh & Widom (2002) - iteratively ‘inherits’ similarity
    of neighbours
    σact - Thiel & Berthold (2010) - node similarities from
    spreading activation with a decay factor
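
To make the SimRank idea concrete, here is a minimal sketch of the basic iteration: two nodes are similar if their in-neighbours are similar. The toy graph, node names, decay value of 0.8, and iteration count are assumptions for illustration, and the naive all-pairs loop would not scale to a graph the size of DBpedia.

```python
def simrank(in_neighbours, decay=0.8, iterations=10):
    """Naive SimRank over a graph given as {node: [in-neighbours]}.

    Every node that appears as an in-neighbour must also be a key.
    """
    nodes = list(in_neighbours)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                ia, ib = in_neighbours[a], in_neighbours[b]
                if not ia or not ib:
                    # A node with no in-neighbours is similar only to itself.
                    new[a][b] = 0.0
                    continue
                total = sum(sim[i][j] for i in ia for j in ib)
                new[a][b] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim

# Hypothetical toy graph: one director node points at two films.
toy_in_neighbours = {
    "director": [],
    "film_a": ["director"],
    "film_b": ["director"],
    "film_c": [],
}
scores = simrank(toy_in_neighbours)
```

Note that the update treats every link identically, which is why the measure is listed under homogeneous networks.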



Network Similarity

Heterogeneous network measures:

    PathSim - Sun et al. (2011) - count instances of a
    ‘meta-path’ (specific link pattern)
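
A small sketch of how PathSim scores a pair under one symmetric meta-path, here film - category - film. The score is twice the number of meta-path instances between x and y, divided by the number of instances from x back to x plus those from y back to y. The films, categories, and link counts below are made up for illustration.

```python
def path_count(links, x, y):
    """Number of film-category-film path instances between x and y."""
    return sum(n * links[y].get(cat, 0) for cat, n in links[x].items())

def pathsim(links, x, y):
    """PathSim under the symmetric meta-path encoded by `links`."""
    denominator = path_count(links, x, x) + path_count(links, y, y)
    if denominator == 0:
        return 0.0
    return 2 * path_count(links, x, y) / denominator

# Hypothetical data: film -> {category: number of links}.
toy_links = {
    "film_a": {"cat_1": 1, "cat_2": 1},
    "film_b": {"cat_2": 1, "cat_3": 1},
    "film_c": {"cat_4": 1},
}
score_ab = pathsim(toy_links, "film_a", "film_b")
```

The meta-path itself is an input here, which is exactly the open question for DBpedia: someone has to decide which link patterns are meaningful.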




Network Similarity


Applicability to DBpedia:
    PageRank, SimRank - N/A - they assume homogeneous links!
    Spreading Activation - possible with constraints
    Apply PathSim - but how to learn such meta-paths?

Another idea:
    Count node-disjoint paths.
    Why? View each path as one distinct ‘reason’.




Similarity

                                                 
                        Totoro    GITS    Matrix
              Totoro      44        1        0
              GITS         1       35        2
              Matrix       0        2       58

   Totoro – GITS
       Category:Anime films

   GITS – Matrix
       Category:Brain-computer interfacing in fiction
       Matrix → Category:The Matrix (franchise) →
       Category:Media franchises ← GITS




Similarity

How similar are the following films?

Answer: it still depends - on the path you take




Summary

   Similarity, useful concept in many areas, hard to define
       how are films similar?

   DBpedia, richly linked KB
       film information available here

→ Problem: How to define similarity on DBpedia?

   Past methods - don’t exploit linkedness
   Network analysis methods can aid this
       test trial with node-disjoint paths: GITS is more similar to Matrix
       than to Totoro




Ongoing/Future Work


Mining DBpedia as Network

   Analyse structured and related data

   Similarity as complement to – reasoning, retrieval, querying

   Also useful in NLP, recommender systems, knowledge
   discovery

→ Examples: work we do in UIMR




Ioana Hulpus (2011/2012)

Graph-based topic analysis with the support of Linked Data




Benjamin Heitmann (2011/2012)

Spreading activation for cross-domain recommendation




Challenges/Discussion

Challenges:
    Topology of DBpedia graph
        Standard SNA measures for homogeneous networks, e.g.
        density, degree distribution - how to apply to DBpedia?
        What does a path actually mean?
        Which subgraphs to use?
        How do metrics vary with different subgraphs, e.g. different
        ontologies/categories?
    Scalability (not a problem, but a challenge)
    Evaluation - how do we confirm something is similar?

       Thanks for listening! Questions/Suggestions?

