SlideShare uma empresa Scribd logo
1 de 32
.nju.edu.cn




                RELIN: Relatedness and Informativeness-based
                    Centrality for Entity Summarization




                     Gong Cheng1, Thanh Tran2, Yuzhong Qu1
1 State Key Laboratory for Novel Software Technology, Nanjing University, China

          2 Institute AIFB, Karlsruhe Institute of Technology, Germany

                               gcheng@nju.edu.cn



                           Presented at ISWC2011
Motivation
                                                                                            ws .nju.edu.cn

        DBpedia describes 3.64M entities with 1B RDF triples.
            1B/3.64M = 281 RDF triples per entity
        A piece of lengthy entity description is unacceptable in tasks that require quick
        identification of the underlying entity.




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           2 of 30
Entity search --- find entities that match an information need
                                                                     ws .nju.edu.cn




Gong Cheng (程龚) gcheng@nju.edu.cn                                    3 of 30
Pay-as-you-go data integration --- judge whether two entities denote the same
                                                                                    ws .nju.edu.cn




                                     sameAs?




Gong Cheng (程龚) gcheng@nju.edu.cn                                                   4 of 30
Motivation
                                                                                            ws .nju.edu.cn

        DBpedia describes 3.64M entities with 1B RDF triples.
            1B/3.64M = 281 RDF triples per entity
        A piece of lengthy entity description is unacceptable in tasks that require quick
        identification of the underlying entity.
        Problem: to summarize lengthy entity descriptions




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           5 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   6 of 30
Data graph
                                    ws .nju.edu.cn




Gong Cheng (程龚) gcheng@nju.edu.cn   7 of 30
Feature set
                                    ws .nju.edu.cn




Gong Cheng (程龚) gcheng@nju.edu.cn   8 of 30
Entity summarization
                                                 ws .nju.edu.cn

        Entity summarization = feature ranking
        Entity summary = k top-ranked features




Gong Cheng (程龚) gcheng@nju.edu.cn                9 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   10 of 30
Centrality-based ranking: concepts
                                                                                 ws .nju.edu.cn

        Widely applied to text summarization and ontology summarization
        By constructing a graph
            Nodes: data elements to be ranked
            Edges: connecting related nodes
        and then, measuring node centrality
            e.g. degree, PageRank, …



                     f2
                                       f1       f4
                                                                Relatednesss ≥ threshold

                                                         f5
   Relatednesss < threshold
                                                 f3



Gong Cheng (程龚) gcheng@nju.edu.cn                                               11 of 30
PageRank
                                                                                ws .nju.edu.cn

        Simulating a random surfer’s behavior who navigates from node to node
        Two types of action
            Following a random edge (with a uniform probability distribution)
            Jumping at random (with a uniform probability distribution)
        Ranking based on the stationary distribution of such a Markov chain




                     f2
                                     f1               f4

                                                                 f5

                                                       f3



Gong Cheng (程龚) gcheng@nju.edu.cn                                               12 of 30
Centrality-based ranking for entity summarization: problems
                                                                                            ws .nju.edu.cn

        How to define a good feature
            Not only capturing the main themes of the entity description
            But also distinguishing the entity from others
        Loss of information
            Float-valued function  boolean-valued function




                     f2
                                      f1               f4
                                                                           Relatednesss ≥ threshold

                                                                  f5
   Relatednesss < threshold
                                                        f3



Gong Cheng (程龚) gcheng@nju.edu.cn                                                          13 of 30
RELIN: concepts
                                                                                               ws .nju.edu.cn

        An extension of PageRank
            Following a random edge (           )
            within a complete graph, with a probability proportional to the relatedness between the
            two associated nodes, i.e. no threshold needed
            Jumping at random (          )
            with a probability proportional to the amount of information carried by the target that
            helps to identify the entity




Gong Cheng (程龚) gcheng@nju.edu.cn                                                              14 of 30
RELIN: RELatedness and INformativeness-based centrality
                                                                                                ws .nju.edu.cn

        Two kinds of action
            Relational move --- more likely to a feature that carries related information about the
            theme currently under investigation
            Informational jump --- more likely to a feature that provides a large amount of new
            information for clarifying the identity of the underlying entity
        Two non-uniform probability distributions




Gong Cheng (程龚) gcheng@nju.edu.cn                                                               15 of 30
Formalization
                                                                                                    ws .nju.edu.cn

        Actions (given the current feature fq)
            P(M|fq): the probability of performing a relational move from fq
            P(J|fq): the probability of performing an informational jump from fq
            subject to P(M|fq) + P(J|fq) = 1
        Targets for actions (given FS the feature set)
            P(fp|fq,M): the probability of performing a relational move from fq to fp
            P(fp|fq,J): the probability of performing an informational jump from fq to fp
            subject to      P f p | f q , M 1 and      P f p | fq , J 1
                           f p FS                         f p FS
        Result
            x(t): |FS|-dimensional vector
            xp(t): the probability that the surfer visits fp at step t
            Finally,
                 xp t 1                  xq t   P M | fq P f p | fq , M   P J | fq P f p | fq , J
                                    f q FS

            and
                 lim x t       x
                 t




Gong Cheng (程龚) gcheng@nju.edu.cn                                                                   16 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   17 of 30
Actions
                                        ws .nju.edu.cn

        P(M|fq) = 1 – λ
        P(J|fq) = λ
        λ: to be tuned in experiments




Gong Cheng (程龚) gcheng@nju.edu.cn       18 of 30
Relatedness --- P(fp|fq,M)
                                                                                      ws .nju.edu.cn

        Relatedness between features (i.e. property-value pairs) combines
            Relatedness between properties (i.e. resources)
            Relatedness between values (i.e. resources)
        Relatedness between resources = relatedness between resource names
            URI: label or local name
            Literal: lexical form
        Distributional relatedness between resource names
            More related = more often co-occur in certain contexts (e.g. documents)
        Estimated via “pointwise mutual information + Google”

                                                                      Hits si , s j
                                                      P si , s j
                                                                              N


                                                                   Hits s j
                                                      P sj
                                                                      N




Gong Cheng (程龚) gcheng@nju.edu.cn                                                     19 of 30
Informativeness --- P(fp|fq,J)
                                                                                           ws .nju.edu.cn

        Self-information

        o: informational jump from fq to fp
        P(fp|fq): the probability that fp belongs to a feature set given fq also does so
        Estimated via a statistical analysis of the data set




        Approximation: P(fp|fq) = P(fp)




Gong Cheng (程龚) gcheng@nju.edu.cn                                                          20 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   21 of 30
Experiments
                                    ws .nju.edu.cn

        Intrinsic evaluation
        Extrinsic evaluation




Gong Cheng (程龚) gcheng@nju.edu.cn   22 of 30
Intrinsic evaluation --- design
                                                                                ws .nju.edu.cn

        Task
            To manually construct ideal entity summaries as the gold standard
        Participants
            24 students majoring in computer science
        Test cases
            149 entity descriptions randomly selected from DBpedia 3.4
        Assignment
            4.43 participants per entity description
        Output
            Top-5 features
            Top-10 features




Gong Cheng (程龚) gcheng@nju.edu.cn                                               23 of 30
Intrinsic evaluation --- results
                                                                          ws .nju.edu.cn

        Metric: overlap between summaries

        Agreement between participants about ideal summaries
            2.91 when k=5
            7.86 when k=10
        Quality of summaries computed under different approach settings




             Baselines
              Ours




Gong Cheng (程龚) gcheng@nju.edu.cn                                         24 of 30
Extrinsic evaluation --- design
                                                                                 ws .nju.edu.cn

        Task
            To manually confirm entity mappings by using summaries
        Participants
            19 students majoring in computer science
        Test cases
            47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009)
            Gold-standard judgments based on owl:sameAs links
                 24 correct and 23 incorrect
        Assignment
            3.62 participants per pair, per approach setting
        Output
            Judgment: correct or incorrect




Gong Cheng (程龚) gcheng@nju.edu.cn                                                25 of 30
Extrinsic evaluation --- results
                                                                                         ws .nju.edu.cn

        Metrics
            Accuracy of the judgments
                  1.0 = consistent with the gold standard
                  0.0 = inconsistent
            Time spent
                  Normalized by the average time per judgment spent by the participant
                  1.0 = medium efficiency
                  Smaller value = higher efficiency


        Results




Gong Cheng (程龚) gcheng@nju.edu.cn                                                        26 of 30
Discussion
                                                                                      ws .nju.edu.cn

        Automatically computed summaries are still not as good as handcrafted ones.

                                                                         k=5   k=10
         Agreement between ideal summaries                              2.91 7.86
         Agreement between computed summaries and ideal summaries       2.40 4.88

        User-specific notion of informativeness
            Longitude and latitude are highly informative, but …
        Information redundancy
            Longitude + latitude = point
            What if multiple sources …
        Summarization = what + how (to present)




Gong Cheng (程龚) gcheng@nju.edu.cn                                                     27 of 30
Outline
                                    ws .nju.edu.cn

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn   28 of 30
Conclusions
                                                                                            ws .nju.edu.cn

        Problem of entity summarization
            Extractive
            About identifying the entity that underlies a lengthy description
        The RELIN model
            Variant of the random surfer model
            Non-uniform probability distributions
            Informativeness + relatedness
        Implementation
            Based on linguistic and information theory concepts
            Using information captured by the labels of nodes and edges in the data graph
        Experiments
            Closer to handcrafted ideal summaries
            Assisting users in confirming entity mappings more accurately




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           29 of 30
Future work --- application-specific entity summarization
                                                                ws .nju.edu.cn




                                    sameAs?




Gong Cheng (程龚) gcheng@nju.edu.cn                               30 of 30
Related work --- summarization
                                                                                 ws .nju.edu.cn



   Paradigm               Approach             Measure             Model
                                                                        RELIN
         Extractive                                              - Relatedness
    - Text                  Centrality-based     PageRank-like   - Informativeness
    - Ontology                                                   - Non-uniform
                                                                 probability distribution



                                                     Others           PageRank
      Non-extractive
                                               - Degree          - Relatedness
    - Database               Centroid-based
                                               - Betweenness     - Uniform probability
    - Graph
                                               -…                distribution




Gong Cheng (程龚) gcheng@nju.edu.cn                                                31 of 30
Related work --- ranking
                                                                                         ws .nju.edu.cn

        Different goals --- to best identify the underlying entity
            B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE
            Internet Comput. 2005.
            R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010.
            T. Franz. TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC
            2009.
            …
        Exploitation of data semantics at different levels --- use labels of nodes and edges
            T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009.
            X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007.
            …




Gong Cheng (程龚) gcheng@nju.edu.cn                                                       32 of 30

Mais conteúdo relacionado

Destaque

Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyViewGong Cheng
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Gong Cheng
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationGong Cheng
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析Gong Cheng
 
Towards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataTowards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataGong Cheng
 
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsGong Cheng
 

Destaque (6)

Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyView
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析
 
Towards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web DataTowards Supporting the Life Cycle of Web Data
Towards Supporting the Life Cycle of Web Data
 
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
 

Mais de Gong Cheng

Towards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondTowards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondGong Cheng
 
从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探Gong Cheng
 
知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法Gong Cheng
 
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Gong Cheng
 
知识图谱中的关联搜索
知识图谱中的关联搜索知识图谱中的关联搜索
知识图谱中的关联搜索Gong Cheng
 
面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探Gong Cheng
 
知识图谱中的实体关联搜索
知识图谱中的实体关联搜索知识图谱中的实体关联搜索
知识图谱中的实体关联搜索Gong Cheng
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference reviewGong Cheng
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationGong Cheng
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGong Cheng
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析Gong Cheng
 
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Gong Cheng
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic DataGong Cheng
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Gong Cheng
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Gong Cheng
 
Towards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachTowards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachGong Cheng
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryGong Cheng
 
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...Gong Cheng
 
Term Dependence on the Semantic Web
Term Dependence on the Semantic WebTerm Dependence on the Semantic Web
Term Dependence on the Semantic WebGong Cheng
 

Mais de Gong Cheng (20)

Towards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondTowards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and Beyond
 
从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探
 
知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法
 
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
 
知识图谱中的关联搜索
知识图谱中的关联搜索知识图谱中的关联搜索
知识图谱中的关联搜索
 
面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探
 
知识图谱中的实体关联搜索
知识图谱中的实体关联搜索知识图谱中的实体关联搜索
知识图谱中的实体关联搜索
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference review
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity Summarization
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the Web
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析
 
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
 
Towards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based ApproachTowards Exploratory Relationship Search: A Clustering-based Approach
Towards Exploratory Relationship Search: A Clustering-based Approach
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
 
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
An Empirical Study of Vocabulary Relatedness and Its Application to Recommend...
 
Term Dependence on the Semantic Web
Term Dependence on the Semantic WebTerm Dependence on the Semantic Web
Term Dependence on the Semantic Web
 

Último

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization

  • 1. .nju.edu.cn RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization Gong Cheng1, Thanh Tran2, Yuzhong Qu1 1 State Key Laboratory for Novel Software Technology, Nanjing University, China 2 Institute AIFB, Karlsruhe Institute of Technology, Germany gcheng@nju.edu.cn Presented at ISWC2011
  • 2. Motivation ws .nju.edu.cn DBpedia describes 3.64M entities with 1B RDF triples. 1B/3.64M = 281 RDF triples per entity A piece of lengthy entity description is unacceptable in tasks that require quick identification of the underlying entity. Gong Cheng (程龚) gcheng@nju.edu.cn 2 of 30
  • 3. Entity search --- find entities that match an information need ws .nju.edu.cn Gong Cheng (程龚) gcheng@nju.edu.cn 3 of 30
  • 4. Pay-as-you-go data integration --- judge whether two entities denote the same ws .nju.edu.cn sameAs? Gong Cheng (程龚) gcheng@nju.edu.cn 4 of 30
  • 5. Motivation ws .nju.edu.cn DBpedia describes 3.64M entities with 1B RDF triples. 1B/3.64M = 281 RDF triples per entity A piece of lengthy entity description is unacceptable in tasks that require quick identification of the underlying entity. Problem: to summarize lengthy entity descriptions Gong Cheng (程龚) gcheng@nju.edu.cn 5 of 30
  • 6. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 6 of 30
  • 7. Data graph ws .nju.edu.cn Gong Cheng (程龚) gcheng@nju.edu.cn 7 of 30
  • 8. Feature set ws .nju.edu.cn Gong Cheng (程龚) gcheng@nju.edu.cn 8 of 30
  • 9. Entity summarization ws .nju.edu.cn Entity summarization = feature ranking Entity summary = k top-ranked features Gong Cheng (程龚) gcheng@nju.edu.cn 9 of 30
  • 10. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 10 of 30
  • 11. Centrality-based ranking: concepts ws .nju.edu.cn Widely applied to text summarization and ontology summarization By constructing a graph Nodes: data elements to be ranked Edges: connecting related nodes and then, measuring node centrality e.g. degree, PageRank, … f2 f1 f4 Relatednesss ≥ threshold f5 Relatednesss < threshold f3 Gong Cheng (程龚) gcheng@nju.edu.cn 11 of 30
  • 12. PageRank ws .nju.edu.cn Simulating a random surfer’s behavior who navigates from node to node Two types of action Following a random edge (with a uniform probability distribution) Jumping at random (with a uniform probability distribution) Ranking based on the stationary distribution of such a Markov chain f2 f1 f4 f5 f3 Gong Cheng (程龚) gcheng@nju.edu.cn 12 of 30
  • 13. Centrality-based ranking for entity summarization: problems ws .nju.edu.cn How to define a good feature Not only capturing the main themes of the entity description But also distinguishing the entity from others Loss of information Float-valued function  boolean-valued function f2 f1 f4 Relatednesss ≥ threshold f5 Relatednesss < threshold f3 Gong Cheng (程龚) gcheng@nju.edu.cn 13 of 30
  • 14. RELIN: concepts ws .nju.edu.cn An extension of PageRank Following a random edge ( ) within a complete graph, with a probability proportional to the relatedness between the two associated nodes, i.e. no threshold needed Jumping at random ( ) with a probability proportional to the amount of information carried by the target that helps to identify the entity Gong Cheng (程龚) gcheng@nju.edu.cn 14 of 30
  • 15. RELIN: RELatedness and INformativeness-based centrality ws .nju.edu.cn Two kinds of action Relational move --- more likely to a feature that carries related information about the theme currently under investigation Informational jump --- more likely to a feature that provides a large amount of new information for clarifying the identity of the underlying entity Two non-uniform probability distributions Gong Cheng (程龚) gcheng@nju.edu.cn 15 of 30
  • 16. Formalization ws .nju.edu.cn Actions (given the current feature fq) P(M|fq): the probability of performing a relational move from fq P(J|fq): the probability of performing an informational jump from fq subject to P(M|fq) + P(J|fq) = 1 Targets for actions (given FS the feature set) P(fp|fq,M): the probability of performing a relational move from fq to fp P(fp|fq,J): the probability of performing an informational jump from fq to fp subject to P f p | f q , M 1 and P f p | fq , J 1 f p FS f p FS Result x(t): |FS|-dimensional vector xp(t): the probability that the surfer visits fp at step t Finally, xp t 1 xq t P M | fq P f p | fq , M P J | fq P f p | fq , J f q FS and lim x t x t Gong Cheng (程龚) gcheng@nju.edu.cn 16 of 30
  • 17. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 17 of 30
  • 18. Actions ws .nju.edu.cn P(M|fq) = 1 – λ P(J|fq) = λ λ: to be tuned in experiments Gong Cheng (程龚) gcheng@nju.edu.cn 18 of 30
  • 19. Relatedness --- P(fp|fq,M) ws .nju.edu.cn Relatedness between features (i.e. property-value pairs) combines Relatedness between properties (i.e. resources) Relatedness between values (i.e. resources) Relatedness between resources = relatedness between resource names URI: label or local name Literal: lexical form Distributional relatedness between resource names More related = more often co-occur in certain contexts (e.g. documents) Estimated via “pointwise mutual information + Google” Hits si , s j P si , s j N Hits s j P sj N Gong Cheng (程龚) gcheng@nju.edu.cn 19 of 30
  • 20. Informativeness --- P(fp|fq,J) ws .nju.edu.cn Self-information o: informational jump from fq to fp P(fp|fq): the probability that fp belongs to a feature set given fq also does so Estimated via a statistical analysis of the data set Approximation: P(fp|fq) = P(fp) Gong Cheng (程龚) gcheng@nju.edu.cn 20 of 30
  • 21. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 21 of 30
  • 22. Experiments ws .nju.edu.cn Intrinsic evaluation Extrinsic evaluation Gong Cheng (程龚) gcheng@nju.edu.cn 22 of 30
  • 23. Intrinsic evaluation --- design ws .nju.edu.cn Task To manually construct ideal entity summaries as the gold standard Participants 24 students majoring in computer science Test cases 149 entity descriptions randomly selected from DBpedia 3.4 Assignment 4.43 participants per entity description Output Top-5 features Top-10 features Gong Cheng (程龚) gcheng@nju.edu.cn 23 of 30
  • 24. Intrinsic evaluation --- results ws .nju.edu.cn Metric: overlap between summaries Agreement between participants about ideal summaries 2.91 when k=5 7.86 when k=10 Quality of summaries computed under different approach settings Baselines Ours Gong Cheng (程龚) gcheng@nju.edu.cn 24 of 30
  • 25. Extrinsic evaluation --- design ws .nju.edu.cn Task To manually confirm entity mappings by using summaries Participants 19 students majoring in computer science Test cases 47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009) Gold-standard judgments based on owl:sameAs links 24 correct and 23 incorrect Assignment 3.62 participants per pair, per approach setting Output Judgment: correct or incorrect Gong Cheng (程龚) gcheng@nju.edu.cn 25 of 30
  • 26. Extrinsic evaluation --- results ws .nju.edu.cn Metrics Accuracy of the judgments 1.0 = consistent with the gold standard 0.0 = inconsistent Time spent Normalized by the average time per judgment spent by the participant 1.0 = medium efficiency Smaller value = higher efficiency Results Gong Cheng (程龚) gcheng@nju.edu.cn 26 of 30
  • 27. Discussion ws .nju.edu.cn Automatically computed summaries are still not as good as handcrafted ones. k=5 k=10 Agreement between ideal summaries 2.91 7.86 Agreement between computed summaries and ideal summaries 2.40 4.88 User-specific notion of informativeness Longitude and latitude are highly informative, but … Information redundancy Longitude + latitude = point What if multiple sources … Summarization = what + how (to present) Gong Cheng (程龚) gcheng@nju.edu.cn 27 of 30
  • 28. Outline ws .nju.edu.cn Problem statement The RELIN model Implementation Experiments Conclusions Gong Cheng (程龚) gcheng@nju.edu.cn 28 of 30
  • 29. Conclusions ws .nju.edu.cn Problem of entity summarization Extractive About identifying the entity that underlies a lengthy description The RELIN model Variant of the random surfer model Non-uniform probability distributions Informativeness + relatedness Implementation Based on linguistic and information theory concepts Using information captured by the labels of nodes and edges in the data graph Experiments Closer to handcrafted ideal summaries Assisting users in confirming entity mappings more accurately Gong Cheng (程龚) gcheng@nju.edu.cn 29 of 30
  • 30. Future work --- application-specific entity summarization ws .nju.edu.cn sameAs? Gong Cheng (程龚) gcheng@nju.edu.cn 30 of 30
  • 31. Related work --- summarization ws .nju.edu.cn Paradigm Approach Measure Model RELIN Extractive - Relatedness - Text Centrality-based PageRank-like - Informativeness - Ontology - Non-uniform probability distribution Others PageRank Non-extractive - Degree - Relatedness - Database Centroid-based - Betweenness - Uniform probability - Graph -… distribution Gong Cheng (程龚) gcheng@nju.edu.cn 31 of 30
  • 32. Related work --- ranking ws .nju.edu.cn Different goals --- to best identify the underlying entity B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE Internet Comput. 2005. R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010. T. Franz. TripleRank: Ranking Semantic Web Data By Tensor Decomposition. ISWC 2009. … Exploitation of data semantics at different levels --- use labels of nodes and edges T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009. X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007. … Gong Cheng (程龚) gcheng@nju.edu.cn 32 of 30