BipRank: Ranking and Summarizing RDF Vocabulary Descriptions

Gong Cheng (1), Feng Ji (2), Shengmei Luo (2), Weiyi Ge (1), Yuzhong Qu (1)

(1) State Key Laboratory for Novel Software Technology, Nanjing University, China
(2) Communication Services R&D Institute, ZTE Corporation, China

Presented at JIST2011

Outline
                                    ws.nju.edu.cn

        Introduction
        Salience measurement
        Vocabulary summarization
        Conclusions




Gong Cheng (程龚) gcheng@nju.edu.cn

Vocabularies and Linked Data


        [Figure not recovered. Surviving labels: "Vocabularies", "Your own vocabulary", "Reuse", "Linked Data"]




Vocabulary search engines




Vocabularies




                                      Scale




Vocabulary snippets --- state of the art




Vocabulary snippets --- our approach




Vocabulary summarization




           Vocabulary summarization = ranking and selecting RDF sentences



A bipartite view of vocabulary description




Surfer behavior --- type A




Surfer behavior --- type B




BipRank




        [Formula/figure not recovered. Surviving labels: "Current step", "Next step", "Uniform", "type-A behavior", "type-B behavior"]
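
The transition formula on this slide is not recoverable from the transcript, so the following is only a generic illustration of the idea of a surfer moving over a bipartite sentence-term graph. It is a minimal sketch, assuming a walk that alternates sentence -> term -> sentence with uniform choices and a uniform random jump; the function name `bipartite_random_walk`, the `damping` parameter, and the example data are all hypothetical and are not the authors' BipRank definition.

```python
from collections import defaultdict


def bipartite_random_walk(sentence_terms, damping=0.85, iterations=50):
    """Score sentences by a random walk on a sentence-term bipartite graph.

    sentence_terms maps a sentence id to the non-empty set of terms it mentions.
    ASSUMPTION: uniform transitions and a uniform random jump; this is a
    stand-in for illustration, not the formula from the slide.
    """
    sentences = list(sentence_terms)
    term_sentences = defaultdict(list)
    for s, terms in sentence_terms.items():
        for t in terms:
            term_sentences[t].append(s)

    n = len(sentences)
    score = {s: 1.0 / n for s in sentences}
    for _ in range(iterations):
        # Half-step 1: each sentence passes its score uniformly to its terms.
        term_score = defaultdict(float)
        for s in sentences:
            terms = sentence_terms[s]
            for t in terms:
                term_score[t] += score[s] / len(terms)
        # Half-step 2: each term passes its score uniformly to its sentences.
        walked = defaultdict(float)
        for t, mass in term_score.items():
            for s in term_sentences[t]:
                walked[s] += mass / len(term_sentences[t])
        # Mix the walk with a uniform jump to any sentence.
        score = {s: (1 - damping) / n + damping * walked[s] for s in sentences}
    return score


# Hypothetical toy vocabulary: s1 mentions more terms than s2 or s3.
toy = {
    "s1": {"Person", "Agent", "name"},
    "s2": {"Person"},
    "s3": {"name"},
}
scores = bipartite_random_walk(toy)
```

In this toy graph the sentence that shares terms with the others ends up with the largest score, which is the qualitative behavior one would expect from a link-analysis measure of salience.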




Pattern of RDF sentence




p(s|u)

        Frequency of Pattern(s)
             Number of RDF sentences in the vocabulary that have the same pattern
        Popularity of Pattern(s)
             Number of vocabularies in the repository in which the same pattern occurs
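
The two counts above can be sketched as follows. The slide that defines the pattern of an RDF sentence did not survive extraction, so `pattern_of` below is a hypothetical stand-in that assumes a pattern is simply the sorted set of predicates in the sentence; the data model (a sentence as a list of triples, a vocabulary as a list of sentences) and all example terms are likewise assumptions.

```python
from collections import Counter


def pattern_of(sentence):
    """ASSUMPTION: the pattern is the sorted set of predicates in the sentence."""
    return tuple(sorted({predicate for (_, predicate, _) in sentence}))


def pattern_frequency(vocabulary):
    """Count, per pattern, the RDF sentences in one vocabulary that have it."""
    return Counter(pattern_of(sentence) for sentence in vocabulary)


def pattern_popularity(repository):
    """Count, per pattern, the vocabularies in the repository where it occurs."""
    counts = Counter()
    for vocabulary in repository:
        # A set ensures each vocabulary contributes at most once per pattern.
        counts.update({pattern_of(sentence) for sentence in vocabulary})
    return counts


# Hypothetical toy data.
vocab_a = [
    [("ex:Student", "rdfs:subClassOf", "ex:Person")],
    [("ex:Teacher", "rdfs:subClassOf", "ex:Person")],
    [("ex:name", "rdfs:domain", "ex:Person")],
]
vocab_b = [
    [("ex2:Cat", "rdfs:subClassOf", "ex2:Animal")],
]

frequency = pattern_frequency(vocab_a)
popularity = pattern_popularity([vocab_a, vocab_b])
```

Frequency is computed inside a single vocabulary, whereas popularity is computed across the whole repository, so a pattern can be frequent locally and still rare globally.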




Evaluation setting

        Test cases
            9 moderate-sized vocabularies randomly selected from Falcons
        Gold standard
            Salience given by 6 human experts
        Competitors
            Cp: Zhang et al. (WWW2007)
            Our approach
                 BipRank-U: pattern-unaware
                 BipRank-F: using pattern frequency
                 BipRank-P: using pattern popularity
        Metric
            Pearson product-moment correlation coefficient
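
For reference, the metric named above can be computed as in the following minimal sketch. The function name `pearson` and the sample numbers are illustrative assumptions and are not data from the evaluation.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up scores: computed salience vs. salience assigned by human experts.
computed = [0.1, 0.4, 0.2, 0.3]
expert = [1.0, 4.0, 2.0, 3.0]
agreement = pearson(computed, expert)
```

A value close to 1 means the computed salience orders and spaces the RDF sentences much like the expert ratings do; a value close to 0 means no linear relationship.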




Evaluation results




Goodness of a summary

        Salience




        Query relevance
            Textual similarity between query and summary




        Cohesion
            Term overlap between RDF sentences
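
The formulas behind these criteria are not present in the transcript. As a rough illustration only, the sketch below uses Jaccard overlap as a stand-in both for the textual similarity between query and summary and for the term overlap between RDF sentences; the function names and example inputs are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def query_relevance(query, summary_text):
    """ASSUMPTION: similarity is the token overlap between query and summary."""
    return jaccard(set(query.lower().split()), set(summary_text.lower().split()))


def cohesion(term_sets):
    """ASSUMPTION: cohesion is the mean pairwise term overlap between sentences."""
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical inputs.
relevance = query_relevance("person name", "Person has name")
summary_cohesion = cohesion([
    {"Person", "name"},
    {"Person", "knows"},
    {"Document"},
])
```

Any set-based or vector-based similarity could replace `jaccard` here; the point is only that query relevance compares the summary with the query, while cohesion compares the selected sentences with one another.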




Looking for the best summary

        Multi-objective optimization
        Single aggregate objective function




        Solution: a greedy strategy
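
A greedy strategy of this kind can be sketched as below: starting from an empty summary, repeatedly add the candidate that yields the highest value of a single aggregate objective. The aggregate used here (summed salience plus a bonus per sentence pair sharing a term, with weight 0.2) is a hypothetical example, as are the names and data; the slides do not give the actual objective function.

```python
from itertools import combinations


def greedy_summary(candidates, objective, size):
    """Pick up to `size` candidates, each time adding the one that maximizes
    the objective of the summary built so far."""
    summary = []
    remaining = list(candidates)
    while remaining and len(summary) < size:
        best = max(remaining, key=lambda c: objective(summary + [c]))
        summary.append(best)
        remaining.remove(best)
    return summary


# Hypothetical salience scores and term sets for three RDF sentences.
salience = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
terms = {
    "s1": {"Person", "name"},
    "s2": {"Document"},
    "s3": {"Person", "knows"},
}


def objective(summary):
    """ASSUMPTION: aggregate = total salience + 0.2 per pair sharing a term."""
    total_salience = sum(salience[s] for s in summary)
    shared_pairs = sum(
        1 for a, b in combinations(summary, 2) if terms[a] & terms[b]
    )
    return total_salience + 0.2 * shared_pairs


chosen = greedy_summary(salience, objective, size=2)
```

With these numbers the second pick is `s3` rather than the more salient `s2`, because `s3` shares a term with `s1` and the cohesion bonus outweighs the salience gap. Greedy selection does not guarantee the optimal summary, but it evaluates only a linear number of candidates per step.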




Evaluation setting

        Judges
            18 human experts
        Test cases
            190 searches over 2,012 vocabularies crawled by Falcons
        Competitors
            Generic: Zhang et al. (WWW2007)
            Our approach
                 QR: query relevance
                 QR+S: query relevance + salience
                 QR+C: query relevance + cohesion
        Metric
            Rating on a 10-point scale




Evaluation results




Performance testing


        [Figure not recovered. Surviving labels: "Runtime", "Size of summary", "Size of vocabulary"]




Conclusions

        Salience measurement
            Sentence-term graph
            BipRank
            Pattern of RDF sentence
        Vocabulary summarization
            Salience
            Query relevance
            Cohesion


        Implemented in Falcons Ontology Search
            http://ws.nju.edu.cn/falcons/ontologysearch/



