SlideShare uma empresa Scribd logo
1 de 22
Baixar para ler offline
Ravali Pochampally
Kamal Karlapalem
 [IIIT Hyderabad]
• WWW
  - diverse content
  - 100+ articles on major
    topics



• Google News/Amazon
  - organized
  - (yet) too much text




                             2
• Condenses information

          salient points
          length α (1/content)
          user-specified parameters


• Lacks Organization

         × delineation of issues
         × model diversity
         × too long (?)




                                       3
• A view intends to represent an issue pertaining to a
     set of related1 articles
      - organized (multiple concise views)
      - information exploration
      - detailed snapshot


  • Example2 : review dataset (hotel)
      - views               [positive, negative, food, facilities]
      - summary             [unorganized]


1. articles concerning a common topic (FIFA 2010, swine flu in India etc.)
2. http://sites.google.com/site/diverseviews/comparison
                                                                             4
• Related Work
• Problem


• Extraction of views
  • Ranking




• Results
  • Discussion


                        7
• Allison et. al
     • idea of multiple view-points
     • framework


   • Tombros et. al
     • clustering of top-ranking sentences


   • TextTiling 1
     • divide text into multi-paragraph units
     • unit represents a sub-topic

1. M. A. Hearst 1997                            8
Related
Articles


                  Problem
           [Mining Diverse Views]




                                    Ranked set of views


                                                      9
* http://sites.google.com/site/diverseviews/datasets

                                                       10
• Datasets
     • sources: google news, amazon.com, tripadvisor.com
     • crawling + parsing [html and rss]




   • Data cleaning & pre-processing
     • stopwords, stemming and duplicates
     • word-frequency, TF-IDF 1




1. http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html   11
• Main idea
  • We score each sentence in our dataset and extract the
    top-ranked ones. These sentences are used to generate
    views [Pruning]

  • We assign a Importance (I) score to each sentence


  • Importance Ik of a sentence Sk belonging to article dj of
    length r is

                            Ik = Πr Ti,j
                                    r
                  Ti,j = TF-IDF of wi є (Sk Λ dj)
                                                                12
• A measure of similarity is required to extract views
  from sentences
  • Semantic similarity : likeness of meaning


• Mihalcea et. al
  • specificity of a word can be determined by its idf
  • we use
       word-to-word similarity &
       specificity - to calculate semantic similarity




                                                         13
• Semantic similarity between sentences Si and Sj
      where w represents a word in a sentence is



                (symmetric relation & range Є [0,1])

  • Need to define maxSim(w, Sj)
    • Wordnet : sets of cognitive synonyms (synsets)
    • wup1 : based on path length between synsets

1. Z. Wu 1994                                          14
• Clustering is used to extract views from the set of
 important sentences

• Hierarchical Agglomerative Clustering (HAC) was
 used
  • upper triangle [symmetric matrix]
  • no restrictions on # of clusters
  • terminate clustering when scoring parameter converges


• We treat clusters as views discussing similar content
                                                            15
• Focus on average pair-wise similarity between the
 sentences in a view

• Cohesion

                      C = Σi,j Є V Si,j
                              len(v)
                     Si,j = sim(Ti , Tj )

              V = set of sentences (Ti) in the view
              len(v) = # of sentences in the view


                                                      16
• Most relevant view (MRV)
  • preference to views discussing similar content [greater
    cohesion]
  • top-ranked view


• Outlier view (OV)
  • single sentence
  • low semantic similarity with other sentences
  • Cohesion = 0 [ordered by importance]




                                                              17
HTML + Text                   Raw Text
Related                   Data Cleaning                      Extract
                          and                                Top-ranked
Articles
                          Preprocessing                      sentences




                                                     Top-n
                                                     Sentences




 Ranked                   Ranking                            Clustering
 Views                    (Cohesion)                         Engine
 MRV & OV
             Ranked
             Views                        Views



                                                                          18
• Number of top-ranking sentences (n) vs. cohesion
         • n which can maximize cohesion
         • median cohesion >= mean [outliers]
         • 20 <= n <= 35
         • incremental clustering   1




     • More top-ranking sentences need not necessarily lead
        to views with better cohesion

1. M. Charikar STOC 1997

                                                              19
20
• IR model as an alternate to summarization
        multiple diverse views
        easily navigable
        browse top x views
        detailed (yet organized) snapshot of a ToI
        clustering at sentence/phrase level*


   • Future work
       • polarity of a view
       • user feedback
         • Implicit [clicks, time-spent]
         • Explicit [user-ratings]
* as opposed to document clustering
                                                      21
Thanks!

Mais conteúdo relacionado

Semelhante a Mining Diverse Views from Related Articles

Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Lucidworks
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ Prateek Jain
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingNa'im Tyson
 
Math-Bridge Edit Authoring
Math-Bridge Edit AuthoringMath-Bridge Edit Authoring
Math-Bridge Edit Authoringmetamath
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...Dr.-Ing. Thomas Hartmann
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018Andre Freitas
 
Introduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallIntroduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallJohn Mulhall
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...Joshua Shinavier
 
Introduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information RetrievalIntroduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information RetrievalA. LE
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlpankit_ppt
 
Ontology Engineering: Introduction
Ontology Engineering: IntroductionOntology Engineering: Introduction
Ontology Engineering: IntroductionGuus Schreiber
 

Semelhante a Mining Diverse Views from Related Articles (20)

Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Big Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic ProcessingBig Data Palooza Talk: Aspects of Semantic Processing
Big Data Palooza Talk: Aspects of Semantic Processing
 
Math-Bridge Edit Authoring
Math-Bridge Edit AuthoringMath-Bridge Edit Authoring
Math-Bridge Edit Authoring
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
IASSIST 2011 - Representation of the Data Documentation Initiative using Sema...
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Introduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John MulhallIntroduction to Software - Coder Forge - John Mulhall
Introduction to Software - Coder Forge - John Mulhall
 
Final presentation
Final presentationFinal presentation
Final presentation
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
 
Introduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information RetrievalIntroduction into Search Engines and Information Retrieval
Introduction into Search Engines and Information Retrieval
 
Intro to nlp
Intro to nlpIntro to nlp
Intro to nlp
 
Ontology Engineering: Introduction
Ontology Engineering: IntroductionOntology Engineering: Introduction
Ontology Engineering: Introduction
 
MongoDB for Genealogy
MongoDB for GenealogyMongoDB for Genealogy
MongoDB for Genealogy
 

Mais de RENDER project

Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012RENDER project
 
Internals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedInternals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedRENDER project
 
Unterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaUnterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaRENDER project
 
Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1RENDER project
 
Wiki case study - Review year 1
Wiki case study  - Review year 1Wiki case study  - Review year 1
Wiki case study - Review year 1RENDER project
 
Towards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaTowards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaRENDER project
 
Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja TrampusDiversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja TrampusRENDER project
 
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...RENDER project
 
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...RENDER project
 
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuDiversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuRENDER project
 
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny VrandecicDiversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny VrandecicRENDER project
 
Diversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlDiversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlRENDER project
 
Data Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementData Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementRENDER project
 
Render Project introduction and overview
Render Project introduction and overviewRender Project introduction and overview
Render Project introduction and overviewRENDER project
 

Mais de RENDER project (17)

Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012Text Stream Processing Tutorial @WIMS 2012
Text Stream Processing Tutorial @WIMS 2012
 
Internals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedInternals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News Feed
 
Unterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaUnterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für Wikipedia
 
Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1
 
Wiki case study - Review year 1
Wiki case study  - Review year 1Wiki case study  - Review year 1
Wiki case study - Review year 1
 
Towards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaTowards a diversity-minded Wikipedia
Towards a diversity-minded Wikipedia
 
Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja TrampusDiversiweb2011 07 Approximate subgraph matching - Mitja Trampus
Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus
 
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
Diversiweb2011 06 Faceted Approach To Diverse Query Processing - Devika P. Ma...
 
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
 
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuDiversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
 
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny VrandecicDiversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
Diversiweb2011 03 Towards a Knowledge Diversity Model - Denny Vrandecic
 
Diversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlDiversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena Simperl
 
Data Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementData Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data Management
 
Diversity toolkit
Diversity toolkitDiversity toolkit
Diversity toolkit
 
RENDER Telefonica
RENDER TelefonicaRENDER Telefonica
RENDER Telefonica
 
Defining Diversity
Defining DiversityDefining Diversity
Defining Diversity
 
Render Project introduction and overview
Render Project introduction and overviewRender Project introduction and overview
Render Project introduction and overview
 

Último

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 

Último (20)

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

Mining Diverse Views from Related Articles

  • 2. • WWW - diverse content - 100+ articles on major topics • Google News/Amazon - organized - (yet) too much text 2
  • 3. • Condenses information  salient points  length α (1/content)  user-specified parameters • Lacks Organization × delineation of issues × model diversity × too long (?) 3
  • 4. • A view intends to represent an issue pertaining to a set of related1 articles - organized (multiple concise views) - information exploration - detailed snapshot • Example2 : review dataset (hotel) - views [positive, negative, food, facilities] - summary [unorganized] 1. articles concerning a common topic (FIFA 2010, swine flu in India etc.) 2. http://sites.google.com/site/diverseviews/comparison 4
  • 5.
  • 6.
  • 7. • Related Work • Problem • Extraction of views • Ranking • Results • Discussion 7
  • 8. • Allison et. al • idea of multiple view-points • framework • Tombros et. al • clustering of top-ranking sentences • TextTiling 1 • divide text into multi-paragraph units • unit represents a sub-topic 1. M. A. Hearst 1997 8
  • 9. Related Articles Problem [Mining Diverse Views] Ranked set of views 9
  • 11. • Datasets • sources: google news, amazon.com, tripadvisor.com • crawling + parsing [html and rss] • Data cleaning & pre-processing • stopwords, stemming and duplicates • word-frequency, TF-IDF 1 1. http://nlp.stanford.edu/IR-book/html/htmledition/tf-idf-weighting-1.html 11
  • 12. • Main idea • We score each sentence in our dataset and extract the top-ranked ones. These sentences are used to generate views [Pruning] • We assign a Importance (I) score to each sentence • Importance Ik of a sentence Sk belonging to article dj of length r is Ik = Πr Ti,j r Ti,j = TF-IDF of wi є (Sk Λ dj) 12
  • 13. • A measure of similarity is required to extract views from sentences • Semantic similarity : likeness of meaning • Mihalcea et. al • specificity of a word can be determined by its idf • we use  word-to-word similarity &  specificity - to calculate semantic similarity 13
  • 14. • Semantic similarity between sentences Si and Sj where w represents a word in a sentence is (symmetric relation & range Є [0,1]) • Need to define maxSim(w, Sj) • Wordnet : sets of cognitive synonyms (synsets) • wup1 : based on path length between synsets 1. Z. Wu 1994 14
  • 15. • Clustering is used to extract views from the set of important sentences • Hierarchical Agglomerative Clustering (HAC) was used • upper triangle [symmetric matrix] • no restrictions on # of clusters • terminate clustering when scoring parameter converges • We treat clusters as views discussing similar content 15
  • 16. • Focus on average pair-wise similarity between the sentences in a view • Cohesion C = Σi,j Є V Si,j len(v) Si,j = sim(Ti , Tj ) V = set of sentences (Ti) in the view len(v) = # of sentences in the view 16
  • 17. • Most relevant view (MRV) • preference to views discussing similar content [greater cohesion] • top-ranked view • Outlier view (OV) • single sentence • low semantic similarity with other sentences • Cohesion = 0 [ordered by importance] 17
  • 18. HTML + Text Raw Text Related Data Cleaning Extract and Top-ranked Articles Preprocessing sentences Top-n Sentences Ranked Ranking Clustering Views (Cohesion) Engine MRV & OV Ranked Views Views 18
  • 19. • Number of top-ranking sentences (n) vs. cohesion • n which can maximize cohesion • median cohesion >= mean [outliers] • 20 <= n <= 35 • incremental clustering 1 • More top-ranking sentences need not necessarily lead to views with better cohesion 1. M. Charikar STOC 1997 19
  • 20. 20
  • 21. • IR model as an alternate to summarization  multiple diverse views  easily navigable  browse top x views  detailed (yet organized) snapshot of a ToI  clustering at sentence/phrase level* • Future work • polarity of a view • user feedback • Implicit [clicks, time-spent] • Explicit [user-ratings] * as opposed to document clustering 21