SlideShare a Scribd company logo
1 of 17
Download to read offline
Approximate subgraph matching
 for detection of topic variations

                Mitja Trampuš
                Dunja Mladenić
                AI Lab, Jožef Stefan Institute, Slovenia




                                        ailab.ijs.si
Mining Diversity
• Web content varies in many aspects, e.g.
  o   Topical
  o   Social (author, target audience, people written about)
  o   Geographical (publisher, places written about)
  o   Sentiment (positive/negative)
  o   Writing style (structure, vocabulary)
  o   Coverage bias


• This work: (micro-)topical diversity
  o Macroscopic = largely solved
  o Microscopic = challenge

                                            AI Lab, Jozef Stefan Institute
                                                               ailab.ijs.si
Task:
Given a collection of texts on a topic,
• identify a common template
• align texts to the template




                                 AI Lab, Jozef Stefan Institute
                                                    ailab.ijs.si
Template representation
• Syntactic
  o info1: X people were killed / killed X people / resulted in X
    casualties
  o info2: blew up Y / destroyed Y / attacked in a Y


• Semantic
  o kill(bomber, people); count(people, X)
  o destroy(bomber, Y)



             X   count
                              kill bomber destroy     Y
                     people
                                             AI Lab, Jozef Stefan Institute
                                                                ailab.ijs.si
car
                                             drive
                 12      count
                                      kill suicide destroychurch
                            believers
                                           bomber
                      exit run                wear vest
Semantic Graph

                                  treatment           execute             attack
Prerequisite:



                               receive                        withstand
                 100    count
                                         kill terroristdemolish
                                                              hospital
                            patients




                  2     count
                             police slaughter     blow uppolice
                                            bomber
                             officer                     station

                                                     AI Lab, Jozef Stefan Institute
                                                                        ailab.ijs.si
Mining Templates
 • Template := subgraph with frequent
               specializations
     o Specializations implied by background taxonomy
       (WordNet)
     o Threshold frequency manually defined

                 X      count
                                        kill bomberdestroy building
                            people




12                             tear
     count             suicide down              2
        believers kill              church           count
                                                          police slaughter           police
                                                                               blow up
                       bomber                                            bomber
                                                         officer                     station
                                                         AI Lab, Jozef Stefan Institute
                                                                            ailab.ijs.si
Semantic Graph Construction
1. Data: Google News crawl
2. HTML cleanup
3. Named entity tagging
4. Pronoun resolution (he/she/him/her)
5. Named entity consolidation (Barack Obama
   vs President Obama)
6. Parsing, triple/fact/assertion extraction
   (for now: subj-verb-obj only)
7. Ontology/taxonomy alignment
8. Merging triples into a graph
                             AI Lab, Jozef Stefan Institute
                                                ailab.ijs.si
Approximate subgraph matching
                       number count
                                              kill person destroy
                                                                location
                                  people




                                                                     FREQUENT
                                                                      SUBTREE MINING

         count
number
             people   kill person destroylocation      number count                            destroy
                                                                                kill person           location
                                                                  people




                                                                                         GENERALIZE


  12     count                     tear
                           suicide down                   2    count      slaughter      blow up
            believers kill              church                      police                     police
                           bomber                                                  bomber
                                                                   officer                     station
                                                                    AI Lab, Jozef Stefan Institute
                                                                                       ailab.ijs.si
Approximate subgraph matching
                   number count
                                        kill person destroy
                                                          location
                              people




                                                  SPECIALIZE


                   number count
                                        kill bomberdestroy
                                                         building
                              people




12                             tear
     count             suicide down                 2
        believers kill              church                count
                                                               police slaughter           police
                                                                                    blow up
                       bomber                                                 bomber
                                                              officer                     station
                                                              AI Lab, Jozef Stefan Institute
                                                                                 ailab.ijs.si
Preliminary results
• 5 test domains; for each:
  o ~10 graphs, ~10000 nodes
  o 10-60 seconds


• At min. support 30%
  o 20 maximal patterns, 9 manually judged as interesting




                                          AI Lab, Jozef Stefan Institute
                                                             ailab.ijs.si
AI Lab, Jozef Stefan Institute
                   ailab.ijs.si
Conclusion
• Future work:
  o Mapping text -> semantics
      • Other ontologies?
  o Interestingness measure for assertions and patterns
  o Evaluation (precision, recall; multiple domains)
  o Alternative approaches to generalizing subgraphs


• Template extraction is achievable,
  but not easy
• Human filtering of results hard to avoid
• Current approach reasonably fast
                                           AI Lab, Jozef Stefan Institute
                                                              ailab.ijs.si
Thank you.
AI Lab, Jozef Stefan Institute
                   ailab.ijs.si
Can we extract all relations?
                           Kind of …
Thousands of small quakes
resumed 18 months ago and
continue to rattle Mammoth
Lakes, June Lake and other Mono
County resort towns. The
temblors, most measuring 1 to 3
on the Richter scale, started
beneath Mammoth Mountain.

Subject – Verb – Object
       Triplets

                            Semantic Graph
AI Lab, Jozef Stefan Institute
                   ailab.ijs.si
Templates - why
• Interpret content
   o news archives: structure/annotate old texts, enable
     semantic search
   o wikipedia: suggestions for infobox entries


• Generate content
   o wikipedia: a starting point for new articles / a checklist of
     information to be included




• No normative definition of “good Jozef Stefanailab.ijs.si
                                AI Lab,
                                        template”
                                                Institute
Evaluation
• Qualitative
  o Usage-specific
  o Not useful for tuning algorithms


• Quantitative
  o Precision
  o Recall




                                       AI Lab, Jozef Stefan Institute
                                                          ailab.ijs.si

More Related Content

More from RENDER project (12)

Internals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News FeedInternals Of An Aggregated Web News Feed
Internals Of An Aggregated Web News Feed
 
Unterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für WikipediaUnterstützungswerkzeuge für Wikipedia
Unterstützungswerkzeuge für Wikipedia
 
Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1Render Review: Wikipedia Case Study, Year 1
Render Review: Wikipedia Case Study, Year 1
 
Wiki case study - Review year 1
Wiki case study  - Review year 1Wiki case study  - Review year 1
Wiki case study - Review year 1
 
Towards a diversity-minded Wikipedia
Towards a diversity-minded WikipediaTowards a diversity-minded Wikipedia
Towards a diversity-minded Wikipedia
 
Diversiweb2011 02 Opening- Devika P. Madalli
Diversiweb2011 02 Opening- Devika P. MadalliDiversiweb2011 02 Opening- Devika P. Madalli
Diversiweb2011 02 Opening- Devika P. Madalli
 
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
Diversiweb2011 05 Scalable Detection of Sentiment-Based Contradictions - Mika...
 
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia RusuDiversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
Diversiweb2011 04 Expressing Opinion Diversity - Delia Rusu
 
Diversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena SimperlDiversiweb2011 01 Opening - Elena Simperl
Diversiweb2011 01 Opening - Elena Simperl
 
Data Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data ManagementData Collection and Integration, Linked Data Management
Data Collection and Integration, Linked Data Management
 
RENDER Telefonica
RENDER TelefonicaRENDER Telefonica
RENDER Telefonica
 
Defining Diversity
Defining DiversityDefining Diversity
Defining Diversity
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 

Diversiweb2011 07 Approximate subgraph matching - Mitja Trampus

  • 1. Approximate subgraph matching for detection of topic variations Mitja Trampuš Dunja Mladenić AI Lab, Jožef Stefan Institute, Slovenia ailab.ijs.si
  • 2. Mining Diversity • Web content varies in many aspects, e.g. o Topical o Social (author, target audience, people written about) o Geographical (publisher, places written about) o Sentiment (positive/negative) o Writing style (structure, vocabulary) o Coverage bias • This work: (micro-)topical diversity o Macroscopic = largely solved o Microscopic = challenge AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 3. Task: Given a collection of texts on a topic, • identify a common template • align texts to the template AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 4. Template representation • Syntactic o info1: X people were killed / killed X people / resulted in X casualties o info2: blew up Y / destroyed Y / attacked in a Y • Semantic o kill(bomber, people); count(people, X) o destroy(bomber, Y) X count kill bomber destroy Y people AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 5. car drive 12 count kill suicide destroychurch believers bomber exit run wear vest Semantic Graph treatment execute attack Prerequisite: receive withstand 100 count kill terroristdemolish hospital patients 2 count police slaughter blow uppolice bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 6. Mining Templates • Template := subgraph with frequent specializations o Specializations implied by background taxonomy (WordNet) o Threshold frequency manually defined X count kill bomberdestroy building people 12 tear count suicide down 2 believers kill church count police slaughter police blow up bomber bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 7. Semantic Graph Construction 1. Data: Google News crawl 2. HTML cleanup 3. Named entity tagging 4. Pronoun resolution (he/she/him/her) 5. Named entity consolidation (Barack Obama vs President Obama) 6. Parsing, triple/fact/assertion extraction (for now: subj-verb-obj only) 7. Ontology/taxonomy alignment 8. Merging triples into a graph AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 8. Approximate subgraph matching number count kill person destroy location people FREQUENT SUBTREE MINING count number people kill person destroylocation number count destroy kill person location people GENERALIZE 12 count tear suicide down 2 count slaughter blow up believers kill church police police bomber bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 9. Approximate subgraph matching number count kill person destroy location people SPECIALIZE number count kill bomberdestroy building people 12 tear count suicide down 2 believers kill church count police slaughter police blow up bomber bomber officer station AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 10. Preliminary results • 5 test domains; for each: o ~10 graphs, ~10000 nodes o 10-60 seconds • At min. support 30% o 20 maximal patterns, 9 manually judged as interesting AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 11. AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 12. Conclusion • Future work: o Mapping text -> semantics • Other ontologies? o Interestingness measure for assertions and patterns o Evaluation (precision, recall; multiple domains) o Alternative approaches to generalizing subgraphs • Template extraction is achievable, but not easy • Human filtering of results hard to avoid • Current approach reasonably fast AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 13. Thank you. AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 14. Can we extract all relations? Kind of … Thousands of small quakes resumed 18 months ago and continue to rattle Mammoth Lakes, June Lake and other Mono County resort towns. The temblors, most measuring 1 to 3 on the Richter scale, started beneath Mammoth Mountain. Subject – Verb – Object Triplets Semantic Graph
  • 15. AI Lab, Jozef Stefan Institute ailab.ijs.si
  • 16. Templates - why • Interpret content o news archives: structure/annotate old texts, enable semantic search o wikipedia: suggestions for infobox entries • Generate content o wikipedia: a starting point for new articles / a checklist of information to be included • No normative definition of “good Jozef Stefanailab.ijs.si AI Lab, template” Institute
  • 17. Evaluation • Qualitative o Usage-specific o Not useful for tuning algorithms • Quantitative o Precision o Recall AI Lab, Jozef Stefan Institute ailab.ijs.si