Information & Database Systems Lab




                                     Entity Graph Mining and Matching
                                                                          Seung-won Hwang
                                                                         Associate Professor
                                             Department of Computer Science and Engineering
                                                                            POSTECH, Korea
Mining Human Intelligence from the Web: Click Graph
                                      Language-agnostic/data-intensive: e.g., arabic Corpus?
                                                                  Are q1 and q2 similar?




                                                                  Are u3 and u4 similar?
Mining at Finer Granularity: Named Entity (NE) Graph
                                      Person name, Place name, Organization name, Product name
                                        Newspapers, Web sites, TV programs, …
                                      [Figure: example NE graph with entities Apple, MS, Jobs, Gates, and Mac, and relationship labels tenure, co-founder, and complicated]
Case I: Matching names with twitter accounts [EDBT11]
Case II: Entity Translation [EMNLP10,CIKM11]
                                      What are the features?
                                      How are the features combined?
                                     (using translation as an application scenario)
                                      [Figure: NE relationship graphs extracted from the English corpus, Ge=(Ve, Ee), and from the Chinese corpus, Gc=(Vc, Ec)]
NE Translation
                                      Goal
                                        Translate a NE in the source language into the corresponding NE in the target language
                                        Ex) “Obama” (English) → “奥巴马” (Chinese)
                                      Resources: comparable corpora
                                        [Figure: NEs and their features extracted from Xinhua News Agency (English) and Xinhua News Agency (Chinese); the task is to find the matching English (NEE) and Chinese (NEC) entity pairs]
NE Translation Similarity Features
                                      Entity Name Similarity (E): S.Wan [1], L. Haizhou [2], K. Knight [3]
                                          Pronunciation similarity between named entities
                                          Ex) “Obama” and “奥巴马” (pronounced Aobama)
                                      Entity Context Similarity (EC): M. Diab [4], H. Ji [5], K. Yu [6]
                                          Contextual word similarity between named entities
                                          Ex) The president (总统) Obama (奥巴马)
                                              “As president, Obama signed economic stimulus legislation …”



                                      Relationship Similarity (R): G.-w.You [7]
                                          Co-occurrence similarity between pairs of named entities
                                          Ex) (“Jackie Chan”, “Bill Gates” ) vs. (“成龙”, “比尔·盖茨 ”)
Motivation
                                      Taxonomy Table

                                                                      Entity          Relationship
                                        Using Entity Names            E [1,2,3]       R [7] (You)
                                        Using Textual Context         EC [4,5,6]      ?

                                        Shao [8] combines E and EC




                                     Research questions:
                                        Why is RC not used?
                                        Can all four categories be combined?
In this paper…
                                      We propose a new NE translation similarity feature
                                         Relationship Context similarity (RC)
                                            Contextual word similarity between pairs of named entities
                                            Ex) pair (“Barack”, “Michelle”) → Spouse
                                      We propose new holistic approaches
                                            Combining all E, EC, R, and RC




                                      We validate our proposed approach using extensive
                                       experiments
Our Framework
                                      We abstract this problem as…
                                      Graph Matching of two NE relationship graphs extracted from
                                       comparable corpora
                                                                                                              Populate a decision matrix
                                                                                                                R, |Ve|-by-|Vc| matrix



                                      [Figure: the English NE graph Ge=(Ve, Ee) and the Chinese NE graph Gc=(Vc, Ec) to be matched]
Our Framework
                                      Overview – 3 Steps
                                        Initialization
                                            Construct NE relationship graphs
                                            Build an initial pairwise similarity matrix R0
                                            Use Entity (E) and Entity Context (EC) similarities
                                        Iterative reinforcement
                                            Build a final pairwise similarity matrix R∞
                                            Use Relationship (R) and Relationship Context (RC) similarities
                                        Matching
                                            Find 1:1 matching from R∞
                                            Build a binary hard decision matrix R*

                                        Example R0 (initial scores):
                                                            奥巴马            成龙
                                            Obama           .99      .1      .2
                                            Jackie Chan                      .1

                                        Example R∞ (after reinforcement):
                                                            奥巴马            成龙
                                            Obama           .99      .1      .2
                                            Jackie Chan                      .99
Initialization
                                      Constructing NE relationship graphs G = (N, E)
                                         Extract NEs using an entity tagger for each document in each corpus
                                         Regard NEs that appear more than δ times as Nodes
                                         Connect two Nodes when they co-occur more than δ times
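
The graph construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `build_ne_graph` is a hypothetical name, the input is assumed to be the tagger output (one list of NE strings per document), every mention counts toward the node frequency, and co-occurrence is assumed to be counted once per document.

```python
from collections import Counter
from itertools import combinations


def build_ne_graph(docs, delta):
    """Build (nodes, edges) from per-document NE lists.

    docs  : list of lists of NE strings, one list per document (assumed input shape)
    delta : frequency threshold; nodes and edges must occur more than delta times
    """
    # Count every mention of every NE across the corpus.
    node_counts = Counter()
    for nes in docs:
        node_counts.update(nes)
    nodes = {ne for ne, count in node_counts.items() if count > delta}

    # Count document-level co-occurrence between surviving nodes.
    pair_counts = Counter()
    for nes in docs:
        present = sorted(set(nes) & nodes)
        pair_counts.update(combinations(present, 2))
    edges = {pair for pair, count in pair_counts.items() if count > delta}
    return nodes, edges


docs = [["Obama", "Michelle"], ["Obama", "Michelle"], ["Obama", "Biden"]]
nodes, edges = build_ne_graph(docs, delta=1)
```

With δ = 1, "Biden" is dropped because it appears only once, and the single surviving edge connects "Michelle" and "Obama".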
                                      Initializing R0
                                         Computing entity similarity matrix SE
                                             Use Edit-Distance (ED) between ‘ei’ and Pinyin representation of ‘cj’
                                             Ex) ED(“Obama”, “奥巴马”) = ED(“Obama”, “Aobama”)


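                                             S^{E}_{ij} = 1 - \frac{ED(e_i, PYC_j)}{Len(e_i) \;\square\; Len(PYC_j)}
                                             [□: the operator between Len(e_i) and Len(PYC_j) is not recoverable from the slide text]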
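
A minimal sketch of the name similarity, for illustration only. It assumes the Chinese name has already been converted to Pinyin, that both strings are non-empty, and that the edit distance is normalized by the sum of the two lengths; the normalization is an assumption, since the operator in the denominator is not legible in this transcript. The function names and the lowercasing step are also assumptions.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def name_similarity(english, pinyin):
    """1 - normalized edit distance; sum-of-lengths normalization is ASSUMED."""
    e, p = english.lower(), pinyin.lower()
    return 1.0 - levenshtein(e, p) / (len(e) + len(p))


score = name_similarity("Obama", "Aobama")  # one insertion separates the two strings
```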
Initialization
                                      Initializing R0
                                         Computing entity context similarity matrix SEC
                                             Context word
                                               ex) “As president, Obama signed economic stimulus legislation …”




                                             Context window

                                               CW(NE, d) = \{ w_{i-l/2}, w_{i-l/2+1}, \ldots, w_{i}(NE), \ldots, w_{i+l/2-1}, w_{i+l/2} \}




                                             Correlation between a NE and a context word: log-odds ratio
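
The context window can be illustrated with a short sketch. It assumes the document is already tokenized, that `i` is the token position of the NE, and that `l` is the total window length, so l/2 tokens are taken on each side; the function name is hypothetical and the window is simply clipped at the document boundaries.

```python
def context_window(tokens, i, l):
    """Return the tokens from position i - l/2 to i + l/2 (inclusive)."""
    half = l // 2
    start = max(0, i - half)  # clip at the start of the document
    return tokens[start:i + half + 1]


tokens = ["As", "president", "Obama", "signed", "economic", "stimulus", "legislation"]
window = context_window(tokens, tokens.index("Obama"), l=2)
```

Here the window around "Obama" is `["president", "Obama", "signed"]`; the words other than the NE itself are the candidate context words.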
Initialization
                                      Initializing R0
                                         Computing entity context similarity matrix SEC
                                             Projected Context Association Vector
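                                               [Figure: context association vectors of “Obama” (e.g., President: 0.9) and “奥巴马” (e.g., 总统: 0.85); a bilingual dictionary with entries such as (President, 总统) projects English context words onto Chinese ones. Example word pairs shown: USA / 美國, president / 总统]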
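
A minimal sketch of the projection step, assuming the context association vector is a mapping from context word to score and the bilingual dictionary maps each English word to a list of Chinese translations. Words without a dictionary entry are dropped, and when several English words map to the same Chinese word the larger score is kept; both choices, like the function name, are assumptions made for illustration.

```python
def project_context_vector(vector, dictionary):
    """Map an English context vector into the Chinese vocabulary.

    vector     : dict of English context word -> association score
    dictionary : dict of English word -> list of Chinese translations (assumed shape)
    """
    projected = {}
    for word, score in vector.items():
        for translation in dictionary.get(word, ()):
            # Keep the strongest score when translations collide (assumption).
            if translation not in projected or score > projected[translation]:
                projected[translation] = score
    return projected


obama_vector = {"president": 0.9, "signed": 0.4}
dictionary = {"president": ["总统"]}
projected = project_context_vector(obama_vector, dictionary)
```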
Initialization
                                      Initializing R0
                                         Computing entity context similarity matrix SEC
                                             Context Similarity between ‘ei’ and ‘cj’
                                             Compute cosine similarity between two vectors
                                             S^{EC}_{ij} = \frac{CA_{e_i} \cdot CA_{c_j}}{\lVert CA_{e_i} \rVert \, \lVert CA_{c_j} \rVert}


                                         Merging SE and SEC
                                             Min-Max normalization to the range [0, 1]
                                             Merge: R^{0}_{ij} = S^{E}_{ij} \;\square\; S^{EC}_{ij}
                                             [□: the merge operator is not recoverable from the slide text]
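
The cosine similarity and the Min-Max normalization can be sketched as below. Vectors are assumed to be sparse dictionaries over the same (projected) vocabulary, and the normalization is assumed to be applied over the whole matrix; the function names are hypothetical. The final merge of SE and SEC is left out because its operator is not legible in this transcript.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for empty vectors (assumption)
    return dot / (norm_u * norm_v)


def min_max(matrix):
    """Rescale all entries of a matrix to [0, 1] using the global min and max."""
    flat = [x for row in matrix for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / (hi - lo) for x in row] for row in matrix]


sim = cosine({"总统": 0.9}, {"总统": 0.85})
normalized = min_max([[1, 3], [5, 2]])
```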
Reinforcement
                                      Intuition
                                         Two NEs with a strong relationship
                                            Co-occur frequently → have an edge
                                            Share similar context → have similar relationship context
                                           [Figure: node X in the English NE Graph and node Y in the Chinese NE Graph, each with neighboring NEs and their contexts]
                                           1. Align neighbors
                                               using relationship (R) and relationship context (RC) similarity
                                           2. Update the similarity score
Reinforcement
                                      Iterative Approach

                                        R^{t+1}_{ij} = \lambda \sum_{(u,v)_k \in B^{t}(i,j,\theta)} \frac{R^{t}_{uv} \, (S^{RC}_{iu,jv})^{\gamma}}{2^{k}} + (1 - \lambda) R^{0}_{ij}

                                        First term: Relationship-based Similarity (R & RC)
                                        Second term: Entity-based Similarity (E & EC)
                                        B^{t}(i, j, \theta): ordered set of aligned neighbor pairs of (i, j) at iteration t
                                        R^{t}_{uv}: Relationship (R) Similarity of i’s neighbor u and j’s neighbor v
                                        S^{RC}_{iu,jv}: Relationship Context (RC) Similarity between relation pair (i, u) and (j, v)
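
One reinforcement iteration might look like the sketch below. It rests on several assumptions that the transcript does not confirm: neighbors are aligned greedily 1:1 in descending order of their current score, pairs scoring below a threshold `theta` are skipped, the rank k starts at 1, and the RC similarity enters as a factor raised to the power γ (so that γ = 0 falls back to the R-only setting). All function and parameter names are hypothetical.

```python
def align_neighbors(R, nbrs_i, nbrs_j, theta):
    """Greedy 1:1 alignment of neighbor pairs, ordered by descending score."""
    candidates = sorted(
        ((R[u][v], u, v) for u in nbrs_i for v in nbrs_j if R[u][v] >= theta),
        reverse=True,
    )
    used_u, used_v, aligned = set(), set(), []
    for _, u, v in candidates:
        if u not in used_u and v not in used_v:
            aligned.append((u, v))
            used_u.add(u)
            used_v.add(v)
    return aligned


def reinforce_step(R, R0, adj_e, adj_c, s_rc, lam, gamma, theta):
    """Compute R^{t+1} from R^t under the assumptions stated in the lead-in."""
    new = [[0.0] * len(row) for row in R]
    for i in range(len(R)):
        for j in range(len(R[i])):
            pairs = align_neighbors(R, adj_e[i], adj_c[j], theta)
            relational = sum(
                R[u][v] * s_rc(i, u, j, v) ** gamma / 2 ** k
                for k, (u, v) in enumerate(pairs, start=1)
            )
            new[i][j] = lam * relational + (1 - lam) * R0[i][j]
    return new


# Toy graphs: each side has two NEs joined by one edge.
adj_e = {0: [1], 1: [0]}
adj_c = {0: [1], 1: [0]}
R0 = [[0.9, 0.1], [0.1, 0.2]]
R1 = reinforce_step(R0, R0, adj_e, adj_c,
                    s_rc=lambda i, u, j, v: 0.5, lam=0.15, gamma=0, theta=0.05)
```

In the toy run, the pair (1, 1) starts with a weak score of 0.2 but gains from its well-matched neighbors (0, 0), which is the intended reinforcement effect.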
Matching
                                      Finding 1:1 matching using a greedy algorithm

                                      Steps
                                       1.    Find the translation pair with the highest final similarity score
                                       2.    Select the pair and remove the corresponding row and column from R∞
                                       3.    Repeat steps 1 and 2 until the highest remaining similarity score falls below the threshold
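
The greedy steps above can be sketched as follows. Sorting all cells once and skipping already-used rows and columns is equivalent to repeatedly taking the maximum and deleting its row and column; the function names and the list-of-lists matrix representation are assumptions for illustration.

```python
def greedy_match(R, threshold):
    """Return 1:1 (row, column) matches chosen greedily by descending score."""
    candidates = sorted(
        ((score, i, j) for i, row in enumerate(R) for j, score in enumerate(row)),
        key=lambda cell: cell[0],
        reverse=True,
    )
    used_rows, used_cols, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # every remaining candidate is below the threshold
        if i in used_rows or j in used_cols:
            continue  # row or column already removed by an earlier match
        matches.append((i, j))
        used_rows.add(i)
        used_cols.add(j)
    return matches


def hard_decision_matrix(R, matches):
    """Binary matrix with 1 at each selected (row, column) pair."""
    star = [[0] * len(row) for row in R]
    for i, j in matches:
        star[i][j] = 1
    return star


R_final = [[0.99, 0.1, 0.2], [0.1, 0.1, 0.95]]
matches = greedy_match(R_final, threshold=0.5)
R_star = hard_decision_matrix(R_final, matches)
```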




Experiments
                                      Dataset
                                        English Gigaword Corpus
                                            Xinhua News Agency 2008.01~2008.12
                                            100,746 news documents
                                        Chinese Gigaword Corpus
                                            Xinhua News Agency 2008.01~2008.12
                                            88,029 news documents


                                      Approaches
                                          EC                              : consider Entity context similarity feature only
                                          E                               : consider Entity name similarity feature only
                                          Shao (E+EC)                     : combine Entity name & Entity Context similarities
                                          You (E+R)                       : combine Entity name & Relationship similarities
                                          Ours
                                            E+EC+R (when γ = 0)
                                            E+EC+R+RC


                                      Measure
                                        Precision, Recall, and F1-score
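
For reference, the three measures can be computed over sets of translation pairs as in this sketch. The pair representation and the function name are assumptions, and the predicted and gold pairs below are invented purely to illustrate the arithmetic, not taken from the experiments.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 of predicted translation pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = {("Obama", "奥巴马"), ("Jackie Chan", "成龙"), ("Bill Gates", "比尔·盖茨")}
predicted = {("Obama", "奥巴马"), ("Jackie Chan", "比尔·盖茨")}
p, r, f1 = precision_recall_f1(predicted, gold)
```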
Experiments
                                      Effectiveness of overall framework
                                         500 person named entities
                                         Set λ = 0.15
                                         5-fold cross-validation for threshold parameter learning
                                      Other types of NE (100 Location named entities)
Directions
                                      Graph matching
                                      Graph cleansing [VLDB11]
                                      Scalable entity search
                                                                  US Presidents
                                                                  Bill Clinton
                                                                  William J Clinton
                                                                  George W. Bush
                                                                  George H.W. Bush
                                                                  Dubya
Thanks
                                      Questions?
                                     Visit: www.postech.ac.kr/~swhwang for these papers

 
👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...
👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...
👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...
 
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
 
Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000
Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000
Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Falcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business GrowthFalcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business Growth
 
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
 
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 MonthsSEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
 

Seungwon Hwang: Entity Graph Mining and Matching

  • 1. Entity Graph Mining and Matching
      • Seung-won Hwang, Associate Professor
      • Department of Computer Science and Engineering, POSTECH, Korea
      • Information & Database Systems Lab
  • 2. Mining Human Intelligence from the Web: Click Graph
      • Language-agnostic / data-intensive: e.g., an Arabic corpus?
      • Are q1 and q2 similar?
      • Are u3 and u4 similar?
  • 3. Mining at Finer Granularity: Named Entity (NE) Graph
      • Person name, place name, organization name, product name
      • Newspapers, Web sites, TV programs, …
      • [Figure: example NE graph; labels: Apple, MS, tenure, Co-founder, jobs, gates, complicated, Mac]
  • 4. Case I: Matching names with Twitter accounts [EDBT11]
  • 5. Case II: Entity Translation [EMNLP10, CIKM11]
      • What are the features?
      • How are the features combined? (using translation as an application scenario)
      • [Figure: NE graph of the English corpus, Ge=(Ve, Ee), and NE graph of the Chinese corpus, Gc=(Vc, Ec)]
  • 6. NE Translation
      • Goal: translating an NE in the source language into its counterpart NE in the target language
          • Ex) “Obama” (English) → “奥巴马” (Chinese)
      • Resources: comparable corpora
      • [Figure: English NEs (NEE) and Chinese NEs (NEC), each with features, drawn from Xinhua News Agency (English) and Xinhua News Agency (Chinese)]
  • 7. NE Translation Similarity Features
      • Entity Name Similarity (E): S. Wan [1], L. Haizhou [2], K. Knight [3]
          • Pronunciation similarity between named entities
          • Ex) “Obama” and “奥巴马” (pronounced Aobama)
      • Entity Context Similarity (EC): M. Diab [4], H. Ji [5], K. Yu [6]
          • Contextual word similarity between named entities
          • Ex) The president (总统) Obama (奥巴马): “As president, Obama signed economic stimulus legislation …”
      • Relationship Similarity (R): G.-w. You [7]
          • Co-occurrence similarity between pairs of named entities
          • Ex) (“Jackie Chan”, “Bill Gates”) vs. (“成龙”, “比尔·盖茨”)
  • 8. Motivation
      • Taxonomy table (rows: information used; columns: entity vs. relationship)
          • Using entity names: entity → E [1,2,3]; relationship → R (You [7])
          • Using textual context: entity → EC [4,5,6]; relationship → ?
          • Shao [8] (E+EC)
      • Research questions
          • Why is RC not used?
          • Can all four categories be combined?
  • 9. In this paper…
      • We propose a new NE translation similarity feature
          • Relationship Context similarity (RC)
          • Contextual word similarity between named entities
          • Ex) pair (“Barack”, “Michelle”) → Spouse
      • We propose new holistic approaches
          • Combining all of E, EC, R, and RC
      • We validate our proposed approach using extensive experiments
  • 10. Our Framework
      • We abstract this problem as graph matching of two NE relationship graphs extracted from comparable corpora
      • Populate a decision matrix R, a |Ve|-by-|Vc| matrix
      • [Figure: NE graph of the English corpus, Ge=(Ve, Ee), and NE graph of the Chinese corpus, Gc=(Vc, Ec)]
  • 11. Our Framework: Overview in 3 Steps
      • Initialization
          • Construct NE relationship graphs
          • Build an initial pairwise similarity matrix R0
          • Use Entity (E) and Entity Context (EC) similarities
      • Iterative reinforcement
          • Build a final pairwise similarity matrix R∞
          • Use Relationship (R) and Relationship Context (RC) similarities
      • Matching
          • Find a 1:1 matching from R∞
          • Build a binary hard decision matrix R*
      • [Figure: example similarity matrices before and after reinforcement, with rows Obama, Jackie Chan and columns 奥巴马, 成龙]
  • 12. Initialization
      • Constructing NE relationship graphs G = (N, E)
          • Extract NEs using an entity tagger for each document in each corpus
          • Regard NEs that appear more than δ times as nodes
          • Connect two nodes when they co-occur more than δ times
      • Initializing R0: computing the entity similarity matrix $S^{E}$
          • Use the edit distance (ED) between ‘ei’ and the Pinyin representation of ‘cj’
          • Ex) ED(“Obama”, “奥巴马”) = ED(“Obama”, “Aobama”)
          • $S^{E}_{ij} = 1 - \dfrac{ED(e_i, PY_{c_j})}{Len(e_i) + Len(PY_{c_j})}$
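The entity name similarity on this slide can be illustrated with a minimal Python sketch. It assumes the Pinyin string is already available (no romanization library is called), that the normalizer is the sum of the two string lengths, and that comparison is case-insensitive; the function names `edit_distance` and `name_similarity` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def name_similarity(english_name: str, pinyin: str) -> float:
    """1 - ED / (Len + Len); assumption: lengths are summed, case is ignored."""
    e, p = english_name.lower(), pinyin.lower()
    total = len(e) + len(p)
    if total == 0:
        return 1.0
    return 1.0 - edit_distance(e, p) / total


if __name__ == "__main__":
    # "Aobama" stands in for the Pinyin of 奥巴马, as in the slide's example.
    print(name_similarity("Obama", "Aobama"))
```

With the slide's example, the two strings differ by a single inserted letter, so the score stays close to 1.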
  • 13. Initialization
      • Initializing R0: computing the entity context similarity matrix $S^{EC}$
          • Context word, ex) “As president, Obama signed economic stimulus legislation …”
          • Context window: $CW(NE, d) = \{w_{i-l/2}, w_{i-l/2+1}, \ldots, w_i\,(NE), \ldots, w_{i+l/2-1}, w_{i+l/2}\}$
          • Correlation between an NE and a context word: log-odds ratio
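A context window of width l around an NE mention can be sketched as a simple slice. This is only an illustration: the tokenization, the clipping at document boundaries, and the name `context_window` are assumptions, not details from the slides.

```python
def context_window(tokens, i, l):
    """Return the tokens from position i - l/2 to i + l/2 (inclusive).

    tokens: list of words in document d
    i:      index of the NE mention
    l:      window width (l // 2 words on each side of the mention)
    The window is clipped at the start and end of the document (assumption).
    """
    half = l // 2
    start = max(0, i - half)
    return tokens[start:i + half + 1]


if __name__ == "__main__":
    words = "As president , Obama signed economic stimulus legislation".split()
    print(context_window(words, words.index("Obama"), 4))
```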
  • 14. Initialization
      • Initializing R0: computing the entity context similarity matrix $S^{EC}$
          • Projected context association vector
      • [Figure: context association vectors for “Obama” (e.g., President: 0.9) and “奥巴马” (e.g., 总统: 0.85), linked through bilingual dictionary entries such as (President, 总统)]
  • 15. Initialization
      • Initializing R0: computing the entity context similarity matrix $S^{EC}$
          • Context similarity between ‘ei’ and ‘cj’: cosine similarity between the two context association vectors
          • $S^{EC}_{ij} = \dfrac{CA_{e_i} \cdot CA_{c_j}}{\lVert CA_{e_i} \rVert \, \lVert CA_{c_j} \rVert}$
      • Merging $S^{E}$ and $S^{EC}$
          • Min-max normalization to the range [0, 1]
          • Merge: $R_{ij}$ is obtained from $S^{E}_{ij}$ and $S^{EC}_{ij}$
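The projection, cosine, and normalization steps can be sketched in Python as follows. The sparse-dictionary vector representation, the one-to-one bilingual dictionary, the matrix-wide min-max scaling, and all function names are assumptions made for illustration; the rule for merging the two normalized matrices is not shown.

```python
import math


def project(ca_e, dictionary):
    """Map an English context vector into Chinese terms via a bilingual dictionary.

    Words missing from the dictionary are dropped (assumption).
    """
    projected = {}
    for word, score in ca_e.items():
        target = dictionary.get(word)
        if target is not None:
            projected[target] = projected.get(target, 0.0) + score
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def min_max(matrix):
    """Rescale all entries of a matrix (list of lists) into [0, 1]."""
    flat = [x for row in matrix for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / (hi - lo) for x in row] for row in matrix]


if __name__ == "__main__":
    # Toy values; the scores and dictionary entries are made up for illustration.
    ca_obama = {"president": 0.9, "usa": 0.4}
    ca_aobama = {"总统": 0.85, "美国": 0.3}
    bilingual = {"president": "总统", "usa": "美国"}
    print(cosine(project(ca_obama, bilingual), ca_aobama))
```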
  • 16. Reinforcement
      • Intuition: two NEs with a strong relationship
          • Co-occur frequently → have an edge
          • Share similar context → have similar relationship context
      • [Figure: English NE graph and Chinese NE graph, with context attached to the relationships around nodes X and Y]
      • Steps
          • 1. Align neighbors using relationship (R) and relationship context (RC) similarity
          • 2. Update the similarity score
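  • 17. Reinforcement
      • Iterative approach: $R^{t+1}_{ij} = \lambda \cdot (\text{relationship-based similarity, R \& RC}) + (1-\lambda)\, R^{0}_{ij}$, where $R^{0}_{ij}$ is the entity-based similarity (E & EC)
      • The relationship-based term sums over the pairs $(u,v)_k \in B^{t}(i,j,\theta)$, combining $R^{t}_{uv}$ with $S^{RC}_{iu,jv}$ and discounting the k-th pair by $2^{k}$
      • $B^{t}(i,j,\theta)$: ordered set of aligned neighbor pairs of (i, j) at iteration t
      • $R^{t}_{uv}$: relationship (R) similarity of i’s neighbor u and j’s neighbor v
      • $S^{RC}_{iu,jv}$: relationship context (RC) similarity between relation pairs (i, u) and (j, v)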
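One reinforcement step can be sketched in Python under strong simplifying assumptions: the RC term is dropped (the γ = 0 setting listed as E+EC+R in the experiments), the ordered aligned neighbor pairs are supplied by the caller rather than computed, ranks k start at 1, and the default λ = 0.15 is borrowed from the experiments slide on the assumption that it is the same parameter. The name `reinforce_step` is hypothetical.

```python
def reinforce_step(r_t, r_0, aligned_neighbors, lam=0.15):
    """Compute R^{t+1} from R^t for the gamma = 0 (no RC) variant.

    r_t, r_0:          |Ve|-by-|Vc| matrices as lists of lists
    aligned_neighbors: dict mapping (i, j) to an ordered list of neighbor
                       pairs (u, v), best-aligned pair first (assumption:
                       the alignment is computed elsewhere)
    lam:               weight of the relationship-based term
    """
    r_next = [row[:] for row in r_0]
    for i in range(len(r_0)):
        for j in range(len(r_0[i])):
            pairs = aligned_neighbors.get((i, j), [])
            # The k-th aligned pair contributes R^t_uv discounted by 2^k.
            rel = sum(r_t[u][v] / 2 ** k
                      for k, (u, v) in enumerate(pairs, start=1))
            r_next[i][j] = lam * rel + (1 - lam) * r_0[i][j]
    return r_next


if __name__ == "__main__":
    r0 = [[0.5, 0.0], [0.0, 0.5]]
    # Node pair (0, 0) has one aligned neighbor pair, (1, 1).
    print(reinforce_step(r0, r0, {(0, 0): [(1, 1)]}))
```

Iterating this step until the matrix stops changing would give an approximation of R∞ for this simplified variant.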
  • 18. Matching
      • Finding a 1:1 matching from R∞ using a greedy algorithm
      • Steps
          • 1. Find the translation pair with the highest final similarity score
          • 2. Select the pair and remove the corresponding row and column from R∞
          • 3. Repeat 1 and 2 until the similarity score < threshold
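The greedy 1:1 matching steps can be sketched in Python as below. The matrix is a plain list of lists, ties are broken by scan order, and the name `greedy_match` is hypothetical; rows and columns are marked as used instead of being physically removed, which has the same effect.

```python
def greedy_match(r_final, threshold):
    """Greedy 1:1 matching over a similarity matrix.

    Repeatedly picks the highest remaining score, records the pair, and
    excludes its row and column; stops once the best score < threshold.
    Returns a list of (row, column, score) tuples in selection order.
    """
    used_rows, used_cols = set(), set()
    matches = []
    while True:
        best = None
        for i, row in enumerate(r_final):
            if i in used_rows:
                continue
            for j, score in enumerate(row):
                if j in used_cols:
                    continue
                if best is None or score > best[2]:
                    best = (i, j, score)
        if best is None or best[2] < threshold:
            break
        matches.append(best)
        used_rows.add(best[0])
        used_cols.add(best[1])
    return matches


if __name__ == "__main__":
    # Rows: Obama, Jackie Chan; columns: 奥巴马, 成龙 (toy scores).
    r_inf = [[0.99, 0.1], [0.2, 0.99]]
    print(greedy_match(r_inf, threshold=0.5))
```

The binary decision matrix R* follows by setting the selected (row, column) cells to 1 and all others to 0.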
  • 19. Experiments
      • Dataset
          • English Gigaword Corpus: Xinhua News Agency, 2008.01~2008.12, 100,746 news documents
          • Chinese Gigaword Corpus: Xinhua News Agency, 2008.01~2008.12, 88,029 news documents
      • Approaches
          • EC: consider the entity context similarity feature only
          • E: consider the entity name similarity feature only
          • Shao (E+EC): combine entity name and entity context similarities
          • You (E+R): combine entity name and relationship similarities
          • Ours: E+EC+R (when γ = 0) and E+EC+R+RC
      • Measure
          • Precision, recall, and F1-score
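For reference, precision, recall, and F1 over translation pairs can be computed as in the following sketch. It assumes predictions and gold answers are given as sets of (source, target) pairs; the function name `precision_recall_f1` and the toy pairs are illustrative only.

```python
def precision_recall_f1(predicted, gold):
    """Standard precision, recall, and F1 over sets of (source, target) pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    pred = {("Obama", "奥巴马"), ("Jackie Chan", "成龙"), ("Bill Gates", "成龙")}
    gold = {("Obama", "奥巴马"), ("Jackie Chan", "成龙"), ("Bill Gates", "比尔·盖茨")}
    print(precision_recall_f1(pred, gold))
```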
  • 20. Experiments
      • Effectiveness of the overall framework
          • 500 person named entities
          • Set λ = 0.15
          • 5-fold cross-validation for threshold parameter learning
      • Other type of NE (100 location named entities)
  • 21. Directions
      • Graph matching
      • Graph cleansing [VLDB11]
      • Scalable entity search
      • [Figure: example for “US Presidents”: Bill Clinton, William J Clinton, George W. Bush, George H.W. Bush, Dubya]
  • 22. Thanks
      • Questions?
      • Visit www.postech.ac.kr/~swhwang for these papers