Social Relation Based Scalable Semantic Search Refinement
Yi Zeng¹, Xu Ren¹, Yulin Qin¹,², Ning Zhong¹,³, Zhisheng Huang⁴, Yan Wang¹

1. International WIC Institute, Beijing University of Technology, China
2. Carnegie Mellon University, USA
3. Maebashi Institute of Technology, Japan
4. Vrije University Amsterdam, the Netherlands

Motivation
•   Vague or incomplete queries over large-scale semantic data: how can queries be refined to reduce the size of the result set?
•   Large-scale semantic data vs. the most relevant data for a specific user.

Diversity among users in the context of large-scale semantic data:
    User interests → interest-based search refinement
    Network of friends, collaborators, etc. → search refinement through social relationships
    Group-interest-based search refinement

Social Relations and Social Networks
•   Most social networks follow a power-law distribution.
•   The DBLP coauthor network is built using the FOAF vocabulary.

Fig. 1: Coauthor number distribution in the SwetoDBLP dataset.
Fig. 2: Log-log diagram of Figure 1.

•   The distribution is approximately a power law: few authors have many coauthors, and most authors have very few.
•   Regarding scalability: when the number of authors grows rapidly, rebuilding the coauthor network will not be hard, since most authors have only a few links.

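The power-law observation above could be checked roughly as follows. This is a minimal sketch, not the authors' procedure: the input format (one author list per paper) and the function names `coauthor_counts` and `loglog_slope` are assumptions, and a least-squares fit on the log-log histogram is only a quick check of the trend in Fig. 2, not a rigorous power-law estimator.

```python
import math
from collections import Counter
from itertools import combinations

def coauthor_counts(papers):
    """Map each author to the number of distinct coauthors (hypothetical helper)."""
    neighbours = {}
    for authors in papers:
        for a in authors:
            neighbours.setdefault(a, set())
        for a, b in combinations(set(authors), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {a: len(n) for a, n in neighbours.items()}

def loglog_slope(counts):
    """Least-squares slope of log(number of authors) against log(coauthor count).

    Needs at least two distinct non-zero coauthor counts.
    """
    freq = Counter(c for c in counts.values() if c > 0)
    xs = [math.log(k) for k in freq]
    ys = [math.log(v) for v in freq.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy input (made up for illustration): one hub author and many low-degree authors.
papers = [["A", "B"], ["A", "C"], ["A", "D"], ["E", "F"]]
counts = coauthor_counts(papers)
slope = loglog_slope(counts)  # negative slope: high coauthor counts are rare
```

Because most authors have only a few links, the per-author neighbour sets stay small, which is the property the scalability remark relies on.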
Search Refinement through Social Relationship

Table 1: A partial result of the expert finding search task “Artificial Intelligence authors” (user name: John McCarthy).

Satisfied authors without        Satisfied authors with
social relation refinement       social relation refinement
Carl Kesselman (312)             Hans W. Guesgen (117) *
Thomas S. Huang (271)            Virginia Dignum (69) *
Edward A. Fox (269)              John McCarthy (65) *
Lei Wang (250)                   Aaron Sloman (36) *
John Mylopoulos (245)            Carl Kesselman (312)
Ewa Deelman (237)                Thomas S. Huang (271)
...                              ...

Diagram: the domain experts dataset and the coauthor network dataset are connected through user URIs, bridging two separate datasets to help refine the expert finding task.

In an enterprise setting, if the experts found have a previous relationship with the employer, cooperation may be smoother.

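The reordering in Table 1 can be sketched as a stable re-ranking. This is an illustration under two assumptions that the slide does not state: the starred authors are the ones with a social relation to the user, and the unrefined list is ordered by the number in parentheses. The function name `refine_by_social_relation` is hypothetical.

```python
def refine_by_social_relation(ranked_authors, related):
    """Move authors in `related` to the front, keeping the original order in each group.

    sorted() is stable and False sorts before True, so related authors come first.
    """
    return sorted(ranked_authors, key=lambda author: author not in related)

# Names taken from Table 1; ordering and the `related` set are assumptions.
ranked = [
    "Carl Kesselman", "Thomas S. Huang", "Edward A. Fox", "Lei Wang",
    "John Mylopoulos", "Ewa Deelman", "Hans W. Guesgen", "Virginia Dignum",
    "John McCarthy", "Aaron Sloman",
]
related = {"Hans W. Guesgen", "Virginia Dignum", "John McCarthy", "Aaron Sloman"}

refined = refine_by_social_relation(ranked, related)
```

In practice the `related` set would come from a lookup in the coauthor network dataset keyed by the user's URI.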
Social Network Based Interest Retention Models for Search Refinement

Obtaining the Retained Interests
•   Do retained interests appear more frequently than others?
    (Frequency) Total Interest: $TI(i) = \sum_{j=1}^{n} m(i, j)$
•   Apart from frequency, what else is important for correctly obtaining retained interests?
    The forgetting mechanism in cognitive memory retention (exponential function model, power function model) [Anderson, Schooler 1991].

    (Frequency and Recency) Memory Retention:
    $P = A e^{-bT}$;  $P = A T^{-b}$

Pictures from: [Schooler 1993] Schooler, L. J. & Anderson, J. R.: Recency and Context: An Environmental Analysis of Memory. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pp. 889-894, 1993.

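The total interest formula can be sketched in a few lines. The slide does not define $m(i, j)$; the sketch assumes it is the number of times interest $i$ appears in time interval $j$ (for example, in the paper titles of year $j$). The function name and the sample counts are made up for illustration.

```python
from collections import Counter

def total_interest(counts_by_interval):
    """TI(i) = sum over intervals j of m(i, j).

    `counts_by_interval` maps an interval (e.g. a year) to {interest: m(i, j)}.
    """
    totals = Counter()
    for counts in counts_by_interval.values():
        totals.update(counts)  # adds the per-interval counts to the running totals
    return totals

# Hypothetical counts of interest terms per year.
ti = total_interest({
    2007: {"search": 2, "web": 1},
    2008: {"search": 1, "query": 1},
})
```

Total interest ignores when an interest appeared, which is why the slide turns to forgetting curves next.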
Obtaining the Retained Interests
•   (Frequency and Recency) Exponential Model for Interest Retention:
    $EIR(i) = \sum_{j=1}^{n} m(i, j) \times A e^{-b T_{i,j}}$

•   (Frequency and Recency) Power Model for Interest Retention:
    $PIR(i) = \sum_{j=1}^{n} m(i, j) \times A T_{i,j}^{-b}$
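
The two retention models translate directly into code. This is a minimal sketch under assumptions: $m(i, j)$ is taken as the count of interest $i$ in interval $j$, and $T_{i,j}$ as the time elapsed since interval $j$ (it must be positive for the power model). The function names and sample counts are hypothetical; the values A = 0.855 and b = 1.295 are the power-model parameters quoted elsewhere in these slides.

```python
import math

def exponential_retention(counts, elapsed, A, b):
    """EIR(i) = sum_j m(i, j) * A * exp(-b * T_ij)."""
    return sum(m * A * math.exp(-b * t) for m, t in zip(counts, elapsed))

def power_retention(counts, elapsed, A, b):
    """PIR(i) = sum_j m(i, j) * A * T_ij ** (-b); every T_ij must be > 0."""
    return sum(m * A * t ** (-b) for m, t in zip(counts, elapsed))

# Hypothetical interest: appearances in 2005..2008, evaluated in 2009.
years = [2005, 2006, 2007, 2008]
counts = [4, 0, 1, 2]
elapsed = [2009 - y for y in years]

pir = power_retention(counts, elapsed, A=0.855, b=1.295)
```

Both models weight recent appearances more heavily than old ones, so an interest with many old appearances can score below one with a few recent appearances.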




    [Zeng 2009a] Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao. Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection. In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, coordinated by Jose Quesada and Yi Zeng, March 30, 2009.
    [Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong. DBLP-SSE: A DBLP Search Support Engine. In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.
    [Maanen 2009] Leendert van Maanen, Julian N. Marewski. Recommender Systems for Literature Selection: A Competition between Decision Making and Memory Models. CogSci 2009, July 31-August 1, 2009.

Obtaining the Retained Interests

Figure 7a: A comparative study of total research interests from 1990 to 2008 and retained interests in 2009 (based on both the power law and exponential law models).
Figure 7b: Difference in the contribution values from papers published in different years.
Figure 7c: A comparison of total interests and interest retentions of the author “Ricardo A. Baeza-Yates” (Nov. 2009, from DBLP).

•   To some extent, current interests are relevant to interest retention. Using the power law model with A = 0.855 and b = 1.295, we selected all authors with more than 100 publications (1226 persons) and predicted their top 9 interests from 2000 to 2007 using interest retention. For 49.54% of these samples, 3 out of 9 interests can be predicted.
•   We analyzed research interest retention for all 615,124 computer scientists in the SwetoDBLP dataset and released the “computer scientists’ research interest RDF dataset”:
    http://www.iwici.org/dblp-sse
    http://wiki.larkc.eu/csri-rdf

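The "3 out of 9" figure compares a predicted top-9 list with the interests actually observed. The exact evaluation protocol is not given on the slide, so the following is only a sketch of one plausible reading: rank interests by retention score, keep the top k, and count how many also appear in the observed list. All names and values are hypothetical.

```python
def top_k(scores, k=9):
    """Return the k interests with the highest scores (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [interest for interest, _ in ranked[:k]]

def hits(predicted, observed):
    """Number of predicted interests that also occur in the observed list."""
    return len(set(predicted) & set(observed))

# Made-up retention scores and observed interests for one author.
retention = {"search": 5.0, "web": 3.0, "query": 1.0, "engine": 0.5}
predicted = top_k(retention, k=3)
observed = ["web", "mining", "search"]
n_hits = hits(predicted, observed)
```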
Retained Interests Search in a Social Environment

[Figure: interest terms linking Ricardo A. Baeza-Yates and Carlos Castillo. Terms shown: Network, Link, Search, PageRank, Information, Retrieval, Web, Query, Content, Spam, Challenge, Engine, Mining, Analysis, Detection.]

Group Retained Interests:
•   Diversity
•   Consistency

Group Retained Interest:
$E(i, p) = \begin{cases} 1 & (i \in RI_p^{\mathrm{top}\,9}) \\ 0 & (i \notin RI_p^{\mathrm{top}\,9}) \end{cases}$,  $GIR(i) = \sum_{p=1}^{n} E(i, p)$

For the most prolific authors in DBLP (publication number > 50): 5161 persons.
On average, 52.55% of an individual’s retained interests are consistent with his/her group retained interests.

Top 9 Retained Interests      Top 9 Group Retained Interests
Web            7.81           Search         35
Search         5.59           Retrieval      30
Retrieval      3.19           Web            28
Information    2.27           Information    26
Query          2.14           System         19
Engine         2.10           Query          18
Mining         1.26           Analysis       14
Challenge      …              Text           …
Analysis       …              Model          …

Top 9 retained interests of a user and of his group (Ricardo A. Baeza-Yates, based on the May 2008 version of SwetoDBLP).

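The group retained interest definition amounts to counting, for each interest, how many group members have it in their top 9. A minimal sketch follows; the function name, the toy group, and the way overlap is measured are assumptions, while the two top-9 lists are copied from the table on this slide.

```python
from collections import Counter

def group_retained_interest(top9_by_member):
    """GIR(i) = sum over members p of E(i, p), where E(i, p) = 1 if i is in p's top 9."""
    gir = Counter()
    for top9 in top9_by_member.values():
        gir.update(set(top9))  # each member contributes at most 1 per interest
    return gir

# Toy group (made up) to show the counting.
gir = group_retained_interest({
    "p1": ["web", "search"],
    "p2": ["search", "query"],
    "p3": ["search"],
})

# Lists from the slide's table for Ricardo A. Baeza-Yates and his group.
user_top9 = ["Web", "Search", "Retrieval", "Information", "Query",
             "Engine", "Mining", "Challenge", "Analysis"]
group_top9 = ["Search", "Retrieval", "Web", "Information", "System",
              "Query", "Analysis", "Text", "Model"]
shared = [i for i in user_top9 if i in group_top9]
```

For the lists in the table, six of the nine individual interests also occur in the group's top 9.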
Search Refinement by Interests from Different Perspectives
•   Vague or incomplete queries may produce so many results that users have to wade through them.

•   Research interests may be closely related to search tasks.

•   Research interests can be evaluated from various perspectives:
    (1) Total interests;
    (2) Retained interests;
    (3) Coauthor group retained interests.

Refinement with Retained Interests and Group Retained Interests

Requests were sent to 8 DBLP authors; 7 replied.

Participants: 7 DBLP authors
•   Preference order 100%:
    List 2, List 3    List 1
•   Preference order 100%:
    List 2 ≈ List 3
•   Preference order 83.3%:
    List 2 > List 3    List 1
•   Preference order 16.7%:
    List 3 > List 2    List 1

Future Research




Semantic Similarity: Obtaining More Accurate Interest Descriptions and Observations of Interest Dynamics

Figure 14. Consistent interests without consideration of semantic similarity.
Figure 15. Consistent interests with consideration of semantic similarity.
[Both figures show the interest terms linking Ricardo A. Baeza-Yates and Carlos Castillo: Network, Link, Search, PageRank, Information, Retrieval, Web, Query, Content, Spam, Challenge, Engine, Mining, Analysis, Detection.]

Table. Some examples of semantic similarities based on Normalized Google Distance.

Term 1       Term 2       Similarity
search       retrieval    0.645
search       query        0.552
search       pagerank     0.813
retrieval    query        0.467
retrieval    pagerank     0.293
query        pagerank     0.098
logic        reasoning    0.667
logic        inference    0.606
reasoning    inference    0.909
ontology     OWL          0.805

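For reference, Normalized Google Distance is computed from search hit counts. The sketch below uses the standard definition by Cilibrasi and Vitányi; the hit counts are made up, and how the slide converts the distance into the similarity values in the table is not stated, so no conversion is shown.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))).

    fx, fy: hit counts for each term alone; fxy: hit count for both terms together;
    n: total number of indexed pages. All counts must be positive.
    """
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical hit counts, chosen only to make the arithmetic easy to follow.
d_same = normalized_google_distance(1000, 1000, 1000, 10**6)  # terms always co-occur
d_half = normalized_google_distance(100, 100, 10, 10**4)
```

A distance of 0 means the two terms always occur together; larger values mean they co-occur less often relative to their individual frequencies.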
Thank You!





Social Relation Based Scalable Semantic Search Refinement

  • 1. Social Relation Based Scalable Semantic Search Refinement Yi Zeng1, Xu Ren1, Yulin Qin1,2, Ning Zhong1,3, Zhisheng Huang4, Yan Wang1 1. International WIC Institute, Beijing University of Technology, China 2. Carnegie Mellon University, USA 3. Maebashi Institute of Technology, Japan 4. Vrije University Amsterdam, the Netherlands 1
  • 2. Motivation • Vague/Incomplete queries over large scale semantic data (How to get more refined queries to reduce the size of the result set?). • Large scale semantic data vs most relevant data for a specific user Diversity for different users in the context of large scale semantic data User interests Network of friends, collaborators, etc. Interests based Search refinement search refinement through social relationship Group interests based search refinement 2
  • 3. Social Relations and Social Networks • Most of the social networks follow the power law distribution. • Using the FOAF vocabularies, the DBLP coauthor network is created. Fig. 1: Coauthor number distribution in Fig. 2: log-log diagram of Figure 1. the SwetoDBLP dataset. • Approximate power law distribution not many authors who have a lot of coauthors, and most of the authors are with very few coauthors. • Considering the scalability issue, when the number of authors expand rapidly, it will not hard to rebuild the coauthor network since most of the authors will just have a few links. 3
  • 4. Search Refinement through Social Relationship Table 1: A partial result of the expert finding search task Domain experts “Artificial Intelligence authors”(User name: John McCarthy). dataset Satisfied Authors without Satisfied Authors with social relation refinement social relation refinement User URIs Carl Kesselman (312) Hans W. Guesgen (117) * Thomas S. Huang (271) Virginia Dignum (69) * Coauthor Network dataset Edward A. Fox (269) John McCarthy (65) * Lei Wang (250) Aaron Sloman (36) * Bridging two separate John Mylopoulos (245) Carl Kesselman (312) datasets together and help to Ewa Deelman (237) Thomas S. Huang (271) refine the expert finding task. ... ... In an enterprise setting, if the found experts have some previous relationship with the employer, the cooperation may be smoother. 4
  • 5. Social Network based Interest Retention Models for Search Refinement
  • 6. Obtaining the Retained Interests • Do retained interests appear more frequently than others? (Frequency) Total Interest: $TI(i) = \sum_{j=1}^{n} m(i,j)$ • Besides frequency, what else matters for correctly obtaining retained interests? The forgetting mechanism in cognitive memory retention (exponential function model, power function model) [Anderson, Schooler 1991]. (Frequency and Recency) Memory Retention: $P = A e^{-bT}$; $P = A T^{-b}$. Pictures from [Schooler 1993]: Schooler, L. J. & Anderson, J. R.: Recency and Context: An Environmental Analysis of Memory. In: Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pp. 889-894, 1993.
  • 7. Obtaining the Retained Interests • (Frequency and Recency) Exponential Model for Interest Retention: $EIR(i) = \sum_{j=1}^{n} m(i,j) \times A e^{-b T_{i,j}}$ • (Frequency and Recency) Power Model for Interest Retention: $PIR(i) = \sum_{j=1}^{n} m(i,j) \times A T_{i,j}^{-b}$
      [Zeng 2009a] Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao. Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection. In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, coordinated by Jose Quesada and Yi Zeng, March 30, 2009.
      [Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong. DBLP-SSE: A DBLP Search Support Engine. In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.
      [Maanen 2009] Leendert van Maanen, Julian N. Marewski. Recommender Systems for Literature Selection: A Competition between Decision Making and Memory Models. CogSci 2009, July 31-August 1, 2009.
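The total interest and the two retention models on slides 6 and 7 can be sketched in a few lines to show how recency changes a ranking that frequency alone would produce. Several details here are assumptions, because the slides do not define them: m(i, j) is taken as the number of times interest i occurs in year j, T_{i,j} as the number of years elapsed between year j and the current year, and all function names are illustrative. The defaults A = 0.855 and b = 1.295 are the power-model parameters quoted on slide 8; no parameters are given for the exponential model, so they must be supplied by the caller.

```python
import math
from typing import Mapping


def _elapsed(now: int, year: int) -> int:
    """Years between an occurrence and the current year (assumed T_{i,j})."""
    elapsed = now - year
    if elapsed <= 0:
        raise ValueError("occurrences must lie strictly before the current year")
    return elapsed


def total_interest(counts: Mapping[int, int]) -> int:
    """TI(i): sum of m(i, j) over all years j, ignoring recency."""
    return sum(counts.values())


def exponential_retention(
    counts: Mapping[int, int], now: int, A: float, b: float
) -> float:
    """EIR(i): sum of m(i, j) * A * exp(-b * T_{i,j})."""
    return sum(m * A * math.exp(-b * _elapsed(now, y)) for y, m in counts.items())


def power_retention(
    counts: Mapping[int, int], now: int, A: float = 0.855, b: float = 1.295
) -> float:
    """PIR(i): sum of m(i, j) * A * T_{i,j} ** (-b)."""
    return sum(m * A * _elapsed(now, y) ** (-b) for y, m in counts.items())


# Hypothetical counts: an old but frequent interest vs. a recent, rare one.
old_interest = {2000: 5}
recent_interest = {2008: 1}
```

With these hypothetical counts and a current year of 2009, the old interest has the larger total interest, while the recent one has the larger power-model retention, which is the effect the frequency-and-recency models are meant to capture.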
  • 8. Obtaining the Retained Interests • To some extent, current interests are related to interest retention. Using the power law model with A = 0.855 and b = 1.295, we selected all authors with more than 100 publications (1226 persons) and predicted their top 9 interests from 2000 to 2007 using interest retention. For 49.54% of these samples, 3 out of 9 interests can be predicted. Figure 7a: A comparative study of total research interests from 1990 to 2008 and retained interests in 2009 (based on both the power law and exponential law models). Figure 7b: Difference in the contribution values of papers published in different years. • We analyzed research interest retention for all 615,124 computer scientists in the SwetoDBLP dataset and released the “computer scientists’ research interest RDF dataset”: http://www.iwici.org/dblp-sse, http://wiki.larkc.eu/csri-rdf. Figure 7c: A comparison of total interests and interest retentions of the author “Ricardo A. Baeza-Yates” (Nov. 2009, from DBLP).
  • 9. Retained Interests in a Social Environment
      Diagram: interest terms shown for Ricardo A. Baeza-Yates and Carlos Castillo (Network, Link, Search, PageRank, Information, Retrieval, Web, Query, Content, Spam, Challenge, Engine, Mining, Analysis, Detection).
      Group Retained Interests: • Diversity • Consistency
      Group Retained Interest: $E(i,p) = \begin{cases} 1 & (i \in RI_p^{top9}) \\ 0 & (i \notin RI_p^{top9}) \end{cases}$, $GIR(i) = \sum_{p=1}^{n} E(i,p)$

      Top 9 Retained Interests | Top 9 Group Retained Interests
      -------------------------+-------------------------------
      Web           7.81       | Search        35
      Search        5.59       | Retrieval     30
      Retrieval     3.19       | Web           28
      Information   2.27       | Information   26
      Query         2.14       | System        19
      Engine        2.10       | Query         18
      Mining        1.26       | Analysis      14
      Challenge     …          | Text          …
      Analysis      …          | Model         …

      Table caption: Top 9 retained interests of a user and top 9 retained interests of his group (Ricardo A. Baeza-Yates, based on the May 2008 version of SwetoDBLP).
      For the most prolific authors in DBLP (publication number > 50): 5161 persons. On average, 52.55% of an individual’s retained interests are consistent with his/her group retained interests.
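The group retained interest on slide 9 is a vote count: each group member p contributes E(i, p) = 1 for every interest i in that member's top 9 retained interests, and GIR(i) sums the votes. A minimal sketch follows. The function names are illustrative, the slides do not say whether the group includes the user, and the `consistency` measure (share of a person's top interests that also appear in the group's top interests) is only an assumed reading of the 52.55% figure.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def top_k(scores: Mapping[str, float], k: int = 9) -> Set[str]:
    """Interests with the k highest scores (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


def group_retained_interests(members_top: Iterable[Set[str]]) -> Counter:
    """GIR(i): number of group members whose top interests contain i."""
    gir: Counter = Counter()
    for top in members_top:
        gir.update(top)  # adds E(i, p) = 1 for each interest i of member p
    return gir


def consistency(individual_top: Set[str], group_top: Set[str]) -> float:
    """Share of an individual's top interests found in the group's top set."""
    if not individual_top:
        return 0.0
    return len(individual_top & group_top) / len(individual_top)


# Hypothetical group of three members with small top sets.
members = [{"web", "search"}, {"search", "spam"}, {"search", "web"}]
gir = group_retained_interests(members)
```

Since `gir` is itself a mapping from interest to score, `top_k(gir)` yields the group's top 9 set, which can then be compared with one person's top 9 through `consistency`.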
  • 10. Search Refinement by Interests from Different Perspectives • Vague or incomplete queries may produce so many results that users have to wade through them. • Research interests may be closely related to search tasks. • Research interests can be evaluated from various perspectives: (1) total interests; (2) retained interests; (3) coauthor group retained interests.
  • 11. Refinement with Retained Interests and Group Retained Interests. Requests were sent to 8 DBLP authors; 7 replied. Participants: 7 DBLP authors. • Preference order 100%: List 2, List 3 > List 1 • Preference order 100%: List 2 ≈ List 3 • Preference order 83.3%: List 2 > List 3 > List 1 • Preference order 16.7%: List 3 > List 2 > List 1
  • 13. Semantic Similarity: Obtaining More Accurate Interest Descriptions and Observations of Interest Dynamics
      Table: Some examples of semantic similarities based on Normalized Google Distance.

      Term 1     | Term 2    | Similarity
      -----------+-----------+-----------
      search     | retrieval | 0.645
      search     | query     | 0.552
      search     | pagerank  | 0.813
      retrieval  | query     | 0.467
      retrieval  | pagerank  | 0.293
      query      | pagerank  | 0.098
      logic      | reasoning | 0.667
      logic      | inference | 0.606
      reasoning  | inference | 0.909
      ontology   | OWL       | 0.805

      Figure 14: Consistent interests without consideration of semantic similarity (interest terms of Ricardo A. Baeza-Yates and Carlos Castillo).
      Figure 15: Consistent interests with consideration of semantic similarity (interest terms of Ricardo A. Baeza-Yates and Carlos Castillo).
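The contrast between Figures 14 and 15 can be illustrated with a small matching routine: without semantic similarity, two interest sets agree only on identical terms; with it, terms whose similarity is high enough also count as consistent. The sketch below uses the similarity values listed on the slide. The threshold of 0.6, the treatment of identical terms as similarity 1.0, the default of 0.0 for unlisted pairs, and the function names are all assumptions, since the slide does not state how similarity is turned into a match.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Similarity values quoted from the slide's table.
SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"search", "retrieval"}): 0.645,
    frozenset({"search", "query"}): 0.552,
    frozenset({"search", "pagerank"}): 0.813,
    frozenset({"retrieval", "query"}): 0.467,
    frozenset({"retrieval", "pagerank"}): 0.293,
    frozenset({"query", "pagerank"}): 0.098,
    frozenset({"logic", "reasoning"}): 0.667,
    frozenset({"logic", "inference"}): 0.606,
    frozenset({"reasoning", "inference"}): 0.909,
    frozenset({"ontology", "OWL"}): 0.805,
}


def similarity(a: str, b: str) -> float:
    """Symmetric lookup; identical terms are assumed to score 1.0."""
    if a == b:
        return 1.0
    return SIMILARITY.get(frozenset({a, b}), 0.0)  # unlisted pairs: assumed 0.0


def consistent_pairs(
    own: Iterable[str], group: Iterable[str], threshold: float = 0.6
) -> Set[Tuple[str, str]]:
    """Pairs (own term, group term) that match exactly or are similar enough."""
    group_terms = list(group)
    return {
        (a, b)
        for a in own
        for b in group_terms
        if similarity(a, b) >= threshold
    }
```

Setting `threshold=1.0` reduces the routine to exact matching, which corresponds to the situation without semantic similarity; lowering it admits related terms such as "search" and "pagerank".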