Personalizing Web Search using Long Term Browsing History
Nicolaas Matthijs (University of Cambridge, UK), Filip Radlinski (Microsoft, Vancouver)
WSDM 2011, 10/02/2011
What is personalized web search?
What is personalized web search? = Personalized web search
Related Work
Goal
Search Personalization Process
User Profile Extraction
User Profile Extraction
User Profile Extraction = User Profile: list of terms and term weights
Search Personalization Process
Result re-ranking
Evaluation
Evaluation: Capturing Data
Step 1: Offline Relevance Judgments
Step 1: Results
Step 1: Results
Step 2: Online Interleaved Evaluation
Step 2: Online Interleaved Evaluation

Original ranking (Google)
  1. Infrared - Wikipedia http://wikipedia.org/infrared
  2. IRTech - Infrared technologies http://www.irtech.org
  3. International Rectifier - Stock Quotes http://finance.yahoo.co.uk/IRE
  4. SIGIR - New York Conference http://www.sigir.org
  5. About Us - International Rectifier http://www.inrect.com

Personalized ranking
  1. SIGIR - New York Conference http://www.sigir.org
  2. Information Retrieval - Wikipedia http://wikipedia.org/ir
  3. IRTech - Infrared technologies http://www.irtech.org
  4. Infrared - Wikipedia http://wikipedia.org/infrared
  5. About Us - International Rectifier http://www.inrect.com

Interleaved Ranking (P = from the personalized ranking, O = from the original ranking)
  1. SIGIR - New York Conference http://www.sigir.org (P)
  2. Infrared - Wikipedia http://wikipedia.org/infrared (O)
  3. IRTech - Infrared technologies http://www.irtech.org (O)
  4. Information Retrieval - Wikipedia http://wikipedia.org/ir (P)
  5. International Rectifier - Stock Quotes http://finance.yahoo.co.uk/IRE (O)
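
The interleaved ranking on this slide is consistent with team-draft interleaving, in which the two rankings take turns contributing their best result not yet shown and a coin flip decides who picks first in each round. The following is a minimal sketch under that assumption; the slide does not name the scheme, and the function names `team_draft_interleave` and `credit_clicks` are hypothetical.

```python
import random


def team_draft_interleave(original, personalized, length, rng=None):
    """Interleave two rankings; returns a list of (result, team) pairs.

    team is "O" (original) or "P" (personalized). Assumed scheme: team-draft.
    """
    rng = rng or random.Random()
    rankings = {"O": original, "P": personalized}
    shown, teams = [], []
    while len(shown) < length:
        order = ["O", "P"]
        rng.shuffle(order)  # coin flip: which ranking picks first this round
        added = False
        for team in order:
            # Highest-ranked result of this team that is not yet shown.
            pick = next((r for r in rankings[team] if r not in shown), None)
            if pick is not None and len(shown) < length:
                shown.append(pick)
                teams.append(team)
                added = True
        if not added:  # both rankings exhausted
            break
    return list(zip(shown, teams))


def credit_clicks(interleaved, clicked):
    """Count clicks per team; the team with more clicks wins the impression."""
    wins = {"O": 0, "P": 0}
    for result, team in interleaved:
        if result in clicked:
            wins[team] += 1
    return wins


original = ["wikipedia.org/infrared", "irtech.org", "finance.yahoo.co.uk/IRE",
            "sigir.org", "inrect.com"]
personalized = ["sigir.org", "wikipedia.org/ir", "irtech.org",
                "wikipedia.org/infrared", "inrect.com"]
mixed = team_draft_interleave(original, personalized, 5, random.Random(0))
print(mixed)
print(credit_clicks(mixed, {"sigir.org"}))
```

Because each click is credited to the ranking that contributed the clicked result, both rankings are compared on the same query and the same user without asking for explicit judgments.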
Results
Future Work
Conclusion
Questions

Editor's Notes

  1. This talk presents the paper "Personalizing Web Search using Long Term Browsing History", which was done as part of my Master's thesis at the University of Cambridge and supervised by Filip from Microsoft Research.
  2. A search for "IR" is a short, ambiguous query. To the search engine it looks the same for every user, even though the information need differs. A physicist is more likely to be interested in infrared; a conference attendee in information retrieval; a stock broker in stock information for International Rectifier. All of them are presented the same ranking, which is not optimal.
  3. Personalized search puts more emphasis on the user's interests.
  4. There is quite a lot of research on personalized web search, but in general we see two different approaches: clickthrough-based and profile-based. Pclick is the best clickthrough-based approach that we found, and we compare against it. Teevan is the best profile-based approach that we found, and we compare against it as well.
  5. Three major goals: improve personalization, improve evaluation, and create a tool that people can use.
  6. Search personalization is a two-step process: the first step extracts the user's interests and the second re-ranks the search results. The user is represented by the items listed on the slide. The last two can be trivially extracted from the browsing history; the user profile has to be learned.
  7. Use the structure encapsulated in the HTML code: title, metadata description, full text, metadata keywords, extracted terms, and noun phrases. We specify how important each data source is; we limited ourselves to giving each data source a weight of 0, 1, or a relative weight.
  8. WordNet: include only terms that carry one of a given set of part-of-speech (PoS) tags. N-Gram: include only terms that appear more than a given number of times on the web.
  9. Calculate a weight for each term. Frequency vector: the number of occurrences of the term in each of the data sources. TF weighting: the dot product of the data-source weight vector and the frequency vector. TF-IDF: divide the TF weight by the log of the document frequency. Normally the document frequency is calculated from the browsing history, but a word that shows up a lot in your browsing history is in fact relevant relative to all the information on the internet, so we used the Google N-Gram information instead. pBM25: N = number of documents on the internet (derived from Google N-Gram), n_ti = number of documents containing the term (from N-Gram), R = number of documents in the browsing history, r_ti = number of documents in the browsing history that contain the term.
  10. Second step: re-rank the results given the user profile. It was previously shown that re-ranking on snippets is just as good as re-ranking on full documents, since snippets are less noisy and more keyword-focused; it is also a more realistic implementation.
  11. The score indicates how relevant the result is for the current user. Matching: sum over all snippet terms of the frequency of the term in the snippet times the weight of the term. Unique matching: ignore multiple occurrences of the same term. Language model: the probability of the snippet given the user profile. Extra weight is given to previously visited pages, which is an extension of the Pclick concept.
  12. Evaluation is difficult: we want to show how the personalization impacts day-to-day search activity. The first step is an offline relevance judgment exercise in which we try to come up with parameter configurations that work well. The second step is a large-scale online evaluation to check how well those configurations generalize to unseen users and browsing history, and whether they make a difference in real life.
  13. Data is captured implicitly, because we don't want to require additional user actions. A unique identifier is generated for every user, so the data is anonymous. On every page visit the URL, the length of the HTML, the duration of the visit, and the time and date are stored, except for secure HTTPS pages. These are stored in a database; the server then fetches the actual HTML.
  14. Relevance grades: Not Relevant (0), Relevant (1), Very Relevant (2). Normalized Discounted Cumulative Gain (NDCG) is the rank quality score.
  15. MaxNDCG = approach that yielded the highest average NDCG score (0.568 versus 0.506). MaxQuer = approach that improved the highest number of queries (52 out of 72). MaxBestParam = obtained by greedily selecting each parameter in a given order. MaxNoRank = best approach that doesn't take the Google ranking into account. It is interesting that we were able to find an approach that outperformed Google on its own; later we found that this was probably a case of overfitting the training data, as it didn't generalize in the online evaluation.
  16. Using the entire list of words performed considerably worse
  17. Interleaved evaluation presents a single ranking that interleaves two rankings, and then evaluates which of the two is of higher quality.
  18. Better than anything published so far
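
Notes 7 and 9 describe three ways of turning term counts into profile weights. A minimal sketch of how they might look follows. The data-source weights in `SOURCE_WEIGHTS` are made-up values, and the pBM25 expression is an assumption: the notes name the four counts but not the formula, so the usual BM25 relevance-feedback form is used here.

```python
import math

# Hypothetical data-source weights; the notes only say that each source gets
# a weight of 0, 1, or a relative weight.
SOURCE_WEIGHTS = {"title": 1.0, "meta_description": 1.0,
                  "meta_keywords": 1.0, "full_text": 0.0}


def tf_weight(source_counts, source_weights=SOURCE_WEIGHTS):
    """Dot product of the data-source weight vector and the frequency vector."""
    return sum(source_weights.get(src, 0.0) * n
               for src, n in source_counts.items())


def tfidf_weight(tf, web_doc_freq):
    """TF weight divided by the log of a web-scale document frequency."""
    if web_doc_freq <= 1:
        raise ValueError("web_doc_freq must be > 1 so that log(DF) is positive")
    return tf / math.log(web_doc_freq)


def pbm25_weight(N, n_t, R, r_t):
    """Assumed BM25 relevance-feedback weight built from the counts in note 9.

    N, n_t: web-scale document counts (all documents / documents with the term).
    R, r_t: browsing-history counts (all documents / documents with the term).
    """
    return math.log(((r_t + 0.5) * (N - n_t + 0.5))
                    / ((n_t + 0.5) * (R - r_t + 0.5)))


counts = {"title": 2, "full_text": 10}  # made-up occurrences of one term
tf = tf_weight(counts)                  # full_text is weighted 0 here
print(tf)
print(tfidf_weight(tf, 1_000_000))
print(pbm25_weight(N=10**9, n_t=10**6, R=500, r_t=40))
```

With this form, a term that is common in the browsing history but rare on the web gets a large positive pBM25 weight, while a term that is common on the web and absent from the history gets a negative one.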
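
The snippet scoring methods in note 11 can be sketched as below. The profile is assumed to be a dict of term weights, and the snippet is assumed to be already tokenized. The smoothing used in `language_model_score` is an assumption (add-one smoothing with one extra slot for unseen terms), since the notes only say that the score is the probability of the snippet given the profile. The extra weight for previously visited pages is left out because the notes give no formula for it.

```python
import math
from collections import Counter


def matching_score(snippet_terms, profile):
    """Sum over snippet terms of (frequency in snippet) x (profile weight)."""
    counts = Counter(snippet_terms)
    return sum(freq * profile.get(term, 0.0) for term, freq in counts.items())


def unique_matching_score(snippet_terms, profile):
    """Like matching, but repeated occurrences of a term count only once."""
    return sum(profile.get(term, 0.0) for term in set(snippet_terms))


def language_model_score(snippet_terms, profile):
    """Log-probability of the snippet under a unigram model of the profile.

    Assumes non-negative weights; add-one smoothing is an illustrative choice.
    """
    denom = sum(profile.values()) + len(profile) + 1
    return sum(math.log((profile.get(t, 0.0) + 1.0) / denom)
               for t in snippet_terms)


def rerank(results, profile, score=matching_score):
    """Sort (url, snippet_terms) pairs by descending score; ties keep order."""
    return sorted(results, key=lambda r: score(r[1], profile), reverse=True)


profile = {"sigir": 5.0, "retrieval": 3.0, "information": 2.0}  # made-up weights
results = [
    ("http://wikipedia.org/infrared", ["infrared", "radiation", "infrared"]),
    ("http://www.sigir.org", ["sigir", "information", "retrieval", "retrieval"]),
]
for url, _ in rerank(results, profile):
    print(url)
```

Passing `score=unique_matching_score` or `score=language_model_score` to `rerank` switches the scoring method without changing the rest of the pipeline.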
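
Note 14 uses NDCG over three relevance grades as the rank quality score. A minimal sketch follows, assuming the common gain `2**grade - 1` and a `log2(rank + 1)` discount; the notes do not say which NDCG variant was used.

```python
import math


def dcg(grades):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    return sum((2 ** g - 1) / math.log2(rank + 2)
               for rank, g in enumerate(grades))


def ndcg(grades):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


# Grades: 0 = Not Relevant, 1 = Relevant, 2 = Very Relevant (made-up judgments).
print(ndcg([2, 1, 0]))  # already in ideal order
print(ndcg([0, 1, 2]))  # best result ranked last
```

A ranking that places the Very Relevant results first scores 1.0, and the score drops as relevant results move down the list.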