SlideShare a Scribd company logo
1 of 13
Download to read offline
Re-ranking Web
Documents Based
on Personal
Preferences
Satarupa Guha, Priyanshu Agrawal, Shubham Sangal
IIIT Hyderabad
• Each Individual is unique.
• Search Engines should re-
rank its results based on
his profile so that his
specific requirements can
be met
Personalisation… Why ??
• Personalisation works using
Individualisation and
Contextualisation
• Individualisation means creating
each User’s Individual Profile
using his Long-Term History
• Contextualisation means
Identifying relation between
sessions using Short-Term
History
Dataset… Fully Anonymized
• Dataset was provided by Yandex ( Almost 16 GB of train
data and 400 MB of test Data)
• Search Data for 30 days from a large city was collected.
• To allay privacy concerns the user data is fully anonymized.
• Only meaningless numeric IDs of users, queries, query terms,
sessions, URLs and their domains are released.
• Training was done on data of first 27 days and testing on last
3 days.
Data Format
The log represents a stream of user actions, with each
line representing a session metadata, a query action, or a click
action. Each line contains tab-separated values according to
the following format:
Session metadata (TypeOfRecord = M):
SessionID TypeOfRecord Day USERID
Query action (TypeOfRecord = Q or T):
SessionID TimePassed TypeOfRecord SERPID QueryID
ListOfTerms ListOfURLsAndDomains
Click action (TypeOfRecord = C):
SessionID TimePassed TypeOfRecord SERPID URLID
Approach
Feature Engineering
Train Data Test Data
Basic Features User-Specific Features Pair-wise Features
Feature Representation
Regression Model
Trained Model
Re-ranked URLs
Train Data
Test
Data
Feature Engineering
• Initially each URL is labelled with a relevance label of:
o 0 (irrelevant) : no clicks or click less than 50 time units
o 1 (relevant): clicks with 50 to 399 time units
o 2 (highly relevant): dwell time grater than 400 units
• Basic Features used include URL-query instance such as
original position of URL for query , URLId , DomainId of
URLs , TermIds of query and various joint features such as
URLId X position , URLId X QueryId , DomainId X TermId
etc.
• User Specific Features include for each user , dividing each
URL in 5 categories based on relevance labels and whether
URL was previously displayed or skipped to the User.
• Pairwise Features include for each query checking the
relative position of a pair of URL
f(URLi, URLj) = 1 if URLi occurs at better position than
URLj
f(URLi, URLj) = -1 if URLj occurs at better position than
URLi
Feature Engineering
Logistic Regression
• Simple Logistic Regression Model was
used.
• URL with positive relevance was
labelled 1. Rest labelled -1.
• Relevance Labels assigned to each
URL w.r.t to each User were used as
Weight.
• Out of Core learning Algorithms
are used as they can work with huge
Data in hours.
Vowpal Wabbit for
Logistic Regression
• Fast Machine Learning tool.
• Works on huge data in a
parallel manner.
• RAM efficient.
• Fast Convergence to a Good
Predictor.
• Handles TBs of Data in Matter
of Hours.
Machine Learning
Evaluation and Results
• We used NDCG ( Normalized Discounted Cumulative Gain
)for evaluation of our results.
• Calculated using the ranking of URLs for each query, and
then averaged over queries.
𝐷𝐶𝐺@10 =
𝑖=1
10
2 𝑟𝑒𝑙 𝑖−1
𝑙𝑜𝑔2(𝑖+1)
where 𝑟𝑒𝑙𝑖, 𝑖 = 1,2, … 10 is the relevance list that contain 10
URL’s relevance values. IDCG@10 is the maximum possible
(ideal) DCG for a given set of queries, documents, and
relevance value. Then, NDCG@10 is given by
𝑁𝐷𝐶𝐺@10 =
𝐷𝐶𝐺@10
𝐼𝐷𝐶𝐺@10
• Uploaded the final Result file online on Kaggle Portal for
Evaluation.
• Our Team (Satarupa Guha) Got NDCG Score 0.76857
• Team who secured first Position got 0.80725
Thank You !!!

More Related Content

Viewers also liked

First Grade Jeopardy Review
First Grade Jeopardy Review First Grade Jeopardy Review
First Grade Jeopardy Review chatchis21
 
Huong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hangHuong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hangNguyễn Tuân
 
Procrastination in organization
Procrastination in organizationProcrastination in organization
Procrastination in organizationfarahm3d
 
Scope management term paper
Scope management term paperScope management term paper
Scope management term paperMuhammad Umar
 
Centrepoint presentation
Centrepoint presentationCentrepoint presentation
Centrepoint presentationMuhammad Umar
 
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...Muhammad Umar
 
E business case study danish furniture
E  business case study danish furnitureE  business case study danish furniture
E business case study danish furniturefarahm3d
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 

Viewers also liked (9)

First Grade Jeopardy Review
First Grade Jeopardy Review First Grade Jeopardy Review
First Grade Jeopardy Review
 
Huong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hangHuong dan-su-dung-phan-mem-ban-hang
Huong dan-su-dung-phan-mem-ban-hang
 
Procrastination in organization
Procrastination in organizationProcrastination in organization
Procrastination in organization
 
Scope management term paper
Scope management term paperScope management term paper
Scope management term paper
 
Centrepoint presentation
Centrepoint presentationCentrepoint presentation
Centrepoint presentation
 
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
Selection of Roof Casting Formwork Systems for the Bird Island Project: Case ...
 
E business case study danish furniture
E  business case study danish furnitureE  business case study danish furniture
E business case study danish furniture
 
Mass transit system
Mass transit systemMass transit system
Mass transit system
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 

Similar to Re-ranking Web Documents Using Personal Preferences

Click Log Mining CS598
Click Log Mining CS598Click Log Mining CS598
Click Log Mining CS598Shih-Wen Huang
 
10 personalized-web-search-techniques
10 personalized-web-search-techniques10 personalized-web-search-techniques
10 personalized-web-search-techniquesdipanjalishipne
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comSimon Hughes
 
Personalized Re-Ranking of Documents
Personalized Re-Ranking of DocumentsPersonalized Re-Ranking of Documents
Personalized Re-Ranking of Documentskswapna9
 
SharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchSharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchC/D/H Technology Consultants
 
How To Build your own Custom Search Engine
How To Build your own Custom Search EngineHow To Build your own Custom Search Engine
How To Build your own Custom Search EngineRicha Budhraja
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Lucidworks
 
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State UniversityLSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State Universitydhabalia
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2GokulD
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessIvo Andreev
 
Developing Search-driven application in SharePoint 2013
 Developing Search-driven application in SharePoint 2013  Developing Search-driven application in SharePoint 2013
Developing Search-driven application in SharePoint 2013 SPC Adriatics
 
SharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based SolutionsSharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based SolutionsSPC Adriatics
 
Optimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureOptimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureDAGEOP LTD
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data QualityVera Ekimenko
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Codemotion
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Lippo Group Digital
 
SlideSeekerUVCE
SlideSeekerUVCESlideSeekerUVCE
SlideSeekerUVCErohitnbwlf
 

Similar to Re-ranking Web Documents Using Personal Preferences (20)

Click Log Mining CS598
Click Log Mining CS598Click Log Mining CS598
Click Log Mining CS598
 
10 personalized-web-search-techniques
10 personalized-web-search-techniques10 personalized-web-search-techniques
10 personalized-web-search-techniques
 
Evolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.comEvolving the Optimal Relevancy Ranking Model at Dice.com
Evolving the Optimal Relevancy Ranking Model at Dice.com
 
Personalized Re-Ranking of Documents
Personalized Re-Ranking of DocumentsPersonalized Re-Ranking of Documents
Personalized Re-Ranking of Documents
 
SharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 SearchSharePoint User Group Meeting- SharePoint 2013 Search
SharePoint User Group Meeting- SharePoint 2013 Search
 
How To Build your own Custom Search Engine
How To Build your own Custom Search EngineHow To Build your own Custom Search Engine
How To Build your own Custom Search Engine
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State UniversityLSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
LSP ( Logic Score Preference ) _ Rajan_Dhabalia_San Francisco State University
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
 
Developing Search-driven application in SharePoint 2013
 Developing Search-driven application in SharePoint 2013  Developing Search-driven application in SharePoint 2013
Developing Search-driven application in SharePoint 2013
 
SharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based SolutionsSharePoint 2013 Search Based Solutions
SharePoint 2013 Search Based Solutions
 
How seo works
How seo worksHow seo works
How seo works
 
Optimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser ArchitectureOptimising Queries - Series 1 Query Optimiser Architecture
Optimising Queries - Series 1 Query Optimiser Architecture
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Artificial Intelligence for Data Quality
Artificial Intelligence for Data QualityArtificial Intelligence for Data Quality
Artificial Intelligence for Data Quality
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
 
SlideSeekerUVCE
SlideSeekerUVCESlideSeekerUVCE
SlideSeekerUVCE
 

Recently uploaded

一比一定制波士顿学院毕业证学位证书
一比一定制波士顿学院毕业证学位证书一比一定制波士顿学院毕业证学位证书
一比一定制波士顿学院毕业证学位证书A
 
Washington Football Commanders Redskins Feathers Shirt
Washington Football Commanders Redskins Feathers ShirtWashington Football Commanders Redskins Feathers Shirt
Washington Football Commanders Redskins Feathers Shirtrahman018755
 
Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...
Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...
Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...mikehavy0
 
Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...
Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...
Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...Varun Mithran
 
TOP 100 Vulnerabilities Step-by-Step Guide Handbook
TOP 100 Vulnerabilities Step-by-Step Guide HandbookTOP 100 Vulnerabilities Step-by-Step Guide Handbook
TOP 100 Vulnerabilities Step-by-Step Guide HandbookVarun Mithran
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirtrahman018755
 
一比一原版布兰迪斯大学毕业证如何办理
一比一原版布兰迪斯大学毕业证如何办理一比一原版布兰迪斯大学毕业证如何办理
一比一原版布兰迪斯大学毕业证如何办理A
 
Beyond Inbound: Unlocking the Secrets of API Egress Traffic Management
Beyond Inbound: Unlocking the Secrets of API Egress Traffic ManagementBeyond Inbound: Unlocking the Secrets of API Egress Traffic Management
Beyond Inbound: Unlocking the Secrets of API Egress Traffic Managementseank14
 
原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样AS
 
原版定制美国加州大学河滨分校毕业证原件一模一样
原版定制美国加州大学河滨分校毕业证原件一模一样原版定制美国加州大学河滨分校毕业证原件一模一样
原版定制美国加州大学河滨分校毕业证原件一模一样A
 
一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样
一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样
一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样AS
 
一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书
一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书
一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书B
 
一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理
一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理
一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理Fir
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebJie Liau
 
一比一定制加州大学欧文分校毕业证学位证书
一比一定制加州大学欧文分校毕业证学位证书一比一定制加州大学欧文分校毕业证学位证书
一比一定制加州大学欧文分校毕业证学位证书A
 
Free scottie t shirts Free scottie t shirts
Free scottie t shirts Free scottie t shirtsFree scottie t shirts Free scottie t shirts
Free scottie t shirts Free scottie t shirtsrahman018755
 
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样AS
 
Discovering OfficialUSA.com Your Go-To Resource.pdf
Discovering OfficialUSA.com Your Go-To Resource.pdfDiscovering OfficialUSA.com Your Go-To Resource.pdf
Discovering OfficialUSA.com Your Go-To Resource.pdfSadaf Khan
 
如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证
如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证
如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证hfkmxufye
 

Recently uploaded (20)

GOOGLE Io 2024 At takes center stage.pdf
GOOGLE Io 2024 At takes center stage.pdfGOOGLE Io 2024 At takes center stage.pdf
GOOGLE Io 2024 At takes center stage.pdf
 
一比一定制波士顿学院毕业证学位证书
一比一定制波士顿学院毕业证学位证书一比一定制波士顿学院毕业证学位证书
一比一定制波士顿学院毕业证学位证书
 
Washington Football Commanders Redskins Feathers Shirt
Washington Football Commanders Redskins Feathers ShirtWashington Football Commanders Redskins Feathers Shirt
Washington Football Commanders Redskins Feathers Shirt
 
Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...
Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...
Abortion Clinic in Kwa thema +27791653574 Kwa thema WhatsApp Abortion Clinic ...
 
Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...
Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...
Subdomain enumeration is a crucial phase in cybersecurity, particularly durin...
 
TOP 100 Vulnerabilities Step-by-Step Guide Handbook
TOP 100 Vulnerabilities Step-by-Step Guide HandbookTOP 100 Vulnerabilities Step-by-Step Guide Handbook
TOP 100 Vulnerabilities Step-by-Step Guide Handbook
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirt
 
一比一原版布兰迪斯大学毕业证如何办理
一比一原版布兰迪斯大学毕业证如何办理一比一原版布兰迪斯大学毕业证如何办理
一比一原版布兰迪斯大学毕业证如何办理
 
Beyond Inbound: Unlocking the Secrets of API Egress Traffic Management
Beyond Inbound: Unlocking the Secrets of API Egress Traffic ManagementBeyond Inbound: Unlocking the Secrets of API Egress Traffic Management
Beyond Inbound: Unlocking the Secrets of API Egress Traffic Management
 
原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样原版定制英国赫瑞瓦特大学毕业证原件一模一样
原版定制英国赫瑞瓦特大学毕业证原件一模一样
 
原版定制美国加州大学河滨分校毕业证原件一模一样
原版定制美国加州大学河滨分校毕业证原件一模一样原版定制美国加州大学河滨分校毕业证原件一模一样
原版定制美国加州大学河滨分校毕业证原件一模一样
 
一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样
一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样
一比一原版(毕业证书)新加坡南洋理工学院毕业证原件一模一样
 
一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书
一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书
一比一定制(Temasek毕业证书)新加坡淡马锡理工学院毕业证学位证书
 
一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理
一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理
一比一原版(TRU毕业证书)温哥华社区学院毕业证如何办理
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
一比一定制加州大学欧文分校毕业证学位证书
一比一定制加州大学欧文分校毕业证学位证书一比一定制加州大学欧文分校毕业证学位证书
一比一定制加州大学欧文分校毕业证学位证书
 
Free scottie t shirts Free scottie t shirts
Free scottie t shirts Free scottie t shirtsFree scottie t shirts Free scottie t shirts
Free scottie t shirts Free scottie t shirts
 
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
一比一原版(Polytechnic毕业证书)新加坡理工学院毕业证原件一模一样
 
Discovering OfficialUSA.com Your Go-To Resource.pdf
Discovering OfficialUSA.com Your Go-To Resource.pdfDiscovering OfficialUSA.com Your Go-To Resource.pdf
Discovering OfficialUSA.com Your Go-To Resource.pdf
 
如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证
如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证
如何办理(UCLA毕业证)加州大学洛杉矶分校毕业证成绩单本科硕士学位证留信学历认证
 

Re-ranking Web Documents Using Personal Preferences

  • 1. Re-ranking Web Documents Based on Personal Preferences Satarupa Guha, Priyanshu Agrawal, Shubham Sangal IIIT Hyderabad
  • 2. • Each Individual is unique. • Search Engines should re- rank its results based on his profile so that his specific requirements can be met Personalisation… Why ??
  • 3. • Personalisation works using Individualisation and Contextualisation • Individualisation means creating each User’s Individual Profile using his Long-Term History • Contextualisation means Identifying relation between sessions using Short-Term History
  • 4. Dataset… Fully Anonymized • Dataset was provided by Yandex ( Almost 16 GB of train data and 400 MB of test Data) • Search Data for 30 days from a large city was collected. • To allay privacy concerns the user data is fully anonymized. • Only meaningless numeric IDs of users, queries, query terms, sessions, URLs and their domains are released. • Training was done on data of first 27 days and testing on last 3 days.
  • 5. Data Format The log represents a stream of user actions, with each line representing a session metadata, a query action, or a click action. Each line contains tab-separated values according to the following format: Session metadata (TypeOfRecord = M): SessionID TypeOfRecord Day USERID Query action (TypeOfRecord = Q or T): SessionID TimePassed TypeOfRecord SERPID QueryID ListOfTerms ListOfURLsAndDomains Click action (TypeOfRecord = C): SessionID TimePassed TypeOfRecord SERPID URLID
  • 6. Approach Feature Engineering Train Data Test Data Basic Features User-Specific Features Pair-wise Features Feature Representation Regression Model Trained Model Re-ranked URLs Train Data Test Data
  • 7. Feature Engineering • Initially each URL is labelled with a relevance label of: o 0 (irrelevant) : no clicks or click less than 50 time units o 1 (relevant): clicks with 50 to 399 time units o 2 (highly relevant): dwell time grater than 400 units • Basic Features used include URL-query instance such as original position of URL for query , URLId , DomainId of URLs , TermIds of query and various joint features such as URLId X position , URLId X QueryId , DomainId X TermId etc.
  • 8. • User Specific Features include for each user , dividing each URL in 5 categories based on relevance labels and whether URL was previously displayed or skipped to the User. • Pairwise Features include for each query checking the relative position of a pair of URL f(URLi, URLj) = 1 if URLi occurs at better position than URLj f(URLi, URLj) = -1 if URLj occurs at better position than URLi Feature Engineering
  • 9. Logistic Regression • Simple Logistic Regression Model was used. • URL with positive relevance was labelled 1. Rest labelled -1. • Relevance Labels assigned to each URL w.r.t to each User were used as Weight. • Out of Core learning Algorithms are used as they can work with huge Data in hours.
  • 10. Vowpal Wabbit for Logistic Regression • Fast Machine Learning tool. • Works on huge data in a parallel manner. • RAM efficient. • Fast Convergence to a Good Predictor. • Handles TBs of Data in Matter of Hours. Machine Learning
  • 11. Evaluation and Results • We used NDCG ( Normalized Discounted Cumulative Gain )for evaluation of our results. • Calculated using the ranking of URLs for each query, and then averaged over queries. 𝐷𝐶𝐺@10 = 𝑖=1 10 2 𝑟𝑒𝑙 𝑖−1 𝑙𝑜𝑔2(𝑖+1) where 𝑟𝑒𝑙𝑖, 𝑖 = 1,2, … 10 is the relevance list that contain 10 URL’s relevance values. IDCG@10 is the maximum possible (ideal) DCG for a given set of queries, documents, and relevance value. Then, NDCG@10 is given by 𝑁𝐷𝐶𝐺@10 = 𝐷𝐶𝐺@10 𝐼𝐷𝐶𝐺@10
  • 12. • Uploaded the final Result file online on Kaggle Portal for Evaluation. • Our Team (Satarupa Guha) Got NDCG Score 0.76857 • Team who secured first Position got 0.80725