This document discusses personalized search: re-ranking search results based on a user's profile and past behaviour. It describes extracting features from query logs covering 27 days of search data to train a classifier. Features include documents clicked and time spent, both by the same user and by different users, for a given query. The model is trained with the LambdaMART ranking algorithm on 24 days of data and validated on 3 days. It then re-ranks the top 10 search results for test queries based on the extracted features to produce a personalized ranking. Evaluation on a test platform showed an NDCG score above the baseline, indicating more relevant results.
Personalized Search Features Group 44
1. April 17, 2014 Group 44
Personalized Re-rank Features
Faculty Mentor: Dr. Vasudev Verma
Swapna Kidambi
Meenal Goyal
Sumit Mishra
Chetan Jain
2. What is personalized search?
“Search results that vary based on the searcher’s profile and past behaviour.”
• Today’s problems:
• Search engines are impersonal.
• Users may not find relevant results because search does not consider their expertise level.
• As the number of web-page results grows, the information overload problem becomes severe; a remedy is to rank results according to the user’s preferences.
3. Why Personalization?
Search engines return results based purely on the submitted query text, not on the context intended, and users favour context-based personalised search results.
Advantages:
• The user gets the expected results faster.
• Only relevant data is shown.
Challenges:
• The dataset given is fully anonymised: everything is encoded as numbers.
• The specific emphasis on adaptation efficiency prevents us from directly applying most existing domain adaptation methods. For a generic ranking model and personalised search, adaptation efficiency is crucial because:
• Such an operation must be executable at the scale of all search engine users.
• We must handle the dynamic nature of users’ search intent while still offering searchers a great experience quickly.
4. Elements of Personalized Search
We are provided with a 27-day dataset containing:
• Session ID
• User ID
• Queries issued in a session and the top 10 results each fetched
• Documents clicked, and
• The time duration for which those documents were viewed.
We use the last 3 days of the dataset as the test data.
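The dataset fields above could be read along these lines. The record layout and the M/Q/C record tags are assumptions for illustration; the actual log format is not shown in the slides:

```python
# Minimal sketch of parsing an anonymised query log.
# Assumed layout (NOT from the slides): tab-separated lines where
# "M" marks session metadata, "Q" a query with its top-10 results,
# and "C" a click (with dwell time) on the most recent query's results.
from collections import namedtuple

Query = namedtuple("Query", "session user query urls clicks")  # clicks: url -> dwell seconds

def parse_log(lines):
    sessions = {}   # session id -> user id
    queries = []    # parsed Query records, in log order
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if parts[1] == "M":                    # <session>  M  <user>
            sessions[parts[0]] = parts[-1]
        elif parts[1] == "Q":                  # <session>  Q  <query id>  <url1..url10>
            sid, _, qid, *urls = parts
            queries.append(Query(sid, sessions.get(sid), qid, urls, {}))
        elif parts[1] == "C":                  # <session>  C  <url>  <dwell>
            sid, _, url, dwell = parts
            queries[-1].clicks[url] = int(dwell)
    return queries
```

Clicks are attached to the most recent query in the same session, which is how session logs of this shape are usually interpreted.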
5. Our Approach
• Divided the training dataset (27 days) into:
• Training data (24 days)
• Validation data (3 days)
Extraction of features to train the classifier:
• Broadly, features for a given query take into account:
• The same query issued by the same user in history, and the results it fetched
• The same query issued by different users in history, and the results they fetched
• Different queries issued by the same user in history, and their results
6. Our Approach
• Features also embed information about:
• Documents clicked among the retrieved documents.
• Time spent on clicked documents.
So, for a query, we have information about:
• All documents that a user clicked, skipped, or missed
• Time spent on documents
• Documents relevant to the user in previous searches
• Documents relevant to the query in previous searches
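The clicked / skipped / missed distinction above can be sketched as follows. The convention used here is an assumption, not stated in the slides: unclicked results ranked above the lowest click count as "skipped", results below it as "missed":

```python
# Sketch of per-document interaction features for one query impression.
# Assumed convention: results above the lowest-clicked rank that were
# not clicked are "skipped"; everything after that rank is "missed".
def interaction_features(urls, clicks):
    """urls: top-10 results in rank order; clicks: url -> dwell seconds."""
    last_click = max((i for i, u in enumerate(urls) if u in clicks), default=-1)
    feats = {}
    for i, u in enumerate(urls):
        if u in clicks:
            status = "clicked"
        elif i < last_click:
            status = "skipped"
        else:
            status = "missed"
        feats[u] = {"rank": i + 1, "status": status, "dwell": clicks.get(u, 0)}
    return feats
```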
7. Our Approach
• We have a set of features for each query in the training data.
• Trained a classifier on the extracted features and improved the model with the help of the validation data.
• On receiving a query, computed its features from the dataset.
• The model, together with this feature set, retrieves the top relevant documents.
Feature extraction:
• Our aim was to extract features for every training, validation, and test user–query–document triplet (u, q(u), d(q,u)).
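Per-triplet extraction can be sketched by aggregating historical clicks along the three axes listed earlier (same user and same query, any user with the same query, same user with any query). The exact feature set is not given in the slides, so these counters are illustrative assumptions:

```python
from collections import Counter

# Historical click counters keyed along the three axes from the slides.
# These particular counts are illustrative, not the group's exact features.
def build_history(log):
    """log: iterable of (user, query, doc) click events."""
    uqd = Counter()   # same user, same query
    qd = Counter()    # any user, same query
    ud = Counter()    # same user, any query
    for user, query, doc in log:
        uqd[(user, query, doc)] += 1
        qd[(query, doc)] += 1
        ud[(user, doc)] += 1
    return uqd, qd, ud

def triplet_features(u, q, d, history):
    """Feature vector for one (u, q(u), d(q,u)) triplet."""
    uqd, qd, ud = history
    return [uqd[(u, q, d)], qd[(q, d)], ud[(u, d)]]
```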
8. Workflow
Training path: Training data (24 GB) + Validation data (3 GB) → [feature extraction] → set of features for all queries in the data → [train a model using LambdaMART] → Model.
Ranking path: Query terms → [feature extraction for query terms] → set of features for the query terms → [given to the LambdaMART model] → ranked output of 10 documents.
9. Our Approach (Tools Used)
How do we train the model and get the results?
• RankLib - a library of learning-to-rank algorithms.
• LambdaMART - the boosted-tree version of LambdaRank, which in turn is based on RankNet.
• It takes as input a set of URLs with the feature values for each URL and produces a ranked output.
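RankLib reads its input in the SVMlight/LETOR text format: one line per query-document pair, with a relevance label, a `qid`, and indexed feature values. A minimal sketch of writing that format, with made-up feature values:

```python
# Write (u, q, d) feature rows in the SVMlight/LETOR text format that
# RankLib consumes: "<relevance> qid:<query id> 1:<f1> 2:<f2> ... # <doc>".
def to_letor(rows):
    """rows: iterable of (relevance, query_id, doc_id, [feature values])."""
    lines = []
    for rel, qid, doc, feats in rows:
        pairs = " ".join(f"{i}:{v}" for i, v in enumerate(feats, start=1))
        lines.append(f"{rel} qid:{qid} {pairs} # {doc}")
    return "\n".join(lines)
```

RankLib's command line can then train LambdaMART on such a file (ranker id 6 in current RankLib releases), optimising a metric such as NDCG@10.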
10. Observations
• To check the results, we uploaded the output file to the Yandex website.
• We obtained an accuracy above the baseline, which is a 0.49 NDCG score (NDCG, normalized discounted cumulative gain, measures how well a ranking places relevant results near the top).
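NDCG@10, the metric cited above, can be computed as follows. This is a standard sketch using the common 2^rel - 1 gain; the evaluation platform's exact gain and discount conventions may differ:

```python
import math

# NDCG@k: DCG of the ranked relevance labels, normalised by the DCG of
# the ideal (sorted) ordering, so a perfect ranking scores 1.0.
def dcg(rels, k):
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg(rels, k=10):
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0
```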