Learning to rank fulltext results from clicks
Tomáš Kramár
@tkramar
@synopsitv
Let's build a fulltext search engine.
Query → Find matches → Rank results
Find matches:
● ElasticSearch
● LIKE %%
● ...
Rank results:
● By number of hits
● By PageRank
● By Date
● ...
How do you choose relevant results?
| Feature | Document 1 | Document 2 |
|---|---|---|
| Number of keywords in title | 2 | 2 |
| Number of keywords in text | 2 | 0 |
| Domain | careerjet.sk | vienna-rb.at |
| Category | Job search | Programming |
| Language | Slovak | English |
| Document feature | How much I care about it (the higher, the more I care) |
|---|---|
| # keywords in title | 2.1 |
| # keywords in text | 1 |
| Domain is careerjet.sk | -2 |
| Domain is vienna-rb.at | 3.5 |
| Category is Job Search | -1 |
| Category is Programming | 4.2 |
| Language is Slovak | 0.9 |
| Language is English | 1.5 |
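| Document feature | How much I care about it (u) | Document 1 (d) | Document 2 (d) |
|---|---|---|---|
| # keywords in title | 2.1 | 2 | 2 |
| # keywords in text | 1 | 2 | 0 |
| Domain is careerjet.sk | -2 | 1 | 0 |
| Domain is vienna-rb.at | 3.5 | 0 | 1 |
| Category is Job Search | -1 | 1 | 0 |
| Category is Programming | 4.2 | 0 | 1 |
| Language is Slovak | 0.9 | 1 | 0 |
| Language is English | 1.5 | 0 | 1 |

rank = d · u
Document 1: rank = 4.1
Document 2: rank = 13.4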
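A minimal sketch of the ranking formula, using the weights and feature values listed above in the same order. The names `u`, `doc_1`, `doc_2` and `rank` are illustrative and not taken from the talk.

```python
# Weights u ("how much I care about it"), in the order the features are listed.
u = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]

# Feature vectors d of the two example documents.
doc_1 = [2, 2, 1, 0, 1, 0, 1, 0]   # careerjet.sk, Job Search, Slovak
doc_2 = [2, 0, 0, 1, 0, 1, 0, 1]   # vienna-rb.at, Programming, English


def rank(d, u):
    """Dot product d . u: the weighted sum of a document's features."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


print(round(rank(doc_1, u), 1))  # 4.1
print(round(rank(doc_2, u), 1))  # 13.4
```

The higher the value, the higher the document should appear in the result list.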
Rate each result on a scale of 1-5.
$$\text{rating} = d \cdot u = d_1 u_1 + d_2 u_2 + \dots + d_n u_n$$

$$
\begin{aligned}
d_{1,1} u_1 + d_{1,2} u_2 + \dots + d_{1,n} u_n &= 3 \\
d_{2,1} u_1 + d_{2,2} u_2 + \dots + d_{2,n} u_n &= 5 \\
d_{3,1} u_1 + d_{3,2} u_2 + \dots + d_{3,n} u_n &= 1 \\
d_{4,1} u_1 + d_{4,2} u_2 + \dots + d_{4,n} u_n &= 3
\end{aligned}
$$
The $d_{i,j}$ are known; solve this system of equations and you have $u$. Done.
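As a small illustration of "solve this system", here is a sketch for the simplest case: two features and two rated documents, solved with Cramer's rule. The feature values and ratings are made up for the example, and `solve_2x2` is a hypothetical helper.

```python
def solve_2x2(d, ratings):
    """Solve d[0] . u = ratings[0] and d[1] . u = ratings[1] by Cramer's rule."""
    (a, b), (c, e) = d
    det = a * e - b * c
    if det == 0:
        raise ValueError("no unique solution: the feature rows are linearly dependent")
    u1 = (ratings[0] * e - b * ratings[1]) / det
    u2 = (a * ratings[1] - ratings[0] * c) / det
    return [u1, u2]


# Hypothetical data: [# keywords in title, # keywords in text] and the user's rating.
documents = [[1, 1], [1, 3]]
ratings = [3, 5]

print(solve_2x2(documents, ratings))  # [2.0, 1.0]
```

With more rated documents than features, the system is overdetermined and an exact solution usually does not exist.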
Except...
● You don't know the explicit ratings
● User preferences change over time
● Those equations probably don't have a solution
Clicked! Assume rating 1.
Not clicked. Assume rating 0.
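A minimal sketch of turning one result page and its clicks into implicit ratings. The document ids and the function name `ratings_from_clicks` are hypothetical.

```python
def ratings_from_clicks(shown, clicked):
    """Label each shown result 1 if it was clicked and 0 otherwise."""
    clicked = set(clicked)
    return [1 if doc_id in clicked else 0 for doc_id in shown]


shown = ["doc_a", "doc_b", "doc_c", "doc_d"]   # results in displayed order
print(ratings_from_clicks(shown, ["doc_b"]))   # [0, 1, 0, 0]
```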
Approximation function
$h(d)\colon d \to \text{rank}$
$h(d) = d_1 u_1 + \dots + d_n u_n = \text{estimated\_rank}$
If the function is good, it should make minimal errors:
$\text{error} = (\text{estimated\_rank} - \text{real\_rank})^2$
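The approximation function and its error can be written down directly. This is a sketch with made-up numbers; the preferences `u`, the documents and the click-based ranks are assumptions for illustration.

```python
def h(d, u):
    """Estimated rank of document d under preferences u."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def squared_error(d, u, real_rank):
    """(estimated_rank - real_rank)^2 for one document."""
    return (h(d, u) - real_rank) ** 2


def mean_squared_error(docs, ranks, u):
    """Average squared error over all documents."""
    return sum(squared_error(d, u, r) for d, r in zip(docs, ranks)) / len(docs)


u = [0.5, 0.0]              # hypothetical current preferences
docs = [[2, 2], [2, 0]]     # [# keywords in title, # keywords in text]
ranks = [1, 0]              # clicked, not clicked

print(squared_error(docs[0], u, ranks[0]))   # 0.0
print(squared_error(docs[1], u, ranks[1]))   # 1.0
print(mean_squared_error(docs, ranks, u))    # 0.5
```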
Gradient descent
1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the mean square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
[Plot: the cost function, mean square error as a function of the preference u for "# of keywords in title"]
Calculate the derivative of the cost function at this point and it will give you the direction to move in: step against the slope, towards lower error.
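Preference update
$u_i := u_i - \alpha \, \dfrac{\partial\,\text{error}}{\partial u_i}$
$\alpha$: learning rate
$\dfrac{\partial\,\text{error}}{\partial u_i}$: partial derivative of the cost function (the error of $h(d)$) with respect to $u_i$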
$\alpha$ (learning rate): how fast you will move. Too low: slow progress. Too high: you will overshoot.
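The effect of the learning rate can be seen on a toy problem with a single preference. The sketch below assumes one document with feature value `d = 1` and real rank 1, and uses the standard derivative of the squared error, `2 * (u*d - real_rank) * d`; the three α values are picked only to show the three regimes.

```python
def final_error(alpha, steps=20, d=1.0, real_rank=1.0):
    """Run `steps` updates on a single preference u, starting from u = 0,
    and return the squared error that is left."""
    u = 0.0
    for _ in range(steps):
        gradient = 2 * (u * d - real_rank) * d   # derivative of (u*d - real_rank)^2
        u -= alpha * gradient
    return (u * d - real_rank) ** 2


for alpha in (0.01, 0.4, 1.1):
    print(alpha, final_error(alpha))
```

With α = 0.01 the error shrinks only slightly in 20 steps, with α = 0.4 it is practically zero, and with α = 1.1 every step overshoots the minimum and the error grows.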
The partial derivative is nothing scary. You can find these online for standard cost functions.
For mean square error: $\dfrac{\partial\,\text{error}}{\partial u_i} = 2\,(h(d) - \text{rank}(d))\, d_i$, averaged over all documents. The constant factor 2 can be absorbed into $\alpha$.
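Put together, the gradient descent steps can be sketched as follows. This is a plain batch version under stated assumptions: preferences start at zero (one arbitrary choice), and the training data, `alpha`, `tolerance` and `max_steps` are illustrative values, not taken from the talk.

```python
def h(d, u):
    """Estimated rank of document d under preferences u."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def mean_squared_error(docs, ranks, u):
    return sum((h(d, u) - r) ** 2 for d, r in zip(docs, ranks)) / len(docs)


def gradient_descent(docs, ranks, alpha=0.1, tolerance=1e-12, max_steps=10_000):
    n = len(docs[0])
    u = [0.0] * n                                      # 1. arbitrary start
    previous = mean_squared_error(docs, ranks, u)
    for _ in range(max_steps):
        # 2. estimated rank minus real rank, for each document
        residuals = [h(d, u) - r for d, r in zip(docs, ranks)]
        # 4. move every preference against its partial derivative
        for i in range(n):
            gradient_i = 2 * sum(res * d[i] for res, d in zip(residuals, docs)) / len(docs)
            u[i] -= alpha * gradient_i
        current = mean_squared_error(docs, ranks, u)   # 3. mean square error
        if abs(previous - current) < tolerance:        # 5. stop once it converges
            break
        previous = current
    return u


# Hypothetical training data: feature vectors and click-based ranks (1 = clicked).
docs = [[1, 0], [0, 1], [1, 1]]
ranks = [1, 0, 1]
learned = gradient_descent(docs, ranks)
print([round(x, 3) for x in learned])   # [1.0, 0.0]
```

The residuals are computed once per step, so all preferences are updated from the same estimates.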
Clicked! Assume rating 1.
Clicked! Assume rating 1.
Or? Doesn't this mean result #1 is not relevant?
Clicked! Assume nothing.
Clicked! Assume it is better than #2 and #3.
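One way to read this rule: a clicked result is preferred over every result ranked above it that was not clicked. The sketch below assumes that reading and assumes the clicks in the example were on results #1 and #4; `preference_pairs` is a hypothetical helper working on 1-based positions.

```python
def preference_pairs(clicked_positions):
    """Return (better, worse) position pairs: each clicked result beats
    every result ranked above it that was not clicked."""
    clicked = set(clicked_positions)
    pairs = []
    for position in sorted(clicked):
        for above in range(1, position):
            if above not in clicked:
                pairs.append((position, above))
    return pairs


print(preference_pairs([1, 4]))   # [(4, 2), (4, 3)]
```

The click on #1 produces no pair because nothing is ranked above it, which matches "assume nothing".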
What's changed?
We no longer have ratings, just document comparisons.
Cost function: something that considers ordering, e.g., Kendall's τ (based on the number of concordant and discordant pairs).
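A sketch of counting concordant and discordant pairs between two orderings. This is the simple tau-a variant, where tied pairs count as neither; the input lists are made-up scores for the same documents.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both orderings agree on this pair
        elif sign < 0:
            discordant += 1      # the orderings disagree on this pair
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))   # 1.0
print(kendall_tau([1, 2, 3], [3, 2, 1]))         # -1.0
```

A value of 1 means the two orderings agree on every pair and -1 means they disagree on every pair.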
h is now a function of 2 parameters: $h(d_1, d_2)$. But you can just do $d_2 - d_1$ and learn on that.
$d_4 > d_3$
$d_4 > d_2$
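A sketch of learning on differences. Because h is linear, h(better) - h(worse) equals h(better - worse), so each comparison becomes one difference vector. The sketch then reuses squared-error gradient descent with a target of 1 for every difference. That target, the feature vectors and the hyperparameters are assumptions for illustration, not the method from the talk.

```python
def h(d, u):
    return sum(d_i * u_i for d_i, u_i in zip(d, u))


def learn_from_pairs(pairs, alpha=0.1, steps=500):
    """pairs: list of (better, worse) feature vectors."""
    diffs = [[b_i - w_i for b_i, w_i in zip(better, worse)] for better, worse in pairs]
    n = len(diffs[0])
    u = [0.0] * n
    for _ in range(steps):
        # Assumed target: h(better - worse) should be 1, i.e. clearly positive.
        residuals = [h(x, u) - 1.0 for x in diffs]
        gradients = [
            2 * sum(res * x[i] for res, x in zip(residuals, diffs)) / len(diffs)
            for i in range(n)
        ]
        u = [u_i - alpha * g for u_i, g in zip(u, gradients)]
    return u


# Hypothetical feature vectors for results #2, #3 and #4.
d2, d3, d4 = [0, 1], [1, 0], [2, 1]
u = learn_from_pairs([(d4, d3), (d4, d2)])

print(h(d4, u) > h(d3, u), h(d4, u) > h(d2, u))   # True True
```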