The document discusses learning-to-rank models for improving search relevance in e-commerce. It describes how traditional information retrieval models do not scale well to modern needs, while learning-to-rank methods can handle thousands of features and implicit user feedback data. The document reports that using listwise learning-to-rank with NDCG as the loss function improved NDCG by 15.6% and increased conversion rates by 7.5% on e-commerce data. It concludes that deep neural network methods may now outperform traditional machine learning for information retrieval tasks.
3.
• Motivations
• Traditional information retrieval models
• Learning-to-rank models
• Relevance
• Ranking Metrics
• Algorithms
• Ranking optimization
• Use cases
• Summary
• What is next?
Disclaimer: unless otherwise specified, images in this presentation are covered by a Creative Commons (CC) license
4.
• E-commerce growing faster than the traditional brick-and-mortar market ($4.06T projected by 2020)
• Mobile shopping adoption increasing worldwide (46% of shoppers in Asia and 28% in North America)
• Online catalogs offering broader selections and competitive products
• Electronic money transactions gaining more consumers’ trust
• Massive data collected during web and mobile interactions, providing the foundation for machine-learning-driven optimizations
(Infographic: 1.61B shoppers, $1.86T in sales, $150B* in revenues, feeding ML)
*2016 Combined revenues for Amazon, Otto Group, and Rakuten
https://www.statista.com/topics/871/online-shopping/
11.
• Basic ideas
• Lexical similarity metrics
• Penalizing repeated occurrences of the same term
• Penalizing term frequency in longer documents
• Only a few features
• Feature weights manually hand-tuned based on heuristics
• Cannot include important search signals such as user feedback, product popularity, purchase history, etc.
• Fast and scalable
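The term-saturation and length-normalization ideas above are what scoring functions like BM25 implement. A minimal sketch (simplified formulation, with hypothetical inputs):

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len,
               k1=1.2, b=0.75):
    """BM25: k1 saturates repeated occurrences of the same term,
    b penalizes term frequency in longer documents."""
    score = 0.0
    dl = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0 or term not in doc_freq:
            continue
        # Inverse document frequency: rarer terms weigh more.
        idf = math.log((n_docs - doc_freq[term] + 0.5)
                       / (doc_freq[term] + 0.5) + 1)
        # Saturated, length-normalized term frequency.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score

# A document mentioning the query term twice scores higher than one
# mentioning it once, but less than twice as high (saturation).
s1 = bm25_score(["shoe"], ["red", "shoe", "sale"], {"shoe": 10}, 100, 3.0)
s2 = bm25_score(["shoe"], ["shoe", "shoe", "sale"], {"shoe": 10}, 100, 3.0)
```

Note that every input to the score is lexical; there is no slot for user feedback, popularity, or purchase history, which is exactly the limitation listed above.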
12.
• Data-driven approach
• Directly optimizes product ranking based on relevance (unlike classification and regression ML tasks)
• Handles thousands of features
• Robust to noisy data
• Supports personalization
• Industry and research state of the art (Amazon, eBay, Microsoft, Yahoo!, Yandex, etc.)
13.
A document is relevant if it contains the information the user was looking for when submitting the query.
Relevance is subjective and depends on many factors:
• context (what is displayed and how)
• task (purchase, information search, question answering, etc.)
• novelty (unexpected data, ads, etc.)
• time and user effort involved
19.
• Tree ensemble method
• Handles sparse data
• Handles missing values and various value types
• Robust to outliers
• Learns higher-order feature interactions
• Invariant to feature scaling
• Highly scalable and optimized open-source implementation (XGBoost)
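The scaling-invariance property can be seen in the ensemble's building block, a single regression stump: the best split depends only on the ordering of feature values, so rescaling a feature changes nothing. An illustrative toy sketch (not XGBoost's actual split-finding implementation):

```python
def best_stump(xs, ys):
    """Exhaustively pick the threshold on one feature that minimizes
    the squared error of a two-leaf regression stump (the unit a tree
    ensemble is built from). Returns (error, threshold)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        m_l, m_r = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - m_l) ** 2 for y in left)
               + sum((y - m_r) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t)
    return best

# Rescaling the feature by 10x leaves the chosen split (and its error)
# unchanged, because only the ordering of values matters.
raw = best_stump([1, 2, 3, 4], [0, 0, 1, 1])
scaled = best_stump([10, 20, 30, 40], [0, 0, 1, 1])
```

The same ordering-based split selection is also what makes tree ensembles robust to outliers in feature values: an extreme value moves, but the thresholds between the remaining points do not.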
20.
Point-wise ranking
• Input: single documents / Output: class labels or scores
• Classify each document as relevant or non-relevant
• Adjust w to reduce classification errors
Pair-wise ranking
• Input: document pairs / Output: partial-order preferences
• Classify pairs of documents: is D1 > D2?
• Adjust w to reduce discordant pairs
List-wise ranking
• Input: document collections / Output: ranked document list
• Score permutations: is {D1,D2,…} > {D1’,D2’,…}?
• Adjust w to directly maximize the ranking measure of interest (e.g., NDCG)
(Diagram: a query Q scoring a single document Di; a pair-wise preference Di > Dj; a list-wise ordering Di > Dj > Dk)
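"Adjust w to reduce discordant pairs" can be made concrete with a perceptron-style linear pair-wise ranker. A toy sketch on made-up two-feature documents, not a production algorithm:

```python
def pairwise_train(features, relevance, epochs=50, lr=0.1):
    """Linear pair-wise ranker: for every discordant pair (i should rank
    above j but is scored at or below it), nudge w toward the preferred
    document's features."""
    n_dim = len(features[0])
    w = [0.0] * n_dim
    for _ in range(epochs):
        for i in range(len(features)):
            for j in range(len(features)):
                if relevance[i] > relevance[j]:
                    s_i = sum(wk * xk for wk, xk in zip(w, features[i]))
                    s_j = sum(wk * xk for wk, xk in zip(w, features[j]))
                    if s_i <= s_j:  # discordant pair -> update w
                        for k in range(n_dim):
                            w[k] += lr * (features[i][k] - features[j][k])
    return w

# Toy data: three documents with two features each and graded labels.
docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = [2, 0, 1]
w = pairwise_train(docs, labels)
scores = [sum(wi * xi for wi, xi in zip(w, d)) for d in docs]
```

After training, sorting by `scores` reproduces the label ordering; unlike the point-wise setup, the update never looks at a document in isolation, only at mis-ordered pairs.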
21.
Green = relevant
Gray = not relevant
Blue arrows = boost for the pair-wise loss function
Red arrows = boost for the list-wise loss function
(a) is the perfect ranking;
(b) is a ranking with 10 pairwise errors;
(c) is a ranking with 8 pairwise errors
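The list-wise boost targets the ranking metric directly. NDCG itself, in its common exponential-gain form, can be sketched as:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded gains (2^rel - 1) discounted
    by log2 of the (1-based) rank position."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the ideal (descending-relevance) ordering,
    so a perfect ranking scores exactly 1.0."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Because of the log discount, swapping a relevant document above an irrelevant one near the top of the list moves NDCG more than the same swap near the bottom, which is why a list-wise loss can prefer ranking (b) over (c) even though (b) has more pairwise errors.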
28.
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
31.
• Traditional IR methods do not scale to modern e-commerce needs
• Users’ implicit feedback is a proxy for the relevance of search query / document pairs
• Learning-to-rank (LTR) methods scale to thousands of features and are robust to data noise
• LTR with a list-wise loss function substantially improves search relevance (15.6% NDCG increase on e-commerce data)
• NDCG improvements directly correlate with conversion rates (7.5% CTR increase on e-commerce data)
• DNN methods for IR are starting to outperform traditional ML methods