September 23, 2016
Query Understanding
In Amazon Search
Tanvi Motwani
Data Scientist
Amazon Search
PRODUCT SEARCH
MOTIVATION
[Figure: a fashion product query annotated with the attributes Brand, Product Type, Price under, Gender, and Is Prime]
CHALLENGES
• Power Law Distribution of Queries: large population of long-tail queries
• Speedy Search Response: fast models that respond in milliseconds
• Dynamic Search Trends: adaptive to new trends
• Global Search Reach: deal with 10 different languages
QUERY TAGGERS (example output)
"label": "_color", "id": C232, "name": "Black"
"label": "_brand", "id": B1402, "name": "The North Face"
"label": "_product", "id": P232, "name": "Jacket"
"category": "Fashion > Clothing > Jackets & Coats"
QUERY CLASSIFIERS (example output)
"class": "product_query", "score": 0.9
"name": "query_specificity", "score": 0.7
"class": "fashion", "score": 0.8
QUERY CLASSIFIERS
QUERY CATEGORY CLASSIFIER
“A multiclass classifier that assigns the input user query to an Amazon category.”
• Automatically generates a large training dataset
• Training data can be refreshed frequently
• Trigram model generalizes well to tail queries
tv
ipod
projector
speakers
headphones
pillow
curtains
pet bells
mattress
shower curtain
suits
mr robot
star trek
downton abbey
game of thrones
Trigram Language Model
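A minimal sketch of how a trigram language model can act as a category classifier: one model per category, and a query goes to the category whose model gives it the highest probability. The slide does not say whether the trigrams are over characters or words; this sketch assumes character trigrams with add-one smoothing, and the category names (`electronics`, `home`, `video`) are hypothetical labels for the example queries above.

```python
import math
from collections import Counter, defaultdict


def char_trigrams(text):
    """Return the character trigrams of a query, padded with boundary markers."""
    padded = "^^" + text.lower() + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramCategoryClassifier:
    """One character-trigram language model per category (illustrative only)."""

    def __init__(self):
        self.tri = defaultdict(Counter)  # category -> trigram counts
        self.ctx = defaultdict(Counter)  # category -> two-character context counts
        self.alphabet = set()

    def fit(self, labeled_queries):
        for query, category in labeled_queries:
            for gram in char_trigrams(query):
                self.tri[category][gram] += 1
                self.ctx[category][gram[:2]] += 1
                self.alphabet.update(gram)

    def log_prob(self, query, category):
        # Add-one smoothed estimate of P(third char | previous two chars).
        v = len(self.alphabet)
        return sum(
            math.log((self.tri[category][g] + 1) / (self.ctx[category][g[:2]] + v))
            for g in char_trigrams(query)
        )

    def predict(self, query):
        return max(self.tri, key=lambda c: self.log_prob(query, c))


# Example queries from the slide; the category labels are assumptions.
training = (
    [(q, "electronics") for q in ["tv", "ipod", "projector", "speakers", "headphones"]]
    + [(q, "home") for q in ["pillow", "curtains", "pet bells", "mattress", "shower curtain"]]
    + [(q, "video") for q in ["mr robot", "star trek", "downton abbey", "game of thrones"]]
)
clf = TrigramCategoryClassifier()
clf.fit(training)
print(clf.predict("headphone"))  # a query not seen verbatim in training
```

Because the model scores sub-word pieces, a query that never appeared in training can still share most of its trigrams with one category, which is one plausible reading of why the approach helps on tail queries.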
A large percentage of searches happen within a category
CUSTOMER SERVICE QUERY CLASSIFIER
“Classifies a query as either a customer service query or a product query.”
contact amazon
amazon phone number
how do I cancel my order?
where is my order history?
where is my order?
how can I see videos?
amazon prime video help
QUERY TAGGERS
QUERY TAGGING: FILTERING
[Figure: filters for Brand, Product Type, Price under, Gender, and Is Prime]
QUERY TAGGING
IMPROVED UI
QUERY TAGGING (BRAND)
[Figure: the brand span of each query is tagged BRAND]
adidas shoes
jansport backpack
ray ban sunglasses
ralph lauren men
Conditional Random Field
Training examples, one tag per token (B = beginning of brand, I = inside brand, O = outside):
adidas shoes: B O
ray ban sunglasses: B I O
ralph lauren men: B I O
jansport backpack: B O
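As a small illustration of the tagging scheme above, the sketch below turns a B/I/O tag sequence back into brand spans. The function name `brand_spans` is hypothetical; the queries and tags are the ones shown on the slide.

```python
def brand_spans(tokens, tags):
    """Collect runs of tokens tagged B (begin) then I (inside) as brand names."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # O tag (or a stray I with no open span) closes any open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


examples = {
    "adidas shoes": "B O",
    "ray ban sunglasses": "B I O",
    "ralph lauren men": "B I O",
    "jansport backpack": "B O",
}
decoded = {q: brand_spans(q.split(), t.split()) for q, t in examples.items()}
print(decoded)
```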
how? TRAIN
what?
CONDITIONAL RANDOM FIELD
• Discriminative model: models the conditional probability P(Y|X) directly and does not model P(X)
• Features: word is capitalized, word is in an atlas or name list, previous word is “Mrs”, next word is “Times”, …
Recommended tutorial on CRFs: An Introduction to Conditional Random Fields for Relational Learning
https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
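To make the idea concrete, here is a rough sketch of how a linear-chain CRF picks the best tag sequence at prediction time, using Viterbi decoding over per-token and tag-transition scores. Everything numeric here is an assumption: the weights are set by hand rather than learned, and `BRAND_WORDS` is a hypothetical "word is in a name list" feature. A real CRF would learn these weights from the training data.

```python
TAGS = ["B", "I", "O"]
BRAND_WORDS = {"ray", "ban", "ralph", "lauren", "adidas", "jansport"}  # hypothetical name list


def emission(word, tag):
    """Hand-set feature weight for (word, tag); a trained CRF would learn this."""
    in_list = word in BRAND_WORDS
    if tag == "O":
        return -1.0 if in_list else 1.0
    return 1.0 if in_list else -1.0


def transition(prev, tag):
    """Hand-set weight for moving from tag prev to tag (prev is None at the start)."""
    if tag == "I" and prev in ("O", None):
        return -10.0  # I should only follow B or I
    if prev == "B" and tag == "B":
        return -1.0
    return 0.0


def viterbi(words):
    """Return the highest-scoring tag sequence under the scores above."""
    if not words:
        return []
    # best[t][tag] = (score of the best path ending in tag at position t, previous tag)
    best = [{tag: (emission(words[0], tag) + transition(None, tag), None) for tag in TAGS}]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p][0] + transition(p, tag))
            score = best[-1][prev][0] + transition(prev, tag) + emission(word, tag)
            column[tag] = (score, prev)
        best.append(column)
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("ray ban sunglasses".split()))
```

The transition scores are what distinguish this from tagging each word independently: the second brand word is tagged I rather than B because of the score on the tag pair, not because of the word alone.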
QUERY LOGS
adidas shoes
jansport backpack
ray ban sunglasses
north face black jacket
polo ralph lauren men
PRODUCT CATALOGUE
white shoes
ralph lauren
[Figure: queries linked to catalogue products by click, add, and purchase events]
[Figure: the query “north face black jacket” shown between QUERY LOGS and PRODUCT CATALOGUE]
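$$\hat{b} = \arg\max_{b} \sum_{i \in P(b)} f(c_i, a_i, p_i)$$
where $b$ is a brand and $P(b)$ is the set of products corresponding to brand $b$.
[Figure: the query “north face black jacket” with candidate scores 0.8 and 0.2; the selected span is tagged BRAND]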
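The brand selection rule can be sketched as follows, assuming the score of a brand is the sum of f over its products. The slide does not define f or its arguments; this sketch assumes they are click, add, and purchase counts for product i and uses a hypothetical weighted sum. The product records and "Brand X" are made-up illustration data.

```python
from collections import defaultdict


def f(clicks, adds, purchases, weights=(1.0, 2.0, 4.0)):
    """Hypothetical engagement score; the actual form of f is not given on the slide."""
    w_click, w_add, w_purchase = weights
    return w_click * clicks + w_add * adds + w_purchase * purchases


def best_brand(products):
    """Sum f over the products of each brand and return (argmax brand, all scores)."""
    scores = defaultdict(float)
    for p in products:
        scores[p["brand"]] += f(p["clicks"], p["adds"], p["purchases"])
    return max(scores, key=scores.get), dict(scores)


# Made-up engagement data for products reached from "north face black jacket".
engaged = [
    {"brand": "The North Face", "clicks": 10, "adds": 3, "purchases": 1},
    {"brand": "The North Face", "clicks": 6, "adds": 1, "purchases": 1},
    {"brand": "Brand X", "clicks": 4, "adds": 1, "purchases": 0},
]
brand, scores = best_brand(engaged)
print(brand, scores)
```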
Matching Strategies:
• Attribute completely contained in query
• Match after removing stop words, prepositions etc.
• Partial query-attribute match
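The three matching strategies above could look roughly like this, tried from strictest to loosest. The stop-word list, the function names, and the returned strategy labels are all assumptions made for illustration.

```python
STOP_WORDS = {"the", "by", "for", "of", "and", "in", "with"}  # illustrative list


def tokens(text):
    return text.lower().split()


def strip_stop_words(words):
    return [w for w in words if w not in STOP_WORDS]


def contains_sequence(haystack, needle):
    """True if needle occurs as a contiguous token sequence in haystack."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def match_strategy(query, attribute):
    """Return which strategy (if any) matches a catalogue attribute to the query."""
    q, a = tokens(query), tokens(attribute)
    if contains_sequence(q, a):
        return "exact"  # attribute completely contained in query
    if contains_sequence(strip_stop_words(q), strip_stop_words(a)):
        return "stop_words_removed"  # match after removing stop words
    if set(strip_stop_words(q)) & set(strip_stop_words(a)):
        return "partial"  # some attribute tokens appear in the query
    return None


print(match_strategy("north face black jacket", "The North Face"))
```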
QUERY TAGGING - SUMMARY
[Diagram components: Query Logs, Product Catalogue, Training Data Generator, Conditional Random Field, Manual Overrides, Search Engine; stages: TRAIN, DEPLOY]
• Context aware
“Philosophy books” vs. “Philosophy face wash”
• Different formulations of the same entity
“Marc by Marc Jacobs” vs. “Marc Jacobs”
Query Understanding Team
• Palo Alto, California
• Munich, Germany
• Tokyo, Japan
• Beijing, China
Acknowledgements
Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta,
Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi
Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown
ACCESSORY QUERY CLASSIFIER
Base Product DB → Base Query Corpus:
macbook pro
macs
apple laptop
laptop
Accessory Product DB → Accessory Query Corpus:
mac ram
apple sleeve
laptop cover
apple skin
Base Query Corpus + Accessory Query Corpus → Binary Classifier (Class A, Class B) → Search Engine
Base Product DB → Base Query Corpus:
paperwhite
kindle
amazon tablet
book reader
Accessory Product DB → Accessory Query Corpus:
Kindle case
Kindle cover
Amazon cover
case for kindle
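A minimal sketch of a binary classifier trained on the two query corpora. The slides do not say which classifier is used; this sketch swaps in a word-level naive Bayes model with add-one smoothing purely for illustration, and the labels `base` and `accessory` stand in for Class A and Class B.

```python
import math
from collections import Counter


class NaiveBayesQueryClassifier:
    """Word-level naive Bayes over labeled query corpora (illustrative stand-in)."""

    def __init__(self, corpora):
        # corpora: label -> list of queries
        self.word_counts = {
            label: Counter(w for q in queries for w in q.lower().split())
            for label, queries in corpora.items()
        }
        self.priors = {label: len(queries) for label, queries in corpora.items()}
        self.vocab = set().union(*self.word_counts.values())

    def log_score(self, query, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocab)  # add-one smoothing denominator
        score = math.log(self.priors[label] / sum(self.priors.values()))
        for w in query.lower().split():
            score += math.log((counts[w] + 1) / total)
        return score

    def predict(self, query):
        return max(self.word_counts, key=lambda label: self.log_score(query, label))


corpora = {
    "base": ["macbook pro", "macs", "apple laptop", "laptop",
             "paperwhite", "kindle", "amazon tablet", "book reader"],
    "accessory": ["mac ram", "apple sleeve", "laptop cover", "apple skin",
                  "kindle case", "kindle cover", "amazon cover", "case for kindle"],
}
accessory_clf = NaiveBayesQueryClassifier(corpora)
print(accessory_clf.predict("laptop cover"))
```

With corpora this small the model leans heavily on word frequency, so a product name that is common in accessory queries can be misclassified when it appears alone; larger corpora would be needed for anything beyond a demonstration.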
QUERY TAGGING - SUMMARY
Validation Techniques:
• Offline validation
  ◦ Cross-validation with an 80/20 split
  ◦ Manual gold-standard evaluation
• A/B test
  ◦ Control: before the model was deployed
  ◦ Treatment: after the model is deployed
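The offline part of the validation can be sketched as an 80/20 split plus precision and recall of the predicted brand per query against a gold standard. The helper names and the toy predictions below are assumptions for illustration, not results from the talk.

```python
import random


def split_80_20(examples, seed=0):
    """Shuffle a copy of the examples and split it into 80% train and 20% test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def precision_recall(predicted, gold):
    """predicted and gold hold one brand string (or None) per query, in the same order."""
    true_positives = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    n_predicted = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    return precision, recall


train, test = split_80_20(list(range(10)))

# Toy example: one false positive ("philosophy") and one miss ("michael kors").
predicted = ["adidas", "philosophy", None, "ray ban"]
gold = ["adidas", None, "michael kors", "ray ban"]
precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```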
COMPARISON TO DICTIONARY LOOKUP METHODS
• Dictionary methods are not context aware.
  ◦ Example: for “philosophy books”, a dictionary method will tag “philosophy” as a brand.
• Dictionary methods fail to detect different formulations of the same entity.
  ◦ Example: “mk” vs. “michael kors”
Our system improved precision over the baseline by 10% and approximately doubled the recall.
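For contrast, a dictionary lookup baseline might look like the sketch below, which reproduces both failure modes: it tags “philosophy” regardless of context and misses “mk”. The dictionary entries and the longest-match rule are assumptions; the talk does not describe the baseline's implementation.

```python
BRAND_DICTIONARY = {"philosophy", "michael kors", "the north face"}  # illustrative entries


def dictionary_tag(query):
    """Return the longest dictionary entry found as a token sequence in the query."""
    words = query.lower().split()
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            if candidate in BRAND_DICTIONARY:
                return candidate
    return None


print(dictionary_tag("philosophy books"))      # tagged as a brand despite the context
print(dictionary_tag("philosophy face wash"))  # the case where the tag is wanted
print(dictionary_tag("mk watch"))              # abbreviation is not in the dictionary
```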
GLOBAL REACH
GENERATIVE vs. DISCRIMINATIVE MODELS
Generative: P(Y, X)
Discriminative: P(Y|X)
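The two quantities are related by the product rule, which shows the extra factor a generative model has to fit and a discriminative model can skip:

```latex
P(Y, X) = P(Y \mid X)\, P(X)
```

For tagging, only $P(Y \mid X)$ is needed to choose the labels $Y$ for an observed query $X$, since $P(X)$ is the same for every candidate labeling.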

Tanvi Motwani, Lead Data Scientist, Guided Search at A9.com at MLconf ATL 2016

  • 1. September 23, 2016 Query Understanding In Amazon Search Tanvi Motwani Data Scientist Amazon Search
  • 4. CHALLENGES: Power Law Distribution of Queries (large population of long-tail queries); Speedy Search Response (fast models that respond in milliseconds); Dynamic Search Trends (adaptive to new trends); Global Search Reach (deal with 10 different languages)
  • 5. QUERY TAGGERS / QUERY CLASSIFIERS: "label": "_color", "id": C232, "name": "Black"; "label": "_brand", "id": B1402, "name": "The North Face"; "label": "_product", "id": P232, "name": "Jacket"; "category": "Fashion > Clothing > Jackets & Coats"; "class": "product_query", "score": 0.9; "name": "query_specificity", "score": 0.7; "class": "fashion", "score": 0.8
  • 7. QUERY CATEGORY CLASSIFIER: "A multiclass classifier that classifies the input user query into Amazon categories."
  • 8. QUERY CATEGORY CLASSIFIER (Trigram Language Model). A large percentage of searches happen within a category, which: • Automatically generates a large training dataset • Makes frequent refresh of the training data possible • Trigram model generalizes well for tail queries. Example queries by category: tv, ipod, projector, speakers, headphones; pillow, curtains, pet bells, mattress, shower curtain; suits, mr robot, star trek, downton abbey, game of thrones
  • 9. CUSTOMER SERVICE QUERY CLASSIFIER: "Classifies queries as customer service queries versus product queries." Examples: contact amazon; amazon phone number; how do I cancel my order?; where is my order history?; where is my order?; how can I see videos?; amazon prime video help
  • 11. QUERY TAGGING, FILTERING: Brand, Product Type, Price under, Gender, Is Prime
  • 13. QUERY TAGGING (BRAND). What? Mark the BRAND span in each query: [adidas] shoes; [jansport] backpack; [ray ban] sunglasses; [ralph lauren] men. How? TRAIN a Conditional Random Field on B/I/O labels: adidas/B shoes/O; ray/B ban/I sunglasses/O; ralph/B lauren/I men/O; jansport/B backpack/O
  • 14. CONDITIONAL RANDOM FIELD • Discriminative model: models the conditional probability P(Y|X); we do not care to model P(X) • Features: word capitalized, word in atlas or name list, previous word is "Mrs", next word is "Times", … Recommended tutorial on CRFs: An Introduction to Conditional Random Fields for Relational Learning, https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
  • 15. QUERY LOGS: adidas shoes; jansport backpack; ray ban sunglasses; north face black jacket; polo ralph lauren men; white shoes ralph lauren. Events: click, add, purchase. PRODUCT CATALOGUE
  • 16. QUERY LOGS: north face black jacket. PRODUCT CATALOGUE
  • 17. $\arg\max_{b} \sum_{i \in P(b)} f(c_i, a_i, p_i)$, where $b$ is a brand and $P(b)$ is the set of products corresponding to brand $b$. Example: north face black jacket; 0.8; 0.2; BRAND. Matching Strategies: • Attribute completely contained in query • Match after removing stop words, prepositions, etc. • Partial query-attribute match
  • 18. QUERY TAGGING - SUMMARY: Query Logs and Product Catalogue feed the Training Data Generator; TRAIN the Conditional Random Field; DEPLOY to the Search Engine, with Manual Overrides. • Context aware: "Philosophy books" vs. "Philosophy face wash" • Different formulations of the same entity: "Marc by Marc Jacobs" vs. "Marc Jacobs"
  • 19. Query Understanding Team: • Palo Alto, California • Munich, Germany • Tokyo, Japan • Beijing, China. Acknowledgements: Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta, Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown
  • 21. ACCESSORY QUERY CLASSIFIER: Base Product DB, Base Query Corpus (macbook pro; macs; apple laptop; laptop); Accessory Product DB, Accessory Query Corpus (mac ram; apple sleeve; laptop cover; apple skin); Binary Classifier (Class A, Class B); Search Engine
  • 23. ACCESSORY QUERY CLASSIFIER: Base Product DB, Base Query Corpus (paperwhite; kindle; amazon tablet; book reader); Accessory Product DB, Accessory Query Corpus (Kindle case; Kindle cover; Amazon cover; case for kindle); Binary Classifier (Class A, Class B); Search Engine
  • 24. QUERY TAGGING - SUMMARY: Query Logs and Product Catalogue feed the Training Data Generator; TRAIN the Conditional Random Field; DEPLOY to the Search Engine, with Manual Overrides. Validation Techniques: • Offline validation: cross validation with an 80/20 split; manual gold standard evaluation • A/B test: Control (before the model was deployed); Treatment (after the model is deployed)
  • 25. COMPARISON TO DICTIONARY LOOKUP METHODS • Dictionary methods are not context aware. Example: for "philosophy books", a dictionary method will tag "philosophy" as a brand. • Dictionary methods fail to detect different formulations of the same entity. Example: "mk" vs. "michael kors". Our system improved precision over the baseline by 10% and approximately doubled the recall.
  • 27. GENERATIVE vs. DISCRIMINATIVE MODELS: $P(Y, X)$ vs. $P(Y \mid X)$
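
The training-data step on slides 13 and 15 to 17 (pick the best-supported brand for a query from behavioural data, then emit B/I/O labels for the CRF) can be sketched roughly as follows. This is an illustration under stated assumptions, not the production pipeline: the scoring function `f`, its weights, the stop-word list, and the sample counts are all hypothetical, the per-brand aggregation over P(b) is assumed to be a sum, and only the first two matching strategies (full containment, optionally after stop-word removal) are covered.

```python
from collections import defaultdict

# Hypothetical stop-word list for the "match after removing stop words" strategy.
STOP_WORDS = {"the", "by", "of", "for"}


def f(clicks, adds, purchases):
    # Hypothetical weighting of behavioural signals; the slides do not define f.
    return 1.0 * clicks + 2.0 * adds + 3.0 * purchases


def find_span(query_tokens, attr_tokens):
    """Return the index where attr_tokens occurs contiguously in query_tokens, else None."""
    if not attr_tokens:
        return None
    width = len(attr_tokens)
    for start in range(len(query_tokens) - width + 1):
        if query_tokens[start:start + width] == attr_tokens:
            return start
    return None


def label_query(query, products):
    """Tag the best-scoring brand that matches the query with B/I; everything else is O.

    products: catalogue items engaged with after this query, each a dict with
    'brand', 'clicks', 'adds' and 'purchases' (an assumed record layout).
    """
    tokens = query.lower().split()
    scores = defaultdict(float)
    for product in products:
        scores[product["brand"].lower()] += f(
            product["clicks"], product["adds"], product["purchases"]
        )
    tags = ["O"] * len(tokens)
    # Try brands from the highest aggregate score downwards.
    for brand in sorted(scores, key=scores.get, reverse=True):
        brand_tokens = [t for t in brand.split() if t not in STOP_WORDS]
        start = find_span(tokens, brand_tokens)
        if start is not None:
            tags[start] = "B"
            for i in range(start + 1, start + len(brand_tokens)):
                tags[i] = "I"
            break
    return list(zip(tokens, tags))


engaged_products = [
    {"brand": "The North Face", "clicks": 8, "adds": 3, "purchases": 1},
    {"brand": "Columbia", "clicks": 2, "adds": 0, "purchases": 0},
]
labels = label_query("north face black jacket", engaged_products)
print(labels)
```

Sequences produced this way could then serve as training examples for a CRF tagger; the partial query-attribute match strategy would need a fuzzier comparison than `find_span`.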

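The category classifier on slide 8 can be illustrated with one trigram language model per category, choosing the category whose model gives the query the highest likelihood. The slides do not say whether the trigrams are over characters or words, how smoothing is done, or what the categories are called; the sketch below assumes character trigrams, add-one smoothing, and made-up category names, with the slide's example queries as training data.

```python
import math
from collections import Counter, defaultdict


def char_trigrams(text):
    # Pad so that the first and last characters also get full trigram contexts.
    padded = "^^" + text.lower() + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramCategoryClassifier:
    """One add-one-smoothed character trigram model per category (an assumed design)."""

    def __init__(self):
        self.trigram_counts = defaultdict(Counter)
        self.context_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, labelled_queries):
        for query, category in labelled_queries:
            for tri in char_trigrams(query):
                self.trigram_counts[category][tri] += 1
                self.context_counts[category][tri[:2]] += 1
                self.vocab.add(tri[2])
        return self

    def log_likelihood(self, query, category):
        vocab_size = len(self.vocab)
        total = 0.0
        for tri in char_trigrams(query):
            numerator = self.trigram_counts[category][tri] + 1
            denominator = self.context_counts[category][tri[:2]] + vocab_size
            total += math.log(numerator / denominator)
        return total

    def predict(self, query):
        return max(self.trigram_counts, key=lambda c: self.log_likelihood(query, c))


# Queries from slide 8; the category names are hypothetical labels.
training = (
    [(q, "electronics") for q in ["tv", "ipod", "projector", "speakers", "headphones"]]
    + [(q, "home") for q in ["pillow", "curtains", "pet bells", "mattress", "shower curtain"]]
    + [(q, "video") for q in ["suits", "mr robot", "star trek", "downton abbey", "game of thrones"]]
)
classifier = TrigramCategoryClassifier().fit(training)
print(classifier.predict("shower curtains"))
print(classifier.predict("game of throne"))
```

Because the labels come from searches already made within a category, a model like this can be retrained whenever the logs are refreshed, and sub-word trigrams let unseen tail queries share statistics with frequent ones.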
Editor's Notes

  1. 00:30 Hi, I am Tanvi Motwani from the Query Understanding team at A9, and today we will look into how we make this happen.
  2. 01:00 Typical product search page. The user types in a query, which is free text. The search box is the most frequent method customers use to find products at Amazon. Let's zoom into the first result here. The query then hits the QU module, which analyzes what the user has typed and computes query features. These features then go to the ranking module, which finds the most “relevant” products for our customers. Today we are going to look into how this QU module functions.
  3. 03:00 What we see on the product page is detailed information about a product. As in web search, we have unstructured text, e.g. the product description. Along with this we have lots of structured information like …. We also have add to cart, wish list, buy buttons and more that provide user behavioral data. We need to make use of this structured information to help us provide precise results. The challenge is that the query the user types in is unstructured text. Extracting structured information from the query and matching it with the appropriate product fields enables us to surface relevant products for the user. We label parts of the query like… we call these query annotations. We also perform query classification, which is at the level of the whole query, e.g. the category class for this query is “Clothing”. High-level motivation.
  4. 5:30
  5. 8:00 Add query taggers. Gray out black text.
  6. 9:00 Understanding the category enables new features; for example, we can ask you a specific question. Screenshots: zoom in.
  7. 11:00 This is a standard NLP approach. Benefits on a separate slide.
  8. 12:00 Change this slide with training data picture
  9. 05:30 See if you can show Nike on left nav
  10. Split into slide
  11. Kate Spade New York example
  12. 06:30 1. The user types “macbook pro”. 2. If we use query words as features, we get products whose text contains “macbook pro”, like the 2nd and 4th results. 3. If we tag “macbook pro” as a product type and use that as a feature, the rank function will rank products with the same product type higher, and so we see actual MacBook Pros bubbling up.
  13. 06:30 1. The user types “macbook pro”. 2. If we use query words as features, we get products whose text contains “macbook pro”, like the 2nd and 4th results. 3. If we tag “macbook pro” as a product type and use that as a feature, the rank function will rank products with the same product type higher, and so we see actual MacBook Pros bubbling up.
  14. 6:00
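
The effect described in notes 12 and 13 can be sketched with a toy ranking function. Everything here is hypothetical (the product records, the `product_type` field, the tag format, and the boost weight); it only illustrates why a product-type tag lets actual laptops outrank accessories whose titles also contain the query words.

```python
def score(query, tags, product, type_boost=2.0):
    """Toy relevance score: word overlap with the title plus a boost on a product-type match."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    title_words = set(product["title"].lower().split())
    text_score = len(query_words & title_words) / len(query_words)
    # tags is the (assumed) tagger output, e.g. {"product_type": "laptop"}.
    tagged_type = tags.get("product_type")
    type_score = type_boost if tagged_type and product["product_type"] == tagged_type else 0.0
    return text_score + type_score


def rank(query, tags, products):
    # sorted() is stable, so products with equal scores keep their input order.
    return sorted(products, key=lambda p: score(query, tags, p), reverse=True)


catalogue = [
    {"title": "Hard shell case for MacBook Pro", "product_type": "laptop case"},
    {"title": "Apple MacBook Pro 13 inch", "product_type": "laptop"},
]

words_only = rank("macbook pro", {}, catalogue)
with_tag = rank("macbook pro", {"product_type": "laptop"}, catalogue)
print(words_only[0]["title"])
print(with_tag[0]["title"])
```

With words alone both products tie, so the case can surface first; with the tag, the laptop receives the boost and moves to the top.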