RELEVANCY HACKS FOR ECOMMERCE
VARUN THACKER

@VARUNTHACKER
AGENDA
• How to solve multiple eCommerce use cases using the features present in Solr
• Query parsing
• Building on the TF-IDF scoring model and improving it for your data set
• Adding relevancy signals to your score to rank documents better
• Customising search results on a per-query basis
HOW DO QUERIES SCORE DOCUMENTS?
• Example document:

{
  "title" : "LG Nexus 5",
  "brand" : "LG",
  "category" : "Smartphones",
  "tags" : "phones, android, touch"
}

• query = LG Nexus
HOW DO QUERIES SCORE DOCUMENTS?
• Scores are field relative.
• I want a query that matches each token against all the fields.
• Approach 1: Use a BooleanQuery
  • Query query1 = new TermQuery(new Term("title", "lg"));
  • Query query2 = new TermQuery(new Term("title", "nexus"));
  • Query query3 = new TermQuery(new Term("brand", "lg"));
  • Query query4 = new TermQuery(new Term("brand", "nexus"));
  • Add all the queries into a BooleanQuery
  • Score = query1 + query2 + query3 + query4
  • This would count the match for "lg" twice (once in title, once in brand).
• Approach 2: Use a DisjunctionMaxQuery. It scores each document with the maximum score produced for that document by any subquery.
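To make the difference between the two approaches concrete, here is a minimal sketch in plain Java (no Lucene dependency). It compares summing the per-field clause scores, as a BooleanQuery of optional clauses would, with the max-plus-tie-breaker combination that DisjunctionMaxQuery uses. The class name, method names, and clause scores are made up for illustration.

```java
import java.util.List;

class DisMaxSketch {
    /** BooleanQuery-style combination: add up every matching clause. */
    static double sumScore(List<Double> clauseScores) {
        double total = 0.0;
        for (double s : clauseScores) {
            total += s;
        }
        return total;
    }

    /** DisjunctionMaxQuery-style: best clause plus tieBreaker times the remaining clauses. */
    static double disMaxScore(List<Double> clauseScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Made-up scores for the token "lg" in title and in brand.
        List<Double> lg = List.of(0.4, 0.9);
        System.out.println("sum:               " + sumScore(lg));
        System.out.println("dismax, tie = 0.0: " + disMaxScore(lg, 0.0));
        System.out.println("dismax, tie = 0.1: " + disMaxScore(lg, 0.1));
    }
}
```

With a tie-breaker of 0 only the best field counts, so "lg" matching in both title and brand is no longer rewarded twice. A small non-zero tie-breaker lets the other fields contribute a little.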
DEFAULT SIMILARITY FACTORS
• TF - number of occurrences of the term in the document.
• IDF - a measure of how unique or rare the term is.
• Normalisation - applied both at index time and at query time.
• Coordination factor - how many of the query terms match in each document.
• These statistics are per field.
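As a rough illustration of three of these factors, the sketch below uses the formulas that, to my understanding, Lucene 4.x's DefaultSimilarity applies: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), and coord = overlap / maxOverlap. It is plain Java rather than a Similarity subclass, the class name is hypothetical, and normalisation is left out.

```java
class ClassicFactors {
    /** Term frequency factor: grows with the square root of the raw count. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: rarer terms (small docFreq) get a larger value. */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Coordination factor: fraction of the query terms that the document matches. */
    static double coord(int overlap, int maxOverlap) {
        return (double) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        System.out.println("tf(4)         = " + tf(4));
        System.out.println("idf(9, 10)    = " + idf(9, 10));
        System.out.println("idf(1, 1000)  = " + idf(1, 1000));
        System.out.println("coord(1, 2)   = " + coord(1, 2));
    }
}
```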
WHY THE DEFAULT SCORING MAY NOT WORK?
• TF-IDF is calculated per field.
• Let's take term frequency first:
  • Product 1: iPad Air
  • Product 2: iPad Air case. Works well with iPad 3 and iPad 2
  • query = iPad
  • On term frequency alone, Product 2 would rank before Product 1
  • But obviously this is not what the user is looking for
  • Does iPad occurring multiple times make it more important?
  • Idea - Let's make TF = 1 for a token match
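A small sketch of the "TF = 1" idea, in plain Java with hypothetical names. It counts occurrences of the query token in the two product titles from the slide and compares a square-root TF with a flattened one. In Lucene the flattening would presumably go into a custom Similarity that overrides the tf method; that wiring is not shown here.

```java
class FlatTf {
    /** Counts how often a lower-cased term occurs among the word tokens of the text. */
    static int count(String text, String term) {
        int n = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }

    /** Square-root TF: repeated matches raise the factor. */
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    /** Flattened TF: any match counts as exactly 1. */
    static double flatTf(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        int p1 = count("iPad Air", "ipad");
        int p2 = count("iPad Air case. Works well with iPad 3 and iPad 2", "ipad");
        System.out.println("sqrt TF: p1=" + sqrtTf(p1) + " p2=" + sqrtTf(p2));
        System.out.println("flat TF: p1=" + flatTf(p1) + " p2=" + flatTf(p2));
    }
}
```

With the flattened TF both products get the same term-frequency factor, so the accessory no longer wins just by repeating "iPad".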
WHY THE DEFAULT SCORING MAY NOT WORK?
• Tackling Inverse Document Frequency
  • Product 1 - brown jacket
  • Product 2 - leather jacket
  • q = brown leather jacket
  • IDF is not a measure of usefulness but a measure of rarity.
  • Should IDF from your corpus be the true judge of whether "leather" is more important than "brown"?
  • Maybe you stock fewer brown jackets, but that doesn't make "brown" more important than "leather".
  • Combine data from many stores in your vertical and compute the IDF score offline
  • Feed it back into your custom Similarity implementation
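One way to picture the offline-IDF idea is a lookup table that replaces the corpus statistic. The sketch below is plain Java with hypothetical names and made-up numbers; it only shows the lookup with a fallback for unknown terms, not how a custom Similarity would be registered with Lucene or Solr.

```java
import java.util.Map;

class OfflineIdf {
    private final Map<String, Double> table; // term -> IDF computed offline across many stores
    private final double fallback;           // used for terms missing from the table

    OfflineIdf(Map<String, Double> table, double fallback) {
        this.table = table;
        this.fallback = fallback;
    }

    /** IDF taken from the offline table instead of the local index. */
    double idf(String term) {
        return table.getOrDefault(term, fallback);
    }

    /** IDF derived from the local corpus only, for comparison. */
    static double corpusIdf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Made-up local statistics: few brown jackets, many leather jackets in stock.
        System.out.println("corpus  brown=" + corpusIdf(1, 100) + " leather=" + corpusIdf(20, 100));

        // Made-up vertical-wide values that rate "leather" as the stronger signal.
        OfflineIdf offline = new OfflineIdf(Map.of("brown", 2.0, "leather", 3.0), 1.0);
        System.out.println("offline brown=" + offline.idf("brown") + " leather=" + offline.idf("leather"));
    }
}
```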
WHY THE DEFAULT SCORING MAY NOT WORK?
• The tie-breaker between two documents with the same number of term matches is "fieldNorm", which favours the document whose field contains fewer tokens.
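A minimal sketch of that tie-breaker, assuming the length norm is 1 / sqrt(number of tokens in the field) with no index-time boost, which is how I understand DefaultSimilarity to compute it. Lucene 4.x also compresses the stored norm, so real values are coarser than the ones computed here. The class name is hypothetical.

```java
class FieldNormSketch {
    /** Length norm: shorter fields get a larger value. */
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    public static void main(String[] args) {
        // Two titles that match the same query terms but differ in length.
        System.out.println("2 tokens:  " + lengthNorm(2));
        System.out.println("4 tokens:  " + lengthNorm(4));
        System.out.println("16 tokens: " + lengthNorm(16));
    }
}
```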
FUNCTION QUERIES
• A FunctionQuery lets you use the actual value of a field, and functions of those values, in the relevancy score.
• It iterates over all documents serially, applying the function.
• It can be multiplied into the score by using the boost param of the eDisMax query parser.
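As a sketch of what such a request could look like, the Java below assembles the query string for an eDisMax search with a multiplicative boost. The parameter names q, defType, qf, and boost are Solr's; the field names (title, brand, popularity) and the class itself are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class EdismaxBoostRequest {
    /** Builds the URL query string for an eDisMax request with a multiplicative boost function. */
    static String buildQueryString(String userQuery, String boostFunction) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("defType", "edismax");
        params.put("qf", "title brand");    // hypothetical fields to search
        params.put("boost", boostFunction); // multiplied into the relevancy score

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // "popularity" is assumed to be a numeric field on each product document.
        System.out.println(buildQueryString("LG Nexus", "popularity"));
    }
}
```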
INCLUDE POPULARITY DATA
• Popularity could be anything: best-selling items, most-viewed products, trending items, etc.
• Compute the "popularity" score offline for each document in the index.
• Store it in the document if your data set is small; otherwise you could use an ExternalFileField.
• Use a function query:
  • &boost=popularity multiplies the popularity value into the score (popularity * score)
• With the new expressions module coming in Lucene 4.6, it's fairly simple to add multiple signals to your ranking formula
  • Expression expr = JavascriptCompiler.compile("_score + ln(popularity) + ln(margin)");
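To see what the expression "_score + ln(popularity) + ln(margin)" does to a ranking, the same arithmetic can be written in plain Java. This is only an illustration of the formula with made-up numbers and a hypothetical class name; it does not use the Lucene expressions module, and it assumes popularity and margin are strictly positive.

```java
class MultiSignalScore {
    /** Same arithmetic as the expression: text score plus log-damped business signals. */
    static double combined(double score, double popularity, double margin) {
        return score + Math.log(popularity) + Math.log(margin);
    }

    public static void main(String[] args) {
        // Made-up products: A matches the text slightly better, B is far more popular.
        double a = combined(1.2, 1.0, 1.0);
        double b = combined(1.0, 100.0, 1.0);
        System.out.println("A = " + a);
        System.out.println("B = " + b);
        System.out.println(b > a ? "B ranks first" : "A ranks first");
    }
}
```

The logarithm keeps a very popular product from drowning out text relevance entirely, while still letting it overtake a slightly better textual match.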
ADDING CLICK THROUGH DATA
• Use this on a per-query basis or for a set of similar queries.
• We used function queries which take
  • ids and their associated boosts
• An external application would enable the function query depending on the search query
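The slides do not show the function query itself, so the following is only a plain-Java sketch of the idea under stated assumptions: an offline job turns click logs into a table of query -> (document id -> boost), and the boost is multiplied into the base score when the incoming query is in the table. All names, ids, and numbers are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

class ClickBoostSketch {
    // Hypothetical table built offline from click-through logs.
    static final Map<String, Map<String, Double>> CLICK_BOOSTS = Map.of(
            "android phones", Map.of("nexus-5", 2.0, "moto-g", 1.5));

    /** Normalises the query so that similar spellings share one table entry. */
    static String normalise(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    /** Multiplies the base score by the click boost, or leaves it unchanged if none applies. */
    static double boosted(String query, String docId, double baseScore) {
        Map<String, Double> boosts = CLICK_BOOSTS.getOrDefault(normalise(query), Map.of());
        return baseScore * boosts.getOrDefault(docId, 1.0);
    }

    public static void main(String[] args) {
        System.out.println(boosted(" Android Phones ", "nexus-5", 1.0));
        System.out.println(boosted("android phones", "galaxy-s4", 1.0));
        System.out.println(boosted("laptops", "nexus-5", 1.0));
    }
}
```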
BOOSTING NEWER PRODUCTS
• Blindly sort the results
  • &sort=release_date desc
• Give preference to newer products
  • recip(ms(NOW/DAY,pub_date),3.16e-11,2,1)
  • Where recip(x, m, a, b) = a / (m*x + b)
  • Picking a = 2, b = 1, m = 3.16e-11 (roughly 1 / the number of milliseconds in a year)
  • Gives a boost of 2 for today's product
  • Gives a boost of about 1.3 for a half-year-old product
  • Gives a boost of about 1 for a one-year-old product, and so on
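The boost values quoted above can be checked with a few lines of Java. The sketch assumes recip(x, m, a, b) = a / (m*x + b) with x as the product age in milliseconds, a = 2, b = 1 and m = 3.16e-11; the class and helper names are hypothetical.

```java
class RecipBoost {
    static final double MS_PER_DAY = 24 * 60 * 60 * 1000.0;

    /** Reciprocal function: a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Boost for a product of the given age, using a = 2, b = 1, m = 3.16e-11. */
    static double boostForAgeDays(double days) {
        return recip(days * MS_PER_DAY, 3.16e-11, 2.0, 1.0);
    }

    public static void main(String[] args) {
        System.out.printf("today:     %.2f%n", boostForAgeDays(0));
        System.out.printf("half year: %.2f%n", boostForAgeDays(182.5));
        System.out.printf("one year:  %.2f%n", boostForAgeDays(365));
        System.out.printf("two years: %.2f%n", boostForAgeDays(730));
    }
}
```

Because the boost decays smoothly instead of sorting strictly by date, an older product with a much better text match can still outrank a new one.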
I'M STILL NOT SATISFIED!
• Take your top N queries and use the QueryElevationComponent :)
• Fix particular documents for certain queries
• No scoring is taken into consideration for these queries

<elevate>
  <query text="android phones">
    <doc id="nexus 4" />
    <doc id="iPhone" exclude="true"/>
  </query>
</elevate>
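The effect of such an elevation rule on a result list can be sketched in plain Java: elevated ids go to the top in the configured order, excluded ids are dropped, and the remaining results keep their scored order. This is an illustration with hypothetical names, not the component's implementation, and it assumes every elevated id actually exists in the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ElevationSketch {
    /** Puts elevated ids first, removes excluded ids, keeps the rest in ranked order. */
    static List<String> apply(List<String> ranked, List<String> elevated, Set<String> excluded) {
        List<String> out = new ArrayList<>(elevated);
        for (String id : ranked) {
            if (!excluded.contains(id) && !out.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up scored order for the query "android phones".
        List<String> ranked = List.of("iPhone", "galaxy s4", "nexus 4");
        System.out.println(apply(ranked, List.of("nexus 4"), Set.of("iPhone")));
    }
}
```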
THANK YOU
• Questions?
