SlideShare uma empresa Scribd logo
1 de 38
Building Ranking Infrastructure: 
Data-Driven, Lean, Flexible 
Sergii Khomenko, Data Scientist, STYLIGHT 
sergii.khomenko@stylight.com, @lc0d3r
Agenda 
2 
1. Problem definition 
2. Boosting 
3. Lean approach to ranking infrastructure 
4. Real-word examples
LUCENE, SOLR, ELASTICSEARCH 
Building 
Ranking 
Infrastructure: 
Data-­‐Driven, 
Lean, 
Flexible 
3
SOLR USERS 
4
THE BEST PLACE TO DICOVER FASHION 
Building 
Ranking 
Infrastructure: 
Data-­‐Driven, 
Lean, 
Flexible 
5
GET INSPIRED BY LOOKS CREATED BY COMMIUNITY 
Building 
Ranking 
Infrastructure: 
Data-­‐Driven, 
Lean, 
Flexible 
6
STYLIGHT – INTERNATIONAL COMMUNITY 
Building 
Ranking 
Infrastructure: 
Data-­‐Driven, 
Lean, 
Flexible 
7
PROBLEM DEFINITION
Problem definition 
9
Problem definition 
Ranking specifics: 
• Seasonal influence 
• Trends 
• Cold start of new countries, shops 
• Multiple dimensions of ranking model
IMPROVING RELEVANCE
TF-IDF - default scoring model in Lucene/Solr 
• matching more query terms is better 
• more occurrences of a query term is better 
• more novel terms increase doc score more 
than common terms
Improving relevance 
Stages to improve relevance in Solr 
• Editorial voting 
(QueryEvaluationComponent) 
• Indexing time 
(analyzing content, text analysis) 
• Query-time 
(function queries, boosting)
Improving relevance - Solr queries 
q = +brand:adidas shop:monshowroom^3 
q = +adidas monshowroom 
defType = dismax 
qf = brand shop^3 
sort = user_ratings desc, score desc 
qq = adidas 
q = {!boost b=$b defType=dismax v=$qq} 
b = prod(popularity, clicks)
Definition of solr.ExternalFileField 
<types> 
<fieldType 
name="float" 
class="solr.FloatField" 
omitNorms="true"/> 
<fieldType 
name="file_delta2" 
class="solr.ExternalFileField" 
keyField="id" 
defVal="1.0" 
indexed="false" 
stored="false" 
valType="float" 
/> 
</types> 
<fields> 
<field 
name="delta2" 
type="file_delta2"/> 
</fields>
Example of external file with boosting 
coresde_DEproductsexternal_delta2.txt 
15062471=0.5 
15062479=0.2 
15062507=0.41
LEAN APPROACH TO RANKING
Lean manufacturing, lean enterprise, or lean 
production, often simply, "lean", is a production practice 
that considers the expenditure of resources for any goal other 
than the creation of value for the end customer to be 
wasteful, and thus a target for elimination. 
Essentially, lean is centered on preserving value with 
less work. 
18
Lean approach to Ranking 
Requirements: 
• Decreasing time to implement new ranking 
model 
• Possibility to use more dynamic ranking models 
• Keeping working infrastructure alive
Lean approach to Ranking 
Requirements: 
• A/B testing without changing entire 
infrastructure 
• Performance level - “still fast” and “transparent”
Python benchmark, consistency checker 
--gaid gaid, -g gaid Google analytics site id. 
--gadate gadate 
a date to fetch the most popular pages from 
Google Analytics 
–-solr solr, -s solr Solr server to benchmark performance. 
--pages number, -p number a number of top pages from Google 
Analytics. 
--repeats number, -r number a number of repeats for an every 
page. 
--compare compare, -c compare Different rankings algorithms to 
compare.
Python benchmark, consistency checker 
python solr-benchmarkbenchmark.py -c 
RankingClassical,RankingDelta2 --cmpmode 
1 
python solr-benchmarkbenchmark.py -c 
RankingClassical,RankingDelta2
Common search infrastructure 
nginx Solr 
Jboss Solr-loadbalancer nginx Solr 
nginx Solr
Updated infrastructure 
nginx Solr 
Jboss Solr-loadbalancer nginx Solr 
nginx Solr 
Jboss Solr-loadbalancer nginx Solr 
Front-end loadbalancer
manifests/nodes/solr0x-node.pp 
include nginx 
nginx::config { "solr_dev": } 
nginx::solr-ranking { "delta2": 
urls => [ 
"/search.action? 
gender=women&brand=2271&tag=1161&tag=877&tag 
=468", 
"/search.action? 
gender=men&brand=11235&tag=10203&tag=10299&ta 
g=10326" 
],
nginx / templates / conf / solr-rewrites.conf.erb 
<% urls.each do |url| -%> 
if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= 
url['gender'] %>.*<% end -%><% url['tags'].each do |tag| - 
%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 - 
%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) { 
set $orig $args; 
set $args "q={!boost+b=%24b+defType=dismax+v=%24qq} 
&qq=id:*"; 
rewrite ^(.*)$ "$1?$orig" break; 
} 
<% end -%>
REAL-WORLD EXAMPLES
Elephant-Driven architecture 
28
Simplified Version
Simplified Version
Real-world examples
Multiple points to evaluate 
Stages to evaluate the model: 
• R ranking model 
• Independent Solr-node 
• For internal use-cases 
• Testing for some of pages 
• A/B roll out for % of users 
• Production roll out
Real-world examples
Real-world examples
Multiple points to evaluate 
Stages to evaluate the model: 
• R ranking model 
• Independent Solr-node 
• For internal use-cases 
• Testing for some of pages 
• A/B roll out for % of users 
• Production roll out
Thanks for your attention! 
Questions?
Sergii Khomenko 
Data Scientist 
STYLIGHT GmbH 
sergii.khomenko@stylight.com 
@lc0d3r 
Nymphenburger Straße 86 
80636 Munich, Germany 
S T Y L I G H T . C O M
REFERENCE LIST 
38 
• Stack Overflow Tag Trends 
http://hewgill.com/~greg/stackoverflow/ 
stack_overflow/tags/#!lucene+solr+elasticsearch 
+sphinx 
• Public websites using Solr 
http://wiki.apache.org/solr/PublicServers 
• CommonQueryParameters 
http://wiki.apache.org/solr/ 
CommonQueryParameters 
• Thoughts in plain text http://lc0.github.io/ 
• STYLIGHT Engineering 
http://www.stylight.com/Engineering/

Mais conteúdo relacionado

Destaque

Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShareKapost
 
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareSTOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareEmpowered Presentations
 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation OptimizationOneupweb
 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingContent Marketing Institute
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...SlideShare
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShareSlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShareSlideShare
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksSlideShare
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShareSlideShare
 

Destaque (10)

Masters of SlideShare
Masters of SlideShareMasters of SlideShare
Masters of SlideShare
 
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to SlideshareSTOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
STOP! VIEW THIS! 10-Step Checklist When Uploading to Slideshare
 
You Suck At PowerPoint!
You Suck At PowerPoint!You Suck At PowerPoint!
You Suck At PowerPoint!
 
10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization10 Ways to Win at SlideShare SEO & Presentation Optimization
10 Ways to Win at SlideShare SEO & Presentation Optimization
 
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content MarketingHow To Get More From SlideShare - Super-Simple Tips For Content Marketing
How To Get More From SlideShare - Super-Simple Tips For Content Marketing
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
 
2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare2015 Upload Campaigns Calendar - SlideShare
2015 Upload Campaigns Calendar - SlideShare
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & Tricks
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShare
 

Semelhante a Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenko, STYLIGHT / ApacheCon EU, Budapest 2014

6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performanceEngine Yard
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Johann de Boer
 
MongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’s
MongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’sMongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’s
MongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’sMongoDB
 
MongoDB.local Austin 2018: Pissing Off IT and Delivery: A Tale of 2 ODS's
MongoDB.local Austin 2018:  Pissing Off IT and Delivery: A Tale of 2 ODS'sMongoDB.local Austin 2018:  Pissing Off IT and Delivery: A Tale of 2 ODS's
MongoDB.local Austin 2018: Pissing Off IT and Delivery: A Tale of 2 ODS'sMongoDB
 
gDayX - Advanced angularjs
gDayX - Advanced angularjsgDayX - Advanced angularjs
gDayX - Advanced angularjsgdgvietnam
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Stamatis Zampetakis
 
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...indeedeng
 
Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic StackRochelle Sonnenberg
 
ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...
ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...
ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...Agile Testing Alliance
 
Iterative Methodology for Personalization Models Optimization
 Iterative Methodology for Personalization Models Optimization Iterative Methodology for Personalization Models Optimization
Iterative Methodology for Personalization Models OptimizationSonya Liberman
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems MongoDB
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterMongoDB
 
gDayX 2013 - Advanced AngularJS - Nicolas Embleton
gDayX 2013 - Advanced AngularJS - Nicolas EmbletongDayX 2013 - Advanced AngularJS - Nicolas Embleton
gDayX 2013 - Advanced AngularJS - Nicolas EmbletonGeorge Nguyen
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solrTrey Grainger
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrlucenerevolution
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDBMongoDB
 
FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...
FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...
FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...Joseph Alaimo Jr
 
Supercharging your Organic CTR
Supercharging your Organic CTRSupercharging your Organic CTR
Supercharging your Organic CTRPhil Pearce
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP TestingRTTS
 

Semelhante a Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenko, STYLIGHT / ApacheCon EU, Budapest 2014 (20)

Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
 
6 tips for improving ruby performance
6 tips for improving ruby performance6 tips for improving ruby performance
6 tips for improving ruby performance
 
Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015Digital analytics with R - Sydney Users of R Forum - May 2015
Digital analytics with R - Sydney Users of R Forum - May 2015
 
MongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’s
MongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’sMongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’s
MongoDB World 2018: Pissing Off IT and Delivery: A Tale of 2 ODS’s
 
MongoDB.local Austin 2018: Pissing Off IT and Delivery: A Tale of 2 ODS's
MongoDB.local Austin 2018:  Pissing Off IT and Delivery: A Tale of 2 ODS'sMongoDB.local Austin 2018:  Pissing Off IT and Delivery: A Tale of 2 ODS's
MongoDB.local Austin 2018: Pissing Off IT and Delivery: A Tale of 2 ODS's
 
gDayX - Advanced angularjs
gDayX - Advanced angularjsgDayX - Advanced angularjs
gDayX - Advanced angularjs
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...
[@IndeedEng] Engineering Velocity: Building Great Software Through Fast Itera...
 
Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic Stack
 
ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...
ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...
ATAGTR2017 Test Approach for Re-engineering Legacy Applications based on Micr...
 
Iterative Methodology for Personalization Models Optimization
 Iterative Methodology for Personalization Models Optimization Iterative Methodology for Personalization Models Optimization
Iterative Methodology for Personalization Models Optimization
 
Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems Using Compass to Diagnose Performance Problems
Using Compass to Diagnose Performance Problems
 
Using Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your ClusterUsing Compass to Diagnose Performance Problems in Your Cluster
Using Compass to Diagnose Performance Problems in Your Cluster
 
gDayX 2013 - Advanced AngularJS - Nicolas Embleton
gDayX 2013 - Advanced AngularJS - Nicolas EmbletongDayX 2013 - Advanced AngularJS - Nicolas Embleton
gDayX 2013 - Advanced AngularJS - Nicolas Embleton
 
Building a real time big data analytics platform with solr
Building a real time big data analytics platform with solrBuilding a real time big data analytics platform with solr
Building a real time big data analytics platform with solr
 
Building a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solrBuilding a real time, big data analytics platform with solr
Building a real time, big data analytics platform with solr
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...
FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...
FDMEE Scripting - Cloud and On-Premises - It Ain't Groovy, But It's My Bread ...
 
Supercharging your Organic CTR
Supercharging your Organic CTRSupercharging your Organic CTR
Supercharging your Organic CTR
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
 

Mais de Sergii Khomenko

Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...Sergii Khomenko
 
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...Sergii Khomenko
 
Building data pipelines: from simple to more advanced - hands-on experience /...
Building data pipelines: from simple to more advanced - hands-on experience /...Building data pipelines: from simple to more advanced - hands-on experience /...
Building data pipelines: from simple to more advanced - hands-on experience /...Sergii Khomenko
 
Scaling up Business Intelligence from the scratch and to 15 countries worldwi...
Scaling up Business Intelligence from the scratch and to 15 countries worldwi...Scaling up Business Intelligence from the scratch and to 15 countries worldwi...
Scaling up Business Intelligence from the scratch and to 15 countries worldwi...Sergii Khomenko
 
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...Sergii Khomenko
 
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015Sergii Khomenko
 
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...Sergii Khomenko
 
From simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with TableauFrom simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with TableauSergii Khomenko
 
Crunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-casesCrunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-casesSergii Khomenko
 
Lean Ranking infrastructure with Solr
Lean Ranking infrastructure with SolrLean Ranking infrastructure with Solr
Lean Ranking infrastructure with SolrSergii Khomenko
 
Data Visualization with R
Data Visualization with RData Visualization with R
Data Visualization with RSergii Khomenko
 

Mais de Sergii Khomenko (11)

Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...Handle your Lambdas - From event-based processing to Continuous Integration /...
Handle your Lambdas - From event-based processing to Continuous Integration /...
 
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...Building Data applications with Go: from Bloom filters to Data pipelines / FO...
Building Data applications with Go: from Bloom filters to Data pipelines / FO...
 
Building data pipelines: from simple to more advanced - hands-on experience /...
Building data pipelines: from simple to more advanced - hands-on experience /...Building data pipelines: from simple to more advanced - hands-on experience /...
Building data pipelines: from simple to more advanced - hands-on experience /...
 
Scaling up Business Intelligence from the scratch and to 15 countries worldwi...
Scaling up Business Intelligence from the scratch and to 15 countries worldwi...Scaling up Business Intelligence from the scratch and to 15 countries worldwi...
Scaling up Business Intelligence from the scratch and to 15 countries worldwi...
 
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
Secure Data Scalability at Stylight with Tableau Online and Amazon Redshift /...
 
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015
 
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
Scaling your Tableau - Migrating from Tableau Online to a proper DWH solution...
 
From simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with TableauFrom simple to more advanced: Lessons learned in 13 months with Tableau
From simple to more advanced: Lessons learned in 13 months with Tableau
 
Crunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-casesCrunching data with go: Tips, tricks, use-cases
Crunching data with go: Tips, tricks, use-cases
 
Lean Ranking infrastructure with Solr
Lean Ranking infrastructure with SolrLean Ranking infrastructure with Solr
Lean Ranking infrastructure with Solr
 
Data Visualization with R
Data Visualization with RData Visualization with R
Data Visualization with R
 

Último

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 

Último (20)

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 

Building Ranking Infrastructure: Data-Driven, Lean, Flexible - Sergii Khomenko, STYLIGHT / ApacheCon EU, Budapest 2014

  • 1. Building Ranking Infrastructure: Data-Driven, Lean, Flexible Sergii Khomenko, Data Scientist, STYLIGHT sergii.khomenko@stylight.com, @lc0d3r
  • 2. Agenda 2 1. Problem definition 2. Boosting 3. Lean approach to ranking infrastructure 4. Real-word examples
  • 3. LUCENE, SOLR, ELASTICSEARCH Building Ranking Infrastructure: Data-­‐Driven, Lean, Flexible 3
  • 5. THE BEST PLACE TO DICOVER FASHION Building Ranking Infrastructure: Data-­‐Driven, Lean, Flexible 5
  • 6. GET INSPIRED BY LOOKS CREATED BY COMMIUNITY Building Ranking Infrastructure: Data-­‐Driven, Lean, Flexible 6
  • 7. STYLIGHT – INTERNATIONAL COMMUNITY Building Ranking Infrastructure: Data-­‐Driven, Lean, Flexible 7
  • 10. Problem definition Ranking specifics: • Seasonal influence • Trends • Cold start of new countries, shops • Multiple dimensions of ranking model
  • 12. TF-IDF - default scoring model in Lucene/Solr • matching more query terms is better • more occurrences of a query term is better • more novel terms increase doc score more than common terms
  • 13. Improving relevance Stages to improve relevance in Solr • Editorial voting (QueryEvaluationComponent) • Indexing time (analyzing content, text analysis) • Query-time (function queries, boosting)
  • 14. Improving relevance - Solr queries q = +brand:adidas shop:monshowroom^3 q = +adidas monshowroom defType = dismax qf = brand shop^3 sort = user_ratings desc, score desc qq = adidas q = {!boost b=$b defType=dismax v=$qq} b = prod(popularity, clicks)
  • 15. Definition of solr.ExternalFileField <types> <fieldType name="float" class="solr.FloatField" omitNorms="true"/> <fieldType name="file_delta2" class="solr.ExternalFileField" keyField="id" defVal="1.0" indexed="false" stored="false" valType="float" /> </types> <fields> <field name="delta2" type="file_delta2"/> </fields>
  • 16. Example of external file with boosting coresde_DEproductsexternal_delta2.txt 15062471=0.5 15062479=0.2 15062507=0.41
  • 17. LEAN APPROACH TO RANKING
  • 18. Lean manufacturing, lean enterprise, or lean production, often simply, "lean", is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful, and thus a target for elimination. Essentially, lean is centered on preserving value with less work. 18
  • 19. Lean approach to Ranking Requirements: • Decreasing time to implement new ranking model • Possibility to use more dynamic ranking models • Keeping working infrastructure alive
  • 20. Lean approach to Ranking Requirements: • A/B testing without changing entire infrastructure • Performance level - “still fast” and “transparent”
  • 21. Python benchmark, consistency checker --gaid gaid, -g gaid Google analytics site id. --gadate gadate a date to fetch the most popular pages from Google Analytics –-solr solr, -s solr Solr server to benchmark performance. --pages number, -p number a number of top pages from Google Analytics. --repeats number, -r number a number of repeats for an every page. --compare compare, -c compare Different rankings algorithms to compare.
  • 22. Python benchmark, consistency checker python solr-benchmarkbenchmark.py -c RankingClassical,RankingDelta2 --cmpmode 1 python solr-benchmarkbenchmark.py -c RankingClassical,RankingDelta2
  • 23. Common search infrastructure nginx Solr Jboss Solr-loadbalancer nginx Solr nginx Solr
  • 24. Updated infrastructure nginx Solr Jboss Solr-loadbalancer nginx Solr nginx Solr Jboss Solr-loadbalancer nginx Solr Front-end loadbalancer
  • 25. manifests/nodes/solr0x-node.pp include nginx nginx::config { "solr_dev": } nginx::solr-ranking { "delta2": urls => [ "/search.action? gender=women&brand=2271&tag=1161&tag=877&tag =468", "/search.action? gender=men&brand=11235&tag=10203&tag=10299&ta g=10326" ],
  • 26. nginx / templates / conf / solr-rewrites.conf.erb <% urls.each do |url| -%> if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end -%><% url['tags'].each do |tag| - %>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 - %>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) { set $orig $args; set $args "q={!boost+b=%24b+defType=dismax+v=%24qq} &qq=id:*"; rewrite ^(.*)$ "$1?$orig" break; } <% end -%>
  • 32. Multiple points to evaluate Stages to evaluate the model: • R ranking model • Independent Solr-node • For internal use-cases • Testing for some of pages • A/B roll out for % of users • Production roll out
  • 35. Multiple points to evaluate Stages to evaluate the model: • R ranking model • Independent Solr-node • For internal use-cases • Testing for some of pages • A/B roll out for % of users • Production roll out
  • 36. Thanks for your attention! Questions?
  • 37. Sergii Khomenko Data Scientist STYLIGHT GmbH sergii.khomenko@stylight.com @lc0d3r Nymphenburger Straße 86 80636 Munich, Germany S T Y L I G H T . C O M
  • 38. REFERENCE LIST 38 • Stack Overflow Tag Trends http://hewgill.com/~greg/stackoverflow/ stack_overflow/tags/#!lucene+solr+elasticsearch +sphinx • Public websites using Solr http://wiki.apache.org/solr/PublicServers • CommonQueryParameters http://wiki.apache.org/solr/ CommonQueryParameters • Thoughts in plain text http://lc0.github.io/ • STYLIGHT Engineering http://www.stylight.com/Engineering/