Building a thesaurus for product search in ecommerce
From Magento Elastic Suite to PCU
04/07/2017, RISE CAEN
PCU
Overview
● Why product search?
● Now: Magento Elastic Suite primer
● How the MES Thesaurus answers ecommerce-specific needs
● Tomorrow: PCU, Machine Learning for ecommerce
● Questions
PCU
Plateforme de Connaissances Unifiée (Unified Knowledge Platform)
The speaker
● Marc Dutoo, R&D projects lead at Smile, the leading EU Open Source service provider
● PCU project coordinator, Data / API / Cloud expert
Why product search?
● In ecommerce, search is very important
○ Typically 60% of product purchases come from search, and only 40% from browsing categories / shelves...
○ but management effort doesn't reflect that split!
● There is a specific, very concrete criterion for whether search results are right or wrong: whether the customer buys the products returned.
● This financial incentive shapes everything:
○ Autocompletion is paramount
○ Boosting search fields
○ Scoring searched products
○ Rescoring results
○ Enriching results
○ ...thesaurus building
Never empty results
● Never return empty results
○ In other search domains, for instance knowledge management or enterprise search, an empty result is a good answer: it means that the knowledge or document is not there yet, and that adding it would be an improvement
○ But in ecommerce, there should never be 0 results, because the customer must never hit a dead end
● => “push” generic, “hot” products to the customer: most viewed, most bought, discounted, newest, available, hot brands...
○ this branches out into recommendation algorithms (the other axis of ecommerce)
● But also bend search algorithms so that they return results as widely as possible
○ query correction, query expansion (i.e. thesaurus)
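The “never empty” rule can be pictured as a fallback chain: try the strictest search first, then wider ones, and finally push “hot” products. The following is only a minimal sketch under that assumption; the function names, the toy catalog and the two search strategies are hypothetical and are not part of Magento Elastic Suite.

```python
from typing import Callable, Iterable

Product = str
SearchFn = Callable[[str], list[Product]]


def never_empty_search(
    query: str,
    strategies: Iterable[SearchFn],
    hot_products: list[Product],
) -> list[Product]:
    """Try each search strategy in order, from strictest to widest.

    Falls back to generic "hot" products so that the customer never
    reaches a dead end.
    """
    for search in strategies:
        results = search(query)
        if results:
            return results
    return hot_products


# Toy catalog and strategies, for illustration only.
CATALOG = ["blue denim skirt", "red cotton t-shirt", "denim pant"]


def exact_search(query: str) -> list[Product]:
    # Strict: every query word must appear in the product name.
    words = query.lower().split()
    return [p for p in CATALOG if all(w in p.split() for w in words)]


def any_word_search(query: str) -> list[Product]:
    # Wider: at least one query word appears in the product name.
    words = query.lower().split()
    return [p for p in CATALOG if any(w in p.split() for w in words)]


HOT = ["red cotton t-shirt"]  # e.g. most viewed or discounted products
STRATEGIES = [exact_search, any_word_search]

strict_hit = never_empty_search("denim skirt", STRATEGIES, HOT)
wider_hit = never_empty_search("denim jacket", STRATEGIES, HOT)
fallback = never_empty_search("umbrella", STRATEGIES, HOT)
```

In this sketch a thesaurus-expanded or spelling-corrected search would simply be one more strategy placed between the strict search and the final fallback.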
Products differ widely across domains
● Features differ a lot across categories
○ Books: only author, publisher
○ Smartphones: size, camera resolution, memory, network, color...
○ Overall, 80 distinct filters are used in 800 MB of page view logs
○ Product categories can also be searched
○ And always: price, discount, but also brand, availability, store, description...
● Their contribution to search varies. E.g. the description is often too broad:
○ Book description: can cover as many topics as books do
○ T-shirt description: “this blue t-shirt goes very well with yellow trousers”
● The solution
○ Search in all fields, but allow filtering on those specific fields => combined search + filters functionality
○ And allow configuring separately how each field / feature contributes to search results: boost title, don't look in book description...
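The combination of boosted full-text search and filters can be illustrated with an Elasticsearch-style query body. This is a sketch only: the field names and boost values are assumptions chosen for the example, and it is not the query that Magento Elastic Suite actually generates. It relies on the standard Elasticsearch `bool` query, `multi_match` query and `field^boost` notation.

```python
import json


def build_product_query(text, boosts, filters):
    """Build an Elasticsearch-style query body as a plain dict.

    text    -- the customer's full-text query
    boosts  -- field name -> boost; fields left out are not searched
    filters -- field name -> exact value selected in a facet
    """
    # "field^boost" is the Elasticsearch notation for per-field boosting.
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {
        "query": {
            "bool": {
                # Full-text part: contributes to the relevance score.
                "must": [{"multi_match": {"query": text, "fields": fields}}],
                # Filter part: restricts results without affecting the score.
                "filter": [
                    {"term": {name: value}} for name, value in filters.items()
                ],
            }
        }
    }


# Hypothetical book configuration: boost the title, skip the description.
body = build_product_query(
    "elasticsearch guide",
    boosts={"title": 3, "author": 2, "category": 1},
    filters={"availability": "in_stock"},
)
print(json.dumps(body, indent=2))
```

Leaving `description` out of `boosts` is how this sketch expresses “don't look in book description”; a t-shirt catalog could use a different `boosts` mapping without changing the code.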
Now: Magento Elastic Suite primer
● Magento: leading Open Source ecommerce platform
● Elasticsearch: Open Source distributed search platform and ecosystem, easy to set up and integrate into business applications, built on Apache Lucene
● Magento Elastic Suite (MES): Open Source “searchandising” solution by Smile, the leading Open Source solution provider in Europe
○ Searchandising: combines search with sales optimisation techniques and marketing
● Features:
○ Search (field boost, thesaurus...)
○ Facets (i.e. filters)
○ Merchandising (product placement...)
MES: configurable thesaurus
Semantic rules to optimise the search engine
● Define synonyms and concepts suited to your product catalog
● The search engine uses this thesaurus to create queries neighbouring the user's query
● All results are consolidated using an appropriate weight
[Diagram: the user query BLUE BOTTOM goes through the thesaurus (synonyms: BLUE ≈ DENIM; concepts: BOTTOM => SKIRT, PANT, SHORT) and becomes the weighted queries sent to the search engine: BLUE BOTTOM, DENIM BOTTOM, BLUE SKIRT, BLUE SHORT, BLUE PANT, DENIM SKIRT, DENIM SHORT, DENIM PANT]
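The expansion shown on this slide can be sketched in a few lines: each query word is replaced by itself, its synonyms or its narrower concepts, and every combination becomes a neighbouring query. The rules come from the slide's example; the weight values and the function name are assumptions, since the slide only states that results are consolidated with a weight, and this is not the MES implementation.

```python
from itertools import product

# Rules taken from the slide's example.
SYNONYMS = {"blue": ["denim"], "denim": ["blue"]}  # blue ≈ denim
CONCEPTS = {"bottom": ["skirt", "short", "pant"]}  # bottom => narrower terms

# Assumed weights: each rewritten word halves the weight of the query.
ORIGINAL_WEIGHT = 1.0
REWRITE_PENALTY = 0.5


def expand_query(query):
    """Return (rewritten query, weight) pairs, heaviest first."""
    alternatives = []
    for word in query.lower().split():
        options = [(word, 0)]  # (term, number of rewrites used)
        options += [(s, 1) for s in SYNONYMS.get(word, [])]
        options += [(c, 1) for c in CONCEPTS.get(word, [])]
        alternatives.append(options)

    expanded = []
    for combo in product(*alternatives):
        text = " ".join(term for term, _ in combo)
        rewrites = sum(count for _, count in combo)
        expanded.append((text, ORIGINAL_WEIGHT * REWRITE_PENALTY ** rewrites))
    # Stable sort: the original query first, most rewritten queries last.
    return sorted(expanded, key=lambda pair: -pair[1])


queries = expand_query("blue bottom")
for text, weight in queries:
    print(f"{weight:.2f}  {text}")
```

For “blue bottom” this yields the eight queries of the diagram, with the user's own query carrying the highest weight.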
Thesaurus configuration
MES Thesaurus, for ecommerce-specific needs
MES Thesaurus: practical examples
● Colors: denim, sky blue, cyan... form the thesaurus of blue
● Clothes: jeans <= trousers
● DIY: grass => seeds
● HOWEVER, iphone - galaxy is an incorrect example of a “high-end smartphone” thesaurus rule, because the iPhone is a different world from Android and Apple fans are very loyal to the brand!
○ => rather use a “high-end smartphone” category
● ...i.e. rules are very specific to each kind of product, catalog and vendor
MES Thesaurus: building process
● While the ecommerce website is being built, MES experts talk with the vendor to help them configure it
○ Products, categories, thesaurus...
● 3 weeks after it has gone “live”, MES experts study how it is used and patch the configuration
○ By looking at search analytics (Magento back office or Google Analytics):
○ Among searches with empty or few results, find the most entered terms: do they need synonyms added to the thesaurus?
○ Also look at seldom-clicked categories...
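The analytics step above (finding the most entered terms among searches with empty or few results) can be sketched as a simple count. The log format, the “few results” threshold and the function name are assumptions made for the example; real data would come from the Magento back office or Google Analytics exports.

```python
from collections import Counter


def thesaurus_candidates(search_log, max_results=2, top=3):
    """Rank terms from searches that returned few or no results.

    search_log  -- iterable of (query, result_count) pairs
    max_results -- "few results" threshold (an assumed value)
    top         -- how many terms to report
    """
    counts = Counter()
    for query, result_count in search_log:
        if result_count <= max_results:
            counts.update(query.lower().split())
    return counts.most_common(top)


# Hypothetical extract of a search analytics export.
log = [
    ("jeans", 0),
    ("blue jeans", 1),
    ("trousers", 40),
    ("jeans slim", 0),
    ("sneakers", 0),
]
candidates = thesaurus_candidates(log)
print(candidates)
```

The output is only a list of candidates: deciding whether a term needs a synonym, a concept rule or a new category remains the expert's call.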
Ecommerce search beyond thesaurus
● Searching for many words: several searches are run and combined
○ First all words together, but with a higher risk of returning no result (also because of a higher risk of a spelling mistake)
○ Then each single word, then all pairs of words, then all triples
○ Combined by giving less weight to those last ones
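The multi-word strategy above can be sketched as a plan of sub-queries: the full query first, then subsets of one, two and three words, each with a lower weight. The weight formula and the function name below are assumptions for illustration; the slide does not give the actual values.

```python
from itertools import combinations


def sub_queries(query, max_subset_size=3):
    """Return (sub-query, weight) pairs for a multi-word query.

    The full query comes first with weight 1.0; subsets of 1, 2, then 3
    words follow with a lower weight (the formula is an assumption).
    """
    words = query.split()
    n = len(words)
    plan = [(" ".join(words), 1.0)]
    for size in range(1, min(max_subset_size, n - 1) + 1):
        for subset in combinations(words, size):
            # Assumed weighting: half the share of query words covered.
            plan.append((" ".join(subset), 0.5 * size / n))
    return plan


plan = sub_queries("blue denim skirt")
for text, weight in plan:
    print(f"{weight:.2f}  {text}")
```

Each sub-query would then be run against the search engine and the scores combined using these weights, so that a product matching all words still ranks above one matching a single word.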
Tomorrow: Machine Learning for ecommerce with PCU
Factsheet - Unified Knowledge Platform
● 6 partners over 2017-2019, sponsored by the French Ministry of Industry
● Goal: democratize Big Data, so that every company can add value to its own core business using its existing data:
○ a Big Data / Machine Learning / semantic module to enrich any business application
● Showcased in 2 use cases:
○ E-commerce (up to digital in store) & B2B
○ Enterprise search
● Thanks to:
○ a factory of search engines enriched with Machine Learning and semantics
○ state-of-the-art and new algorithms analyzing user behaviour
○ an end-to-end, event-driven data processing workflow
○ an open source, best-of-breed, unified, flexible and extensible approach
Overall architecture
[Diagram: Collect, Learn, Reference and Publish stages, plus Monitor]
Partners and stakeholders
Smile: coordinator, architecture, ecommerce
Paris 13: Machine Learning, semantics
ESILV: pipeline, semantics
Proxem: text & opinion mining, B2B
Wallix: enterprise search experience
Armadillo: integration & management API & UI
Financial sponsors: BPI, IDF
Cluster: System@tic
Key Machine Learning outputs - ecommerce
● Ecommerce: recommendation algorithms that learn from user behaviour, for advertising but also for search autocompletion
○ Most searched product filters / features
○ Most searched query terms, and prediction of their evolution (for autocompletion)
○ Collaborative filtering on product views
○ More widely, most viewed or sold products or features and their evolution...
● Ecommerce & B2B: also
○ Detect buy intent
○ Predict sales
○ Predict churn
○ Suggest or train customer segments, using classification
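As an illustration of collaborative filtering on product views, here is a very simple item-to-item variant based on co-occurrence counting: products often viewed in the same session are recommended together. It is a sketch under stated assumptions (session format, product ids and function names are hypothetical), not the algorithm PCU uses.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions):
    """Count how often two products are viewed in the same session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # Deduplicate so that repeated views in one session count once.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_viewed(product_id, counts, top=2):
    """Products most often co-viewed with product_id."""
    return [other for other, _ in counts[product_id].most_common(top)]


# Hypothetical page view sessions (one list of product ids per visitor).
sessions = [
    ["denim-skirt", "denim-pant", "blue-short"],
    ["denim-skirt", "denim-pant"],
    ["denim-pant", "red-tshirt"],
]
recommendations = also_viewed("denim-skirt", co_view_counts(sessions))
print(recommendations)
```

The same counting applied to purchases instead of views would give a “bought together” signal, one of the “most viewed or sold” variants listed above.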
Key Machine Learning outputs - search
● Named Entity Recognition (topics & aspects)
○ For supervised enrichment of the ontology
○ up to opinion mining, for the polarity of opinions about an aspect (good or bad)
● Allows transforming full-text search into structured search:
○ “samsung tv” => type:tv and brand:samsung
● Thesaurus that learns from user behaviour:
○ if a lot of people search for “jeans” but don't click on any result, then search for “trousers” instead and only then click on a product => a “jeans to trousers” query expansion should be added to the thesaurus
○ Using search query co-occurrence
● Enterprise search, beyond files & classification:
○ Search for skills and experience...
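The “jeans to trousers” rule above can be sketched as counting query reformulations: a query without a click immediately followed, in the same session, by a different query with a click. The session format, the support threshold and the function name are assumptions for the example, not the PCU algorithm.

```python
from collections import Counter


def learn_expansions(sessions, min_support=2):
    """Suggest "A to B" query expansions from search sessions.

    Each session is a chronological list of (query, clicked) events.
    A pair (A, B) is counted when query A got no click and the next
    query B in the same session did.
    """
    pairs = Counter()
    for events in sessions:
        for (first, first_clicked), (second, second_clicked) in zip(events, events[1:]):
            if not first_clicked and second_clicked and first != second:
                pairs[(first, second)] += 1
    # Keep only reformulations seen often enough to be trusted.
    return [pair for pair, count in pairs.most_common() if count >= min_support]


# Hypothetical sessions: (query, did the customer click a result?)
sessions = [
    [("jeans", False), ("trousers", True)],
    [("jeans", False), ("trousers", True)],
    [("jeans", True)],
    [("grass", False), ("seeds", True)],
]
suggestions = learn_expansions(sessions)
print(suggestions)
```

With a threshold of 2, only the “jeans” to “trousers” pair is suggested here; the single “grass” to “seeds” session is not enough evidence on its own.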
Outputs
● Generic platform
○ Unified, flexible, extensible, best-of-breed-based, API-managed
○ Along with a set of standard connectors, data pipeline elements, and Machine Learning (ML) and text mining algorithms
● Use cases and products
○ E-commerce (product, deployed at Smile early adopter customers), B2B (deployed at Smile & Proxem)
○ Enterprise search (product, deployed at each partner)
● Open Source ecosystem
○ Ties with the communities of the integrated technical components, as well as with derived business-specific products
○ Home of platform examples, tryout and adoption
Questions?
Thank you for your attention!
https://pcu-consortium.github.io/
http://magento-elastic-suite.io - http://www.smile.fr
Contact : marc.dutoo@smile.fr

PCU@RISE 2017 - Building a thesaurus for product search

  • 1. Building a thesaurus for product search in ecommerce From Magento Elastic Suite to PCU 04/07/2017, RISE CAEN PCU 1
  • 2. 2 ● Why product search ? ● Now - Magento Elastic Suite primer ● How MES Thesaurus answers ecommerce-specific needs ● Tomorrow – PCU : Machine Learning for ecommerce ● Questions PCU Plateforme de Connaissances Unifiée Overview The speaker ● Marc Dutoo, R&D projects lead at Smile, the leading EU Open Source service provider ● PCU project coordinator, Data / API / Cloud expert
  • 4. 4 ● In ecommerce, search is very important ○ Typically 60% of product buys come from it, and only 40% from category / shelves browsing... ○ but their management cost doesn't reflect that ! ● A specific, very concrete criteria of search results being right or wrong : whether the customer buys its products. ● This financiary incentive curbs everything: ○ Autocompletion is paramount ○ Boosting search fields ○ Scoring searched products ○ Rescoring results ○ Enriching results ○ ...thesaurus building PCU Plateforme de Connaissances Unifiée Why product search ?
  • 5. 5 ● Never empty results ○ In other search domains, for instance knowledge management or entreprise search, an empty result is a good answer, meaning that the knowledge or document is not yet there and adding it would be an improvement ○ But in ecommerce, there should never be 0 result, because the customer must never be in a dead end ● => “push” generic, “hot” products to the customer : most viewed, most bought, discounted, newest, available, hot brands... ○ branches out to recommendation algorithms (the other axis of ecommerce) ● But also curb search algorithms to have as wide returns as possible ○ correction, query expansion i.e. thesaurus PCU Plateforme de Connaissances Unifiée Never empty results
  • 6. 6 ● A lot of different features across categories ○ Books : only author, editor ○ Smartphone : size, camera resolution, memory, network, color... ○ Overall, 80 distinct filters are used in 800 mo page view logs ○ Product categories also can be searched ○ And always : price, discount, but also brand, availability, store, description... ● Whose contribution to search varies. Ex. description is often too wide : ○ Book description : can cover as many topics as book do ○ T-shirt description : “this blue t-shirt goes very well with yellow trousers” ● The solution ○ Search in all fields, but allow filtering those specific fields => combined search + filters functionality ○ And allow to configure separately how each field / feature contributes to search results : boost title, don't look in book description... PCU Plateforme de Connaissances Unifiée Products differ widely across domains
  • 7. Now – Magento Elastic Suite primer PCU 7
  • 8. 8 ● Magento : leading Open Source ecommerce platform ● ElasticSearch : Open Source distributed search platform and ecosystem, easy to set up and integrate in business applications, built on Apache Lucene ● Magento Elastic Suite (MES) : Open Source “searchandising” solution by Smile, the leading Open Source solution provider in Europe ○ Searchandising : mixes search and product selling optimisation techniques and marketing ● Features : ○ Search (field boost, thesaurus...) ○ Facettes (i.e. filters) ○ Merchandising (product placement...) PCU Plateforme de Connaissances Unifiée Now – Magento Elastic Suite primer
  • 9. MES THÉSAURUS PARAMÉTRABLE Des règles sémantiques pour optimiser le moteur de recherche  Définissez des synonymes et des concepts adaptés à votre catalogue produit  Le moteur de recherche utilise ce thésaurus pour créer des requêtes voisines de la requête de l’utilisateur  Tous les résultats sont consolidés en utilisant un poids adapté QUERY IN THE SEARCH ENGINE QUERY IN THE SEARCH ENGINE BLUE BOTTOM DENIM BOTTOM BLUE SKIRT BLUE SHORT BLUE PANT DENIM SKIRT DENIM SHORT DENIM PANT BLUE BOTTOM DENIM BOTTOM BLUE SKIRT BLUE SHORT BLUE PANT DENIM SKIRT DENIM SHORT DENIM PANT THESAURUSTHESAURUS USER QUERYUSER QUERY WEIGHT BLUE BOTTOMBLUE BOTTOM SYNONYMS BLUE ≈ DENIM CONCEPTS SKIRT BOTTOM  PANT SHORT
  • 11. MES Thesaurus, for ecommerce specific needs PCU 11
  • 12. 12 ● color : denim, sky blue, cyan... : thesaurus of blue ● clothes : jeans <= trousers ● DIY / bricolage : grass => seeds ● HOWEVER iphone - galaxy : incorrect example ○ of a ”high end smartphone” thesaurus rule, because iphone is a different world than Android and Apple fans are very loyal to the brand ! ○ => rather a “high end smartphone” category ● … i.e. very specific to each kind of product, catalog, vendor PCU Plateforme de Connaissances Unifiée MES Thesaurus – practical examples
  • 13. MES Thesaurus: building process
● While the ecommerce website is being built, MES experts talk with the vendor to help them configure it
○ Products, categories, thesaurus...
● 3 weeks after it has gone live, MES experts study how it is used and patch the configuration
○ By looking at search analytics (Magento back office or Google Analytics):
○ In searches with empty or few results, find the most frequently entered terms: do they need synonyms added to the thesaurus?
○ Also look at seldom clicked categories...
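The analytics step above (find the most entered terms among searches with empty or few results) can be sketched as a simple count over a search log. The log format, a list of (query, result count) pairs, and the threshold are assumptions; real Magento or Google Analytics exports will look different.

```python
from collections import Counter
from typing import Iterable


def thesaurus_candidates(
    search_log: Iterable[tuple[str, int]],
    max_results: int = 2,
    top: int = 10,
) -> list[tuple[str, int]]:
    """Count the terms of queries that returned at most `max_results` results."""
    counts: Counter[str] = Counter()
    for query, result_count in search_log:
        if result_count <= max_results:
            # Each term of a poorly served query is a candidate for a synonym.
            counts.update(query.lower().split())
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical log entries: (query text, number of results returned).
    log = [("jeans", 0), ("Jeans slim", 1), ("trousers", 40), ("chinos", 0)]
    for term, count in thesaurus_candidates(log):
        print(term, count)
```

The output is only a shortlist: deciding whether a term deserves a synonym rule remains a human judgment, as described on the slide.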
  • 14. Ecommerce search beyond thesaurus
● Searching for many words: several searches are run and combined
○ First for all words together, but with a higher risk of returning no result (also because of a higher risk of spelling mistakes)
○ Then for each single word, then for all pairs of words, then for all groups of 3
○ Results are combined, giving less weight to these later searches
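The sequence of searches described above can be sketched as a generator of weighted sub-queries. The decay factor and the choice to halve the weight at each step are assumptions; the slide only states that the later searches get less weight.

```python
from itertools import combinations


def subqueries(query: str, max_size: int = 3, decay: float = 0.5) -> list[tuple[str, float]]:
    """List the searches to run for a multi-word query, with their weights."""
    words = query.lower().split()
    # 1. All words together first, at full weight.
    weighted = [(" ".join(words), 1.0)]
    weight = 1.0
    # 2. Then single words, then pairs, then groups of 3 (never the full
    #    query again), each step weighted less than the previous one.
    for size in range(1, min(max_size, len(words) - 1) + 1):
        weight *= decay
        for combo in combinations(words, size):
            weighted.append((" ".join(combo), weight))
    return weighted


if __name__ == "__main__":
    for text, weight in subqueries("red cotton summer dress"):
        print(f"{weight:.3f}  {text}")
```

A four-word query produces 15 searches (1 full query, 4 single words, 6 pairs, 4 triples), which is why the group size is capped.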
  • 15. Tomorrow: Machine Learning for ecommerce with PCU
  • 16. Factsheet - Unified Knowledge Platform
● 6 partners over 2017-2019, sponsored by the French ministry of Industry
● Goal: democratize Big Data, so that every company can add value to its own core business thanks to its existing data:
○ a Big Data / Machine Learning / semantic module to enrich any business application
● As showcased in 2 use cases:
○ E-commerce (up to digital in store) & B2B
○ Enterprise search
● Thanks to:
○ A factory of Machine Learning and semantics-enriched search engines
○ State-of-the-art and new algorithms analyzing user behaviour
○ An end-to-end, event-driven data processing workflow
○ An open source, best-of-breed, unified, flexible and extensible approach
  • 17. Overall architecture
[Diagram: stages Collect, Learn, Reference, Publish, Monitor]
  • 18. Partners and stakeholders
● Smile: coordinator, architecture, ecommerce
● Paris 13: Machine Learning, semantics
● ESILV: pipeline, semantics
● Proxem: text & opinion mining, B2B
● Wallix: enterprise search experience
● Armadillo: integration & management API & UI
● Financial sponsors: BPI, IDF
● Cluster: System@tic
  • 19. Key Machine Learning outputs - ecommerce
● Ecommerce: recommendation algorithms that learn from user behaviour, for advertising but also for search autocompletion
○ Most searched product filters / features
○ Most searched query terms, and prediction of their evolution (for autocompletion)
○ Collaborative filtering on product views
○ More widely, most viewed or sold products or features and their evolution...
● Ecommerce & B2B, also:
○ Detect buy intent
○ Predict sales
○ Predict churn
○ Suggest or train customer segments, using classification
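As an illustration of "collaborative filtering on product views", here is a sketch of item-to-item co-occurrence counting, one of the simplest forms of collaborative filtering. It is not the PCU algorithm, which the slides do not describe; the session format and product names are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions: list[list[str]]) -> dict[str, Counter]:
    """Count how often two products are viewed in the same session."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for viewed in sessions:
        # set() so that viewing a product twice in one session counts once.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts: dict[str, Counter], product: str, top: int = 3) -> list[str]:
    """Products most often viewed together with `product`."""
    return [other for other, _ in counts.get(product, Counter()).most_common(top)]


if __name__ == "__main__":
    # Hypothetical view sessions, one list of product names per visitor.
    sessions = [
        ["tv", "hdmi cable"],
        ["tv", "hdmi cable", "soundbar"],
        ["tv", "hdmi cable"],
        ["phone", "case"],
    ]
    print(recommend(co_view_counts(sessions), "tv"))
```

Raw counts favor popular products; a production system would normalize them, which is left out here to keep the sketch short.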
  • 20. Key Machine Learning outputs - search
● Named Entity Recognition (topics & aspects)
○ For supervised enrichment of the ontology
○ Up to opinion mining, for the polarity of an opinion about an aspect (good or bad)
● Allows transforming full-text search into structured search:
○ "samsung tv" => type:tv and brand:samsung
● Thesaurus that learns from user behaviour:
○ If a lot of people search "jeans" but don't click on any result, then search "trousers" instead and only then click on a product => a "jeans to trousers" query expansion should be added to the thesaurus
○ Using search query co-occurrence
● Enterprise search, beyond files & classification:
○ Search for competences and experience...
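The "jeans to trousers" rule above can be sketched as counting query reformulations: a search without a click, immediately followed in the same session by a different search that does lead to a click. The session format and the support threshold are assumptions, and this is only one simple reading of "search query co-occurrence".

```python
from collections import Counter

# One session is a time-ordered list of (query, clicked_a_result) events.
Session = list[tuple[str, bool]]


def learn_expansions(sessions: list[Session], min_support: int = 2) -> dict[tuple[str, str], int]:
    """Find (failed query, successful query) pairs seen at least `min_support` times."""
    pairs: Counter[tuple[str, str]] = Counter()
    for events in sessions:
        # Look at each pair of consecutive searches in the session.
        for (first, first_clicked), (second, second_clicked) in zip(events, events[1:]):
            if not first_clicked and second_clicked and first != second:
                pairs[(first, second)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical sessions for illustration.
    sessions = [
        [("jeans", False), ("trousers", True)],
        [("jeans", False), ("trousers", True)],
        [("jeans", True)],
        [("socks", False), ("shoes", True)],
    ]
    print(learn_expansions(sessions))
```

The threshold keeps one-off reformulations, such as the "socks" then "shoes" session, out of the suggested rules.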
  • 21. Outputs
● Generic platform
○ Unified, flexible, extensible, best-of-breed-based, API-managed
○ Along with a set of standard connectors, data pipeline elements, and Machine Learning (ML) and text mining algorithms
● Use cases and products
○ E-commerce (product, deployed at Smile early adopter customers), B2B (deployed at Smile & Proxem)
○ Enterprise search (product, deployed at each partner's)
● Open Source ecosystem
○ Ties with the communities of the integrated technical components, as well as with derived business-specific products
○ Home of platform examples, tryout and adoption
  • 22. Questions?
Thank you for your attention!
https://pcu-consortium.github.io/
http://magento-elastic-suite.io - http://www.smile.fr
Contact: marc.dutoo@smile.fr