Applying NLP to Product Comparison at Visual Meta
Ross Turner
Elasticsearch Meetup Berlin 22/02/17
Overview
1. Product Comparison on the Visual Meta Platform
Applying NLP to Product Comparison:
2. Using NLP to Maintain a Product Catalogue
3. Making Product Discovery Conversational
About Me
Previously…
• Researcher in Natural Language Generation (NLG)
• Software Engineer on Local Search
• Co-founder and Principal Engineer at an NLG Start Up
Currently…
• Engineering Head at Visual Meta
Product Comparison on the Visual Meta Platform
Product Comparison at Visual Meta
‘All shops, one site’
• Online marketing platform with shopping portals in 12 different countries
• 3 brands: Ladenzeile, ShopAlike, UmSóLugar
• 100,000,000+ items
• 6,000+ partner shops
Faceted Search at Visual Meta
Discover fashion, furniture and more…
• 800,000 platform visits per day
• 80 filter types across 21 categories
• Currently porting filter search to Elasticsearch
Maintaining a Product Catalogue at Visual Meta
Product feeds are continuously synced from partner shops:
• Feed items must be categorised in order to be discoverable on the platform
We want to:
• Identify all variants of a product
• Compare offers across shops
• Make it easy for our users to browse through millions of products
Model | Colour | Memory
Apple iPhone 6s | Space Grey | 32GB
Apple iPhone 6s | Space Grey | 128GB
Apple iPhone 6s | Gold | 32GB
Apple iPhone 6s | Gold | 128GB
Apple iPhone 6s | Rose Gold | 32GB
Apple iPhone 6s | Rose Gold | 128GB
Apple iPhone 6s | Silver | 32GB
Apple iPhone 6s | Silver | 128GB
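The variants in the table are every combination of one model with its colour and memory options. A minimal sketch, assuming variant tag names are formed by concatenating attribute values in model, colour, memory order (`variant_tags` and the constants are hypothetical, not part of the Visual Meta platform):

```python
from itertools import product

# Hypothetical attribute values for one model, taken from the table above.
MODEL = "Apple iPhone 6s"
COLOURS = ["Space Grey", "Gold", "Rose Gold", "Silver"]
MEMORY = ["32GB", "128GB"]


def variant_tags(model, colours, memory):
    """Return one tag name per (colour, memory) combination of a model."""
    # product() iterates colours in the outer loop and memory in the inner loop,
    # which reproduces the row order of the table.
    return [f"{model} {colour} {mem}" for colour, mem in product(colours, memory)]


tags = variant_tags(MODEL, COLOURS, MEMORY)
for tag in tags:
    print(tag)
```

Four colours times two memory sizes gives the eight variant tags listed above.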
Assigning Tags Based on Textual Attributes
String Matching
Index item names and descriptions, query product variant tag names against the index
Lucene query:
• +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s) +(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey Description:grey)
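A query of this shape can be generated mechanically from a tag name: each token becomes a required (`+`) group that may match in either field. A minimal sketch, assuming lowercasing and whitespace tokenisation (the analyzer actually used is not stated in the slides); `build_tag_query` is a hypothetical helper, and tokens are escaped for Lucene's reserved characters:

```python
import re

# Characters with special meaning in Lucene query syntax; escaped with a backslash.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def build_tag_query(tag_name, fields=("Name", "Description")):
    """Require every token of the tag name to match in at least one field."""
    clauses = []
    for token in tag_name.lower().split():
        token = _SPECIAL.sub(r"\\\1", token)  # e.g. "a:b" -> "a\:b"
        group = " ".join(f"{field}:{token}" for field in fields)
        clauses.append(f"+({group})")  # '+' marks the whole group as required
    return " ".join(clauses)


print(build_tag_query("Apple iPhone 6s 16GB Space Grey"))
```

The resulting string could be passed as the `query` of an Elasticsearch `query_string` query; the explicit `+` prefixes make every token mandatory regardless of the default operator.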
Evaluated by manually assigning items to a random sample of products:
Recall | Precision | F-score
0.59 | 0.64 | 0.61
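The F-score here is the harmonic mean of precision and recall, F = 2PR / (P + R). A quick check reproduces the reported 0.61 from the reported precision and recall; `f_score` is a hypothetical helper, with an optional `beta` for the general F-beta form:

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_score(precision=0.64, recall=0.59), 2))
```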
Error Analysis
Naming for the same product is not consistent across feeds:
1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)”
2. efg.com: “Apple iPhone 6 64 GB Space Grey”
3. xyz.com: “Apple iPhone 6”
Naming for the same product is not consistent within the same feed:
1. “Apple Iphone 6 - 64GB”
2. “Apple Iphone 6 64GB Space Grey”
3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone”
Wrongly categorised products in the feed:
• “Cover for Apple Iphone 6 - 64GB”
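To see why all-terms matching struggles with these names, the sketch below mimics the AND semantics of the query above with a simple regex tokenizer. The tokenizer is an assumption (real Lucene analyzers differ in detail), and `strict_match` is a hypothetical helper. Only the first naming convention matches the full tag, while the mis-categorised cover does match a tag that omits the colour:

```python
import re


def tokens(text):
    """Lowercase and split on non-word characters (a rough stand-in for an analyzer)."""
    return set(re.findall(r"\w+", text.lower()))


def strict_match(tag_name, item_name):
    """True if every token of the tag name occurs in the item name (AND semantics)."""
    return tokens(tag_name) <= tokens(item_name)


TAG = "Apple iPhone 6 Space Grey 64GB"
ITEMS = [
    "Apple iPhone 6 (Space Grey, 64GB)",
    "Apple iPhone 6 64 GB Space Grey",  # "64 GB" splits into two tokens
    "Apple iPhone 6",                   # colour and memory missing
    "Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone",
]

for item in ITEMS:
    print(strict_match(TAG, item), item)

# A precision error: an accessory matches a tag without a colour.
print(strict_match("Apple iPhone 6 64GB", "Cover for Apple Iphone 6 - 64GB"))
```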
Comparing Tag Names to Item Names
Comparing Names Between Item Feeds
Text Classification
Language Models
Drawbacks of bag of words / n-grams:
• Words are equally distant
• Vectors are sparse
Word embeddings capture semantics:
• Vectors are continuous
• Similar words are close in vector space
1. T. Mikolov, K. Chen, G. Corrado, J. Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
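The "equally distant" drawback can be made concrete: with one-hot (bag-of-words style) vectors, any two distinct words have cosine similarity 0, whereas dense vectors can place related words close together. A minimal sketch; the three-word vocabulary and the dense vectors are invented toy values, not trained embeddings:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


VOCAB = ["galaxy", "samsung", "sofa"]


def one_hot(word):
    """Sparse representation: one dimension per vocabulary word."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


# Toy dense embeddings, invented for illustration (not trained vectors).
DENSE = {
    "galaxy": [0.9, 0.1, 0.3],
    "samsung": [0.8, 0.2, 0.35],
    "sofa": [-0.2, 0.9, 0.1],
}

# One-hot: every pair of different words is equally (un)related.
print(cosine_similarity(one_hot("galaxy"), one_hot("samsung")))
print(cosine_similarity(one_hot("galaxy"), one_hot("sofa")))
# Dense: "galaxy" is closer to "samsung" than to "sofa".
print(cosine_similarity(DENSE["galaxy"], DENSE["samsung"]),
      cosine_similarity(DENSE["galaxy"], DENSE["sofa"]))
```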
Word2Vec for Mobile Phone Items
Mobile phone item corpus:
• 7,890 feed items
• 863k tokens, 41.5k unique
Closest words to “Galaxy”:
Rank | Word | Cosine Similarity
1 | Samsung | 0.51
2 | S2 | 0.48
3 | S5 | 0.46
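A "closest words" list like this is a ranking by cosine similarity, highest first. A minimal sketch of that lookup over a hypothetical dictionary of toy 2-d vectors (real Word2Vec vectors are much higher-dimensional); `most_similar` here is a hypothetical stdlib re-implementation:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`, highest (closest) first."""
    target = vectors[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors invented for illustration, not trained on the phone corpus.
VECTORS = {
    "galaxy": [1.0, 0.0],
    "samsung": [0.9, 0.1],
    "s5": [0.7, 0.7],
    "sofa": [0.0, 1.0],
}

for word, score in most_similar("galaxy", VECTORS, topn=2):
    print(word, round(score, 2))
```

With a model trained in gensim, the equivalent lookup is `model.wv.most_similar("galaxy", topn=3)`.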
Classification Performance
Tag | Best BOW Classifier (F-score / Precision / Recall) | Decision Tree with Word2Vec (F-score / Precision / Recall)
“Smartphone” | 0.95 / 0.99 / 0.84 | 0.92 / 0.94 / 0.90
“Home Speakers” | 0.55 / 0.67 / 0.32 | 0.79 / 0.79 / 0.80
“Creeper Cipök” (creeper shoes) | 0.58 / 0.97 / 0.22 | 0.75 / 0.86 / 0.66
“Leder Schuhe” (leather shoes) | 0.52 / 0.73 / 0.25 | 0.71 / 0.93 / 0.58
“Bett mit Schubladen” (bed with drawers) | 0.52 / 0.65 / 0.29 | 0.70 / 0.81 / 0.62
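The slides do not say how item text was turned into features for the decision tree. One common option, assumed here, is to average the word vectors of an item's tokens so that every item gets a fixed-length dense feature vector. A minimal sketch with hypothetical toy vectors; `item_vector` is a hypothetical helper:

```python
def item_vector(text, vectors, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the i-th component of every known vector together.
    return [sum(values) / len(known) for values in zip(*known)]


# Toy 2-d word vectors, invented for illustration.
WORD_VECTORS = {"red": [1.0, 0.0], "sofa": [0.0, 1.0]}

print(item_vector("Red Sofa XL", WORD_VECTORS, dim=2))  # "xl" is out of vocabulary
```

Vectors built this way could then be fed to any classifier, for example scikit-learn's `DecisionTreeClassifier`.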
Feed Enhancement
Two Descriptions of a Samsung TV
Feed description: Samsung UE40H6400AK. Display diagonal: 101.6 cm (40"), HD type: Full HD, Display resolution: 1920 x 1080 pixels. Tuner type: Analog & Digital, Digital signal format system: DVB-C, DVB-T. RMS rated power: 20 W. Consumer Electronics Control (CEC): Anynet+. Picture processing technology: Samsung Wide Color Enhancer
Generated description: The Samsung UE40H6400 has a 101.6 cm screen size and a resolution of 1920 x 1080 pixels. It is a Full HD TV, has an Analog & Digital tuner and comes with Anynet+.
Generating Product Descriptions
Choosing what to say, then deciding how to say it
3. E. Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104.
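The two stages can be illustrated with a toy data-to-text step: content selection picks which attributes of a record to mention, and realisation renders them with templates. A minimal sketch, not the production generator; the record keys, templates and helper names are hypothetical:

```python
# Hypothetical structured record for the TV, keyed by attribute name.
TV = {
    "diagonal_cm": "101.6",
    "resolution": "1920 x 1080",
    "hd_type": "Full HD",
    "rms_power_w": "20",
}

TEMPLATES = {  # hypothetical phrase template per attribute
    "diagonal_cm": "a {} cm screen",
    "resolution": "a resolution of {} pixels",
    "hd_type": "a {} panel",
}


def choose_content(record, wanted):
    """Content selection ('what to say'): keep only the attributes we plan to mention."""
    return {key: record[key] for key in wanted if key in record}


def realise(name, facts):
    """Surface realisation ('how to say it'): render each selected fact with a template."""
    phrases = [TEMPLATES[key].format(value) for key, value in facts.items() if key in TEMPLATES]
    if not phrases:
        return ""
    if len(phrases) == 1:
        body = phrases[0]
    else:
        body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"The {name} has {body}."


facts = choose_content(TV, ["diagonal_cm", "resolution"])
print(realise("Samsung UE40H6400", facts))
```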
Two Descriptions of a Samsung Smartphone
Feed description: Samsung SM-G920F, Galaxy. Display diagonal: 12.9 cm (5.1"), Display resolution: 2560 x 1440 pixels, Display type: SAMOLED. Processor frequency: 2.1 GHz, Coprocessor frequency: 1.5 GHz. Internal storage capacity: 32 GB, Internal RAM: 3072 MB. Main camera resolution (numeric): 16 MP, Video recording modes: 1080p, 2160p, Maximum frame rate: 30 fps. SIM card capability: Single SIM, SIM card type: NanoSIM, 2G standards: GSM
Generated description: The Samsung GALAXY S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
Building Messages from a Product Catalogue
The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
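One way to read "building messages" is: turn catalogue attributes into message objects, group them into sentences, and refer to the product by name once and by "It" afterwards. A minimal sketch under those assumptions; the `Message` schema, templates and sentence plan are hypothetical, not Visual Meta's implementation:

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One fact to express: a product attribute and its value (hypothetical schema)."""
    attribute: str
    value: str


TEMPLATES = {  # hypothetical phrase templates per attribute
    "display": "a {} display",
    "processor": "a {} processor",
    "camera": "a {} camera",
    "ram": "{} of RAM",
}


def build_messages(record, attributes):
    """Create messages for the requested attributes that exist in the record."""
    return [Message(a, record[a]) for a in attributes if a in record]


def join_phrases(phrases):
    """Aggregate phrases as 'A, B and C'."""
    if len(phrases) <= 1:
        return "".join(phrases)
    return ", ".join(phrases[:-1]) + " and " + phrases[-1]


def describe(name, record, sentence_plan):
    """Each inner list of attributes becomes one sentence; later sentences use 'It'."""
    sentences = []
    for attributes in sentence_plan:
        messages = build_messages(record, attributes)
        if not messages:
            continue
        subject = f"The {name}" if not sentences else "It"
        phrases = [TEMPLATES[m.attribute].format(m.value) for m in messages]
        sentences.append(f"{subject} has {join_phrases(phrases)}.")
    return " ".join(sentences)


S6 = {"display": "12.9 cm", "processor": "2.1 GHz", "camera": "16 megapixel", "ram": "3072MB"}
print(describe("Samsung Galaxy S6", S6, [["display"], ["processor", "camera", "ram"]]))
```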
Making Product Discovery Conversational
Entity Recognition for Voice Search
Input - “I’d like some red adidas trainers”
Output:
• <brands, [adidas]>
• <categories, [trainers]>
• <colours, [red]>
4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
Using the Product Catalogue to Parse Queries
A Lucene index is built over the tag tree, associating each label (e.g. brands, categories, colours) with its tokens.
1. Word shingles are extracted from the input query
2. Each shingle is queried against the index (top down, i.e. longest shingles first; greedy)
Labelled tokens are used to:
1. Query the product index
2. Keep track of the dialogue state
Shingles of the example query, longest first:
• “I’d like some red adidas trainers”
• “I’d like some red adidas”
• “like some red adidas trainers”
• “I’d like some red”
• “like some red adidas”
• “some red adidas trainers”
• ...
• “red”
• “adidas”
• “trainers”
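The parsing steps above can be sketched with a plain dictionary standing in for the Lucene label index. This is an assumption made to keep the example self-contained; the index entries, including the multi-word `running shoes`, are hypothetical. Shingles are tried longest first, and once tokens are labelled, shorter shingles overlapping them are skipped:

```python
def shingles(tokens):
    """Yield (start, end) spans of all word n-grams, longest first, left to right."""
    n = len(tokens)
    for size in range(n, 0, -1):
        for start in range(n - size + 1):
            yield start, start + size


# Hypothetical stand-in for the Lucene label index: phrase -> label.
LABEL_INDEX = {
    "red": "colours",
    "adidas": "brands",
    "trainers": "categories",
    "running shoes": "categories",
}


def parse(query):
    """Greedy top-down labelling of query phrases found in the index."""
    tokens = query.lower().split()
    used = [False] * len(tokens)
    found = {}
    for start, end in shingles(tokens):
        if any(used[start:end]):
            continue  # overlaps a longer phrase that was already labelled
        phrase = " ".join(tokens[start:end])
        label = LABEL_INDEX.get(phrase)
        if label:
            found.setdefault(label, []).append(phrase)
            for i in range(start, end):
                used[i] = True
    return found


print(parse("I'd like some red adidas trainers"))
print(parse("red running shoes"))
```

The second query shows why longest-first matters: "running shoes" is labelled as one category before its individual words are considered.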
Putting It All Together: Answering Queries
How big is the Samsung Galaxy S6’s screen?
The Samsung Galaxy S6 has a 12.9 cm display
How much RAM does it have?
It has 3072MB of RAM
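The follow-up question only works because the dialogue state remembers which product is being discussed, so "it" can be resolved. A minimal sketch of that idea; the catalogue entry, keyword rules and `DialogueState` class are hypothetical simplifications, not the Lara implementation:

```python
import re

# Hypothetical catalogue keyed by lowercase product name.
CATALOGUE = {
    "samsung galaxy s6": {"name": "Samsung Galaxy S6", "display": "12.9 cm", "ram": "3072MB"},
}


class DialogueState:
    """Remembers the last product mentioned so that 'it' can be resolved."""

    def __init__(self):
        self.product = None

    def answer(self, question):
        q = question.lower()
        words = set(re.findall(r"\w+", q))
        mentioned = next((key for key in CATALOGUE if key in q), None)
        if mentioned:
            self.product = mentioned  # update the state with the new product
        if self.product is None:
            return "Which product do you mean?"
        specs = CATALOGUE[self.product]
        # Name the product when it was just mentioned, otherwise use a pronoun.
        subject = f"The {specs['name']}" if mentioned else "It"
        if "screen" in words or "display" in words:
            return f"{subject} has a {specs['display']} display."
        if "ram" in words:
            return f"{subject} has {specs['ram']} of RAM."
        return "Sorry, I don't know that yet."


state = DialogueState()
print(state.answer("How big is the Samsung Galaxy S6's screen?"))
print(state.answer("How much RAM does it have?"))
```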
Wrapping Up
Takeaways
1. Word embeddings, even when trained on limited data, can:
a. provide significant improvement over bag of words models for text classification; and
b. reduce the amount of manually curated data required for the task
2. Product catalogues provide a rich information source for conversational apps
3. NLG can be utilised for product feed enhancement as well as conversation
Thank you
