Photo By: David Doubilet
CIKM AnalytiCup
Lazada Product Title Quality Challenge
1st place: $6,000
2nd place: $2,000
3rd place: $1,000
$2,000
Team Members
Tam T. Nguyen
nthanhtam@gmail.com
Postdoctoral Research Fellow
Ryerson University
Kaggle Grandmaster
Hossein Fani
hosseinfani@gmail.com
PhD Student
University of New Brunswick
Gilberto Titericz
giba1978@gmail.com
Machine Learning Expert
Airbnb Inc.
Kaggle Grandmaster
Ebrahim Bagheri
ebrahim.bagheri@gmail.com
Associate Professor
Ryerson University
Photo By: Justin Hofman
“hot sexy red clutch rug sack travel backpack unisex cheap with free gift”
y1: clarity
y2: conciseness
“Hot Sexy Tom Clovers Womens Mens Classy Look Cool Simple Style Casual
Canvas Crossbody Messenger Bag Handbag Fashion Bag Tote Handbag Gray”
Problem Setting
Clarity: a title is clear if, within five seconds, one can understand the title and what the product is, and quickly figure out the key attributes (color, size, model, ...).
Conciseness: a title is concise if it is short enough yet contains all the necessary information. It is not concise if it is too long, with many unnecessary words, or so short that it is unclear what the product is.
Data Set
ML-DM
1. Cleansing
• Noise
• Missing Values
• Outliers
2. Flirting
• Attributes
• Labels (if any)
• Augmentation
3. Feature Eng.
• Extraction
• Reduction
• Selection
4. Model Eng.
• Selection
• Tuning
• Evaluation
1. Cleansing
• Noise
  • HTML tags in ‘short_description’ (94%)
• Missing Values
  • ‘product_type’ (less than 1%)
  • ‘category_lvl_3’ (about 6%) → assign ‘category_lvl_2’
  • ‘description’ (less than 1%)
• Outliers
  • ‘price’ {-1, 999999, 9999999}
  • ‘price’ normalization based on country
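The cleansing steps on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: records are plain dicts, the `country` field name and the `clean_record` and `normalize_price_by_country` helpers are hypothetical, and dividing by the per-country median is one possible reading of "normalization based on country", not necessarily what the team did.

```python
import re
from collections import defaultdict
from statistics import median

TAG_RE = re.compile(r"<[^>]+>")
PRICE_SENTINELS = {-1, 999999, 9999999}  # outlier values listed on the slide


def clean_record(rec):
    """Return a cleaned copy of one product record (a plain dict)."""
    out = dict(rec)
    # Noise: drop HTML tags from short_description and collapse whitespace.
    text = TAG_RE.sub(" ", out.get("short_description") or "")
    out["short_description"] = " ".join(text.split())
    # Missing values: fall back to the parent category.
    if not out.get("category_lvl_3"):
        out["category_lvl_3"] = out.get("category_lvl_2")
    # Outliers: treat the sentinel prices as missing.
    if out.get("price") in PRICE_SENTINELS:
        out["price"] = None
    return out


def normalize_price_by_country(records):
    """Add price_norm = price / median price of the same country (an assumption)."""
    by_country = defaultdict(list)
    for r in records:
        if r["price"] is not None:
            by_country[r["country"]].append(r["price"])
    medians = {c: median(p) for c, p in by_country.items()}
    for r in records:
        m = medians.get(r["country"])
        r["price_norm"] = r["price"] / m if r["price"] is not None and m else None
    return records
```

Treating the sentinel prices as missing, rather than clipping them, keeps them out of the per-country statistics.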
2. Flirting
• Attributes
  • Color
  • Brand
  • Non-English
  • <img> Image
  • <li> enumeration
• 𝒚: Labels
  • Disagreement in labels! (label noise)
• Augmentation
  • Cloning → color, brand
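As a rough illustration of the attribute features listed on this slide, the sketch below derives color, brand, non-English, `<img>` and `<li>` signals from a title and its raw HTML description. The `COLORS` and `BRANDS` lookup sets and the `title_attributes` helper are hypothetical, and the non-ASCII ratio is only a crude stand-in for real language detection.

```python
import re

IMG_RE = re.compile(r"<img\b", re.IGNORECASE)
LI_RE = re.compile(r"<li\b", re.IGNORECASE)
# Hypothetical lookup lists; the authors' actual vocabularies are not given.
COLORS = {"red", "gray", "black", "white", "blue"}
BRANDS = {"tom clovers"}


def title_attributes(title, short_description=""):
    """Derive simple attribute features from a title and its raw HTML description."""
    lowered = title.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)
    non_ascii = sum(1 for ch in title if ord(ch) > 127)
    return {
        "has_color": any(t in COLORS for t in tokens),
        "has_brand": any(b in lowered for b in BRANDS),
        # Share of non-ASCII characters as a crude non-English signal.
        "non_english_ratio": non_ascii / len(title) if title else 0.0,
        "n_img": len(IMG_RE.findall(short_description)),
        "n_li": len(LI_RE.findall(short_description)),
    }
```

The tag counts must be taken from the raw description, before any HTML stripping.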
Label Noise
multi-class
$f: X_1 \times X_2 \times \dots \times X_d \to y, \quad y \in \{c_1, c_2, \dots, c_k\}$
binary (boolean) classifier: $y \in \{0, 1\}$
multi-output (multi-label)
$f: X_1 \times X_2 \times \dots \times X_d \to y_1 \times y_2 \times \dots \times y_r, \quad y_j \in \{c_1, c_2, \dots, c_{k_j}\}$
multi-output binary (boolean) classifier: $(y_1, y_2) \in \{0, 1\} \times \{0, 1\}$
Targets correlation: (single, fast model for all targets)
Only 3 combinations for (Clear,Concise):
(1,0), (1,1), (0,0) ⇒ |~Clear & Concise| = 0
if ~Clear then ~Concise
if Concise then Clear
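One way to exploit this constraint with a single model is to treat the three valid (Clear, Concise) pairs as one three-class target and recover the two scores afterwards. The sketch below shows only that encoding and decoding; the class ids and the `encode` and `marginals` helpers are hypothetical, and the slides do not say the team implemented it this way.

```python
# The three (clear, concise) pairs that occur, per the slide; (0, 1) never occurs.
PAIR_TO_CLASS = {(0, 0): 0, (1, 0): 1, (1, 1): 2}
CLASS_TO_PAIR = {c: p for p, c in PAIR_TO_CLASS.items()}


def encode(clear, concise):
    """Map a valid (clear, concise) pair to a single class id."""
    try:
        return PAIR_TO_CLASS[(clear, concise)]
    except KeyError:
        raise ValueError(f"invalid pair: clear={clear}, concise={concise}") from None


def marginals(class_probs):
    """Turn [P(class 0), P(class 1), P(class 2)] into (P(clear), P(concise))."""
    p00, p10, p11 = class_probs
    return p10 + p11, p11
```

With this decoding, the predicted probability of Concise can never exceed that of Clear, which matches "if Concise then Clear".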
3. Feature Eng.
• Extraction
• Reduction
  • LSA, t-SNE, PCA, SVD
• Selection
  • STD
  • Correlation X~y
    • Linear (t-test, chi2)
    • Non-linear (mutual information)
  • Model-driven
    • Linear SVM
Feature Engineering
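The first two selection filters on this slide (STD, then correlation between each feature and the target) can be sketched in plain Python as below. The thresholds, the use of Pearson correlation as the linear score, and the `select_features` helper are assumptions for illustration; the t-test, chi2, mutual-information and model-driven variants are not shown.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, y, min_std=1e-6, top_k=2):
    """columns: dict of feature name -> list of values. Returns selected names."""
    # Step 1 (STD): drop near-constant features.
    varying = {n: v for n, v in columns.items() if pstdev(v) > min_std}
    # Step 2 (correlation X~y): rank by absolute correlation with the target.
    ranked = sorted(varying, key=lambda n: abs(pearson(varying[n], y)), reverse=True)
    return ranked[:top_k]
```

Ranking by absolute value keeps strongly negative predictors, such as a title-length feature for conciseness.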
Feature Importance
Linear SVM
Bagging Models
[Diagram: four 10-fold sets (10-Fold Set 1 to 10-Fold Set 4). Rows: Base Model (Fold Bagging), Ensemble Model (Fold Bagging), Final Prediction (Set Fold Bagging). Box labels: BLEND ×4 and STACK ×4 over the four sets, BLEND ×4, and a single final BLEND.]
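A minimal sketch of the fold-bagging idea in the diagram, under several assumptions: "blend" is taken to be a plain average, the four sets are taken to differ only in their shuffle seed, and `fit_mean` is a hypothetical stand-in for a real base model. The out-of-fold predictions returned by `fold_bagging` are the kind of input a stacking layer would typically consume, but stacking itself is not implemented here.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed):
    """Shuffle range(n) with the given seed and split it into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fold_bagging(fit, X, y, X_test, k=10, seed=0):
    """Train one model per fold; return out-of-fold and fold-averaged test predictions."""
    oof = [None] * len(X)
    test_preds = []
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = predict(X[i])
        test_preds.append([predict(x) for x in X_test])
    # Fold bagging: average the k models' predictions for each test row.
    bagged = [mean(col) for col in zip(*test_preds)]
    return oof, bagged


def set_fold_bagging(fit, X, y, X_test, k=10, seeds=(1, 2, 3, 4)):
    """Repeat fold bagging over several differently shuffled fold sets and blend."""
    runs = [fold_bagging(fit, X, y, X_test, k, s)[1] for s in seeds]
    return [mean(col) for col in zip(*runs)]


def fit_mean(X_train, y_train):
    """Hypothetical stand-in base model: always predicts the training mean."""
    m = mean(y_train)
    return lambda x: m
```

Any callable that takes training data and returns a `predict` function can replace `fit_mean`, for example a wrapper around one of the models listed under Performance Evaluation.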
Performance Evaluation
SGD: stochastic gradient descent
LOR: logistic regression
RDG: ridge regression
NBC: naive bayes classifier
XGB: extreme gradient boosting
LGB: light gradient boosting
W2V: word2vec
Model Importance
clarity conciseness
CIKM AnalytiCup 2017: Bagging Model for Product Title Quality with Noise


Editor's Notes

  1. On Lazada, we have millions of products across thousands of categories. To stand out from the crowd, sellers employ creative, sometimes disruptive efforts to improve their search relevancy or attract the attention of customers. Product titles like this degrade the user experience by cluttering the site with irrelevant, misleading titles. In this challenge, we provide you with a set of product titles, descriptions, and attributes, together with the associated title quality scores (clarity and conciseness) as labeled by our internal QC team. Your task is to build a product title quality model that can automatically grade the clarity and the conciseness of a product title. ‘judging a book by its cover’
  2. Contraposition. Use one target as a feature for the other one. In practice this is a problem, since we do not have the labels of the validation or test sets.
  3. In addition to the given attributes, we extract more features from the textual attributes, title and short_description. Stability selection, recursive feature elimination, and cross-validation.