CONTENT-WORTH DEBUT ARTIST CLASSIFIER
Wonyoung Lucas Seo
INTRODUCTION
HIPHOPLE.com
- US HIPHOP/R&B based Web Magazine platform
- Major Contents
- Translation
- Lyric Translation (* this is the part I contribute to)
- Subtitled Music Video
- News of HIPHOP / R&B Industry
- Magazine
- Feature Articles
- Artist Interview
- Album Review
- Fashion / Life Style
Project Inspiration
- CEO’s needs
• Public reputation / preference analysis for
merchandise sales
• K-Hiphop buzz analysis in South-East Asian
region media
• Super Rookie Artist prediction analysis
• Messenger app chatbot
• And many more …
Why don’t I see you
working these days?
I’ve been totally occupied with
this machine learning thing
Do you have anything in mind that you
want to predict or analyze ?
I need some inspiration for my project
Actually, I do !!
This …
And that …
This would be awesome
Yeah and that too …
( CEO )
Predict Super Rookie ?? ( … sounds cool.
Maybe I should first focus on this one)
Let’s predict
Super Rookie!!!
Setting More
Specific Goal …
The Hiphople magazine team produces content on successful debut artists.
When producing content on a debut artist:
(i) Based on their domain/industry knowledge, editors decide whether or not to produce content on this particular rookie artist.
(ii) They spend a considerable amount of time researching the artist.
(iii) Or they lose the chance to be the first outlet to write about the artist.
How can we reduce research time by assigning a priority list, and improve the decision-making process?
Let’s predict whether a rookie artist is worth creating content about and publishing / distributing to our subscribers, using a machine learning classification technique based on quantitative information.
HIPHOPLE’s contents = Review + Interview + Feature Articles + SNS Account + @
PROJECT WORKFLOW
1. Define Variables / Input
2. Collect Data & Store in AWS MySQL DB Server
3. Labeling Target Variable Classes
4. ORM Data Query
5. Merging Data
6. Fitting Models
7. Feature Engineering
8. Model Cross-Validation and Optimization
DEFINE VARIABLES / INPUT
Target Variable
Binary Classification
1 : Worth creating content on this artist
0 : Ignore this artist
HIPHOPLE editors participated in the labeling process based on their domain knowledge
Independent Variables (Features, Input)
Rows : Debut albums released from Jan 1st 2011 to Apr 18th 2018
- Genre
- Count of singles released before the official album release
- Count of articles published by music / culture media press
- Ratings from music media press
- Number of SNS followers of each artist
COLLECTING DATA (CRAWLING)
Crawling Data | Crawling Target | Python Library
Debut album by year | www.wikipedia.com | Requests, BeautifulSoup
Ratings | www.metacritic.com | Scrapy
Ratings | www.pitchfork.com | Requests
Ratings | www.albumoftheyear.com | Requests, BeautifulSoup
SNS followers of each artist | www.chartmetric.io | Selenium
Count of related articles | www.billboard.com | Requests, BeautifulSoup
Count of related articles | www.genius.com | Requests, BeautifulSoup
Count of related articles | www.thesource.com | Requests, BeautifulSoup
Count of related articles | www.xxlmag.com | Requests, BeautifulSoup
Data Crawling → Stored in AWS EC2 MySQL DB
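The article-count step can be illustrated with a small sketch. The deck used Requests and BeautifulSoup; to stay dependency-free, this sketch swaps in the standard library's html.parser and works on an HTML string instead of a live download. The sample HTML, the function name count_article_links, and the rule "a link counts if its text mentions the artist" are all assumptions for illustration, not the project's actual crawler.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collects the visible text of every <a> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buffer = []
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.link_texts.append("".join(self._buffer).strip())
            self._in_link = False


def count_article_links(html, artist):
    """Counts links whose text mentions the artist (case-insensitive)."""
    parser = LinkTextCollector()
    parser.feed(html)
    needle = artist.lower()
    return sum(needle in text.lower() for text in parser.link_texts)


# Hypothetical search-result page; a real crawler would download this
# with an HTTP client before parsing.
SAMPLE_HTML = """
<ul>
  <li><a href="/news/1">Artist A announces debut album</a></li>
  <li><a href="/news/2">Festival lineup revealed</a></li>
  <li><a href="/news/3">Review: ARTIST A's first studio LP</a></li>
</ul>
"""

article_count = count_article_links(SAMPLE_HTML, "Artist A")
print(article_count)
```

Real media sites differ in markup, so the matching rule would need to be adapted per site.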
TARGET VARIABLE CLASS LABELING
• Editors participated in labeling the class as 1 or 0
• The labeling sheet was shared via Google Spreadsheet
Labeling
• If you have created any content on this artist/album, please label it as “ 1 ” (as a numeral)
• If you don’t remember but still think it’s worth creating content, please label it as “ 1 ”
• If you didn’t create any content related to this artist/album, please label it as “ 0 ”
MERGE DATA
Data size : 1083 rows
Label : Target Variable
Features : 15
Article Counts
- Billboard
- Genius
- The Source
- XXL Magazine
Album Info
- Genre
- Count of Singles
SNS
- Twitter followers
- Instagram followers
- Facebook page Likes
- Youtube Channel subscribers
- Soundcloud followers
- Spotify followers
Ratings
- Album of the Year
- Pitchfork
- Metacritic
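A minimal sketch of the merge step, assuming each crawled source ends up as its own table keyed by artist name. The column names, the sample rows, and the use of pandas are assumptions for illustration; the deck does not show how the merge was actually done.

```python
import pandas as pd

# Hypothetical per-source tables keyed by artist name.
albums = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist C"],
    "genre": ["Hiphop", "R&B", "Hiphop"],
    "single_count": [3, 0, 1],
})
articles = pd.DataFrame({
    "artist": ["Artist A", "Artist C"],
    "billboard_articles": [12, 2],
})
ratings = pd.DataFrame({
    "artist": ["Artist A", "Artist B"],
    "metacritic": [81.0, 64.0],
})
labels = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist C"],
    "label": [1, 0, 0],
})

# Left joins keep every debut album even when a source has no record.
merged = albums
for table in (articles, ratings, labels):
    merged = merged.merge(table, on="artist", how="left")

# An artist with no crawled articles gets an article count of zero.
merged["billboard_articles"] = merged["billboard_articles"].fillna(0).astype(int)
print(merged)
```

Missing ratings stay as NaN here, so that the imputation choice can be made explicitly later.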
FITTING MODELS
Classification Metrics (Class 1)
Model | Precision | Recall | F1 Score | AUC | Comment
KNN (K-Nearest Neighbours) | 0.79 | 0.43 | 0.55 | 0.83 |
SGD (Stochastic Gradient Descent) | 0.31 | 0.92 | 0.46 | 0.66 |
Decision Tree | 0.62 | 0.65 | 0.63 | 0.86 |
Random Forest | 0.74 | 0.67 | 0.70 | 0.91 | ★
Gradient Boosting | 0.69 | 0.57 | 0.62 | 0.93 |
XGBoost (Extreme Gradient Boosting) | 0.78 | 0.64 | 0.70 | 0.93 |
Classification Algorithms : Scikit-Learn Libraries
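The four Class 1 metrics in the table can be computed with scikit-learn as in the sketch below. It runs on synthetic data from make_classification, not on the project's dataset; the class ratio, the train/test split, and the hyperparameters are assumptions, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1,083 rows, 15 features, about 15% positives (assumed ratio).
X, y = make_classification(
    n_samples=1083, n_features=15, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of Class 1

metrics = {
    "precision": precision_score(y_test, y_pred, pos_label=1),
    "recall": recall_score(y_test, y_pred, pos_label=1),
    "f1": f1_score(y_test, y_pred, pos_label=1),
    "auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Swapping RandomForestClassifier for another estimator with predict_proba gives a comparable row for that model.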
Reason for selecting the Random Forest Classifier
Recall (sensitivity) was the major consideration for this project
- The impact of a False Positive ( FP ), which Precision penalizes
• Incorrect prediction : predicted Class 1 for data that is actually Class 0
• Content is created even though the artist & album are not worth it
• More time and energy is spent on the wrong class → wasted internal resources
- The impact of a False Negative ( FN ), which Recall penalizes
• Incorrect prediction : predicted Class 0 for data that is actually Class 1
• We ignore an artist & album that we shouldn’t have ignored
• Higher chance of other media creating content first
• Higher chance of losing potential subscribers
• Higher chance of losing the opportunity to arrange a management contract with the artist in South Korea
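The trade-off above follows directly from the definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). A small sketch with made-up confusion counts (the two scenarios and their numbers are hypothetical) shows how a model that flags more artists gains recall at the cost of precision.

```python
def precision_recall_f1(tp, fp, fn):
    """Returns (precision, recall, F1) for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for 20 artists that are truly Class 1.
# A cautious model flags few artists: few false positives, many misses.
cautious = precision_recall_f1(tp=10, fp=2, fn=10)
# An eager model flags many artists: more wasted research, few misses.
eager = precision_recall_f1(tp=18, fp=12, fn=2)

for name, (p, r, f1) in (("cautious", cautious), ("eager", eager)):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

When missing a worthwhile artist is the costlier mistake, the eager profile is the preferable one even though its precision is lower.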
FEATURE ENGINEERING
Feature Engineering Trials
Feature | Type | Description | Performance Improvement
Genre | One-Hot Encoding | Create dummy variables with one-hot encoding | O
Ratings | Null Value Imputation | Fill in a rating score of “ 0 ” for albums that didn’t receive any ratings | O
Buzz, Rating, SNS Followers | Binning | Bin the range of numeric values; create dummy variables for each bin | X
SNS Followers | Ratio | [ Followers of each SNS / Total followers ]; calculate the ratio for each SNS platform | X
SNS Followers | Re-Grouping | Private SNS (Twitter, Instagram); page & channel based SNS (Facebook, Youtube); music based SNS (Soundcloud, Spotify) | X
(O = improved performance, X = no improvement)
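Three of the trials above (rating imputation, genre one-hot encoding, follower ratios) can be sketched with pandas as follows. The column names and sample values are assumptions, and only two SNS platforms are shown to keep the example short.

```python
import pandas as pd

# Hypothetical rows; None marks an album that received no rating.
df = pd.DataFrame({
    "genre": ["Hiphop", "R&B", "Hiphop"],
    "pitchfork": [7.8, None, 6.1],
    "twitter_followers": [1000, 0, 250],
    "instagram_followers": [3000, 0, 750],
})

# Ratings: albums that received no rating get a score of 0.
df["pitchfork"] = df["pitchfork"].fillna(0)

# Genre: one dummy column per genre value.
df = pd.get_dummies(df, columns=["genre"], prefix="genre")

# SNS follower ratio: share of each platform in the artist's total followers.
follower_cols = ["twitter_followers", "instagram_followers"]
total = df[follower_cols].sum(axis=1)
for col in follower_cols:
    # Artists with no followers at all get a ratio of 0 instead of NaN.
    df[col + "_ratio"] = (df[col] / total.where(total > 0)).fillna(0.0)

print(df)
```

Imputing 0 makes "no rating" look like the lowest possible score, which is a modelling choice rather than a neutral default.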
IMPROVING PERFORMANCE & OPTIMIZATION
Recall Score Improvement
Imbalance Problem
- The number of Class 1 samples is far smaller than Class 0 …
• The classifier measures performance based on “Accuracy”
• The classifier is then biased towards the majority class and shows very poor classification rates on the minority class → while still showing fairly good accuracy
- Applied under-sampling to adjust the class ratio to 50:50
- Trade off Precision → focus on Recall score
Classification Metrics (Class 1)
Under-Sampling Method | Precision | Recall | F1 Score | AUC | Comment
ClusterCentroids | 0.45 | 0.95 | 0.61 | 0.86 |
RandomUnderSampler | 0.67 | 0.88 | 0.89 | 0.94 | ★
CondensedNearestNeighbour | 0.61 | 0.80 | 0.69 | 0.92 |
AllKNN | 0.54 | 0.89 | 0.67 | 0.92 |
InstanceHardnessThreshold | 0.60 | 0.89 | 0.72 | 0.92 |
NearMiss | 0.32 | 0.95 | 0.47 | 0.74 |
NeighbourhoodCleaningRule | 0.60 | 0.80 | 0.69 | 0.93 |
OneSidedSelection | 0.76 | 0.73 | 0.75 | 0.92 |
TomekLinks | 0.75 | 0.71 | 0.73 | 0.95 |
Under Sampling : Imblearn Library
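The idea behind random under-sampling to a 50:50 ratio can be sketched in a few lines of NumPy. This is a hand-rolled stand-in for illustration, not the imblearn implementation used in the project; the function name random_under_sample and the toy data are assumptions.

```python
import numpy as np


def random_under_sample(X, y, random_state=0):
    """Drops random majority-class rows until every class has equal size."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()  # size of the smallest class
    kept = [
        rng.choice(np.flatnonzero(y == cls), size=n_keep, replace=False)
        for cls in classes
    ]
    idx = np.sort(np.concatenate(kept))
    return X[idx], y[idx]


# Hypothetical imbalanced data: 90 rows of Class 0, 10 rows of Class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = random_under_sample(X, y)
print(np.bincount(y_bal))
```

Resampling is normally applied to the training split only, so that the test metrics still reflect the real class ratio.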
Optimization : Scikit-Learn GridSearchCV
• Apply GridSearch result
• Tune hyperparameters
Classification Report
Class | Precision | Recall | F1 Score
Class 0 | 0.98 | 0.87 | 0.92
Class 1 | 0.65 | 0.93 | 0.76
Avg / Total | 0.91 | 0.88 | 0.89
[Figure: ROC curve with AUC]
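A grid search that optimizes for recall, in line with the project's priority, might look like the sketch below. The parameter grid, the synthetic data, and the choice of 3 folds are assumptions; the deck does not list the grid that was actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with 15 features and a minority positive class.
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)

# Hypothetical grid; a real search would usually cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="recall",  # pick the combination with the best Class 1 recall
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 2))
```

After fitting, search.best_estimator_ holds the model refit on all of X with the winning parameters.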
TEST NEW DATA !
Youngboy Never Broke Again – Until Death Call My Name
- Release date : Apr 27th 2018 (Genre : Hiphop)
- Debut studio album
* Answer * HIPHOPLE hasn’t produced any content on this artist and album as of May 9th
Then … how would the model predict on this album?
Load model
Input data from the artist and album
Class prediction result:
0 → Ignore this artist and album
Probability prediction result:
30% → Low probability → Low priority
OH YEAH
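The load-and-predict flow above can be sketched as follows. A throwaway model trained on random numbers stands in for the real trained classifier, the 15 input values are random rather than real album data, and the 0.5 cut-off for "high" versus "low" priority is an assumption; the deck only shows that 30% was treated as low priority.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the trained model: fit on random data, then pickle to bytes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 15))
y_train = (X_train[:, 0] > 1.0).astype(int)
saved = pickle.dumps(
    RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
)

# Load the model and score one new album (hypothetical feature vector).
model = pickle.loads(saved)
new_album = rng.normal(size=(1, 15))
predicted_class = int(model.predict(new_album)[0])
probability = float(model.predict_proba(new_album)[0, 1])  # P(Class 1)

# Assumed cut-off for turning the probability into an editorial priority.
priority = "high" if probability >= 0.5 else "low"
print(predicted_class, round(probability, 2), priority)
```

Sorting new debut albums by this probability would give editors the priority list the project set out to produce.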
HOW CAN I IMPROVE FURTHER?
List of To-Do’s
1. Research other methods to tackle the imbalance problem while also increasing Precision
2. Try the XGBoost algorithm and tune its hyperparameters
3. Small dataset
• Accumulate more data on debut albums in the future and reflect it in the model
• Crawl input data from more media press
• Text sentiment analysis would be a plus
4. Dash-based visualization web application
• Automation of crawling input data + prediction
- Crawl feature data → update dataset
- Load pickled model → predict and return result
For more information on this project …
Please check the links below :D
Github Repository Source Code
https://github.com/lucaseo/content-worth-debut-artist-classification-project
Data Visualization Web Application
http://le-rookie-clf.herokuapp.com
And about me …
Please visit the links below :)
Github
http://www.github.com/lucaseo
LinkedIn
http://www.linkedin.com/in/lucaseo
Github Blog
http://www.lucaseo.github.io
THANK YOU!

Project overview eng

  • 1. CONTENT-WORTH DEBUT ARTIST CLASSIFIER Wonyoung Lucas Seo
  • 3. HIPHOPLE.com ?????????? - US HIPHOP/R&B based Web Magazine platform - Major Contents - Translation - Lyric Translation( * this is the part I contribute ) - Subtitled MusicVideo - News of HIPHOP / R&B Industry - Magazine - Feature Articles - Artist Interview - Album Review - Fashion / Life Style
  • 4. Project Inspiration - CEO’s needs • Public reputation / preference analysis for merchandise sales • K-Hiphop buzz analysis in South-East Asian region media • Super Rookie Artist prediction analysis • Messenger app chatbot • And many more … Why don’t I see you working these days? I’ve been totally occupied with this machine learning thing Do you have anything in mind that you want to predict or analyze ? I need some inspiration for my project Actually, I do !! This … And that … This would be awesome Yeah and that too … ( CEO )
  • 5. Predict Super Rookie ?? ( … sounds cool. Maybe I should first focus on this one)
  • 6. Let’s predict Super Rookie!!! Setting More Specific Goal … Hiphople Magazine Team produces contents of successful debut artists When producing contents on a debut artist … : (i) Based on the domain/industrial knowledge , editors decide whether or not to produce a content on thie particular rookie artist. (ii) Spend considerable amount of time to research them. (iii) Or loose chance to become the first press to write about the aritst to other press. How can we reduce research time by assigning priority list? & Improve decision making process? Let’s predict whether the rookie artist is worth creating a content & publish / distribute to our subscribers with Machine Learning classification technique based on the quantitative information
  • 7. H I P H O P L E ’s c o n t e n t s = R ev i e w + I n t e r v i e w + Fe a t u re A r t i c l e s + S N S A c c o u n t + @
  • 9. 1. DefineVariables / Input 2. Collect Data & Store AWS MySQL DB Server 3. Labeling TargetVariable Classes 4. ORM Data Query 5. Merging Data 6. Fitting Models 7. Feature Engineering 8. Model CrossValidation and Optimization
  • 11. Binary Classification 1 : Worth-creating a content with this artist 0 : Ignore this artist HIPHOPLE editors participated in labeling process based on their domain knowledge TargetVariable
  • 12. IndependentVariables (Features, Input) Rows : Debut Album released from Jan 1st 2011 to Apr 18th 2018 - Genre - Count of single album released before the official album release - Count of articles published by Music / Culture Media Press - Ratings from Music Media Press - Number of SNS followers of each artist
  • 14. Crawling Data CrawlingTarget Crawling Python Library Debut album by year www.wikipedia.com Requests, BeautifulSoup Ratings www.metacritic.com Scrapy www.pitchfork.com Requests www.albumoftheyear.com Requests, BeautifulSoup SNS followers of each artist www.chartmetric.io Selenium Count of related article www.billboard.com Requests, BeautifulSoup www.genius.com Requests, BeautifulSoup www.thesource.com Requests, BeautifulSoup www.xxlmag.com Requests, BeautifulSoup Data Crawling à Stored in AWS EC2 MySQL DB
  • 16. Labeling • Editors participated in labeling the class as 1 or 0 • The labeling sheet was shared via Google Spreadsheet. Instructions to editors: • If you have created any content on this artist/album, please label it as the numeral “1” • If you don’t remember but still think it’s worth creating content, please label it as “1” • If you didn’t create any content related to this artist/album, please label it as “0”
  • 18. Data size : 1083 Rows - Label (Target Variable). Features : 15. Article Counts - Billboard - Genius - The Source - XXL Magazine. Album Info - Genre - Count of Singles. SNS - Twitter followers - Instagram followers - Facebook page Likes - Youtube Channel subscribers - Soundcloud followers - Spotify followers. Ratings - Album of the Year - Pitchfork - Metacritic
  • 20. Classification Algorithms: Scikit-Learn Libraries. Classification Metrics for Class 1 (Precision / Recall / F1 Score / AUC): KNN (K-Nearest Neighbours) : 0.79 / 0.43 / 0.55 / 0.83; SGD (Stochastic Gradient Descent) : 0.31 / 0.92 / 0.46 / 0.66; Decision Tree : 0.62 / 0.65 / 0.63 / 0.86; Random Forest : 0.74 / 0.67 / 0.70 / 0.91 ★; Gradient Boosting : 0.69 / 0.57 / 0.62 / 0.93; XGBoost (Extreme Gradient Boosting) : 0.78 / 0.64 / 0.70 / 0.93
  • 21. Reason for selecting the Random Forest Classifier: Recall (Sensitivity) was the major consideration for this project. - Impact of a False Positive ( FP ), which Precision penalizes • Incorrect prediction : predicted Class 1 for data that is actually Class 0 • Content is created even though the artist & album are not worth it • More time and energy spent on the incorrect class → wasted internal resources - Impact of a False Negative ( FN ), which Recall penalizes • Incorrect prediction : predicted Class 0 for data that is actually Class 1 • An artist & album that shouldn’t have been ignored is ignored • Higher chance of other media creating content first • Higher chance of losing potential subscribers • Higher chance of losing the opportunity to arrange a contract to manage the artist in South Korea
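The precision/recall trade-off on this slide can be made concrete with a small sketch. The confusion counts below are made-up illustration values, not results from the project, and the helper name `class1_metrics` is hypothetical.

```python
def class1_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class (Class 1)."""
    # Precision drops as false positives (wasted editor effort) grow.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall drops as false negatives (missed rookie artists) grow.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: a cautious model (few FP, many FN) ...
cautious = class1_metrics(tp=40, fp=10, fn=60)
# ... versus an eager model (more FP, few FN), closer to what this project wants.
eager = class1_metrics(tp=90, fp=60, fn=10)

print("cautious:", cautious)
print("eager:   ", eager)
```

With these assumed counts the eager model gives up some precision in exchange for much higher recall, which is the direction the slide argues for.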
  • 23. Feature Engineering Trials (Feature / Type / Description / Performance Improvement, O = improved, X = not improved). Genre / One-Hot Encoding / Create dummy variables with one-hot encoding / O. Ratings / Null Value Imputation / Fill in a rating score of “0” for albums that didn’t receive any ratings / O. Buzz Rating / SNS Followers Binning / Bin the range of numeric values and create dummy variables for each bin / X; SNS Followers Ratio / Calculate the ratio [ Followers of each SNS / Total followers ] for each SNS platform / X; SNS Followers Re-Grouping / Private SNS (Twitter, Instagram), Page & Channel based SNS (Facebook, Youtube), Music based SNS (Soundcloud, Spotify) / X
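The two trials marked as helpful (one-hot encoding of genre, zero imputation of missing ratings) can be sketched without any library. This is only an illustration: the sample albums, the column names, and the helpers `one_hot` and `impute_rating` are assumptions, not the project's actual code.

```python
def one_hot(value, categories):
    """Encode a categorical value as 0/1 dummy columns, one per category."""
    return {f"genre_{c}": int(value == c) for c in categories}


def impute_rating(rating):
    """Replace a missing rating with 0, as described on the slide."""
    return 0.0 if rating is None else float(rating)


# Hypothetical rows: the second album received no Metacritic rating.
albums = [
    {"genre": "Hiphop", "metacritic": 81},
    {"genre": "R&B", "metacritic": None},
]

genres = sorted({a["genre"] for a in albums})
rows = [
    {**one_hot(a["genre"], genres), "metacritic": impute_rating(a["metacritic"])}
    for a in albums
]

for row in rows:
    print(row)
```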
  • 24. IMPROVING PERFORMANCE & OPTIMIZATION
  • 25. Recall Score Improvement. Imbalance Problem - The number of Class 1 samples is far smaller than Class 0 … • A classifier that measures performance by “Accuracy” becomes biased towards the majority class and shows very poor classification rates on the minority class → yet still shows fairly good accuracy - Applied under-sampling to adjust the class ratio to 50:50 - Traded off Precision → focused on Recall Score
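Random under-sampling to a 50:50 class ratio can be sketched in plain Python as below. The slides use the Imblearn library for this step; the function `random_under_sample` and the toy 10-vs-90 data here are hypothetical and only illustrate the idea.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop samples so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_min = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy imbalanced data: 10 rows of Class 1, 90 rows of Class 0.
labels = [1] * 10 + [0] * 90
rows = list(range(100))

X_res, y_res = random_under_sample(rows, labels)
print(len(X_res), y_res.count(1), y_res.count(0))
```

A common precaution is to resample only the training split, so that evaluation still reflects the real class ratio.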
  • 26. Under Sampling : Imblearn Library. Classification Metrics for Class 1 (Precision / Recall / F1 Score / AUC): ClusterCentroids : 0.45 / 0.95 / 0.61 / 0.86; Random Under Sampler : 0.67 / 0.88 / 0.89 / 0.94 ★; CondensedNearestNeighbour : 0.61 / 0.80 / 0.69 / 0.92; AllKNN : 0.54 / 0.89 / 0.67 / 0.92; InstanceHardnessThreshold : 0.60 / 0.89 / 0.72 / 0.92; NearMiss : 0.32 / 0.95 / 0.47 / 0.74; NeighbourhoodCleaningRule : 0.60 / 0.80 / 0.69 / 0.93; OneSidedSelection : 0.76 / 0.73 / 0.75 / 0.92; TomekLinks : 0.75 / 0.71 / 0.73 / 0.95
  • 27. Optimization : Scikit-Learn GridSearchCV • Apply GridSearch result • Tune hyperparameters. Classification Report (Precision / Recall / F1 Score): Class 0 : 0.98 / 0.87 / 0.92; Class 1 : 0.65 / 0.93 / 0.76; Avg / Total : 0.91 / 0.88 / 0.89. [Figure: ROC Curve, AUC]
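A minimal sketch of a recall-oriented grid search with Scikit-Learn's `GridSearchCV` might look like the following. The synthetic data, the parameter grid, and the choice of `scoring="recall"` are assumptions for illustration; the slides do not state which hyperparameters or scoring were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real dataset: 15 features, Class 1 is the minority.
X, y = make_classification(
    n_samples=300, n_features=15, weights=[0.8, 0.2], random_state=0
)

# Hypothetical grid; the values are placeholders, not the project's settings.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall",  # optimize recall on Class 1, the project's priority
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_recall = search.best_score_
print(best_params, round(best_recall, 3))
```

Scoring on recall keeps the search aligned with the goal of not missing worthwhile artists; a different scorer would likely select different parameters.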
  • 29. Youngboy Never Broke Again – Until Death Call My Name - Release date : Apr 27th 2018 (Genre : Hiphop) - Debut Studio Album * Answer * HIPHOPLE hasn’t produced any content on this artist and album as of May 9th. Then … how would the model predict for this album?
  • 30. Load Model → Input data for the artist and album. Class Prediction Result: 0 → Ignore this artist and album. Probability Prediction Result: 30% → Low Probability → Low Priority. OH YEAH
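The step from predicted probability to class and priority can be sketched as below. The 0.5 cut-off, the `triage` helper, and the album names other than the 30% example are assumptions; the slides only report the outcome for this one album.

```python
def triage(probability, threshold=0.5):
    """Map a Class 1 probability to (predicted class, editorial priority)."""
    label = 1 if probability >= threshold else 0
    priority = "high" if label == 1 else "low"
    return label, priority


# The album from the slide: 30% probability of being content-worthy.
print(triage(0.30))

# Hypothetical candidates ranked into a research priority list.
candidates = {"album_a": 0.30, "album_b": 0.82, "album_c": 0.55}
ranked = sorted(candidates, key=candidates.get, reverse=True)
print(ranked)
```

Sorting by probability rather than by class alone yields the kind of priority list the project set out to give editors.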
  • 31. HOW CAN I IMPROVE FURTHER?
  • 32. List of To-Dos: 1. Research other methods for tackling the imbalance problem that also increase Precision 2. Try the XGBoost algorithm and tune its hyperparameters 3. Small dataset • Accumulate more data on debut albums in the future and reflect it in the model • Crawl input data from more media press • Text sentiment analysis would be a plus 4. Dash-based visualization web application • Automation of crawling input data + prediction - Crawl feature data → update dataset - Load pickled model → predict and return result
  • 33. For more information on this project, please check the links below :D Github Repository Source Code https://github.com/lucaseo/content-worth-debut-artist-classification-project Data Visualization Web Application http://le-rookie-clf.herokuapp.com And about me … please visit the links below :) Github http://www.github.com/lucaseo LinkedIn http://www.linkedin.com/in/lucaseo Github Blog http://www.lucaseo.github.io