End-to-end Music Classification
• MIR
‣ 2017 DJ music classification
• ^_^
- ML for Music
- Automated feature engineering using RL
ICASSP 2018 😆
Music Classification, Why?
• Classification as “representation learning”
‣ Music streaming service?
‣ Content-based recommendation
• Music streaming service
End-to-end Music Classification
• History of E2E Music Classification Models
‣ E2E ?

• Interpretability of E2E Music Classification Models
• Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim and Juhan Nam, "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging using Raw Waveforms," Sound and Music Computing Conf. (SMC), 2017.
‣ End-to-end approach to music classification
‣ Frequency
• Taejun Kim, Jongpil Lee and Juhan Nam, "Sample-level CNN Architectures for Music Auto-tagging using Raw Waveforms," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2018.
‣ CNN architecture
‣ Loudness
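Spectrogram to End-to-end
1. Spectrogram model
• Handcrafted spectrogram
• 1D or 2D conv.
2. Frame-level raw waveform model (2014)
• End-to-end
• STFT replaced by a conv. layer (frame-level)
3. Sample-level raw waveform model (2017, SampleCNN)
• End-to-end
• STFT replaced by a conv. layer (sample-level)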
Frame- vs. Sample-level
• ( ) 

• Sample-level
‣ 1D convolution 

‣ 7 ( 0.14ms )*

• Frame-level
‣ STFT window 1D convolution 

‣ 256 ( 12ms )*

‣ Spectrogram frame-level 

→ Trade-off between time- & frequency resolution!

• *At a sampling rate of 22,050 Hz
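As a quick check of the durations quoted on this slide: a window of N samples lasts N divided by the sampling rate. The sketch below assumes the 22,050 Hz rate from the footnote; the function name is made up for illustration.

```python
def window_duration_ms(n_samples, sample_rate_hz=22050):
    """Duration in milliseconds of a window of n_samples at the given rate."""
    return 1000.0 * n_samples / sample_rate_hz

# A 256-sample frame lasts about 11.6 ms (the ~12 ms quoted for frame-level).
frame_ms = window_duration_ms(256)

# A 3-sample filter covers roughly 0.14 ms.
sample_ms = window_duration_ms(3)
```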
Frame-level Mel-spectrogram Model (model 1)
1) 2D Convolution
• Treats the spectrogram as an image (e.g. MNIST)
2) 1D Convolution
• Treats the spectrogram as a 1D sequence
• Frequency dim. = channel dim.
• End-to-end
Spectrogram Model
•
‣ ( 3~7 )

‣ 

‣ Phase invariant

•
‣ Task mid-level representation

‣ Hyperparameter tuning (e.g. window length, hop size, etc.)

‣ Phase 

‣ Time- & frequency-resolution trade-off
[Figure: Audio time & frequency resolution]
Convolutional Filters Decouple Time & Frequency Resolution
• Conv. filters are free of the time & frequency resolution trade-off
• Resolution of a convolution:
‣ Time resolution is set by the stride
- Stride↓ time resolution↑
‣ Frequency resolution is set by the number of filters
- #filters↑ frequency resolution↑
‣ Stride and #filters are chosen independently, so time & frequency resolution are decoupled
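The decoupling can be seen from the output shape of a 1D convolution alone: the stride fixes the number of time steps, the number of filters fixes the number of channels. A minimal sketch, assuming an unpadded convolution; the helper name and the filter size of 256 are chosen for illustration only.

```python
def conv1d_output_shape(n_samples, filter_size, stride, n_filters):
    """(time steps, channels) produced by an unpadded 1D convolution."""
    time_steps = (n_samples - filter_size) // stride + 1
    return time_steps, n_filters

n = 59049  # input length that appears in the architecture figures

coarse = conv1d_output_shape(n, filter_size=256, stride=256, n_filters=128)
# Halving the stride doubles the time resolution, channels unchanged.
fine = conv1d_output_shape(n, filter_size=256, stride=128, n_filters=128)
# Doubling the filters doubles the channels, time resolution unchanged.
wide = conv1d_output_shape(n, filter_size=256, stride=256, n_filters=256)
```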
Frame-level Raw Waveform Model (model 2)
• 2014: E2E music classification
‣ But the spectrogram model still performed better than E2E music classification
• The STFT is replaced by a 1D strided conv. layer
‣ Instead of a spectrogram, a 1D conv. learns the mid-level representation
• The strided conv. output has the same hyperparameters as a spectrogram
‣ Filter size (=window size of STFT)
‣ Stride (=hop size of STFT)
[Figure: Spectrogram vs. Waveform with one 1D conv.]
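The correspondence between a strided conv. layer and the STFT can be sketched with fixed filters: if the filters are cosines from a DFT basis, the filter size acts as the window size, the stride as the hop size, and each output channel responds to one frequency. This is only an illustration; in the E2E model the filters are learned, and the function names here are hypothetical.

```python
import math

def strided_conv1d(signal, filters, stride):
    """Unpadded strided 1D correlation; returns a frames x channels list."""
    size = len(filters[0])
    frames = []
    for start in range(0, len(signal) - size + 1, stride):
        window = signal[start:start + size]
        frames.append([sum(w * x for w, x in zip(f, window)) for f in filters])
    return frames

def cosine_filters(size, n_filters):
    """Fixed filters (assumption): the real part of a DFT basis."""
    return [[math.cos(2 * math.pi * k * n / size) for n in range(size)]
            for k in range(n_filters)]

size, stride = 16, 8  # window size and hop size
# A pure tone with 4 cycles per 16-sample window.
signal = [math.cos(2 * math.pi * 4 * n / size) for n in range(64)]
frames = strided_conv1d(signal, cosine_filters(size, 8), stride)

# Index of the strongest channel in each frame.
strongest = [max(range(8), key=lambda k: abs(frame[k])) for frame in frames]
```

Every frame peaks in channel 4, the channel whose filter matches the tone, which is the "channel = frequency" reading used in the slides.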
1D Conv. on Spectrogram vs. 1D Conv. on Waveform
[Figure: left, STFT followed by a 1D conv. filter over Time × Channel (=frequency); right, 1D conv. filters on the waveform giving Time × Channel (=unsorted, frequency-like)]
• 1D conv. vs. 2D conv.; spectrogram vs. waveform
• In E2E music classification, channels play the role of frequency
Frame-level Raw Waveform Model (model 2)
• Why it fell short:
‣ The CNN was not expressive enough (few layers)
‣ Nothing replaced the log-based amplitude compression
‣ The conv. layer has to learn all phase variations
• Since 2014:
‣ Batch Norm. and ResNet
‣ GPU
Sample-level Raw Waveform Model
• 

‣ Log-scale amplitude compression

‣ Phase invariance

• :

‣ 

- Filter size = one of {2, 3, 4, 5, 7}

‣ STFT conv. layer 

• net.
[Figure: SampleCNN (model 3). Raw waveform (59049 × 1) → strided conv (19683 × 128) → 1D conv. block × 9 (6561 × 128, 2187 × 128, 729 × 256, 243 × 256, 81 × 256, 27 × 256, 9 × 256, 3 × 256, 1 × 512) → FC × 2 (512 units, then 50 tag predictions)]
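The time lengths in the figure are successive divisions by 3, starting from 59049 = 3^10 samples. A small sketch that reproduces them, assuming that the strided conv. and each of the 9 blocks shrink the time axis by a factor of 3 (inferred from the listed sizes) and that the sampling rate is the 22,050 Hz quoted earlier:

```python
def samplecnn_time_lengths(n_samples=59049, factor=3, n_blocks=9):
    """Time length after the strided conv. and after each 1D conv. block."""
    lengths = [n_samples // factor]  # strided conv.
    for _ in range(n_blocks):        # each block downsamples by `factor`
        lengths.append(lengths[-1] // factor)
    return lengths

lengths = samplecnn_time_lengths()

# Length of the input segment in seconds at 22,050 Hz (about 2.68 s).
duration_s = 59049 / 22050
```

The second entry is 3^8 = 6561, and the last block leaves a single time step.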
Sample-level Raw Waveform Model (model 3)
• E2E model spectrogram
• E2E music classification model
[Table: Comparison with frame-level models]
[Table: Comparison with the state of the art]
Sample-level Raw Waveform Model (model 3)
[Figure: panels for filter sizes 2, 3, 4, 5, ...]
Advanced Sample-level Raw Waveform Model (model 4)
• Techniques from image classification:
1) Convolutional blocks from ResNet & SENet
2) Multi-level feature aggregation
[Figure: Base Architecture. Raw waveform (59049 × 1) → strided conv (19683 × 128) → 1D convolutional block × 9 (6561 × 128, 2187 × 128, 729 × 256, 243 × 256, 81 × 256, 27 × 256, 9 × 256, 3 × 256, 1 × 512) → multi-level feature aggregation with global max pooling (256, 512, 256) → FC × 2 → 50 tag predictions]
[Figure: 1D Convolutional Blocks
Basic block (SampleCNN): Conv1D, BatchNorm, ReLU, MaxPool
SE block: basic block plus an excitation path (GlobalAvgPool, FC + ReLU, FC + sigmoid, Scale); shapes T×C → 1×C → 1×αC → 1×C → T×C
Res-n block: Conv1D, BatchNorm, ReLU, Dropout, Conv1D, BatchNorm, ReLU, MaxPool
ReSE-n block: the Res-n components plus the SE excitation path]
Res-n Block
• From ResNet (winner of the 2015 ImageNet challenge)
• Motivation:
‣ Skip connections make deeper nets trainable
• n: number of conv. layers (1 or 2)
‣ Dropout between the conv. layers for regularization (inspired by WideResNet)
[Figure: Res-n block with skip connection: Conv1D, BatchNorm, ReLU, Dropout, Conv1D, BatchNorm, ReLU, MaxPool]
AUC on MagnaTagATune:
Basic  0.9055
Res-1  0.9048
Res-2  0.9061
1.8 ,
but
SE Block
• From SENet (winner of the 2017 ImageNet challenge)
• Motivation:
‣ Channel-wise recalibration
- Some channels (=frequency-like) are emphasized, others suppressed
- Each channel is rescaled by a weight (0~1) (recalibration)
[Figure: SE block. Conv1D, BatchNorm, ReLU, MaxPool (T×C) → GlobalAvgPool (1×C) → FC + ReLU (1×αC) → FC + sigmoid (1×C) → Scale (T×C)]
AUC on MagnaTagATune:
Basic  0.9055
Res-2  0.9061
SE     0.9083
basic block 1.08
SE Block for Image (2D Conv.)
Squeeze operation:
• Aggregate spatial dimensions
• Produce channel-wise statistics
Excitation operation:
• Using the statistics, learn channel relationships
• Produce weight for each channel
Global spatial information
for each channel
Excitations (range 0~1):
Weight for each channel
Reweight each channel
using the weights
SE Block for Audio (1D Conv.)
[Figure: feature map with axes Time × Channel (or frequency-like)]
Global temporal statistics
for each channel
Squeeze operation:
• Aggregate temporal dimensions
• Produce frequency-wise statistics
Excitation operation:
• Using the statistics, learn frequency relationships
• Produce weight for each frequency
Excitations (range 0~1):
Weight for each frequency
Reweight each frequency
using the weights
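The squeeze, excitation and reweighting steps above can be written out in a few lines. This is a minimal pure-Python sketch of a generic 1D SE operation on a T×C feature map, not the authors' implementation; the function name, the weight layout and the toy values are assumptions.

```python
import math

def se_block_1d(x, w1, b1, w2, b2):
    """x: T x C feature map. w1: C x H, w2: H x C (H = alpha * C)."""
    n_steps, n_channels = len(x), len(x[0])
    # Squeeze: average over time, one statistic per channel (1 x C).
    squeezed = [sum(row[c] for row in x) / n_steps for c in range(n_channels)]
    # Excitation, first FC + ReLU (1 x H).
    hidden = [
        max(0.0, b1[h] + sum(squeezed[c] * w1[c][h] for c in range(n_channels)))
        for h in range(len(b1))
    ]
    # Excitation, second FC + sigmoid: one weight in (0, 1) per channel.
    weights = []
    for c in range(n_channels):
        z = b2[c] + sum(hidden[h] * w2[h][c] for h in range(len(hidden)))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    # Scale: reweight every time step of each channel.
    scaled = [[row[c] * weights[c] for c in range(n_channels)] for row in x]
    return scaled, weights

# Toy input: T = 2 time steps, C = 2 channels, hidden width 4 (alpha = 2).
x = [[1.0, 2.0], [3.0, 4.0]]
w1 = [[0.0] * 4 for _ in range(2)]
w2 = [[0.0] * 2 for _ in range(4)]
scaled, weights = se_block_1d(x, w1, [0.0] * 4, w2, [0.0] * 2)
```

With all-zero weights the sigmoid outputs 0.5 for every channel, so the feature map is simply halved; trained weights would give each channel its own excitation.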
SE Block for Audio (1D Conv.)
[Figure: SE block. Conv1D, BatchNorm, ReLU, MaxPool (T×C) → GlobalAvgPool (1×C) → FC + ReLU (1×αC) → FC + sigmoid (1×C) → Scale (T×C)]
Difference with Original SE Block
• The original SENet reduces the dimensionality in the first FC layer
‣ 𝑟: reduction ratio
• Here the first FC layer amplifies it instead (1×C → 1×αC)
‣ 𝜶: amplifying ratio
• The original SENet reduces by a factor of 16; here it is amplified by a factor of 16
‣ Audio channel ?
[Figure: SE block with the 1×αC hidden layer]
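To make the two ratios concrete, the sketch below compares the hidden width and parameter count of the two FC layers under a reduction ratio and under an amplifying ratio. C = 128 channels and a ratio of 16 in both directions are assumptions for illustration (16 is the number on the slide; the helper name is hypothetical).

```python
def se_fc_sizes(n_channels, ratio, amplify):
    """Hidden width and parameter count (weights + biases) of the two FC layers."""
    hidden = n_channels * ratio if amplify else n_channels // ratio
    n_params = 2 * n_channels * hidden + hidden + n_channels
    return hidden, n_params

reduced = se_fc_sizes(128, 16, amplify=False)    # original SENet style: C / r
amplified = se_fc_sizes(128, 16, amplify=True)   # amplifying style: alpha * C
```

Amplifying makes the excitation path far larger than in the original SENet, which is why the ratio is worth tuning.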
Amplifying Ratio (alpha) Grid Search
[Figure: AUC vs. amplifying ratio, with regions marked Underfitting and Overfitting]
ReSE-n block
• Res-n block + SE block
[Figure: ReSE-n block. Res-n components (Conv1D, BatchNorm, ReLU, Dropout, Conv1D, BatchNorm, MaxPool) plus the SE path (GlobalAvgPool, FC + ReLU, FC + sigmoid, Scale); shapes T×C → 1×C → 1×αC → 1×C → T×C]
AUC on MagnaTagATune:
Basic   0.9055
Res-2   0.9061
SE      0.9083
ReSE-1  0.9066
ReSE-2  0.9102
1.8
Multi-level Feature Aggregation
• Uses the outputs of the last 3 layers
‣ The 3 outputs are concatenated
‣ Simple, but powerful
• Motivation:
‣ Music tags differ in their level of abstraction
‣ Example:
- “vocal”: low abstraction
- “metal”: high abstraction
• Global max pooling reduces the time dimension to 1
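The aggregation step can be sketched as follows: each selected feature map (T×C, with its own T and C) is max-pooled over time to a single C-dimensional vector, and the vectors are concatenated. Function names and toy values are illustrative assumptions.

```python
def global_max_pool(feature_map):
    """Reduce a T x C feature map to C values by taking the max over time."""
    n_channels = len(feature_map[0])
    return [max(row[c] for row in feature_map) for c in range(n_channels)]

def aggregate(feature_maps):
    """Concatenate the pooled features of several layers."""
    features = []
    for feature_map in feature_maps:
        features.extend(global_max_pool(feature_map))
    return features

# Two toy layer outputs with different time lengths and channel counts.
maps = [
    [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]],  # 3 x 2
    [[0.7, 0.0, 0.2]],                     # 1 x 3
]
features = aggregate(maps)
```

Because pooling removes the time axis, layers with different time lengths can be combined without any resampling.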
Comparison of Architectures
AUC on MagnaTagATune:
Block    No multi-level aggregation   Multi-level aggregation
Basic    0.9055                       0.9077
SE       0.9083                       0.9111
Res-1    0.9048                       0.9037
Res-2    0.9061                       0.9098
ReSE-1   0.9066                       0.9053
ReSE-2   0.9102                       0.9113
Figure annotations: x1.7 and x1.08 relative to SampleCNN
Comparison with SoTA
Ensemble of 3 models
Best among single models!
Interpretability of Deep Learning
• 

• interpretability 

‣ vision 

‣ Audio 

• Weapons of Math Destruction ( )
— , “ ”

• ,


‣ ( )

‣ “ ?” “ ?” ...
SampleCNN Filter Visualization
• Each channel responds to a frequency-like signal
• Channels are sorted by frequency
• The filters of a layer follow a nonlinear frequency scale (e.g. mel-scale)
‣ i.e. like piano keys
• Layers devote more filters to low frequencies
[Figure: filters per layer; axes Sorted channel index × Frequency (0~11 kHz)]
Filter Viz. Process Example: back propagate from a channel of a layer to the input
[Figure: steps include initializing the input (raw waveform) randomly (random noise), backprop. to the input, STFT, and a frequency plot, e.g. Channel 3 of Layer 1]
Excitation Visualization
[Figure: SE block diagram with the excitation (the 1×C sigmoid output) marked; plot of Excitation against Sorted channel index]
Excitation Visualization
• Excitation of the channels in each SE block, per tag
• Mid blocks are general; the last block does discriminative signal processing
• The block's excitations differ by tag!
‣ Loudness
[Figure: Excitation against Sorted channel index]
Excitation Visualization
• Excitation, loudness and tag per excitation layer
[Figure: Excitation against Sorted channel index]
[Figure: Standard deviations of excitations across tags]
Analysis of the First Excitation
[Figure: excitations of the 50 tags; axes Sorted channel index × Excitation]
Analysis of the First Excitation
• Each point is an audio segment
• The line is a linear regression line
• The SE block appears to normalize loudness
[Figure: Excitation against Loudness in five panels: Average of 128 Channels, Most Positive Channel, Most Negative Channel, Most Neutral Channel, Least Regression Error Channel]
Analysis of the First Excitation
• Linear regression per channel
• Most channels normalize loudness (negative slope)
‣ #negative = 109
‣ #positive = 19
• Loudness vs. excitation?
[Figure: Excitation against Loudness]
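The #negative / #positive counts come from the sign of each channel's regression slope of excitation on loudness. A minimal sketch of that bookkeeping with ordinary least squares; the function names and the toy numbers are made up for illustration and are not the data from the slides.

```python
def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def count_slope_signs(loudness, excitations_per_channel):
    """Number of channels whose excitation falls / rises with loudness."""
    slopes = [regression_slope(loudness, ch) for ch in excitations_per_channel]
    return sum(s < 0 for s in slopes), sum(s > 0 for s in slopes)

# Toy data: 4 audio segments, 3 channels.
loudness = [0.1, 0.2, 0.4, 0.8]
channels = [
    [0.9, 0.8, 0.6, 0.3],  # excitation drops as loudness grows
    [0.7, 0.7, 0.5, 0.4],  # also drops
    [0.2, 0.3, 0.5, 0.6],  # rises
]
n_negative, n_positive = count_slope_signs(loudness, channels)
```

A negative slope means the channel is turned down for louder segments, which is the loudness-normalizing behaviour the slide describes.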
Variation of Excitations Increases According to Loudness
• audio segment excitation channel
• segment ( )
• Loudness excitation
Excitation Comparison with Speech Dataset
TensorFlow Speech Commands Dataset
• 1-second audio clips
• Spoken command words
‣ e.g. “yes”, “no”, “right”, “go”
• SE block
[Figures: each compares MagnaTagATune (music dataset) with TensorFlow Speech Commands (speech dataset)
Average of 128 Channels
Most Positive Channel
Linear Regression Lines: #negative=109, #positive=19 !
Variation of Excitations increases according to Loudness]
• Audio filter visualization?
• SE block squeeze, excitation?
• Directly optimize ROC-AUC with policy gradient?
[Figure: Image filter viz.]
(Taejun Kim)

i2r.jun@gmail.com
^_^
References
• [Cover art] http://www.sqoop.co.ug/201805/four-one-one/nation-media-group-launches-music-record-label-lit-music.html

End-to-end Music Classification

  • 2. • MIR ‣ 2017 DJ music classification • • ‣ ... ‣ ... • ^_^ ‣ - ML for Music - Automated feature engineer using RL ICASSP 2018 😆
  • 3. Music Classification, Why? • classification “representation learning” ‣ Music streaming service ? ‣ content-based recommendation • Music streaming service ‣ ! • ‣ “ ” ‣
  • 4. : End-to-end Music Classification • History of E2E Music Classification Models ‣ E2E ? • Interpretability of E2E Music Classification Models ‣
  • 5. • Sample-level Deep Convolutional Neural Networks 
 for Music Auto-tagging using Raw Waveforms (2017)
 Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim and Juhan Nam
 Sound and Music Computing Conf. (SMC), 2017. ‣ Music classification end-to-end approach ‣ Frequency • Sample-level CNN Architectures
 for Music Auto-tagging using Raw Waveforms (2018)
 Taejun Kim, Jongpil Lee and Juhan Nam,
 IEEE Int. Conf. Acoustical, Speech Signal Processing (ICASSP), 2018. ‣ CNN architecture ‣ Loudness
  • 6.
  • 7. Spectrogram to End-to-end 1 2 3 2014 2017 (SampleCNN) • End-to-end • STFT conv. layer • (frame-level) • End-to-end • STFT conv. layer • (sample-level) • Handcrafted spectrogram • 1D or 2D conv.
  • 8. Frame- vs. Sample-level • ( ) • Sample-level ‣ 1D convolution ‣ 7 ( 0.14ms )* • Frame-level ‣ STFT window 1D convolution ‣ 256 ( 12ms )* ‣ Spectrogram frame-level 
 → Trade-off between time- & frequency resolution! • *22,050kHz
  • 9. Frame-level Mel-spectrogram Model (=Channel) 1 1) 2D Convolution 2) 1D Convolution (=Channel) (=Channel) • Spectrogram • (e.g. MNIST) • • Spectrogram 1 sequence • Frequency dim. = dim. • End-to-end
  • 10. Spectrogram Model • ‣ ( 3~7 ) ‣ ‣ Phase invariant • ‣ Task mid-level representation ‣ Hyperparameter tuning (e.g. window length, hop size, etc.) ‣ Phase ‣ Time- & frequency-resolution trade-off
  • 11. Audio time & frequency resolution
  • 12. Convolutional Filters Decouple Time & Frequency Resolution • Conv. filter time & frequency resolution trade-off • Convolution resolution: ‣ Time resolution stride - Stride↓ time resolution↑ ‣ Frequency resolution filter - #filters↑ frequency resolution↑ ‣ Stride time & frequency resolution
  • 13. Frame-level Raw Waveform Model
    • 2014 E2E music classification
      ‣ But, spectrogram model
        - , E2E music classification
    • STFT → 1D strided conv. layer
      ‣ Spectrogram 1D conv. mid-level representation
    • The strided conv. output has the same hyperparameters as a spectrogram
      ‣ Filter size (= window size of STFT)
      ‣ Stride (= hop size of STFT)
    [Figure: stage 2, Spectrogram → Waveform with one 1D conv. net.]
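The correspondence between an STFT and a strided 1D convolution can be illustrated with a small sketch. Here the filters are fixed complex sinusoids (a rectangular-window DFT), so filter size plays the role of the STFT window size and stride plays the role of the hop size. In the frame-level model the filters would be learned instead; the fixed filter bank and all function names here are assumptions for illustration.

```python
import cmath
import math


def dft_filters(window_size):
    """One fixed complex-sinusoid filter per frequency bin (not learned)."""
    return [
        [cmath.exp(-2j * math.pi * k * n / window_size) for n in range(window_size)]
        for k in range(window_size // 2 + 1)
    ]


def strided_conv1d(signal, filters, stride):
    """Apply every filter at hops of `stride`; returns [frame][channel]."""
    size = len(filters[0])
    return [
        [sum(w * x for w, x in zip(f, signal[start:start + size])) for f in filters]
        for start in range(0, len(signal) - size + 1, stride)
    ]


def magnitude_spectrogram(signal, window_size, hop_size):
    """STFT magnitudes computed as a strided convolution with DFT filters."""
    frames = strided_conv1d(signal, dft_filters(window_size), hop_size)
    return [[abs(v) for v in frame] for frame in frames]


# A tone with 4 cycles per 16-sample window should light up channel 4.
tone = [math.cos(2 * math.pi * 4 * n / 16) for n in range(64)]
spec = magnitude_spectrogram(tone, window_size=16, hop_size=8)
print(len(spec), "frames x", len(spec[0]), "channels")
```

With learned filters the channels are no longer guaranteed to be ordered by frequency, which is why later slides call them "frequency-like".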
  • 14. 1D Conv. on Spectrogram vs. 1D Conv. on Waveform
    [Figure: 1D conv. on spectrogram (STFT, then 1D conv. filters; axes Time and Channel (= frequency)) next to 1D conv. on waveform (1D conv. filters; axes Time and Channel (= unsorted, frequency-like))]
    • 1D conv. 2D conv. spectrogram vs. waveform
    • E2E music classification Channel frequency
      → !
  • 15. Frame-level Raw Waveform Model • : ‣ CNN (layer ) ‣ Log-based amplitude compression ‣ conv. layer phase variation • 2014 , : ‣ Batch Norm. ResNet ( ) ‣ GPU ( , ) 2
  • 16. Sample-level Raw Waveform Model (stage 3: SampleCNN!)
    • Log-scale amplitude compression
    • Phase invariance
    • Filter size = one of {2, 3, 4, 5, 7}
    • STFT conv. layer
    [Figure: SampleCNN architecture. 59049 × 1 raw waveform → strided conv → 19683 × 128 → 1D convolutional block × 9 (6561 × 128, 2187 × 128, 729 × 256, 243 × 256, 81 × 256, 27 × 256, 9 × 256, 3 × 256, 1 × 512) → FC × 2 (512) → 50 tag prediction]
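The sequence lengths in the architecture figure follow from repeated division by 3. The sketch below assumes the strided conv. and each of the 9 blocks reduce the time axis by a factor of 3, which is what the listed sizes imply; the function name is hypothetical.

```python
def samplecnn_lengths(input_len=59049, n_blocks=9, factor=3):
    """Time-axis length after the strided conv. and after each 1D conv. block."""
    lengths = [input_len]
    length = input_len // factor      # strided conv. layer
    lengths.append(length)
    for _ in range(n_blocks):         # each block ends with pooling by `factor`
        length //= factor
        lengths.append(length)
    return lengths


print(samplecnn_lengths())
print(f"input duration at 22,050 Hz: {59049 / 22050:.2f} s")
```

Because 59049 = 3^10, ten reductions by 3 bring the raw waveform down to a single time step before the fully connected layers.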
  • 17. Sample-level Raw Waveform Model3 •E2E model spectrogram •E2E music classification model Comparison with frame-level models Comparison with state-of-the-arts
  • 18. Sample-level Raw Waveform Model3 ... Filter size 2 Filter size 3 Filter size 4 Filter size 5
  • 19. Advanced Sample-level Raw Waveform Model (stage 4)
    • image classification
      1) Convolutional blocks from ResNet & SENet
      2) Multi-level feature aggregation
    [Figure: base architecture (SampleCNN): 59049 × 1 raw waveform → strided conv → 1D convolutional block × 9 → multi-level feature aggregation (global max pooling; 256, 512, 256) → FC × 2 → 50 tag prediction]
    [Figure: 1D convolutional blocks
      Basic block: Conv1D, BatchNorm, relu, MaxPool
      Res-n block: Conv1D, BatchNorm, relu, Dropout, Conv1D, BatchNorm, relu, MaxPool
      SE block: Conv1D, BatchNorm, relu, MaxPool, GlobalAvgPool, FC, relu, FC, sigmoid, Scale (T×C → 1×C → 1×αC → 1×C → T×C)
      ReSE-n block: Conv1D, BatchNorm, relu, Dropout, Conv1D, BatchNorm, GlobalAvgPool, FC, relu, FC, sigmoid, Scale, relu, MaxPool]
  • 20. Res-n Block
    • From ResNet (winner of the 2015 ImageNet challenges)
    • Motivation:
      ‣ Skip-connection net.
    • n: number of conv. layers (1 or 2)
      ‣ Conv. layer regularization: dropout (inspired by WideResNet)
    [Figure: Res-n block (Conv1D, BatchNorm, relu, Dropout, Conv1D, BatchNorm, skip-connection, relu, MaxPool)]
    • AUC on MagnaTagATune: Basic 0.9055, Res-1 0.9048, Res-2 0.9061
    • 1.8 , but
  • 21. SE Block
    • From SENet (winner of the 2017 ImageNet challenges)
    • Motivation:
      ‣ Channel , channel recalibration
        - channel (= frequency-like) ( ) , ( )
        - Each channel is rescaled (recalibrated) by a weight in 0~1
    [Figure: SE block (Conv1D, BatchNorm, relu, MaxPool, GlobalAvgPool, FC, relu, FC, sigmoid, Scale; T×C → 1×C → 1×αC → 1×C → T×C)]
    • AUC on MagnaTagATune: Basic 0.9055, Res-2 0.9061, SE 0.9083
    • basic block 1.08
  • 22. SE Block for Image (2D Conv.)
    • Squeeze operation:
      ‣ Aggregate spatial dimensions
      ‣ Produce channel-wise statistics
    • Excitation operation:
      ‣ Using the statistics, learn channel relationships
      ‣ Produce weight for each channel
    [Figure: global spatial information for each channel → excitations (range 0~1), one weight per channel → reweight each channel using the weights]
  • 23. SE Block for Audio (1D Conv.)
    • Squeeze operation:
      ‣ Aggregate temporal dimensions
      ‣ Produce frequency-wise statistics
    • Excitation operation:
      ‣ Using the statistics, learn frequency relationships
      ‣ Produce weight for each frequency
    [Figure: Time × Channel (or frequency-like) feature map → global temporal statistics for each channel → excitations (range 0~1), one weight per frequency → reweight each frequency using the weights]
  • 24. SE Block for Audio (1D Conv.)
    [Figure: SE block (Conv1D, BatchNorm, relu, MaxPool, GlobalAvgPool, FC, relu, FC, sigmoid, Scale; T×C → 1×C → 1×αC → 1×C → T×C)]
  • 25. Difference with Original SE Block
    • Original SENet FC layer
      ‣ 𝑟: reduction ratio
    • 𝜶: amplifying ratio
    • Original SENet 16 , 16
      ‣ Audio channel ?
    [Figure: SE block (T×C → 1×C → 1×αC → 1×C → T×C)]
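The squeeze and excitation steps for a T×C feature map can be sketched in plain Python. This is a minimal forward pass under stated assumptions: the hidden FC layer has αC units as in the block diagram, the weights are random placeholders rather than trained values, and the function names (`squeeze`, `excite`, `se_block`, `random_params`) are hypothetical.

```python
import math
import random


def squeeze(features):
    """Global average pooling over time: T x C -> C."""
    n_frames = len(features)
    return [sum(frame[c] for frame in features) / n_frames
            for c in range(len(features[0]))]


def dense(vector, weights, biases):
    """Fully connected layer; weights[i] holds the input weights of unit i."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def excite(stats, w1, b1, w2, b2):
    """C -> alpha*C (ReLU) -> C (sigmoid): one weight in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(stats, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def se_block(features, w1, b1, w2, b2):
    """Rescale every channel of a T x C feature map by its excitation."""
    weights = excite(squeeze(features), w1, b1, w2, b2)
    scaled = [[x * g for x, g in zip(frame, weights)] for frame in features]
    return scaled, weights


def random_params(n_channels, alpha, seed=0):
    """Placeholder (untrained) parameters for the two FC layers."""
    rng = random.Random(seed)
    hidden = int(alpha * n_channels)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_channels)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_channels)]
    return w1, [0.0] * hidden, w2, [0.0] * n_channels


features = [[float(t + c) for c in range(4)] for t in range(5)]  # toy 5 x 4 map
scaled, excitations = se_block(features, *random_params(n_channels=4, alpha=2))
print(excitations)
```

With α greater than 1 the hidden layer is wider than the channel axis, in contrast to the reduction ratio 𝑟 of the original SENet, which narrows it.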
  • 26. Amplifying Ratio (alpha) Grid Search
    [Figure: AUC vs. amplifying ratio, with regions labeled Underfitting and Overfitting]
  • 27. ReSE-n block
    • Res-n & SE block
    [Figure: ReSE-n block (Conv1D, BatchNorm, relu, Dropout, Conv1D, BatchNorm, GlobalAvgPool, FC, relu, FC, sigmoid, Scale, relu, MaxPool; T×C → 1×C → 1×αC → 1×C → T×C)]
    • AUC on MagnaTagATune: Basic 0.9055, Res-2 0.9061, SE 0.9083, ReSE-1 0.9066, ReSE-2 0.9102
    • 1.8
  • 28. Multi-level Feature Aggregation
    • Uses the outputs of 3 layers
      ‣ Concatenate the 3 outputs
      ‣ Simple, but powerful
    • Motivation:
      ‣ Music tags differ in their level of abstraction
      ‣ Example:
        - “vocal”: low abstraction
        - “metal”: high abstraction
    • Global max pooling reduces the time dimension to 1
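Multi-level feature aggregation can be sketched as global max pooling followed by concatenation. The shapes below are toy values chosen for illustration, not the model's real sizes, and the function names are hypothetical.

```python
def global_max_pool(features):
    """T x C -> C: keep the maximum over time for every channel."""
    return [max(frame[c] for frame in features) for c in range(len(features[0]))]


def aggregate_levels(level_outputs):
    """Pool each level's T x C output to length C, then concatenate."""
    pooled = []
    for features in level_outputs:
        pooled.extend(global_max_pool(features))
    return pooled


# Three toy levels with different time lengths (9, 3, 1) and channel counts.
level_a = [[0.1 * t, 1.0 - 0.1 * t] for t in range(9)]
level_b = [[0.5, 0.2], [0.7, 0.1], [0.6, 0.9]]
level_c = [[0.3, 0.4, 0.5]]
print(aggregate_levels([level_a, level_b, level_c]))
```

Pooling each level down to one time step is what makes outputs with different time lengths concatenable into a single feature vector.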
  • 29. Comparison of Architectures
    Block     no multi-feature aggregation   multi-feature aggregation
    Basic     0.9055                         0.9077
    SE        0.9083                         0.9111
    Res-1     0.9048                         0.9037
    Res-2     0.9061                         0.9098
    ReSE-1    0.9066                         0.9053
    ReSE-2    0.9102                         0.9113
    x1.7 x1.08 SampleCNN
  • 30. Comparison with SoTA Ensemble of 3 models Best among single models!
  • 31.
  • 32. Interpretability of Deep Learning • • interpretability ‣ vision ‣ Audio • Weapons of Math Destruction ( ) — , “ ” • , ‣ ( ) ‣ “ ?” “ ?” ...
  • 33. SampleCNN Filter Visualization
    • channel frequency signal
    • frequency ( )
    • Layer filter (e.g. mel-scale)
      ‣ i.e. piano key
    • Layer Low-frequency
    [Figure: frequency (0~11 kHz) vs. sorted channel index]
  • 34. Filter Viz. Process
    • Channel back propagate Input
    • Example: layer channel
    1. Initialize input randomly (random noise)
    2. Backprop.
    3. STFT
    [Figure: backpropagation from a channel in the SampleCNN stack (59049 × 1 raw waveform, strided conv, 1D convolutional blocks) to the input; resulting spectrogram (Frequency vs. Time) for Channel 3 of Layer 1]
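The three steps (random input, gradient ascent via backpropagation, frequency analysis) can be sketched on a stand-in model. This sketch assumes a single linear 1D filter instead of a trained SampleCNN channel, maximizes the output energy under a unit-norm constraint, and uses a plain DFT of the whole input in place of an STFT. All names and hyperparameters are hypothetical.

```python
import cmath
import math
import random


def conv_valid(x, w):
    """Unpadded 1D convolution (correlation form) of input x with filter w."""
    k = len(w)
    return [sum(w[j] * x[t + j] for j in range(k)) for t in range(len(x) - k + 1)]


def energy_and_grad(x, w):
    """Energy sum(y^2) of y = conv(x, w) and its gradient w.r.t. the input x."""
    y = conv_valid(x, w)
    grad = [0.0] * len(x)
    for t, y_t in enumerate(y):
        for j, w_j in enumerate(w):
            grad[t + j] += 2.0 * y_t * w_j
    return sum(v * v for v in y), grad


def unit_norm(x):
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def maximize_activation(w, length=64, steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    x = unit_norm([rng.gauss(0.0, 1.0) for _ in range(length)])  # 1. random noise
    for _ in range(steps):                                       # 2. gradient ascent
        _, grad = energy_and_grad(x, w)
        x = unit_norm([xi + lr * gi for xi, gi in zip(x, grad)])
    return x


def dominant_bin(x):
    """3. Index of the strongest non-DC DFT bin of the optimized input."""
    n = len(x)
    mags = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    return max(range(1, len(mags)), key=lambda k: mags[k])


# Stand-in "learned" filter: a cosine with a period of 8 samples.
stand_in_filter = [math.cos(2 * math.pi * j / 8) for j in range(16)]
optimized = maximize_activation(stand_in_filter)
print("dominant DFT bin of optimized input:", dominant_bin(optimized))
```

The optimized input concentrates its energy near the frequency the filter responds to, which is the property the per-channel visualization relies on.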
  • 36. Excitation Visualization
    • SE block channel tag
    • Mid block general , last block discriminative signal processing
    • block tag excitation!
      ‣ Loudness
    [Figure: excitation vs. sorted channel index]
  • 37. Excitation Visualization
    • Excitation loudness tag excitation layer
    [Figure: standard deviations of excitations across tags, vs. sorted channel index]
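The statistic named in the figure, the standard deviation of excitations across tags, can be computed per channel as in the sketch below. The input layout (a mapping from tag to per-channel excitations) and the toy values are assumptions made for illustration, not data from the slides.

```python
from statistics import pstdev


def std_across_tags(excitation_by_tag):
    """Per-channel standard deviation of excitations across tags.

    excitation_by_tag maps a tag name to a list with one excitation per channel.
    """
    rows = list(excitation_by_tag.values())
    n_channels = len(rows[0])
    return [pstdev([row[c] for row in rows]) for c in range(n_channels)]


# Toy, made-up excitations for two tags and two channels.
toy = {"vocal": [0.5, 0.2], "metal": [0.5, 0.8]}
print(std_across_tags(toy))
```

A channel whose excitation barely changes between tags gets a value near 0, while a channel that responds differently to different tags gets a larger value.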
  • 38. Analysis of the First Excitation
    • tag 50 excitation
    [Figure: excitation vs. sorted channel index]
  • 39. Analysis of the First Excitation
    • audio segment
    • linear regression line
    • SE block loudness normalize
    • But ,
    [Figure: excitation vs. loudness. Panels: Average of 128 Channels, Most Positive Channel, Most Negative Channel, Most Neutral Channel, Least Regression Error Channel]
  • 40. Analysis of the First Excitation
    • linear regression
    • loudness normalize
      ‣ #negative = 109
      ‣ #positive = 19
    • Loudness excitation ?
    [Figure: excitation vs. loudness]
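The slope counting behind "#negative" and "#positive" can be sketched as an ordinary least-squares fit of excitation against loudness for each channel, followed by a sign count. The data below are made up for illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def count_slope_signs(loudness, excitations_per_channel):
    """Number of channels whose excitation falls / rises with loudness."""
    slopes = [fit_line(loudness, exc)[0] for exc in excitations_per_channel]
    negative = sum(1 for s in slopes if s < 0)
    positive = sum(1 for s in slopes if s > 0)
    return negative, positive


# Toy data: one loudness value per audio segment, two channels.
toy_loudness = [0.0, 1.0, 2.0, 3.0]
toy_channels = [
    [0.9, 0.7, 0.5, 0.3],  # excitation decreases as loudness grows
    [0.1, 0.2, 0.3, 0.4],  # excitation increases as loudness grows
]
print(count_slope_signs(toy_loudness, toy_channels))
```

A negative slope means the channel is scaled down more for louder segments, which is the behavior one would expect from a block that normalizes loudness.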
  • 41. Variation of Excitations Increases According to Loudness
    • audio segment excitation channel
    • segment ( )
    • Loudness excitation
  • 43. TensorFlow Speech Commands Dataset • 1 audio • ‣ : “yes”, “no”, “right”, “go” • SE block
  • 44. Average of 128 Channels
    [Figure: MagnaTagATune (music dataset) vs. TensorFlow Speech Commands (speech dataset)]
  • 45. Most Positive Channel
    [Figure: MagnaTagATune (music dataset) vs. TensorFlow Speech Commands (speech dataset)]
  • 46. Linear Regression Lines
    [Figure: MagnaTagATune (music dataset) vs. TensorFlow Speech Commands (speech dataset)]
    • #negative = 109, #positive = 19!
  • 47. Variation of Excitations Increases According to Loudness
    [Figure: MagnaTagATune (music dataset) vs. TensorFlow Speech Commands (speech dataset)]
  • 48.
    • audio filter visualization ?
    • SE block squeeze, excitation ?
    • ROC-AUC policy gradient directly optimize ?
    [Figure: image filter viz.]
  • 50. References
    • [Cover art] http://www.sqoop.co.ug/201805/four-one-one/nation-media-group-launches-music-record-label-lit-music.html