Dec. 2011 @ MTG, UPF




Dimensional Music Emotion Recognition
Yi-Hsuan Yang
Assistant Research Fellow
Music & Audio Computing (MAC) Lab
Research Center for IT Innovation
Academia Sinica

                                                    1
Music & Emotion
Music conveys emotion and modulates our mood
Music emotion recognition (MER)
  Understand how humans perceive/feel emotion when
  listening to music
  Develop systems for emotion-based music retrieval




                                                  2
Why Do We Listen to Music?
  Motive                                                          Ratio
  “to express, release, and influence emotions”                   47%
  “to relax and settle down”                                      33%
  “for enjoyment, fun, and pleasure”                              22%
  “as company and background sound”                               16%
  “because it makes me feel good”                                 13%
  “because it’s a basic need, I can’t live without it”            12%
  “because I like/love music”                                     11%
  “to get energized”                                              9%
  “to evoke memories”                                             4%

“Expression, Perception, and Induction of Musical Emotions: A Review and a
Questionnaire Study of Everyday Listening,” Patrik N. Juslin and Petri Laukka,
Journal of New Music Research, 2004
                                                                                 3
Categories of Emotion
Expressed (intended) emotion
  What a performer tries to express
Perceived emotion
  What a listener perceives as being expressed in music
  Usually the same as the expressed emotion
Felt (induced) emotion
  What a listener actually feels
  Strongly influenced by the context of music listening
  (environment, mood)


                                                          4
Emotion Description w/ Mood Labels




                                         5
Courtesy of Ching-Wei Chen @ Gracenote
Description w/ Latent Dimensions




                                   6
Categorical Approach
  Audio spectrum




                       Hevner’s model (1936)




                                      7
Dimensional Approach
 Audio spectrum




                       Emotion plane (Russell
                       1980, Thayer 1989)




                                        8
Categorical vs. Dimensional
              Pros                      Cons

Categorical   •   Intuitive             •   Lack a unifying model
              •   Natural language      •   Ambiguous
              •   Atomic description    •   Subjective
                                        •   Difficult to offer fine-grained
                                            differentiation


Dimensional   •   Focus on a few        •   Less intuitive
                  dimensions            •   Semantic loss in projection
              •   Good user interface   •   Difficult to obtain ground
                                            truth



                                                                      9
Q: No Consensus on Mood Taxonomy
Work                        #    Emotion description
Katayose et al [icpr98]     4    Gloomy, urbane, pathetic, serious
Feng et al [sigir03]        4    Happy, angry, fear, sad
Li et al [ismir03],         13   Happy, light, graceful, dreamy, longing, dark, sacred,
Wieczorkowska et al              dramatic, agitated, frustrated, mysterious, passionate,
[imtci04]                        bluesy
Wang et al [icsp04]         6    Joyous, robust, restless, lyrical, sober, gloomy
Tolos et al [ccnc05]        3    Happy, aggressive, melancholic+calm
Lu et al [taslp06]          4    Exuberant, anxious/frantic, depressed, content
Yang et al [mm06]           4    Happy, angry, sad, relaxed
Skowronek et al [ismir07]   12   Arousing, angry, calming, carefree, cheerful, emotional,
                                 loving, peaceful, powerful, sad, restless, tender
Wu et al [mmm08]            8    Happy, light, easy, touching, sad, sublime, grand, exciting
Hu et al [ismir08]          5    Passionate, cheerful, bittersweet, witty, aggressive
Trohidis et al [ismir08]    6    Surprised, happy, relaxed, quiet, sad, angry
                                                                                     10
Fuzzy Boundary b/w Mood Classes
 Subjective usage of affective terms
     Cheerful, happy, joyous, party/celebratory
     Melancholy, gloomy, sad, sorrowful
 Semantic overlap (Clusters 2 and 4) and acoustic overlap
 (Clusters 1 and 5) [mirex07.cyril&perfe]

MIREX AMC Taxonomy
Cluster 1 Passionate, rowdy, rousing, confident, boisterous
Cluster 2 Amiable/good-natured, sweet, fun, rollicking, cheerful
Cluster 3 Literate, wistful, bittersweet, autumnal, brooding, poignant
Cluster 4 Witty, humorous, whimsical, wry, campy, quirky, silly
Cluster 5 Aggressive, volatile, fiery, visceral, tense/anxious, intense
                                                                      11
Granularity of Emotion Description
 Small set of emotion classes
   Insufficient compared to the richness of our perception
 Large set of emotion classes
   Difficult to obtain reliable ground truth data

 Small set, e.g.:
       □ Happy   □ Sad   □ Angry   □ Relaxed

 Large set, e.g.:
       Acerbic, Aggressive, Ambitious, Amiable, Angry, Bittersweet, Bright,
       Brittle, Calm/, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical,
       Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric,
       Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun,
       Gentle, Gloomy, Greasy, Happy, …

                                                                     12
Sol: Describing Emotions in Emotion Space
  Arousal
     ○ Activation, activity
     ○ Energy and stimulation level
  Valence
     ○ Pleasantness
     ○ Positive and negative affective states

                              [psp80]                     13
The Dimensional Approach

Strength
  No need to consider which and how many emotions
  Generalize MER from categorical domain
  to real-valued domain
  Easy to compare different
  computational models
      Arousal
      Valence




                                                    14
The Dimensional Approach

Weakness
 Semantic loss due to projection
 Blurs important psychological distinctions
 3rd dimension: potency [psy07]
    Angry ↔ afraid
    Proud ↔ shameful
    Interested ↔ disappointed
 4th dimension: unpredictability
    Surprised
    Tense ↔ afraid
    Contempt ↔ disgust

                                              15
Music Retrieval in VA Space

 Provides a simple means for a 2D user interface
    Pick a point
    Draw a trajectory
 Useful for mobile devices with small display space

 [Figure: valence-arousal plane used as the retrieval interface; demo]
                                                16
Q: How to Predict Emotion Values?
Transformation-based approach [mm06]
  Consider the four quadrants
  Perform 4-class mood classification
  Apply the following transformation
    Arousal = u1 + u2 – u3 – u4
    Valence = u1 + u4 – u2 – u3
            (ui denotes the likelihood of class i, one class per quadrant)

  Not rigorous
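
The transformation above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the class numbering is an assumption read off the signs of the formulas (classes 1 and 2 raise arousal, classes 1 and 4 raise valence).

```python
def va_from_likelihoods(u1, u2, u3, u4):
    """Map 4-class likelihoods to (valence, arousal) using the slide's formulas.

    Assumed class layout (inferred from the signs, not stated in the slides):
    1 = +valence/+arousal, 2 = -valence/+arousal,
    3 = -valence/-arousal, 4 = +valence/-arousal.
    """
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
    return valence, arousal


# A classifier that is fairly sure about class 1 gives positive valence and arousal.
valence, arousal = va_from_likelihoods(0.7, 0.1, 0.1, 0.1)
```

Because the output is a fixed linear combination of class likelihoods, it can only be as fine-grained as the 4-class classifier behind it, which is one way to read the "not rigorous" remark.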




                                        17
Sol: Perform Regression

Given features, predict a numerical value

Given N inputs (xi, yi), 1 ≤ i ≤ N,
where xi is the feature and yi is the
numerical value to be predicted,
train a regression model f(·)
such that the following mean
squared error (MSE) is minimized:

   $$\min_{f} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2$$

[Figure: plot of y against x]

                                        18
Computational Framework [taslp08]
Predict the VA values
  Train a regression model f(·) that minimizes the mean
  squared error (MSE)
  One for valence; one for arousal

   $$\min_{f} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2$$

  yi : numerical emotion value
  xi : feature (input)
  f(xi) : prediction result (output)
  e.g. linear regression: $f(x_i) = w^T x_i + b = \sum_j w_j x_{ij} + b$

[Diagram: Training data → manual annotation → emotion value; training data → feature extraction → feature; both feed regressor training → regressor.
 Test data → feature extraction → feature → automatic prediction (using the regressor) → emotion value]
                                                                         19
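
A minimal sketch of this framework, assuming the linear form f(x) = w^T x + b and using plain batch gradient descent on the MSE as the solver. The slides do not say how the minimization is carried out, so the solver, the function names and the toy data are all illustrative assumptions.

```python
def fit_linear(X, y, lr=0.1, steps=2000):
    """Fit f(x) = w^T x + b by batch gradient descent on the MSE."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        # Residuals f(x_i) - y_i under the current parameters.
        res = [sum(wj * xj for wj, xj in zip(w, x)) + b - yi
               for x, yi in zip(X, y)]
        # Gradient of (1/N) * sum(res^2) with respect to w and b.
        grad_w = [2.0 / n * sum(r * x[j] for r, x in zip(res, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(res)
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy training set: 2-D features with made-up annotations (not real data).
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
valence = [0.1, 0.7, -0.3, 0.3]   # generated as 0.6*x1 - 0.4*x2 + 0.1
arousal = [-0.2, 0.1, 0.6, 0.9]   # generated as 0.3*x1 + 0.8*x2 - 0.2

w_v, b_v = fit_linear(X_train, valence)   # one regressor for valence
w_a, b_a = fit_linear(X_train, arousal)   # one regressor for arousal

# Automatic prediction for an unseen feature vector.
x_test = [0.5, 0.5]
va = (predict(x_test, w_v, b_v), predict(x_test, w_a, b_a))
```

Training the two regressors independently mirrors the "one for valence; one for arousal" bullet. Any other regressor could be dropped in for `fit_linear` without changing the rest of the pipeline.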
Obtain Music Emotion Rating
Manual annotation
  Rates the VA values of each song
    Ordinal rating scale
    Scroll bar


[Diagram: computational framework (manual annotation, feature extraction, regressor training, automatic prediction)]
                                                             20
Evaluation of Emotion Rating
User study
   1240 Chinese pop songs; each 30-sec
   666 subjects; each rates 8 random songs

Subjective evaluation
   Easiness of annotating emotion
   Within-subject reliability: compare to one month later
   Between-subject reliability: compare to other subjects

[Figure: rating scale from 0 to 100]

    Method           Easiness   Within-subject reliability   Between-subject reliability
    Emotion rating   2.82       2.92                         2.81
    Scores from 1 to 5 (strongly disagree to strongly agree)
                                                                     21
AnnoEmo: GUI for Emotion Rating [hcm07]
  Encourages differentiation
  Drag & drop to modify annotation
  Click to listen again
  Demo




                                                     22
Cognitive Load is Still High
Determining VA values is not that easy
Difficult to ensure consistency
  Does dist(0.5,0.8) = dist(–0.2,0.1) in terms of
  our emotion perception?
  Does 0.7 mean the same for two subjects?

[Figure: emotion plane with axes from –1 to 1 and the values 0.5, 0.8, 0.1 and –0.2 marked]
                                                               23
Sol: Ranking Instead of Rating [taslp11a]
 Determines the position of a song
    By the relative ranking with respect to other songs
    Rather than by the exact emotion values
 [Figure: songs ordered from positive valence (top, valence = 1) to negative valence (bottom, valence = –1); "relative ranking" labels the ordering and "exact rating" labels the numerical scale]
                 Oh Happy Day
                 I Want to Hold Your Hand by Beatles
                 I Feel Good by James Brown
                 What a Wonderful World by Louis Armstrong
                 Into the Woods by My Morning Jacket
                 The Christmas Song
                 C'est La Vie
                 Labita by Lisa One
                 Just the Way You Are by Billy Joel
                 Perfect Day by Lou Reed
                 When a Man Loves a Woman by Michael Bolton
                 Smells Like Teen Spirit by Nirvana


                                                                          24
Ranking-Based Emotion Annotation
 Emotion tournament
         Requires only n–1 pairwise comparisons
         The global ordering can later be approximated by a
         greedy algorithm [jair99]

 [Figure: tournament bracket over songs a–h and the resulting comparison matrix]
 Value shown for each song:  a 0, b 3, c 1, d 0, e 0, f 7, g 0, h 1

Which song is more positive?       f>b>c=h>a=d=e=g
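
The tournament idea can be sketched as follows. This is an illustration under stated assumptions, not the published method: the value per song is read here as the number of songs it beat directly or through songs it beat (a reading that reproduces the numbers on the slide), `prefer` is a hypothetical stand-in for the human annotator, and sorting by that value is a simple substitute for the greedy algorithm of [jair99], which is not reproduced.

```python
def tournament_scores(songs, prefer):
    """Single-elimination tournament over `songs`.

    prefer(a, b) returns True when a is judged more positive than b.
    Returns (scores, comparisons), where scores[s] counts the songs that s
    beat directly or transitively.
    """
    beaten = {s: set() for s in songs}
    comparisons = 0
    current = list(songs)
    while len(current) > 1:
        winners = []
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            winner, loser = (a, b) if prefer(a, b) else (b, a)
            comparisons += 1
            # The winner inherits everything the loser had already beaten.
            beaten[winner] |= {loser} | beaten[loser]
            winners.append(winner)
        if len(current) % 2:
            winners.append(current[-1])  # odd song out advances unopposed
        current = winners
    return {s: len(beaten[s]) for s in songs}, comparisons


# Hypothetical hidden valence values, used only to simulate the annotator.
hidden = {"a": 0.1, "b": 0.7, "c": 0.5, "d": 0.0,
          "e": -0.2, "f": 0.9, "g": -0.5, "h": 0.4}
scores, comparisons = tournament_scores(
    list("abcdefgh"), lambda x, y: hidden[x] > hidden[y])
ordering = sorted(scores, key=lambda s: -scores[s])
```

With these simulated preferences the eight songs need seven comparisons and the scores come out as a 0, b 3, c 1, d 0, e 0, f 7, g 0, h 1, which gives the partial order f > b > c = h > a = d = e = g.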
                                                                   25
Online Interface




                   26
Simplify Emotion Annotation
Subjective evaluation
  Both rate and rank
  The ordering of rate and rank does not matter
Result
  [Figure: result chart with a scale from Weak to Strong]




                                                  27
Q: Which Features are Relevant? [psy07]



  [Figure: illustrations of sound intensity, tempo, rhythm, pitch range, mode (major), and consonance]


                                          28
Feature Extraction
Melody/harmony [MIR toolbox]
   Pitch estimate, key clarity, harmonic change, musical mode
Spectral [Marsyas]
   Spectral flatness measures, spectral crest factors, MFCCs
Temporal [Sound description toolbox]
   Zero-crossing rate, temporal centroid, log-attack time
Rhythmic [Rhythm pattern extractor]
   Beat histogram and average tempo
Psychoacoustically motivated features [PsySound]
   Loudness, sharpness, timbral width, volume,
   spectral dissonance, tonal dissonance, pure tonal,
   complex tonal, multiplicity, tonality, chord
                                                            29
Data Collection







                      30
Q: Subjective Issue




Each circle
represents the
emotion annotation
for a music piece
by a subject
                                           31
Sol: Probabilistic MER [taslp11b]
Predicts the probability distribution P(e|d) of the
perceived emotions of a music piece




                                                   32
Sol: Personalized MER [sigir09]
From P(e|d) to P(e|d,u)
  General regressor → personal regressor
  Utilize user feedback

[Diagram: training and prediction pipeline (manual annotation, feature extraction, regressor training, automatic prediction), extended with a personalization step driven by user feedback from emotion-based retrieval]

                                                                      33
Evaluation Setup
Training data
  195 Western/Japanese/Chinese pop songs
  25-sec segment that is representative of the song
     Too long → the emotion may not be homogeneous
     Too short → the listener may not hear enough
Manual annotation
  253 subjects; each rates 12 songs
  Rate the VA values in 11 ordinal levels
   ○ 0 ○ 1 ○ 2 ○ 3 ○ 4 ○ 5 ○ 6 ○ 7 ○ 8 ○ 9 ○ 10

  Each song is annotated by 10+ subjects
  Ground truth obtained by averaging
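
The averaging step can be sketched as below. The function name and input layout are hypothetical, and rescaling the mean level from 0..10 to the range [–1, 1] is an assumption added for illustration; the slides only state that the ground truth is obtained by averaging.

```python
from collections import defaultdict


def ground_truth(ratings, levels=11):
    """Average per-song ratings given as (song_id, valence_level, arousal_level).

    Levels are integers in 0..levels-1. The mean level is rescaled linearly
    to [-1, 1]; that rescaling is an assumption, not taken from the slides.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for song, v, a in ratings:
        entry = sums[song]
        entry[0] += v
        entry[1] += a
        entry[2] += 1
    top = levels - 1
    return {song: (2 * sv / (n * top) - 1, 2 * sa / (n * top) - 1)
            for song, (sv, sa, n) in sums.items()}


# Two subjects rate song "s1", one subject rates song "s2" (made-up numbers).
truth = ground_truth([("s1", 10, 0), ("s1", 8, 2), ("s2", 5, 5)])
```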
                                                        34
Quantitative Result
Method                                      R² of valence  R² of arousal
Multiple linear regression                     0.109          0.568
AdaBoost.RT [ijcnn04]                          0.117          0.553
SVR (support vector regression) [sc04]         0.222          0.570
SVR + RReliefF (feature selection) [ml03]      0.254          0.609

 Result
     R²: squared correlation between y and f(x)
     Valence prediction is challenging
        Valence: 0.25 ~ 0.35
        Arousal: 0.60 ~ 0.85
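
For reference, the metric as defined on this slide (squared correlation between y and f(x)) can be computed as in the sketch below; the function name is hypothetical. Note that this squared-correlation definition generally differs from the coefficient of determination computed as 1 – SSE/SST, so the two should not be mixed when comparing numbers.

```python
def r_squared(y, y_pred):
    """Squared Pearson correlation between ground truth and predictions."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_pred))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov * cov / (var_y * var_p)


# Predictions that are a perfect linear function of y give a value of 1.
perfect = r_squared([0.1, 0.4, 0.7], [0.2, 0.8, 1.4])
# Partially matching predictions give a value between 0 and 1.
partial = r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```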

                                                                      35
Qualitative Result
[Figure: example songs placed on the emotion plane]
  No No No Part 2 - Beyonce
  Out Ta Get Me - Guns N' Roses
  You're Crazy - Guns N' Roses
  All Of Me - 50 Cent
  Bodies - Sex Pistols
  New York Giants - Big Pun
  I've Got To See You Again - Norah Jones
  Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson
  If Only In The Heaven's Eyes - NSYNC
  Live For The One I Love - Celine Dion
  The Last Resort - The Eagles
  Why Do I Have To Choose - Willie Nelson
                                                                          36
Missing 1: Temporal Context of Music
  “Sweet anticipation” by David Huron
    Music’s most expressive qualities probably relate to
    structural changes across time
  Music emotion can also vary within an excerpt [tsmc06]




                                                           37
Missing 2: Context of Music Listening




  Listening mood/context
  Familiarity/associated memory
  Preference of the singer/performer/song
  Social relationship
                                            38
Conclusion
A computational framework for predicting numerical
emotion values
  Generalizes MER from categorical to dimensional
  Resolves some issues of emotion description
  Rank instead of rate
  2D user interface for music retrieval
Valence & subjectivity
Content & context

Acknowledgement
  Prof. Homer Chen, National Taiwan University
                                                    39
Reference
Music Emotion Recognition, CRC Press, 2011
“A regression approach to music emotion recognition,” IEEE TASLP, 2008. (cited by 76)
“Ranking-based emotion recognition for music organization and retrieval,” IEEE TASLP, 2011
“Prediction of the distribution of perceived music emotions using discrete samples,” IEEE TASLP, 2011
“Exploiting online tags for music emotion classification,” ACM TOMCCAP, 2011
“Machine recognition of music emotion: A review,” ACM TIST, 2012
                                                        40

Dimensional Music Emotion Recognition

  • 1. Dec. 2011 @ MTG, UPF Dimensional Music Emotion Recognition Yi-Hsuan Yang Assistant Research Fellow Music & Audio Computing (MAC) Lab Research Center for IT Innovation Academia Sinica 1
  • 2. Music & Emotion Music conveys emotion and modulates our mood Music emotion recognition (MER) Understand how human perceives/feels emotion when listening to music Develop systems for emotion-based music retrieval 2
  • 3. Why Do We Listen to Music? Motive Ratio “to express, release, and influence emotions” 47% “to relax and settle down” 33% “for enjoyment, fun, and pleasure” 22% “as company and background sound” 16% “because it makes me feel good” 13% “because it’s a basic need, I can’t live without it” 12% “because I like/love music” 11% “to get energized” 9% “to evoke memories” 4% “Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening,” Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004 3
  • 4. Categories of Emotion Expressed (intended) emotion What a performer tries to express Perceived emotion What a listener perceives as being expressed in music Usually the same as the expressed emotion Felt (induced) emotion What a listener actually feels Strongly influenced by the context of music listening (environment, mood) 4
  • 5. Emotion Description w/ Mood Labels 5 Courtesy of Ching-Wei Chen @ Gracenote
  • 6. Description w/ Latent Dimensions 6
  • 7. Categorical Approach Audio spectrum Hevner’ model (1936) 7
  • 8. Dimensional Approach Audio spectrum Emotion plane (Russell 1980, Thayer 1989) 8
  • 9. Categorical vs. Dimensional Pros Cons Categorical • Intuitive • Lack a unifying model • Natural language • Ambiguous • Atomic description • Subjective • Difficult to offer fine-grained differentiation Dimensional • Focus on a few • Less intuitive dimensions • Semantic loss in projection • Good user interface • Difficult to obtain ground truth 9
  • 10. Q: No Consensus on Mood Taxonomy Work # Emotion description Katayose et al [icpr98] 4 Gloomy, urbane, pathetic, serious Feng et al [sigir03] 4 Happy, angry, fear, sad Li et al [ismir03], Happy, light, graceful, dreamy, longing, dark, sacred, Wieczorkowska 13 dramatic, agitated, frustrated, mysterious, passionate, et al [imtci04] bluesy Wang et al [icsp04] 6 Joyous, robust, restless, lyrical, sober, gloomy Tolos et al [ccnc05] 3 Happy, aggressive, melancholic+calm Lu et al [taslp06] 4 Exuberant, anxious/frantic, depressed, content Yang et al [mm06] 4 Happy, angry, sad, relaxed Skowronek et al Arousing, angry, calming, carefree, cheerful, emo- 12 [ismir07] tional, loving, peaceful, powerful, sad, restless, tender Happy, light, easy, touching, sad, sublime, Wu et al [mmm08] 8 grand, exciting Hu et al [ismir08] 5 Passionate, cheerful, bittersweet, witty, aggressive Trohidis et al [ismir08] 6 Surprised, happy, relaxed, quiet, sad, angry 10
  • 11. Fuzzy Boundary b/w Mood Classes Subjective usage of affective terms Cheerful, happy, joyous, party/celebratory Melancholy, gloomy, sad, sorrowful Semantic overlap (#2 and #4) and acoustic overlap (#1 and #5) [mirex07.cyril&perfe] MIREX AMC Taxonomy Cluster 1 Passionate, rowdy, rousing, confident, boisterous Cluster 2 Amiable/good-natured, sweet, fun, rollicking, cheerful Cluster 3 Literate, wistful, bittersweet, autumnal, brooding, poignant Cluster 4 Witty, humorous, whimsical, wry, campy, quirky, silly Cluster 5 Aggressive, volatile, fiery, visceral, tense/anxious, intense 11
  • 12. Granularity of Emotion Description Small set of emotion classes Insufficient comparing to the richness of our perception Large set of emotion classes Difficult to obtain reliable ground truth data □ Happy □ Sad Acerbic, Aggressive, Ambitious, □ Angry Amiable, Angry, Bittersweet, Bright, □ Relaxed Brittle, Calm/, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical, Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric, Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun, Gentle, Gloomy, Greasy, Happy, … 12
  • 13. Sol: Describing Emotions in Emotion Space ○ Activation, activity Arousal ○ Energy and stimulation level Valence ○ Pleasantness ○ Positive and negative affective states [psp80] 13
  • 14. The Dimensional Approach Strength No need to consider which and how many emotions Generalize MER from categorical domain to real-valued domain Easy to compare different computational models Arousal Valence 14
  • 15. The Dimensional Approach Weakness Semantic loss due to projection Blurs important psychological distinctions 3rd dimension: potency [psy07] Angry ↔ afraid Proud ↔ shameful Interested ↔ disappointed 4th dimension: unpredictability Surprised Tense ↔ afraid Contempt ↔ disgust 15
  • 16. Music Retrieval in VA Space arousal Provide a simple means for 2D user interface Pick a point valence Draw a trajectory Useful for mobile devices with small display space Demo 16
  • 17. Q: How to Predict Emotion Values? Transformation-based approach [mm06] Consider the four quadrants Perform 4-class mood classification Apply the following transformation Arousal = u1 + u2 – u3 – u4 Valence = u1 + u4 – u2 – u3 (u denotes likelihood) Not rigorous 17
  • 18. Sol: Perform Regression Given features, y predict a numerical value Given N inputs (xi, yi), 1≤ i ≤N, where xi is feature and yi is the numerical value to be predicted, train a regression model R(.) such that the following mean squared error (MSE) is minimized 1 N min ∑ (yi − f (xi ))2 x f N i =1 18
  • 19. Computational Framework [taslp08] Predict the VA values 1 N Trains a regression model min ∑ (yi − f (xi ))2 f N f (·) that minimizes the mean i =1 squared error (MSE) yi : numerical emotion value One for valence; xi : feature (input) one for arousal f(xi) : prediction result (output) e.g. linear regression f(xi) = wTxi +b Emotion Manual value = sumj {wjxij} +b annotation Training Regressor data training Feature extraction Feature Regressor Test Feature Feature Automatic Emotion data extraction Prediction value 19
  • 20. Obtaining Music Emotion Ratings. Manual annotation: subjects rate the VA values of each song, using an ordinal rating scale or a scroll bar. (Figure: computational framework diagram, repeated.)
  • 21. Evaluation of Emotion Rating. User study: 1240 Chinese pop songs, each 30 sec; 666 subjects, each rating 8 random songs. Subjective evaluation: easiness of annotating emotion; within-subject reliability (compared to one month later); between-subject reliability (compared to other subjects). Result for the emotion rating method, on a scale from 1 to 5 (strongly disagree to strongly agree): easiness 2.82; within-subject reliability 2.92; between-subject reliability 2.81.
  • 22. AnnoEmo: GUI for Emotion Rating [hcm07]. Encourages differentiation. Drag & drop to modify an annotation; click to listen again. (Demo)
  • 23. Cognitive Load is Still High. Determining VA values is not that easy, and it is difficult to ensure consistency. Does dist(0.5, 0.8) = dist(–0.2, 0.1) in terms of our emotion perception? Does 0.7 mean the same for two subjects? (Figure: rating scale from -1 to 1 with the example values marked.)
  • 24. Sol: Ranking Instead of Rating [taslp11a]. Determines the position of a song by its relative ranking with respect to other songs, rather than by its exact emotion values. (Figure: relative ranking versus exact rating, with songs ordered from positive valence (valence = 1) to negative valence (valence = –1): Oh Happy Day; I Want to Hold Your Hand by Beatles; I Feel Good by James Brown; What a Wonderful World by Louis Armstrong; Into the Woods by My Morning Jacket; The Christmas Song; C'est La Vie Labita by Lisa One; Just the Way You Are by Billy Joel; Perfect Day by Lou Reed; When a Man Loves a Woman by Michael Bolton; Smells Like Teen Spirit by Nirvana.)
  • 25. Ranking-Based Emotion Annotation. Emotion tournament: requires only n–1 pairwise comparisons, each asking "Which song is more positive?" The global ordering can later be approximated by a greedy algorithm [jair99]. Example with eight songs a to h and tallies a = 0, b = 3, c = 1, d = 0, e = 0, f = 7, g = 0, h = 1, giving the ordering f > b > c = h > a = d = e = g.
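One way to reproduce the tallies in the example is a single-elimination tournament in which each winner inherits the loser's tally plus one, so a tally counts how many songs a song is known to outrank. That reading is inferred from the numbers on the slide, not stated there. The sketch below also replaces the greedy ordering algorithm of [jair99] with a plain sort by tally, and simulates the annotator with made-up hidden valence values; all names are hypothetical.

```python
def run_tournament(songs, prefers):
    """Single-elimination tournament over `songs`.

    `prefers(a, b)` returns True if the annotator finds `a` more positive.
    Returns (tally, number_of_comparisons).
    """
    tally = {s: 0 for s in songs}
    comparisons = 0
    current = list(songs)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            winner, loser = (a, b) if prefers(a, b) else (b, a)
            comparisons += 1
            # The winner outranks the loser and everything the loser outranked.
            tally[winner] += tally[loser] + 1
            next_round.append(winner)
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # odd song out gets a bye
        current = next_round
    return tally, comparisons


def ordering_from_tally(tally):
    """Sort by tally, highest first; ties are joined with '='."""
    groups = {}
    for song, score in tally.items():
        groups.setdefault(score, []).append(song)
    return " > ".join(" = ".join(sorted(groups[s]))
                      for s in sorted(groups, reverse=True))


# Simulated annotator: hidden valence values are invented for this example.
hidden_valence = {"a": 0.1, "b": 0.7, "c": 0.5, "d": 0.0,
                  "e": -0.2, "f": 0.9, "g": -0.5, "h": 0.4}
tally, n_comparisons = run_tournament(
    list("abcdefgh"), lambda x, y: hidden_valence[x] > hidden_valence[y])

print(tally)
print(n_comparisons)
print(ordering_from_tally(tally))
```

With eight songs the tournament uses 4 + 2 + 1 = 7 comparisons, which is n–1, and songs eliminated in the same round end up tied, which is why the resulting order is only partial.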
  • 27. Simplify Emotion Annotation. Subjective evaluation: subjects both rate and rank; the order in which rating and ranking are done does not matter. (Figure: results, labeled from Strong to Weak.)
  • 28. Q: Which Features are Relevant? [psy07] Sound intensity; tempo; rhythm; pitch range; mode (major); consonance.
  • 29. Feature Extraction. Melody/harmony [MIR toolbox]: pitch estimate, key clarity, harmonic change, musical mode. Spectral [Marsyas]: spectral flatness measures, spectral crest factors, MFCCs. Temporal [Sound description toolbox]: zero-crossing rate, temporal centroid, log-attack time. Rhythmic [Rhythm pattern extractor]: beat histogram and average tempo. Psychoacoustically motivated features [PsySound]: loudness, sharpness, timbral width, volume, spectral dissonance, tonal dissonance, pure tonal, complex tonal, multiplicity, tonality, chord.
  • 31. Q: Subjective Issue. (Figure: annotations in the VA plane; each circle represents the emotion annotation for a music piece by a subject.)
  • 32. Sol: Probabilistic MER [taslp11b]. Predicts the probability distribution P(e|d) of the perceived emotions of a music piece.
  • 33. Sol: Personalized MER [sigir09]. From P(e|d) to P(e|d,u): from a general regressor to a personal regressor, utilizing user feedback. (Figure: computational framework diagram extended with a personalization step, fed by user feedback from emotion-based retrieval.)
  • 34. Evaluation Setup. Training data: 195 Western/Japanese/Chinese pop songs, each a 25-sec segment that is representative of the song (too long, and the emotion may not be homogeneous; too short, and the listener may not hear enough). Manual annotation: 253 subjects, each rating 12 songs; VA values are rated on 11 ordinal levels (0 to 10); each song is annotated by 10+ subjects; ground truth is obtained by averaging.
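The averaging step can be sketched as follows. The data layout (a flat list of `(song_id, valence, arousal)` tuples on the 0 to 10 scale), the function name, and the example ratings are assumptions for illustration; the slide only states that the ground truth is the average over subjects.

```python
from collections import defaultdict
from statistics import mean


def ground_truth(annotations):
    """Average the per-subject (valence, arousal) ratings of each song."""
    per_song = defaultdict(list)
    for song_id, valence, arousal in annotations:
        per_song[song_id].append((valence, arousal))
    return {
        song: (mean(v for v, _ in ratings), mean(a for _, a in ratings))
        for song, ratings in per_song.items()
    }


# Made-up ratings; the study itself collected 10+ ratings per song.
annotations = [
    ("song_a", 7, 9), ("song_a", 9, 8), ("song_a", 8, 10),
    ("song_b", 2, 3), ("song_b", 4, 1),
]
truth = ground_truth(annotations)
print(truth)
```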
  • 35. Quantitative Result. R² of valence and R² of arousal by method: multiple linear regression, 0.109 and 0.568; AdaBoost.RT [ijcnn04], 0.117 and 0.553; SVR (support vector regression) [sc04], 0.222 and 0.570; SVR + RReliefF (feature selection) [ml03], 0.254 and 0.609. R² is the squared correlation between y and f(x). Result: valence prediction is challenging (valence: 0.25 ~ 0.35; arousal: 0.60 ~ 0.85).
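The R² statistic as defined on this slide, the squared correlation between y and f(x), can be computed as in the sketch below. It assumes Pearson correlation; note that this quantity can differ from the coefficient of determination 1 - SSE/SST that some libraries report under the same name. The function name and the example values are illustrative only.

```python
def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


# Made-up annotated values and predictions for four songs.
y = [1.0, 2.0, 3.0, 4.0]
f_x = [1.0, 3.0, 2.0, 4.0]
print(round(squared_correlation(y, f_x), 2))
```

The function is undefined when either input is constant (zero variance), so a real evaluation script would need to guard against that case.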
  • 36. Qualitative Result. (Figure: example songs in the VA plane: No No No Part 2 - Beyonce; Out Ta Get Me - Guns N' Roses; You're Crazy - Guns N' Roses; All Of Me - 50 Cent; Bodies - Sex Pistols; New York Giants - Big Pun; I've Got To See You Again - Norah Jones; Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson; If Only In The Heaven's Eyes - NSYNC; Live For The One I Love - Celine Dion; The Last Resort - The Eagles; Why Do I Have To Choose - Willie Nelson.)
  • 37. Missing 1: Temporal Context of Music. "Sweet Anticipation" by David Huron: music's most expressive qualities probably relate to structural changes across time. Music emotion can also vary within an excerpt [tsmc06].
  • 38. Missing 2: Context of Music Listening. Listening mood/context; familiarity/associated memory; preference for the singer/performer/song; social relationship.
  • 39. Conclusion. A computational framework for predicting numerical emotion values: generalizes MER from categorical to dimensional; resolves some issues of emotion description; rank instead of rate; 2D user interface for music retrieval; valence & subjectivity; content & context. Acknowledgement: Prof. Homer Chen, National Taiwan University.
  • 40. References. Music Emotion Recognition, CRC Press, 2011; "A regression approach to music emotion recognition," IEEE TASLP, 2008 (cited by 76); "Ranking-based emotion recognition for music organization and retrieval," IEEE TASLP, 2011; "Prediction of the distribution of perceived music emotions using discrete samples," IEEE TASLP, 2011; "Exploiting online tags for music emotion classification," ACM TOMCCAP, 2011; "Machine recognition of music emotion: A review," ACM TIST, 2012.