Emotional Speech Synthesis
State of the art 2009
Felix Burkhardt
19.05.2009

outline



 how to model and why simulate emotions?
 emotions in speech
 introduction to speech synthesis approaches
 examples, examples, examples
 conclusion and outlook




                          Emotional Speech Synthesis - Felix Burkhardt, 19.05.2009
contents



how to model and why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook




emotion models

…everyone except a psychologist knows what an emotion is (Young 1973)

categories, e.g. anger, joy, …
dimensions, e.g. activation, dominance, valence
appraisals, e.g. novelty, intrinsic pleasantness, relevance, coping potential

[figure: emotion cube with the axes arousal, dominance and valence; categories placed in the cube: anger, joy, despair, neutral, content, boredom, sadness]

source: Burkhardt 2001
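
A minimal sketch of how the categorical and the dimensional view can be connected: each category is treated as a point in an arousal / valence / dominance space, and an arbitrary point is mapped to its nearest category. The coordinates are illustrative assumptions and not values from Burkhardt 2001; the names `CATEGORY_POINTS` and `nearest_category` are hypothetical.

```python
# Hypothetical coordinates (arousal, valence, dominance), each in [-1, 1].
# They only illustrate the idea of an "emotion cube"; they are not measured data.
CATEGORY_POINTS = {
    "anger":   (0.8, -0.6, 0.7),
    "joy":     (0.7, 0.8, 0.5),
    "sadness": (-0.6, -0.7, -0.6),
    "boredom": (-0.8, -0.3, -0.2),
    "neutral": (0.0, 0.0, 0.0),
}


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_category(point):
    """Return the category whose assumed position is closest to `point`."""
    return min(CATEGORY_POINTS,
               key=lambda name: squared_distance(CATEGORY_POINTS[name], point))


if __name__ == "__main__":
    print(nearest_category((0.9, -0.5, 0.6)))
```

The lookup direction can also be reversed: a synthesizer that works on dimensions could read `CATEGORY_POINTS[name]` to turn a category label into a point.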


why model emotional behaviour?


aspects of emotion modeling in human-machine interaction:




                                                           source: Batliner et al 2006


applications of emotional tts


           (listed along the slide's time axis)
           fun, e.g. emotional greetings
           prosthesis
           emotional chat avatars
           gaming, believable characters
           adapted dialog design
           adapted persona design
           target-group specific advertising
           …
           believable agents
           …
           artificial humans




aspects of emotional tts




contents



why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook




speech features




           descriptive layers of speech

                                               source: Reynolds et al 2003


emotion in speech



[figure: spectrograms from emotional acted speech; panels: neutral, angry, happy, bored, frightened, sad]

                                                   source: TUB emotional database


emotional data?


actors vs. reality
Berlin EmoDB: 10 actors x 7 emotions x 10 sentences
alternatives
   induced data, e.g. Aibo
   television, radio data




                                                             EmoDB: Burkhardt et al 2005
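
The design "10 actors x 7 emotions x 10 sentences" can be spelled out as a full grid, which gives the nominal number of recordings. This is only a sketch: the actor and sentence identifiers are hypothetical placeholders, and the emotion labels are the ones commonly listed for EmoDB, not taken from the slide.

```python
from itertools import product

# Placeholder identifiers; the real database uses its own speaker and sentence codes.
ACTORS = [f"actor{i:02d}" for i in range(1, 11)]
SENTENCES = [f"sentence{i:02d}" for i in range(1, 11)]
# Assumed label set (not given on the slide).
EMOTIONS = ["anger", "boredom", "disgust", "fear", "happiness", "sadness", "neutral"]


def design_grid():
    """All (actor, emotion, sentence) combinations of the recording design."""
    return list(product(ACTORS, EMOTIONS, SENTENCES))


if __name__ == "__main__":
    print(len(design_grid()))  # 10 * 7 * 10 combinations
```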



how to describe emotion?


  EmotionML, incubator group at W3C
  Example, embedded in SSML:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
       Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
   <category name="sadness" set="basic" intensity="0.6"/>
   <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
   <category name="anger" set="basic" intensity="0.4"/>
   <timing start="50%" end="100%"/>
  </emotion>
</speak>
source: http://www.w3.org/2005/Incubator/emotion/
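
A hedged sketch of how a consumer might read such an annotation: the snippet parses a copy of the slide's example, written with plain double quotes, using Python's standard `xml.etree.ElementTree`, and lists each emotion with its intensity and time span. The function name `extract_emotions` is hypothetical, and the element layout follows the 2009 incubator draft shown on the slide, which may differ from later EmotionML versions.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
       Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
    <category name="sadness" set="basic" intensity="0.6"/>
    <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
    <category name="anger" set="basic" intensity="0.4"/>
    <timing start="50%" end="100%"/>
  </emotion>
</speak>"""


def extract_emotions(xml_text):
    """Return one dict per <emotion> element found directly under <speak>."""
    root = ET.fromstring(xml_text)
    # The default namespace on <speak> applies to all child elements.
    ns = {"s": SSML_NS}
    emotions = []
    for emotion in root.findall("s:emotion", ns):
        category = emotion.find("s:category", ns)
        timing = emotion.find("s:timing", ns)
        emotions.append({
            "name": category.get("name"),
            "intensity": float(category.get("intensity")),
            "start": timing.get("start"),
            "end": timing.get("end"),
        })
    return emotions


if __name__ == "__main__":
    for entry in extract_emotions(DOC):
        print(entry)
```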



loquendo tts director




                                              source: Loquendo


contents



why simulate emotions?
emotions in speech
introduction to speech synthesis approaches
examples, examples, examples
conclusion, outlook




speech synthesis taxonomy



                                 speech synthesis systems




  voice response systems   re(copy)-synthesis, voice transformation         arbitrary speech synthesizers




     voice conversion            text-to-speech                 concept-to-speech
                                 (unknown input)                (input from text-generation system)




tts process chain



    NLP (natural language processing)
       preprocessing
       morpho-syntactic analysis
       transcription
       prosody modeling

    passed from NLP to DSP: phonetic transcription, prosody track

    DSP (digital speech processing)
       unit concatenation / search
       prosody fitting
       edge smoothing




synthesis approaches

[diagram, flattened here into the groups labelled on the slide]

signal modeling
   rule based: expert systems, formant synthesis
   statistical model generated: HMM hidden markov models, ANN neural nets
   data based: non-uniform unit selection, concatenative synthesis
pseudo articulatory
system modeling: articulatory synthesis, vocal tract shape synthesis

coding of units
   parametric coded: LPC linear predictive coding, MFCC mel frequency cepstral, MBR multi band resynthesis, formants
   waveform coded: PCM, LDM (linear delta mod.)
   hybrid approaches: MBRPSOLA, RELP
type of units: syllables, diphones, allophones, subsegments


historic development

[chart: synthesis approaches over time, moving from flexible, artificial sounding and domain independent (historic) to not flexible, natural sounding and domain dependent (modern)]

    1780   articulatory (von Kempelen)
    1980   formant synthesis, e.g. DEC Talk
    1990   PSOLA based synthesis, e.g. Elan
    2000   non-uniform unit selection, e.g. RealSpeak



system modeling




source filter model




                                            source: Klatt80 formant synthesizer (Klatt 1980)
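
A minimal sketch of the source filter idea, assuming a plain impulse train as the voice source and a cascade of second-order digital resonators as the vocal tract filter. The resonator difference equation follows the form given in Klatt 1980; the full Klatt80 synthesizer has many more components (noise source, parallel branch, anti-resonators) that are left out here. The formant frequencies and bandwidths are assumed values for an /a/-like vowel, and the function names are hypothetical.

```python
import math


def resonator(signal, freq_hz, bw_hz, fs):
    """Second-order resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, fs, n_samples):
    """Voice source: one unit impulse per (rounded) pitch period."""
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def synthesize_vowel(f0_hz=120.0, fs=16000, dur_s=0.1,
                     formants=((700.0, 60.0), (1200.0, 90.0), (2600.0, 120.0))):
    """Filter the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, fs, int(round(fs * dur_s)))
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, fs)
    return signal


if __name__ == "__main__":
    samples = synthesize_vowel()
    print(len(samples), max(abs(s) for s in samples))
```

Because source and filter are separate, emotional variation can be simulated on either side, for example by changing `f0_hz` over time or by shifting the formant values.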


contents



why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook




examples: emofilt


open source Java program based on the MBROLA synthesis engine
NOT a complete text-to-speech system
prosody filter between the natural language and the digital speech signal processing modules
as multilingual as MBROLA, which currently supports 35 languages
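
A sketch of what a prosody filter between the two modules could look like, written in Python for brevity rather than in Java. It works on MBROLA's .pho input format, where each line holds a phoneme, a duration in milliseconds and optional pairs of position (percent of the duration) and pitch (Hz). The scaling rules in `RULES` and the sample phoneme sequence are hypothetical; they are not emofilt's actual rules.

```python
def parse_pho(text):
    """Parse .pho lines into (phoneme, duration_ms, [(position_percent, pitch_hz), ...])."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blank lines and comments
            continue
        parts = line.split()
        values = [float(v) for v in parts[2:]]
        targets = list(zip(values[0::2], values[1::2]))
        segments.append((parts[0], float(parts[1]), targets))
    return segments


def apply_rule(segments, pitch_factor, duration_factor):
    """Scale every pitch target and every duration by a constant factor."""
    return [(phoneme, duration * duration_factor,
             [(pos, pitch * pitch_factor) for pos, pitch in targets])
            for phoneme, duration, targets in segments]


def format_pho(segments):
    """Write segments back as .pho text with integer values."""
    lines = []
    for phoneme, duration, targets in segments:
        fields = [phoneme, str(round(duration))]
        for pos, pitch in targets:
            fields += [str(round(pos)), str(round(pitch))]
        lines.append(" ".join(fields))
    return "\n".join(lines)


# Hypothetical rules: (pitch factor, duration factor) per emotion.
RULES = {"sad": (0.9, 1.2), "angry": (1.2, 0.9)}

# Hypothetical neutral input as an NLP front end might produce it.
NEUTRAL = """_ 100
h 60
a 120 0 110 100 120
l 70
o 150 50 100
_ 100"""

if __name__ == "__main__":
    print(format_pho(apply_rule(parse_pho(NEUTRAL), *RULES["sad"])))
```

The output stays a valid .pho description, so it can be handed to the synthesis engine unchanged; that is what makes the filter independent of the language of the voice.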




examples: emoSpeak


emoSpeak is integrated into the MARY text-to-speech framework by DFKI.
In his Ph.D. thesis, Marc Schröder investigated how to assign rule-based modifications of speech to emotional dimensions.
The system can be freely downloaded.

                                                         source: Schröder 2004
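
One way to picture rules that are tied to dimensions rather than categories is a linear mapping from a point in the emotion space to relative changes of prosodic parameters. The sketch below assumes such a linear form; the coefficients, the parameter names and the function `prosody_changes` are hypothetical placeholders and not the values or interfaces of emoSpeak or MARY.

```python
# Hypothetical coefficients: change in percent per unit of each dimension,
# with every dimension ranging from -1 to 1. For illustration only.
RULES = {
    "pitch":       {"activation": 30.0, "valence": 10.0, "dominance": -10.0},
    "pitch_range": {"activation": 40.0, "valence": 0.0, "dominance": 0.0},
    "speech_rate": {"activation": 20.0, "valence": 5.0, "dominance": 0.0},
}


def prosody_changes(activation, valence, dominance):
    """Return the relative change (in percent) of each prosodic parameter."""
    point = {"activation": activation, "valence": valence, "dominance": dominance}
    return {
        parameter: sum(coeff * point[dim] for dim, coeff in coeffs.items())
        for parameter, coeffs in RULES.items()
    }


if __name__ == "__main__":
    # A highly activated, negative, dominant state.
    print(prosody_changes(activation=0.8, valence=-0.6, dominance=0.7))
```

With a mapping of this kind, intermediate states need no rules of their own, since any point in the space yields a set of prosody settings.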


examples voice conversion

 Murtaza Bulut et al, USC
    PSOLA - LPC conversion
    samples: neutral, angry

 Greg Beller, IRCAM
    Phase vocoder
    samples: neutral, sad




examples voice transformation

 Olivier Rosec, FranceTelecom 2009
    Mixed LF + harmonic model
    samples: woman, as boy, as man, man, breathy, whispery, tense

 Shiva Sundaram, USC 2007
    Laughter synthesis by LPC synthesis and mass-spring model




examples formant synthesis

 AffectEditor, J. Cahn, MIT 1998
    DEC Talk prosody rules
    samples: sad, angry

 EmoSyn, Burkhardt, 2000
    prosody rules + phonation model
    samples: neutral, sad, angry, crying, content




examples diphone synthesis

 MARY, M. Schröder, DFKI
    prosody rules for dimensions
    three inventories for soft, normal and tense speech
    samples: joy, angry

 EmoFilt, Burkhardt, 1999
    prosody rules
    samples: neutral, joy




examples statistical based

 Tokyo Institute, Kobayashi Lab
    HMM models spectral and prosodic features
    samples: neutral, joy




examples unit selection

             fun personality voices                     Damian Shouty
             CTTS with expressive units                 product research
             extralinguistic units                      Katrin




examples non human

Oudeyer: Sony pet robots
    concatenative
    samples: happy, sad

MIT Kismet robot
    formant synthesis
    samples: anger, fear




examples singing

 vocal tract lab, Peter Birkholz
    2007, articulatory
    sample: donna nobis

 pavarobotti, Ingo Titze
    1993, articulatory
    sample: aria

 Bell Labs, Gerstman & Mathews
    1961, articulatory, first song ever
    sample: bicycle




more examples …
              http://emosamples.syntheticspeech.de




contents



why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook




conclusion


emotions are part of natural speech
simulation is possible by either
   modeling the process
   including emotional data
text-to-speech still struggles with intelligible, neutral speech
first steps: speaking styles, extralinguistics
first apps: fun, gaming




outlook


discrepancy between
   natural but inflexible vs.
   artificial sounding but flexible
solutions in the short to medium term:
   very large databases
   hybrid parametric – non-uniform unit selection
   voice transformation techniques
   high quality source filter model based synthesis
solutions in the long run:
   physical modeling




references




             Emotional Soeech Synthesis - Felix Burkhardt,   19.05.2009   36

Mais conteúdo relacionado

Destaque

Examples and synthesis of academic licenses to start ups - lebret
Examples and synthesis of academic licenses to start ups - lebretExamples and synthesis of academic licenses to start ups - lebret
Examples and synthesis of academic licenses to start ups - lebretHervé Lebret
 
Major presentation
Major presentationMajor presentation
Major presentationStuti Shukla
 
Disability and Communication
Disability and CommunicationDisability and Communication
Disability and CommunicationMira K Desai
 
Basic health issues and role of private healthcare System in Pakistan
Basic health issues  and role of private healthcare System in PakistanBasic health issues  and role of private healthcare System in Pakistan
Basic health issues and role of private healthcare System in PakistanDr Abdul Ghafoor
 
Collaboration Techniques that really work
Collaboration Techniques that really workCollaboration Techniques that really work
Collaboration Techniques that really workleisa reichelt
 
12 Principles of Collaboration
12 Principles of Collaboration12 Principles of Collaboration
12 Principles of CollaborationJacob Morgan
 
Communication studies i.a.
Communication studies i.a.Communication studies i.a.
Communication studies i.a.Renae Scarlett
 
Collaboration PowerPoint slides
Collaboration PowerPoint slidesCollaboration PowerPoint slides
Collaboration PowerPoint slideseisolomon
 

Destaque (11)

Confrontation skills
Confrontation skillsConfrontation skills
Confrontation skills
 
Examples and synthesis of academic licenses to start ups - lebret
Examples and synthesis of academic licenses to start ups - lebretExamples and synthesis of academic licenses to start ups - lebret
Examples and synthesis of academic licenses to start ups - lebret
 
Major presentation
Major presentationMajor presentation
Major presentation
 
Disability and Communication
Disability and CommunicationDisability and Communication
Disability and Communication
 
Basic health issues and role of private healthcare System in Pakistan
Basic health issues  and role of private healthcare System in PakistanBasic health issues  and role of private healthcare System in Pakistan
Basic health issues and role of private healthcare System in Pakistan
 
Collaboration
CollaborationCollaboration
Collaboration
 
Collaboration Techniques that really work
Collaboration Techniques that really workCollaboration Techniques that really work
Collaboration Techniques that really work
 
12 Principles of Collaboration
12 Principles of Collaboration12 Principles of Collaboration
12 Principles of Collaboration
 
LEARNING DISABILITY
LEARNING DISABILITYLEARNING DISABILITY
LEARNING DISABILITY
 
Communication studies i.a.
Communication studies i.a.Communication studies i.a.
Communication studies i.a.
 
Collaboration PowerPoint slides
Collaboration PowerPoint slidesCollaboration PowerPoint slides
Collaboration PowerPoint slides
 

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Emotional Tts

  • 1. Emotional Speech Synthesis State of the art 2009 Felix Burkhardt 19.05.2009 1
  • 2. outline how to model and why simulate emotions? emotions in speech introduction to speech synthesis approaches examples, examples, examples conclusion and outlook Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 2
  • 3. contents how to model and why simulate emotions? emotions in speech overview on speech synthesis examples, examples, examples conclusion, outlook Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 3
  • 4. emotion models anger joy …everyone except a psychologist knows what an emotion is (Young 1973) categories, e.g. anger, joy, … despair dimensions, e.g. activation, neutral dominance, valence arousal appraisals, e.g. novelty, intrinsic pleasantness, relevance, coping content potential, e anc boredom in d om sadness emotion cube valence source: Burkhardt 2001 Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 4
  • 5. why model emotional behaviour? aspects of emotion modeling in human-machine interaction: source: Batliner et al 2006 Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 5
  • 6. applications of emotional tts fun, e.g. emotional greetings prosthesis emotional chat avatars gaming, believable characters time adapted dialog design adapted persona design target-group specific advertising … believable agents … artificial humans Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 6
  • 7. aspects of emotional tts Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 7
  • 8. contents why simulate emotions? emotions in speech overview on speech synthesis examples, examples, examples conclusion, outlook Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 8
  • 9. speech features descriptive layers of speech source: Reynolds et al 2003 Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 9
  • 10. emotion in speech neutral angry happy bored frightened sad spectrograms from emotional acted speech source: TUB emotional database Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 10
  • 11. emotional data? actors vs. reality Berlin EmoDB: 10 actors x 7 emotions x 10 sentences alternatives induced data, e.g. Aibo television, radio data EmoDB: Burkhardt et al 2005 Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 11
  • 12. how to describe emotion? EmotionML, incubator group at W3C Example, embedded in SSML: <speak version=quot;1.0quot; xmlns=quot;http://www.w3.org/2001/10/synthesisquot; xml:lang=quot;en-USquot;> <voice gender=quot;femalequot;> <prosody contour=quot;(0%,+20Hz)(10%,+30%)(40%,+10Hz)quot;> Hi, am sad know but start getting angry... </prosody> </voice> <emotion> <category name=quot;sadness„ set=quot;basicquot; intensity=quot;0.6quot;/> <timing start=quot;10%quot; end=quot;50%quot;/> </emotion> <emotion> <category name=quot;angerquot; set=quot;basicquot; intensity=quot;0.4quot;/> <timing start=quot;50%quot; end=quot;100%quot;/> </emotion> </speak> http://www.w3.org/2005/Incubator/emotion/ Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 12
  • 13. loquendo tts director source: Loquendo Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 13
  • 14. contents why simulate emotions? emotions in speech introduction to speech synthesis approaches examples, examples, examples conclusion, outlook Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 14
  • 15. speech synthesis taxonomy speech synthesis systems voice response systems re (copy)-synthesis, voice transformation arbitary speech synthesizers voice conversion text-to-speech concept-to-speech (unknown input) (input from text-generation system) Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 15
  • 16. tts process chain NLP natural DSP digital language speech phonetic transcription processing prosody track processing preprocessing unit concatenation / search morpho-syntactic analysis prosody fitting transpcription edge smoothing prosody modeling Emotional Soeech Synthesis - Felix Burkhardt, 19.05.2009 16
  • 17. synthesis approaches
      signal modeling vs. system modeling
      system modeling: articulatory synthesis, vocal tract shape synthesis; pseudo articulatory
      rule based (expert systems) vs. data based (statistical model generated)
      non-uniform unit selection, concatenative synthesis, formant synthesis, HMM (hidden Markov models), ANN (neural nets)
      type of units: syllables, diphones, allophones, subsegments
      coding of units:
        parametric coded: LPC (linear predictive coding), MFCC (mel frequency cepstral coefficients), MBR (multi band resynthesis), formants
        waveform coded: PCM, LDM (linear delta mod.)
        hybrid approaches: MBRPSOLA, RELP
  • 18. historic development
      timeline 1780 ... 1980, 1990, 2000 (historic to modern): articulatory (von Kempelen); formant synthesis, e.g. DECtalk; PSOLA based synthesis, e.g. Elan; non-uniform unit selection, e.g. RealSpeak
      trend: from flexible, domain independent but artificial sounding to natural sounding but domain dependent and not flexible
  • 19. system modeling
  • 20. source filter model: formant synthesizer (Klatt 1980). source: Klatt80
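The source filter idea behind a formant synthesizer can be sketched in a few lines: a periodic source (here a bare impulse train, an assumption made for brevity) is passed through a cascade of second-order digital resonators, one per formant. The resonator difference equation follows the form given in Klatt (1980); the formant frequencies and bandwidths below are illustrative values for an /a/-like vowel and are not taken from the slides. All function names are hypothetical.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to one
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, fs_hz, n_samples):
    """Very crude voiced source: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def synthesize_vowel(f0_hz=120.0,
                     formants=((700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)),
                     fs_hz=16000, n_samples=1600):
    """Cascade formant synthesis: source -> F1 -> F2 -> F3."""
    signal = impulse_train(f0_hz, fs_hz, n_samples)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


if __name__ == "__main__":
    samples = synthesize_vowel()
    print(len(samples), "samples generated")
```

Because source and filter are separate, an emotional rule set can change `f0_hz` (source) or the formant values (filter) independently, which is the flexibility the formant approach is credited with in this talk.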
  • 21. contents: why simulate emotions? / emotions in speech / overview on speech synthesis / examples, examples, examples / conclusion, outlook
  • 22. examples: emofilt
      open source Java program based on the MBROLA synthesis engine
      NOT a complete text-to-speech system
      prosody filter between the natural language processing and digital speech signal processing modules
      as multilingual as MBROLA, which currently supports 35 languages
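To illustrate what a prosody filter between the two modules can look like, the sketch below rewrites an MBROLA `.pho` description, in which each line holds a phoneme, a duration in milliseconds and optional (position %, F0 in Hz) pairs. It is written in Python for brevity and is not Emofilt's actual code or rule set: the function `filter_pho` and the scaling factors are hypothetical, chosen only to show the principle of raising pitch and shortening durations.

```python
def filter_pho(pho_text, f0_scale=1.0, duration_scale=1.0):
    """Scale all durations and F0 targets in an MBROLA .pho description."""
    out = []
    for line in pho_text.splitlines():
        stripped = line.strip()
        # Keep empty lines and comment lines (starting with ';') untouched.
        if not stripped or stripped.startswith(";"):
            out.append(line)
            continue
        fields = stripped.split()
        phoneme = fields[0]
        duration_ms = round(float(fields[1]) * duration_scale)
        targets = fields[2:]
        new_targets = []
        # Pitch targets come in pairs: position in percent, F0 in Hz.
        for position, f0_hz in zip(targets[0::2], targets[1::2]):
            new_targets += [position, str(round(float(f0_hz) * f0_scale))]
        out.append(" ".join([phoneme, str(duration_ms)] + new_targets))
    return "\n".join(out)


NEUTRAL = "\n".join([
    "; neutral input, as an NLP front end might produce it",
    "_ 100",
    "h 60 0 120",
    "a 100 50 130 100 110",
    "_ 100",
])

if __name__ == "__main__":
    # Hypothetical 'aroused' setting: higher pitch, faster speech.
    print(filter_pho(NEUTRAL, f0_scale=1.5, duration_scale=0.8))
```

The output is again a valid `.pho` description, so the filter can sit in front of any MBROLA voice without touching the synthesis engine itself.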
  • 23. examples: emoSpeak. emoSpeak is integrated into the MARY text-to-speech framework by DFKI. In his PhD thesis, Marc Schröder investigated how to map emotional dimensions to rule-based modifications of speech. the system can be freely downloaded. source: Schröder 2004
  • 24. examples voice conversion
      Murtaza Bulut et al, USC: PSOLA - LPC conversion (samples: neutral, angry)
      Greg Beller, IRCAM: Phase vocoder (samples: neutral, sad)
  • 25. examples voice transformation
      Olivier Rosec, FranceTelecom 2009: Mixed LF + harmonic model (samples: woman, as boy, as man; man, breathy, whispery, tense)
      Shiva Sundaram, USC 2007: Laughter synthesis by LPC synthesis and mass-spring model
  • 26. examples formant synthesis
      AffectEditor, J. Cahn, MIT 1998: DECtalk prosody rules (samples: sad, angry)
      EmoSyn, Burkhardt, 2000: prosody rules + phonation model (samples: neutral, sad, angry, crying, content)
  • 27. examples diphone synthesis
      MARY, M. Schröder, DFKI: prosody rules for dimensions; three inventories for soft, normal and tense speech (samples: joy, angry)
      EmoFilt, Burkhardt, 1999: prosody rules (samples: neutral, joy)
  • 28. examples statistical based
      Tokyo Institute, Kobayashi Lab: HMM models spectral and prosodic features (samples: neutral, joy)
  • 29. examples unit selection fun personality voices Damian Shouty CTTS with expressive product research units extralinguistic units Katrin
  • 30. examples non human
      Oudeyer: Sony pet robots: concatenative (samples: happy, sad)
      MIT Kismet robot: formant synthesis (samples: anger, fear)
  • 31. examples singing
      vocal tract lab, Peter Birkholz, 2007: articulatory (sample: dona nobis)
      pavarobotti, Ingo Titze, 1993: articulatory (sample: aria)
      Bell Labs, Gerstman & Mathews, 1961: articulatory, first song ever (sample: bicycle)
  • 32. more examples … http://emosamples.syntheticspeech.de
  • 33. contents: why simulate emotions? / emotions in speech / overview on speech synthesis / examples, examples, examples / conclusion, outlook
  • 34. conclusion
      emotions are part of natural speech
      simulation is possible either by modeling the process or by including emotional data
      text-to-speech still struggles with intelligible, neutral speech
      first steps: speaking styles, extralinguistics
      first apps: fun, gaming
  • 35. outlook
      discrepancy between natural but inflexible vs. artificial sounding but flexible solutions
      short to middle term: very large databases; hybrid parametric / non-uniform unit selection; voice transformation techniques; high quality source filter model based synthesis solutions
      in the long run: physical modeling
  • 36. references