Emotional Speech Synthesis
State of the art 2009
Felix Burkhardt
19.05.2009
outline



 how to model and why simulate emotions?
 emotions in speech
 introduction to speech synthesis approaches
 examples, examples, examples
 conclusion and outlook




Emotional Speech Synthesis - Felix Burkhardt, 19.05.2009
emotion models

…everyone except a psychologist knows what an emotion is (Young 1973)

categories, e.g. anger, joy, …
dimensions, e.g. activation, dominance, valence
appraisals, e.g. novelty, intrinsic pleasantness, relevance, coping potential

[figure: emotion cube with the axes arousal, dominance and valence; categories placed in the cube: anger, joy, despair, neutral, content, boredom, sadness]
source: Burkhardt 2001
why model emotional behaviour?


aspects of emotion modeling in human-machine interaction:




                                                           source: Batliner et al 2006


applications of emotional tts

ordered along a time axis:
   fun, e.g. emotional greetings
   prosthesis
   emotional chat avatars
   gaming, believable characters
   adapted dialog design
   adapted persona design
   target-group specific advertising
   …
   believable agents
   …
   artificial humans
aspects of emotional tts




speech features




           descriptive layers of speech

                                               source: Reynolds et al 2003


emotion in speech

[figure: spectrograms from emotional acted speech; panels: neutral, angry, happy, bored, frightened, sad]
source: TUB emotional database
emotional data?


actors vs. reality
Berlin EmoDB: 10 actors x 7 emotions x 10 sentences
alternatives
   induced data, e.g. Aibo
   television, radio data




                                                             EmoDB: Burkhardt et al 2005
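
The EmoDB recording design (actors x emotions x sentences) can be written out as a small grid. This is only a sketch of the arithmetic: the speaker and sentence identifiers are placeholders, and the seven emotion labels are taken from the EmoDB description rather than from this slide.

```python
from itertools import product

# Placeholder identifiers; the real corpus uses its own speaker and sentence codes.
ACTORS = [f"speaker{i:02d}" for i in range(1, 11)]
SENTENCES = [f"sentence{i:02d}" for i in range(1, 11)]
# Assumed labels for the seven emotion categories (not listed on the slide).
EMOTIONS = ["anger", "boredom", "disgust", "fear", "happiness", "sadness", "neutral"]


def design_grid():
    """Return every (actor, emotion, sentence) cell of the full recording design."""
    return list(product(ACTORS, EMOTIONS, SENTENCES))


if __name__ == "__main__":
    print(len(design_grid()))  # 10 * 7 * 10 cells
```

The grid only describes the planned design; how many recordings end up in a released corpus depends on later selection steps.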



how to describe emotion?


  EmotionML, incubator group at W3C
  Example, embedded in SSML:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
       Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
   <category name="sadness" set="basic" intensity="0.6"/>
   <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
   <category name="anger" set="basic" intensity="0.4"/>
   <timing start="50%" end="100%"/>
  </emotion>
</speak>
  http://www.w3.org/2005/Incubator/emotion/
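
A minimal sketch of how a consumer might read the emotion annotations out of the example above, using only the Python standard library. It assumes the markup exactly as shown on the slide, with straight quotes restored; this is the 2009 incubator draft style, not necessarily the syntax of the later EmotionML specification. The function name `read_emotions` is hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# The slide example; the emotion elements inherit the default SSML namespace.
DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
       Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
   <category name="sadness" set="basic" intensity="0.6"/>
   <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
   <category name="anger" set="basic" intensity="0.4"/>
   <timing start="50%" end="100%"/>
  </emotion>
</speak>"""


def read_emotions(xml_text):
    """Collect category name, intensity and timing of each emotion element."""
    root = ET.fromstring(xml_text)
    ns = {"s": SSML_NS}
    emotions = []
    for emotion in root.findall("s:emotion", ns):
        category = emotion.find("s:category", ns)
        timing = emotion.find("s:timing", ns)
        emotions.append({
            "name": category.get("name"),
            "intensity": float(category.get("intensity")),
            "start": timing.get("start"),
            "end": timing.get("end"),
        })
    return emotions


if __name__ == "__main__":
    for entry in read_emotions(DOC):
        print(entry)
```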
loquendo tts director




                                              source: Loquendo


speech synthesis taxonomy

speech synthesis systems
   first level:
      voice response systems
      re (copy)-synthesis, voice transformation
      arbitrary speech synthesizers
   second level:
      voice conversion
      text-to-speech (unknown input)
      concept-to-speech (input from text-generation system)
tts process chain



NLP natural language processing
   preprocessing
   morpho-syntactic analysis
   transcription
   prosody modeling

interface between the two modules: phonetic transcription, prosody track

DSP digital speech processing
   unit concatenation / search
   prosody fitting
   edge smoothing
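
The two-stage chain can be sketched as two functions that exchange a phonetic transcription plus a prosody track. Everything here is a toy under stated assumptions: the `Segment` type, the two-word `LEXICON`, the constant durations and the falling pitch line are all hypothetical, and the back end returns a textual description instead of audio.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phoneme: str      # symbol from the phonetic transcription
    duration_ms: int  # prosody track: segment duration
    f0_hz: float      # prosody track: pitch target


# Hypothetical toy lexicon standing in for a real transcription module.
LEXICON = {"hi": ["h", "aI"], "there": ["D", "E", "r"]}


def nlp_frontend(text):
    """Toy NLP stage: preprocessing, transcription by lookup, prosody modeling."""
    # Preprocessing: lowercase and strip simple punctuation.
    words = text.lower().replace(",", " ").replace(".", " ").split()
    # Transcription: dictionary lookup (raises KeyError for unknown words).
    phonemes = [p for word in words for p in LEXICON[word]]
    n = len(phonemes)
    # Prosody modeling: constant duration, pitch falling linearly from 120 to 100 Hz.
    return [
        Segment(p, 80, 120.0 - 20.0 * i / max(n - 1, 1))
        for i, p in enumerate(phonemes)
    ]


def dsp_backend(segments):
    """Stand-in for unit search and prosody fitting: describes what would be rendered."""
    return [f"{s.phoneme}:{s.duration_ms}ms@{s.f0_hz:.0f}Hz" for s in segments]


if __name__ == "__main__":
    print(dsp_backend(nlp_frontend("Hi there.")))
```

Keeping the interface down to symbols plus a prosody track is what allows an emotion module to sit between the two stages and rewrite only the prosody.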




synthesis approaches

[figure: overview diagram; the grouping below follows the column layout of the extracted text]

signal modeling: rule based, data based
system modeling: articulatory synthesis, vocal tract shape synthesis
diagonal label in the figure: pseudo articulatory

expert systems: formant synthesis
statistical model generated: HMM hidden markov models, ANN neural nets
non-uniform unit selection
concatenative synthesis
   coding of units
      parametric coded: LPC linear predictive coding, MFCC mel frequency cepstral coefficients, MBR multi band resynthesis, formants
      waveform coded: PCM, LDM (linear delta mod.)
      hybrid approaches: MBRPSOLA, RELP
   type of units: syllables, diphones, allophones, subsegments
historic development

[figure: timeline from historic, flexible, artificial sounding, domain independent systems to modern, not flexible, natural sounding, domain dependent systems]
   around 1780: articulatory, von Kempelen
   around 1980: formant synthesis, e.g. DECtalk
   around 1990: PSOLA based synthesis, e.g. Elan
   around 2000: non-uniform unit selection, e.g. RealSpeak
system modeling




source filter model




                                            source: Klatt80 formant synthesizer (Klatt 1980)
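
The source-filter idea can be sketched in a few lines: a periodic source (here an impulse train at the fundamental frequency) is passed through a resonator that imposes one formant. The resonator below is the standard two-pole digital resonator commonly used in formant synthesizers; the function names and parameter values are assumptions for illustration and are not taken from the slide.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def impulse_train(f0_hz, n_samples, sample_rate):
    """Simplified voiced source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Filter the source with a single formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    source = impulse_train(100.0, 1600, 16000)   # 0.1 s at 100 Hz pitch
    voiced = resonate(source, 500.0, 60.0, 16000)  # one formant at 500 Hz
    print(len(voiced), max(voiced))
```

A full formant synthesizer chains or sums several such resonators and uses a more realistic glottal source; emotional variation then amounts to changing source and filter parameters over time.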


examples: emofilt


open source Java program based on MBROLA synthesis engine.
NOT a complete text-to-speech system
prosody filter between natural language and digital speech signal processing modules
as multilingual as MBROLA which currently supports 35 languages.
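
What a prosody filter in front of MBROLA does can be illustrated on the MBROLA phoneme format, where each line holds a phoneme, a duration in ms and optional (position %, pitch Hz) pairs. The sketch below only scales durations and pitch targets; the function `filter_pho` and its two factors are hypothetical and do not reproduce Emofilt's actual rules.

```python
def filter_pho(pho_text, pitch_factor=1.0, duration_factor=1.0):
    """Scale durations and pitch targets of MBROLA .pho lines; keep comments as-is."""
    out = []
    for line in pho_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            out.append(line)  # blank line or comment
            continue
        fields = stripped.split()
        phoneme, duration = fields[0], float(fields[1])
        targets = fields[2:]  # alternating position (%) and pitch (Hz)
        new_fields = [phoneme, str(round(duration * duration_factor))]
        for position, pitch in zip(targets[0::2], targets[1::2]):
            new_fields += [position, str(round(float(pitch) * pitch_factor))]
        out.append(" ".join(new_fields))
    return "\n".join(out)


if __name__ == "__main__":
    neutral = "; demo\n_ 100\nh 60 0 120\naI 150 50 130 100 110\n_ 100"
    # Faster and higher, as a crude stand-in for an aroused speaking style.
    print(filter_pho(neutral, pitch_factor=1.5, duration_factor=0.8))
```

Because the filter touches only the phoneme file, it works with any MBROLA voice whose front end produces this format.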




examples: emoSpeak


emoSpeak is integrated into the MARY text-to-speech framework by DFKI.
In his Ph.D. thesis, Marc Schröder investigated how to map emotional dimensions to rule-based modifications of speech.
the system can be freely downloaded

                                                         source: Schröder 2004
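
A dimensional rule set of this kind can be sketched as a linear map from a point in emotion space to prosody settings. The coefficients, parameter names and the value range of -1 to 1 below are invented for illustration; they are not the values from Schröder's thesis or from emoSpeak.

```python
# Hypothetical coefficients: percent change per unit of each emotion dimension.
RULES = {
    "pitch_percent": {"activation": 30.0, "valence": 10.0, "dominance": -10.0},
    "rate_percent": {"activation": 50.0, "valence": 20.0, "dominance": 0.0},
    "volume_percent": {"activation": 30.0, "valence": 0.0, "dominance": 20.0},
}


def prosody_from_dimensions(activation, valence, dominance):
    """Map a point in the emotion space (each axis in -1..1) to prosody changes."""
    point = {"activation": activation, "valence": valence, "dominance": dominance}
    for name, value in point.items():
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [-1, 1], got {value}")
    return {
        parameter: sum(weights[dim] * point[dim] for dim in point)
        for parameter, weights in RULES.items()
    }


if __name__ == "__main__":
    # High activation, neutral valence and dominance.
    print(prosody_from_dimensions(1.0, 0.0, 0.0))
```

The appeal of the dimensional approach is that intermediate states need no extra rules: any point in the space yields a setting.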


examples voice conversion

 who                         method                     audio samples
 Murtaza Bulut et al, USC    PSOLA - LPC conversion     neutral, angry
 Greg Beller, IRCAM          Phase vocoder              neutral, sad
examples voice transformation

 who                                  method                                                      audio samples
 Olivier Rosec, FranceTelecom 2009    Mixed LF + harmonic model                                   woman, as boy, as man, man, breathy, whispery, tense
 Shiva Sundaram, USC 2007             Laughter synthesis by LPC synthesis and mass-spring model
examples formant synthesis

 who                                method                             audio samples
 AffectEditor, J. Cahn, MIT 1998    DECtalk prosody rules              sad, angry
 EmoSyn, Burkhardt, 2000            prosody rules + phonation model    neutral, sad, angry, crying, content
examples diphone synthesis

 who                        method                                                                               audio samples
 MARY, M. Schröder, DFKI    prosody rules for dimensions; three inventories for soft, normal and tense speech    joy, angry
 EmoFilt, Burkhardt, 1999   prosody rules                                                                        neutral, joy
examples statistical based

 who                              method                                         audio samples
 Tokyo Institute, Kobayashi Lab   HMM models spectral and prosodic features      neutral, joy
examples unit selection

 method                        audio samples
 fun personality voices        Damian, Shouty
 CTTS with expressive units    product, research
 extralinguistic units         Katrin
examples non human

 who                        method               audio samples
 Oudeyer: Sony pet robots   concatenative        happy, sad
 MIT Kismet robot           formant synthesis    anger, fear
examples singing

 who                              year and method                         audio samples
 vocal tract lab, Peter Birkholz  2007, articulatory                      donna nobis
 pavarobotti, Ingo Titze          1993, articulatory                      aria
 Bell Labs, Gerstman & Mathews    1961, articulatory, first song ever     bicycle
more examples …
              http://emosamples.syntheticspeech.de




conclusion


emotions are part of natural speech
simulation possible by either
   modeling the process
   including emotional data
text-to-speech still struggles with intelligible, neutral speech
first steps: speaking styles, extralinguistics
first apps: fun, gaming
outlook


discrepancy between
   natural but inflexible vs.
   artificial sounding but flexible
short- to medium-term solutions:
   very large databases
   hybrids of parametric synthesis and non-uniform unit selection
   voice transformation techniques
   high quality source filter model based synthesis
long-term solutions:
   physical modeling
