2. outline
how to model and why simulate emotions?
emotions in speech
introduction to speech synthesis approaches
examples, examples, examples
conclusion and outlook
Emotional Speech Synthesis - Felix Burkhardt, 19.05.2009
3. contents
how to model and why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook
4. emotion models
"…everyone except a psychologist knows what an emotion is" (Young 1973)
categories, e.g. anger, joy, … despair
dimensions, e.g. activation, dominance, valence
appraisals, e.g. novelty, intrinsic pleasantness, relevance, coping potential
[figure: emotion cube with the axes arousal, dominance and valence, locating anger, joy, neutral, content, boredom and sadness]
source: Burkhardt 2001
5. why model emotional behaviour?
aspects of emotion modeling in human-machine interaction:
source: Batliner et al 2006
6. applications of emotional tts
fun, e.g. emotional greetings
prosthesis
emotional chat avatars
gaming, believable characters
time adapted dialog design
adapted persona design
target-group specific advertising
…
believable agents
…
artificial humans
7. aspects of emotional tts
8. contents
why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook
9. speech features
descriptive layers of speech
source: Reynolds et al 2003
10. emotion in speech
[figure: spectrograms of acted emotional speech; panels: neutral, angry, happy, bored, frightened, sad]
source: TUB emotional database
11. emotional data?
actors vs. reality
Berlin EmoDB: 10 actors x 7 emotions x 10 sentences
alternatives
induced data, e.g. Aibo
television, radio data
EmoDB: Burkhardt et al 2005
12. how to describe emotion?
EmotionML, incubator group at W3C
Example, embedded in SSML:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
      Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
    <category name="sadness" set="basic" intensity="0.6"/>
    <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
    <category name="anger" set="basic" intensity="0.4"/>
    <timing start="50%" end="100%"/>
  </emotion>
</speak>
http://www.w3.org/2005/Incubator/emotion/
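A minimal sketch of how a synthesizer front end might read such an annotation, using only Python's standard XML parser. It assumes the draft syntax shown on this slide (an `<emotion>` element holding `<category>` and `<timing>`), which may differ from later versions of the W3C specification; the function name `read_emotions` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <speak> element; child elements inherit it.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    <prosody contour="(0%,+20Hz)(10%,+30%)(40%,+10Hz)">
      Hi, I am sad now but start getting angry...
    </prosody>
  </voice>
  <emotion>
    <category name="sadness" set="basic" intensity="0.6"/>
    <timing start="10%" end="50%"/>
  </emotion>
  <emotion>
    <category name="anger" set="basic" intensity="0.4"/>
    <timing start="50%" end="100%"/>
  </emotion>
</speak>"""


def read_emotions(xml_text):
    """Return one dict per <emotion> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    ns = {"s": SSML_NS}
    emotions = []
    for emotion in root.findall("s:emotion", ns):
        category = emotion.find("s:category", ns)
        timing = emotion.find("s:timing", ns)
        emotions.append({
            "name": category.get("name"),
            "intensity": float(category.get("intensity")),
            "start": timing.get("start"),  # kept as text, e.g. "10%"
            "end": timing.get("end"),
        })
    return emotions


emotions = read_emotions(DOC)
for e in emotions:
    print(e["name"], e["intensity"], e["start"], e["end"])
```

The timing values are left as percentage strings because the slide does not say how they should be resolved against the utterance.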
13. loquendo tts director
source: Loquendo
14. contents
why simulate emotions?
emotions in speech
introduction to speech synthesis approaches
examples, examples, examples
conclusion, outlook
15. speech synthesis taxonomy
speech synthesis systems:
voice response systems
re(copy)-synthesis, voice transformation: voice conversion
arbitrary speech synthesizers: text-to-speech (unknown input), concept-to-speech (input from text-generation system)
16. tts process chain
NLP (natural language processing): preprocessing, morpho-syntactic analysis, transcription, prosody modeling
interface between NLP and DSP: phonetic transcription, prosody track
DSP (digital speech processing): unit concatenation / search, prosody fitting, edge smoothing
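The two halves of the chain can be sketched as two functions that exchange a phonetic transcription with a prosody track. This is a toy illustration, not any real system: the lexicon, the unit inventory, the flat default prosody and all function names are assumptions. Prosody fitting here covers duration only, and pitch fitting is left out.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One entry of the NLP/DSP interface: phoneme plus prosody targets."""
    phoneme: str
    duration_ms: int
    f0_hz: float


def nlp_front_end(text, lexicon, duration_ms=80, f0_hz=120.0):
    """Toy NLP stage: lexicon lookup with a flat default prosody."""
    segments = []
    for word in text.lower().split():
        word = word.strip(".,!?")  # stand-in for real preprocessing
        for phoneme in lexicon[word]:
            segments.append(Segment(phoneme, duration_ms, f0_hz))
    return segments


def fit_duration(unit, n_samples):
    """Stretch or shrink a stored unit by nearest-neighbour resampling."""
    return [unit[i * len(unit) // n_samples] for i in range(n_samples)]


def dsp_back_end(segments, inventory, sample_rate=8000, fade=4):
    """Toy DSP stage: unit lookup, duration fitting, edge smoothing."""
    audio = []
    for seg in segments:
        n = seg.duration_ms * sample_rate // 1000
        unit = fit_duration(inventory[seg.phoneme], n)
        # Edge smoothing: short linear fade at both ends of the unit.
        for i in range(min(fade, n)):
            gain = i / fade
            unit[i] *= gain
            unit[n - 1 - i] *= gain
        audio.extend(unit)
    return audio


# Hypothetical one-word lexicon and constant-valued "recordings".
LEXICON = {"hi": ["h", "aI"]}
INVENTORY = {"h": [0.1] * 10, "aI": [0.5] * 10}

segments = nlp_front_end("Hi!", LEXICON)
audio = dsp_back_end(segments, INVENTORY)
print(len(segments), len(audio))
```

Keeping the interface as a plain list of segments is what lets a separate module sit between the two stages and rewrite the prosody targets.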
17. synthesis approaches
system modeling: articulatory synthesis, vocal tract shape synthesis (pseudo-articulatory)
signal modeling: rule based or data based
rule based: expert systems, formant synthesis
data based: statistical model generated (HMM hidden Markov models, ANN neural nets), non-uniform unit selection, concatenative synthesis
coding of units, parametric coded: LPC (linear predictive coding), MFCC (mel frequency cepstral coefficients), MBR (multi band resynthesis), formants
coding of units, waveform coded: PCM, LDM (linear delta mod.)
coding of units, hybrid approaches: MBR-PSOLA, RELP
type of units: syllables, diphones, allophones, subsegments
18. historic development
[figure: timeline from historic to modern, with the ticks 1780 … 1980, 1990, 2000]
articulatory synthesis (von Kempelen)
formant synthesis, e.g. DECtalk
PSOLA based synthesis, e.g. Elan
non-uniform unit selection, e.g. RealSpeak
historic end of the scale: flexible, artificial sounding, domain independent
modern end of the scale: not flexible, natural sounding, domain dependent
19. system modeling
20. source filter model
source: Klatt80 formant synthesizer (Klatt 1980)
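A minimal sketch of the source-filter idea: a periodic source signal is passed through a cascade of second-order digital resonators, one per formant. The resonator difference equation and coefficient formulas follow Klatt (1980). Everything else is a simplifying assumption for illustration: the bare impulse train as voice source, the rough /a/-like formant frequencies and bandwidths, and the function names.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain to 1 at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Filter: run the signal through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate):
    """Source: one unit impulse per pitch period (a crude voicing model)."""
    period = int(round(sample_rate / f0_hz))
    n = int(round(duration_s * sample_rate))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(f0_hz=120.0,
                     formants=((700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)),
                     duration_s=0.3, sample_rate=10000):
    """Cascade the formant resonators over the source signal."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bw_hz in formants:  # illustrative (frequency, bandwidth) pairs
        signal = resonate(signal, freq_hz, bw_hz, sample_rate)
    return signal


samples = synthesize_vowel()
print(len(samples), max(abs(s) for s in samples))
```

Because source and filter are separate, pitch (`f0_hz`) and vowel quality (`formants`) can be changed independently, which is what makes this family of synthesizers flexible for emotional speech.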
21. contents
why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook
22. examples: emofilt
open source Java program based on the MBROLA synthesis engine
NOT a complete text-to-speech system
prosody filter between the natural language processing and digital speech signal processing modules
as multilingual as MBROLA, which currently supports 35 languages
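A prosody filter of this kind can be sketched as a function that rewrites MBROLA's phoneme input (`.pho`) format, in which each line holds a phoneme, a duration in milliseconds and optional pairs of position (percent) and pitch (Hz). The sketch below is written in Python for brevity and is not emofilt's code: the two scale factors are hypothetical stand-ins for its emotion rules, and the phoneme sequence is made up.

```python
def apply_prosody_rules(pho_text, f0_scale=1.0, duration_scale=1.0):
    """Scale all durations and pitch targets of a .pho description."""
    out_lines = []
    for line in pho_text.splitlines():
        stripped = line.strip()
        # Pass comment lines (starting with ';') and blank lines through.
        if not stripped or stripped.startswith(";"):
            out_lines.append(line)
            continue
        fields = stripped.split()
        phoneme = fields[0]
        duration_ms = float(fields[1])
        new_fields = [phoneme, str(round(duration_ms * duration_scale))]
        # Remaining fields are (position %, pitch Hz) pairs.
        pairs = fields[2:]
        for position, pitch in zip(pairs[0::2], pairs[1::2]):
            new_fields.append(position)
            new_fields.append(str(round(float(pitch) * f0_scale)))
        out_lines.append(" ".join(new_fields))
    return "\n".join(out_lines)


# Illustrative input; "_" is MBROLA's silence symbol.
NEUTRAL = """; illustrative phoneme sequence
_ 100
h 60
a 80 50 120
l 50
o: 150 20 115 80 95
_ 100"""

# Hypothetical rule: raise pitch by 20 % and shorten durations by 20 %.
modified = apply_prosody_rules(NEUTRAL, f0_scale=1.2, duration_scale=0.8)
print(modified)
```

Working on this intermediate text format is what keeps such a filter independent of both the text analysis and the signal generation, and therefore usable for every language the synthesis engine supports.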
23. examples: emoSpeak
emoSpeak is integrated into the MARY text-to-speech framework by DFKI.
In his PhD thesis, Marc Schröder investigated how to map emotional dimensions to rule-based modifications of speech.
the system can be freely downloaded
source: Schröder 2004
24. examples voice conversion
Murtaza Bulut et al, USC: PSOLA - LPC conversion (samples: neutral, angry)
Greg Beller, IRCAM: Phase vocoder (samples: neutral, sad)
25. examples voice transformation
Olivier Rosec, FranceTelecom 2009: Mixed LF + harmonic model (samples: woman, as boy, as man; man, breathy, whispery, tense)
Shiva Sundaram, USC 2007: Laughter synthesis by LPC synthesis and mass-spring model
26. examples formant synthesis
AffectEditor, J. Cahn, MIT 1998: DEC Talk prosody rules (samples: sad, angry)
EmoSyn, Burkhardt, 2000: prosody rules + phonation model (samples: neutral, sad, angry, crying, content)
27. examples diphone synthesis
MARY, M. Schröder, DFKI: prosody rules for dimensions, three inventories for soft, normal and tense speech (samples: joy, angry)
EmoFilt, Burkhardt, 1999: prosody rules (samples: neutral, joy)
28. examples statistical based
Tokyo Institute, Kobayashi Lab: HMM models spectral and prosodic features (samples: neutral, joy)
29. examples unit selection
fun personality voices (samples: Damian, Shouty)
CTTS with expressive units (product research)
extralinguistic units (sample: Katrin)
30. examples non human
Oudeyer, Sony pet robots: concatenative (samples: happy, sad)
MIT Kismet robot: formant synthesis (samples: anger, fear)
31. examples singing
vocal tract lab, Peter Birkholz, 2007: articulatory (sample: donna nobis)
pavarobotti, Ingo Titze, 1993: articulatory (sample: aria)
Bell Labs, Gerstman & Mathews, 1961: articulatory, first song ever (sample: bicycle)
32. more examples …
http://emosamples.syntheticspeech.de
33. contents
why simulate emotions?
emotions in speech
overview on speech synthesis
examples, examples, examples
conclusion, outlook
34. conclusion
emotions are part of natural speech
simulation possible by either
modeling the process
including emotional data
text-to-speech still struggles to produce intelligible, neutral speech
first steps: speaking styles, extralinguistics
first apps: fun, gaming
35. outlook
discrepancy between natural but inflexible vs. artificial sounding but flexible
short- to mid-term solutions:
very large databases
hybrid parametric – non-uniform unit selection
voice transformation techniques
high quality source filter model based synthesis
long-term solutions:
physical modeling
36. references