InterSpeech 2012
13th Annual Conference of the International Speech Communication Association
September 9-13, 2012 | Portland, Oregon
ISSN: 1990-9770
Computer-Assisted Language Learning (CALL) Systems
Overview
Computer-assisted language learning (CALL) provides an effective
learning environment in which students can practice interactively
with multimedia content, either under the supervision of teachers or
at their own pace in self-study. The advancement of speech and
language technologies has opened new perspectives for CALL systems,
such as automatic pronunciation assessment and simulated
conversational-style lessons. CALL is also regarded as one of the new
and promising applications of speech analysis, recognition and
synthesis. CALL covers a variety of aspects, including segmental,
prosodic and lexical features. Modeling non-native speech so as to
correctly segment and recognize utterances while detecting the errors
they contain poses a number of challenges in speech processing.
Assessing the intelligibility of non-native speech or the proficiency
of non-native speakers is also an important issue. In this tutorial,
we give an overview of these issues and current solutions. The
tutorial is mainly targeted at speech researchers and engineers
interested in CALL, but also at those engaged in language teaching or
learning technology.
First we review speech recognition technologies for pronunciation
learning, specifically pronunciation evaluation and error detection.
Statistical approaches to these problems are formulated, and then
acoustic and pronunciation modeling of non-native speech is described.
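As an illustration of this statistical formulation, pronunciation quality is often scored with a log-likelihood ratio such as the well-known Goodness of Pronunciation (GOP) score. The sketch below is a minimal illustration, not code from the tutorial: it assumes per-frame acoustic log-likelihoods for each phone are already available from forced alignment, and all names and values are hypothetical.

```python
def gop_score(frame_loglikes, target_phone):
    """Goodness of Pronunciation (GOP) for one phone segment.

    frame_loglikes: list of dicts mapping phone -> acoustic
                    log-likelihood, one dict per frame of the segment
                    (forced-aligned to target_phone).
    Returns the duration-normalized log-likelihood ratio between the
    target phone and the best competing phone: scores near 0 suggest
    good pronunciation, large negative scores suggest an error.
    """
    n = len(frame_loglikes)
    target_ll = sum(f[target_phone] for f in frame_loglikes)
    # Best competing score over the same segment (an approximation
    # that lets each frame pick its most likely phone independently).
    best_ll = sum(max(f.values()) for f in frame_loglikes)
    return (target_ll - best_ll) / n

# Toy two-frame segment with hypothetical log-likelihoods.
frames = [{"ae": -2.0, "eh": -2.5}, {"ae": -1.8, "eh": -3.0}]
print(gop_score(frames, "ae"))  # 0.0: target is the best phone
print(gop_score(frames, "eh"))  # negative: likely mispronunciation
```

In practice the error decision is made by thresholding such a score per phone, with thresholds tuned on annotated non-native data.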
Unlike conventional non-native speech recognition, CALL requires
error detection capability, so an effective error prediction scheme
is vitally important. Next, we address prosodic modeling and
evaluation, such as duration, stress and tones, and then the use of
speech synthesis technologies including re-synthesis and morphing.
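As a small illustration of duration evaluation, one simple baseline is to compare a learner's phone durations against native-speaker statistics as z-scores. The sketch below is a hypothetical example, not a method from the tutorial; the native statistics are made up.

```python
def duration_zscores(durations, native_stats):
    """Score each phone's duration against native mean/std.

    durations:    dict of phone -> learner duration in seconds.
    native_stats: dict of phone -> (mean_s, std_s) from native data.
    A large |z| flags a phone whose duration deviates from native norms.
    """
    return {ph: (d - native_stats[ph][0]) / native_stats[ph][1]
            for ph, d in durations.items()}

native = {"a": (0.10, 0.02), "t": (0.05, 0.01)}  # hypothetical stats
learner = {"a": 0.16, "t": 0.05}
z = duration_zscores(learner, native)
print(z)  # 'a' is about 3 standard deviations too long; 't' is native-like
```

Real systems refine this with speaking-rate normalization and context-dependent duration models, but the thresholded-deviation idea is the same.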
After the review of basic component technologies, we introduce a
number of practical CALL systems which have been developed as
commercial products or deployed in classrooms, including those in our
universities. The majority of them focus on learning English as a second
language (ESL), but some deal with other languages such as Japanese
and Chinese. We also review databases of non-native speech, which
are necessary to develop CALL systems.
Outline
1. Introduction and Overview (Kawahara)
Review the history and categories of CALL systems.
2. Segmental aspect and speech recognition technology
(Kawahara)
2.1. Speech analysis for CALL
2.2. Segmentation of non-native speech
2.3. Error detection of non-native speech
2.4. Scoring of non-native speech
2.5. Acoustic model for non-native speech
2.6. Pronunciation model for non-native speech
2.7. Discriminative modeling
3. Prosodic aspect (Minematsu)
3.1. Prosodic deviations found in non-native pronunciation
3.2. Duration modeling & evaluation
3.3. Stress and tone modeling & evaluation
3.4. Intonation modeling & evaluation
4. Speech synthesis technology for CALL (Minematsu)
4.1. Text-to-speech for CALL
4.2. Re-synthesis for CALL
4.3. Morphing for CALL
5. Practical CALL systems (Kawahara)
Review major CALL systems that have been developed and
deployed for learning English and other languages.
6. Database for CALL (Minematsu)
Review major databases of non-native speech, which are
critical resources in developing CALL systems.
Short Biographies
Tatsuya Kawahara is a professor at the Academic Center for Computing
and Media Studies and an affiliated professor in the School of
Informatics, Kyoto University.
He has also been an invited researcher at ATR and NICT. He was a
visiting researcher at Bell Laboratories from 1995 to 1996. He has
published more than 200 technical papers on speech recognition,
spoken language processing, and spoken dialog systems. He has been
managing several speech-related projects including a free speech
recognition engine Julius (http://julius.sourceforge.jp/) and the
automatic transcription system for the Japanese Parliament (Diet).
From 2003 to 2006, he was a member of the IEEE SPS Speech Technical
Committee. Since 2011, he has been a secretary of the IEEE SPS Japan
Chapter.
He was a general chair of IEEE Automatic Speech Recognition &
Understanding workshop (ASRU 2007). He has also served as a
tutorial chair of INTERSPEECH 2010 and a local arrangement chair of
ICASSP 2012. He is an editorial board member of the Elsevier journal
Computer Speech and Language, ACM Transactions on Speech and Language
Processing, and APSIPA Transactions on Signal and Information
Processing. He is a senior member of IEEE.
E-mail: kawahara@i.kyoto-u.ac.jp
Webpage: http://www.ar.media.kyoto-u.ac.jp/members/kawahara/
Nobuaki Minematsu is an associate professor in the Graduate School of
Information Science and Technology, the University of Tokyo. He was a
visiting researcher at the Royal Institute of Technology (KTH),
Sweden, from 2002 to 2003. He has a wide interest in speech
communication, ranging from science to engineering, and has published
more than 200 scientific and technical papers, including conference
papers, on speech analysis, speech perception, speech recognition,
speech synthesis, language learning systems, etc. He was a member of
the organizing committees of Speech Prosody 2004, L2WS 2010, and
INTERSPEECH 2010. Since 2006, he has been a member of SLaTE (the ISCA
SIG on Speech and Language Technology in Education). Since 2011, he
has been a treasurer of the IEEE SPS Japan Chapter. He has also been
serving as an editorial board member for the Acoustical Society of
Japan, the Institute of Electronics, Information and Communication
Engineers, and the Information Processing Society of Japan.