Interactive Speech Based Games for Autistic Children with Asperger Syndrome
Amal Alqahtani, Nouf Jaafar, Nourah Alfadda
Information Technology Department
King Saud University, Riyadh, KSA
alqahtani.amal@gmail.com, nouf.abdulaziz.j@gmail.com, nourah.al.fadda@gmail.com
Abstract
Nowadays, artificial-intelligence-based technologies are increasingly used to improve traditional applications or to develop new ones. These technologies are qualified as “Enhanced Computing Technologies” [1] because they allow the development of beneficial applications through the use of natural interfaces, the extraction of meaningful information, and/or the creation of adaptive systems that are more reactive to the environment. Interactive Speech Based Games (ISBG) is a system devoted to providing appropriate support to autistic children with Asperger Syndrome. Asperger Syndrome is one of the autism spectrum disorders, which are characterized by difficulties in social interaction and communication, together with repetitive behavior. Children with Asperger Syndrome do not have the linguistic and cognitive problems found in other autism spectrum disorders.
Categories and Subject Descriptors
K.3.1 [Computer Uses in Education]: Computer-assisted instruction. H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems.
General Terms
Design, Human Factors.
Keywords
ISBG, AI, Asperger Syndrome, multimodal application, Puzzle game, PECS system.
1. Introduction
Many computer users, including people with autism, have varying physical or mental abilities. Different approaches have therefore been developed to adapt technical applications, especially for people with special needs who have difficulty using many applications. For children with autism, the main problem arises when the child tries to contact and communicate with others. This is mainly because these children find it difficult to articulate their thoughts and have no suitable way to make the people around them understand what they need. One way to address this problem is to use technology to help this user population. In this paper, we describe the objectives of the Interactive Speech Based Games (ISBG) project and give an overview of the system. We then review the techniques used in the project and how they are integrated with each other.
The key idea behind this project is to create a multimodal application that integrates speech technology in an attempt to extend the user's experience. The application is multimodal in the sense that it allows users to choose the appropriate input method: speech, text, or point and click.
In addition, the project includes a web site application and two desktop applications, along with bridges that link the desktop applications to the web application. By integrating Microsoft Speech Technology, two games have been implemented, namely the Puzzle game and PECS. The speech technologies used include speech recognition and speech synthesis.
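The multimodal idea above can be illustrated with a small, hypothetical sketch. It is not the ISBG source code, which the paper does not show. The sketch assumes that each input mode has already been turned into an event:

- a recognized phrase with a confidence score, for speech;
- a typed string, for text;
- an on-screen picture index, for point and click.

The class names, the `to_answer` function and the 0.5 confidence threshold are illustrative assumptions. All three modes are mapped onto the same set of on-screen choices, so the game logic does not need to know which input method the child used.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class SpeechInput:
    recognized_text: str  # phrase returned by a speech recognizer (assumed to exist upstream)
    confidence: float     # recognizer confidence in [0, 1] (assumed)


@dataclass
class TextInput:
    typed: str  # text typed by the child


@dataclass
class ClickInput:
    item_index: int  # index of the clicked picture on screen


UserInput = Union[SpeechInput, TextInput, ClickInput]


def to_answer(event: UserInput, choices: Sequence[str],
              min_confidence: float = 0.5) -> Optional[str]:
    """Map any input mode onto one of the on-screen choices, or None if unresolved."""
    if isinstance(event, SpeechInput):
        if event.confidence < min_confidence:
            return None  # too uncertain: the game could ask the child to repeat
        candidate = event.recognized_text
    elif isinstance(event, TextInput):
        candidate = event.typed
    elif isinstance(event, ClickInput):
        # A click selects a picture directly, so no text matching is needed.
        if 0 <= event.item_index < len(choices):
            return choices[event.item_index]
        return None
    else:
        raise TypeError(f"unsupported input: {event!r}")
    # Speech and text are matched case-insensitively against the allowed choices.
    normalized = candidate.strip().lower()
    for choice in choices:
        if choice.lower() == normalized:
            return choice
    return None
```

Take the choices Apple, Cat and Ball as an example:

- saying "cat" with high confidence resolves to Cat;
- typing "Ball" resolves to Ball;
- clicking the first picture resolves to Apple;
- a low-confidence recognition returns None, so the game can ask the child to try again.

Restricting answers to a fixed list of choices mirrors, in a simplified way, the grammar idea described in Section 3.2.1.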
2. Motivations of ISBG project
The motivations behind the use of speech can be summarized as follows [1]:
1. Using speech enables the development of intuitive and more natural interfaces for the user.
2. A very large base of users can use the application.
3. Multimodal applications increase user satisfaction.
4. Using speech allows hands-free access to applications, which suits handicapped individuals as well as busy users whose hands are not free.
5. Speech-based applications provide appropriate support for the treatment of persons suffering from specific disabilities.
Figure 1. The overview of ISBG system.
3. Microsoft technologies
3.1. Introduction
The project was designed using Microsoft technologies, namely Microsoft Speech technology and the Microsoft OLE DB Provider for SQL Server. These technologies are used, respectively, to develop speech-enabled applications and to ensure connections to remote databases.
3.2. Microsoft Speech technology
Microsoft developed the Speech Application Programming Interface (SAPI) to make speech technology more attainable and helpful for a large number of applications. SAPI decreases the code required for an application to use speech recognition and text-to-speech [2].
3.2.1. Overview of Speech API. The Speech Application Programming Interface (SAPI) provides the basic standard interfaces and functionality of speech recognition technology. These allow the programmer to create an application and integrate it with speech recognition technology [3]. SAPI consists of two components (see Figure 2): the Application Programming Interface (API) and the Device Driver Interface (DDI), which speech engines implement. The API reduces the time required to create such intelligent applications and abstracts away many low-level details of the implementation of this technology. The DDI works with the API to make the use of speech synthesis and speech recognition engines in applications more convenient by removing many implementation details, such as multi-threading and audio device management [4].
There are two types of SAPI engines. The first is the text-to-speech (TTS) engine, which converts text strings into synthesized spoken audio. The second is the speech recognizer, which recognizes human speech and converts it into text strings or files [3].
Speech recognition uses a set of predefined words that helps the speech engine recognize the speech sent from the application more accurately and more quickly. These predefined words are called a grammar [3].
Figure 2. The overview of SAPI [3].
3.2.2. Speech Recognition
3.2.2.1. API for speech recognition engine
• Types of speech recognition engines (ISpRecognizer) [3]:
1. Shared recognizer:
The main purpose of this type of engine is to allow sharing with other speech recognition applications. This is the type that we used in our project.
2. InProc speech recognition engine:
The InProc speech recognition engine is more appropriate for large server applications that run alone on a system and for which performance is required.
• ISpRecognizer components:
1. ISpRecoContext:
The ISpRecoContext is the main interface of speech recognition. It receives a notification when speech recognition events occur in the application [3].
2. ISpRecoGrammar:
This object is created from an ISpRecoContext object (through its CreateGrammar method). It defines the set of words that will be recognized once the ISpRecoContext object has been created and receives a notification. This grammar can be a dictation grammar or a command-and-control grammar [3].
3.2.2.2. DDI for Speech Recognition (Engine-Level Interfaces)
The speech recognition engine uses the DDI (speech recognition manager) to receive audio from SAPI and return recognition results [3]. There are two interfaces for doing this: ISpSREngine and ISpSREngineSite. SAPI communicates with the speech engine through the ISpSREngine interface. The main recognition process is carried out by the RecognizeStream method of the ISpSREngine interface. This method tells the speech recognition engine to start recognition processing and to send the results back to the SAPI application [5].
During the execution of the RecognizeStream method, SAPI calls the SetSite method of the ISpSREngine interface. This method gives the engine a pointer to the ISpSREngineSite interface, through which the engine then communicates with SAPI [5].
Once an engine has recognized a phrase, it sends a notification to the application that is the source of that phrase and then calls the AddEvent method [5]. “This method add an event to the speech recognition engine so the stream position passed into AddEvent indicates the point in the audio stream after which the engine is seeking recognitions” [5].
“When the engine gets the final recognition, it will call Recognition method provided in ISpSREngineSite with an SPRECORESULTINFO structure with the hypothesis flag not set” [5]. The SAPI DDI thus allows the engine to have a single thread that executes between SAPI and the engine [5].
3.2.3. Speech Synthesis
3.2.3.1. API for text-to-speech
Text-to-speech operations can be performed synchronously or asynchronously through the ISpVoice Component Object Model (COM) interface. ISpVoice can convert a text string or a text file into audio, and it can also play audio files. “An ISpVoice object forwards events back to the application when the corresponding audio data has been rendered to the output device” [3].
3.2.3.2. DDI of speech synthesis
There are two text-to-speech engine interfaces at the DDI level: the ISpTTSEngine interface and the ISpTTSEngineSite interface (see Figure 3). The main interface of SAPI speech synthesis is ISpTTSEngine. Its primary method, Speak, is called by SAPI to convert text into audio [6].
• The Speak method has two general functions [6]:
1. It receives a linked list of text fragments.
2. It receives a pointer to the ISpTTSEngineSite interface, which it uses to queue events and write output data.
• SAPI has a free-threaded architecture that enables SAPI to do several things [6]:
- SAPI calls the TTS engine objects on a single thread.
- SAPI ensures that parameter validation and thread synchronization have been performed properly before calling a TTS engine.
Figure 3: Main objects of TTS
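To make the Speak contract described above more concrete, here is a language-neutral mock written in Python. It does not call SAPI, which is a Windows COM API.

- `TextFrag` loosely stands in for one node of the linked list of text fragments.
- `OutputSite` stands in for the engine site interface (ISpTTSEngineSite).
- The `word:` and `end-of-stream` event labels are hypothetical.
- The placeholder `synthesize` function is hypothetical; it emits silent bytes instead of real audio.

The mock only shows the control flow: the engine walks the fragment list, queues an event for each word, and writes the corresponding output data through the site object it was given.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextFrag:
    """Hypothetical stand-in for one node of the linked list of text fragments."""
    text: str
    next: Optional["TextFrag"] = None


@dataclass
class OutputSite:
    """Hypothetical stand-in for the engine site: records queued events and output bytes."""
    events: List[str] = field(default_factory=list)
    audio: bytearray = field(default_factory=bytearray)

    def add_event(self, name: str) -> None:
        self.events.append(name)

    def write(self, data: bytes) -> None:
        self.audio.extend(data)


def synthesize(word: str) -> bytes:
    """Placeholder synthesizer: one silent byte per character instead of real audio."""
    return bytes(len(word))


def speak(frag_list: Optional[TextFrag], site: OutputSite) -> None:
    """Walk the fragment list, queue one event per word, and write its output to the site."""
    frag = frag_list
    while frag is not None:
        for word in frag.text.split():
            site.add_event(f"word:{word}")
            site.write(synthesize(word))
        frag = frag.next
    site.add_event("end-of-stream")


def build_frag_list(parts: List[str]) -> Optional[TextFrag]:
    """Chain the given strings into a singly linked list, preserving their order."""
    head: Optional[TextFrag] = None
    for part in reversed(parts):
        head = TextFrag(part, head)
    return head
```

For example, call `speak(build_frag_list(["Red apple", "blue bird"]), site)` on a fresh `OutputSite`. It records the events word:Red, word:apple, word:blue, word:bird and end-of-stream, and writes 16 placeholder bytes (one per letter).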
3.3. The Microsoft OLE DB Provider for SQL Server
3.3.1. The problem. One data management need is to move data from its original containing system into some type of database management system (DBMS), but this approach is costly and redundant. Another need is to be able to access data within a DBMS as well as data held in any other type of information container. OLE DB is a Microsoft technology created to address this issue [8].
3.3.2. What is OLE DB? OLE DB (Object Linking and Embedding, Database, sometimes written as OLEDB or OLE-DB) “is a set of Component Object Model (COM) interfaces”. These interfaces allow applications to access data stored in various data sources, and they also give applications the ability to implement database services. OLE DB supports much of the DBMS functionality that enables a data source to share its database [8].
3.3.3. OLE DB Providers Overview. The architecture of OLE DB is split into two main components. The first is the consumer, which is the application that uses OLE DB. The second is the provider, which is the software component that implements the OLE DB interfaces and provides the data to the consumer [9].
Providers are also split into two categories: service providers and data providers. A service provider encapsulates a service by producing and consuming data through OLE DB interfaces; it acts as a provider toward consumers and as a consumer toward other providers. A data provider, by contrast, owns its data and does not depend on other providers to supply data to the consumer (SQL Server is an example) [9]. This architecture is shown in the following figure.
Figure 4: Architecture of OLE DB [9]
3.3.4. The Microsoft OLE DB Provider for SQL Server. The Microsoft OLE DB Provider for SQL Server provides an OLE DB interface to Microsoft® SQL Server™ 2000 databases, allowing ActiveX Data Objects (ADO) to access Microsoft SQL Server directly, as shown in the previous figure. With this provider, an application can access data in a remote SQL Server [10].
4. Exploratory survey
A qualitative study was conducted to assess the usefulness of an interactive multimodal application using speech for autistic children, especially those who have Asperger syndrome. The study sample consisted mainly of specialists working in autism centers, as well as parents of children. The questionnaire was published on a specialized website for autism and Asperger patients. The number of respondents in this survey was 30.
• Description:
• The developed questionnaire contains two types of questions:
- Closed questions, which require specific answers regarding issues related to the use of speech within a computational tool dedicated to autistic children.
- Open questions, to get more feedback from the respondents about the project and the methods that could be used.
a. Closed questions:
i. The first question explored whether autistic children accept the use of computers and modern technologies.
Results indicate that the majority of respondents agreed with the statement (66%), another 27% indicated that it sometimes helps, and only 7% of respondents disagreed with the view that technology is perceived as helpful for autistic children.
ii. The second question explored whether it is possible to develop the speech and pronunciation skills of autistic children by using speech-based technologies. Results indicate that the majority of respondents agreed with the statement (66%), another 27% indicated that it sometimes helps, and only 7% of respondents disagreed with the view that speech-based technologies are beneficial for autistic children.
iii. The third question explored whether it is possible to further enhance the speech skills of autistic children through continuous training during the day, with their specialist at the center and their parents at home. Results indicate that the majority of respondents agreed with the statement (87%), the other 13% indicated that it sometimes helps, and no respondent disagreed.
iv. The fourth question explored whether the communication unit in autism centers uses speech-based software. Results indicate that the largest share of respondents agreed with the statement (40%), another 33% indicated that they did not know, and 27% of respondents disagreed with the view that speech-based technologies are used in the communication unit of their autism centers.
v. The fifth question explored whether it is possible to enrich the vocabulary of autistic children by displaying pictures together with the pronunciation of what they refer to at the same time. Results indicate that the majority of respondents agreed with the statement (93%), the other 7% indicated that they did not know, and no respondent disagreed with the view that displaying pictures with pronunciation is perceived as helpful for autistic children.
vi. The sixth question explored whether sounds and songs help in developing the communication skills of autistic children. Results indicate that the majority of respondents agreed with the statement (80%), the other 20% indicated that it sometimes helps, and no respondent disagreed with the view that sounds and songs are perceived as helpful for autistic children.
vii. The seventh question explored whether playing the puzzle game (the most popular game in the autism world) using speech contributes to developing the speech and pronunciation skills of autistic children. Results indicate that the majority of respondents agreed with the statement (67%), another 20% indicated that they did not know, and 13% of respondents disagreed.
b. Open question:
- How can the communication skills of autistic children be developed using computers?
• By designing websites as well as programs for autistic children, such as cartoon movies.
• By offering appropriate educational programs with built-in voice and images, whose progression is graded to suit the child.
• The program should be neither deaf nor speak with a powerful psychotropic effect.
• By viewing pictures with their names, saying the names clearly, making the child repeat the word for the picture, asking the child about them, and correcting the mistakes he/she makes.
• Through the design of educational programs using the names, sounds, and forms of animals, birds, fruits, and other things around the child.
• Through programs animated by voices, graphical images, and simulations, with sounds and graphics combined with each other, and with pictures of things in the child's daily life integrated with familiar voices, such as those of the child's parents at home.
• Inserting the voices of teachers, showing pictures, and recording the voices of the children's parents.
• Using programs for pronunciation and speech recognition.
• Flash programs supported with pictures and the pronunciation of names are an appealing approach.
• Linking images with words (their meaning); there should also be simple songs that are fun for autistic children.
5. Conclusion
This project uses speech technology to create a multimodal application, which is essential for enabling interaction with human-machine interfaces for people with cognitive impairments. With this technology, the system responds to spoken input as well as to traditional input methods. With the voice-enabled feature, we can help not only regular users who use a keyboard or mouse, but also people with some disabilities, such as Asperger Syndrome, to easily connect to the website and download the games. The key contribution of this system is therefore a more intuitive interactive application for our target population, one that reduces some accessibility obstacles.
6. Acknowledgments
We acknowledge the support of the College of Computer Science at King Saud University and the support of the Information Technology Department. We would like to express our sincerest thanks to our advisor, Dr. Souham Meshoul, for her guidance and support in achieving this project. Special thanks to Dr. Areej Al-Wabil for reviewing a draft of this manuscript.
7. References
[1] Rea, S. M. (2005). Building Intelligent .NET Applications. Amsterdam: Addison-Wesley.
[2] MSDN Library. Microsoft Speech Technologies Developer Center. Microsoft Corporation. msdn.microsoft.com. Retrieved (Sep 5, 2011), from: http://msdn.microsoft.com/en-us/speech/default.aspx
[3] MSDN Library. Microsoft Speech API 5.3: Speech API Overview. msdn.microsoft.com. Retrieved (Sep 12, 2011), from: http://msdn.microsoft.com/en-us/library/ms720151(VS.85).aspx
[4] MSDN Library. SAPI Overview. msdn.microsoft.com. Retrieved (Sep 12, 2011), from: http://msdn.microsoft.com/en-us/library/ms862685.aspx
[5] MSDN Library. Speech Recognition API and DDI. msdn.microsoft.com. Retrieved (Sep 10, 2011), from: http://msdn.microsoft.com/en-us/library/ms862708.aspx
[6] MSDN Library. Speech Synthesis API and DDI. msdn.microsoft.com. Retrieved (Sep 5, 2011), from: http://msdn.microsoft.com/en-us/library/ms862709.aspx
[7] MSDN Library. TTS Engine Vendor Porting Guide. msdn.microsoft.com. Retrieved (Sep 7, 2011), from: http://msdn.microsoft.com/en-us/library/ms717037(v=vs.85).aspx
[8] MSDN Library. Microsoft OLE DB. msdn.microsoft.com. Retrieved (Sep 7, 2011), from: http://msdn.microsoft.com/en-us/library/ms722784(v=VS.85).aspx
[9] MSDN Library. OLE DB Providers Overview. msdn.microsoft.com. Retrieved (Sep 6, 2011), from: http://msdn.microsoft.com/en-us/library/ms709836(VS.85).aspx
[10] MSDN Library. OLE DB Provider for SQL Server. msdn.microsoft.com. Retrieved (Sep 6, 2011), from: http://msdn.microsoft.com/en-us/library/aa213282(SQL.80).aspx