Fall 2016, Department of Computer and Information Science, IUPUI
Prediction Of Schizophrenia from Speech Analysis of individuals
Priyanka Ahire Shreya Chakrabarti Yash Agrawal
Abstract – Schizophrenia is a mental disorder involving a breakdown in the relation between thought, emotion, and behavior, leading to faulty perception, inappropriate actions and feelings, withdrawal from reality and personal relationships into fantasy and delusion, and a sense of mental fragmentation. Schizophrenia cannot be cured, but treatment can help, and it can last a lifetime. The objective of this project is to analyze a schizophrenia speech dataset and determine the features from which it is easiest to conclude that a patient is schizophrenic. Various methods are implemented and their results compared; logistic regression proves the best fit for this situation.
Keywords – Logistic Regression, Best fit, Random
Forest, OneR, Gaussian Naïve Bayes, Decision Tree
I. INTRODUCTION
Schizophrenia is a mental disorder. People convey meaning by what they say as well as how they say it: tone, word choice, and the length of a phrase are all crucial cues to understanding what is going on in someone's mind. When a psychiatrist or psychologist examines a person, they listen for these signals to get a sense of their wellbeing, drawing on past experience to guide their judgment [2].
A similar approach is applied here using machine learning, specifically different classification algorithms.
This project presents an overview of the analysis of a schizophrenia dataset using logistic regression. Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary (dichotomous). Like all regression analyses, logistic regression is a predictive analysis: it is used to describe data and the relationship between a dependent variable and one or more interval- or ratio-scale independent variables [3].
Analyzing the schizophrenia dataset is difficult chiefly because the dataset is limited in size. It consists of speech data from schizophrenic and healthy individuals collected over a period of two days. The results of the logistic regression classifier are compared with those of the Random Forest, Decision Tree, and OneR algorithms.
II. LITERATURE REVIEW
Analysis of speech datasets is an important and extremely challenging research area in the field of speech classification. There are several popular theories and models for speech classification, such as the Motor theory [2], the TRACE model [4,5], the Cohort model [6], and the Fuzzy Logical model [4].
Motor Theory - The Motor theory was proposed by Liberman and Cooper [2] in the 1950s and developed further by Liberman et al. [1,2]. In this theory, listeners are said to interpret speech sounds in terms of the motoric gestures they would use to make those same sounds.
TRACE Model - The TRACE model [5] is a connectionist network with an input layer and three processing layers: pseudo-spectra, phoneme, and word. There are three types of connection in the TRACE model. The first type is feedforward excitatory connections from input to features, features to phonemes, and phonemes to words. The second type is lateral inhibitory connections within the feature, phoneme, and word layers. The last type is top-down feedback excitatory connections from words to phonemes.
Cohort Model - The original Cohort model was proposed in 1984 by Marslen-Wilson [6]. The core idea at the heart of the Cohort model is that human speech comprehension is achieved by processing incoming speech continuously as it is heard. At all times, the system computes the best interpretation of the currently available input, combining information in the speech signal with prior semantic and syntactic context.
Fuzzy Logic Model- The fuzzy logical theory of
speech perception was developed by Massaro[4]. He
proposes that people remember speech sounds in a
probabilistic, or graded, way. It suggests that people
remember descriptions of the perceptual units of
language, called prototypes. Within each prototype,
various features may combine. However, features are
not just binary: there is a fuzzy value corresponding to
how likely it is that a sound belongs to a particular
speech category. Thus, when perceiving a speech
signal our decision about what we actually hear is
based on the relative goodness of the match between
the stimulus information and values of particular
prototypes. The final decision is based on multiple
features or sources of information, even visual
information.
Signal Modelling- In 2001, Karnjanadecha[22]
proposed signal modeling for high performance and
robust isolated word recognition. In this model, HMM
was used for classification. The recognition accuracy
rate of this experiment was 97.9% for speaker-independent isolated alphabet recognition. When Gaussian noise (15 dB) was added or simulated telephone speech was tested, the recognition rates were 95.8% and 89.6%, respectively.
Time-extended Features Model - In 2004, Ibrahim [23] presented a technique to overcome the confusion problem by means of time-extended features. He expanded the duration of the consonants to gain a high characteristic difference between confusable pairs in the E-set letters. A continuous-density HMM was used as the classifier. The best recognition rate was only 88.72%; moreover, the author did not test on any noisy speech.
CNN - In 2015, Palaz et al. used a CNN for continuous speech recognition using the raw speech signal [17]. They extended the CNN-based approach to the large-vocabulary speech recognition problem and compared it against the conventional ANN-based approach on the Wall Street Journal corpus. They also showed that the CNN-based method outperforms the conventional ANN-based method, as many parameters and features learned from raw speech by the CNN could generalize across different databases.
Pre-trained Deep Neural Network Model - In 2009, Mohamed et al. tried using pre-trained deep neural networks as part of a hybrid monophone DNN-HMM model on TIMIT, a small-scale speech task [25], and in 2012, Mohamed et al. were the first to succeed with pre-trained DNN-HMMs in acoustic modeling with varying network depths [26,27]. In 2013 and 2014, Bocchieri et al. and Tuske et al. succeeded in using DNNs for large-vocabulary speech recognition tasks [28,29].
Sound Event Classification Model - In 2011, Jonathan et al. developed a model for sound event classification in mismatched conditions [24]. In this model, they developed a nonlinear feature extraction method that first maps the spectrogram into a higher-dimensional space by quantizing the dynamic range into different regions, and then extracts the central moments of the partitioned monochrome intensity distributions as the sound feature.
III. METHODOLOGY
Random Forest: Random forest is an ensemble learning technique for classification and regression that works by building a large number of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees [21].
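As an illustration of this voting idea, here is a minimal sketch with scikit-learn on synthetic toy data; the feature values and labels below are made up and only stand in for the project's speech features.

```python
# Minimal random forest sketch on synthetic binary data (toy stand-in for
# the schizophrenia speech features; the labeling rule is illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                # 4 toy speech-derived features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # 1 = schizophrenic, 0 = healthy (toy rule)

# Many trees are grown on bootstrap samples; the majority vote is the prediction.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = clf.predict(X)
print((pred == y).mean())                    # training accuracy of the ensemble
```

Each tree sees a bootstrap sample and a random feature subset, which is what makes the aggregated vote more robust than a single tree.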
Decision Tree: Decision trees are a non-parametric supervised learning method used for classification and regression. The main aim of a decision tree is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data [20].
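A hedged sketch of this rule learning, again on synthetic data; the feature names `emotion_pct` and `pronoun_pct` are illustrative placeholders, not columns from the actual dataset.

```python
# Fit a depth-limited decision tree on toy data and print the learned
# IF-THEN rules (feature names are hypothetical placeholders).
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = (X[:, 0] > 0).astype(int)    # toy target determined by the first feature

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["emotion_pct", "pronoun_pct"]))
```

`export_text` makes the inferred decision rules directly readable, which is the main appeal of trees over black-box models.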
OneR: OneR, short for "One Rule", is a simple, yet
accurate, classification algorithm that generates one
rule for each predictor in the data, then selects the rule
with the smallest total error as its "one rule". To create
a rule for a predictor, we construct a frequency table
for each predictor against the target. It has been shown
that OneR produces rules only slightly less accurate
than state-of-the-art classification algorithms while
producing rules that are simple for humans to interpret [10].
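The frequency-table procedure described above can be sketched directly; the attribute names and values below are invented for illustration.

```python
# Sketch of OneR: build a frequency table per predictor, take the majority
# class per value as the rule, and keep the predictor with the fewest errors.
from collections import Counter, defaultdict

def one_r(rows, target):
    """Return (attribute, rule, errors) for the best single-attribute rule."""
    best = None
    predictors = [k for k in rows[0] if k != target]
    for attr in predictors:
        # frequency table: counts of each target class per attribute value
        table = defaultdict(Counter)
        for row in rows:
            table[row[attr]][row[target]] += 1
        # the rule maps each attribute value to its most frequent class
        rule = {v: c.most_common(1)[0][0] for v, c in table.items()}
        errors = sum(1 for row in rows if rule[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Toy rows (values are made up; "group" plays the role of the class label).
data = [
    {"mood": "flat",   "speech": "slow", "group": 1},
    {"mood": "flat",   "speech": "fast", "group": 1},
    {"mood": "bright", "speech": "slow", "group": 0},
    {"mood": "bright", "speech": "fast", "group": 0},
    {"mood": "flat",   "speech": "fast", "group": 0},
]
attr, rule, errors = one_r(data, "group")
print(attr, rule, errors)
```

On this toy data the "mood" attribute yields the rule with the smallest total error, so OneR selects it as its one rule.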
Naïve Bayes Classifier: Since Naïve Bayes classifiers can handle multiclass classification problems, one is also used here to classify the data. The Naïve Bayes classifier is based on Bayes' theorem and is a simple and effective probabilistic classification method. It is a supervised classification technique: for each class value, it estimates the probability that a given instance belongs to that class [6]. The feature values within one class are assumed to be independent of the other attribute values, an assumption called class-conditional independence [7]. A Naïve Bayes classifier needs only a small amount of training data to estimate the parameters for classification. The classifier is stated as

P(A|B) = P(B|A) * P(A) / P(B)

where P(A) is the prior or marginal probability of A; P(A|B) is the conditional probability of A given B, called the posterior probability; P(B|A) is the conditional probability of B given A; and P(B) is the prior or marginal probability of B, which acts as a normalizing constant. The probability value of the winning class dominates over that of the others [8].
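A minimal sketch of Gaussian Naïve Bayes on synthetic continuous features (scikit-learn assumed; the data is a toy stand-in, not the project's speech measurements).

```python
# Gaussian Naive Bayes sketch: two synthetic classes of continuous features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
# Toy "percentage" features: class 0 around 0, class 1 around 2.
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)   # 0 = healthy, 1 = schizophrenic (toy labels)

nb = GaussianNB().fit(X, y)
# predict_proba returns the posterior P(class | features) per class,
# i.e. Bayes' rule applied under the class-conditional independence assumption.
print(nb.predict_proba(X[:1]))
```

The posterior of the winning class dominates, exactly as described above, because the per-feature Gaussian likelihoods are multiplied under the independence assumption.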
SVM: The SVM is a very useful technique for classification. It performs classification by constructing hyperplanes in a multidimensional space that separate different class labels, based on statistical learning theory [7][8]. Though the SVM is inherently a binary classifier, it can be extended to multiclass classification, since speech recognition is a multiclass problem. There are two major strategies for multiclass classification: One-against-All [7] and One-against-One (pairwise) classification [9]. The conventional way is to decompose the M-class problem into a series of two-class problems and construct several binary classifiers. In this work, the One-against-One method is used, in which there is one binary SVM for each pair of classes to separate members of one class from members of the other. This method allows the whole system to be trained with a maximum number of different samples for each class within limited computer memory [12].
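As a sketch of the One-against-One strategy: scikit-learn's `SVC` trains one binary SVM per class pair internally, so for M classes there are M(M-1)/2 pairwise classifiers. The three well-separated classes below are synthetic.

```python
# One-against-One multiclass SVM sketch: 3 toy classes -> 3*(3-1)/2 = 3
# pairwise binary SVMs trained internally by SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in (0, 3, 6)])
y = np.repeat([0, 1, 2], 30)

svm = SVC(decision_function_shape="ovo").fit(X, y)
# With "ovo", decision_function returns one score per class pair.
print(svm.decision_function(X[:1]).shape)
```

The pairwise decision scores are then combined by voting to pick the final class.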
Logistic Regression: Logistic regression was first proposed in the 1940s as an alternative technique to overcome limitations of ordinary least squares (OLS) regression in handling dichotomous outcomes [16]. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables. In logistic regression, the dependent variable is binary or dichotomous, i.e. it contains only data coded as 1 (TRUE, success, schizophrenic, etc.) or 0 (FALSE, failure, healthy, etc.). The goal of logistic regression is to find the best-fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (the dependent, response, or outcome variable) and a set of independent (predictor or explanatory) variables. Logistic regression generates the coefficients (and their standard errors and significance levels) of a formula to predict a logit transformation of the probability of presence of the characteristic of interest [30]:

logit(p) = b0 + b1*X1 + b2*X2 + ... + bk*Xk

where p is the probability of presence of the characteristic of interest. The logit transformation is defined as the logged odds:

odds = p / (1 - p)

and

logit(p) = ln( p / (1 - p) )
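A hedged sketch tying fitted coefficients back to the logit transformation, on synthetic binary data (scikit-learn assumed; the data does not come from the project).

```python
# Logistic regression sketch: recover the predicted probability from the
# fitted coefficients via the inverse logit (sigmoid).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -1.0]) + rng.normal(0, 0.5, 200) > 0).astype(int)

lr = LogisticRegression().fit(X, y)
# logit(p) = b0 + b1*x1 + b2*x2, then p = 1 / (1 + exp(-logit(p)))
logit = lr.intercept_ + X[:1] @ lr.coef_.T
p = 1.0 / (1.0 + np.exp(-logit))
print(np.allclose(p, lr.predict_proba(X[:1])[:, 1]))  # True
```

Inverting the logit this way reproduces `predict_proba` exactly, which shows the fitted model is just the linear formula above passed through the sigmoid.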
IV. IMPLEMENTATION
1. Dataset: The dataset was collected by the Department of Psychology. Speech samples were collected from schizophrenic and healthy individuals over a period of two days. All values in the dataset are in percentage form.
The dataset consists of two files:
1.1 Full: This file contains all the data from subjects across the two days. The data was collected from 15 individuals, some of whom are schizophrenic and some healthy.
1.2 Individual: This file contains speech data from subjects at individual times, recorded from the same 15 individuals at different times of day over the two days.
The dataset consists of 88 attributes in total. The group attribute indicates whether the person is schizophrenic or healthy (1 - Schizophrenic, 0 - Healthy). Data was recorded from an individual only if they spoke more than 50 words at a particular time.
2. Logistic regression is applied to the dataset, whose structure is shown in Fig. 1. Logistic regression is the best fit for the dataset, as it already carries a binary classification into healthy and schizophrenic individuals (0 - Healthy, 1 - Schizophrenic).
Fig. 1 Structure of data
The dataset is divided into 4 data frames. The features are:
a. Cognitive Processes
b. Pronoun
c. Emotions
d. Social
Fig.2 Distribution of dataset in different features
Logistic regression is performed on each of the data frames, predicting how likely a person with a particular speech profile (e.g. emotion word usage) is to be schizophrenic.
3. As all the attributes in the dataset are assumed independent of each other, a Naïve Bayes classifier is also implemented and its results evaluated.
4. A training set and a testing set are created from the dataset.
a. Training Set - the data on which the model is built; the model learns its parameters from the rules and patterns present in this set.
b. Testing Set - the data on which the model is applied to check whether it works correctly and yields the expected results.
A model is created from the training set, its results are computed, and the model is then applied to the testing data to verify that it generalizes correctly.
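A minimal sketch of this train/test protocol; the 70/30 split proportion is an assumption, as the actual split used in the project is not stated.

```python
# Train/test protocol sketch: fit on the training split, evaluate on the
# held-out test split (70/30 split is an assumed proportion).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression().fit(X_train, y_train)   # build on the training set
print(model.score(X_test, y_test))                   # check on the held-out set
```

Stratifying keeps the class balance the same in both splits, which matters for a small dataset like the one described here.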
V. RESULTS AND DISCUSSION
1. Results from Logistic Regression:
1.1 Results of Logistic Regression on Emotions
Data frame:
Fig.3 Result on emotion data frame.
1.2. Result of Logistic Regression on Pronouns
Data Frame:
Fig.4 Result on Pronouns Data Frame
1.3. Result of Logistic Regression on Social Data
Frame:
Fig.5 Result on Social Data Frame
1.4. Result of Logistic Regression on Cognitive
Data Frame:
Fig. 6 Result on Cognitive Data Frame
2. Results from Gaussian Naïve Bayes:
Fig. 7 Results from Naïve Bayes approach
3. Results from Random Decision Forests:
Fig. 8 Result after running data on Random Decision Forest.
4. Results from Random Tree:
Fig. 9 Result after running data on Random Tree.
5. Results from OneR algorithm:
Fig. 10 Result after running data on OneR algorithm.
VI. CONTRIBUTION
This is collaborative work between Shreya and Priyanka. Shreya worked on the implementation of the different models and the collection of results, and also sought feedback from the Professor after the final presentation. She also created the presentation. Priyanka collected all the datasets from the Professor and generated the training and testing sets. Priyanka also gathered the information from the presentation and the literature survey and constructed the final report. Yash made no contribution to this project.
VII. CONCLUSION
The best-suited algorithm for the given dataset is a regression model, as the dataset is already divided into a binary format (i.e. 0 - Healthy, 1 - Schizophrenic). Regression-tree-based algorithms (Random Decision Forests) are best used when the dependent variable is continuous, and rule-based algorithms are best suited when there is a set of IF-THEN rules for classification. Emotions is the best feature observed, as it gives the desired accuracy (>= 80%) compared with the other features.
VIII. FUTURE SCOPE
The future scope is to implement a Support Vector Machine and compare it against logistic regression, and to apply regularization to improve the logistic regression model. A larger dataset is expected in the coming days, from which more accurate results are expected.
IX. REFERENCES
1. Liberman, A.M., Cooper, F.S., Shankweiler, D.P.,
Studdert-Kennedy, M.: Perception of the speech code. Psychol. Rev. 74, 431-461 (1967)
2. Liberman, A.M., Mattingly, I.G.: The motor
theory of speech perception revised. Cognition21,
1–36 (1985)
3. Cole, R., Fanty, M.: ISOLET (Isolated Letter
Speech Recognition),Department of Computer
Science and Engineering, September 12(1994)
4. Massaro, D.W.: Testing between the TRACE
Model and the Fuzzy Logical Model of Speech
perception. Cognitive Psychology, pp.398–421
(1989)
5. McClelland, J.L., Elman, J.L.: The TRACE model of speech perception. Cognitive Psychology (1986)
6. Marslen-Wilson, W.: Functional parallelism in spoken word recognition. Cognition 25, 71-102 (1984)
6. Economou K., Lymberopoulos D., 1999. A New
Perspective in Learning Pattern Generation for
Teaching Neural Networks, Volume 12, Issue 4-
5, 767-775.
7. V.N. Vapnik., Statistical Learning Theory, J.
Wiley, N.Y., 1998.
8. N. Cristianini, J. Shawe-Taylor., An introduction
to Support Vector Machines, Cambridge
University Press, Cambridge, U.K., 2000.
9. Kreßel, U. H.-G.: Pairwise Classification and Support Vector Machines. In: Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, pp. 255-268 (1999)
10. http://www.saedsayad.com/oner.html
11. Sunny, S., David Peter, S., Poulose Jacob, K.: Performance of Different Classifiers in Speech Recognition.
12. C.W. Hsu, C.J. Lin, A Comparison of Methods for
Multi-class Support Vector Machines. IEEE
Transactions on Neural Networks, 13(2), pp. 415–
425, 2002.
13. Logistic regression, Newsom, Data analysis 2,
Fall 2015.
14. http://scikitlearn.org/stable/modules/tree.htm
15. https://en.wikipedia.org/wiki/Random_forest
16. Logistic Regression, Chao -Ying Joanne Pen
Indiana University-Bloomington
17. Palaz, D., Magimai, M., Collobert, R.:
Convolutional neural networks-based continuous
speech recognition using raw speech signal. In:
ICASSP (2015)
18. Loizou, P.C., Spanias, A.S.: High-performance
alphabet recognition. IEEE Trans. Speech Audio
Proc.4, 430–445 (1996)
19. Cole, R., Fanty, M., Muthusamy, Y.,
Gopalakrishnan M.: Speaker-independent
recognition of spoken english letters. In:
International Joint Conference on Neural
Networks (IJCNN), pp. 45–51 (1990)
20. Cole, R., Fanty, M.: Spoken letter recognition. In: Proceedings of the conference on advances in neural information processing systems, Denver, Colorado, United States (1990)
21. Fanty, M., Cole, R.: Spoken letter recognition. In: Proceedings of the conference on advances in neural information processing systems, Denver, Colorado, United States (1990)
22. Karnjanadecha, M., Zahorian, S.A.: Signal
modeling for high-performance robust isolated
word recognition. IEEE Trans. Speech Audio
Proc.9, 647–654 (2001)
23. Ibrahim, M.D., Ahmad, A.M., Smaon, D.F.,
Salam M.S.H.: Improved E-set recognition
performance using time-expanded features. In:
Presented at the second national conference on
computer graphics and multimedia
(CoGRAMM), Selangor, Malaysia(2004)
24. Jonathan, D., Da, T.H., Haizhou, L.: Spectrogram
Image feature for sound event classification in
mismatched conditions. In: IEEE Signal
Processing letters, pp. 130–133 (2011 )
25. Mohamed, A.R., Dahl, G.E., Hinton, G.E.: Deep
belief networks for phone recognition. In: NIPS
workshop on deep learning for speech recognition
and related applications (2009)
26. Mohamed, A., Dahl, G., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio, Speech, & Language Proc. (2012)
27. Mohamed, A., Hinton, G., Penn, G.: Understanding how deep belief networks perform acoustic modelling. In: Proc. ICASSP (2012)
28. Bocchieri, E., Dimitriadis, D.: Investigating deep neural network based transforms of robust audio features for LVCSR. In: ICASSP (2013)
29. Tuske, Z., Golik, P., Schluter, R., Ney, H.: Acoustic modeling with deep neural networks using raw time signal for LVCSR. In: Interspeech (2014)
30. https://www.medcalc.org/manual/logistic_regres
sion.php

 
Chapter6.doc
Chapter6.docChapter6.doc
Chapter6.docbutest
 
A fuzzy logic based on sentiment
A fuzzy logic based on sentimentA fuzzy logic based on sentiment
A fuzzy logic based on sentimentIJDKP
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
 
LSTM Based Sentiment Analysis
LSTM Based Sentiment AnalysisLSTM Based Sentiment Analysis
LSTM Based Sentiment Analysisijtsrd
 
76201926
7620192676201926
76201926IJRAT
 
Effective modelling of human expressive states from voice by adaptively tunin...
Effective modelling of human expressive states from voice by adaptively tunin...Effective modelling of human expressive states from voice by adaptively tunin...
Effective modelling of human expressive states from voice by adaptively tunin...IAESIJAI
 

Semelhante a Intelligent Systems - Predictive Analytics Project (20)

Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali TextChunker Based Sentiment Analysis and Tense Classification for Nepali Text
Chunker Based Sentiment Analysis and Tense Classification for Nepali Text
 
SPEECH EMOTION RECOGNITION SYSTEM USING RNN
SPEECH EMOTION RECOGNITION SYSTEM USING RNNSPEECH EMOTION RECOGNITION SYSTEM USING RNN
SPEECH EMOTION RECOGNITION SYSTEM USING RNN
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approaches
 
Lexicon base approch
Lexicon base approchLexicon base approch
Lexicon base approch
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
Emotion Recognition based on Speech and EEG Using Machine Learning Techniques...
Emotion Recognition based on Speech and EEG Using Machine Learning Techniques...Emotion Recognition based on Speech and EEG Using Machine Learning Techniques...
Emotion Recognition based on Speech and EEG Using Machine Learning Techniques...
 
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSISFEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS
FEATURE SELECTION AND CLASSIFICATION APPROACH FOR SENTIMENT ANALYSIS
 
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...
APPROXIMATE ANALYTICAL SOLUTION OF NON-LINEAR BOUSSINESQ EQUATION FOR THE UNS...
 
A Review on Pattern Recognition with Offline Signature Classification and Tec...
A Review on Pattern Recognition with Offline Signature Classification and Tec...A Review on Pattern Recognition with Offline Signature Classification and Tec...
A Review on Pattern Recognition with Offline Signature Classification and Tec...
 
soft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxsoft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptx
 
A SURVEY OF SENTIMENT CLASSSIFICTION TECHNIQUES
A SURVEY OF SENTIMENT CLASSSIFICTION TECHNIQUESA SURVEY OF SENTIMENT CLASSSIFICTION TECHNIQUES
A SURVEY OF SENTIMENT CLASSSIFICTION TECHNIQUES
 
The sarcasm detection with the method of logistic regression
The sarcasm detection with the method of logistic regressionThe sarcasm detection with the method of logistic regression
The sarcasm detection with the method of logistic regression
 
Chapter6.doc
Chapter6.docChapter6.doc
Chapter6.doc
 
A fuzzy logic based on sentiment
A fuzzy logic based on sentimentA fuzzy logic based on sentiment
A fuzzy logic based on sentiment
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzer
 
LSTM Based Sentiment Analysis
LSTM Based Sentiment AnalysisLSTM Based Sentiment Analysis
LSTM Based Sentiment Analysis
 
76201926
7620192676201926
76201926
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Effective modelling of human expressive states from voice by adaptively tunin...
Effective modelling of human expressive states from voice by adaptively tunin...Effective modelling of human expressive states from voice by adaptively tunin...
Effective modelling of human expressive states from voice by adaptively tunin...
 

Mais de Shreya Chakrabarti

Certificate in google analytics beginners
Certificate in google analytics   beginnersCertificate in google analytics   beginners
Certificate in google analytics beginnersShreya Chakrabarti
 
Citizen Data Scientist : Marketing perspective Certification
Citizen Data Scientist : Marketing perspective CertificationCitizen Data Scientist : Marketing perspective Certification
Citizen Data Scientist : Marketing perspective CertificationShreya Chakrabarti
 
Microsoft Virtual Academy Certificate of Completion Python
Microsoft Virtual Academy Certificate of Completion PythonMicrosoft Virtual Academy Certificate of Completion Python
Microsoft Virtual Academy Certificate of Completion PythonShreya Chakrabarti
 
Programming Languages - Functional Programming Paper
Programming Languages - Functional Programming PaperProgramming Languages - Functional Programming Paper
Programming Languages - Functional Programming PaperShreya Chakrabarti
 
Survey Paper on Google Project Loon- Ballon for Everyone
Survey Paper on Google Project Loon- Ballon for EveryoneSurvey Paper on Google Project Loon- Ballon for Everyone
Survey Paper on Google Project Loon- Ballon for EveryoneShreya Chakrabarti
 
Summer Independent Study Report
Summer Independent Study ReportSummer Independent Study Report
Summer Independent Study ReportShreya Chakrabarti
 

Mais de Shreya Chakrabarti (11)

Certificate in google analytics beginners
Certificate in google analytics   beginnersCertificate in google analytics   beginners
Certificate in google analytics beginners
 
Citizen Data Scientist : Marketing perspective Certification
Citizen Data Scientist : Marketing perspective CertificationCitizen Data Scientist : Marketing perspective Certification
Citizen Data Scientist : Marketing perspective Certification
 
Machine Learning with MATLAB
Machine Learning with MATLABMachine Learning with MATLAB
Machine Learning with MATLAB
 
Microsoft Virtual Academy Certificate of Completion Python
Microsoft Virtual Academy Certificate of Completion PythonMicrosoft Virtual Academy Certificate of Completion Python
Microsoft Virtual Academy Certificate of Completion Python
 
Programming Languages - Functional Programming Paper
Programming Languages - Functional Programming PaperProgramming Languages - Functional Programming Paper
Programming Languages - Functional Programming Paper
 
PROJECT PPT
PROJECT PPTPROJECT PPT
PROJECT PPT
 
BE Project
BE ProjectBE Project
BE Project
 
Project Loon - Final PPT
Project Loon - Final PPTProject Loon - Final PPT
Project Loon - Final PPT
 
Survey Paper on Google Project Loon- Ballon for Everyone
Survey Paper on Google Project Loon- Ballon for EveryoneSurvey Paper on Google Project Loon- Ballon for Everyone
Survey Paper on Google Project Loon- Ballon for Everyone
 
BICFinal06
BICFinal06BICFinal06
BICFinal06
 
Summer Independent Study Report
Summer Independent Study ReportSummer Independent Study Report
Summer Independent Study Report
 

Intelligent Systems - Predictive Analytics Project

Implementing the analysis of the schizophrenic dataset is complex because of the limited data. The dataset consists of speech data, collected over a period of two days, from individuals who are schizophrenic and individuals who are healthy. The main challenge in the analysis was that the dataset provided was not large enough. The results of the Logistic Regression classifier are compared with the results of the Random Forest, Decision Tree, and OneR algorithms.

II. LITERATURE REVIEW

Analysis of speech datasets is an important research area in the field of speech classification, and the research remains extremely challenging. There are several popular theories and models of speech classification, such as the Motor theory [2], the TRACE model [4,5], the Cohort model [6], and the Fuzzy logical model [4].

Motor Theory- The Motor theory was proposed by Liberman and Cooper [2] in the 1950s and developed further by Liberman et al. [1,2]. In this theory, listeners are said to interpret speech sounds in terms of the motoric gestures they would use to make those same sounds.

TRACE Model- The TRACE model [5] is a connectionist network with an input layer and three processing layers: pseudo-spectra, phoneme, and word. There are three types of connection in the TRACE model. The first type is feedforward excitatory connections from input to features, from features to phonemes, and from phonemes to words. The second type is lateral inhibitory connections within the feature, phoneme, and word layers. The last type is top-down feedback excitatory connections from words to phonemes.

Cohort Model- The original Cohort model was proposed in 1984 by Marslen-Wilson [6]. The core idea at the heart of the Cohort model is that human speech comprehension is achieved by processing incoming speech continuously as it is heard. At all times, the system computes the best interpretation of the currently available input, combining information in the speech signal with prior semantic and syntactic context.
Fuzzy Logical Model- The fuzzy logical theory of speech perception was developed by Massaro [4]. He proposes that people remember speech sounds in a probabilistic, or graded, way. The theory suggests that people remember descriptions of the perceptual units of language, called prototypes, within which various features may combine. However, features are not just binary: there is a fuzzy value corresponding to how likely it is that a sound belongs to a particular speech category. Thus, when perceiving a speech signal, our decision about what we actually hear is based on the relative goodness of the match between the stimulus information and the values of particular prototypes. The final decision is based on multiple features or sources of information, even visual information.

Signal Modelling- In 2001, Karnjanadecha [22] proposed signal modeling for high-performance, robust isolated word recognition. In this model, an HMM was used for classification. The recognition accuracy of this experiment was 97.9% for speaker-independent isolated alphabet recognition. When Gaussian noise (15 dB) was added, or under telephone-speech simulation, the recognition rates were 95.8% and 89.6%, respectively.

Time-Extended Features Model- In 2004, Ibrahim [23] presented a technique to overcome the confusion problem by means of time-extended features. He expanded the duration of the consonants to gain a high characteristic difference between confusable pairs in the E-set letters. A continuous-density HMM was used as the classifier. The best recognition rate was only 88.72%, and the author did not test on any noisy speech.

CNN- In 2015, Palaz et al. used a CNN for continuous speech recognition from the raw speech signal [17]. They extended the CNN-based approach to the large-vocabulary speech recognition problem and compared it against the conventional ANN-based approach on the Wall Street Journal corpus. They showed that the CNN-based method achieves better performance than the conventional ANN-based method, as many parameters and features learned from raw speech by the CNN could generalize across different databases.

Pre-trained Deep Neural Networks Model- In 2009, Mohamed et al. tried using pre-trained deep neural networks as part of a hybrid monophone DNN-HMM model on TIMIT, a small-scale speech task [25], and in 2012, Mohamed et al. were the first to succeed with pre-trained DNN-HMMs for acoustic modeling with varying network depths [26,27]. In 2013, Bocchieri and Tuske succeeded in using DNNs for large-vocabulary speech recognition tasks [28,29].

Sound Event Classification Model- In 2011, Jonathan et al. developed a model for sound event classification in mismatched conditions [24]. They developed a nonlinear feature extraction method which first maps the spectrogram into a higher-dimensional space by quantizing the dynamic range into different regions, and then extracts the central moments of the partitioned monochrome intensity distributions as the sound features.

III. METHODOLOGY

Random Forest: Random forest is an ensemble learning technique for classification and regression that works by building a large number of decision trees at training time and outputting the class that is the mode (for classification) or the mean prediction (for regression) of the individual trees [21].

Decision Tree: Decision trees are a non-parametric supervised learning method used for classification and regression. The aim of a decision tree is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data [20].

OneR: OneR, short for "One Rule", is a simple yet accurate classification algorithm that generates one rule for each predictor in the data and then selects the rule with the smallest total error as its "one rule". To create a rule for a predictor, we construct a frequency table for that predictor against the target.
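The frequency-table procedure can be sketched as follows; this is a minimal illustration on toy data, and the attribute names ("pronouns", "emotions") are hypothetical stand-ins, not taken from the project's dataset:

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """OneR: for each predictor, build a frequency table against the target,
    predict the majority class for each value, and keep the predictor
    whose rule makes the fewest errors."""
    best_attr, best_rule, best_errors = None, None, len(rows) + 1
    attrs = [a for a in rows[0] if a != target]
    for attr in attrs:
        # frequency table: predictor value -> counts of target classes
        table = defaultdict(Counter)
        for row in rows:
            table[row[attr]][row[target]] += 1
        # rule: predict the majority class for each predictor value
        rule = {v: counts.most_common(1)[0][0] for v, counts in table.items()}
        # total error: rows whose class differs from the rule's prediction
        errors = sum(c for v, counts in table.items()
                     for cls, c in counts.items() if cls != rule[v])
        if errors < best_errors:
            best_attr, best_rule, best_errors = attr, rule, errors
    return best_attr, best_rule

# toy example with a binary target "group" (1 = positive class)
data = [
    {"pronouns": "high", "emotions": "low",  "group": 1},
    {"pronouns": "high", "emotions": "low",  "group": 1},
    {"pronouns": "low",  "emotions": "low",  "group": 0},
    {"pronouns": "low",  "emotions": "high", "group": 0},
    {"pronouns": "high", "emotions": "high", "group": 0},
]
attr, rule = one_r(data, "group")
```

On this toy data the selected rule predicts group 1 for "high" pronoun use and group 0 otherwise, with a single misclassified row.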
It has been shown that OneR produces rules only slightly less accurate than state-of-the-art classification algorithms, while producing rules that are simple for humans to interpret [10].

Naïve Bayes Classifier: Since speech recognition is a multiclass classification problem and Naive Bayes classifiers can handle multiclass problems, Naive Bayes is also used here. The Naive Bayes classifier is based on Bayes' theorem and is a simple and effective probabilistic classification method. It is a supervised classification technique: for each class value, it estimates the probability that a given instance belongs to that class [6]. The feature values within one class are assumed to be independent of the other attribute values, which is called class-conditional independence [7]. The Naive Bayes classifier needs only a small amount of training data to estimate the parameters for classification. The classifier is stated as

P(A|B) = P(B|A) * P(A) / P(B)

where P(A) is the prior or marginal probability of A, P(A|B) is the conditional probability of A given B, called the posterior probability, P(B|A) is the conditional probability of B given A, and P(B) is the prior or marginal probability of B, which acts as a normalizing constant. The probability value of the winning class dominates over those of the others [8].

SVM: SVM is a very useful classification technique. It performs classification by constructing hyperplanes in a multidimensional space that separate different class labels, based on statistical learning theory [7][8]. Though SVM is inherently a binary classifier, it can be extended to multiclass classification, and ASR is a multiclass problem. There are two major strategies for multiclass classification: One-against-All [7] and One-against-One, or pairwise classification [9]. The conventional way is to decompose the M-class problem into a series of two-class problems and construct several binary classifiers. In this work, we have used the One-against-One method, in which there is one binary SVM for each pair of classes to separate members of one class from members of the other. This method allows us to train the whole system with a maximum number of different samples for each class, within limited computer memory [12].

Logistic Regression: Logistic regression was first proposed in the 1940s as an alternative technique to overcome the limitations of ordinary least squares (OLS) regression in handling dichotomous outcomes [16]. Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables. In logistic regression, the dependent variable is binary or dichotomous, i.e. it only contains data coded as 1 (TRUE, success, Schizophrenic, etc.) or 0 (FALSE, failure, Healthy, etc.). The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (the dependent, response, or outcome variable) and a set of independent (predictor or explanatory) variables. Logistic regression generates the coefficients (with standard errors and significance levels) of a formula that predicts a logit transformation of the probability of presence of the characteristic of interest [30]:

logit(p) = b0 + b1*X1 + b2*X2 + ... + bk*Xk

where p is the probability of presence of the characteristic of interest. The logit transformation is defined as the logged odds:

odds = p / (1 - p)

and

logit(p) = ln(p / (1 - p))

IV. IMPLEMENTATION

1. Dataset: The dataset was collected by the Department of Psychology, who gathered speech samples from schizophrenic and healthy individuals over a period of two days. All values in the dataset are percentages. The dataset consists of two files:

1.1 Full: This file contains all the data from subjects across the two days. The data was collected from 15 individuals, some schizophrenic and some healthy.

1.2 Individual: This file contains speech data from subjects at individual times, collected across the same 15 individuals, recorded at different times of the day over the two days.

The dataset consists of 88 attributes in total. The group attribute indicates whether the person is Schizophrenic or Healthy (1 - Schizophrenic, 0 - Healthy). Data was recorded from an individual only if they spoke more than 50 words at a particular time.

2. Logistic Regression is applied to the dataset, where the data is in the form given in Fig. 1. Logistic Regression is the best fit for the dataset because the data already has a binary classification into healthy and schizophrenic individuals (0 - Healthy, 1 - Schizophrenic).
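The modelling steps (train/test split, logistic regression, and the comparison classifiers) can be sketched as follows; this is a minimal illustration using scikit-learn with synthetic stand-in data, since the clinical dataset is not public, and the sample and feature counts are only indicative:

```python
# Sketch of the comparison pipeline, assuming scikit-learn and a synthetic
# stand-in for the speech dataset: a binary target (0 = Healthy,
# 1 = Schizophrenic) predicted from percentage-valued speech features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in: repeated recordings across subjects, 88 features
X, y = make_classification(n_samples=120, n_features=88, n_informative=10,
                           random_state=0)

# hold out a test set to check the fitted models
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
# fit each model on the training set and score accuracy on the test set
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

In the project itself, the same fit-and-score loop would be run separately on each of the four feature data frames (cognitive processes, pronouns, emotions, social) rather than on a single feature matrix.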
Fig. 1 Structure of data

The dataset is divided into four data frames. The features are:

a. Cognitive Processes
b. Pronouns
c. Emotions
d. Social

Fig. 2 Distribution of the dataset into different features

Logistic regression is performed on each of the data frames, predicting how likely a person with a particular feature profile is to develop schizophrenia.

3. As all the attributes in the dataset are independent of each other, a Naïve Bayes classifier is also implemented and its results tested.

4. A training set and a testing set are created from the dataset.

a. Training Set- The training set is the data on which the model is built; the model learns its rules and parameters from this data.

b. Testing Set- The test set is the data on which the model is applied to check whether it works correctly and yields the expected results.

A model is created from the training set, its results are computed, and the model is then applied to the testing data to check whether it is working correctly.

V. RESULTS AND DISCUSSION

1. Results from Logistic Regression:

1.1 Results of Logistic Regression on the Emotions data frame:

Fig. 3 Result on the Emotions data frame

1.2 Results of Logistic Regression on the Pronouns data frame:
Fig. 4 Result on the Pronouns data frame

1.3 Results of Logistic Regression on the Social data frame:

Fig. 5 Result on the Social data frame

1.4 Results of Logistic Regression on the Cognitive data frame:

Fig. 6 Result on the Cognitive data frame

2. Results from Gaussian Naïve Bayes:

Fig. 7 Results from the Naïve Bayes approach

3. Results from Random Decision Forests:

Fig. 8 Result after running the data through a Random Decision Forest

4. Results from Random Tree:

Fig. 9 Result after running the data through a Random Tree

5. Results from the OneR algorithm:

Fig. 10 Result after running the data through the OneR algorithm

VI. CONTRIBUTION

This is collaborative work between Shreya and Priyanka. Shreya worked on the implementation of the different models and the collection of results, sought feedback from the Professor after the final presentation, and created the presentation. Priyanka collected the datasets from the Professor, generated the training and testing sets, and compiled the information from the presentation and the literature survey into the final report. Yash did not contribute to this project.

VII. CONCLUSION

The best suited algorithm for the given dataset is a regression model, as the dataset provided is already divided into binary form (0 - Healthy, 1 - Schizophrenic). Regression-tree-based algorithms (Random Decision Trees) are best used when the dependent variable is continuous, and rule-based algorithms are best suited when there is a set of IF-THEN rules for classification. Emotions is the best feature observed, as it gives the desired accuracy (>= 80%) among the features.

VIII. FUTURE SCOPE

Future work is to implement a Support Vector Machine and compare it against Logistic Regression, and to apply regularization to improve the logistic regression model. A larger dataset is expected in the coming days, from which more reliable results are expected.

IX. REFERENCES

1. Liberman, A.M., Cooper, F.S., Shankweiler, D.P., Studdert-Kennedy, M.: Perception of the speech code. Psychol. Rev. 74, 431–461 (1967)
2. Liberman, A.M., Mattingly, I.G.: The motor theory of speech perception revised. Cognition 21, 1–36 (1985)
3. Cole, R., Fanty, M.: ISOLET (Isolated Letter Speech Recognition). Department of Computer Science and Engineering, September 12 (1994)
4. Massaro, D.W.: Testing between the TRACE model and the fuzzy logical model of speech perception. Cognitive Psychology, pp. 398–421 (1989)
5. McClelland, J.L., Elman, J.L.: The TRACE model of speech perception. Cognitive Psychology (1986)
6. Marslen-Wilson, W.: Functional parallelism in spoken word recognition. Cognition 25, 71–102 (1984)
6. Economou, K., Lymberopoulos, D.: A new perspective in learning pattern generation for teaching neural networks. Vol. 12, Issue 4-5, 767–775 (1999)
7. Vapnik, V.N.: Statistical Learning Theory. J. Wiley, N.Y. (1998)
8. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, U.K. (2000)
9. Kreßel, U.H.-G.: Pairwise classification and support vector machines. In: Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, pp. 255–268 (1999)
10. http://www.saedsayad.com/oner.html
11. Sunny, S., Peter S., D., Jacob, K.P.: Performance of different classifiers in speech recognition
12. Hsu, C.W., Lin, C.J.: A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks 13(2), 415–425 (2002)
13. Newsom: Logistic regression. Data Analysis 2, Fall 2015
14. http://scikitlearn.org/stable/modules/tree.htm
15. https://en.wikipedia.org/wiki/Random_forest
16. Peng, C.-Y.J.: Logistic regression. Indiana University-Bloomington
17. Palaz, D., Magimai, M., Collobert, R.: Convolutional neural networks-based continuous speech recognition using raw speech signal. In: ICASSP (2015)
18. Loizou, P.C., Spanias, A.S.: High-performance alphabet recognition. IEEE Trans. Speech Audio Proc. 4, 430–445 (1996)
19. Cole, R., Fanty, M., Muthusamy, Y., Gopalakrishnan, M.: Speaker-independent recognition of spoken English letters. In: International Joint Conference on Neural Networks (IJCNN), pp. 45–51 (1990)
20. Cole, R., Fanty, M.: Spoken letter recognition. In: Proceedings of the Conference on Advances in Neural Information Processing Systems, Denver, Colorado, United States (1990)
21. Fanty, M., Cole, R.: Spoken letter recognition. In: Proceedings of the Conference on Advances in Neural Information Processing Systems, Denver, Colorado, United States (1990)
22. Karnjanadecha, M., Zahorian, S.A.: Signal modeling for high-performance robust isolated word recognition. IEEE Trans. Speech Audio Proc. 9, 647–654 (2001)
23. Ibrahim, M.D., Ahmad, A.M., Smaon, D.F., Salam, M.S.H.: Improved E-set recognition performance using time-expanded features. In: Second National Conference on Computer Graphics and Multimedia (CoGRAMM), Selangor, Malaysia (2004)
24. Jonathan, D., Da, T.H., Haizhou, L.: Spectrogram image feature for sound event classification in mismatched conditions. IEEE Signal Processing Letters, pp. 130–133 (2011)
25. Mohamed, A.R., Dahl, G.E., Hinton, G.E.: Deep belief networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009)
26. Mohamed, A., Dahl, G., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio, Speech, & Language Proc. (2012)
27. Mohamed, A., Hinton, G., Penn, G.: Understanding how deep belief networks perform acoustic modelling. In: Proc. ICASSP (2012)
28. Bocchieri, E., Dimitriadis, D.: Investigating deep neural network based transforms of robust audio features for LVCSR. In: ICASSP (2013)
29. Tuske, Z., Golik, P., Schluter, R., Ney, H.: Acoustic modeling with deep neural networks using raw time signal for LVCSR. In: Interspeech (2014)
30. https://www.medcalc.org/manual/logistic_regression.php