Table of Contents

1. Introduction: Goals of the Assignment and Tools Used
2. Choice of the dialogue and text-to-speech alignment with SPPAS
3. Editing the dialogue tiers in Praat and writing a processing script
4. POS Tagging
5. Semantic Analysis with JWNL
6. Results and main statistics
7. Conclusions
8. Appendix: Lines of Code
1. Introduction: Goals of the Assignment and Tools Used
The objective of this work is to provide a complete analysis of a piece of conversation,
carrying out the following tasks:
• phonological features of the dialogue and a brief statistical analysis;
• a subdivision into dialogue acts using the DAMSL model;
• the POS tagging of the dialogue;
• a brief semantic analysis;
• a graphical representation of the results.
Given these goals, the first step was choosing the right dialogue for the purpose of
analysis. The audio file of the dialogue, together with its written transcription, was given
as input to SPPAS (Automatic Phonetic Annotation of Speech), a tool for aligning audio
with text that also provides tokenization and phonetization features.
The SPPAS analysis produced the text aligned with the audio file, which was then used as
input to Praat, a tool for capturing audio features of speech such as pitch, intensity and
formants. The alignment was manually edited in Praat to provide the best match between
transcription and audio, and then a Praat script was created to append some audio
features and further annotations to the words in the .txt file.
The POS tagging part of the project was carried out with the Stanford University POS
Tagger. After this phase the .txt file looked like a table with audio, dialogue and syntactic
features associated with each word of the conversation.
The last part of the project involved the semantic analysis of the dialogue, leveraging the
JWNL Java library to query the WordNet lexical database.
The graphical results were produced by importing the final .txt file into Microsoft Excel.
2. Choice of the dialogue and text-to-speech alignment with SPPAS
Choosing a suitable dialogue for the analysis was probably the hardest step in the
assignment, due to the constraints imposed by SPPAS's limited processing capabilities.
My first idea was to use an artistically relevant dialogue, so I started with an excerpt from
the film Eyes Wide Shut by Stanley Kubrick and tried to get the best possible alignment
results.
SPPAS (version 1.4.8) does not perform well with:
• audio files longer than 2 minutes;
• excerpts from films, which usually have significant background noise;
• realistic, natural dialogues, due to overlapping voices, non-word phonemes and other
imperfections.
The Bill and Victor dialogue had all three of these characteristics, so it was almost
impossible to obtain an alignment good enough even as a starting point for subsequent
editing in Praat. I tried to remove some noise and isolate the speech parts of the audio
file using a simple MATLAB script (see the appendix for the code), but it didn't work.
The second attempt was a dialogue from the Italian film Il Divo by Paolo Sorrentino, in
which the speech seemed clearer and more fluid than in the previous one. SPPAS also
supports processing of Italian-language dialogues. Unfortunately this audio file showed
the same drawbacks as the previous one, even though I also tried splitting the audio file
into shorter fragments for processing, as can be seen in the folder.
The last attempt was a simple English educational dialogue between two girls, which
worked very well with SPPAS. Despite its simplicity and linear dialogue interaction, it had
a good level of emotional speech and was expressive enough for the purpose of the
assignment.
To enable a correct alignment with SPPAS I also inserted hash marks in the .txt file to
signal the pauses in the dialogue. This is another limitation of SPPAS: without the
silences marked in the .txt file it cannot provide a precise alignment. The resulting files
are in the project folder “SPPAS Processing”.
3. Editing the dialogue tiers in Praat and writing a processing script
Since the alignment produced by SPPAS was not precise, further editing in Praat was
needed, moving boundaries and tokens into the right positions where necessary. The
results of this editing were saved in the TextGrid file “dialogue-flat-phon_palign”, in the
folder “Editing in Praat”.
Two more tiers were added to the TextGrid file, indicating the class of each dialogue act
(using the dialogue act classification proposed in the DAMSL model) and the speaker.
The final TextGrid file featured the following tiers:
• PhonAlign Tier;
• PhnTokAlign Tier;
• TokensAlign Tier;
• DialogueAct Tier;
• Speaker.
In the next phase I moved from the Praat editor view to the Praat scripting language, to
extract the required audio features associated with each word token in the dialogue.
The Praat script “features.praat” takes the Wave file and the TextGrid file as input and
produces a .txt file that shows:
• Word token;
• Mean pitch of the token;
• Mean intensity of the token;
• DialogueAct;
• Speaker.
The results were saved in the .txt file “conversation-audio” in the folder “Editing in Praat”.
4. POS Tagging
To obtain the part-of-speech tag of each word in the dialogue, the Stanford POS Tagger
(version 3.2.0) was used. The result of the tagging operation was stored in the file
“conversation-tagged.txt”. A pretrained model was used to assign part-of-speech tags to
the unlabeled text; the adopted model was “wsj-0-18-left3words-distsim”, included in the
Stanford POS Tagger package.
After the POS tagging I noticed some tagger mistakes, e.g. some noun terms were
recognized as verbs and vice versa, but the majority of words got the right tag.
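The tagger writes each token as a `word_TAG` pair, which the later JWNL stage splits on the underscore. The sketch below illustrates that format with an invented sample sentence (class and method names are mine, not part of the project code):

```java
public class TaggedLineDemo {
    // Split a "word_TAG word_TAG ..." line into word/tag pairs,
    // mirroring the split("_") step used later in the JWNL project.
    public static String[][] parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        String[][] pairs = new String[tokens.length][2];
        for (int i = 0; i < tokens.length; i++) {
            String[] parts = tokens[i].split("_");
            pairs[i][0] = parts[0]; // the word itself
            pairs[i][1] = parts[1]; // its Penn Treebank tag
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[][] p = parse("How_WRB are_VBP you_PRP");
        System.out.println(p[1][0] + " -> " + p[1][1]); // are -> VBP
    }
}
```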
5. Semantic Analysis with JWNL
JWNL is a Java API (Application Programming Interface) to access and query the
WordNet database. In this context JWNL was used to find the domains of each word
token. I used version 2.0 of WordNet, version 1.4 of JWNL and Eclipse as IDE, with the
Java 1.7 SDK and JRE 7 (Java Runtime Environment).
To find the domains of each token I leveraged the CATEGORY pointer type, and when no
related domains were found I wrote a function that recursively searches for the root
hypernym. The Java project reads the .txt file “conversation-tagged” in the folder
“POS tagging” as input, and writes the .txt file “dialogue-audio-pos-domains” as output.
One issue with this operation was that the CATEGORY pointer failed for many tokens,
and the recursive search for hypernyms returned base classes like “entity” or
“abstraction”, which are too general for the purpose of a semantic domain search.
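The hypernym fallback can be pictured with a toy example: follow HYPERNYM-style links upward until no parent is left, which in WordNet inevitably ends at a very general root such as “entity”. The map below is an invented stand-in for WordNet's hypernym pointers, not real WordNet data:

```java
import java.util.Map;

public class HypernymClimb {
    // Toy hypernym links standing in for WordNet's HYPERNYM pointers.
    static final Map<String, String> HYPER = Map.of(
        "dog", "canine",
        "canine", "carnivore",
        "carnivore", "animal",
        "animal", "organism",
        "organism", "entity");

    // Follow hypernym links until no parent exists, as the fallback does.
    static String rootHypernym(String word) {
        while (HYPER.containsKey(word)) word = HYPER.get(word);
        return word;
    }

    public static void main(String[] args) {
        System.out.println(rootHypernym("dog")); // entity
    }
}
```

This shows why the fallback is too coarse for domain labeling: every noun collapses to the same handful of root classes.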
The final results of all processing are stored in the Excel file “Dialogue Data” and in the
flat .txt file “dialogue-audio-pos-domains-def”.
6. Results and main statistics
The data from the dialogue analysis were all imported into the Excel file “Dialogue Data”,
which includes four different sheets:
– General Data: table with all fields and values;
– Speaker Pitch-Intensity: pitch & intensity data and charts;
– Dialogue Acts: analysis of dialogue acts;
– Domains: analysis of domains.
In the analysis, non-word utterances were not taken into account, since there is only one
non-word token in the conversation.
[Figure: Pitch Trend by Speaker — mean pitch (Hz) per token number, with separate
series for Amanda and Karen.]

[Figure: Intensity Trend by Speaker — mean intensity (dB) per token number, with
separate series for Amanda and Karen.]
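Per-speaker averages like those plotted above can be recomputed from the flat data table by grouping values by speaker. The sketch below illustrates the grouping step; the class name and sample rows are invented for illustration, not the actual dialogue data:

```java
import java.util.*;

public class SpeakerMeans {
    // Compute the mean of a value column grouped by speaker; each row is a
    // {speaker, value} pair such as those in the conversation-audio table.
    static Map<String, Double> meanBySpeaker(String[][] rows) {
        Map<String, double[]> acc = new HashMap<>(); // speaker -> {sum, count}
        for (String[] r : rows) {
            double[] a = acc.computeIfAbsent(r[0], k -> new double[2]);
            a[0] += Double.parseDouble(r[1]);
            a[1] += 1;
        }
        Map<String, Double> means = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, double[]> e : acc.entrySet())
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        return means;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Amanda", "220.0"}, {"Amanda", "240.0"},
            {"Karen", "200.0"}, {"Karen", "210.0"}
        };
        System.out.println(meanBySpeaker(rows)); // {Amanda=230.0, Karen=205.0}
    }
}
```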
7. Conclusions
Due to the difficulties in SPPAS processing, the chosen dialogue is a very simple type of
conversation, so the DAMSL analysis and the domain analysis did not yield significant
results. The topic of the conversation is general, so there is no particular trend in the
semantic domains of the word tokens. The conversation is evenly distributed, with the
two speakers having almost the same number of tokens. The conversation shows slight
variations in pitch, and the fundamental frequency of Amanda's voice is quite different
from Karen's, reflecting the different timbre of the two speakers, though always remaining
within the range of common female pitch values. Among the average pitch results there is
a significant outlier associated with Amanda's expression “on friday”: the values of 97 and
107 Hz sound a little unrealistic for a female voice. The average intensity of the tokens
shows that the volume of the dialogue remains constant during the conversation: there is
no soft speaking and the two speakers talk at the same volume (only 2 dB of difference).
The Praat analysis, together with the POS tagging, is probably the most reliable, whereas
the analysis carried out with JWNL shows evident limits in recognizing the correct
domains of speech. Most of the domains found are clearly wrong for this kind of dialogue,
and the reason is that knowledge of the context in which a word token occurs would be
necessary to reach the right semantic domain.
The conversation between Amanda and Karen is a Q&A conversation, so it is no surprise
that a high percentage of the dialogue acts falls into the Answer and Info-Request types.
More pleasant expressions seem to have higher pitch and intensity, whereas
action-directives, open-options and offers show a lower pitch and sometimes lower
intensity, suggesting that when a speaker makes a proposal they probably want to convey
modesty and avoid giving the feeling of an imposition.
8. Appendix: Lines of Code
MATLAB CODE
function [y_n] = remove_noise(y, win_len, mean_val, atten)
% This function performs a background noise attenuation, provided that the
% loudness difference between noise and original signal is high enough.
%   y        = signal with noise
%   win_len  = frame length used to estimate the noise impact
%   mean_val = threshold which discriminates between noise and signal
%   atten    = attenuation value to cut noise
for n = 1:(length(y)-win_len)
    if (sum(abs( y(n:(n+win_len-1)) )) < mean_val*win_len && ...
        max(abs(y(n:n+win_len-1))) < mean_val)
        for m = n:n+win_len-1
            y(m) = y(m)*atten;
        end
    end
end
y_n = y;
end
PRAAT CODE
##### Script to extract features for each token #####
##print columns of the table##
echo Token          MeanPitch  Intens.   DialogueAct          Speaker
select all
#sound file & TextGrid file to be analyzed#
s = selected("Sound")
tg = selected("TextGrid")
select tg
numIntervals = Get number of intervals... 3
### calculate Pitch and Intensity of Speech ###
select s
To Pitch... 0.0 75 600
select s
To Intensity... 75 0.0
plus Pitch dialogue-flat
pitch = selected ("Pitch")
intensity = selected("Intensity")
space$ = " "
for cont from 1 to numIntervals
    select TextGrid dialogue-flat-phon_palign
    token$ = Get label of interval... 3 cont
    tstart = Get starting point... 3 cont
    tend = Get end point... 3 cont
    dialogueActNum = Get interval at time... 4 tstart+0.01
    dialogueAct$ = Get label of interval... 4 dialogueActNum
    speakerNum = Get interval at time... 5 tstart+0.01
    speaker$ = Get label of interval... 5 speakerNum
    # for each non-silence token extract mean pitch & mean intensity #
    if not startsWith (token$, "#")
        select pitch
        pitchMean = Get mean... tstart tend Hertz
        select intensity
        intensityMean = Get mean... tstart tend dB
        ### configure layout ###
        lenStr = length(token$)
        spaceNum = 15 - lenStr
        print 'token$'
        for lung from 1 to spaceNum
            print 'space$'
        endfor
        print 'pitchMean:2' 'intensityMean:2'
        lenStr2 = length(dialogueAct$)
        spaceNum2 = 20 - lenStr2
        ### configure layout ###
        print 'dialogueAct$'
        for lung from 1 to spaceNum2
            print 'space$'
        endfor
        print 'speaker$'
        printline
    endif
endfor
### Save data in txt file ###
appendFile ("conversation-audio.txt", info$ ())
JWNL CODE
package wordnet;
import java.io.*;
import net.didion.jwnl.JWNL;
import net.didion.jwnl.JWNLException;
import net.didion.jwnl.JWNLRuntimeException;
import net.didion.jwnl.data.*;
import net.didion.jwnl.dictionary.Dictionary;
public class WordSem {
public static void main(String[] args) throws JWNLException, IOException,
JWNLRuntimeException {
// Initialize JWNL with the properties file to point to dictionary files
JWNL.initialize(new FileInputStream("file_properties.xml"));
// Dictionary object
Dictionary wordnet;
//After initialization create a Dictionary object that can be queried
wordnet = Dictionary.getInstance();
// read text file and extract words to be searched on WordNet
String read_path = "D:\\Ultimo semestre\\Natural Language Processing\\ASSIGNMENT\\conversation\\POS tagging\\conversation-tagged.txt";
//Open file reader stream (will read file with POS Tagging)
FileReader fr = new FileReader(read_path);
BufferedReader br = new BufferedReader(fr);
// Open file writer stream (will write txt file with "Token POS Domain"
// lines for each token)
String write_path = "D:\\Ultimo semestre\\Natural Language Processing\\ASSIGNMENT\\conversation\\dialogue-audio-pos-domains.txt";
File file = new File(write_path);
FileWriter file_write = new FileWriter(file);
String read_linea = ""; //line string variable, read line from sourcefile
String wordn = "";
//takes token words from source file
String word_POS = "";
// takes POS tags from source file
POS wnPOS;
// POS tag in WordNet format
String strdomain = "";
//takes domain string related to word token
// While there are lines in source file take word token and POS tag
while(true)
{
read_linea = br.readLine();
if(read_linea==null)
break;
String [] splits = read_linea.split("_"); // "_" separates word and tag in the source file
wordn = splits[0];
System.out.println(wordn);
word_POS = splits[1];
System.out.println(word_POS);
//begin write line in output txt file
StringBuilder write_appnd = new StringBuilder();
write_appnd.append(wordn)
.append(" ")
.append(word_POS)
.append(" ");
// translate from POS tag to WordNet word type
wnPOS = getWordNetPOS(word_POS);
//WordNet analysis: will check for word domain, and for hypernyms
if (wnPOS != null && wordn != null)
{
// An IndexWord is a single word and part of speech; look up a Synset object
IndexWord w = wordnet.lookupIndexWord(wnPOS, wordn);
if (w != null)
{
Synset[] senses = w.getSenses();
int domainlen = senses.length;
Pointer[] domain = new Pointer[domainlen];
for (int i=0; i<senses.length; i++)
{
    // CATEGORY is the pointer type for the domains
    domain = senses[i].getPointers(PointerType.CATEGORY);
    Synset[] syndomain = new Synset[domain.length];
    for (int l=0; l<domain.length; l++)
    {
        // obtain synset from domain and then an associated word string
        syndomain[l] = domain[l].getTargetSynset();
        Word rootWord = syndomain[l].getWord(0);
        strdomain = rootWord.getLemma();
        // add to output txt file
        write_appnd.append(strdomain);
    }
}
//get to root hypernym
if (wnPOS == POS.NOUN)
{
strdomain = getRootHypernym(w);
write_appnd.append(strdomain);
}
}
}
//finish to write line, and then skip to another
write_appnd.append("\r\n");
String write_linea = write_appnd.toString();
file_write.write(write_linea);
}
file_write.close();
br.close();
}
//translate from POS tag to WordNet word type
public static POS getWordNetPOS(String wPOS)
{
POS wordNetPos;
switch (wPOS)
{
case "NN": case "NNS": case "NNP": wordNetPos = POS.NOUN; break;
case "VB": case "VBD": case "VBG": case "VBN": case "VBP": case
"VBZ": wordNetPos = POS.VERB; break;
case "JJ": case "JJR": case "JJS": wordNetPos = POS.ADJECTIVE;
break;
case "RB": case "RBR": case "RBS": wordNetPos = POS.ADVERB; break;
default: wordNetPos = null;
}
return wordNetPos;
}
// search for root hypernym
public static String getRootHypernym(IndexWord synsetw) throws JWNLException
{
String stringdomain ="";
Synset syndomain = null;
Synset[] senses = synsetw.getSenses();
int domainlen = senses.length;
Pointer[] domain = new Pointer[domainlen];
for (int i=0; i<senses.length; i++)
{
    domain = senses[i].getPointers(PointerType.HYPERNYM);
    if (domain.length > 0)
    {
        syndomain = domain[0].getTargetSynset();
        // climb the hypernym chain until the root is reached
        while (syndomain != null)
        {
            domain = syndomain.getPointers(PointerType.HYPERNYM);
            if (domain.length > 0) syndomain =