5 March 2010 (Friday) | 09:00 - 12:30 | http://citers2010.cite.hku.hk/abstract/69 | Dr. Kwok Ping CHAN, Associate Professor, Department of Computer Science, HKU
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative Learning (CSCL) Data
1. Question Classification & Sentiment Analysis
Kwok-Ping Chan
Dept. of Computer Science,
The Univ. of Hong Kong
March 5, 2010
2. The Knowledge Forum
A forum for students to discuss interesting issues, so that they can
learn during the discussion process.
Monitor the progress of students participating in the forum.
Forum articles can be categorized into four types: Argument,
Statement, Information, and Question.
Examples of Articles
(Information) Alcohol is an other kind of energy that would not produce
air-pollution and easy to use. In Brazil, alcohol energy is very popular and
successful. The Brazil government co-operate with a bank and produce alcohol for
drivers
(Argument) but producing fossil fuel need a few million years or maybe more than
it. So it will too late if we have to wait for a long time until its produced.
(Question) is it the one using Changjiang River?
(Statement) we are doing wind energy.
3. Article Classification
The progress of a student is reflected by the different types of
articles the student posted on the forum.
We would like to use machine learning techniques to solve this
problem.
Two lines of work are related to this problem:
Question Classification — Classify questions into different categories.
Sentiment Analysis (Opinion Mining) — aims to determine the
attitude of a writer with respect to some topic. The attitude may be
their judgment or evaluation, their affective state (the emotional state
of the author when writing) or the intended emotional communication
(the emotional effect the author wishes to have on the reader). (from
Wikipedia) This includes
determining the polarity of a given text — positive, negative or neutral.
subjectivity/objectivity identification
determining the opinions expressed on different aspects of entities
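The polarity subtask listed above can be illustrated with a minimal lexicon-based scorer. This is only a sketch: the word lists below are illustrative assumptions, not taken from the talk or from any published lexicon.

```python
# Minimal lexicon-based polarity classifier: count positive and
# negative cue words and compare the totals. Word lists are
# illustrative only, not a real sentiment lexicon.
POSITIVE = {"good", "great", "successful", "popular", "easy"}
NEGATIVE = {"bad", "pollution", "difficult", "late", "poor"}

def polarity(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `polarity("alcohol energy is very popular and successful")` returns `"positive"`, while the Statement example "we are doing wind energy" scores neutral. Real systems replace the hand-written lists with learned features, as the later slides discuss.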
4. Question Classification
We have used a convolution tree kernel with local alignment for
Question Classification.
Application: Question/Answering System.
Based on the UIUC TREC database.
Five training sets, containing 1,000 to 5,500 training questions, and a
test set containing 500 questions (Li & Roth).
The questions are divided into 6 coarse classes and 50 fine classes.
We achieved 92.5% accuracy.
5. Question hierarchy
ABBREVIATION – abbreviation and expression.
DESCRIPTION – definition, description, manner, reason.
ENTITY – animal, body, color, creative, currency,
disease/medicine, event, food, instrument, lang, letter, other,
plant, product, religion, sport, substance, symbol, technique, term,
vehicle, word.
HUMAN – description, group, individual, title
LOCATION – city, country, mountain, state, other
NUMERIC VALUE – code, count, date, distance, money, order,
period, speed, percent, temp, vol/size, weight, other
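The two-level taxonomy above maps naturally onto a nested structure. A sketch in Python, encoding only the classes listed on this slide (label spellings follow the slide, not the original dataset files):

```python
# Encoding of the Li & Roth question taxonomy as listed on the slide:
# each coarse class maps to its fine classes.
QUESTION_HIERARCHY = {
    "ABBREVIATION": ["abbreviation", "expression"],
    "DESCRIPTION": ["definition", "description", "manner", "reason"],
    "ENTITY": ["animal", "body", "color", "creative", "currency",
               "disease/medicine", "event", "food", "instrument", "lang",
               "letter", "other", "plant", "product", "religion", "sport",
               "substance", "symbol", "technique", "term", "vehicle", "word"],
    "HUMAN": ["description", "group", "individual", "title"],
    "LOCATION": ["city", "country", "mountain", "state", "other"],
    "NUMERIC VALUE": ["code", "count", "date", "distance", "money", "order",
                      "period", "speed", "percent", "temp", "vol/size",
                      "weight", "other"],
}

def coarse_classes_of(fine_label):
    """Return every coarse class containing a given fine class label."""
    return [c for c, fines in QUESTION_HIERARCHY.items() if fine_label in fines]
```

Note that some fine labels (e.g. "other", "description") appear under more than one coarse class, which is why the lookup returns a list.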
6. Example Questions
The following is the first question extracted from the training dataset
for each coarse class:
(ABBR, exp) What is the full form of .com ?
(DESC, manner) How did serfdom develop in and then leave
Russia ?
(ENTY, animal) What fowl grabs the spotlight after the Chinese
Year of the Monkey ?
(HUM, title) What is the oldest profession ?
(LOC, state) What sprawling U.S. state boasts the most airports ?
(NUM, date) When was Ozzy Osbourne born ?
7. Syntactic Features
words – words appearing in the question.
POS tags – their corresponding POS tags.
Chunks – non-overlapping phrases in the question.
Head chunks – the first noun/verb chunk in the question.
Examples: (from Li & Roth)
(Question) : Who was the first woman killed in the Vietnam War?
(POS Tagging) : [Who WP] [was VBD] [the DT] [first JJ]
[woman NN] [killed VBN] [in IN] [the DT] [Vietnam NNP] [War
NNP] [? .]
(Chunking) : [NP Who] [VP was] [NP the first woman] [VP
killed] [PP in] [NP the Vietnam War] ?
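Given chunker output like the example above, the head-chunk feature (the first noun chunk and first verb chunk) can be extracted with a small helper. The `(label, text)` pair representation is an assumed simplification of real chunker output:

```python
# Extract the head chunks (first NP and first VP) from a chunked
# question. Chunks are represented as (label, text) pairs, a
# simplified stand-in for real chunker output.
def head_chunks(chunks):
    """Return the first noun chunk and first verb chunk, if present."""
    first = {}
    for label, text in chunks:
        if label in ("NP", "VP") and label not in first:
            first[label] = text
    return first

# The chunked question from the slide, in the assumed representation.
chunked = [("NP", "Who"), ("VP", "was"), ("NP", "the first woman"),
           ("VP", "killed"), ("PP", "in"), ("NP", "the Vietnam War")]
```

For the slide's question, `head_chunks(chunked)` yields `{"NP": "Who", "VP": "was"}` as the head-chunk features.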
8. Semantic Features
Named Entities – noun phrases are categorized into semantic
categories of varying specificity.
e.g. from the question on the previous slide, we get the named
entities [Num first] and [Event Vietnam War].
WordNet Senses – words are organized into senses in WordNet,
which are arranged in a hierarchy. All senses of a word are used as
features.
We use the Wu & Palmer metric to measure the similarity between
words.
Class-specific related words – some words are related to a specific
question class, e.g. alcohol, lunch, and orange are related to the food
class.
Distributional Similarity
words occurring in similar syntactic structures are similar to each other.
words can be grouped into semantic categories accordingly.
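The Wu & Palmer metric mentioned above scores two concepts by the depth of their least common subsumer (LCS) in the is-a hierarchy: wup(a, b) = 2·depth(LCS) / (depth(a) + depth(b)). A self-contained sketch of the formula on a hand-built toy taxonomy (real systems would query WordNet instead):

```python
# Wu & Palmer similarity on a toy is-a taxonomy (child -> parent).
# The tree is a made-up illustration; real systems use WordNet.
PARENT = {
    "entity": None,
    "food": "entity",
    "fruit": "food",
    "orange": "fruit",
    "apple": "fruit",
    "drink": "food",
    "alcohol": "drink",
}

def path_to_root(node):
    """List of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth of a node; the root has depth 1."""
    return len(path_to_root(node))

def wu_palmer(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

Here `wu_palmer("orange", "apple")` is 0.75 (LCS "fruit" at depth 3, both words at depth 4), while `wu_palmer("orange", "alcohol")` drops to 0.5 because their LCS "food" sits higher in the tree.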
9. Classifiers
Li & Roth used a hierarchical classifier.
A two-level classifier:
Coarse classifier – divides questions into the coarse classes.
Fine classifier – assigns the fine classes.
Both levels use the Winnow algorithm.
Zhang & Chan
Use convolution tree kernels with local alignment.
The tree kernel is semantically enriched: it measures the semantic
similarity of two parse trees based on WordNet, the Wu & Palmer
metric, and distributional similarity.
Classification was done by Support Vector Machine (SVM).
We believe article classification can be done similarly, using both
general features (for example, all POS tags and WordNet senses)
and expert features (Class-specific related words).
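The Winnow algorithm used by Li & Roth is a linear classifier over binary features with multiplicative weight updates: promote the weights of active features on a false negative, demote them on a false positive. A minimal two-class sketch (the threshold and update factor are illustrative choices):

```python
# Minimal Winnow learner for binary feature vectors. Weights start
# at 1 and are multiplied up (promotion) or down (demotion) only on
# mistakes, and only for features that were active.
class Winnow:
    def __init__(self, n_features, alpha=2.0):
        self.w = [1.0] * n_features
        self.theta = n_features / 2  # decision threshold
        self.alpha = alpha           # promotion/demotion factor

    def predict(self, x):
        """x is a list of 0/1 features; returns the predicted label."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) >= self.theta

    def update(self, x, label):
        """Multiplicative update on a mistake; no change if correct."""
        if self.predict(x) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for i, xi in enumerate(x):
            if xi:
                self.w[i] *= factor
```

Winnow's appeal in this setting is that its mistake bound grows only logarithmically with the number of features, which matters when systems use hundreds of thousands of sparse features, as noted on the last slide.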
10. Sentiment Analysis & Opinion Mining
It involves the following problems (Pang & Lee):
Sentiment polarity and degree of positivity
classify the position of the opinion in a continuum between two
polarities.
for example, in the context of reviews or political speech.
determine whether a piece of objective information is good or bad.
more difficult tasks: rating inference, and "pro and con" lists instead
of a single positive or negative label.
Subjectivity detection and opinion identification
whether an article contains subjective/objective information.
determining opinion strength (different from rating).
for example, use adjectives in the sentences.
11. Features
The following features can be used for sentiment analysis:
Term presence & frequency
Although term frequency is commonly used in information retrieval,
term presence has been found to give better performance.
Binary features vs. numerical features.
A topic is emphasized by frequent occurrences of keywords;
overall sentiment may not be.
Sometimes a single occurrence of a word already indicates subjectivity.
Term-based features
position of a term within a textual unit.
use of unigram, bigram or trigram.
high-contrast pairs of words, such as "delicious and dirty".
12. Features
Parts of Speech
Adjectives are particularly important in sentiment analysis.
for example, certain adjectives are good indicators.
Use selected phrases, chosen via pre-specified POS patterns,
most of which include an adjective or an adverb.
Nouns and verbs can also be strong indicators (e.g. "gem" and
"love").
Syntax
sub-tree syntactic structures have been used.
collocation and other complex syntactic patterns have also been found
useful.
Negation
Positive and negative opinions sometimes differ in only one negation
word (such as "not" or "don't").
Negation can be expressed in subtle ways that are difficult to detect
(such as sarcasm and irony).
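One common simple treatment of explicit negation (a heuristic sketched here, not the talk's own method) is to prefix every word after a negation cue with NOT_ until the next punctuation mark, so "not good" and "good" become distinct features:

```python
# Mark words in the scope of a negation cue with a NOT_ prefix, so
# that "not good" and "good" become distinct features. The scope
# heuristically ends at the next punctuation token. The cue list is
# illustrative.
NEGATORS = {"not", "no", "never", "don't", "doesn't", "isn't"}
SCOPE_ENDERS = {".", ",", ";", "!", "?"}

def mark_negation(tokens):
    """Return tokens with NOT_ prefixes applied inside negation scope."""
    out, negated = [], False
    for tok in tokens:
        if tok in NEGATORS:
            negated = True
            out.append(tok)
        elif tok in SCOPE_ENDERS:
            negated = False
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out
```

For example, `mark_negation("this is not good , but tasty".split())` produces `['this', 'is', 'not', 'NOT_good', ',', 'but', 'tasty']`. As the slide notes, this catches only the easy cases; sarcasm and irony escape any such surface heuristic.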
13. Features
Topic-oriented features
topic information should be incorporated into the features.
for example, good news for a rival can be bad news for the entity of
interest.
may need to include indicators ("this work") or party names so that
features can be attached to different entities.
14. Suggested Framework
To apply machine learning, we need a labeled corpus with
sufficient training data.
Many different features are used. Some systems use more than
200,000 features (generated automatically, of course)!
Terms can be grouped into concepts to reduce the number of
features.
If we have enough training data, we can find the grouping best
tailored to the topic involved.
Features can also be the output of another machine learning
program, such as sentiment scores or topic-related keywords.
Supervised classification can be employed, such as Support Vector
Machines or Decision Trees with AdaBoost.
With sufficient data, the entire process can be data-driven.
Expert knowledge can be used to reduce the amount of training data
needed.
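The term-grouping idea above can be sketched as mapping each term through a concept table before feature extraction, so synonymous or related terms collapse into one feature. The concept table here is a made-up illustration, possibly derived in practice from WordNet or distributional clustering:

```python
# Reduce the feature count by mapping terms to broader concepts
# before building features. The table is an illustrative assumption.
CONCEPTS = {
    "alcohol": "FUEL", "petrol": "FUEL", "diesel": "FUEL",
    "wind": "ENERGY_SOURCE", "solar": "ENERGY_SOURCE",
}

def to_concepts(tokens):
    """Replace each token with its concept label when one is known."""
    return [CONCEPTS.get(t, t) for t in tokens]

def concept_features(tokens):
    """Set of distinct concept-level features for a document."""
    return set(to_concepts(tokens))
```

Here "alcohol" and "petrol" both become the single feature FUEL, so a classifier trained on one generalizes to the other; with enough labeled data, the grouping itself could be learned rather than written by an expert, as the slide suggests.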