Majority vote. Examples are from the result data set.
The number of posts results from filtering by length (1500–3500 characters), which leaves ~1600 of about 16,000 texts. This means that most of the texts are actually shorter than 1500 characters (‘me, too’, ‘yes’, …).
After filtering for 623 sentences!!! Not clear whether.
Application scenario question: if we know the expected frequency in a large body of texts, can we use it to spot differences between courses? el1/el2 = elearning (2 = year before); swl2 = social worker; sci = science.
Alpha is okay, but not great. Hence the analysis of whether the values change across 3 batches.
Creating a gold-standard for evaluating automated reflection detection
Thomas Ullmann, Fridolin Wild, Peter Scott,
Knowledge Media Institute, The Open University
• A Model for reflection
• Related work on quantification of reflection
• Data collected
• Results and discussion
Reflection is creative
sense-making of the
State of the art in quantifying reflection
Reference | Scales | Unit of analysis | Findings
Dyment & O’Connell (2011) | Depth of reflection | Studies (writings) | Meta review: five studies low; four medium; two studies high levels of
Wong et al. (1995) | Depth of | 45 students | Content analysis and interviews: 76% reflectors, 11% critical reflectors.
Wald et al. (2012) | Reflective to non- | 93 writings | 2nd year students, self selected best of reflective field notes: 30% critically reflective, 11% transformative reflective.
Plack et al. (2005) | Frequencies of depth of reflection | 43 journals | 43% reflection, 42% critical reflection; frequencies see next slide.
Hatton & Smith (1995) | Units of reflection; | ‘units’ (in writings of 60 | After instruction: 30% dialogic reflection; 19 reflective units in average per 8-12
Ross (1989) | Depth of reflection | 134 papers of 25 students | 22% highly reflective, 34% moderately
Williams et al. (2002) | Action | 56 student journals | 23% verify learning, 36% new understanding, 39% future behaviour
Summary: Related work
• More research on level than on elements
• Wide range for ‘level of depth’
• Measurements on students or writings/journals level
• Mostly in the context of instructed reflective writing
• Typically: Mapping from evidence to depth/breadth
=> No re-usable instrument to measure reflection
The dimensions of reflection
Ullmann, Wild, Scott (2012): Comparing automatically detected
reflective texts with human judgements. http://ceur-ws.org/Vol-931/paper8.pdf
Documentation of insights, plans, and intentions.
Switch point of view.
Argumentation and reasoning.
Identification of a conflict.
Awareness building over affective factors.
Explication of self-awareness, e.g., inner monologues, description of feelings.
Example accounts (anonymised)
Dim: Type Example
SA: Identification of
“[Victor] and [Morgan], you are right that I should have applied better my own learning instead of using the Uni ones.”
“I imagine this is probably in order to have a focus and provide enough detail rather than skim over the whole area.”
TP: Switch point of
“When I am doing FRT work, I often think about how the parents view me when they know I haven’t got children!”
Dim: Type Example
of an insight.
“After I saw how this lifted her mood and eased her anxiety, I will remember that what we can view sometimes to be small can actually make a significant difference.”
“I would like to be involved in helping with the site, too - although I’m a novice! I imagine this is probably in order to have a focus and provide enough detail rather than skim over the whole area.”
Dim: Type Example
“This has helped me reflect on my own life and experiences whilst allowing me to empathise with others in their own circumstances; I feel proud of what I have achieved so far as the work/life/study balance is always difficult to navigate, but I’m lucky that I have a supportive family to help.”
“Bye the way, Audacity is also run under the CC
Creating a gold standard
[Process diagram: OU LMS forum posts (4 subjects, 2 years); de-identification; mid range length; sentence level, 1000 random; expand grid, 10; 5 raters each; ‘gold questions’ passed; ‘majority vote’]
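The sampling steps named in the diagram (mid-range length filter, sentence level, 1000 random sentences) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual tooling: the function names, the naive sentence splitter, and the fixed random seed are assumptions; only the 1500–3500 character range and the sample size of 1000 come from the slides and notes.

```python
import random
import re


def filter_by_length(posts, min_chars=1500, max_chars=3500):
    """Keep only posts whose character length lies in the mid range."""
    return [p for p in posts if min_chars <= len(p) <= max_chars]


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sample_sentences(posts, n=1000, seed=0):
    """Draw up to n random sentences from the given posts, reproducibly."""
    sentences = [s for p in posts for s in split_sentences(p)]
    rng = random.Random(seed)  # fixed seed is a hypothetical choice
    return rng.sample(sentences, min(n, len(sentences)))


if __name__ == "__main__":
    posts = ["me, too", "A sentence. " * 200, "x" * 4000]  # toy data
    kept = filter_by_length(posts)
    print(len(kept), len(sample_sentences(kept, n=10)))
```

A real run would need a proper sentence tokenizer; the regular expression here is only meant to make the sketch self-contained.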
• Crowdflower: the ‘virtual pedestrian area’
• Pre-tests showed:
– Really simple questions needed for HITs
– But: Quick answer options increase spam
– Short texts easier than long texts
(less spam, smaller costs)
– Shuffling of answers to avoid artefacts
• Check: larger than usual number of raters (5+) to see
how reliable judgements are
– Raw data
• Baseline: control questions: Krippendorff’s α = 0.43
• Control questions + survey data: α = 0.32
• Survey data: α = 0.22
– ‘objectified’ data
• Majority vote, 3 up to all raters agree
– Survey data: α = 0.36 (623 out of 1,000 sentences)
• Majority vote, 4 up to all raters agree
– Survey data: α = 0.581 (301 sentences)
• Majority vote, 5 up to all raters agree
– α = 0.98 (with outliers) (107 out of 1,000 sentences)
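For readers unfamiliar with the measure, Krippendorff's α for nominal (single-choice) labels can be computed from a coincidence matrix as α = 1 − D_o / D_e, where D_o is the observed and D_e the expected disagreement. The sketch below is a generic textbook-style implementation, not the code used for the figures on this slide; the function name and the input layout (one list of labels per sentence) are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that the raters
    gave to one unit (e.g. one sentence). Units with fewer than two labels
    carry no agreement information and are skipped.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels from two different raters counts 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return float("nan")  # undefined when only one category is ever used
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Toy ratings with hypothetical labels, two raters per sentence.
    toy = [["SA", "SA"], ["SA", "TP"], ["TP", "TP"]]
    print(krippendorff_alpha_nominal(toy))
```

Filtering the units by a majority threshold before computing α removes the sentences with the most disagreement, which would be consistent with the values rising as the threshold rises.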
• Agreement of 5 of course increases IRR
– (to 0.98 unfiltered)
– when omitting ‘over answering’: to 1.0
– But: reduces to single category sentences
• Agreement of 3 deemed good enough
– since questions were single choice, whereas multiple answers can be correct
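The agreement thresholds discussed here (3, 4, or 5 raters agreeing) amount to a majority vote with a minimum count. A minimal sketch, assuming each sentence comes with a list of single-choice labels; the function names, sentence ids, and the label "none" are hypothetical and not taken from the slides.

```python
from collections import Counter


def majority_label(labels, min_agree=3):
    """Return the most frequent label if at least min_agree raters chose it,
    otherwise None. With 5 raters and min_agree >= 3 the winner is unique."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def gold_standard(ratings, min_agree=3):
    """ratings: dict mapping a sentence id to the list of rater labels.
    Keeps only sentences whose majority label passes the threshold."""
    gold = {}
    for sentence_id, labels in ratings.items():
        label = majority_label(labels, min_agree)
        if label is not None:
            gold[sentence_id] = label
    return gold


if __name__ == "__main__":
    ratings = {
        "s1": ["SA", "SA", "SA", "TP", "none"],    # 3 of 5 agree
        "s2": ["TP", "TP", "TP", "TP", "TP"],      # all agree
        "s3": ["SA", "TP", "none", "SA", "TP"],    # no label reaches 3
    }
    print(gold_standard(ratings, min_agree=3))
    print(gold_standard(ratings, min_agree=5))
```

Raising `min_agree` trades coverage for reliability: fewer sentences survive, but those that do are the unambiguous ones.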
• Sentences are a reduction, but allow zooming in on markers
• Context: Forum texts
• Personal vs. non personal sentences