Longitudinal Analysis of Peer Feedback in a Writing-Intensive Course: A Pilot Study
1. Longitudinal Analysis of Peer Feedback in a Writing-Intensive Course: A Pilot Study
PI: Christina Hendricks
Co-PI: Jeremy Biesanz
University of British Columbia-Vancouver
Funded by the UBC Institute for the Scholarship of Teaching and Learning SoTL Seed Fund
Festival of Learning, June 2016
Slides licensed CC-BY 4.0
2. Literature on peer feedback
• Receiving peer feedback improves writing (Paulus, 1999; Cho & Schunn, 2007; Cho & MacArthur, 2010; Crossman & Kite, 2012)
• Giving peer feedback improves writing (Cho & Cho, 2011; Li, Liu & Steckelberg, 2010)
3. GAPS:
Most studies look at revisions to a single essay, not changes across different essays
[Diagram: Draft 1, Draft 2, Draft 3 of one essay contrasted with Essay 1, Essay 2, Essay 3, Essay 4, … Essay n, with peer feedback (PFB) marked between them]
Few studies look at “dose-response curve”
4. Pilot study research questions
1. How do students use peer comments given and received for improving different essays rather than drafts of the same essay?
2. Are students more likely to use peer comments given and received for improving their writing after more than one or two peer feedback sessions? How many sessions are optimal?
3. Does the quality of peer comments improve over time?
5. • Interdisciplinary, full year course for first-years
• 18 credits (English, History, Philosophy)
• Students write 10-12 essays (1500-2000 words)
• Peer feedback tutorials every week (4 students)
http://artsone.arts.ubc.ca
Image credits: Toni Morrison, Wikimedia Commons, licensed CC BY-SA 2.0; Osamu Tezuka, public domain on Wikimedia Commons; Jane Austen, public domain on Wikimedia Commons; Friedrich Nietzsche, public domain, Wikimedia Commons
6. Data for pilot study 2013-2014
• 10 essays by 12 participants (n=120)
• Comments by 3 peers on essays (n=1218)
• Comments by instructor (n=3291)
• All coded with same rubric
7. Coding Rubric
Categories (plus subcategories, for 11 options)
• Strength of argument
• Organization
• Insight
• Style & Mechanics
Numerical value
1: Significant problem
2: Moderate problem
3: Positive comment/praise
E.g., STREV 2: could use more textual evidence to support your claims
(Change for future)
8. Inter-coder reliability
Measures: Fleiss’ Kappa; Intra-class correlation
Student comments (n=141)
• Fleiss’ Kappa, all categories: 0.61 (moderate)
• Fleiss’ Kappa, most used categories: 0.8 (excellent)
• Intra-class correlation: 0.96 (excellent)
Essays (n=120)
• 0.71 (adequate)
3 coders:
• Daniel Munro & Kosta Prodanovic (undergrads, former Arts One)
• Jessica Stewart (author, editor)
(Change for future)
17. Cross-lagged panel design with auto-regressive structure
[Path diagram: Essay Quality at Time 1, Time 2, … N and Comments at Time 1, Time 2, … N, connected by paths labeled A, B, C, D, E]
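A two-wave cross-lagged panel design with auto-regressive structure can be sketched roughly as below. This is a minimal illustration on synthetic data, not the study's actual analysis: the function name `cross_lagged_fit`, the result keys, and the simulated effect sizes are all assumptions, and the sketch deliberately does not map its coefficients onto the slide's path letters A to E, since that mapping is not stated here.

```python
import numpy as np

def cross_lagged_fit(quality_t1, comments_t1, quality_t2, comments_t2):
    """Fit a two-wave cross-lagged panel model with two OLS regressions.

    Each Time 2 variable is regressed on BOTH Time 1 variables, so every
    cross-lagged coefficient is adjusted for the auto-regressive effect.
    All variables are standardized, so coefficients are standardized betas.
    """
    def standardize(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    q1, c1, q2, c2 = map(
        standardize, (quality_t1, comments_t1, quality_t2, comments_t2)
    )
    # Design matrix: intercept, essay quality at Time 1, comments at Time 1
    X = np.column_stack([np.ones_like(q1), q1, c1])
    beta_q2, *_ = np.linalg.lstsq(X, q2, rcond=None)
    beta_c2, *_ = np.linalg.lstsq(X, c2, rcond=None)
    return {
        "quality_autoregressive": float(beta_q2[1]),   # quality T1 -> quality T2
        "comments_to_quality": float(beta_q2[2]),      # comments T1 -> quality T2
        "quality_to_comments": float(beta_c2[1]),      # quality T1 -> comments T2
        "comments_autoregressive": float(beta_c2[2]),  # comments T1 -> comments T2
        "concurrent_t1": float(np.corrcoef(q1, c1)[0, 1]),  # same-time correlation
    }

# Synthetic data (hypothetical effect sizes, chosen only for illustration):
# quality is stable over time, comments are stable over time, and comments
# at Time 1 have no effect on quality at Time 2.
rng = np.random.default_rng(0)
n = 500
q1 = rng.normal(size=n)
c1 = 0.3 * q1 + rng.normal(size=n)
q2 = 0.6 * q1 + rng.normal(scale=0.5, size=n)
c2 = 0.5 * c1 + rng.normal(scale=0.5, size=n)

result = cross_lagged_fit(q1, c1, q2, c2)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With more than two time points, the same pair of regressions would be repeated for each adjacent pair of essays; a full structural equation model would estimate all paths jointly rather than one regression at a time.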
18. Path A: Instructor Comments
Significant relationships
• Ratings of 1 in Strength (-.12*) & Org. (-.23**)
• Ratings of 2 in Strength (-.06*) & Style (-.08*)
• Ratings of 3 in Strength (.11*), Insight (.35*), Style (.15*)
*p < .05, **p < .01, ***p < .001, ****p < .0001, *****p < .00001
19. Path A: Student comments
Significant relationships
• Ratings of 2 in Insight (-.53*)
• Ratings of 3 in Organization (.13*)
20. Path C: instructor comments
Significant effects don’t show up if split out by category
• Comments ratings of 1 (.29**)
• Comments ratings of 2 (.23*)
• Comments ratings of 3 (.21, p=.057)
21. Path C: student comments
Significant relationships
• Comments rated 2 in Strength (.22*) & Style (.33**)
• Comments rated 3 in Style (.31*)
22. Path D: Student & Instructor comments
Significant relationship ONLY if student & instructor comments are combined, & only for comments rated 1 (all categories combined): (.05, p=.06)
23. Research question 1
How do students use peer comments given and received for improving different essays rather than drafts of the same essay?
o Very little significant evidence of relationships in Path D
o No difference between comments given & received
24. Research question 2
Are students more likely to use peer comments given and received for improving their writing after more than one or two peer feedback sessions? How many sessions are optimal?
o No evidence that there is any change over time in path D
o No difference between comments given or received
25. Research question 3
Does the quality of peer comments improve over time?
o No evidence of change over time in path A
26. Research Question 3, cont’d
Student/instructor agreement on average numerical ratings on each essay
• tends to go down over time (-.04**)
• student ratings increase at only half the rate (.16*) that instructor’s ratings increase (.33*****)
27. Research Question 3, cont’d
Correlations on number of comments, students & instructor
• No change in these relationships over time
Columns: Comment value 1 | Comment value 2 | Comment value 3
Strength 0.23*
Organization 0.21* 0.17*
Insight 0.17*
Style
28. Some conclusions
Pilot study: feasible for larger sample? Yes, if:
o instructors code essay quality rather than coders
o “chunk” essays for cross-lagged analyses
o have easy collection of comments
29. References
• Cho, K., & MacArthur, C. (2010). Student revision with peer and expert reviewing. Learning and Instruction, 20, 328-338.
• Cho, Y. H., & Cho, K. (2011). Peer reviewers learn from giving comments. Instructional Science, 39, 629-643.
• Cho, K., & Schunn, C. D. (2007). Scaffolded writing and rewriting in the discipline: A web-based reciprocal peer review system. Computers & Education, 48, 409-426.
• Crossman, J. M., & Kite, S. L. (2012). Facilitating improved writing among students through directed peer review. Active Learning in Higher Education, 13, 219-229.
• Li, L., Liu, X., & Steckelberg, A. L. (2010). Assessor or assessee: How student learning improves by giving and receiving peer feedback. British Journal of Educational Technology, 41(3), 525-536.
• Paulus, T. M. (1999). The effect of peer and teacher feedback on student writing. Journal of Second Language Writing, 8, 265-289.
30. Thank you!
Christina Hendricks
University of British Columbia-Vancouver
Website: http://blogs.ubc.ca/christinahendricks
Blog: http://blogs.ubc.ca/chendricks
Twitter: @clhendricksbc
Slides available: https://is.gd/PeerFeedbackPilot_FOL16 (capitals and underscore needed)
Slides licensed CC-BY 4.0
Editor's Notes
Number of “1” comments total: 239 out of over 4000
1’s by students: 35
1’s by instructor: 204
How much agreement do we observe relative to how much we would expect to see by chance?
-- takes into account the frequency of the type of code occurring in the data
-- some codes are more frequent, so you’d expect those to have more apparent agreement
-1 to +1
0 = amount of agreement we’d expect to see by chance
-1 is complete disagreement
0.6 is moderate agreement; 0.8 is substantial
-- Kappa includes just the category
Many of the most used categories have agreement in the 0.8 range
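The chance-corrected agreement described in these notes can be sketched as follows. This is a minimal implementation of the standard Fleiss' kappa formula, not the study's own code; the function name `fleiss_kappa` and the toy count table are assumptions made for illustration. Each row is one coded comment, each column one rubric category, and each cell the number of coders who chose that category.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a table of shape (items, categories).

    counts[i, j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()

    # Observed agreement: for each item, the share of rater pairs that agree
    p_item = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_observed = p_item.mean()

    # Expected agreement by chance, from how often each category is used;
    # frequently used categories raise the agreement expected by chance
    p_category = counts.sum(axis=0) / (n_items * n_raters)
    p_expected = np.sum(p_category ** 2)

    return (p_observed - p_expected) / (1 - p_expected)

# Toy example (hypothetical data): 4 comments, 3 coders, 2 categories
toy_counts = [
    [3, 0],  # all three coders chose category 1
    [2, 1],  # two chose category 1, one chose category 2
    [1, 2],
    [0, 3],
]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))
```

A value of 0 means agreement is no better than chance, and 1 means the coders always chose the same category.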
Reliability on degree: intra class correlation (ICC) of 0.96
-- to what extent is the average across the three raters reliable: take the average of all the numbers each rater gave and ask how well it correlates with the average of everyone who could possibly do this; at this level there is no benefit from adding more people
-- average is 2.5
-- 1’s are pretty infrequent
-- people agree on whether a comment is a 2 or a 3 (40% are 2s, 60% are 3s)
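The reliability of the average across raters can be sketched as below. The notes do not say which form of the intra-class correlation was used, so this sketch assumes the two-way, average-measures consistency form, often written ICC(3,k); the function name `icc_average_consistency` and the toy ratings are hypothetical.

```python
import numpy as np

def icc_average_consistency(ratings):
    """Average-measures consistency ICC for a matrix of shape (items, raters).

    Assumed form: (MS_items - MS_error) / MS_items, from a two-way ANOVA
    decomposition without replication.
    """
    ratings = np.asarray(ratings, dtype=float)
    n_items, n_raters = ratings.shape
    grand_mean = ratings.mean()

    # Sums of squares for items (rows), raters (columns), and residual error
    ss_items = n_raters * np.sum((ratings.mean(axis=1) - grand_mean) ** 2)
    ss_raters = n_items * np.sum((ratings.mean(axis=0) - grand_mean) ** 2)
    ss_total = np.sum((ratings - grand_mean) ** 2)
    ss_error = ss_total - ss_items - ss_raters

    ms_items = ss_items / (n_items - 1)
    ms_error = ss_error / ((n_items - 1) * (n_raters - 1))
    return (ms_items - ms_error) / ms_items

# Toy example (hypothetical data): three raters who differ only by a
# constant offset rank the items identically, so consistency is perfect.
toy_ratings = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]
icc = icc_average_consistency(toy_ratings)
print(round(icc, 3))
```

Because this form measures consistency rather than absolute agreement, a rater who is systematically harsher or more lenient does not lower the value.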
These numbers are linear trend over time, not autoregressive
What this says, basically, is that the coders’ ratings of essay quality are pretty similar to the instructor’s comments on essay quality, in these categories at least
This could just be saying that students tend to give the same sorts of comments to the same people, but also that things aren’t changing that much from one essay to another.