Fostering a culture change in assessment and feedback through TESTA
1. Fostering a culture shift in assessment and feedback through TESTA
Professor Tansy Jessop
Birkbeck College
15 February 2017
2. Checking in
1. One thing you already know about TESTA
2. One problem you have faced with assessment
3. One problem you have faced with feedback
4. One blue skies idea to address a problem
3. This session
1. Brief overview of TESTA
2. Why people find it useful
3. Three problems TESTA addresses
4. Four themes in the data
5. Solutions: a taster
9. Three problems
Problem 1: Something awry, not sure why
Problem 2: Curriculum design problem
Problem 3: The problem of educational change
13. Curriculum privileges ‘knowing’ stuff
“Content is often the most visible aspect
for students, the control of which is
frequently devolved to individual
academics, who receive little or no training
in curriculum design and planning”
(Blackmore and Kandiko 2014, 7).
15. Problem 3: Educational change problem
Three misguided assumptions:
1. There is not enough high-quality data.
2. Data will do it.
3. Academics will buy it.
http://www.liberalarts.wabash.edu/study-overview/
16. Proving is different from improving
“It is incredibly difficult to translate assessment
evidence into improvements in student learning”
“It’s far less risky and complicated to analyze data
than it is to act”
(Blaich & Wise, 2011)
17. Paradigm – What it looks like
Technical rational – Focus on data and tools
Relational – Focus on people
Emancipatory – Focus on systems and structures
18. TESTA themes and impacts
1. Variations in assessment patterns
2. High summative: low formative
3. Disconnected feedback
4. Lack of clarity about goals and standards
19. Defining the terms
• Summative assessment carries a grade which
counts toward the degree classification.
• Formative assessment does not count towards the degree (whether pass/fail or graded); it elicits comments and is required of all students.
20. Theme 1: Huge variations
• What is striking for you about this data?
• How does it compare with your context?
• Does variation matter?
21. Assessment features across a 3-year UG degree (n=73)
Characteristic: Range
• Summative: 12–227
• Formative: 0–116
• Varieties of assessment: 5–21
• Proportion of examinations: 0%–87%
• Time to return marks & feedback: 10–42 days
• Volume of oral feedback: 37–1,800 minutes
• Volume of written feedback: 936–22,000 words
22. Theme 2: High summative: low formative
• Summative ‘pedagogies of control’
• Circa 2 summative assessments per module in the UK
• A 1:8 ratio of formative to summative
• Formative weakly understood and practised
24. What students say about high summative
• A lot of people don’t do wider reading. You just focus
on your essay question.
• In Weeks 9 to 12 there is hardly anyone in our
lectures. I'd rather use those two hours of lectures to
get the assignment done.
• It’s been non-stop assignments, and I’m now free of
assignments until the exams – I’ve had to rush every
piece of work I’ve done.
25. What students say about formative
• If there weren’t loads of other assessments, I’d do
it.
• If there are no actual consequences of not doing
it, most students are going to sit in the bar.
• It’s good to know you’re being graded because
you take it more seriously.
• The lecturers do formative assessment but we
don’t get any feedback on it.
27. Why formative matters
1) Low-risk, more frequent opportunities for students to learn from feedback (Sadler, 1989)
2) Helps students to fine-tune and understand requirements and standards (Boud, 2000; Nicol, 2006)
3) Feedback to lecturers from formative tasks helps to adapt teaching (Hattie, 2009)
4) Engages students in cycles of reflection and collaboration (Biggs, 2003; Nicol & Macfarlane-Dick, 2006)
5) Encourages and distributes student effort (Gibbs & Simpson, 2004).
28. So, how do we do it?
Five case studies of successful formative.
Your task will be to identify the principles that make them work.
How could you adapt them?
29. Case Study 1: Business School
• Reduction from an average of 2 x summative and zero formative per module…
• …to 1 x summative and 3 x formative
• Required of students across the entire business school
• All working to a similar script
• Systematic shift, experimentation, less risky
together
30. Case Study 2: Social Sciences
• Education, Sociology and PGCAP degrees
• Problem: silent seminar, students not reading
• Public platform blogging
• Current academic texts
• In-class
• Threads and live discussion
• Linked to summative
31. Case Study 3: Media degree
• Media degree
• Presentations formative
• Students get feedback (peer and tutor)
• Refines their thinking for…
• Linked summative essay
32. Case study 4: Film and TV
• Seminar
• Problem: lack of discrimination about sources
• Students bring 1 x book, 1 x chapter, 1 x
journal article, 2 x pop culture articles
• Justify choices to group
• Reach consensus about five best sources
33. Case study 5: Engineering
• Engineering
• Problem: low averages
• Course requirement to complete 50 problems
• Peer assessed in six ‘lecture’ slots
• Marks do not count
• Lectures, problems, classes, exams unchanged
• Exam marks increased from 45% to 85%
34. Your task
• In groups, identify five principles for making
formative work. Write them down on flipchart
paper.
• Devise one or two adaptations for your
discipline, using the principles, and make one
poster which outlines/draws your adaptation.
You can be creative!
36. Take five
• Choose a quote that strikes you.
• What is the key issue?
• What strategies might address this issue?
37. What students say…
It’s difficult because your assignments are so detached
from the next one you do for that subject. They don’t
relate to each other.
Because it’s at the end of the module, it doesn’t feed into
our future work.
Because they have to mark so many that our essay
becomes lost in the sea that they have to mark.
It was like ‘Who’s Holly?’ It’s that relationship where
you’re just a student.
38. Actions based on evidence
• Conversation: who starts the dialogue?
• Iterative cycles of reflection across modules
• Quick generic feedback: the ‘Sherlock’ factor
• Feedback synthesis tasks
• Technology: audio, screencast and blogging
• From feedback as ‘telling’…
• … to feedback as asking questions
39. Theme 4: Confusion about goals and standards
• Consistently low scores on the Assessment Experience Questionnaire (AEQ) for clear goals and standards
• Alienation from the tools, especially criteria and guidelines
• Symptoms: perceptions of marker variation, unfair standards and inconsistencies in practice
40. What the literature says…
Marking is important. The grades we give
students and the decisions we make about
whether they pass or fail coursework and
examinations are at the heart of our academic
standards (Bloxham, Boyd and Orr 2011).
Grades matter (Sadler 2009).
41. What the papers say…
https://www.timeshighereducation.co.uk/news/examiners-give-hugely-different-marks/2019946.article
42. QAA: a paradigm of accountability
• Learning outcomes
• Criteria-based learning
• Meticulous specification
• Written discourse
• Generic discourse (Woolf 2004)
• ‘Validating practices’ (Shay 2004)
• Intended to reduce the arbitrariness of staff
decisions (Sadler 2009).
43. What students say…
We’ve got two tutors – one marks completely differently to the other and it’s pot luck which one you get.
They have different criteria, they build up their own criteria.
It’s such a guessing game.... You don’t know what they
expect from you.
They read the essay and then they get a general impression,
then they pluck a mark from the air.
44. What’s going wrong here?
There are criteria, but I find them really strange.
There’s “writing coherently, making sure the argument
that you present is backed up with evidence”.
Q: If you could change one thing to improve, what would it be?
A: More consistent marking, more consistency across
everything and that they would talk to each other.
45. Is this quite ‘normal’?
Differences between markers are not ‘error’, but
rather the inescapable outcome of the multiplicity
of perspectives that assessors bring with them (Shay
2005, 665).
The tension between ‘the scientific aspirations of
assessment technologies to represent an objective
reality and the unavoidable subjectivities injected by
the human focus of these technologies’ (Broadfoot
2002, 157).
48. The Art and Science of Evaluation
Judging is both an art and a science: It is an art
because the decisions with which a judge is
constantly faced are very often based on
considerations of an intangible nature that cannot
be recognized intuitively. It is also a science because
without a sound knowledge of a dog’s points and
anatomy, a judge cannot make a proper assessment
of it whether it is standing or in motion.
Take them round please: the art of judging dogs (Horner, T. 1975).
49. Marking as social practice
The typical technologies of our assessment and
moderation systems – marking memorandum,
double-marking, external examiners – privilege
reliability. These technologies are not in themselves
problematic. The problem is our failing to use these
technologies as opportunities for dialogue about
what we really value as assessors, individually and as
communities of practice
(Shay 2005).
50. Taking action: internalising goals and standards
Lecturers
• Regular calibration exercises
• Discussion and dialogue
• Discipline-specific criteria (no cut and paste)
Lecturers and students
• Rewrite/co-create criteria
• Marking exercises
• Exemplars
Students
• Enter the secret garden – peer review
• Engage in drafting processes
• Self-reflection
54. References
Arum, R. and Roksa, J. (2011) Academically Adrift: Limited Learning on College Campuses. Chicago: University of Chicago Press.
Blaich, C. and Wise, K. (2011) From Gathering to Using Assessment Results: Lessons from the Wabash National Study. Paper #8. University of Illinois: National Institute for Learning Outcomes Assessment.
Boud, D. and Molloy, E. (2013) ‘Rethinking models of feedback for learning: The challenge of design’, Assessment & Evaluation in Higher Education, 38(6), pp. 698–712. doi: 10.1080/02602938.2012.691462.
Gibbs, G. and Simpson, C. (2004) ‘Conditions under which assessment supports students’ learning’, Learning and Teaching in Higher Education, 1(1), pp. 3–31.
Harland, T., McLean, A., Wass, R., Miller, E. and Sim, K. N. (2014) ‘An assessment arms race and its fallout: High-stakes grading and the case for slow scholarship’, Assessment & Evaluation in Higher Education.
Jessop, T. and Tomas, C. (2016) ‘The implications of programme assessment on student learning’, Assessment and Evaluation in Higher Education. Published online 2 August 2016.
Jessop, T. and Maleckar, B. (2014) ‘The influence of disciplinary assessment patterns on student learning: a comparative study’, Studies in Higher Education. Published online 27 August 2014. http://www.tandfonline.com/doi/abs/10.1080/03075079.2014.943170
Jessop, T., El Hakim, Y. and Gibbs, G. (2014) ‘The whole is greater than the sum of its parts: a large-scale study of students’ learning in response to different assessment patterns’, Assessment and Evaluation in Higher Education, 39(1), pp. 73–88.
Nicol, D. (2010) ‘From monologue to dialogue: improving written feedback processes in mass higher education’, Assessment & Evaluation in Higher Education, 35(5), pp. 501–517.
O’Donovan, B., Price, M. and Rust, C. (2008) ‘Developing student understanding of assessment standards: a nested hierarchy of approaches’, Teaching in Higher Education, 13(2), pp. 205–217.
Perry, W. (1981) ‘Cognitive and Ethical Growth: The Making of Meaning’, in Chickering, A. (ed.) The Modern American College. San Francisco: Jossey-Bass.
Sadler, D. R. (1989) ‘Formative assessment and the design of instructional systems’, Instructional Science, 18(2), pp. 119–144. doi: 10.1007/bf00117714.
Editor's Notes
What started as a research methodology has become a way of thinking. David Nicol – changing the discourse, the way we think about assessment and feedback; not only technical, research, mapping, also shaping our thinking. Evidence, assessment principles. Habermas framework.
I realised what we were saying was ‘That’s only two per module’. And I was like ‘Ah, but that’s the point. This is a programmatic thing and you’re used to thinking about a module’
(Programme Leader, American Studies).
Data – persistent problem with A&F scores. Traffic light systems – green for good. DVC: find the people who are doing well so we can share best practice. Three programmes. Neil McCaw.
Hard to make connections, difficult to see the joins between assessments, much more assessment to accredit each little box. Multiplier effect. Less challenge, less integration. Lots of little neo-liberal tasks. The Assessment Arms Race.
Language of ‘covering material’. Should we be surprised?
Wabash study – 2005-2011, 17,000 students in 49 American colleges. 60-70 publications. Critical thinking, moral reasoning, leadership towards social justice, engagement in diversity, deep intellectual work.
TESTA has done the data and that’s been useful. Ideological compromises. Mixed methods approaches. Critical pedagogy sleeping with the enemy. Democratic, participatory, liberating curriculum and pedagogy. Teachers and students shape and change education. Resist managerialism and the market. Risky pedagogies.
Teach Less, learn more. Assess less, learn more.
Students can increase their understanding of the language of assessment through their active engagement in: ‘observation, imitation, dialogue and practice’ (Rust, Price, and O’Donovan 2003, 152). Dialogue, clever strategies, social practice, relationship building, relinquishing power.