This case study reports on two first-semester programming courses with more than 190 students. Both courses used automated assessments. We observed how students trick these systems by analysing the version history of suspect submissions. Analysing more than 3300 submissions revealed four astonishingly simple tricks (overfitting, evasion) and cheat patterns (redirection, injection) that students used to trick automated programming assignment assessment systems (APAAS). Although counter-measures are not the main focus of this study, we discuss and propose them where appropriate.
Nevertheless, the primary intent of this paper is to raise problem awareness and to identify and systematise observable problem patterns more formally. The identified immaturity of existing APAAS solutions might have implications for courses that rely heavily on automation, such as MOOCs. We therefore conclude that APAAS solutions should be examined much more from a security point of view (code injection). Moreover, we identify the need to evolve existing unit testing frameworks into more evaluation-oriented teaching solutions that provide better trick and cheat detection capabilities and differentiated grading support.
3. • We are at a transition point between the
industrialisation age and the digitisation age.
• Computer science related skills are a vital asset
in this context. One of these basic skills is
practical programming.
• The course sizes of university and college
programming courses are steadily increasing.
• MOOCs, too, are used more frequently to
convey necessary programming capabilities to
students of different disciplines.
• The coursework is composed of assignments
that are highly suited to be assessed
automatically.
• However, it is very often underestimated how
astonishingly easy it is to trick these systems!
Introduction
The question arises
whether “robots”
certify the expertise to program or the expertise to cheat.
4. A small example to get your attention ...
VPL == Virtual Programming Lab
• Count the occurrences of a character c in
a String s.
• Develop a method countChar().
How to get full points in
Moodle/VPL?
The same works for every assignment!
INTRODUCTION
5. INTRODUCTION
• APAAS solutions are systems that execute injected code
(student submissions).
• Code injection is known as a severe threat from a security
point of view.
• APAAS solutions protect the host system via sandbox
mechanisms.
• Much effort is invested in sophisticated code
plagiarism detection and authorship control of
student submissions.
• But it was astonishing to see that APAAS solutions like VPL
overlook the cheating cleverness of students.
• The grading component can be cheated in very
straightforward ways.
• Unattended automated programming examinations must
be regarded as suspect.
APAAS == Code Injection System
7. • Two first-semester Java programming courses
in the winter semester 2018/19:
• A regular computer science study
programme (CS)
• An information technology and design
focused study programme (ITD)
• In both courses we searched for student
submissions that intentionally trick the grading
component.
• APAAS: Moodle/VPL (Version 3.3.3)
Methodology
• To minimise Hawthorne and experimenter effects, neither the students nor the advisers
were aware that they were part of this study.
• Even when cheating was detected, it had no consequences for the students. It was not
even communicated.
• Students were unaware that the version history of their submissions was logged and
analyzed.
8. METHODOLOGY
• VPL submissions were downloaded
from Moodle
• Python/Jupyter based sample selection
• S1: triggered evaluations
• S2: maximum versions
• S3: low average high end
• S4: condition related terms
• S5: unusual terms (System.exit, ...)
• S6: random submissions
• NumPy, matplotlib, statistics,
Javaparser libraries
• Exported weekly into archived PDF
documents (for manual analysis)
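The study's sample selection was done with Python/Jupyter; purely as an illustration of the S5 idea (flagging submissions that contain unusual terms such as System.exit), a minimal Java sketch could look like this. The class name, the method name, and the map-based input are assumptions made for the example, not part of the study's tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class UnusualTermFilter {

    /**
     * Returns the ids of all submissions whose source text contains
     * at least one of the given terms (plain substring search).
     */
    static List<String> flag(Map<String, String> sourceById, List<String> terms) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, String> entry : sourceById.entrySet()) {
            for (String term : terms) {
                if (entry.getValue().contains(term)) {
                    flagged.add(entry.getKey());
                    break; // one hit is enough to select the submission
                }
            }
        }
        return flagged;
    }
}
```

A flagged submission is only a candidate; as in the study, the decision whether it is a cheat is left to manual analysis.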
Searching for cheats
Automated sample selection, manual sample analysis
12. ANALYSIS
Continuous Example Assignment
Count the occurrences of a character c in a String s
(not case-sensitive).
We searched for solutions
that differed significantly
from this intended
(reference) solution.
The reference solution is used to check for correctness.
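A minimal sketch of what such a reference solution might look like. The class name and the signature countChar(char c, String s) are assumptions; the assignment text only fixes the method name and the case-insensitive behaviour.

```java
class ReferenceSolution {

    /** Counts how often c occurs in s, ignoring upper and lower case. */
    static int countChar(char c, String s) {
        char wanted = Character.toLowerCase(c);
        int count = 0;
        for (char current : s.toCharArray()) {
            if (Character.toLowerCase(current) == wanted) {
                count++;
            }
        }
        return count;
    }
}
```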
13. ANALYSIS
CHEAT PATTERN (1)
• The submission gets the maximum number of points but does not solve the given problem
in a general way
• The solution is completely useless outside the scope of the test
cases
• It simply maps input parameters to the expected output
values
(63%) Overfitting
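An overfitted submission for the countChar assignment might look like the following sketch. The three input/output pairs stand in for test cases a student could have read from the grader's feedback; they are hypothetical and not taken from the study.

```java
class OverfittedSubmission {

    /** Answers only the (hypothetical) known test cases; no counting takes place. */
    static int countChar(char c, String s) {
        if (c == 'l' && s.equals("Hello World")) return 3;
        if (c == 'o' && s.equals("Hello World")) return 2;
        if (c == 'x' && s.equals("Hello World")) return 0;
        return 0; // every other input gets an arbitrary answer
    }
}
```

Against a fixed test suite that contains exactly these cases, the submission scores full points although it is wrong for almost every other input.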
14. ANALYSIS
CHEAT PATTERN (2)
(30%) Problem Evasion
Example assignment:
Count the occurrences of a
character c in a String s
recursively.
The solution pretends to be
recursive, but it is merely a
redirection to an overloaded
method using loops (non-recursive).
Intended solution vs. evasion solution
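The contrast between the two might be sketched as follows. Class names and signatures are assumptions for illustration; both variants return the same results, so a purely output-based test cannot tell them apart.

```java
class RecursiveSolution {

    /** Intended approach: the method calls itself on the rest of the string. */
    static int countChar(char c, String s) {
        if (s.isEmpty()) {
            return 0;
        }
        boolean match = Character.toLowerCase(s.charAt(0)) == Character.toLowerCase(c);
        return (match ? 1 : 0) + countChar(c, s.substring(1));
    }
}

class EvasionSubmission {

    /** Looks like a recursive entry point, but only delegates to the overload. */
    static int countChar(char c, String s) {
        return countChar(c, s, 0);
    }

    /** Overloaded helper that does the real work with a loop. */
    static int countChar(char c, String s, int start) {
        int count = 0;
        for (int i = start; i < s.length(); i++) {
            if (Character.toLowerCase(s.charAt(i)) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }
}
```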
15. ANALYSIS
CHEAT PATTERN (3)
(6%) Redirection
(1) A small spelling error will
result in compiler messages
indicating that a specific
method is expected by the test
logic!
(2) Compiler error messages
can reveal the reference
solution.
(3) A clever student might
now simply redirect the
submission to the reference
method (to let the grader
evaluate itself).
Redirecting solution
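A redirecting submission can be sketched as follows. The class name Solution for the grader's reference implementation is hypothetical; in the scenario described, the student would learn the real name from a compiler error message.

```java
// Hypothetical stand-in for the reference class that the test logic ships with.
class Solution {
    static int countChar(char c, String s) {
        int count = 0;
        for (char current : s.toCharArray()) {
            if (Character.toLowerCase(current) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }
}

class RedirectingSubmission {

    /** Contains no logic of its own; the grader effectively evaluates itself. */
    static int countChar(char c, String s) {
        return Solution.countChar(c, s);
    }
}
```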
16. ANALYSIS
CHEAT PATTERN (4)
(2%) Injection
Simply print the
points you want to
have, in an APAAS-specific
format, to
standard out.
• Change the intended workflow of
the evaluation logic
• Use the standard out stream to
place text that is evaluated by the
APAAS system
• The evaluator calls the code to be evaluated.
• The submission code can print to standard out and then terminate, which prevents further
evaluation calls.
• The evaluator parses the content of standard out and awards full points!
Some strings with a specific
meaning for VPL.
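A sketch of an injecting submission. It assumes that the evaluator reads grade lines of the form "Grade :=>> value" from standard out, which is how VPL's output convention is commonly described; the exact strings and the grade value 100 are assumptions, not taken from the slides.

```java
class InjectingSubmission {

    /** The text the evaluator is supposed to mistake for its own result (assumed format). */
    static String payload() {
        return "Grade :=>> 100";
    }

    /** Prints the fake grade, then terminates the JVM so no real test can run. */
    static int countChar(char c, String s) {
        System.out.println(payload());
        System.exit(0);
        return 0; // never reached, only needed to satisfy the compiler
    }
}
```

The trick works only because the submission and the evaluation logic write to the same stream and the evaluator trusts everything it finds there.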
18. DISCUSSION
• Overfitting: Randomize test cases
• Problem Evasion: AST-based code inspection
• Redirection: AST-based code inspection
• Injection: Separate standard out streams for
evaluation and submission logic
Counter Measures
A more detailed discussion
can be found in the paper.
19. DISCUSSION
JEdUnit
https://github.com/nkratzke/JEdUnit
JEdUnit is a unit testing framework with a
special focus on educational aspects. It
strives to simplify automatic evaluation of
(small) Java programming assignments
using Moodle/VPL.
It is used and developed for programming
classes at the Lübeck University of Applied
Sciences.
However, this framework might be helpful
for other programming instructors, so it has
been open sourced.
20. DISCUSSION
Randomize Test Cases
Don't do that:
Do that:
The JEdUnit DSL can express
randomized test values, e.g. by
applying regular expressions
inversely to generate random
strings.
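The idea of randomized test cases can be sketched in plain Java without the JEdUnit DSL. Everything here (interface, class and method names, the small alphabet, the seed) is an assumption for illustration; the generator draws from a fixed alphabet instead of inverting a regular expression.

```java
import java.util.Random;

interface CountCharSubmission {
    int countChar(char c, String s);
}

class RandomizedGrader {

    private static final String ALPHABET = "abcABC";

    /** Trusted reference used to compute the expected value for each random input. */
    static int reference(char c, String s) {
        int count = 0;
        for (char current : s.toCharArray()) {
            if (Character.toLowerCase(current) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }

    /** Builds a random string of 5 to 20 characters from the alphabet. */
    static String randomString(Random rnd) {
        int length = 5 + rnd.nextInt(16);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Runs the submission on random inputs and counts agreements with the reference. */
    static int passedRuns(CountCharSubmission submission, int runs, long seed) {
        Random rnd = new Random(seed);
        int passed = 0;
        for (int i = 0; i < runs; i++) {
            String s = randomString(rnd);
            char c = ALPHABET.charAt(rnd.nextInt(ALPHABET.length()));
            if (submission.countChar(c, s) == reference(c, s)) {
                passed++;
            }
        }
        return passed;
    }
}
```

Because the inputs are not known in advance, a submission that only maps fixed inputs to fixed outputs can no longer pass every run, while a general solution still does.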
21. DISCUSSION
AST-based code inspections
E.g.: Prevent students from bypassing recursion
by inspecting for loops and penalizing their presence.
The JEdUnit DSL is able to
express selectors on abstract
syntax trees (AST) to check for
the presence or absence of
language constructs.
The selector model of
JEdUnit works similarly to the way
CSS selectors work on DOM
trees.
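To illustrate AST-based inspection without guessing at the JEdUnit selector syntax, the following sketch counts loop statements with the JDK's own Compiler Tree API (com.sun.source). This is a swapped-in technique, not JEdUnit's implementation; the class and method names are assumptions, and the code needs a full JDK because it relies on the system Java compiler.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.DoWhileLoopTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class LoopInspection {

    /** Parses the source (nothing is executed) and counts all loop statements. */
    static int countLoops(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Submission.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        int[] loops = {0};
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override
            public Void visitForLoop(ForLoopTree node, Void p) {
                loops[0]++;
                return super.visitForLoop(node, p);
            }
            @Override
            public Void visitEnhancedForLoop(EnhancedForLoopTree node, Void p) {
                loops[0]++;
                return super.visitEnhancedForLoop(node, p);
            }
            @Override
            public Void visitWhileLoop(WhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitWhileLoop(node, p);
            }
            @Override
            public Void visitDoWhileLoop(DoWhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitDoWhileLoop(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loops[0];
    }
}
```

A grader for a recursion assignment could deduct points whenever countLoops returns a value greater than zero, which catches the evasion pattern regardless of the output the submission produces.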
22. DISCUSSION
Isolation of submission and evaluation logic
VPL approach: the submission shares stdout with the evaluation process.
JEdUnit approach: the submission logic gets an isolated fake console.
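One way to give submission code an isolated console is to swap System.out for the duration of the call. This is a minimal sketch of that general idea, not JEdUnit's actual implementation; the class and method names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class IsolatedConsole {

    /**
     * Runs the submission call with System.out redirected into a buffer
     * and returns whatever the submission printed.
     */
    static String capture(Runnable submissionCall) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream fake = new PrintStream(buffer, true);
        try {
            System.setOut(fake);
            submissionCall.run();
        } finally {
            System.setOut(original); // always restore the evaluator's stream
            fake.close();
        }
        return buffer.toString();
    }
}
```

Anything the submission prints ends up in the returned string, where the evaluator can inspect it as data instead of parsing it as a grading instruction. The sketch does not stop a submission from calling System.exit; that needs a separate safeguard.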
23. DISCUSSION
Further Features of JEdUnit
• Weighting of test cases (by annotations)
• Checkstyle integration (weighted rules)
• DSL
• to formulate test cases in a check,
explain, onError pattern
• to randomize test cases
• to write arbitrary code inspections
based on a selector model
• Predefined code inspections (switch on/off):
proper collection usage, loops, lambdas,
inner classes, data fields, console output, etc.
• Automated class structure comparison (OO
use cases) to compare the structural equality
of a multi-class submission with a multi-class
reference solution.
25. LIMITATIONS
We searched qualitatively and not
quantitatively for cheat-patterns
• Do not draw any conclusions about
which kinds of cheat patterns occur
at which level of programming
expertise
• Do not draw any conclusions about
the quantitative aspects of
cheating
• The study does not claim to
have identified all kinds of cheat
patterns
The study does not claim that
all APAAS solutions have the same
set of vulnerabilities
• Do not generalize Moodle/VPL-specific
problems.
• However, the Overfitting,
Problem Evasion, Redirection,
and Injection patterns can be
used to check for vulnerabilities
in other APAAS solutions.
Threats to Validity
26. • We have to be aware that (even first-year)
students are clever enough to trick automated
grading solutions.
• Cheat patterns:
• Overfitting
• Problem Evasion
• Redirection
• Injection
• Options we currently investigate:
• Randomise test cases
• Pragmatic code inspection
• Isolation of submission and evaluation logic
• Exactly these features seem to be only
incompletely provided by current APAAS systems.
Conclusion
27. Acknowledgement
Presentation on SpeakerDeck
Preprint on ResearchGate
Advisers of the practical courses
• David Engelhardt, Thomas Hamer, Clemens Stauner,
Volker Völz, Patrick Willnow
Student tutors
• Franz Bretterbauer, Francisco Cardoso, Jannik
Gramann, Till Hahn, Thorleif Harder, Jan Steffen
Krohn, Diana Meier, Jana Schwieger, Jake Stradling,
and Janos Vinz
Picture Reference
• Hacker: Pixabay.com (CC0)
• Robot: Pixabay.com (CC0)