SMART LIKE A FOX
How clever students trick dumb automated programming
assignment assessment systems (APAAS)
Nane Kratzke
1
Introduction
Methodology
Analysis
Discussion, Counter Measures
Limitations, Conclusion
Agenda
2
Presentation on SpeakerDeck
Preprint on ResearchGate
Presentation at CSEDU 2019, Heraklion, Crete, Greece (2 – 4 May 2019)
• We are at a transition point between the
industrialisation age and the digitisation age.
• Computer science related skills are a vital asset
in this context. One of these basic skills is
practical programming.
• The course sizes of university and college
programming courses are steadily increasing.
• Even MOOCs are used more frequently to
convey necessary programming capabilities to
students of different disciplines.
• The coursework is composed of assignments
that are highly suited to be assessed
automatically.
• However, it is very often underestimated how
astonishingly easy it is to trick these systems!
Introduction
3
The question arises whether “robots”
certify the expertise to program or the expertise to cheat.
A small example to get your attention ...
4
VPL == Virtual Programming Lab
• Count the occurrences of a character c in
a String s.
• Develop a method countChar().
How to get full points in
Moodle/VPL?
The same works for every assignment!
INTRODUCTION
INTRODUCTION
• APAAS solutions are systems that execute injected code
(student submissions).
• Code injection is known as a severe threat from a security
point of view.
• APAAS solutions protect the host system via sandbox
mechanisms.
• Much effort is invested in sophisticated code
plagiarism detection and authorship control of
student submissions.
• Yet it was astonishing to see that APAAS solutions like VPL
overlook how clever students are at cheating.
• The grading component can be cheated in a very
straightforward way.
• Unattended automated programming examinations must
be rated suspect.
APAAS == Code Injection System
5
• Two first-semester Java programming courses
in the winter semester 2018/19:
• A regular computer science study
programme (CS)
• An information technology and design
focused study programme (ITD)
• In both courses we searched for student
submissions that intentionally trick the grading
component.
• APAAS: Moodle/VPL (Version 3.3.3)
Methodology
7
• To minimise Hawthorne and experimenter effects, neither the students nor the advisers
were aware that they were part of this study.
• Even if cheating was detected, this had no consequences for the students. It was not
even communicated.
• Students were unaware that the version history of their submissions was logged and
analyzed.
METHODOLOGY
• VPL submissions were downloaded
from Moodle
• Python/Jupyter based sample selection
• S1: triggered evaluations
• S2: maximum versions
• S3: low average high end
• S4: condition related terms
• S5: unusual terms (System.exit, ...)
• S6: random submissions
• NumPy, matplotlib, statistics,
Javaparser libraries
• Exported weekly into archived PDF
documents (for manual analysis)
Searching for cheats
Automated sample selection, manual sample analysis
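The authors did their sample selection in Python/Jupyter. Purely as an illustration in Java, the language of the course, a selector like S5 might flag submissions containing unusual terms, as sketched below. The class name, method name and term list are assumptions; the slide names only System.exit explicitly.

```java
import java.util.ArrayList;
import java.util.List;

final class UnusualTermFilter {
    // Assumed term list: only "System.exit" is named on the slide,
    // the two VPL output markers are added here for illustration.
    static final List<String> UNUSUAL_TERMS =
            List.of("System.exit", "Grade :=>>", "Comment :=>>");

    /** Returns every unusual term that occurs in the submission source. */
    static List<String> findUnusualTerms(String submissionSource) {
        List<String> hits = new ArrayList<>();
        for (String term : UNUSUAL_TERMS) {
            if (submissionSource.contains(term)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

A submission with a non-empty result would then be queued for manual analysis rather than judged automatically.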
8
METHODOLOGY
Analysis of submissions
9
Manual annotation
Task description
Result, workload, working
phases, student identifier
ANALYSIS
Observed cheat-pattern frequency
11
ANALYSIS
Continuous Example Assignment
12
Count the occurrences of a character c in a String s
(not case-sensitive).
We searched for solutions
that differed significantly
from this intended
(reference) solution.
The reference solution used to check for correctness.
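The reference solution was shown as an image and is not part of this transcript. A minimal sketch of what such a case-insensitive solution might look like follows. The class name and the parameter order of countChar are assumptions.

```java
// Hypothetical stand-in for the reference solution shown on the slide.
final class CountCharReference {
    /** Counts how often c occurs in s, ignoring case. */
    static int countChar(char c, String s) {
        char needle = Character.toLowerCase(c);
        int count = 0;
        for (char current : s.toLowerCase().toCharArray()) {
            if (current == needle) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChar('a', "Abracadabra")); // prints 5
    }
}
```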
ANALYSIS
CHEAT PATTERN (1)
• Get a maximum of points without solving the given problem
in a general way
• Solution is completely useless outside the scope of the test
cases
• Simply maps input parameters to the expected output
values
(63%) Overfitting
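An overfitted submission could look roughly like the sketch below. The three input/output pairs are invented for illustration. In practice a student would infer them from the feedback of the grader.

```java
// Sketch of the overfitting pattern; the test inputs are hypothetical.
final class OverfittedCountChar {
    static int countChar(char c, String s) {
        // Hard-coded answers for the (assumed) fixed test cases.
        if (c == 'a' && s.equals("Abracadabra")) return 5;
        if (c == 'x' && s.equals("xylophone")) return 1;
        if (c == 'o' && s.equals("Hello World")) return 2;
        return 0; // wrong for almost every other input
    }
}
```

Such code earns full points as long as the test cases never change, yet it returns 0 for countChar('a', "banana").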
13
ANALYSIS
CHEAT PATTERN (2)
(30%) Problem Evasion
14
Example assignment:
Count the occurrences of a
character c in a String s
recursively.
Solution pretends to be
recursive, but it is merely a
redirection to an overloaded
method using loops
(non-recursive).
Intended solution Evasion solution
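Both solutions were shown as images. The following sketch contrasts the two shapes under the assumption that the method is called countChar and compares characters case-insensitively. All names are illustrative.

```java
final class RecursionEvasion {
    /** Intended shape: the method calls itself on a shorter string. */
    static int countCharIntended(char c, String s) {
        if (s.isEmpty()) {
            return 0;
        }
        int head = Character.toLowerCase(s.charAt(0)) == Character.toLowerCase(c) ? 1 : 0;
        return head + countCharIntended(c, s.substring(1));
    }

    /** Evasion shape: looks like a recursive call, but only delegates to an overload. */
    static int countChar(char c, String s) {
        return countChar(c, s, 0);
    }

    /** The overload does the real work with a plain loop. */
    static int countChar(char c, String s, int start) {
        int count = 0;
        for (int i = start; i < s.length(); i++) {
            if (Character.toLowerCase(s.charAt(i)) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }
}
```

Both variants return the same values, so a purely output-based test cannot tell them apart.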
ANALYSIS
CHEAT PATTERN (3)
(6%) Redirection
15
(1) A small spelling error will
result in compiler messages
indicating that a specific
method is expected by the test
logic!
(2) Compiler error messages
can reveal the reference
solution.
(3) A clever student might
now simply redirect the
submission to the reference
method (to let the grader
evaluate itself).
Redirecting solution
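The redirecting solution was shown as an image. A sketch of the idea follows, assuming the compiler messages leaked a class that is called LeakedReference here. That name and its countChar signature are hypothetical stand-ins for whatever the real test logic exposes.

```java
// Stand-in for the grader's own reference class (name is an assumption).
final class LeakedReference {
    static int countChar(char c, String s) {
        int count = 0;
        for (char current : s.toCharArray()) {
            if (Character.toLowerCase(current) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }
}

// The submission does no work of its own: it lets the grader evaluate itself.
final class RedirectingSubmission {
    static int countChar(char c, String s) {
        return LeakedReference.countChar(c, s);
    }
}
```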
ANALYSIS
CHEAT PATTERN (4)
(2%) Injection
16
Simply print the
points you want to
have in an APAAS-
specific format on
standard out.
• Change the intended workflow of
the evaluation logic
• Use the standard out stream to
place text that is evaluated by the
APAAS system
• The evaluator calls the code to be evaluated.
• The submission code can print to standard out and then terminate, preventing further
evaluation calls.
• The evaluator parses the content of standard out and will give full points!
Some strings with a specific
meaning for VPL.
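The injecting submission was shown as an image. The sketch below illustrates the mechanism, assuming VPL reads lines starting with "Grade :=>>" and "Comment :=>>" from the evaluation output. The class and the helper forgedReport are illustrative names.

```java
final class InjectingSubmission {
    /** Builds text in the (assumed) format the evaluator expects from its own test logic. */
    static String forgedReport(int points) {
        return "Comment :=>> All tests passed\n" + "Grade :=>> " + points;
    }

    static int countChar(char c, String s) {
        System.out.println(forgedReport(100));
        System.exit(0); // terminate before the evaluator can print real results
        return 0;       // never reached, but required by the compiler
    }
}
```

The method never counts anything. It only works because submission and evaluator write to the same standard out stream.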
DISCUSSION
• Overfitting: Randomize test cases
• Problem Evasion: AST-based code inspection
• Redirection: AST-based code inspection
• Injection: Separate standard out streams for
evaluation and submission logic
Counter Measures
18
A more detailed discussion
can be found in the paper.
DISCUSSION
JEdUnit
19
JEdUnit
https://github.com/nkratzke/JEdUnit
JEdUnit is a unit testing framework with a
special focus on educational aspects. It
strives to simplify automatic evaluation of
(small) Java programming assignments
using Moodle/VPL.
It is used and developed for programming
classes at the Lübeck University of Applied
Sciences.
However, this framework might be helpful
for other programming instructors, so it has
been open sourced.
DISCUSSION
Randomize Test Cases
20
Don't do that:
Do that:
JEdUnit DSL to express
randomized test values. E.g.
apply regular expressions
inversely to generate random
strings.
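The two code examples on this slide were images. The sketch below shows the idea of randomized test cases in plain Java. It does not use the JEdUnit DSL, and all names (RandomizedGrading, CharCounter, passesAll) are assumptions made for illustration.

```java
import java.util.Random;

final class RandomizedGrading {
    /** Anything that claims to count characters (hypothetical interface). */
    interface CharCounter {
        int countChar(char c, String s);
    }

    /** Trusted reference: case-insensitive count. */
    static int reference(char c, String s) {
        int count = 0;
        for (char current : s.toCharArray()) {
            if (Character.toLowerCase(current) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }

    /** Random mixed-case string over a small alphabet so that hits are frequent. */
    static String randomString(Random rnd, int maxLength) {
        int length = 1 + rnd.nextInt(maxLength);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            char letter = (char) ('a' + rnd.nextInt(3));
            sb.append(rnd.nextBoolean() ? Character.toUpperCase(letter) : letter);
        }
        return sb.toString();
    }

    /** True only if the submission agrees with the reference on every random case. */
    static boolean passesAll(CharCounter submission, Random rnd, int runs) {
        for (int i = 0; i < runs; i++) {
            String s = randomString(rnd, 20);
            char c = (char) ('a' + rnd.nextInt(3));
            if (submission.countChar(c, s) != reference(c, s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        CharCounter overfitted = (c, s) -> (c == 'a' && s.equals("Abracadabra")) ? 5 : 0;
        System.out.println("reference passes:  " + passesAll(RandomizedGrading::reference, rnd, 200));
        System.out.println("overfitted passes: " + passesAll(overfitted, rnd, 200));
    }
}
```

Because the inputs change on every evaluation, a lookup table of known test cases no longer helps.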
DISCUSSION
AST-based code inspections
21
E.g.: Prevent students from bypassing recursion
by inspecting submissions for loops and penalizing their presence.
The JEdUnit DSL is able to
express selectors on abstract
syntax trees (AST) to check for
the presence or absence of
language constructs.
The selector model of
JEdUnit works similarly to
the way CSS selectors work
on DOM trees.
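The JEdUnit selector example was an image. As a rough illustration of the underlying idea, the sketch below counts loop statements in a submission with the compiler tree API that ships with the JDK. This is not the JEdUnit DSL. The class LoopInspection and its method are assumptions, and the code needs a full JDK rather than a bare runtime.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.DoWhileLoopTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class LoopInspection {
    /** In-memory source file, so the submission text is parsed without touching disk. */
    private static final class SourceText extends SimpleJavaFileObject {
        private final String code;

        SourceText(String code) {
            super(URI.create("string:///Submission.java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Counts for, for-each, while and do-while statements in the given source. */
    static int countLoops(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new SourceText(source)));
        int[] loops = {0};
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override public Void visitForLoop(ForLoopTree node, Void p) {
                loops[0]++;
                return super.visitForLoop(node, p);
            }
            @Override public Void visitEnhancedForLoop(EnhancedForLoopTree node, Void p) {
                loops[0]++;
                return super.visitEnhancedForLoop(node, p);
            }
            @Override public Void visitWhileLoop(WhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitWhileLoop(node, p);
            }
            @Override public Void visitDoWhileLoop(DoWhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitDoWhileLoop(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null); // walk the whole syntax tree
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loops[0];
    }
}
```

A grader for a recursion assignment could deduct points whenever countLoops returns a value above zero. Working on the syntax tree also catches loops hidden in overloaded helper methods, which a test of return values alone cannot see.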
DISCUSSION
Isolation of submission and evaluation logic
22
VPL approach: submission shares stdout with the evaluation process
JEdUnit approach: submission logic gets an isolated fake console
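The diagram for this slide is not part of the transcript. One way such a fake console could be built in plain Java is sketched below. It is an illustration under assumed names (FakeConsole, captureOutput) and not JEdUnit's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

final class FakeConsole {
    /**
     * Runs the submission call while System.out points to an in-memory buffer,
     * then restores the real stream and returns whatever the submission printed.
     */
    static String captureOutput(Runnable submissionCall) {
        PrintStream realOut = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream fakeOut = new PrintStream(buffer, true)) {
            System.setOut(fakeOut);
            submissionCall.run();
            fakeOut.flush();
            return buffer.toString();
        } finally {
            System.setOut(realOut); // the evaluator keeps the real stream for itself
        }
    }
}
```

Text printed by the submission then ends up in the returned string instead of the stream the grading system parses. Note that this sketch does not stop a submission from calling System.exit. That would need a separate safeguard.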
DISCUSSION
Further Features of JEdUnit
23
• Weighting of test cases (by annotations)
• Checkstyle integration (weighted rules)
• DSL
• to formulate test cases in a check,
explain, onError pattern
• to randomize test cases
• to write arbitrary code inspections
based on a selector model
• Predefined code inspections (switch on/off):
proper collection usage, loops, lambdas,
inner classes, data fields, console output, etc.
• Automated class structure comparison (for OO
use cases) to compare the structural equality
of a multi-class submission with a multi-class
reference solution.
LIMITATIONS
We searched qualitatively and not
quantitatively for cheat patterns
• Do not draw any conclusions about
which kinds of cheat patterns occur
at which level of programming
expertise
• Do not draw any conclusions about
the quantitative aspects of
cheating
• The study does not claim to
have identified all kinds of
cheat patterns
The study does not claim that
all APAAS solutions have the same
set of vulnerabilities
• Do not generalize Moodle/VPL-specific
problems.
• However, the Overfitting,
Problem Evasion, Redirection,
and Injection patterns can be
used to check for vulnerabilities
in other APAAS solutions.
Threats to Validity
25
• We have to be aware that (even first-year)
students are clever enough to trick automated
grading solutions.
• Cheat patterns:
• Overfitting
• Problem Evasion
• Redirection
• Injection
• Options we currently investigate:
• Randomise test cases
• Pragmatic code inspection
• Isolation of submission and evaluation logic
• Exactly these features seem to be only
incompletely provided by current APAAS solutions.
Conclusion
26
JEdUnit
https://github.com/nkratzke/JEdUnit
Acknowledgement
27
Advisers of the practical courses
• David Engelhardt, Thomas Hamer, Clemens Stauner,
Volker Völz, Patrick Willnow
Student tutors
• Franz Bretterbauer, Francisco Cardoso, Jannik
Gramann, Till Hahn, Thorleif Harder, Jan Steffen
Krohn, Diana Meier, Jana Schwieger, Jake Stradling,
and Janos Vinz
Picture Reference
• Hacker: Pixabay.com (CC0)
• Robot: Pixabay.com (CC0)
About
28
Nane Kratzke
Web: http://nane.kratzke.pages.mylab.th-luebeck.de/about
Twitter: @NaneKratzke
LinkedIn: https://de.linkedin.com/in/nanekratzke
GitHub: https://github.com/nkratzke
ResearchGate: https://www.researchgate.net/profile/Nane_Kratzke
SlideShare: http://de.slideshare.net/i21aneka

Smart like a Fox: How clever students trick dumb programming assignment assessment systems

  • 1. How clever students trick dumb automated programming assignment assessment systems (APAAS) Nane Kratzke SMART LIKE A FOX 1
  • 2. Introduction Methodology Analysis Discussion, Counter Measures Limitations, Conclusion Agenda 2 Presentation on SpeakerDeck Preprint on ResearchGate Presentation at CSEDU 2019, Heraklion, Crete, Greece (2 – 4 May 2019)
  • 3. • We are at a transition point between the industrialisation age and the digitisation age. • Computer science related skills are a vital asset in this context. One of these basic skills is practical programming. • The course sizes of university and college programming courses are steadily increasing. • Even MOOC’s are used more frequently to convey necessary programming capabilities to students of different disciplines. • The coursework is composed of assignments that are highly suited to be assessed automatically. • However, it is very often underestimated how astonishingly easy it is to trick these systems! Introduction 3 The question arises whether “robots” certificate the expertiseto program or to cheat?
  • 4. A small example to get your attention ... 4 VPL == Virtual Programming Lab • Count the occurence of a character c in a String s. • Develop a method countChar(). How to get full points in Moodle/VPL? The same works for every assignment! INTRODUCTION
  • 5. INTRODUCTION • APAAS solutions are systems that execute injected code (student submissions). • Code injection is known as a severe threat from a security point of view. • APAAS solutions protect the host system via sandbox mechanisms. • Much effort is invested in sophisticated code plagiarism detection and authorship control of student submissions. • But it was astonishing to see that APAAS solutions like VPL overlook the cheating cleverness of students. • The grading component can be cheated very straightforward. • Unattended automated programming examinations must be rated suspect. APAAS == Code Injection System 5
  • 7. • Two first semester programming Java courses in the winter semester 2018/19: • A regular computer science study programme (CS) • An information technology and design focused study programme (ITD) • In both courses we searched for student submissions that intentionally trick the grading component. • APAAS: Moodle/VPL (Version 3.3.3) Methodology 7 • To minimise Hawthorne and Experimenter effects neither the students nor the advisers were aware to be part of this study. • Even if cheating was detected this had no consequences for the students. It was not even communicated. • Students were unaware that the version history of their submissions were logged and analyzed.
  • 8. METHODOLOGY Searching for cheats: automated sample selection, manual sample analysis • VPL submissions were downloaded from Moodle • Python/Jupyter-based sample selection • S1: triggered evaluations • S2: maximum versions • S3: low average high end • S4: condition related terms • S5: unusual terms (System.exit, ...) • S6: random submissions • NumPy, matplotlib, statistics, Javaparser libraries • Exported weekly into archived PDF documents (for manual analysis)
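The term-based selection steps (S4, S5) can be pictured with a small sketch. The study's own tooling was Python/Jupyter-based and is not shown on the slide, so the Java class below is purely hypothetical. Only `System.exit` is named in the source; the other terms in the list are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the study's actual selection script.
class SuspiciousTermFilter {
    // Only "System.exit" appears on the slide; the remaining terms are assumed examples.
    static final List<String> UNUSUAL_TERMS =
            Arrays.asList("System.exit", "Runtime.getRuntime", "Grade :=>>");

    /** Returns the unusual terms that occur literally in the submission source. */
    static List<String> findUnusualTerms(String source) {
        List<String> hits = new ArrayList<>();
        for (String term : UNUSUAL_TERMS) {
            if (source.contains(term)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

A submission with a non-empty result would only be a candidate for manual analysis, not proof of cheating.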
  • 9. METHODOLOGY Analysis of submissions [Figure labels: task description; result, workload, working phases, student identifier; manual annotation]
  • 12. ANALYSIS Continuous Example Assignment: Count the occurrences of a character c in a String s (not case-sensitive). The reference solution is used to check for correctness. We searched for solutions that differed significantly from this intended (reference) solution.
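The reference solution shown on the slide is not preserved in this transcript. As a rough orientation, a solution to the assignment might look like the following sketch; the class name `ReferenceCountChar` is an assumption, and only the method name `countChar` comes from the slides.

```java
// A possible solution to the example assignment; not the author's reference code.
class ReferenceCountChar {
    /** Counts how often c occurs in s, ignoring case. */
    static int countChar(String s, char c) {
        char target = Character.toLowerCase(c);
        int count = 0;
        for (char current : s.toCharArray()) {
            if (Character.toLowerCase(current) == target) {
                count++;
            }
        }
        return count;
    }
}
```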
  • 13. ANALYSIS CHEAT PATTERN (1): Overfitting (63%) • Gets the maximum number of points but does not solve the given problem in a general way • The solution is completely useless outside the scope of the test cases • Simply maps input parameters to the expected outputs
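A minimal sketch of what an overfitted submission could look like. The two "known" test inputs are invented for illustration; the actual test cases of the course are not part of this transcript.

```java
// Illustrative overfitted submission; the hard-coded inputs are assumed test cases.
class OverfittedSubmission {
    static int countChar(String s, char c) {
        // Each branch answers exactly one test case that the grader is known to run.
        if (s.equals("Hello World") && c == 'l') return 3;
        if (s.equals("abc") && c == 'x') return 0;
        // Anything else gets an arbitrary answer.
        return 1;
    }
}
```

The method passes both assumed test cases but returns 1 for `countChar("banana", 'a')`, where the correct answer is 3.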
  • 14. ANALYSIS CHEAT PATTERN (2): Problem Evasion (30%) Example assignment: Count the occurrences of a character c in a String s recursively. The solution pretends to be recursive, but it is merely a redirection to an overloaded method using loops (non-recursive). [Code figures: intended solution; evasion solution]
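The code figures for this slide are missing from the transcript. The following sketch reconstructs the idea under assumed class names (`RecursiveCount`, `EvasionCount`); it is not the students' original code.

```java
// Intended style: the method calls itself on a shorter string.
class RecursiveCount {
    static int countChar(String s, char c) {
        if (s.isEmpty()) {
            return 0;
        }
        int head = (s.charAt(0) == c) ? 1 : 0;
        return head + countChar(s.substring(1), c);
    }
}

// Evasion: looks like the entry point of a recursive solution ...
class EvasionCount {
    static int countChar(String s, char c) {
        return countChar(s, c, 0);
    }

    // ... but the overloaded helper does all the work with a loop.
    static int countChar(String s, char c, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                count++;
            }
        }
        return count;
    }
}
```

Both classes return the same results, so a purely output-based test cannot tell them apart.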
  • 15. ANALYSIS CHEAT PATTERN (3): Redirection (6%) (1) A small spelling error will result in compiler messages indicating that a specific method is expected by the test logic! (2) Compiler error messages can reveal the reference solution. (3) A clever student might now simply redirect the submission to the reference method (to let the grader evaluate itself). [Code figure: redirecting solution]
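A sketch of the redirection idea. `GraderReference` is a hypothetical stand-in for whatever reference class the grader ships; in the scenario described above, its real name would be learned from a compiler error message.

```java
// Hypothetical stand-in for the grader's own reference implementation.
class GraderReference {
    static int countChar(String s, char c) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.toLowerCase(s.charAt(i)) == Character.toLowerCase(c)) {
                count++;
            }
        }
        return count;
    }
}

// The submission contains no logic of its own: it lets the grader evaluate itself.
class RedirectingSubmission {
    static int countChar(String s, char c) {
        return GraderReference.countChar(s, c);
    }
}
```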
  • 16. ANALYSIS CHEAT PATTERN (4): Injection (2%) Simply print the points you want to have in an APAAS-specific format on standard out. • Change the intended workflow of the evaluation logic • Use the standard out stream to place text that is evaluated by the APAAS system • The evaluator calls the code to be evaluated. • The submission code can print to standard out and then terminate further evaluation calls. • The evaluator parses the content of standard out and will give full points! [Figure: some strings with a specific meaning for VPL]
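The strings shown on the slide are not preserved here. The sketch below assumes that the grader reads a standard-out line of the form `Grade :=>> <points>`; treat that marker, and the class name `InjectingSubmission`, as assumptions to be checked against the VPL documentation.

```java
// Illustrative injection attempt; the grade marker format is an assumption.
class InjectingSubmission {
    /** Builds the line that the evaluator is assumed to parse as the final grade. */
    static String gradeLine(int points) {
        return "Grade :=>> " + points;
    }

    // Never counts anything: prints the desired grade and stops the JVM
    // so that the evaluator cannot print its own verdict afterwards.
    static int countChar(String s, char c) {
        System.out.println(gradeLine(100));
        System.exit(0);
        return 0; // never reached at run time, but required by the compiler
    }
}
```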
  • 18. DISCUSSION Counter Measures • Overfitting: randomize test cases • Problem Evasion: AST-based code inspection • Redirection: AST-based code inspection • Injection: separate standard out streams for evaluation and submission logic A more detailed discussion can be found in the paper.
  • 19. DISCUSSION JEdUnit (https://github.com/nkratzke/JEdUnit) JEdUnit is a unit testing framework with a special focus on educational aspects. It strives to simplify automatic evaluation of (small) Java programming assignments using Moodle/VPL. It is used and developed for programming classes at the Lübeck University of Applied Sciences. However, this framework might be helpful for other programming instructors, so it has been open sourced.
  • 20. DISCUSSION Randomize Test Cases [Code figures: “Don’t do that” vs. “Do that”] JEdUnit provides a DSL to express randomized test values, e.g. by applying regular expressions inversely to generate random strings.
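The JEdUnit DSL itself is not reproduced in this transcript, so the following plain-Java sketch only illustrates the principle: compare a submission against a reference on randomly generated inputs. All names (`RandomizedCheck`, `passes`, `randomString`) are hypothetical, and the strings are drawn from a small fixed alphabet instead of being generated from a regular expression.

```java
import java.util.Random;
import java.util.function.ToIntBiFunction;

// Principle sketch only; this is not the JEdUnit API.
class RandomizedCheck {
    /** Trusted implementation the submission is compared against. */
    static int reference(String s, char c) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) count++;
        }
        return count;
    }

    /** Random string over the letters a..e with a length between 0 and maxLength. */
    static String randomString(Random rnd, int maxLength) {
        int length = rnd.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(5)));
        }
        return sb.toString();
    }

    /** True if the submission agrees with the reference on an edge case and on random inputs. */
    static boolean passes(ToIntBiFunction<String, Character> submission, long seed, int rounds) {
        if (submission.applyAsInt("", 'a') != 0) return false; // fixed edge case
        Random rnd = new Random(seed);
        for (int i = 0; i < rounds; i++) {
            String s = randomString(rnd, 20);
            char c = (char) ('a' + rnd.nextInt(5));
            if (submission.applyAsInt(s, c) != reference(s, c)) return false;
        }
        return true;
    }
}
```

A submission that maps a handful of known inputs to fixed outputs has nothing to memorize here, because the inputs change with every seed.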
  • 21. DISCUSSION AST-based code inspections E.g., prevent students from bypassing recursion by inspecting the code for loops and penalizing their presence. The JEdUnit DSL is able to express selectors on abstract syntax trees (AST) to check for the presence or absence of language constructs. The selector model of JEdUnit works similarly to the way CSS selectors work on DOM trees.
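To illustrate the kind of check described above without guessing at the JEdUnit selector syntax, the sketch below uses the JDK's own compiler tree API (`com.sun.source`) as a stand-in. It requires a JDK, not just a JRE, and the class name `LoopInspector` is hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.DoWhileLoopTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Stand-in for an AST inspection; this is not JEdUnit's selector model.
class LoopInspector {
    /** In-memory source file handed to the compiler front end. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Submission.java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source and counts for, for-each, while and do-while loops. */
    static int countLoops(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                Collections.singletonList(new StringSource(source)));
        final int[] loops = {0};
        TreeScanner<Void, Void> scanner = new TreeScanner<Void, Void>() {
            @Override public Void visitForLoop(ForLoopTree node, Void p) {
                loops[0]++;
                return super.visitForLoop(node, p);
            }
            @Override public Void visitEnhancedForLoop(EnhancedForLoopTree node, Void p) {
                loops[0]++;
                return super.visitEnhancedForLoop(node, p);
            }
            @Override public Void visitWhileLoop(WhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitWhileLoop(node, p);
            }
            @Override public Void visitDoWhileLoop(DoWhileLoopTree node, Void p) {
                loops[0]++;
                return super.visitDoWhileLoop(node, p);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loops[0];
    }
}
```

A grader for a recursion assignment could deduct points whenever `countLoops` returns a value greater than zero, which would catch a loop hidden in an overloaded helper method.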
  • 22. DISCUSSION Isolation of submission and evaluation logic • JEdUnit approach: the submission logic gets an isolated fake console • VPL approach: the submission shares stdout with the evaluation process
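The "fake console" idea can be sketched in plain Java by swapping `System.out` while the submission runs. This shows the general technique under assumed names (`IsolatedRun`, `Outcome`) and is not taken from the JEdUnit code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.IntSupplier;

// General technique sketch; not the JEdUnit implementation.
class IsolatedRun {
    /** Return value of the submission plus everything it printed. */
    static final class Outcome {
        final int value;
        final String console;

        Outcome(int value, String console) {
            this.value = value;
            this.console = console;
        }
    }

    /** Runs the call with System.out redirected into a private buffer. */
    static Outcome run(IntSupplier submissionCall) {
        PrintStream realOut = System.out;
        ByteArrayOutputStream fakeConsole = new ByteArrayOutputStream();
        System.setOut(new PrintStream(fakeConsole, true));
        try {
            int value = submissionCall.getAsInt();
            System.out.flush();
            return new Outcome(value, fakeConsole.toString());
        } finally {
            // Always hand the real console back to the evaluator.
            System.setOut(realOut);
        }
    }
}
```

Anything the submission prints, including a forged grade line, ends up in `Outcome.console` instead of the stream the evaluator reports on. A call to `System.exit` inside the submission is a separate problem that this sketch does not address.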
  • 23. DISCUSSION Further Features of JEdUnit (https://github.com/nkratzke/JEdUnit) • Weighting of test cases (by annotations) • Checkstyle integration (weighted rules) • DSL • to formulate test cases in a check, explain, onError pattern • to randomize test cases • to write arbitrary code inspections based on a selector model • Predefined code inspections (switch on/off): proper collection usage, loops, lambdas, inner classes, data fields, console output, etc. • Automated class structure comparison (for OO use cases, to compare the structural equality of a multi-class submission with a multi-class reference solution).
  • 25. LIMITATIONS Threats to Validity We searched for cheat patterns qualitatively, not quantitatively. • Do not draw any conclusions about which kinds of cheat patterns occur at which level of programming expertise. • Do not draw any conclusions about the quantitative aspects of cheating. • The study does not claim to have identified all kinds of cheat patterns. The study does not claim that all APAAS solutions have the same set of vulnerabilities. • Do not generalize Moodle/VPL-specific problems. • However, the Overfitting, Problem Evasion, Redirection, and Injection patterns can be used to check for vulnerabilities in other APAAS solutions.
  • 26. Conclusion • We have to be aware that (even first-year) students are clever enough to trick automated grading solutions. • Cheat patterns: • Overfitting • Problem Evasion • Redirection • Injection • Options we are currently investigating: • Randomise test cases • Pragmatic code inspection • Isolation of submission and evaluation logic • Exactly these features seem to be provided only incompletely by current APAAS solutions. JEdUnit: https://github.com/nkratzke/JEdUnit
  • 27. Acknowledgement Presentation on SpeakerDeck, preprint on ResearchGate. Advisers of the practical courses • David Engelhardt, Thomas Hamer, Clemens Stauner, Volker Völz, Patrick Willnow Student tutors • Franz Bretterbauer, Francisco Cardoso, Jannik Gramann, Till Hahn, Thorleif Harder, Jan Steffen Krohn, Diana Meier, Jana Schwieger, Jake Stradling, and Janos Vinz Picture references • Hacker: Pixabay.com (CC0) • Robot: Pixabay.com (CC0)
  • 28. About Nane Kratzke Web: http://nane.kratzke.pages.mylab.th-luebeck.de/about Twitter: @NaneKratzke LinkedIn: https://de.linkedin.com/in/nanekratzke GitHub: https://github.com/nkratzke ResearchGate: https://www.researchgate.net/profile/Nane_Kratzke SlideShare: http://de.slideshare.net/i21aneka