This document discusses learning analytics for evaluating competencies and behaviors in serious games. It begins by introducing the presenters and their affiliations. It then discusses motivations for using games for learning and assessment, noting that games can assess complex skills and be engaging for learners. The document outlines the design, development, and evaluation process for game-based assessment, including gathering data during design and implementing assessment models. It provides an example game called Shadowspect and describes how evidence from the game informs constructs and algorithms to measure skills like efficiency. The document notes future work could include evaluating models with external measures and ensuring generalizability.
6. Main contributors to this research
José A. Ruipérez-Valiente
● BEng Telecommunications Systems (UCAM), MEng Telecommunications, MSc and PhD in Telematics (UC3M), Postdoc (MIT)
● 6 years working in learning analytics across many objectives and contexts
● Currently focused on large-scale trends in MOOCs and game-based assessment
● Juan de la Cierva Researcher at UMU and affiliate at MIT Playful Journey Lab
YJ (Yoon Jeon) Kim
● Executive Director of the Playful Journey Lab at MIT Open Learning
● Assessment scientist
● Focus on games and playful approaches for assessment
7. Topics related to this talk
- Games for Learning
- Game-based Assessment
- Learning Analytics
- … and Design (which cuts across numerous areas and applications)
9. A game is a voluntary interactive
activity, in which one or more players
follow rules that constrain their
behavior, enacting an artificial conflict
that ends in a quantifiable outcome.
~Eric Zimmerman (2004)
10. Why Games?
● Games are “flexible enough for players to
inhabit and explore through meaningful
play” (Salen & Zimmerman) (deep learning)
● Majority of children grow up playing games
● Learners have more freedom related to
how much effort they choose to expend,
how often they fail and try again (Osterweil,
2014) (real life)
11. Assessment is a process of reasoning
from evidence. Therefore, an
assessment is a tool designed to
observe students’ behavior and
produce data that can be used to draw
reasonable inferences about what
students know.
~ Bob Mislevy
12. Why Games for Assessment?
● Games incorporate multiple pathways to solution(s) where learners can make
meaningful choices and demonstrate multiple ways of solving problems
● Use complex and authentic problems → hard-to-measure constructs
o We need to assess 21st century skills
● Games are motivating and engaging → accurate assessment (Sundre &
Wise, 2003)
● It doesn’t feel like assessment (i.e. stealth assessment)
o Less stressful situations for students
14. The Broad view of Learning Analytics
…collection, analysis and reporting of data about learners and
their contexts, for purposes of understanding and optimising
learning and the environments in which it occurs…
Source: First Learning Analytics
and Knowledge Conference
15. The Learning Analytics data-driven Process
● Learning environments generate the raw data: which raw data is necessary?
● Feature engineering turns that raw data into meaningful features: what to obtain and how to do it?
● Analysis of the features: exploration, correlation, clustering, prediction, causes…
● Outputs: visualizations, recommendations, report generators. What to do with the processed data?
● Technology works as an engine to enhance learning
● Conclusions generate feedback and close the LA loop
18. Design
● Design and implementation of game system
○ Game mechanics that can generate evidence
from the constructs and a data infrastructure that
effectively stores that evidence
○ The most iterative step of the process with very
frequent playtesting
1. Start with paper prototypes
2. Move to rough digital prototypes
3. End with advanced digital prototypes
● Data collection
○ Diverse audiences and contexts
○ Very important for game mechanics and tech side
○ Face-to-face playtesting
○ Amazon MTurk
24. Model development
● Implementation of the assessment machinery:
○ Process of turning evidence into constructs
○ Content knowledge assessment: Following a
traditional Evidence-centered Design
○ Cognitive and behavioral assessment: Combining
knowledge engineering process and ML with expert
labelling
● Data collection:
○ Same high school context, age, and settings
○ Two sessions of one hour each
○ Around 10 US high school classes and more than
200 students
27. Common Core Geometry Standards
● Competency model: We focus on the common core geometry standards
o MG.A.1: Use geometric shapes, their measures, and their properties to describe
objects (e.g., modeling a tree trunk or a human torso as a cylinder)
o GMD.B.4: Identify the shapes of two-dimensional cross-sections of three-
dimensional objects, and identify three-dimensional objects generated by rotations
of two-dimensional objects
o CO.A.5: Given a geometric figure and a rotation, reflection, or translation, draw the
transformed figure
o CO.B.6: Use geometric descriptions of rigid motions to transform figures and to
predict the effect of a given rigid motion on a given figure
28. ECD Summary for Geometry Common Standards Assessment
● Collaboration with geometry specialist, game designer and assessment designer
○ Evidence model: We generate puzzles that generate evidence from the Geometry Common Standards
○ Task model: We map the relationship (none, weak or strong) of each puzzle with the common standard
○ Assembly model: We put all the evidence from a student together to assess their content knowledge
○ Presentation & Delivery model: Reports and dashboards by student/standard. Difficulty by exercise
Task model (puzzle-to-standard mapping):
Puzzle    | MG.A.1 | GMD.B.4 | …
Puzzle 1  | Weak   | Weak    | …
Puzzle 2  | None   | None    | …
…         | …      | …       | …

Assembly model (evidence per student):
Student   | Puzzle 1      | Puzzle 2         | …
Student 1 | OK, 1 attempt | OK, 3 attempts   | …
Student 2 | NA            | Fail, 5 attempts | …
…         | …             | …                | …
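Putting the two tables together, an assembly model can weight each puzzle outcome by how strongly that puzzle maps onto a standard. The slides do not specify the aggregation, so the weights and the scoring rule below are our own toy assumption, not the authors' actual model:

```python
# Toy assembly sketch (assumed weights, not the authors' actual model):
# weight each puzzle outcome by its mapped relationship to a standard.
WEIGHTS = {"none": 0.0, "weak": 0.5, "strong": 1.0}

# Task model: relationship of each puzzle with each common core standard
task_model = {
    "Puzzle 1": {"MG.A.1": "weak", "GMD.B.4": "weak"},
    "Puzzle 2": {"MG.A.1": "none", "GMD.B.4": "none"},
}

# Evidence for one student: whether each attempted puzzle was solved
student_results = {"Puzzle 1": True, "Puzzle 2": False}

def standard_score(standard, results, model):
    """Weighted fraction of relevant puzzles the student solved."""
    num = den = 0.0
    for puzzle, solved in results.items():
        weight = WEIGHTS[model.get(puzzle, {}).get(standard, "none")]
        num += weight * solved
        den += weight
    return num / den if den else None  # None: no evidence for this standard
```

Puzzles with no relationship to a standard contribute nothing, so a student is only scored on standards for which the game actually produced evidence.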
31. Knowledge Engineering Process
● We acquire knowledge about the construct that we want to measure
1. Reading about the construct
2. Conducting interviews with experts
3. Reviewing related scientific literature
● We algorithmically implement features that use the data/evidence that can inform the
construct that we want to measure
32. Our simplified case scenario now updates to:
Evidence maps onto constructs. A data schema captures the evidence as data, algorithms compute features from that data, and the features inform the constructs we want to measure.
33. Efficiency construct
- Efficiency is the ability to do things well, successfully, and without waste. It
often specifically comprises the capability of a specific application of effort
to produce a specific outcome with a minimum amount or quantity of
waste, expense, or unnecessary effort (Wikipedia)
34. Evidence in Shadowspect related to efficiency
● Ability to do things well:
○ Solving puzzles correctly
● Expense or effort:
○ Time invested
○ Number of attempts to solve a problem
35. Mapping evidence into necessary data in Shadowspect
● We need: puzzles solved correctly, time invested and attempts
○ Necessary types of events for that:
■ puzzle_start (timestamp, student, puzzle_id)
■ leave_to_menu (timestamp, student, puzzle_id)
■ puzzle_attempt (timestamp, student, puzzle_id, correct)
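A log fragment following these event types might look as follows (timestamps in seconds; all values are made up for illustration):

```python
# Hypothetical log fragment for one student on one puzzle, using the
# three event types listed above.
events = [
    {"type": "puzzle_start",   "timestamp": 1000, "student": "s1", "puzzle_id": "p1"},
    {"type": "puzzle_attempt", "timestamp": 1060, "student": "s1", "puzzle_id": "p1", "correct": False},
    {"type": "puzzle_attempt", "timestamp": 1120, "student": "s1", "puzzle_id": "p1", "correct": True},
    {"type": "leave_to_menu",  "timestamp": 1130, "student": "s1", "puzzle_id": "p1"},
]

# These four events already recover the three pieces of evidence we need:
time_invested = events[-1]["timestamp"] - events[0]["timestamp"]  # seconds on the puzzle
attempts = sum(e["type"] == "puzzle_attempt" for e in events)     # number of attempts
solved = any(e.get("correct") for e in events)                    # solved correctly?
```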
37. Algorithm to compute features from data (pseudo-code)
# Note: a very simplified version that does not aim to be the most efficient implementation of this algorithm
def computeEfficiencyFeatures(student):
    student_events = getStudentEvents(student)
    correct_puzzles = set()
    number_attempts = 0
    total_time = 0
    puzzle_started_event = None
    for event in student_events:
        if event['type'] == 'puzzle_start':
            puzzle_started_event = event
        elif event['type'] == 'leave_to_menu':
            total_time += event['timestamp'] - puzzle_started_event['timestamp']
            puzzle_started_event = None
        elif event['type'] == 'puzzle_attempt':
            number_attempts += 1
            if event['correct']:
                correct_puzzles.add(event['puzzle_id'])
    # lower values indicate a more efficient student
    attempts_per_correct_problem = number_attempts / len(correct_puzzles)
    time_per_correct_problem = total_time / len(correct_puzzles)
    return attempts_per_correct_problem, time_per_correct_problem
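The slides say these features "inform" the efficiency construct but do not fix an aggregation. One simple possibility (our own assumption, not the authors' model) is to z-score each feature across students and average the negated scores, since lower attempts and time per solved puzzle mean higher efficiency:

```python
from statistics import mean, pstdev

# Hypothetical per-student feature values, as if produced by
# computeEfficiencyFeatures(student) for three students.
features = {
    "s1": {"attempts_per_correct_problem": 2.0, "time_per_correct_problem": 120.0},
    "s2": {"attempts_per_correct_problem": 4.0, "time_per_correct_problem": 300.0},
    "s3": {"attempts_per_correct_problem": 3.0, "time_per_correct_problem": 180.0},
}

def efficiency_scores(features):
    names = ["attempts_per_correct_problem", "time_per_correct_problem"]
    scores = {s: 0.0 for s in features}
    for name in names:
        values = [f[name] for f in features.values()]
        mu, sigma = mean(values), pstdev(values)
        for s in features:
            # negated z-score: fewer attempts / less time per solved
            # puzzle means a more efficient student
            scores[s] -= (features[s][name] - mu) / sigma / len(names)
    return scores

scores = efficiency_scores(features)
```

Under this toy aggregation, s1 (fast, few attempts) ranks above s3, who ranks above s2.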
38. The previous general scenario
Evidence maps onto constructs. A data schema captures the evidence as data, algorithms compute features from that data, and the features inform the constructs.
39. Model for efficiency in Shadowspect
● Construct: efficiency
● Evidence (maps onto the construct): correct puzzles, time, number of attempts
● Data (data schema): puzzle_start, leave_to_menu and puzzle_attempt events
● Algorithm: computeEfficiencyFeatures(student)
● Features (inform the construct): attempts_per_correct_problem, time_per_correct_problem
41. Expert Labelling and Machine Learning Process
● Two or more experts label text or video replays that can be visually assessed
○ We divide all level interactions in replays that can be labelled
○ Experts review replays and label them for each construct that we want to measure
■ They might use rubrics, and we look for inter-rater agreement between experts (Cohen’s kappa)
○ We implement a supervised machine learning assessment model based on these labels
● Challenges here include achieving good inter-agreement, technical logistics, replay
resolution and final implementation of the ML model
Example of simplified text replay: 1. Start puzzle – 2. Create shape square – 3. Move square – 4. Create cone –
5. Rotate cone – 6. Change perspective – 7. Snapshot – 8. Move cone – 9. Submit – 10. Puzzle correct
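Cohen's kappa measures agreement between two raters while correcting for agreement expected by chance. A minimal sketch, with hypothetical labels (in practice the categories would come from the rubrics for each construct):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same replays."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # observed agreement: fraction of replays labelled identically
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement: from each rater's marginal label frequencies
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Two experts label six replays for one construct (hypothetical data)
expert1 = ["high", "low", "high", "high", "low", "low"]
expert2 = ["high", "low", "low",  "high", "low", "low"]
kappa = cohens_kappa(expert1, expert2)  # 2/3 for this toy example
```

Values near 1 indicate strong agreement; low values signal that the rubric or the replay resolution needs revisiting before training an ML model on the labels.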
42. Expert Labelling and Machine Learning Process
The pipeline stays the same (evidence maps onto constructs; a data schema captures the evidence as data; algorithms compute features), but now the features inform the constructs through an ML/AI model trained on the expert assessment labels.
43. Evaluation
● We are not here yet! Future plans:
● Data collection:
○ Implementation as part of the curriculum in high
school classes
○ Demographic and school data with external measures
● Game analytics: How is the game being used by
students? Improvements, enjoyment…
● Model performance evaluation: How are the
models working? What do teachers think about
models?
● Psychometric evaluation: Are our models
correlated with other external tests, e.g.
traditional geometry tests or validated spatial
reasoning instruments?
44. It’s time to say goodbye
But let’s conclude before that
45. Conclusions
● Alternative assessment method with great potential
○ Focus on complex constructs, can focus on the process (not only outcomes), is less stressful
and more enjoyable for students
● Highly challenging and multidisciplinary field, main problems:
○ Cost, scalability and generalization across GBA tools, model validity, trustworthiness, and
teacher literacy
● Some companies are already using GBA as part of their pre-hiring processes
● Difference between Assessment and assessment
● Opportunities for collaboration!
Evidence-centered design begins by identifying what should be assessed in terms of knowledge, skills, or other learner attributes. These variables cannot be observed directly, so behaviors and performances that demonstrate them need to be identified instead. The next step is determining the types of tasks or situations that would draw out such behaviors or performances.
Example around simple math knowledge in a game:
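A sketch of what such an example might look like (our own illustration, not the original slide content): the construct "addition up to 20" cannot be observed directly, so the game presents shop situations where the player must total item prices, and each response becomes evidence:

```python
# Hypothetical ECD sketch for simple math knowledge in a game.
construct = "addition up to 20"  # the unobservable student attribute

# Task model: in-game situations designed to elicit the target behavior
tasks = [
    {"id": "shop_1", "prices": [3, 5], "answer": 8},
    {"id": "shop_2", "prices": [7, 6, 4], "answer": 17},
]

# Evidence model: one player's observed responses to those tasks
responses = {"shop_1": 8, "shop_2": 15}

def mastery_estimate(tasks, responses):
    """Toy assembly model: fraction of elicitation tasks answered correctly."""
    correct = sum(responses.get(t["id"]) == t["answer"] for t in tasks)
    return correct / len(tasks)

estimate = mastery_estimate(tasks, responses)  # 0.5: one of two tasks correct
```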