1) The document analyzes data from the Night Knights GWAP to understand how player participation, accuracy, and behavior are affected by incentives, playing style, task difficulty, and variety.
2) It finds that participation significantly increases with tangible rewards but the effects do not last, and playing style shifts to classifying more images per round with incentives.
3) Player profiles can be categorized as beginners, snipers, champions, or trolls based on accuracy and participation levels.
Game Incentives, Player Profiles and Task Difficulty Impact on GWAPs
1. Interplay of Game Incentives, Player Profiles
and Task Difficulty in Games with a Purpose
Gloria Re Calegari and Irene Celino
Nancy, November 15th, 2018 – 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018)
2. HUMAN-IN-THE-LOOP FOR KNOWLEDGE ACQUISITION
• Machine learning approaches train automatic models on the basis of a training set, thus they require
some partial gold standard, often also named “ground truth”
• Ground truth requires putting the human back in the loop: building a training set for a machine
learning pipeline means asking people to execute a set of tasks
• This knowledge acquisition challenge is usually solved in one of the following ways:
• Asking experts to put together the training set (but involving experts can be expensive!)
• Adopting Crowdsourcing and Human Computation approaches, thus asking a distributed
crowd to collect the required knowledge
Interplay of Game Incentives, Player Profiles and Task Difficulty in GWAPs - EKAW 2018 2
3. CROWDSOURCING & HUMAN COMPUTATION
• Crowdsourcing and Human Computation approaches have been widely adopted for several
knowledge management tasks: collection, enrichment, validation, annotation, ranking, …
• These approaches differ in their engagement and reward schemes for human participants
• What are the conditions that make it worth adopting a GWAP approach?
• When and how are GWAPs effective in achieving their goal?
• Crowdsourcing is the process of outsourcing tasks to a “crowd” of distributed people
(notable examples: Amazon Mechanical Turk, Figure Eight)
• Human Computation is a computer science technique in which a computational process is
performed by outsourcing certain steps to humans, usually when humans are very good at
solving those tasks while computers are not (notable example: reCAPTCHA)
• Games with a Purpose (GWAP) are a Human Computation application that outsources some
tasks to humans in an entertaining way (notable example: the ESP game)
Engagement and reward schemes (figure labels): premium access, money, prizes, knowledge, recognition, fun, enjoyment
4. USE CASE: THE NIGHT KNIGHTS GWAP
• Input: set of pictures and classification categories
• Goal: associate a category with each picture by assigning a score 𝜎 to each picture-category pair
• Score 𝜎 of each picture-category association is updated on the basis of players’ choices
• When the score of a picture-category pair reaches the threshold 𝑡 (𝜎 ≥ 𝑡), the association
is considered “true” (and the picture is removed from the game)
• Purpose: identify pictures of cities from above among those taken on board the ISS (the
pictures are then used in a scientific process in light pollution research)
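The score-and-threshold mechanism described on this slide can be sketched as follows. This is only an illustration: the slide does not say how much each player choice changes 𝜎 or what value 𝑡 takes, so the unit increment, the threshold value and the function name `apply_votes` are all assumptions.

```python
from collections import defaultdict

THRESHOLD = 4.0   # hypothetical value for t (not given in the slides)
INCREMENT = 1.0   # hypothetical: every player choice adds the same amount


def apply_votes(votes, threshold=THRESHOLD, increment=INCREMENT):
    """Accumulate player choices into scores and return settled associations.

    votes: iterable of (picture_id, category) pairs, in play order.
    Returns a dict picture_id -> category for every picture whose
    score for some category reached the threshold.
    """
    scores = defaultdict(float)   # (picture, category) -> sigma
    settled = {}                  # picture -> category considered "true"
    for picture, category in votes:
        if picture in settled:
            continue  # picture was already removed from the game
        scores[(picture, category)] += increment
        if scores[(picture, category)] >= threshold:
            settled[picture] = category
    return settled


votes = (
    [("img1", "city")] * 3
    + [("img1", "stars"), ("img1", "city"), ("img2", "stars")]
)
result = apply_votes(votes)
print(result)
```

In this toy run, only `img1` collects enough agreeing choices to be settled; `img2` stays in the game.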
http://nightknights.eu
DATA COLLECTION & VALIDATION
• Pure GWAP with not-so-hidden purpose (but played by anybody)
• Points, badges, leaderboard as intrinsic reward
• A player scores if he/she agrees with another player
• “Bonus” intrinsic reward with NASA pictures!
Gloria Re Calegari, Gioele Nasi, Irene Celino: Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification. Human Computation Journal, vol. 5, issue 1, 2018.
Gloria Re Calegari, Andrea Fiano, Irene Celino: A Framework to build Games with a Purpose for Linked Data Refinement. In Proceedings of ISWC 2018, LNCS vol. 11137, pp. 154-169.
5. NIGHT KNIGHTS: DATA AND EVALUATION
• Reference observation period: 9 months (February-October 2017)
• 1 month of competition with a tangible reward (joining the 2017 Summer Expedition to observe the
Solar Eclipse in the USA) in June-July 2017
• 4 months from the game launch to the competition start + 4 months after the competition
• Data available at https://github.com/STARS4ALL/Night-Knights-dataset
• ~650 players and ~28,000 classified pictures
• Released under a Creative Commons 4.0 license
• Investigation to analyse participation and find profile patterns
• Standard GWAP metrics
• Citizen Science metrics
• Influence of different factors, including incentives, playing style, task difficulty, …
6. [Q1] HOW DO PARTICIPATION AND RESULTS CHANGE WITH INCENTIVES?
[Q2] DO THE EXTRINSIC REWARD EFFECTS LAST OVER TIME?
[Q1]
• A tangible reward has a clear effect on participation
• There is a statistically significant difference between
competition and non-competition periods in all evaluation
metrics (throughput, average lifetime play, expected contribution)
[Q2]
• The incentive effect doesn’t seem to last: there is no
statistically significant difference between the pre-competition
and the post-competition periods
• The overlap between the sets of players in the different periods
is very limited (<10%)
                          Before   During    After
Time span (months)             4        1        4
Classified images          1,830   24,600    1,300
Contributions             13,000  187,600    3,600
Users                        285      174      174
Total play time (hours)       65      471       29
Throughput (tasks/hour)       69      212      113
ALP (mins/player)            5.5       65        4
EC (tasks/user)              6.4      141      7.5
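As a rough illustration of the three GWAP metrics named on this slide, the sketch below uses the definitions commonly attributed to von Ahn and Dabbish: throughput as tasks solved per human-hour, average lifetime play (ALP) as play time per player, and expected contribution (EC) as their product. The slides do not state how the figures in the table were computed, so these formulas are an assumption, and the input numbers are invented for the example.

```python
def gwap_metrics(tasks_solved, total_play_hours, n_players):
    """Return (throughput, ALP in minutes, expected contribution).

    Assumed definitions (not confirmed by the slides):
      throughput = tasks solved per hour of play
      ALP        = total play time divided by number of players
      EC         = throughput * ALP (tasks per player)
    """
    throughput = tasks_solved / total_play_hours
    alp_hours = total_play_hours / n_players
    expected_contribution = throughput * alp_hours
    return throughput, alp_hours * 60, expected_contribution


# Invented toy figures, unrelated to the Night Knights dataset
throughput, alp_minutes, ec = gwap_metrics(
    tasks_solved=1200, total_play_hours=20, n_players=100
)
print(f"throughput={throughput:.1f} tasks/hour, "
      f"ALP={alp_minutes:.1f} min/player, EC={ec:.1f} tasks/player")
```

Under these definitions EC reduces to tasks solved per player, which makes it easy to compare periods with different numbers of players.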
7. [Q3] DOES PLAYING STYLE CHANGE WITH THE INCENTIVE?
• Contribution speed = number of images played in each game round
• Estimate: at 3-5 seconds/photo, a 1-minute round yields ~15 images/round
• During the competition (extrinsic motivation)
• Normal distribution centred around 15 pictures/round
• Players tried to classify as many pictures as possible
• Before and after the competition (intrinsic motivation)
• Almost flat distribution with median < 10 images/round
• Players adopted a more “relaxed” playing style
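The contribution speed defined above can be computed from a log of played images, as in this minimal sketch. The log layout of `(round_id, image_id)` pairs and the 4-second midpoint are assumptions made for illustration; the published dataset may be structured differently.

```python
from collections import Counter
from statistics import median

ROUND_SECONDS = 60        # 1-minute round, as stated on the slide
SECONDS_PER_PHOTO = 4     # assumed midpoint of the 3-5 seconds estimate

# Back-of-the-envelope estimate of images per round
expected_speed = ROUND_SECONDS / SECONDS_PER_PHOTO


def images_per_round(contributions):
    """Count how many images were played in each round.

    contributions: iterable of (round_id, image_id) pairs (hypothetical layout).
    """
    return Counter(round_id for round_id, _ in contributions)


# Invented toy log: three rounds with 15, 8 and 12 images
log = (
    [("r1", i) for i in range(15)]
    + [("r2", i) for i in range(8)]
    + [("r3", i) for i in range(12)]
)
speeds = images_per_round(log)
median_speed = median(speeds.values())
print(expected_speed, dict(speeds), median_speed)
```

Comparing the distribution of `speeds` across periods is one way to see the shift between the competition and the non-competition playing styles.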
8. [Q4] HOW DO GWAPS COMPARE TO TRADITIONAL CITIZEN SCIENCE?
[Q5] WHAT DOES PLAYER BEHAVIOUR TELL ABOUT THE GAME NATURE?
[Q4]
• Engagement metrics
• From Citizen Science literature: activity ratio (AR, % active
days), daily devoted time (DDT, in hours), relative active
duration (RAD, wrt reference period), variation in periodicity
(VIP, std of intervals between active days)
• Night Knights players behave very differently from Citizen Science volunteers:
• 2-3 times higher AR, consistently higher DDT and RAD
• Significantly lower VIP
• Clustering yields a single group of hardworkers (high AR and low VIP) covering 90% of
players; the other Citizen Science behaviours are not observed
[Q5]
• Casual game, judging from total active time (time between first and last round)
• 75% of players played for less than 5 minutes
• 10% of players played for more than 1 day
       NK (global)  NK (compet.)  MW (*)  GZ (*)  WI (**)
AR            0.96          0.95    0.40    0.33     0.32
DDT           0.68          1.80    0.44    0.32        -
RAD              -          0.54    0.20    0.23     0.43
VIP          14.53          2.53   18.27   25.23     5.11
Citizen Science campaigns from reference literature:
* Ponciano, L., Brasileiro, F.: Finding volunteers’ engagement profiles in human computation for citizen science projects. Human Computation Journal, 2015
** Aristeidou, M., Scanlon, E., Sharples, M.: Profiles of engagement in online communities of citizen science participation. Computers in Human Behavior, 2017
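The four engagement metrics listed under [Q4] can be sketched for a single player as below. The exact formulas are assumptions based on the short descriptions on the slide: AR is taken as active days over the days between first and last activity, RAD as that span over the reference period, and VIP as the population standard deviation of the gaps between consecutive active days. The cited papers may define details differently.

```python
from datetime import date
from statistics import pstdev


def engagement_metrics(hours_by_day, reference_days):
    """Compute AR, DDT, RAD and VIP for one player.

    hours_by_day: dict mapping each active date to hours played that day.
    reference_days: length of the reference period in days.
    """
    days = sorted(hours_by_day)
    linked_days = (days[-1] - days[0]).days + 1    # first to last active day
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return {
        "AR": len(days) / linked_days,                    # share of active days
        "DDT": sum(hours_by_day.values()) / len(days),    # hours per active day
        "RAD": linked_days / reference_days,              # span vs. reference period
        "VIP": pstdev(gaps) if gaps else 0.0,             # spread of gaps, in days
    }


# Invented toy player, active on three days in a 30-day reference period
player = {
    date(2017, 6, 1): 1.0,
    date(2017, 6, 2): 2.0,
    date(2017, 6, 4): 3.0,
}
metrics = engagement_metrics(player, reference_days=30)
print(metrics)
```

A player with a single active day gets AR = 1 and VIP = 0 under these assumptions, which is worth keeping in mind when most players are active only briefly.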
9. [Q6] WHAT KIND OF GWAP PLAYER PROFILES CAN BE IDENTIFIED?
• Player accuracy = number of tasks a player solved correctly over the total number of
tasks he/she played (correct with respect to the aggregated solution)
• Player participation = total number of contributions given by player
• Threshold on accuracy axis → accurate / inaccurate player distinction
• Threshold on participation axis → casual / frequent player distinction
• Four different player profiles:
• Beginners (low participation, low accuracy)
• Snipers (low participation, high accuracy)
• Champions (high participation, high accuracy)
• Trolls (high participation, low accuracy)
• Distribution of contributions across profiles:
                 Beginners  Snipers  Champions  Trolls
Contributions         0.7%     0.4%      95.9%    3.0%
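The two-threshold profiling described on this slide can be sketched as a small function. The threshold values used here (0.7 for accuracy, 100 contributions for participation) and the function name `player_profile` are purely illustrative assumptions; the slides do not report the values actually used.

```python
def player_profile(n_correct, n_played,
                   accuracy_threshold=0.7, participation_threshold=100):
    """Assign one of the four profiles from accuracy and participation.

    n_correct: tasks solved in agreement with the aggregated solution.
    n_played: total contributions by the player (must be > 0).
    Both thresholds are hypothetical example values.
    """
    if n_played <= 0:
        raise ValueError("n_played must be positive")
    accurate = (n_correct / n_played) >= accuracy_threshold
    frequent = n_played >= participation_threshold
    if frequent:
        return "champion" if accurate else "troll"
    return "sniper" if accurate else "beginner"


# Invented toy players: (correct tasks, played tasks)
players = {"ann": (9, 10), "bob": (3, 10), "cat": (450, 500), "dan": (100, 500)}
profiles = {name: player_profile(c, n) for name, (c, n) in players.items()}
print(profiles)
```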
10. [Q7] DOES PLAYER BEHAVIOUR CHANGE WITH DIFFERENT INCENTIVES?
• During competition (extrinsic motivation period)
• Majority of champions (high participation, high accuracy)
→ maybe a learning effect?
• Higher average accuracy (statistically significant difference) for
both casual and frequent players (7% improvement in both cases)
→ higher attention brings higher quality
• Before/after competition (intrinsic motivation period)
• (Relative) majority of beginners (low participation, low accuracy)
→ maybe due to curiosity or “first try”
• Higher variability of accuracy values (height of boxplots)
• In all periods: limited number of trolls, and always majority of accurate
players (snipers+champions, 64%)
11. [Q8] DOES PLAYER BEHAVIOUR CHANGE WITH TASK DIFFICULTY?
[Q9] DOES PLAYER BEHAVIOUR CHANGE WITH TASK VARIETY?
• Task difficulty = number of different users needed to solve the task (i.e. to find an
agreement by aggregating user contributions)
• Easy tasks: 4 users (minimum by design), 58% of all tasks
• Difficult tasks: 5 to 17 users
• Accuracy variability with task difficulty
• No difference between casual and frequent players on easy tasks
• Statistically significant difference between casual and frequent players on
difficult tasks → learning effect (the more they play, the higher the accuracy)
• Accuracy variability with task variety (different classes)
• Some classes are indeed “more difficult” than others
• No difference between casual and frequent players across classes
→ indeed anybody can be a classifier (no expert knowledge required)
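The task difficulty measure defined on this slide (number of different users needed to solve a task) can be derived from a contribution log as sketched below. The log layout of `(task_id, user_id)` pairs is an assumption; the cut-off of 4 users for easy tasks comes from the slide.

```python
from collections import defaultdict

EASY_USERS = 4  # minimum number of users by design, as stated on the slide


def task_difficulty(contributions):
    """Return the number of distinct users who contributed to each task.

    contributions: iterable of (task_id, user_id) pairs for solved tasks
    (hypothetical layout).
    """
    users = defaultdict(set)
    for task_id, user_id in contributions:
        users[task_id].add(user_id)
    return {task_id: len(u) for task_id, u in users.items()}


def difficulty_label(n_users):
    """Label a task as easy when it needed only the minimum number of users."""
    return "easy" if n_users <= EASY_USERS else "difficult"


# Invented toy log: t1 solved by 4 users, t2 needed 6 users
log = [("t1", f"u{i}") for i in range(4)] + [("t2", f"u{i}") for i in range(6)]
difficulty = task_difficulty(log)
labels = {task: difficulty_label(n) for task, n in difficulty.items()}
print(difficulty, labels)
```

Once tasks are labelled, player accuracy can be compared separately on the easy and the difficult groups, as done for [Q8].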
12. CONCLUSIONS
• GWAPs are an effective “human in the loop” method to engage a target community in a process of
knowledge management (e.g. to collect a large enough training set for machine learning)
• Still, they are less explored and evaluated than other Human Computation approaches
• Investigation of the interplay of different factors in GWAP evaluation
• Game incentives, player participation profiles, task difficulty, …
• A framework to analyse a GWAP and assess the effectiveness of your target community
involvement in knowledge acquisition and management
• Quantitative results are specific to the analysed game, but the approach is completely replicable
• A method to identify strengths and weaknesses of a GWAP and to plan improvements
13. MILANO: viale Sarca 226, 20126, Milano, Italy
LONDON: 4th floor, 57 Rathbone Place, London W1T 1JU, UK
NEW YORK: One Liberty Plaza, 165 Broadway, 23rd Floor, New York City, New York, 10006 USA
Cefriel.com
Interplay of Game Incentives, Player Profiles
and Task Difficulty in Games with a Purpose
Gloria Re Calegari and Irene Celino
This work was partially supported by the STARS4ALL project
(H2020-688135) co-funded by the European Commission
Icons made by Eucalyp from www.flaticon.com
Contact me:
Irene Celino
Head of Knowledge Technologies Group
Cefriel - Politecnico di Milano
irene.celino@cefriel.com
iricelino.org