Human Computation for VGI Management

HUMAN COMPUTATION
FOR VGI MANAGEMENT
Irene Celino – irene.celino@cefriel.com
Cefriel, Viale Sarca 226, 20126 Milano
Workshop on Volunteered Geographic Information – Milano, April 16th, 2018

AGENDA
1. Introduction
2. Human Computation and Games with a Purpose
3. GWAP examples for VGI Management
4. Indirect people involvement
5. Future perspectives
copyright © 2018 Cefriel – All rights reserved

1. INTRODUCTION
Relation between VGI and Human Computation

VGI AND HUMAN COMPUTATION
• VGI is carried out by volunteers, so by definition it implies human intervention
• Still, VGI suffers from all the issues that come with that:
• Varying participation → impact on sustainability (long tail effect)
• Reliability of volunteers → impact on information quality
• Uneven distribution of contributions → impact on coverage
• Human Computation is an approach that can bring benefits to VGI…
• …and VGI can reveal more than you could expect!

WISDOM OF CROWDS
• “Why the Many Are Smarter Than the Few and
How Collective Wisdom Shapes Business, Economies, Societies and Nations”
• Criteria for a wise crowd
• Diversity of opinion (importance of interpretation)
• Independence (not a “single mind”)
• Decentralization (importance of local knowledge)
• Aggregation (aim to get a collective decision)
• There are also failures/risks in crowd decisions:
• Homogeneity, centralization, division, imitation, emotionality
James Surowiecki
The wisdom of crowds
Anchor, 2005

2. HUMAN COMPUTATION &
GAMES WITH A PURPOSE
What is Human Computation? What goals can humans help machines to achieve? How to
involve a crowd of persons?
What extrinsic rewards (money, prizes, etc.) or intrinsic incentives can we adopt to
motivate people?

HUMAN COMPUTATION
• Human Computation is a computer science technique in which a computational process
is performed by outsourcing certain steps to humans. Unlike traditional computation,
in which a human delegates a task to a computer, in Human Computation the computer
asks a person or a large group of people to solve a problem; then it collects, interprets
and integrates their solutions
• The original concept of Human Computation by its inventor Luis von Ahn derived from the
common sense observation that people are intrinsically very good at solving some
kinds of tasks which are, on the other hand, very hard to address for a computer;
this is the case of a number of targets of Artificial Intelligence (like image recognition or
natural language understanding) for which research is still open
Edith Law and Luis von Ahn. Human computation.
Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011

HUMAN COMPUTATION
Problem: an Artificial Intelligence
algorithm is unable to achieve an
adequate result with a satisfactory
level of confidence
Solution: ask people to intervene
when the AI system fails, “masking”
the task within another human
process
Example: https://www.google.com/recaptcha/
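A minimal sketch of the problem/solution pattern above, assuming a classifier that returns a label together with a confidence value. The names `route_task`, `toy_classifier`, `toy_human` and the 0.8 cut-off are hypothetical illustrations, not part of any specific system mentioned in these slides.

```python
from typing import Callable, Tuple

# Hypothetical cut-off: below this confidence the machine answer is not trusted.
CONFIDENCE_THRESHOLD = 0.8


def route_task(
    item: str,
    classify: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[str], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, str]:
    """Return (label, source): the machine label if confident, else a human label."""
    label, confidence = classify(item)
    if confidence >= threshold:
        return label, "machine"
    # The AI result is not reliable enough: outsource this step to a person.
    return ask_human(item), "human"


# Toy stand-ins for a real model and a real crowd interface (assumptions).
def toy_classifier(item: str) -> Tuple[str, float]:
    return ("cat", 0.95) if "clear" in item else ("cat", 0.40)


def toy_human(item: str) -> str:
    return "dog"


result_clear = route_task("clear photo", toy_classifier, toy_human)
result_blurry = route_task("blurry photo", toy_classifier, toy_human)
```

In a real deployment the human answer would itself be aggregated over several contributors before being accepted.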

WHY HUMAN COMPUTATION FOR VGI?
• Collection of new data – as a complement to VGI itself, exploiting redundancy of multiple
contributions
• Validation of collected data or automatic processing – as “third party” to solve
discrepancies
• Completion of data, to fill out “missing pieces”
• Identification of mistakes/outdated information and respective “correction”

GAMES WITH A PURPOSE
• A GWAP lets you outsource some steps of a computational process to humans in an
entertaining way
• The application has a “collateral effect”, because players’ actions are exploited to
solve a hidden task
• The application *IS* a fully-fledged game (as opposed to gamification, which is the use
of game-like features in non-gaming environments)
• The players are (usually) unaware of the hidden purpose; they simply meet game
challenges
Luis Von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006
Luis Von Ahn and Laura Dabbish. Designing games with a purpose.
Communications of the ACM, 51(8):58–67, 2008

GAMES WITH A PURPOSE (GWAP)
Problem: the same as in
Human Computation (ask
humans when AI fails)
Solution: hide the
task within a game, so that
users are motivated by game
challenges, often remaining
unaware of the hidden purpose;
the task solution comes from
agreement between players

SOME “VARIATIONS” OF HUMAN COMPUTATION
• Other terms have been used to indicate approaches and methods that are similar to
Human Computation and sometimes mistaken for it
• While there is of course quite a large overlap, it is useful to distinguish them
• Crowdsourcing
• Citizen Science

CROWDSOURCING
• Crowdsourcing is the process of outsourcing tasks to a “crowd” of distributed people.
The possibility of using the Internet as a vehicle to recruit contributors and to assign
tasks led to the rise of micro-work platforms, thus often (but not always) implying a
monetary reward. The term Crowdsourcing, although quite recent, is used to indicate a
wide range of practices; however, the most common meaning of Crowdsourcing implies
that the “crowd” of workers involved in the solution of tasks is different from the traditional
or intended groups of task solvers
Jeff Howe. Crowdsourcing: How the power of the crowd
is driving the future of business. Random House, 2008

CROWDSOURCING
Problem: a company needs to
execute a lot of simple tasks,
but cannot afford hiring a
person to do that job
Solution: pack tasks in
bunches (human intelligence
tasks or HITs) and outsource
them to a very cheap workforce
through an online platform
Example: https://www.mturk.com/

CITIZEN SCIENCE
• Citizen Science is the involvement of volunteers to collect or process data as part of
a scientific or research experiment; those volunteers can be the scientists and
researchers themselves, but more often the name of this discipline “implies a form of
science developed and enacted by citizens” including those “outside of formal scientific
institutions”, thus representing a form of public participation in science. Formally, Citizen
Science has been defined as “the systematic collection and analysis of data; development
of technology; testing of natural phenomena; and the dissemination of these activities by
researchers on a primarily avocational basis”.
Alan Irwin. Citizen science: A study of people, expertise
and sustainable development. Psychology Press, 1995

CITIZEN SCIENCE
Example: https://www.zooniverse.org/
Problem: a scientific
experiment requires the
execution of a lot of simple
tasks, but researchers are busy
Solution: engage the general
audience in solving those tasks,
explaining that they are
contributing to science,
research and the public good

SPOT THE DIFFERENCE…
• Similarities:
• Involvement of people
• Aggregation of multiple contributions
• No automatic replacement
• Variations:
• Motivation
• Reward (glory, money, passion/need)
• Hybrids or parallel!
Citizen Science
Crowdsourcing
Human
Computation

3. GWAP EXAMPLES FOR VGI
MANAGEMENT
Can we embed VGI management tasks within Games with a Purpose?

3 EXAMPLES OF GAMES WITH A PURPOSE FOR VGI
• Collection of missing data: GWAP enabler for OSM Restaurants
• Validation of automatically collected information: LCV game
• Collection, validation and correction of data: Urbanopoly

GWAP ENABLER TUTORIAL FOR OSM RESTAURANTS
• Restaurant POIs (amenity=restaurant) from OSM may miss the cuisine type (cuisine key)
• Input: OSM restaurants in a given area with/without cuisine tag (those with the tag are used for assessing player reliability)
• Goal: assign a score σ to each restaurant-cuisine pair to discover the “right” category
• The score σ of each pair is updated on the basis of players’ choices (incremented if the link is selected)
• When the score reaches the threshold (σ ≥ t), the restaurant’s category is considered “true” (and removed from the game)
Pure GWAP with double-player game mechanics
Points, badges, leaderboard as intrinsic reward
A player scores if he/she chooses the same cuisine as his/her gameplay “mate”
Data validation is a result of the “agreement” between players
https://github.com/STARS4ALL/gwap-enabler-tutorial
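The scoring idea on this slide can be sketched as follows. This is an illustration under stated assumptions, not the code of the GWAP Enabler: the threshold value, the function names and, in particular, the choice of weighting each increment by the player's reliability are hypothetical. The slide only states that tagged restaurants are used to assess reliability and that σ is incremented when a link is selected.

```python
from collections import defaultdict

THRESHOLD = 2.0  # hypothetical value of t


def reliability(answers, known):
    """Share of a player's answers on already-tagged restaurants that match the tag."""
    checked = [(r, c) for r, c in answers if r in known]
    if not checked:
        return 0.0
    return sum(1 for r, c in checked if known[r] == c) / len(checked)


def update_scores(scores, answers, known, weight):
    """Increment sigma for every selected (restaurant, cuisine) pair that is still untagged."""
    for restaurant, cuisine in answers:
        if restaurant not in known:
            scores[(restaurant, cuisine)] += weight


def accepted(scores, threshold=THRESHOLD):
    """Pairs whose score satisfies sigma >= t; these would leave the game."""
    return {pair for pair, sigma in scores.items() if sigma >= threshold}


known = {"r1": "pizza"}  # toy data: restaurants that already carry a cuisine tag
players = {
    "anna": [("r1", "pizza"), ("r2", "sushi")],
    "bob": [("r1", "pizza"), ("r2", "sushi")],
    "carl": [("r1", "kebab"), ("r2", "burger")],
}

scores = defaultdict(float)
for answers in players.values():
    # Assumption: each selection counts as much as the player's measured reliability.
    update_scores(scores, answers, known, reliability(answers, known))

confirmed = accepted(scores)
```

With this toy data, the two reliable players agree on the same cuisine for `r2`, so that pair reaches the threshold, while the unreliable player's choice contributes nothing.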

LAND COVER VALIDATION GAME
• Two automatic land cover classifications in disagreement:
• DUSAF (Lombardy Region) and GlobeLand30 (Chinese governmental agency)
• Input: set of pixels where the two classifications “disagree”
• Goal: assign a score σ to each pixel-category pair to discover the “right” land cover class
• The score σ of each pair is updated on the basis of players’ choices (incremented if selected, decremented if not selected)
• When the score reaches the threshold (σ ≥ t), the pixel’s category is considered “true” (and removed from the game)
Pure GWAP with not-so-hidden purpose (played by “experts”)
Points, badges, leaderboard as intrinsic reward
A player scores if he/she guesses one of the two disagreeing classifications
Data validation is a result of the “agreement” between players
https://youtu.be/Q0ru1hhDM9Q
http://bit.ly/foss4game
Maria Antonia Brovelli, Irene Celino, Andrea Fiano, Monia Elisa Molinari, Vijaycharan Venkatachalam.
A crowdsourcing-based game for land cover validation. Applied Geomatics, 2017
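Unlike the restaurant game, here the score also decreases when a category is not selected. A minimal sketch of that signed update, assuming a unit step and a threshold of 3 (both hypothetical values, as are the function names and the pixel identifier); the published game may use different increments.

```python
from collections import defaultdict

THRESHOLD = 3  # hypothetical value of t
STEP = 1       # hypothetical increment/decrement


def play_round(scores, pixel, candidates, chosen, step=STEP):
    """Raise sigma for the chosen class of a pixel and lower it for the other candidates."""
    for category in candidates:
        scores[(pixel, category)] += step if category == chosen else -step


def validated_class(scores, pixel, candidates, threshold=THRESHOLD):
    """Return the class whose score satisfies sigma >= t, or None if still undecided."""
    for category in candidates:
        if scores[(pixel, category)] >= threshold:
            return category
    return None


candidates = ("forest", "cropland")  # toy stand-ins for the two disagreeing classes
scores = defaultdict(int)
for choice in ["forest", "forest", "cropland", "forest", "forest"]:
    play_round(scores, "px_42", candidates, choice)

winner = validated_class(scores, "px_42", candidates)
```

The decrement means a single dissenting player delays the decision instead of being ignored, which trades throughput for robustness.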

URBANOPOLY
• POI information from OSM to be collected or validated/corrected
• Input: data from OSM
• Goal:
• if data doesn’t exist, collect
• if data exists, validate
• if data is wrong, correct
• Complex game embedding “mini-games” for data collection, validation and correction
• Same score mechanisms, with score σ updated on the basis of players’ choices
• When the score reaches the threshold (σ ≥ t), data is considered “true” (and can be sent back to OSM)
Monopoly-like game to win venues in the real world
Wheel of fortune and mini-games to acquire venues and become “rich” in the game
Data acquisition challenges as contributions for missing data
Data validation challenges to check pre-existing data
Result from players’ “agreement”
Irene Celino. Geospatial dataset curation through a location-based game.
Semantic Web Journal, Volume 6, Number 2, IOS Press, 2015

LESSONS LEARNED BY DESIGNING AND RUNNING THOSE GAMES
• Designing and developing a full game is expensive
• The simpler the game, the better its acceptance by players and its “throughput”
• Different players are motivated by different incentives
• Fun is not always enough to engage people, especially in the long term
• Data collected via games can be enough to train automatic models
Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning:
an Experimental Comparison for Image Classification. Human Computation Journal, 2018.

4. INDIRECT PEOPLE INVOLVEMENT
Are there indirect ways to involve humans in data processing?

HUMANS AS A SOURCE OF INFORMATION
• People are not only task executors, they are also information providers!
• Open content and cooperative knowledge
• Data explicitly provided by people like VGI can “hide” further information
• e.g., logs of wiki editing, statistical distribution of contributions
• Opportunistic sensing
• Voluntary or involuntary digital traces of human-related activities
• e.g., phone call logs, GPS traces, social media activities

FROM SPATIAL ANALYTICS TO GEO-SPATIAL “SEMANTICS”
• Spatial distribution and conglomeration of specific points of interest (POI)
from OpenStreetMap can give hints about the geographical space
• Re-engineering of spatial features through comparison between areas:
same POI type shows different distribution → evidence for different
semantics (e.g. what is a pub in Milano vs. London)
• Semantic specification of spatial neighbourhoods:
• Emerging neighbourhoods from spatial clustering of POIs (as opposed
to administrative divisions)
• Spatial version of tf-idf to compare between different areas (e.g.
central or peripheral areas in different cities) and to characterise
neighbourhoods (e.g. shopping district)
Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni. Supporting Geo-Ontology
Engineering through Spatial Data Analytics. 13th Extended Semantic Web Conference, 2016
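To make the “spatial version of tf-idf” concrete, the sketch below treats each area as a document and each POI category as a term, using the textbook tf-idf formula. This is an assumption for illustration: the exact variant used in the cited paper may differ, and the area names, categories and the function `spatial_tfidf` are hypothetical.

```python
import math
from collections import Counter


def spatial_tfidf(pois_by_area):
    """Map each area to {category: tf-idf}, with areas as documents and POI categories as terms."""
    n_areas = len(pois_by_area)
    counts = {area: Counter(cats) for area, cats in pois_by_area.items()}
    # Number of areas in which each category occurs at least once.
    area_freq = Counter(cat for c in counts.values() for cat in c)
    result = {}
    for area, c in counts.items():
        total = sum(c.values())
        result[area] = {
            cat: (n / total) * math.log(n_areas / area_freq[cat])
            for cat, n in c.items()
        }
    return result


# Toy data: POI categories observed in three areas.
pois_by_area = {
    "centre": ["shop", "shop", "shop", "pub"],
    "suburb": ["school", "pub"],
    "campus": ["school", "library", "pub"],
}
weights = spatial_tfidf(pois_by_area)
top_centre = max(weights["centre"], key=weights["centre"].get)
```

A category present everywhere (here `pub`) gets weight zero, while a category concentrated in one area (here `shop` in `centre`) stands out, which is the kind of signal that could characterise a shopping district.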

FROM POI INFORMATION AND PHONE CALL LOGS TO LAND USE
• General topic: exploit “low-cost” information about a geographic area as features to
train a predictive model that outputs “expensive” information about the same area
• “Inexpensive” input information:
• Geo-information about points of interest processed to characterize space
(distance from the nearest POI of type X)
• Mobile traffic data processed using different time series techniques
(smoothing, decomposition, filtering, time-windowing)
• “Expensive” output information:
• Land use characterization (usually collected through long and expensive
workflows that mix machine processing and costly human labour)
Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Extracting Urban Land Use from Linked Open Geospatial Data. IJGI, 2015
Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Filtering and Windowing Mobile Traffic Time Series for Territorial Land Use Classification. COMCOM, 2016
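The two kinds of “inexpensive” input can be sketched as simple feature extractors. This is a hedged illustration, not the pipeline of the cited papers: the haversine distance, the trailing moving average, the coordinates and all function names are assumptions chosen for brevity.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_poi_distances(cell, pois, poi_types):
    """One feature per POI type: distance from the cell centre to the nearest POI of that type."""
    lat, lon = cell
    features = {}
    for poi_type in poi_types:
        dists = [haversine_km(lat, lon, p_lat, p_lon)
                 for t, p_lat, p_lon in pois if t == poi_type]
        # No POI of this type at all: use infinity as a placeholder distance.
        features[poi_type] = min(dists) if dists else math.inf
    return features


def moving_average(series, window):
    """Smooth a traffic time series with a simple trailing moving average."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


# Toy data: one grid cell, a few POIs as (type, lat, lon), and a short traffic series.
cell = (45.50, 9.20)
pois = [("school", 45.50, 9.20), ("school", 45.60, 9.20), ("pub", 45.51, 9.20)]
features = nearest_poi_distances(cell, pois, ["school", "pub", "bank"])
smoothed = moving_average([1, 2, 3, 4], window=2)
```

Feature vectors like these, computed for every cell, would then be paired with known land use labels to train a predictive model.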

5. FUTURE PERSPECTIVES
Are we there yet?!?

FUTURE PERSPECTIVES
• VGI management is still an open issue
• Human Computation methods (and the like) can be employed to support VGI
management
• Parallel/joint adoption of different methods to get the best out of them
• Research challenges are still the same
• Collection, completion/coverage, quality, (in)homogeneity, update/sustainability, …
• Human-in-the-loop is an emerging trend and paradigm also in Machine Learning
research (e.g. active learning)

Thanks for your attention!
Any questions?
Irene Celino
Knowledge Technologies
Digital Interaction Division
irene.celino@cefriel.com
