Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts
1. Web Science & Technologies
University of Koblenz-Landau, Germany
Exploring the challenge of linking scientific publications and studies with crowd workers instead of domain experts
Cristina Sarasua
csarasua@uni-koblenz.de
Computational Social Science workshop
Köln, 16.12.2013
2. Ideal workflow
1. Read publications
2. Access data
3. Reuse data
Peter Schumacher (social scientist) would like to analyse the voting patterns of Germans over the last 20 years.
Past observations
New analysis, new findings
WeST
Cristina Sarasua
Exploring the challenge of linking scientific publications and
studies with crowd workers instead of domain experts
3. Reality
Publications and research data (coming from surveys and studies) are published independently
The link between them is missing
Researchers cannot easily access the research data
4. Scenario
publications
research data (studies)
We need a method to process publications and studies in order to:
1. Find references to studies inside publications
2. Identify which publication is connected to which study
3. Identify the type of relation between publication and study
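Step 1 can be illustrated with a deliberately naive sketch: look up known study names in the text of a publication and emit candidate links for later review. This is not the InfoLink algorithm; the study catalogue, the identifiers (`study:allbus`, `pub:42`) and the `CandidateLink` structure are hypothetical, and step 3 (typing the relation) is not attempted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateLink:
    """Hypothetical record for an automatically proposed link."""
    publication_id: str
    study_id: str
    mention: str  # the text span that triggered the link


def find_study_mentions(publication_id, text, catalogue):
    """Propose one candidate link per catalogue study mentioned in the text.

    catalogue maps a study name or acronym to a study identifier. Matching
    is case-sensitive and restricted to word boundaries, so a name is not
    matched inside a longer word.
    """
    links = []
    seen = set()
    for name, study_id in catalogue.items():
        match = re.search(r"\b" + re.escape(name) + r"\b", text)
        if match and study_id not in seen:
            seen.add(study_id)
            links.append(CandidateLink(publication_id, study_id, match.group(0)))
    return links


# Illustrative catalogue with made-up identifiers, and a German sample
# sentence (the slides note that the data is in German).
catalogue = {"ALLBUS": "study:allbus", "European Social Survey": "study:ess"}
text = "Wir verwenden Daten des ALLBUS 2010, um das Wahlverhalten zu analysieren."
links = find_study_mentions("pub:42", text, catalogue)
print(links)
```

A plain dictionary lookup misses paraphrased or abbreviated mentions and cannot tell which of several similarly named studies is meant, which is exactly where the error-prone automatic links described next come from.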
5. Problem
Computers cannot perform these three tasks automatically without errors
Incorrect link between a publication and a study
We need human intervention
Domain experts are often not available for this kind of task
6. Solution: Crowdsourcing
“The process of outsourcing a task to a (potentially) large and undefined group of people in an open call” (Jeff Howe, 2006)
Microtask crowdsourcing
- Simple and independent tasks
- Paid crowdsourcing
- Online labor marketplaces (e.g. MTurk)
7. Amazon Mechanical Turk
8. Crowdsourced interlinking: the GESIS case study
[Diagram: Researcher; SSOAR web portal (publications); da|ra web portal (research data); InfoLink produces links; CrowdLINK produces corrected links]
Hybrid solution
1) Automatically process publications and studies
2) Ask crowd workers to review the links:
- Correct errors
- Identify primary literature / secondary literature
3) Generate Linked Data
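Step 3 could look roughly like the sketch below, which serialises crowd-reviewed links as N-Triples. It assumes each link carries two crowd verdicts (is the link correct, and is the publication primary literature of the study); the `http://example.org/` namespace and the `primaryLiterature` / `secondaryLiterature` predicates are placeholders, not the vocabulary actually used at GESIS.

```python
from dataclasses import dataclass

EX = "http://example.org/"  # placeholder namespace, not a real vocabulary


@dataclass(frozen=True)
class ReviewedLink:
    publication_id: str
    study_id: str
    is_correct: bool          # crowd verdict: is the automatic link right?
    primary_literature: bool  # crowd verdict: primary vs secondary literature


def to_ntriples(links):
    """Serialise accepted links as N-Triples, one statement per line.

    Links the crowd rejected are dropped. The relation predicate is chosen
    from the crowd's primary/secondary literature verdict.
    """
    statements = []
    for link in links:
        if not link.is_correct:
            continue
        predicate = "primaryLiterature" if link.primary_literature else "secondaryLiterature"
        statements.append(
            f"<{EX}publication/{link.publication_id}> "
            f"<{EX}{predicate}> "
            f"<{EX}study/{link.study_id}> ."
        )
    return "\n".join(statements)


reviewed = [
    ReviewedLink("p1", "s1", is_correct=True, primary_literature=True),
    ReviewedLink("p2", "s1", is_correct=False, primary_literature=False),
    ReviewedLink("p3", "s2", is_correct=True, primary_literature=False),
]
print(to_ntriples(reviewed))
```

Plain strings keep the sketch dependency-free; a production pipeline would more likely build the graph with an RDF library such as rdflib and reuse an established vocabulary.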
9. How is this related to Computational Social Science (CSS)?
10. On the one hand …
The GESIS case study
In collaboration with GESIS colleagues Katarina Boland, Daniel Hienert et al.
11. On the other hand …
How can we manage such a group of people to maximize their efficiency and keep them happy?
13. Open call
We can impose some restrictions (e.g. language, country, reputation gained)
Different backgrounds
Different motivations
Chart: Ipeirotis, 2010
Different behaviour
Spam
Charts: Ross et al., 2010
CrowdFlower, 11.12.2013
14. The tasks at hand
They are not the “most exciting tasks” in the world
The data is in German
The domain is very specific
15. First experiments of the GESIS case study
Adopted measures
Used majority voting
Included verification questions (e.g. “Please type the date shown for the publication”)
Defined gold-standard links to check which workers could be trusted
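The combination of gold-standard links and majority voting can be sketched as follows: workers are first scored on links whose correct answer is known, and only the answers of workers above an accuracy threshold count in a simple majority vote. The 0.7 threshold, the tie handling and the data layout are assumptions for illustration; the verification questions are not modelled.

```python
from collections import Counter


def trusted_workers(gold_answers, gold, min_accuracy=0.7):
    """Return workers whose accuracy on gold-standard links is >= min_accuracy.

    gold_answers: {worker_id: {link_id: answer}}
    gold:         {link_id: correct_answer} for the gold-standard links
    min_accuracy: hypothetical threshold, not a value from the study
    """
    trusted = set()
    for worker, answers in gold_answers.items():
        judged = [link for link in answers if link in gold]
        if not judged:
            continue  # no evidence about this worker yet
        correct = sum(answers[link] == gold[link] for link in judged)
        if correct / len(judged) >= min_accuracy:
            trusted.add(worker)
    return trusted


def majority_vote(judgements, trusted):
    """Aggregate {link_id: [(worker_id, answer), ...]} by simple majority.

    Answers from untrusted workers are ignored. A link with no trusted
    answers, or with a tie between the top two answers, maps to None.
    """
    decisions = {}
    for link, votes in judgements.items():
        counts = Counter(answer for worker, answer in votes if worker in trusted)
        top = counts.most_common(2)
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            decisions[link] = None
        else:
            decisions[link] = top[0][0]
    return decisions


# Illustrative data: True means "the automatic link is correct".
gold = {"g1": True, "g2": False}
gold_answers = {
    "w1": {"g1": True, "g2": False},  # 2/2 correct
    "w2": {"g1": True, "g2": True},   # 1/2 correct, below threshold
    "w3": {"g1": True, "g2": False},  # 2/2 correct
    "w4": {"g1": True, "g2": False},  # 2/2 correct
}
judgements = {
    "l1": [("w1", True), ("w2", False), ("w3", True), ("w4", False)],
    "l2": [("w1", False), ("w3", True)],
    "l3": [("w2", True)],
}

trusted = trusted_workers(gold_answers, gold)
print(majority_vote(judgements, trusted))  # {'l1': True, 'l2': None, 'l3': None}
```

Undecided links (ties, or no trusted answers) would typically be sent back to the crowd for more judgements rather than resolved arbitrarily.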
Highlights of findings
We managed to recruit trusted workers quite quickly (e.g. 490 links reviewed in ~24 hours) and were able to improve the precision of the automatic software without losing considerable recall
The cases that required background knowledge showed worse results
The task of “relating publication and study” was solved with much better recall than the task of deciding whether a publication is “primaryLiterature” of a study or not. The precision was very high, though.
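Precision and recall here compare a set of proposed links with a reference set of correct links. The toy data below is invented, not the GESIS results; it only shows why removing false positives raises precision while leaving recall unchanged.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted links against reference links.

    Both arguments are sets of (publication_id, study_id) pairs.
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for illustration (not the GESIS results).
reference = {("p1", "s1"), ("p2", "s1"), ("p3", "s2"), ("p4", "s3")}
automatic = {("p1", "s1"), ("p2", "s1"), ("p3", "s2"), ("p5", "s2"), ("p6", "s3")}
# Suppose crowd review rejects exactly the two wrong automatic links.
crowd_reviewed = automatic - {("p5", "s2"), ("p6", "s3")}

print(precision_recall(automatic, reference))       # (0.6, 0.75)
print(precision_recall(crowd_reviewed, reference))  # (1.0, 0.75)
```

The link ("p4", "s3") is missed by the automatic step, so recall stays at 0.75 after review: a review step that only rejects links cannot recover it, and each correct link it wrongly rejects lowers recall further.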
16. Ongoing research work
Can we improve their results by including mixed incentives? Not only money, but also competition at the microtask level, e.g. “There are only X links left, be quick!” or “There are three workers who were faster in reviewing links!”
How can we better instruct crowd workers in 1) the type of tasks we are running and 2) the domain we are working with?
17. Take-home message
We can employ crowd workers to connect scientific publications and studies in the social sciences. This can improve automatically generated links.
How can we transfer the knowledge of domain experts to the crowd?
18. Call for discussion
Who?
1. Psychologists
2. Social scientists
3. Computer scientists
Possible topics
Any feedback about the aforementioned ideas
Well-established methodologies in psychology to instruct or train a large group of people
Any suggestions on how to analyse crowd workers (i.e. which criteria to use)
19. Thank you.