CrowdSearcher: Reactive and Multiplatform Crowdsourcing. Keynote speech at the DBCrowd 2013 workshop @ VLDB 2013.
1. CROWDSEARCHER
Marco Brambilla, Stefano Ceri,
Andrea Mauri, Riccardo Volonterio
Politecnico di Milano
Dipartimento di Elettronica, Informazione e BioIngegneria
2. Crowd-based Applications
• Emerging crowd-based applications:
• opinion mining
• localized information gathering
• marketing campaigns
• expert response gathering
• General structure:
• the requestor poses some questions
• a wide set of responders are in charge of providing answers
(typically unknown to the requestor)
• the system organizes a response collection campaign
• These include both crowdsourcing and crowd-searching
3. The “system” is a wide concept
• Crowd-based applications may use social networks and Q&A
websites in addition to crowdsourcing platforms
• Our approach: a coordination engine which keeps an overall
control on the application deployment and execution
[Diagram: the CrowdSearcher coordination engine, with API access to the underlying platforms]
4. CrowdSearcher
• Combines a conceptual framework, a specification
paradigm and a reactive execution control environment
• Supports designing, deploying, and monitoring
applications on top of crowd-based systems
• Design is top-down, platform-independent
• Deployment turns declarative specifications into platform-specific
implementations which include social networks and crowdsourcing
platforms
• Monitoring provides reactive control, which guarantees
applications’ adaptation and interoperability
• Developed in the context of Search Computing
(SeCo, ERC Advanced Grant, 2008-2013)
5. An example of crowd-based application:
crowd-search
• People do not trust web search completely
• Want to get direct feedback from people
• Expect recommendations, insights, opinions, reassurance
6. Crowd-searching after conventional
search
• From search results to friends' and experts' feedback
[Diagram: a human issues the initial query to a search system; results are then forwarded to social platforms for crowd feedback]
14. Deployment: search on the social network
• Multi-platform deployment
17. From social workers to communities
• Issues and problems
• Motivation of the responders
• Intensity of social activity of the asker
• Topic appropriateness
• Timing of the post (hour of the day, day of the
week)
• Context and language barrier
19. The Design Process
• A simple task design and deployment process, based on specific data structures
• created using model-driven transformations
• driven by the task specification
[Diagram: Task Specification → Task Planning → Control Specification]
• Task Specification: task operations, objects, and performers
• Task Planning: work distribution
• Control Specification: task control policies
20. Task Specification
• What are the input objects of the crowd interaction?
• Do they have a schema (record of named and typed fields)?
• Which operations should the crowd perform?
• Like, label, comment, add new instances, verify/modify data, order, etc.
• Who are the performers of the task? How should they be
selected? And invited?
• e.g. push vs pull model
• Which quality criteria should be used for deciding the task
outcome?
• e.g., majority weighting, with/without spam detection
• Which platforms should be used? Which execution
interface should be used?
21. Operations
• In a Task, performers are required to execute logical operations on input objects
• e.g. Locate the faces of the people appearing in the following 5 images
• CrowdSearcher offers pre-defined operation types:
• Like: Ask a performer to express a preference (true/false)
• e.g. Do you like this picture?
• Comment: Ask a performer to write a description / summary / evaluation
• e.g. Can you summarize the following text using your own words?
• Tag: Ask a performer to annotate an object with a set of tags
• e.g. How would you label the following image?
• Classify: Ask a performer to classify an object within a closed-set of alternatives
• e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
• Add: Ask a performer to add a new object conforming to the specified schema
• e.g. Can you list the name and address of good restaurants nearby Politecnico di Milano?
• Modify: Ask a performer to verify/modify the content of one or more input objects
• e.g. Is this wine from Cinque Terre? If not, where does it come from?
• Order: Ask a performer to order the input objects
• e.g. Order the following books according to your taste
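As a concrete illustration, here is a minimal Python sketch that combines the operation types above with the specification questions of the previous slide. The class and field names are illustrative assumptions, not CrowdSearcher's actual API.

# A minimal sketch of a CrowdSearcher-style task specification.
# All names here are illustrative, not the framework's real API.
from dataclasses import dataclass, field
from enum import Enum

class OperationType(Enum):
    LIKE = "like"          # express a true/false preference
    COMMENT = "comment"    # write a description / summary / evaluation
    TAG = "tag"            # annotate an object with a set of tags
    CLASSIFY = "classify"  # choose within a closed set of alternatives
    ADD = "add"            # add a new object conforming to the schema
    MODIFY = "modify"      # verify/modify the content of input objects
    ORDER = "order"        # order the input objects

@dataclass
class TaskSpecification:
    operation: OperationType
    object_schema: dict             # record of named and typed fields
    objects: list                   # input objects of the crowd interaction
    categories: list = field(default_factory=list)  # for CLASSIFY
    platforms: list = field(default_factory=list)   # deployment targets

# Example: the politician-classification task used later in the deck
# (the object values here are placeholders).
task = TaskSpecification(
    operation=OperationType.CLASSIFY,
    object_schema={"name": str, "photo_url": str},
    objects=[{"name": "Jane Doe", "photo_url": "http://example.org/jd.jpg"}],
    categories=["Republican", "Democrat"],
    platforms=["facebook", "mailing-list"],
)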
22. Task planning
Typical problems:
• Task structuring: the task is too complex or too critical to
be executed as a single operation.
• Task splitting: the input data collection is too large to be
presented to a user.
• Task routing: a query can be distributed according to the
values of some attribute of the collection.
23. Micro Tasks
• The actual unit of interaction with a performer.
• Mapping of objects to Micro Tasks:
• How many objects in each MicroTask?
• Which objects should appear in each MicroTask?
• How often should an object appear in MicroTasks?
• Which objects cannot appear together?
• Should objects always be presented in some order?
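The sketch below illustrates one possible answer to these questions: every object is replicated 'redundancy' times, and each replication round is shuffled and cut into micro tasks of at most 'per_task' objects, so no object appears twice in the same micro task. This is an illustrative strategy, not the framework's built-in planner.

# One possible object-to-MicroTask mapping (illustrative, not built in):
# each object appears 'redundancy' times, micro tasks hold at most
# 'per_task' objects, and no object repeats within a micro task.
import random

def plan_micro_tasks(objects, per_task=5, redundancy=3, seed=0):
    rng = random.Random(seed)
    micro_tasks = []
    for _ in range(redundancy):          # one copy of every object per round
        round_objects = objects[:]
        rng.shuffle(round_objects)
        for i in range(0, len(round_objects), per_task):
            micro_tasks.append(round_objects[i:i + per_task])
    return micro_tasks

# 30 objects with redundancy 9, as in the majority-evaluation experiment:
print(len(plan_micro_tasks(list(range(30)), per_task=5, redundancy=9)))  # 54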
24. Assignment Strategy
• Given a set of MicroTasks, which performers are
assigned to them?
• Pull vs Push:
• Pull: The performer chooses
• Push: The performer is chosen
• Online vs offline:
• Online: Micro Tasks dynamically assigned to performers
• First come / first served
• Based on performer's performance
• Offline: Micro Tasks statically assigned to performers
• Based on performers' priority
• Based on matching
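For the push case, here is a minimal sketch of static assignment based on matching, assuming performer profiles are represented as sets of topics; the overlap score and all names are illustrative.

# Offline "push" assignment by matching (illustrative): each micro task is
# statically assigned to the performer whose topic profile overlaps most
# with the task's topics.
def push_assign(micro_tasks, performers):
    assignment = {}
    for task_id, topics in micro_tasks:
        best = max(performers, key=lambda p: len(performers[p] & topics))
        assignment[task_id] = best
    return assignment

performers = {"alice": {"music", "movies"}, "bob": {"sport", "politics"}}
tasks = [("t1", {"politics"}), ("t2", {"music"})]
print(push_assign(tasks, performers))  # {'t1': 'bob', 't2': 'alice'}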
25. Invitation Strategy
• The process of inviting performers to perform Micro Tasks
• Can use very different mechanisms
• Essential in order to generate the appropriate performer reaction / reward.
• Examples:
• Send an email to a mailing list
• Publish a HIT on Mechanical Turk
• Create a new challenge in your game
• Publish a post/tweet on your social network profile
• Publish a post/tweet on your friends' profiles
26. Steps in Crowd-based Application Design
1) Task Design
2) Object and Performer Design
3) Micro Task Design
32. Application instantiation (for Italian politics)
• Given the picture and name of a politician, specify his/her political
affiliation
• No time limit
• Performers are encouraged to look up online
• 2 sets of rules
• Majority Evaluation
• Spammer Detection
34. Crowd Control is tough…
• There are several aspects that make crowd engineering complicated
• Task design, planning, assignment
• Worker discovery, assessment, engagement
• Controlling crowdsourcing tasks is a fundamental issue
• Cost
• Time
• Quality
• Need for higher-level abstractions and tools
36. Reactive Crowdsourcing
• A conceptual framework for controlling the execution of
crowd-based computations. Based on:
• Control Marts
• Active Rules
• Classical forms of controls:
• Majority control (to close object computations)
• Quality control (to check that quality constraints are met)
• Spam detection (to detect / eliminate some performers)
• Multi-platform adaptation (to change the deployment platform)
• Social adaptation (to change the community of performers)
37. Why Active Rules?
• Ease of Use: control is easily expressible
• Simple formalism, simple computation
• Power: arbitrarily complex controls are supported
• Extensibility mechanisms
• Automation: active rules can be system-generated
• Well-defined semantics
• Flexibility: localized impact of changes on the rules set
• Control isolation
• Known formal properties deriving from established theory
• Termination, confluence
38. Control Mart
• Data structure for controlling application execution, inspired by data
marts (for data warehousing); content is automatically built from task
specification & planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object
[Diagram: control mart structures derived from the Task Specification, Task Planning, and Control Specification]
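In Python terms, the mart for the politician-classification example could look like the sketch below: one fact table keyed by the dimensions, plus object- and task-level control tables that the rules update. Field names beyond those appearing in the rule examples that follow are assumptions.

# Sketch of the control mart for the classification example (illustrative).
from dataclasses import dataclass
from typing import Optional

@dataclass
class MicroTaskObjectExecution:      # central entity (fact table)
    micro_task_id: str
    task_id: str                     # Task / Operations dimension
    performer_id: str                # Performer dimension
    object_id: str                   # Object dimension
    classified_party: Optional[str] = None   # operation output

@dataclass
class ObjectControl:                 # per-object control table
    object_id: str
    eval_count: int = 0              # '#Eval' in Rule Example 1
    rep: int = 0                     # 'Rep' votes in Rule Example 2
    dem: int = 0                     # 'Dem' votes in Rule Example 2
    cur_answer: Optional[str] = None # 'CurAnswer' in Rule Example 2

@dataclass
class TaskControl:                   # per-task control table
    task_id: str
    completed_objects: int = 0       # 'compObj' in Rule Example 2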
40. Active Rules Language
• Active rules are expressed on the previous data structures
• Event-Condition-Action paradigm
• Events: data updates / timer
• ROW-level granularity
• OLD: before state of a row
• NEW: after state of a row
• Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
• Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan)

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
44. Rule Example 1

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
47. Rule Example 2

e: UPDATE FOR ObjectControl
c: (NEW.Rep == 2) or (NEW.Dem == 2)
a: SET Politician[oid == NEW.oid].classifiedParty = NEW.CurAnswer,
   SET TaskControl[tID == NEW.tID].compObj += 1
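To show how the two examples chain, here is a toy Python paraphrase of the Event-Condition-Action loop: the update that Rule Example 1 makes on ObjectControl is itself the event that fires Rule Example 2, which closes the object once a party reaches two votes. The deck shows Rule 1 only for 'Republican'; a symmetric twin for 'Democrat' is assumed, and the whole sketch is illustrative rather than the actual engine.

# Toy Event-Condition-Action chain for Rule Examples 1 and 2 (illustrative).
object_control = {}   # oID -> {"Rep", "Dem", "Eval", "CurAnswer"}
politician = {}       # oID -> {"classifiedParty"}
task_control = {"t1": {"compObj": 0}}

def on_execution_update(o_id, t_id, classified_party):
    # Event: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
    ctrl = object_control.setdefault(
        o_id, {"Rep": 0, "Dem": 0, "Eval": 0, "CurAnswer": None})
    politician.setdefault(o_id, {"classifiedParty": None})
    # Rule 1 (plus an assumed symmetric rule for 'Democrat'): count the vote.
    ctrl["Rep" if classified_party == "Republican" else "Dem"] += 1
    ctrl["Eval"] += 1
    ctrl["CurAnswer"] = classified_party
    on_object_control_update(o_id, t_id, ctrl)   # cascaded event

def on_object_control_update(o_id, t_id, ctrl):
    # Rule 2: when either party reaches 2 votes, close the object and
    # bump the task's completed-object counter.
    if ctrl["Rep"] == 2 or ctrl["Dem"] == 2:
        politician[o_id]["classifiedParty"] = ctrl["CurAnswer"]
        task_control[t_id]["compObj"] += 1

on_execution_update("p1", "t1", "Republican")
on_execution_update("p1", "t1", "Republican")
print(politician["p1"], task_control["t1"])
# {'classifiedParty': 'Republican'} {'compObj': 1}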
52. Rule Programming Best Practice
• We define three classes of rules:
• Control rules: modifying the control tables
• Result rules: modifying the dimension tables (object, performer, task)
• Execution rules: modifying the execution table, either directly or through re-planning
• Top-to-bottom, left-to-right evaluation
• Termination is guaranteed as long as rules respect this layering
• With execution rules that re-plan, termination must be proven (the rule precedence graph has cycles)
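The termination property can be checked mechanically: draw an edge from rule A to rule B whenever A's actions can raise B's triggering event, and test the resulting precedence graph for cycles. The sketch below assumes such a graph has already been extracted from the rule set.

# Cycle check on a rule precedence graph (illustrative). An acyclic graph
# (e.g. control rules feeding result rules) guarantees termination; an
# execution rule that re-plans can close a cycle, so termination must be
# proven by other means.
def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph.get(node, []):
            if color[succ] == GRAY or (color[succ] == WHITE and visit(succ)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

layered = {"control_rule": ["result_rule"], "result_rule": []}
with_replan = {"control_rule": ["result_rule"],
               "result_rule": ["execution_rule"],
               "execution_rule": ["control_rule"]}   # re-planning feeds back
print(has_cycle(layered), has_cycle(with_replan))    # False True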
57. Crowdsearcher Experiment 1
• Goal: Test engagement on social networks
• About 150 users
• Two classes of experiments:
• Random questions on fixed topics: interests (e.g. restaurants in the vicinity of Politecnico), famous 2011 songs, or top-quality EU soccer teams
• Questions manually submitted by the users
• Different invitation strategies:
• Random invitation
• Explicit selection of responders by the asker
• Outcome
• 175 like and insert queries
• 1536 invitations to friends
• 230 answers
• 95 questions (~55%) got at least one answer
63. Crowdsearcher Experiment 2
• GOAL: demonstrate the flexibility and expressive power
of reactive crowdsourcing
• 3 experiments, focused on Italian politicians
• Parties: human computation (affiliation classification)
• Law: game with a purpose (guess the convicted politician)
• Order: pure game (hot or not)
• 1 week (November 2012)
• 284 distinct performers
• Recruited through public mailing lists and social networks
announcements
• 3500 Micro Tasks
64. Politician Affiliation
• Given the picture and name of a politician, specify his/her political
affiliation
• No time limit
• Performers are encouraged to look up online
• 2 sets of rules
• Majority Evaluation
• Spammer Detection
65. Results – Majority Evaluation (1/3)
30 objects; object redundancy = 9
Final object classification by simple majority after 7 evaluations
66. Results – Majority Evaluation (2/3)
Final object classification by total majority after 3 evaluations
Otherwise, re-plan of 4 additional evaluations, then simple majority at 7
67. Results – Majority Evaluation (3/3)
Final object classification by total majority after 3 evaluations
Otherwise, simple majority at 5 or at 7 (with re-plan)
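Read as closing conditions over an object's running votes, the three configurations can be paraphrased in Python as below; the exact behaviour at 5 evaluations is an assumption, the rest follows the slides.

# Closing conditions for the three majority-evaluation strategies
# (illustrative paraphrase; returns the final label, or None for
# "keep collecting evaluations").
def close_object(votes, strategy):
    n = len(votes)
    top = max(set(votes), key=votes.count) if votes else None
    if strategy == "simple@7":                    # configuration 1/3
        return top if n >= 7 else None
    if strategy == "total@3-else-simple@7":       # configuration 2/3
        if n == 3 and votes.count(top) == 3:      # unanimity after 3
            return top
        return top if n >= 7 else None            # after re-planning 4 more
    if strategy == "total@3-else-simple@5-or-7":  # configuration 3/3
        if n == 3 and votes.count(top) == 3:
            return top
        if n == 5 and votes.count(top) > n // 2:  # assumed strict majority
            return top
        return top if n >= 7 else None
    raise ValueError(strategy)

print(close_object(["Rep", "Rep", "Rep"], "total@3-else-simple@7"))  # Rep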
68. Results – Spammer Detection (1/2)
New rule for spammer detection without ground truth
Performer correctness measured against the final majority; spammer if > 50% wrong classifications
69. Results – Spammer Detection (2/2)
New rule for spammer detection without ground truth
Performer correctness measured against the current majority; spammer if > 50% wrong classifications
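A sketch of the detection rule in Python: a performer's answers are compared with the majority label of each object (the final majority in variant 1/2, the current running majority in variant 2/2), and the performer is flagged when more than half of the compared answers disagree. The threshold follows the slides; the encoding is illustrative.

# Majority-based spammer detection without ground truth (illustrative).
def is_spammer(answers, majority_labels, threshold=0.5):
    # answers: object_id -> label given by this performer
    # majority_labels: object_id -> (final or current) majority label
    judged = [oid for oid in answers if oid in majority_labels]
    if not judged:
        return False
    wrong = sum(answers[oid] != majority_labels[oid] for oid in judged)
    return wrong / len(judged) > threshold

answers = {"o1": "Republican", "o2": "Democrat", "o3": "Republican"}
majority = {"o1": "Democrat", "o2": "Democrat", "o3": "Democrat"}
print(is_spammer(answers, majority))  # True: 2 of 3 answers off-majority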
71. Problem
• Ranking the members of a social group according
to the level of knowledge that they have about a
given topic
• Application: crowd selection (for Crowd Searching
or Sourcing)
• Available data
• User profile
• behavioral trace that users leave behind them through
their social activities
72. Considered Features
• User Profiles
• Plus Linked Web Pages
• Social Relationships
• Facebook Friendship
• Twitter mutual following relationship
• LinkedIn Connections
• Resource Containers
• Groups, Facebook Pages
• Linked Pages
• Users who are followed by a given user are resource containers
• Resources
• Material published in resource containers
78. Resource Processing
• Extraction from Social
Network APIs
• Extraction of Text from linked
Web Pages
• Alchemy Text Extraction APIs
• Language Identification
• Text Processing
• Sanitization, tokenization, stopword removal, lemmatization
• Entity Extraction and
Disambiguation
• TagMe
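A minimal sketch of the text-processing step; entity extraction and lemmatization are delegated to external services (Alchemy, TagMe) in the pipeline above and are therefore not reproduced here. The stopword list and regexes are illustrative.

# Sanitization, tokenization and stopword removal (illustrative).
import re

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}   # tiny subset

def preprocess(text):
    text = re.sub(r"<[^>]+>", " ", text)              # drop HTML tags
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # tokenize
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

print(preprocess("<p>The Politics of the EU and the World</p>"))
# ['politics', 'eu', 'world']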
79. Dataset
• 7 kinds of expertise
• Computer Engineering, Location, Movies & TV, Music, Science, Sport, Technology & Videogames
• 40 volunteer users (on Facebook, Twitter & LinkedIn)
• 330,000 resources (70% with a URL to external resources)
• Ground truth created through self-assessment
• For each kind of expertise, users rated themselves on a 7-point Likert scale
• Experts: users whose self-assessed expertise is above average
80. Metrics
• We obtain lists of candidate experts and assess them
against the ground truth, using:
• For precision:
• Mean Average Precision (MAP)
• 11-Point Interpolated Average Precision (11-P)
• For ranking:
• Mean Reciprocal Rank (MRR) – considers the first relevant result
• Normalized Discounted Cumulative Gain (nDCG) – considers multiple results; can be computed @N over the first N results
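For reference, here are standard formulations of the two ranking metrics in Python, with binary relevance; the paper's exact settings may differ.

# Mean Reciprocal Rank and nDCG@N with binary relevance (standard
# textbook formulations, illustrative).
import math

def mean_reciprocal_rank(rankings, relevant_sets):
    total = 0.0
    for ranked, rel in zip(rankings, relevant_sets):
        total += next((1.0 / (i + 1) for i, c in enumerate(ranked) if c in rel), 0.0)
    return total / len(rankings)

def ndcg_at_n(ranked, rel, n):
    dcg = sum(1.0 / math.log2(i + 2) for i, c in enumerate(ranked[:n]) if c in rel)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(n, len(rel))))
    return dcg / ideal if ideal else 0.0

print(mean_reciprocal_rank([["bob", "alice"]], [{"alice"}]))  # 0.5
print(round(ndcg_at_n(["bob", "alice"], {"alice"}, 2), 2))    # 0.63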
82. Friendship Relationship not useful
• Inspecting friends' resources does not improve the metrics!
83. Social Network Analysis
• Comparison of the results obtained with all the social networks together, or separately with Facebook, Twitter, and LinkedIn
84. Main Results
• Profiles are less effective than level-1 resources
• Resources produced by others help in describing each individual’s
expertise
• Twitter is the most effective social network for expertise
matching – sometimes it outperforms the other social
networks
• Twitter most effective in Computer Engineering, Science, Technology &
Games, Sport
• Facebook effective in Locations, Sport, Movies & TV, Music
• LinkedIn is never very helpful in locating expertise
86. Summary
• Results
• An integrated framework for crowdsourcing task design and control
• Well-structured control rules with guarantees of termination
• Support for cross-platform crowd interoperability
• A working prototype: crowdsearcher.search-computing.org
• Forthcoming
• Publication of Web Interface + API
• Support of declarative options for automatic rule generation
• Integration with more social networks and human computation
platforms
• Providing vertical solutions for specific markets
• More applications and experiments (e.g. in Expo 2015)