1. Crowd Sourcing Web Service Annotations
James Scicluna1, Christoph Blank1, Nathalie Steinmetz1 and Elena Simperl2
1seekda GmbH, 2Karlsruhe Institute of Technology
© Copyright 2012 SEEKDA GmbH – www.seekda.com
2. Outline
Introduction to seekda Web Service search engine
Web API crawling & identification
Amazon Mechanical Turk crowdsourcing
Web Service Annotation wizard
5. Why crawl for Web APIs?
Significant growth of Web APIs
More than 5,400 Web APIs on ProgrammableWeb (including SOAP and REST APIs), up from ca. 1,500 Web APIs at the end of 2009
More than 6,500 mashups on ProgrammableWeb (each combining Web APIs from one or more sources)
SOAP services are only a small part of the overall available public services
6. Web API Crawling
Problem:
Web APIs are described by regular HTML pages
There is no standardized structure that helps with their identification
7. Web API Identification
Solution: crawl for Web APIs
Approach 1: Manual feature identification
Takes into account HTML structure (e.g., title, mark-up), syntactical properties of the language used (e.g., camel-cased words), and link properties of pages (ratio of external to internal links)
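The hand-crafted features above can be sketched roughly as follows. This is an illustrative Python sketch only; the function name and exact feature definitions are assumptions, not seekda's actual implementation:

```python
import re

def extract_features(html_title: str, text: str,
                     internal_links: int, external_links: int) -> dict:
    """Simple hand-crafted features of the kind described on this slide.
    All feature definitions here are illustrative approximations."""
    # Count camel-cased identifiers such as getUserBalance (a syntactical
    # property typical of API documentation pages).
    camel_case = len(re.findall(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b", text))
    # Crude HTML-structure signal: does the page title mention "API"?
    api_in_title = int("api" in html_title.lower())
    # Ratio of external to internal links; guard against division by zero.
    link_ratio = (external_links / internal_links
                  if internal_links else float(external_links))
    return {
        "camel_case_words": camel_case,
        "api_in_title": api_in_title,
        "external_internal_link_ratio": link_ratio,
    }
```

A feature vector like this would then feed into a manual rule set or a downstream classifier.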
Approach 2: Automatic classification
Text classification via supervised learning (Support Vector Machine model)
Training set: APIs from ProgrammableWeb
However, human confirmation was still needed to be sure
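The slides describe an SVM text classifier trained on ProgrammableWeb pages. As a self-contained illustration of supervised text classification, the sketch below substitutes a bag-of-words perceptron for the SVM (a deliberate simplification so it fits in a few lines; seekda's actual model, features, and training data are not shown here):

```python
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def train_perceptron(docs, labels, epochs=10):
    """Train a linear bag-of-words classifier (+1 = Web API page, -1 = other).
    The original work used an SVM; a perceptron is used here only because
    it is easy to show in full."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            score = bias + sum(weights[t] for t in tokenize(text))
            if y * score <= 0:  # misclassified: nudge weights toward y
                for t in tokenize(text):
                    weights[t] += y
                bias += y
    return weights, bias

def predict(weights, bias, text):
    score = bias + sum(weights[t] for t in tokenize(text))
    return 1 if score > 0 else -1
```

On a toy training set of API-like and non-API pages, the learned weights separate API vocabulary ("rest", "json", "wsdl") from everyday text, which is the same intuition behind the SVM approach.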
9. Prototype – User Contributions
Web API yes/no: human confirmation needed!
Other annotations that help improve the search for Web services:
Categories
Tags
Natural-language descriptions
Cost: free or paid service
10. Problem – User Contributions
Problem:
Users/developers don’t contribute enough
It is hard to motivate them to provide annotations
Community recognition or peer respect is not enough
Solution: crowdsource the annotations and pay people to provide them
Use Amazon Mechanical Turk
Bootstrap annotations quickly and cheaply
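To illustrate, an annotation job of this kind would be posted to Mechanical Turk as a HIT (Human Intelligence Task). The sketch below only builds the parameter dictionary in the shape expected by MTurk's CreateHIT operation (e.g. via boto3's `create_hit`); it makes no API call. The question URL and all values except the per-task reward from these slides are placeholders, not seekda's actual setup:

```python
from urllib.parse import quote

def build_annotation_hit(service_url: str, reward_usd: str = "0.10") -> dict:
    """Assemble CreateHIT parameters for one service-annotation task.
    The wizard URL below is a hypothetical placeholder."""
    external_question = (
        '<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">'
        f"<ExternalURL>https://example.org/wizard?service="
        f"{quote(service_url, safe='')}</ExternalURL>"
        "<FrameHeight>600</FrameHeight></ExternalQuestion>"
    )
    return {
        "Title": "Annotate a Web API page",
        "Description": "Decide whether the page describes a Web API and add tags",
        "Reward": reward_usd,               # MTurk expects the reward as a string
        "MaxAssignments": 1,                # one worker per page
        "AssignmentDurationInSeconds": 600,
        "LifetimeInSeconds": 7 * 24 * 3600,
        "Question": external_question,
    }
```

The ExternalQuestion mechanism lets the worker complete the task inside an iframe served by the requester, which is how a custom annotation wizard can be embedded in MTurk.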
15. Amazon Mechanical Turk – Iteration 1
Number of submissions: 70
Reward per task: $0.10
Restrictions: none
Annotation wizard:
Web API yes/no
Assign a category
Assign tags
Provide a natural-language description
Determine whether the page is documentation, pricing, or a listing
Rate the service
16. Amazon Mechanical Turk – Iteration 1
Results:
21 APIs correctly identified as APIs
28 Web documents (non-APIs) correctly identified as non-APIs
49/70 correctly identified (70% accuracy)
Average task completion time: 2:20 min
However, only:
4 well-done and complete annotations
8 acceptable (incomplete) annotations
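The headline accuracy on this slide follows directly from the two counts above:

```python
# Iteration 1 numbers from this slide, recomputed.
true_api, true_non_api, total = 21, 28, 70
correct = true_api + true_non_api   # correctly identified documents
accuracy = correct / total          # fraction of all 70 submissions
```

That is, 21 + 28 = 49 correct out of 70, i.e. 70% accuracy.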
17. Amazon Mechanical Turk – Iterations 2 & 3
                         Iteration 2   Iteration 3
Number of submissions    100           150
Reward per task          $0.20         $0.20
Restrictions             yes           yes
Annotation Wizard
Removed page type identification & service rating
For a task to be accepted:
At least one category must be assigned
At least 2 tags must be provided
A meaningful description must be provided
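The acceptance rules for iterations 2 & 3 amount to a simple validity check per submitted annotation. A minimal sketch, assuming a word-count threshold as a stand-in for "meaningful" (the slides do not say how seekda actually judged meaningfulness):

```python
def accept_annotation(categories: list[str], tags: list[str],
                      description: str) -> bool:
    """Apply the acceptance rules from iterations 2 & 3:
    at least one category, at least two tags, and a meaningful
    description (approximated here by a minimum word count)."""
    return (
        len(categories) >= 1
        and len(tags) >= 2
        and len(description.split()) >= 5  # assumed proxy for "meaningful"
    )
```

Submissions failing any of the three rules would be rejected, which MTurk supports per assignment.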
18. Amazon Mechanical Turk – Iterations 2 & 3
Results for iterations 2 & 3:
Approximately 80% of documents correctly identified
Very satisfying annotations
Average completion time: 2:36 min
19. Amazon Mechanical Turk – Survey
48 survey submissions
18 female, 30 male
Most common countries of origin: India (27) and USA (9)
Most common age groups:
15–22 (12)
23–30 (18)
31–50 (16)
Most respondents worked in an IT profession; these provided the best-quality annotations
20. Amazon Mechanical Turk
Recommendations for further improvement:
Improve the task description, especially ‘what is a Web API’
Better examples (e.g., hinting at what makes a false page false)
Allow assignment of multiple categories
Conclusion:
Very positive results: a good way to get quality annotations
Results will help provide a better search experience to users
Results can be used as a positive training set for automatic classification
21. Questions?