This work introduces faceted service discovery. It uses the Programmable Web directory as its corpus of APIs and enhances the search to support faceted browsing, given an OWL ontology that describes semantic features of the APIs. We designed the API classification ontology using LexOnt, a tool we built for semi-automatic ontology creation. LexOnt is geared toward non-experts within a service domain who want to create a high-level ontology that describes the domain. Using well-known NLP algorithms, LexOnt generates a list of top terms and phrases from the Programmable Web corpus so that users can find the high-level features that distinguish one Programmable Web service category from another. To further aid non-experts, LexOnt relies on outside sources such as Wikipedia and Wordnet to help the user identify the important terms within a service category. Using the ontology created with LexOnt, we built APIBrowse, a faceted search interface for APIs. The ontology, in combination with the Apache Solr search platform, is used to generate the faceted search interface based on the APIs' distinguishing features. With this ontology, an API can be classified under multiple categories and displayed under each of them within the APIBrowse interface. APIBrowse gives programmers the ability to search for APIs based on their semantic features and keywords and presents them with a filtered and more accurate set of search results.
Knarig Arabshian has been an Assistant Professor in the Computer Science Department at Hofstra University since Fall 2014. Prior to that, she was a Member of Technical Staff at Bell Labs in Murray Hill, NJ. She received her Ph.D. in Computer Science from Columbia University in 2008.
Professor Arabshian’s interests lie in the field of semantic web, service discovery and composition, context-aware computing and distributed systems. The goal of her research is to drive forward the idea of a personalized web. Her work explores ways of describing data meaningfully and designing frameworks and systems for efficient data discovery. During her tenure at Bell Labs, she worked on different aspects of ontology creation, distribution and querying.
3. Motivation
§ Most of today’s Web content is suitable for human consumption
§ Humans are left with the work of gathering information from various websites
§ Web content is heterogeneous with little or no structure
§ Data is not easily shared between web content providers
4. Travel Example
Use services to
manually search for
airfares, car rentals
and hotels
Or search with
aggregating services
Use services to help plan travel
itinerary and provide information
on local sites such as weather,
events, or attractions
Use services that also
provide you with helpful
customer reviews
5. Semantic Web Vision
§ Web information can be processed by computers
§ Computers can integrate information from the web
“A web of data that can be processed directly
and indirectly by computers”
~Tim Berners-Lee (Inventor of WWW)
6. Quest for Semantics
Three main goals of the Semantic Web:
1. Building models: describe the world in abstract terms to
allow for an easier understanding of complex reality
2. Computing with knowledge: constructing reasoning
machines that can draw meaningful conclusions from
encoded knowledge
3. Exchanging Information: distribute, interlink, and
reconcile knowledge on a global scale
7. Planning Booking Reviews
Travel
Airline Tickets Car Rental Hotels
Using structured data, computers
can aggregate information and
customize it for the user
Travel
ontology
describes and
classifies
travel
services
8. Motivation
§ We can see similar problems when it comes to
API discovery on the Web
§ Discovering an API requires searching through a
large number of services on the Internet
§ Reading pages of documentation to figure out how
to use the ones that may match your application
§ Example: ProgrammableWeb (PW)
§ De facto API directory with over 14,000 APIs
§ Contains over 50 categories of services
§ API providers register their APIs in PW
§ Each API is manually categorized in a single category
by PW team
10. Current state of PW
Current classification is
a flat categorization of
high-level service
classes without any
refinement between
common attributes
Needs a better method
for API discovery
13. Example: Search for Social Advertising
APIs in the Advertising Category
Search for
‘social’ and
‘advertising’
keywords in
Advertising
Category
Results in 7
APIs
14. Example: Search for Social Advertising
APIs in the Social Category
Search for
‘social’ and
‘advertising’
keywords in
Social
Category
Results in 2
APIs
15. What is needed?
A common data model has to be provided such as
an ontology in order to classify terms and
represent knowledge
Definition:
A formal, explicit specification of a shared
conceptualization ~ Tom Gruber
17. Ontology
§ OWL (Web Ontology Language): Approved
standard by W3C
§ Characteristics of ontologies
§ Classes: set of resources
§ Instances: ground level objects
§ Properties: relationships between classes
§ First order logic axioms
§ Class relationships such as disjointness, equivalence,
subsumption
§ Restrictions on properties such as existential, universal,
cardinality
18. Ontology Benefits
§ Standard way of describing the world both in terms of language and meaning
§ Easily sharable across domains
§ Machine readable
§ Reasoning
§ Provide complex class relationships such as disjointness, union, intersection besides pure hierarchy
§ Description logic reasoners automatically derive new information and classify data
§ Automated classification can be very useful for dynamic data that is continually updated
19. Ontology vs Relational Database
§ Similarities
§ Both use a model to identify common classes and properties
§ ER model can be seen as a simple hierarchical ontology
§ Differences
§ Ontologies are broader in scope (rules, incomplete knowledge)
§ Ontologies provide a way for automated reasoning to occur in order to discover new relationships between entities
20. Example: Reasoning with a
Restaurant Ontology
Import class Cuisine
Create a restaurant
classification based on
cuisine by setting a
restriction on the
hasCuisine property
21. Example: Reasoning with a
Restaurant Ontology
Since ChineseCuisine has non-disjoint siblings
JapaneseCuisine and KoreanCuisine, we can also
conclude that these are similar to ChineseCuisine
23. Example: Reasoning with a
Restaurant Ontology
Run Reasoner for
Automated Classification
Conclude that
NewClass is
equivalent to
ChineseRestaurant
EQUIVALENT
25. Problem
§ Problem:
§ Improve API discovery and classification in Programmable
Web by providing a common data model such as an
ontology in order to automatically classify terms and perform
semantic API searches
§ Main Challenges:
§ Define high-level semantic descriptions of Programmable
Web services
§ Combine manual and automated data mining techniques to
create an ontology description of existing Programmable
Web services
§ Implement system that makes use of the ontology, such as
front-end user interface
26. What will improve?
§ Given a PW ontology, the system will:
§ Automatically classify existing API instances
within this ontology
§ Create an ontology-based user-interface for
automatic registration and querying
§ API providers will be able to register their services via this
interface
§ Users will be able to discover services with semantic queries
§ Example:
§ Find me an advertising service for social networks
§ Find me a social networking service for book
sharing
27. What do we need?
PW Service Classes: Advertising_Service, Social_Service
Properties: hasFeature
Feature Classes: Advertising_Feature, Social_Feature
API Individuals:
<140Proof, hasFeature, Advertising_Feature>
<140Proof, hasFeature, Social_Feature>
<BadgeVille, hasFeature, Advertising_Feature>
<BadgeVille, hasFeature, Social_Feature>
Automated Classification: API individuals are classified into PW Service Classes
Refinement properties for a given PW Category to enable automatic classification
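The idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual system: a real deployment would hand the OWL ontology to a description logic reasoner, whereas the `classify` function and the `class_definitions` mapping below are hypothetical stand-ins that treat each service class as "any API that has the corresponding feature".

```python
# Triples taken from the slide: <API, hasFeature, Feature>.
triples = [
    ("140Proof", "hasFeature", "Advertising_Feature"),
    ("140Proof", "hasFeature", "Social_Feature"),
    ("BadgeVille", "hasFeature", "Advertising_Feature"),
    ("BadgeVille", "hasFeature", "Social_Feature"),
]

# Hypothetical class definitions: a service class is defined by the
# feature an API must have (loosely, an existential restriction on
# hasFeature). This mapping is an assumption made for the sketch.
class_definitions = {
    "Advertising_Service": "Advertising_Feature",
    "Social_Service": "Social_Feature",
}


def classify(triples, class_definitions):
    """Return {api: sorted list of service classes it belongs to}."""
    features = {}
    for subject, predicate, obj in triples:
        if predicate == "hasFeature":
            features.setdefault(subject, set()).add(obj)

    result = {}
    for api, feats in features.items():
        result[api] = sorted(
            cls
            for cls, required in class_definitions.items()
            if required in feats
        )
    return result


classification = classify(triples, class_definitions)
for api, classes in classification.items():
    print(api, "->", classes)
```

Because both APIs carry both features, each one lands in both service classes, which is the multi-category behavior the flat PW categorization lacks.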
31. Improved PW Classification using an OWL Ontology
PW Services: Advertising, Social, Photo, Video, Travel
Combined classes: AdvertisingSocial, VideoSocial, PhotoSocial, TravelSocial
APIs that have attributes belonging in more than one category will automatically be classified
33. APIBrowse: Improved Faceted Search
Interface
Given the PW ontology, automatically generate a faceted search
interface by integrating it with a search platform such as Apache Solr
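To make the notion of faceted search concrete, here is a minimal in-memory sketch. It does not use Solr and is not the APIBrowse implementation; the `faceted_search` function, the record layout, the description texts, and the `ExamplePhotoAPI` entry are all hypothetical placeholders. It only shows the two things a faceted interface needs: filtering by keyword plus selected facets, and counting how many hits fall under each facet value.

```python
from collections import Counter

# Hypothetical index records. The API names 140Proof and BadgeVille come
# from the slides; the description texts and ExamplePhotoAPI are made up.
apis = [
    {"name": "140Proof",
     "text": "Advertising targeted to social streams",
     "classes": ["Advertising", "Social"]},
    {"name": "BadgeVille",
     "text": "Social rewards with advertising badges",
     "classes": ["Advertising", "Social"]},
    {"name": "ExamplePhotoAPI",
     "text": "Share photos with friends",
     "classes": ["Photo", "Social"]},
]


def faceted_search(apis, keyword=None, facets=()):
    """Filter by keyword and required facets, then count facet values."""
    hits = [
        api for api in apis
        if (keyword is None or keyword.lower() in api["text"].lower())
        and all(f in api["classes"] for f in facets)
    ]
    # Facet counts tell the interface how many hits sit under each class.
    facet_counts = Counter(c for api in hits for c in api["classes"])
    return [api["name"] for api in hits], dict(facet_counts)


names, counts = faceted_search(apis, keyword="advertising", facets=("Social",))
print(names, counts)
```

In a Solr-backed setup the ontology classes would presumably be indexed as a multi-valued field and the counts would come from the search platform's own faceting; the sketch keeps everything in plain Python so it stays self-contained.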
38. LexOnt: A semi-automatic
ontology creation tool
§ A semi-automatic ontology creation tool that uses the
Programmable Web as its corpus
§ Suggests high-level property terms for a given service class
which distinguish it from the rest of the categories
§ Implemented as a plugin for Protege, the de facto ontology
editor, to aid in semi-automated ontology creation
§ Contributions:
§ Novel algorithm ranks terms and phrases within a PW category as
candidate property assignments by comparing them to external
domain knowledge within Wikipedia, Wordnet and the current
state of the ontology
§ Can be used even if the ontology engineer is not necessarily an
expert of a certain domain
39. LexOnt Algorithms
Well-known NLP algorithms used to find terms and phrases
§ TF-IDF: Term frequency-inverse document frequency
§ Score of a word in the document shows how important the word is
§ Importance of a word depends on how frequently the word has been
used in the document vs. all the documents in the corpus
§ Significant Phrases:
§ Chi-square test used to calculate the significance of collocated words
§ Two-phase process:
§ Determine collocations and terms that appear together
§ Filter out unique collocations from the list
§ Gave a very good indication of high-level property descriptions
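The two measures above can be sketched as follows. This is a plain-Python illustration, not the LingPipe code LexOnt uses: `tf_idf` implements one common TF-IDF variant (relative term frequency times the log of inverse document frequency), and `chi_square_bigram` computes Pearson's chi-square on the 2x2 contingency table of a word pair. Both function names and the toy token lists are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def chi_square_bigram(tokens, w1, w2):
    """Pearson chi-square for the adjacent pair (w1, w2) in a token list."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    o11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    o12 = sum(1 for a, b in bigrams if a == w1 and b != w2)
    o21 = sum(1 for a, b in bigrams if a != w1 and b == w2)
    o22 = n - o11 - o12 - o21
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


# Toy corpus: "ad" occurs in every document, so its score is zero.
docs = [["social", "ad", "ad"], ["travel", "hotel", "ad"]]
scores = tf_idf(docs)

# Toy token stream: "social" is always followed by "stream".
tokens = ["social", "stream", "ads", "social", "stream", "api"]
chi = chi_square_bigram(tokens, "social", "stream")
print(scores[0], chi)
```

A term that appears in every document gets a zero score, which is why generic words drop out of the top-N lists, while a pair that always occurs together gets the maximum chi-square value for its sample size.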
40. LexOnt Algorithms
Novel algorithm uses external resources like Wikipedia, Wordnet
and the constructed ontology to further highlight the
important terms
§ Useful for those who are not domain experts but want to
understand what the relevant terms of a domain are
§ Algorithm for using the External Knowledge Base
§ Extract Wikipedia page for each category and rank top words with TF-IDF
§ If a word or phrase in the API contains any of the top Wikipedia words, label it
§ Find synonymous or related terms to the list of generated terms using Wordnet
§ If a word or phrase in the API contains any of the related terms, label it
§ If any of the generated terms lexically match terms in the ontology, label them
using a color code
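The labeling steps above can be sketched roughly as below. This is a simplified illustration, not LexOnt's actual algorithm: the `label_terms` function is hypothetical, "contains" is interpreted as plain substring matching (an assumption), the ontology list is invented, and the term lists are small subsets of the Advertising example on the next slide.

```python
def label_terms(api_terms, wiki_terms, related_terms, ontology_terms):
    """Label each API term by the first external source it matches,
    then move labeled terms ahead of unlabeled ones."""
    wiki = [t.lower() for t in wiki_terms]
    related = [t.lower() for t in related_terms]
    ontology = {t.lower() for t in ontology_terms}
    rank = {"wiki": 0, "related": 1, "ontology": 2, None: 3}

    labeled = []
    for term in api_terms:
        lowered = term.lower()
        if any(w in lowered for w in wiki):
            label = "wiki"        # contains a top Wikipedia word
        elif any(r in lowered for r in related):
            label = "related"     # contains a Wordnet-related term
        elif lowered in ontology:
            label = "ontology"    # lexical match with the ontology
        else:
            label = None
        labeled.append((term, label))

    # sorted() is stable, so the original order is kept within each group.
    return sorted(labeled, key=lambda pair: rank[pair[1]])


api_terms = ["proof", "persona", "stream", "ad", "brands",
             "consumers", "advertisers", "audience", "ads"]
wiki_terms = ["advertising", "marketing", "brand", "consumer", "advertise"]
related_terms = ["ad", "advertisement", "promotion"]
ontology_terms = ["audience"]  # hypothetical current ontology state

ranked = label_terms(api_terms, wiki_terms, related_terms, ontology_terms)
for term, label in ranked:
    print(term, label)
```

Substring matching is crude (a short term such as "ad" would also match "radio"), so a real implementation would more likely match on stems or whole tokens; the sketch only shows how external evidence can reorder the raw TF-IDF list.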
41. Example of Property Selection from a Social Advertising API
Top N TF-IDF from Wiki: Advertising, marketing, brand, television, semiotics, advertisement, billboard, radio, product, bowl, sponsor, consumer, advertise, placement, super, logo, commercial, infomercial
Top N TF-IDF from Wordnet: Ad, advertisement, advertizement, advertising, advertizing, advert, promotion, direct-mail, prview, advertorial, mailer, newspaper-ad, commercial, circular, teaser, top-billing
Top N TF-IDF from PW Category: Proof, persona, stream, replies, authors, say, hello, ad, brands, social, consumers, advertisers, audience, ads
Top N TF-IDF ranked based on external KB: Advertisers (wiki), Consumers (wiki), Social (wiki), Brands (wiki), Ads (related), Ad (related), proof, persona, stream, replies, authors, say, hello, audience
Top N Significant Phrases ranked based on external KB: Stream advertising (wiki), social stream (wiki), say hello, author, replies, google groups, ober, michaels, proof, erik, michaels, persona targeting
43. LexOnt Implementation
§ LexOnt is implemented as a Protege plugin to
enhance the user experience of semi-
automated ontology creation
§ Four different Java APIs used for the
implementation
§ Lingpipe API used for the NLP algorithms to
generate TF-IDF terms and Significant Phrases
§ Lucene used for indexing and searching for terms
§ Protege API used for implementing the Protege
plugin GUI
§ OWL-API used for ontology generation code
44. LexOnt Results
§ Used PW corpus of ~3000 APIs totaling 250 MB of data
§ Constructed ontology for 5 categories with the following features:
§ Domain specificity
§ A priori knowledge of domain
§ Number of APIs within the domain
§ Tested for four things when evaluating LexOnt
1. The precision/recall of the TF-IDF term and Significant Phrase
generation without external KB information
2. How helpful the external KB was when choosing terms by finding
the percentage of terms used in ontology
3. Whether or not the terms were used in their exact form, similar
form or different forms
4. How quickly an API instance was constructed in the ontology by the user
45. LexOnt Results
1. Precision/Recall tests for terms without taking external KB into account
§ 4% precision
§ 28% recall
Results:
Using TF-IDF/Sig Phrases alone is not good enough to determine how terms should be used
2. For categories with well-defined Wikipedia pages, percentage of terms used from external KB was at least 50%
Results:
Well-defined external KBs made it much easier to quickly assess distinguishing features of a category
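For reference, precision and recall for term generation can be computed as in the sketch below. The slides do not spell out the exact definitions used, so this assumes the usual ones: precision is the share of generated terms that ended up in the ontology, and recall is the share of ontology terms that were generated. The `precision_recall` function and the toy term lists are illustrative only and do not reproduce the 4% / 28% figures.

```python
def precision_recall(generated, used):
    """generated: terms proposed by the tool; used: terms in the ontology."""
    generated, used = set(generated), set(used)
    hits = generated & used
    precision = len(hits) / len(generated) if generated else 0.0
    recall = len(hits) / len(used) if used else 0.0
    return precision, recall


# Toy example: 5 generated terms, 4 ontology terms, 2 in common.
generated_terms = ["ad", "social", "stream", "proof", "hello"]
ontology_terms_used = ["ad", "social", "mobile", "video"]
precision, recall = precision_recall(generated_terms, ontology_terms_used)
print(precision, recall)
```

Low precision with higher recall, as reported on this slide, means most generated terms are noise even though a fair share of the useful terms do show up somewhere in the list, which is the motivation for re-ranking with an external knowledge base.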
46. LexOnt Results
Domain | Number of APIs | Specifically Defined External KB | A Priori Knowledge of Domain | % Terms Used from External KB
Advertising | <100 | √ | X | 50%
Travel | <100 | √ | √ | 80%
Real Estate | <100 | √ | X | 100%
Social | >100 | X | √ | 20%
47. LexOnt Results
3. Tested to see how these terms were actually assigned within the instances
§ Compared matches that were exact, similar or completely different
§ Example: if LexOnt produced a term “mobile” but the actual ontology assignment was “mobile advertising,” this would count as a similar match
§ Percentage of equal and similar matches for API instances averaged over 80%
Results:
§ External KB terms were used over 80% of the time
§ Percentage of different matches was higher when category was not well-defined such as the Utility category
48. LexOnt Results
4. Speed of ontology construction
§ Before we had the LexOnt tool and worked only with generated TF-IDF/Sig Phrase terms, it took around 15 minutes to construct an API instance and its related feature
§ After the completion of LexOnt, this dropped to 2 minutes
Results:
§ LexOnt’s user interface and external knowledge base ranking reduced the time for ontology construction by a factor of about 7
50. Related Work
§ Most related work involves semi-automated ontology
creation for
§ Pure hierarchical ontologies
§ Domains that already have some kind of structural description
§ Machine learning and NLP techniques used
§ On text corpora
§ Alongside existing structured or annotated external knowledge
base
§ The work closest to LexOnt’s
§ Find property relationships between concepts
§ Use unstructured external knowledge bases
51. Related Work
System | Corpus | Ontology Suggestions | External Knowledge
Text2Onto | annotated | Probabilistic Ontology Models | None
OntoLT | rule-based | Classes and properties | None
OntoLearn | unstructured | Hierarchical classification | Definitions, Synonyms
LexOnt | unstructured | Properties | Wikipedia, Wordnet, Generated Ontology
52. Conclusion
§ LexOnt has been shown to be an effective tool for semi-automated ontology creation
§ From our initial results, we have determined that using an external knowledge base to filter generated terms and phrases
§ Increases the accuracy of the feature selection
§ Helps in understanding the common terms within a corpus
53. LexOnt Publications
§ Knarig Arabshian and Peter Danielsen, Ontology-based Faceted Search Interface for APIs (in journal submission).
§ Peter Danielsen and Knarig Arabshian, User Interface Design in Semi-Automated Ontology Construction, International Conference on Web Services (ICWS 2013), Santa Clara, CA, June 2013.
§ Knarig Arabshian, Peter Danielsen and Sadia Afroz, LexOnt: Semi-Automatic Ontology Creation Tool for Programmable Web, AAAI 2012 Spring Symposium on Intelligent Web Services Meet Social Computing, Palo Alto, CA, March 2012.
§ Knarig Arabshian and Peter Danielsen, Semi-automated Ontology Creation for High-level Service Classification, 7th International Conference on Semantics, Knowledge and Grids (SKG 2011), Beijing, China, Oct 2011.