130919 jim cordy - when is a clone not a clone
1. When is a Clone not a Clone?
(and vice-versa)
Contextualized Analysis of Web Services
Douglas Martin
Scott Grant
James R. Cordy
David B. Skillicorn
School of Computing
Kingston, Canada
2. Motivation
— The Personal Web
— Rapidly growing number of web services makes it
increasingly difficult to find and choose the right ones
— Need a quick and convenient
way to find alternatives
— Hand tagging impractical –
automation is needed!
3. Motivation
— Automation
— Similarity detection techniques offer solutions!
— Code clone detection from
software engineering research
can find similar code fragments –
why not similar services?
— Topic models from data mining
research can find text documents
with similar semantics –
why not similar services?
4. Web Service Similarity
— Web services are stored in
service registries, containing
WSDL service description files
— Could apply clone detection to
entire service descriptions
— But what we really want are
similar service operations
6. How about these?
<operation name="DrawRateChartCustom">
<input message="DrawRateChartCustomIn"/>
<output message="DrawRateChartCustomOut"/>
</operation>
<operation name="GetTopicBinaryChartCustom">
<input message="GetTopicBinaryChartCustomSoapIn"/>
<output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>
7. So what went wrong?
— At this point we thought maybe our idea wasn’t
going to work
— Maybe clone detection can’t help with
web service discovery?
— But why? What’s so special about WSDL?
11. Web Service Description
Language (WSDL)
— A WSDL service description has
3 main parts:
— a <portType> element where the
operations are declared;
— <message> elements
corresponding to inputs, outputs
and faults of the operations;
— and a <types> element
containing an XML Schema that
defines the data and structure
types used in the messages
15. Web Service Description
Language (WSDL)
— WSDL service description files contain descriptions
of the operations that a web service has to offer
— But the pieces of each operation’s own description
are scattered over different parts of the WSDL file
— Difficult to identify complete units to analyze and
compare
16. The Problem
— This poses a problem for analysis techniques:
— Operations cannot easily be compared for similarity
using clone detectors, because there are no
contiguous fragments to compare
— And they cannot be analyzed using data mining topic
models, because there are no separate complete
documents to generate a model from
17. Our Solution
— Our solution is to contextualize the original
<operation> elements, to create self-contained
operation descriptions
— We use source transformation to inline remote
information from the context into the elements
that reference or depend on them
— We call these contextualized WSDL operations
Web Service Cells, or WSCells
— The first example of a new kind of clone detection:
contextual clones
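The inlining step above can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the standard library's ElementTree rather than the source transformation the slides refer to, it handles WSDL 1.1 namespaces only, and the function names `contextualize` and `local` are hypothetical. It copies each `<operation>` from the `<portType>`, then attaches the referenced `<message>` parts and the top-level schema definition each part points to.

```python
import copy
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def local(qname):
    """Strip an optional namespace prefix such as 'tns:' from a reference."""
    return qname.split(":")[-1]


def contextualize(wsdl_text):
    """Return one self-contained copy of each <operation> in the portType."""
    root = ET.fromstring(wsdl_text)
    # Index <message> elements and top-level schema definitions by name.
    messages = {m.get("name"): m for m in root.iter(f"{{{WSDL_NS}}}message")}
    types = {}
    for schema in root.iter(f"{{{XSD_NS}}}schema"):
        for definition in schema:
            if definition.get("name"):
                types[definition.get("name")] = definition

    cells = []
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for op in port_type.findall(f"{{{WSDL_NS}}}operation"):
            cell = copy.deepcopy(op)  # leave the original document untouched
            for ref in cell:  # <input>, <output> and <fault> children
                msg = messages.get(local(ref.get("message", "")))
                if msg is None:
                    continue
                for part in msg:
                    part_copy = copy.deepcopy(part)
                    target = part.get("element") or part.get("type") or ""
                    definition = types.get(local(target))
                    if definition is not None:
                        # Inline the schema definition the part refers to.
                        part_copy.append(copy.deepcopy(definition))
                    ref.append(part_copy)
            cells.append(cell)
    return cells
```

Each returned element can be serialized with `ET.tostring` to give a contiguous fragment for a clone detector, or a standalone document for a topic model. The sketch inlines only one level of type references; nested types would need a recursive pass.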
20. An Experiment
— We have run an experiment to investigate the
difference between clone detection on WSCells
and original raw operations
— Two sets of WSDL service description files:
1,100 operations and 7,500 operations
— Compared NICAD clone detector results for each
set at various near-miss difference thresholds
0% = exact clone,
10% = 1 line in 10 different, and so on
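To make the threshold concrete, here is a minimal sketch of a line-based difference measure. It is not NICAD's implementation: it assumes both fragments are already normalized to one element per line, it uses Python's `difflib` (whose matching is heuristic rather than a guaranteed longest common subsequence), and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def line_difference(a_lines, b_lines):
    """Fraction of lines in the longer fragment with no match in the other."""
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(a_lines), len(b_lines))
    return 0.0 if longest == 0 else (longest - matched) / longest


def is_near_miss_clone(a_lines, b_lines, threshold):
    """True when the fragments differ by at most `threshold` (0.0 = exact)."""
    return line_difference(a_lines, b_lines) <= threshold
```

With two 10-line fragments that differ in a single line, `line_difference` gives 0.1, so the pair is accepted at the 10% threshold and rejected at 0%.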
23. Clone Detection for
Web Services
— Contextual clone detection with WSCells works!
— Not only finds similar web service operations,
but uncovers similar operations we could not find
in any other way
<operation name="DrawRateChartCustom">
<input message="DrawRateChartCustomIn"/>
<output message="DrawRateChartCustomOut"/>
</operation>
<operation name="GetRealChartCustom">
<input message="GetRealChartCustomSoapIn"/>
<output message="GetRealChartCustomSoapOut"/>
</operation>
<operation name="GetLastSaleChartCustom">
<input message="GetLastSaleChartCustomSoapIn"/>
<output message="GetLastSaleChartCustomSoapOut"/>
</operation>
<operation name="DrawYieldCurveCustom">
<input message="DrawYieldCurveCustomIn"/>
<output message="DrawYieldCurveCustomOut"/>
</operation>
<operation name="GetTopicChartCustom">
<input message="GetTopicChartCustomSoapIn" />
<output message="GetTopicChartCustomSoapOut" />
</operation>
<operation name="GetTopicBinaryChartCustom">
<input message="GetTopicBinaryChartCustomSoapIn"/>
<output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>
24. Semantic Analysis of
Web Services
— Contextualized WSCells also make it possible to use
data mining topic models to do semantic analysis
of web services
— Because they provide self-contained documents of
significant size
— Might topic models provide a different view
of web service similarity?
25. Latent Dirichlet Allocation
— Latent Dirichlet Allocation (LDA):
— A statistical model to uncover latent topics
— Identifies the correlation between documents in
terms of shared latent topics (sets of tokens)
— Accepts a set of documents (e.g., source files) as
input, returns probability distributions over inferred
topics (a topic model) as output
— Each document has some probability of being related
to topic 1, another probability for topic 2, and so on
— Similar documents should be related to similar topics
26. Latent Dirichlet Allocation
— Documents are represented in the model in terms
of probability distributions over topics
— Similarity between documents is found using the
Hellinger Distance
— A measure of how much agreement there is between
the shared topics of two documents
— Almost identical documents have a small Hellinger
Distance since they will be related to the same topics
— In terms of web services, small Hellinger Distances
indicate highly related operations
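For two discrete topic distributions P and Q, the Hellinger distance is the Euclidean distance between their element-wise square roots, divided by the square root of 2, which gives a value between 0 (identical) and 1 (no shared topics). A minimal sketch, assuming each document's topic distribution is a plain list of probabilities over the same topics:

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)
```

Two operations with the same topic mix get distance 0, while operations whose probability mass falls on entirely different topics get distance 1.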
27. Evaluating WSCells
— To evaluate the use of WSCells with LDA, we:
— Generate an LDA model for the original <operation>
elements, and another for the contextualized WSCells
— Explore the Global and Local Similarity between each
pair of operations in the models
— Global Similarity: an overall view of the most closely
related web service operations in the service set
— Local Similarity: a per-operation view of the other
most related web service operations for each
operation
28. Global Similarity
— We look at Global Similarity using a visualization
called Bluevis
— Bluevis shows the global conceptual structure of a
system by highlighting similar operations using an
illuminated line from left-to-right
— Plot some top fraction of similar operations
(top 25,000 in our examples)
— Use a consistently ordered list of web service
operations for the LDA model to view the differences
— If a display is noisy, it is often an indication that the
model is not identifying meaningful data
30. Global Similarity
— For original raw operations:
— Bluevis highlights the LDA
most similar operations
— Some clear structure
— However, most of this is
due to shared keywords,
like get and SOAP
— This uncontextualized
model has very little value
32. Global Similarity
— For contextualized WSCells:
— A clearer semantic
structure, less noise overall
— Operation similarity
becomes meaningful
— Services with semantic
similarity discovered
— E.g., Operations with
similar parameters or
faults, such as those that
manipulate holiday dates
or financial rates
33. Local Similarity
— We can also examine the local similarity for each
individual operation
— Identify the complete ordered list of similarity scores
for an operation in the data set
— Using the top similarity scores, evaluate how
meaningful the data is from a user's perspective
— For example, how can I find the most similar web
service operations to the one I am using now?
— We use a tool called POCO (Pairwise Observation of
Concepts) to examine the most similar operations
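The per-operation ranking described above can be sketched as follows. This is not POCO itself: the function `most_similar` and the topic distributions in `topic_mix` are hypothetical, made-up values chosen only to show the mechanics, and the ranking assumes Hellinger distance over per-operation topic distributions.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0
    )


def most_similar(query, topic_mix, top_n=3):
    """Rank every other operation by Hellinger distance to `query`."""
    scores = [
        (hellinger(topic_mix[query], mix), name)
        for name, mix in topic_mix.items()
        if name != query
    ]
    return [name for _, name in sorted(scores)[:top_n]]


# Hypothetical topic distributions (illustrative values, not experiment data).
topic_mix = {
    "GetWeatherReport": [0.8, 0.1, 0.1],
    "GetWeather": [0.7, 0.2, 0.1],
    "GetTRLIBOR": [0.1, 0.8, 0.1],
    "GetIndices": [0.1, 0.3, 0.6],
}
```

Calling `most_similar("GetWeatherReport", topic_mix, top_n=1)` returns the operation whose topic mix is closest to the query, which is how a user would ask for alternatives to the operation they are using now.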
35. Local Similarity
Operation                 Most similar WSCell            Most similar original raw WSDL operation
ListFinancials            GetFinancialServicesFromList   LanguagesList
ExportShipsAndCategories  ExportIteneraryAndSteps        Search
GetIssueData              GetFlightData                  word_cloud
GetWeatherReport          GetWeather                     GetIndices
GetAIDIBOR                GetTRLIBOR                     GetCarriers
searchByIdentifier        searchByNameAndAddress         GetLastSecurityHeadlines
ToolsAndHardwareBox       KitchenAndHousewareBox         ListRenditions
GetReservations           GetRoomAvailabilityForDay      GetSOFIBOR
GetOtherProductInfo       NextOtherProductPortion        GetParkingInfo
GetAllSplitsByExchange    GetAllCashDividendsByExchange  GetTeamLoyalties2
36. Summary
— Very-high-level domain-specific languages such as
WSDL make poor targets for similarity analysis
using clone detection and topic models
— Lack of local context prevents meaningful results
— Contextualizing using WSCells exposes both cloning
and semantic relationships between web operations
— Clone detection of WSCells identifies similar web
service operations
— Topic models of WSCells expose both global
system-wide semantic relationships and local
individual relationships between operations
37. Current & Future
— Continue analysis of web services for the Personal
Web using our results
— Apply contextualization to similarity analysis of
other modeling and specification languages
(currently Simulink, Stateflow and UML sequence
diagrams)
— Experiment with effect of contextualization on
clone and topic model analysis of
traditional languages such as Java and C
(“contextual clones”)
38. When is a Clone not a Clone?
Questions?