130919 jim cordy - when is a clone not a clone

When is a Clone not a Clone?
(and vice-versa)

Contextualized Analysis of Web Services
Douglas Martin
Scott Grant

James R. Cordy
David B. Skillicorn

School of Computing

Kingston, Canada

Motivation
—  The Personal Web
—  Rapidly growing number of web services makes it
increasingly difficult to find and choose the right ones

—  Need a quick and convenient
way to find alternatives

—  Hand tagging impractical –
automation is needed!

Motivation
—  Automation
—  Similarity detection techniques offer solutions!
—  Code clone detection from
software engineering research
can find similar code fragments –
why not similar services?

—  Topic models from data mining
research can find text documents
with similar semantics –
why not similar services?

Web Service Similarity
—  Web services are stored in
service registries, containing
WSDL service description files

—  Could apply clone detection to
entire service descriptions

—  But what we really want are
similar service operations

Let’s try it!
<operation name="GetStock">
<input message="tns:GetStockRequest" />
<output message="tns:GetStockResponse" />
</operation>
<complexType name="Stock">
<sequence>
<element name="Supplier" type="xsd:string"/>
<element name="Warehouse" type="xsd:string"/>
<element name="OnHand" type="xsd:string"/>
<element name="OnOrder" type="xsd:string"/>
<element name="Demand" type="xsd:string"/>
</sequence>
</complexType>

</operation>
<sequence>
<element name="date" type="xsd:string"/>
<element name="open" type="xsd:float"/>
<element name="high" type="xsd:float"/>
<element name="low" type="xsd:float"/>
<element name="close" type="xsd:float"/>
<element name="volume" type="xsd:float"/>
</sequence>
</complexType>

How about these?
<operation name="DrawRateChartCustom">
<input message="DrawRateChartCustomIn"/>
<output message="DrawRateChartCustomOut"/>
</operation>

<operation name="GetTopicBinaryChartCustom">
<input message="GetTopicBinaryChartCustomSoapIn"/>
<output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>

So what went wrong?
—  At this point we thought maybe our idea wasn’t
going to work

—  Maybe clone detection can’t help with
web service discovery?

—  But why? What’s so special about WSDL?

Web Service Description
Language (WSDL)
—  A WSDL service description has
3 main parts:
—  a <portType> element where the
operations are declared;
—  <message> elements
corresponding to inputs, outputs
and faults of the operations;
—  and a <types> element
containing an XML Schema that
defines the data and structure
types used in the messages

Web Service Description
Language (WSDL)
—  This simple example service
has two operations:
—  ReserveRoom
—  GetAvailableRooms

Web Service Description
Language (WSDL)
—  WSDL service description files contain descriptions
of the operations that a web service has to offer

—  But the pieces of each operation’s own description
are scattered over different parts of the WSDL file

—  Difficult to identify complete units to analyze and
compare

The Problem
—  This poses a problem for analysis techniques:
—  Operations cannot easily be compared for similarity
using clone detectors, because there are no
contiguous fragments to compare

—  And they cannot be analyzed using data mining topic
models, because there are no separate complete
documents to generate a model from

Our Solution
—  Our solution is to contextualize the original
<operation> elements, to create self-contained
operation descriptions
—  We use source transformation to inline remote
information from the context into the elements
that reference or depend on it

—  We call these contextualized WSDL operations
Web Service Cells, or WSCells
—  The first example of a new kind of clone detection:
contextual clones
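
The inlining step can be illustrated with a minimal Python sketch. This is not the authors' source transformation: it assumes a simplified, namespace-free WSDL fragment (the `WSDL` string, `local`, and `contextualize` are hypothetical names invented for illustration), and it only inlines one level of message parts and named complex types.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free WSDL fragment (real WSDL files use XML namespaces).
WSDL = """
<definitions>
  <types>
    <schema>
      <complexType name="Stock">
        <sequence>
          <element name="Supplier" type="xsd:string"/>
          <element name="OnHand" type="xsd:string"/>
        </sequence>
      </complexType>
    </schema>
  </types>
  <message name="GetStockRequest">
    <part name="item" type="xsd:string"/>
  </message>
  <message name="GetStockResponse">
    <part name="body" type="tns:Stock"/>
  </message>
  <portType name="StockPort">
    <operation name="GetStock">
      <input message="tns:GetStockRequest"/>
      <output message="tns:GetStockResponse"/>
    </operation>
  </portType>
</definitions>
"""

def local(qname):
    """Drop a prefix such as 'tns:' from a qualified name."""
    return qname.split(":")[-1]

def contextualize(root):
    """Return self-contained copies of every <operation> in the tree."""
    messages = {m.get("name"): m for m in root.iter("message")}
    types = {t.get("name"): t for t in root.iter("complexType")}
    cells = []
    for op in root.iter("operation"):
        cell = copy.deepcopy(op)
        for ref in list(cell):  # <input>, <output>, <fault>
            msg = messages.get(local(ref.get("message", "")))
            if msg is None:
                continue
            for part in msg:
                part = copy.deepcopy(part)
                typ = types.get(local(part.get("type", "")))
                if typ is not None:
                    # Inline the referenced type definition under the part.
                    part.append(copy.deepcopy(typ))
                ref.append(part)
        cells.append(cell)
    return cells

root = ET.fromstring(WSDL)
cells = contextualize(root)
print(ET.tostring(cells[0], encoding="unicode"))
```

Each resulting element carries its messages and types with it, so it can be handed to a clone detector or a topic model as one contiguous unit. Nested type references are not followed in this sketch.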

Contextualizing WSDL
Operations

An Experiment
—  We have run an experiment to investigate the
difference between clone detection on WSCells
and original raw operations

—  Two sets of WSDL service description files:
1,100 operations and 7,500 operations

—  Compared NICAD clone detector results for each
set at various near-miss difference thresholds
0% = exact clone,
10% = 1 line in 10 different, and so on
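
The meaning of a near-miss difference threshold can be sketched in Python. This is only an approximation of the idea using the standard library's `difflib`, not NICAD's actual comparison algorithm; `line_difference` and `is_clone` are hypothetical helper names, and the two fragments are made up for illustration.

```python
from difflib import SequenceMatcher

def line_difference(a, b):
    """Fraction of lines that differ between two fragments (0.0 = identical)."""
    a_lines = [ln.strip() for ln in a.strip().splitlines()]
    b_lines = [ln.strip() for ln in b.strip().splitlines()]
    matcher = SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    # Count lines that appear, in order, in both fragments.
    same = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(a_lines), len(b_lines))
    return 1.0 - same / longest if longest else 0.0

def is_clone(a, b, threshold):
    """True when the fragments differ by no more than the threshold."""
    return line_difference(a, b) <= threshold

# Two made-up 10-line fragments that differ in exactly one line.
frag_a = "\n".join(f'<element name="f{i}" type="xsd:string"/>' for i in range(10))
frag_b = frag_a.replace('name="f5"', 'name="other"')

print(is_clone(frag_a, frag_b, 0.0))  # exact clones only
print(is_clone(frag_a, frag_b, 0.1))  # allows 1 line in 10 to differ
```

With one line in ten changed, the pair is rejected at the 0% threshold and accepted at 10%.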

An Experiment
—  Number of clones decreases with WSCells

Difference   Clone Pairs in Set 1     Clone Pairs in Set 2
Threshold    Originals   WSCells      Originals   WSCells
0.0          852         705          1434        1066
0.1          852         734          1434        1228
0.2          879         775          1438        1637
0.3          884         813          1469        1637

—  Reduction in false positives

An Experiment
—  Number of clone classes can increase with WSCells

Difference   Clone Classes in Set 1   Clone Classes in Set 2
Threshold    Originals   WSCells      Originals   WSCells
0.0          169         187          587         433
0.1          169         139          587         499
0.2          172         142          589         631
0.3          171         136          591         631

—  Splits by deeper differences –
more precision

Clone Detection for
Web Services
—  Contextual clone detection with WSCells works!
—  Not only finds similar web service operations,
but uncovers similar operations we could not find
in any other way
<operation name="DrawRateChartCustom">
<input message="DrawRateChartCustomIn"/>
<output message="DrawRateChartCustomOut"/>
</operation>
<operation name="GetRealChartCustom">
<input message="GetRealChartCustomSoapIn"/>
<output message="GetRealChartCustomSoapOut"/>
</operation>
<operation name="GetLastSaleChartCustom">
<input message="GetLastSaleChartCustomSoapIn"/>
<output message="GetLastSaleChartCustomSoapOut"/>
</operation>
<operation name="DrawYieldCurveCustom">
<input message="DrawYieldCurveCustomIn"/>
<output message="DrawYieldCurveCustomOut"/>
</operation>
<operation name="GetTopicChartCustom">
<input message="GetTopicChartCustomSoapIn" />
<output message="GetTopicChartCustomSoapOut" />
</operation>
<operation name="GetTopicBinaryChartCustom">
<input message="GetTopicBinaryChartCustomSoapIn"/>
<output message="GetTopicBinaryChartCustomSoapOut"/>
</operation>

Semantic Analysis of
Web Services
—  Contextualized WSCells also make it possible to use
data mining topic models to do semantic analysis
of web services
—  Because they provide self-contained documents of
significant size

—  Might topic models provide a different view
of web service similarity?

Latent Dirichlet Allocation
—  Latent Dirichlet Allocation (LDA):
—  A statistical model to uncover latent topics
—  Identifies the correlation between documents in
terms of shared latent topics (sets of tokens)

—  Accepts a set of documents (e.g., source files) as
input, returns probability distributions over inferred
topics (a topic model) as output
—  Each document has some probability of being related
to topic 1, another probability for topic 2, and so on
—  Similar documents should be related to similar topics

Latent Dirichlet Allocation
—  Documents are represented in the model in terms
of probability distributions over topics

—  Similarity between documents is found using the
Hellinger Distance
—  A measure of how much agreement there is between
the shared topics of two documents
—  Almost identical documents have a small Hellinger
Distance since they will be related to the same topics
—  In terms of web services, small Hellinger Distances
indicate highly related operations
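
For reference, the Hellinger distance between two discrete distributions P and Q is (1/√2) · √Σ(√pᵢ − √qᵢ)², which ranges from 0 (identical) to 1 (no shared topics). A minimal Python sketch follows; the function name `hellinger` and the three-topic distributions are made up for illustration and do not come from the experiment.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)

# Hypothetical per-document topic distributions over three topics.
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.65, 0.25, 0.10]   # nearly the same topic mix as doc_a
doc_c = [0.05, 0.05, 0.90]   # dominated by a different topic

print(hellinger(doc_a, doc_b))  # small: closely related documents
print(hellinger(doc_a, doc_c))  # large: mostly unrelated documents
```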

Evaluating WSCells
—  To evaluate the use of WSCells with LDA, we:
—  Generate an LDA model for the original <operation>
elements, and another for the contextualized WSCells
—  Explore the Global and Local Similarity between each
pair of operations in the models

—  Global Similarity: an overall view of the most closely
related web service operations in the service set

—  Local Similarity: a per-operation view of the other
most related web service operations for each
operation

Global Similarity
—  We look at Global Similarity using a visualization
called Bluevis

—  Bluevis shows the global conceptual structure of a
system by highlighting similar operations using an
illuminated line from left-to-right
—  Plot some top fraction of similar operations
(top 25,000 in our examples)
—  Use a consistently ordered list of web service
operations for the LDA model to view the differences
—  If a display is noisy, it is often an indication that the
model is not identifying meaningful data

Global Similarity
—  For original raw operations:
—  Bluevis highlights the LDA
most similar operations
—  Some clear structure
—  However, most of this is
due to shared keywords,
like get and SOAP

—  This uncontextualized
model has very little value

Global Similarity
—  For contextualized WSCells:
—  A clearer semantic
structure, less noise overall
—  Operation similarity
becomes meaningful

—  Services with semantic
similarity discovered
—  E.g., operations with
similar parameters or
faults, such as those that
manipulate holiday dates
or financial rates

Local Similarity
—  We can also examine the local similarity for each
individual operation
—  Identify the complete ordered list of similarity scores
for an operation in the data set

—  Using the top similarity scores, evaluate how
meaningful the data is from a user's perspective
—  For example, how can I find the most similar web
service operations to the one I am using now?

—  We use a tool called POCO (Pairwise Observation of
Concepts) to examine the most similar operations
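
The per-operation ranking described above can be sketched in Python. This is not POCO itself: `most_similar` is a hypothetical helper, and the operation names and topic distributions in `topic_mix` are invented for illustration rather than taken from the LDA models in the experiment.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)

def most_similar(target, topic_mix, top_n=3):
    """Rank every other operation by distance to `target`, closest first."""
    scores = [(hellinger(topic_mix[target], dist), name)
              for name, dist in topic_mix.items() if name != target]
    return [(name, round(d, 3)) for d, name in sorted(scores)[:top_n]]

# Hypothetical operations with made-up distributions over three topics.
topic_mix = {
    "OpWeatherReport": [0.80, 0.15, 0.05],
    "OpWeatherToday":  [0.75, 0.20, 0.05],
    "OpStockIndices":  [0.05, 0.10, 0.85],
    "OpParkingInfo":   [0.30, 0.60, 0.10],
}

ranked = most_similar("OpWeatherReport", topic_mix)
print(ranked)
```

The first entry of the returned list answers the user's question directly: it is the operation whose topic mix is closest to the one currently in use.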

Local Similarity
Operation                  Most similar WSCell              Most similar original raw WSDL operation
ListFinancials             GetFinancialServicesFromList     LanguagesList
ExportShipsAndCategories   ExportIteneraryAndSteps          Search
GetIssueData               GetFlightData                    word_cloud
GetWeatherReport           GetWeather                       GetIndices
GetAIDIBOR                 GetTRLIBOR                       GetCarriers
searchByIdentifier         searchByNameAndAddress           GetLastSecurityHeadlines
ToolsAndHardwareBox        KitchenAndHousewareBox           ListRenditions
GetReservations            GetRoomAvailabilityForDay        GetSOFIBOR
GetOtherProductInfo        NextOtherProductPortion          GetParkingInfo
GetAllSplitsByExchange     GetAllCashDividendsByExchange    GetTeamLoyalties2

Summary
—  Very-high-level domain-specific languages such as
WSDL make poor targets for similarity analysis
using clone detection and topic models
—  Lack of local context prevents meaningful results

—  Contextualizing using WSCells exposes both cloning
and semantic relationships between web operations
—  Clone detection of WSCells identifies similar web
service operations
—  Topic models of WSCells expose both global
system-wide semantic relationships and local
individual relationships between operations

Current & Future
—  Continue analysis of web services for the Personal
Web using our results

—  Apply contextualization to similarity analysis of
other modeling and specification languages
(currently Simulink, Stateflow and UML sequence
diagrams)

—  Experiment with effect of contextualization on
clone and topic model analysis of
traditional languages such as Java and C
(“contextual clones”)

When is a Clone not a Clone?
(and vice-versa)

Contextualized Analysis of Web Services
Douglas Martin
Scott Grant

James R. Cordy
David B. Skillicorn

Questions?
