Comprehending Web Applications by a Clustering Based Approach

Anna Rita Fasolino
G. A. Di Lucca, F. Pace, P. Tramontana, U. De Carlini

Dipartimento di Informatica e Sistemistica
University of Naples Federico II, Italy

IWPC 2002, Paris, France

Web Applications (WA):
problems and open issues

• The pressing market demand for web applications
– WAs are developed in a very short time, with no regard for
software engineering principles

• The continuously changing needs of the evolving
application domain
– WAs are frequently and rapidly modified with ad hoc
approaches, resulting in low-quality software, a disordered
architecture, and inadequate, incomplete documentation

• The growing complexity of WA technologies
– From static web sites, to sites providing client-side
interaction, to web applications with dynamic content

Managing existing Web Applications

• Because of the large number of technologies employed,
understanding, maintaining and evolving a dynamic
application is a complex task …

• Reverse Engineering methods and techniques have
been proposed for…
– Analyzing the functional behavior of an existing WA
– Reconstructing the architecture of the WA
– Capturing and reusing the design of the application
– Modeling static and dynamic views by UML diagrams (use
cases, sequence and class diagrams)
– …

Current approaches for Web Application
reverse engineering

• Based on graphical representations of the web
application
– Valuable approach for analyzing relatively small
applications but…
– Less useful for coping with large scale applications.

• A possible solution:
– Factoring the graphical representation into smaller cohesive
parts using Clustering techniques.
• An open issue:
– Adapting traditional Clustering approaches to the WA area.

Applying traditional Clustering to the WAs

• Approaches based on file name analysis
– Ineffective with applications whose source code has been
automatically generated or written without any coherent file
name convention

• Approaches based on directory file analysis
– Grounded on the hypothesis that the directory organization
mirrors the functional one, but …

• Pattern-driven clustering approaches
– Requiring the identification of common structures in web
applications

Applying traditional Clustering to the WAs

• Approaches based on dependence or dominance
graphs
– They require acyclic interconnection graphs, so they are
inapplicable to WAs, where backward links to home/index pages create cycles

• Approaches exploiting quality measures of a
clustering
– The optimal clustering is found by searching a space of
possible graph partitions. What kind of WA graph should
be considered? And what quality measure?
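
To illustrate why acyclicity fails for WAs, the sketch below runs a depth-first cycle check on two tiny page graphs. The page names and both graphs are hypothetical, invented only for this example: in the first, every page links back to the home page; in the second, those backward links are absent.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> successor list) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for succ in graph.get(node, []):
            state = color.get(succ, WHITE)
            if state == GREY:             # edge back to a node still on the stack
                return True
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Hypothetical site: every page links back to the home page.
site = {
    "home.html": ["courses.asp", "login.asp"],
    "courses.asp": ["home.html"],
    "login.asp": ["home.html"],
}
# The same pages without the backward links.
tree = {"home.html": ["courses.asp", "login.asp"], "courses.asp": [], "login.asp": []}

print(has_cycle(site), has_cycle(tree))
```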

A new method for clustering WAs

• Goal: Grouping software components of the WA into
meaningful (highly cohesive) and independent
(loosely coupled) clusters.

• Three questions have been addressed:
– Definition of a model of a WA representing relevant
components and relationships.
– Definition of a metric for expressing the degree of coupling
of interconnected components.
– Selection of a clustering algorithm.

1. The conceptual model of a WA
[Figure: UML class diagram of the conceptual model. It relates Web Page,
Client Page, Server Page, Client Page with Frame, Client Module and Web
Object through link, submit, redirect, build, include and load in frame
associations, with multiplicities 0..n or 0..1.]

• Components: Client pages, server pages, client page with frames,
client modules, web objects.
• Relationships: Link, submit, redirect, build, load_in_frame, include.
• Each WA will be modeled by a WAG (WA Connection Graph).
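
As a minimal sketch of how a WAG could be held in memory, the following uses the component and relationship kinds listed above. The class names (`WAG`, `ComponentType`, `RelationType`), the method names and the example pages are assumptions made for illustration; they are not taken from the WARE tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class ComponentType(Enum):
    CLIENT_PAGE = "client page"
    SERVER_PAGE = "server page"
    CLIENT_PAGE_WITH_FRAME = "client page with frame"
    CLIENT_MODULE = "client module"
    WEB_OBJECT = "web object"


class RelationType(Enum):
    LINK = "link"
    SUBMIT = "submit"
    REDIRECT = "redirect"
    BUILD = "build"
    LOAD_IN_FRAME = "load_in_frame"
    INCLUDE = "include"


@dataclass
class WAG:
    """Hypothetical in-memory WA Connection Graph."""
    nodes: dict = field(default_factory=dict)   # component name -> ComponentType
    edges: list = field(default_factory=list)   # (source, target, RelationType)

    def add_component(self, name, ctype):
        self.nodes[name] = ctype

    def add_relationship(self, source, target, rtype):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be registered components")
        self.edges.append((source, target, rtype))

    def fan_out(self, name):
        return sum(1 for s, _, _ in self.edges if s == name)

    def fan_in(self, name):
        return sum(1 for _, t, _ in self.edges if t == name)


# Invented example: a login form submits to a server page that builds a client page.
wag = WAG()
wag.add_component("login.html", ComponentType.CLIENT_PAGE)
wag.add_component("check.asp", ComponentType.SERVER_PAGE)
wag.add_component("welcome", ComponentType.CLIENT_PAGE)
wag.add_relationship("login.html", "check.asp", RelationType.SUBMIT)
wag.add_relationship("check.asp", "welcome", RelationType.BUILD)
```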

2. A measure of coupling between WA
components

• Heuristic approach:
– Coupling between two nodes in the WAG will depend both
on typology and topology of the edges.
– Two different weighting strategies.

• Typology:
– Different weights are assigned to build, link, redirect,
and submit edges.
• Build: the greatest weight.
• Redirect: greater weight than Link edges.
• Submit: greater weight than Link and Redirect edges.
• $w_{RL} = w_R / w_L$ and $w_{SL} = w_S / w_L$, with $1 < w_{RL} < w_{SL}$.
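
The ratios above fix the redirect and submit weights once a link weight is chosen. A small sketch, in which the function name and the numeric values are purely illustrative; the build weight is left out because the slide only states that it is the greatest:

```python
def edge_type_weights(w_link, w_rl, w_sl):
    """Derive w_R and w_S from w_L and the ratios w_RL, w_SL (requires 1 < w_RL < w_SL)."""
    if not (1 < w_rl < w_sl):
        raise ValueError("expected 1 < w_RL < w_SL")
    return {
        "link": w_link,
        "redirect": w_rl * w_link,   # w_R = w_RL * w_L
        "submit": w_sl * w_link,     # w_S = w_SL * w_L
    }


# Hypothetical values, chosen only to satisfy the constraint.
weights = edge_type_weights(w_link=1.0, w_rl=2.0, w_sl=3.0)
print(weights)
```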

A measure of coupling between WA
components
• Topology:
– The degree of coupling of two nodes A and B is considered
stronger when A uniquely reaches B, than when A reaches
both B and other nodes.
– A new weighting strategy:
• Edges from a node are weighted ($w_{OUT}$) according to the fan-out
of the node (the greater the fan-out, the lower the weight).
• Edges towards a node are weighted ($w_{IN}$) according to the
fan-in of the node (the greater the fan-in, the lower the weight).
• The coupling measure:
$C_{A,B} = C_{AB} + C_{BA}$
where $C_{AB}$ depends on the weighted edges from A to B and
$C_{BA}$ on the weighted edges from B to A.
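
The slides do not give the exact weighting formula, so the sketch below is only one possible instance of the idea. It assumes that each edge contributes its type weight divided by the fan-out of its source and the fan-in of its target; that formula, the function name and the sample graph are all assumptions.

```python
from collections import Counter


def coupling(edges, type_weight, a, b):
    """Symmetric coupling C_{A,B} = C_AB + C_BA under the ASSUMED weighting.

    edges: list of (source, target, kind) tuples.
    ASSUMPTION: an edge contributes type_weight[kind] / (fan_out(source) * fan_in(target)).
    """
    fan_out = Counter(src for src, _, _ in edges)
    fan_in = Counter(dst for _, dst, _ in edges)

    def directed(src, dst):
        return sum(
            type_weight[kind] / (fan_out[s] * fan_in[d])
            for s, d, kind in edges
            if s == src and d == dst
        )

    return directed(a, b) + directed(b, a)


# Invented graph: A reaches both B and C, while D reaches only E.
edges = [("A", "B", "link"), ("A", "C", "link"), ("D", "E", "link")]
type_weight = {"link": 1.0}

print(coupling(edges, type_weight, "A", "B"))
print(coupling(edges, type_weight, "D", "E"))
```

Under this assumption the pair (D, E) is coupled more strongly than (A, B), which matches the stated intuition that coupling is stronger when a node uniquely reaches another.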

3. The clustering algorithm …

• Agglomerative hierarchical clustering algorithm:
– Iteratively gathers the graph nodes into new larger clusters,
starting from an initial clustering with each cluster including a
single WA component.

[Figure: the clustering hierarchy, from the INITIAL CLUSTERING up to the FINAL CLUSTERING]

Four clustering rules

• At each iteration, a new clustering is obtained by applying four
clustering rules
• Rule 1: the cluster containing a built client page will be merged with
the cluster containing the server page building the former page;
• Rule 2: if and only if all the pages referenced by the <frame> tags of a
client page with frame belong to the same cluster, the cluster including
the page with frame will be merged with the cluster including the
referenced pages;
• Rule 3: if and only if all the client pages (server pages) including the
same client module (server page) belong to the same cluster, the
cluster comprising those pages will be merged with the cluster
including the client module (server page);
• Rule 4: the pair of clusters whose coupling value is the maximum one
will be gathered into a new cluster.
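
Rules 2 and 3 share the same "if and only if all ... belong to the same cluster" precondition. A minimal sketch of that check for Rule 2, assuming a hypothetical `cluster_of` mapping from page name to cluster identifier (the names and data are invented):

```python
def rule2_target(cluster_of, framed_pages):
    """Return the single cluster holding all framed pages, or None if Rule 2 does not apply."""
    clusters = {cluster_of[page] for page in framed_pages}
    return clusters.pop() if len(clusters) == 1 else None


# Invented example: two framesets and the clusters of the pages they load.
cluster_of = {"menu.html": 1, "body.html": 1, "news.html": 2}

print(rule2_target(cluster_of, ["menu.html", "body.html"]))   # all in cluster 1
print(rule2_target(cluster_of, ["menu.html", "news.html"]))   # spread over two clusters
```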

The clustering algorithm in PDL
1. begin with n clusters, each containing one WA component;
2. define the $w_L$, $w_{RL}$ and $w_{SL}$ values;
3. for each cluster containing a built client page component, apply rule R1;
4. while (there is at least one pair of connected clusters)
   do
      for each cluster containing a client page with frame component,
         apply rule R2;
      for each cluster containing a client module or an included server page
         component, apply rule R3;
      for each cluster c, and for each x, compute
         $w^{x}_{OUT}(c)$ and $w^{x}_{IN}(c)$;
      for each pair of clusters A and B, compute the coupling $C_{A,B}$
         between them;
      apply rule R4;
   od
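
The loop can be sketched in Python under strong simplifications: only rule R4 is applied (R1 to R3 are omitted), and the coupling of two clusters is assumed to be the sum of the couplings of their member pairs, whereas the PDL recomputes the edge weights per cluster. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def agglomerate(nodes, pair_coupling):
    """Return the successive clusterings obtained by repeatedly applying rule R4.

    pair_coupling maps frozenset({a, b}) -> coupling between components a and b.
    ASSUMPTION: cluster coupling = sum of member-pair couplings (a simplification).
    """
    clusters = [frozenset([n]) for n in nodes]
    hierarchy = [list(clusters)]

    def cluster_coupling(c1, c2):
        return sum(pair_coupling.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while True:
        best = max(combinations(clusters, 2),
                   key=lambda pair: cluster_coupling(*pair), default=None)
        if best is None or cluster_coupling(*best) <= 0:
            break                                   # no pair of connected clusters left
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
        hierarchy.append(list(clusters))
    return hierarchy


# Invented couplings: a-b strongly coupled, b-c weakly coupled, d isolated.
pair_coupling = {frozenset("ab"): 1.0, frozenset("bc"): 0.4}
hierarchy = agglomerate(["a", "b", "c", "d"], pair_coupling)
for level in hierarchy:
    print([sorted(c) for c in level])
```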

How can the hierarchy of clusterings be
pruned?

• An approach based on a quality metric.

• A good clustering will include clusters with high intra-
connectivity and low inter-connectivity.
– intra-connectivity expresses the degree of cohesion between
items of the same cluster.
• a weighted mean of a cluster's inner edges (values in [0, 1])
– inter-connectivity expresses the degree of coupling between
items of two different clusters.
• a weighted mean of the edges between clusters (values in [0, 1])
• The Quality of a Clustering metric (QoC):
QoC = IntraConnectivity - InterConnectivity (values in [-1, 1])
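
The slides do not spell out the weighted means, so the following is a sketch under explicit assumptions: edge weights lie in [0, 1], the intra-connectivity of a cluster with N nodes is its inner edge weight divided by N^2, and the inter-connectivity of two clusters is the weight of the edges between them divided by 2 * N_a * N_b. The function name and the sample graph are invented.

```python
from itertools import combinations


def qoc(clusters, edges):
    """QoC = IntraConnectivity - InterConnectivity under the ASSUMED normalizations.

    clusters: list of sets of node names; edges: dict (src, dst) -> weight in [0, 1].
    """
    def weight_between(c1, c2):
        return sum(w for (s, d), w in edges.items() if s in c1 and d in c2)

    # ASSUMPTION: intra-connectivity of a cluster = inner edge weight / N^2.
    intra = sum(weight_between(c, c) / len(c) ** 2 for c in clusters) / len(clusters)

    # ASSUMPTION: inter-connectivity of a pair = crossing edge weight / (2 * Na * Nb).
    pairs = list(combinations(clusters, 2))
    inter = 0.0
    if pairs:
        inter = sum(
            (weight_between(a, b) + weight_between(b, a)) / (2 * len(a) * len(b))
            for a, b in pairs
        ) / len(pairs)
    return intra - inter


# Invented graph: a<->b and c<->d are tightly connected, b->c is a weak edge.
edges = {("a", "b"): 1.0, ("b", "a"): 1.0, ("c", "d"): 1.0, ("d", "c"): 1.0, ("b", "c"): 0.2}
good = [{"a", "b"}, {"c", "d"}]
bad = [{"a", "c"}, {"b", "d"}]

print(qoc(good, edges), qoc(bad, edges))
```

With these assumptions the partition that keeps the tightly connected pairs together scores higher than the one that splits them.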

The choice of the hierarchy cut-height

• The QoC determines the quality of a clustering as the
trade-off between inter-connectivity and intra-
connectivity.
– It rewards the creation of highly cohesive clusters and
avoids excessive coupling between clusters.

• The clustering exhibiting the maximum QoC is a
candidate to implement the best partition of the WA
components.
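
Selecting the candidate then amounts to taking the level of the hierarchy with the highest QoC. A tiny sketch with made-up QoC values, one per level of a hypothetical hierarchy:

```python
# Hypothetical QoC values, one per clustering in the hierarchy (level 0 = initial clustering).
qoc_by_level = [-0.20, 0.15, 0.41, 0.33, 0.05]

# Index of the clustering with the maximum QoC.
best_level = max(range(len(qoc_by_level)), key=qoc_by_level.__getitem__)
print(best_level, qoc_by_level[best_level])
```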

Using clustering during WA
comprehension processes

• A structured approach:
– Statically analyze the WA and produce the WA Connection
Graph;
– Execute the clustering;
– Find the clustering Cmax with the maximum QoC value;
– Submit Cmax to a Concept Assignment Process (CAP).

• An integrated tool platform supporting the process:
– The WARE reverse engineering tool, for:
• Executing static analysis of the WA and producing the WAG
• Clustering automatically and searching for the best clustering
• Supporting the software engineer during the CAP

A validation experiment

• Goal of the experiment:
– Assessing the effectiveness of the clustering approach in
supporting WA comprehension.

• Experimental procedure:
– Several WAs were analyzed with the clustering technique.
– Software engineers (unfamiliar with the WAs) carried out the CAP,
and distinguished Valid from Invalid clusters.
– Valid clusters: those whose items actually implement one function.
– Invalid clusters were classified as spurious, divisible, or
incomplete:
• Spurious (items show a low degree of cohesion)
• Divisible (items can be split into smaller cohesive clusters)
• Incomplete (not including all the items needed to implement a function)

A case study

• A WA for managing undergraduate course activities:

– Provides course information, supports student registration for
the course or for exam sessions, teaching material download or
upload, and management of the teacher's course agenda.
– Implemented using HTML, ASP, VBScript and JavaScript
technologies, with an MS Access database.
– Composed of 107 source files arranged in one directory
(size of about 500 Kbytes).
– Development documentation included UML use case
diagrams and textual description of the WA functions.

Results from the Static analysis of the WA

• The inventory produced by the WARE tool:

Component type                # Detected
Server page                   76
Client static page            23
Client built page             75
Submit operation              49
Anchor (hypertextual link)    45
Redirect operation            8
Include operation             57
Load in frame operation       4

The WA Connection Graph

174 nodes
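
A quick consistency check, assuming that the nodes of the graph are exactly the page components counted in the inventory (an interpretation; the slide does not state it): the three page counts add up to the reported node count.

```python
# Page components from the WARE inventory.
inventory = {"Server page": 76, "Client static page": 23, "Client built page": 75}

node_count = sum(inventory.values())
print(node_count)  # 174
```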

The WA clustering proposed by the tool

50 cluster nodes

The clustering exhibiting the maximum QoC

Results from the CAP

• The source code of the proposed clusters was analyzed in order to
associate each cluster with a description of the implemented
function:
– 44 valid clusters
– 6 incomplete clusters
– 3 pairs of incomplete clusters could be merged into three new valid clusters
• Final result: 47 valid clusters
• Cluster descriptions were compared against the development
documentation:
– Each valid cluster matched with a use case!

• Effectiveness = # valid clusters / # proposed clusters = 47 / 50 = 94%
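
The arithmetic behind the effectiveness figure can be reproduced from the numbers on the slides (50 proposed clusters, 44 valid, 3 merged pairs); the variable names are illustrative only.

```python
proposed = 50          # clusters proposed by the tool
valid_initial = 44     # clusters judged valid as proposed
merged_pairs = 3       # pairs of clusters gathered into new clusters

valid_final = valid_initial + merged_pairs
effectiveness = valid_final / proposed
print(f"{valid_final} valid clusters, effectiveness = {effectiveness:.0%}")
```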

Lessons learned

• The problem of the cut-height with hierarchical clustering:
– The QoC metric suggests a candidate clustering to be analyzed. For a
given QoC, the maximum coupling value C° represents the cut-height.
• In order to improve the effectiveness of the approach, further
clusterings from the hierarchy can be taken into account.
– A clustering with a cut-height greater than C° is likely to include
smaller clusters.
– A clustering with a cut-height less than C° is likely to include larger
clusters.
• A heuristic approach:
– Use a cut-height greater than C° if the considered clustering includes
a large proportion of spurious clusters.
– Use a cut-height less than C° if the considered clustering includes
a large proportion of incomplete clusters.
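
The heuristic can be written as a small decision function. The slides do not quantify how many spurious or incomplete clusters are enough to move the cut, so the `threshold` fraction and the `step` size below are assumptions, as are the function name and the sample counts.

```python
def next_cut_height(c_star, n_spurious, n_incomplete, n_clusters, step, threshold=0.5):
    """Suggest a new cut-height around C°.

    ASSUMPTION: the cut is moved when more than `threshold` of the clusters
    are spurious (raise it) or incomplete (lower it); otherwise C° is kept.
    """
    if n_spurious / n_clusters > threshold:
        return c_star + step      # higher cut-height: smaller clusters
    if n_incomplete / n_clusters > threshold:
        return c_star - step      # lower cut-height: larger clusters
    return c_star


# Invented counts for a clustering of 50 clusters cut at C° = 0.5.
print(next_cut_height(0.5, n_spurious=30, n_incomplete=2, n_clusters=50, step=0.1))
print(next_cut_height(0.5, n_spurious=2, n_incomplete=30, n_clusters=50, step=0.1))
print(next_cut_height(0.5, n_spurious=3, n_incomplete=3, n_clusters=50, step=0.1))
```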

Conclusions

• The increasing diffusion and complexity of WAs oblige
researchers to search for effective reverse
engineering techniques for WAs.
• Clustering approaches can be used to reduce the apparent
size of a WA and carry out comprehension processes
more effectively.
• A clustering approach exploiting a coupling measure
of WA components that considers both the typology
and the topology of connections has been proposed
and preliminarily validated.

Future work

• A finer model of dependencies between WA
components will be investigated.

• The data flow between components will contribute to
the evaluation of the coupling.

• The clustering approach will be experimented with in the context
of WA remodularization and reengineering.
