The development of Web sites and applications is increasing dramatically to satisfy the market requests. The software industry is facing the new demand under the pressure of a very short time-to-market and an extremely high competition. As a result, Web sites and applications are usually developed without a disciplined process: Web applications are directly coded and no, or poor, documentation is produced to support the subsequent
maintenance and evolution activities, thus compromising the quality of the applications.
This paper presents a tool for reverse engineering Web applications. UML diagrams are used to model a set of views that depict several aspects of a Web application at different abstraction levels. The recovered diagrams ease the comprehension of the application and support its maintenance and evolution. A case study, carried out with the aim of assessing the effectiveness of the proposed tool, allowed relevant information about some real Web applications to be successfully recovered and modeled by UML diagrams.
Videogame localization & technology_ how to enhance the power of translation.pdf
WARE: a tool for the Reverse Engineering of Web Applications
1. WARE: a tool for the Reverse
Engineering of Web Applications
Anna Rita Fasolino
G. A. Di Lucca, F. Pace, P. Tramontana, U. De Carlini
Dipartimento di Informatica e Sistemistica
University of Naples Federico II, Italy
CSMR 2002, Budapest, Hungary
1
2. Web Applications (WA):
problems and open issues
• The pressing market demand of web applications
– WAs developed in very short time
• The continuously changing needs of the evolving
application domains
– WAs frequently and rapidly modified with ad hoc approaches
• The lack of method in producing and maintaining
Web applications
– low quality software, with disordered architecture, and
inadequate and incomplete documentation
CSMR 2002, Budapest, Hungary
2
3. Web applications: a recent classification
• Class 1: primarily static web sites
• Class 2: sites providing client-side interaction
– Based on an event model exploiting some script code
• Class 3: applications with dynamic content
– Pages are dynamically created on the fly, with the support of
various WWW rechnologies (CGI, JSP, PHP, Javascript,
XML, ODBC,…ASP,… )
Tilley and Huang, ICSE 2001
CSMR 2002, Budapest, Hungary
3
4. Dynamic web applications
• Due to the large number of employed technologies,
understanding, maintaining and evolving a Class 3
application is a complex task
• The need for specific Reverse Engineering
techniques and processes to recover:
– Static and dynamic aspects of the applications
– Suitable representation models
• The need for reverse engineering tools that support
the extraction and the abstraction of the needed
information from a WA
CSMR 2002, Budapest, Hungary
4
5. Reverse engineering Web Applications
• A reverse engineering process to recover the
following views:
– The static architecture of the WA
– The dynamic interactions between its components
– The final behavior offered
• Extended UML diagrams to represent these views
– Class diagrams to model the architecture
– Sequence and Collaboration diagrams to represent the
dynamic model
– Use case diagrams to describe the behavior of the WA
CSMR 2002, Budapest, Hungary
5
6. The Reverse Engineering process
STATIC Components and
Web Application Direct Relations
ANALYSIS
Source Code between them
BEHAVIORAL Behavioral
ANALYSIS Model
DYNAMIC Dynamic Interactions
WA in ANALYSIS between Components
Execution
CSMR 2002, Budapest, Hungary
6
7. Views obtainable by static analysis
• A coarse grained view:
– Web pages and Hypertextual links between pages :
– Pages are distinguished into server and client pages, static and
dynamic pages, simple and framed pages
• A finer grained view:
– inner page components and relationships
– input/output form, text box, anchors, scripts, applets, text, images,
multimedia objects (sounds, movies), …...
– Page components may be active components (e.g.,scripts or
applets)
– The relationships include: submit, build, redirect, include
• A UML class diagram representing both views
CSMR 2002, Budapest, Hungary
7
9. The WARE tool
• Designed to support the reverse engineering of a WA
– executes the static analysis of the WA source code
– populates a repository with the extracted information
– supports the user in abstracting the WA models
• Three main components:
– Interface Layer
– Service Layer
– Repository
CSMR 2002, Budapest, Hungary
9
10. Interface layer
Graphical Visualizer
WARE
GUI VCG Dotty RIGI
WARE
Architecture
Service Layer
Parsers
IRF Translator
HTML JS
WA ASP Query Executor
Source ….
Files VBS
UML Diagram
PHP Abstractor
Extractor Abstractor
IRF Diagrams
DBR Repository
CSMR 2002, Budapest, Hungary
10
11. The Service layer
• The Extractor component
– The Extractor parses the WA source code and produces an
Intermediate Representation Form (IRF)
– Several distinct parsers (for HTML, JavaScript, VBScript, ASP and
PHP technologies)
– New parsers can be added as the technology evolves
• The Abstractor component
– It implements abstraction operations necessary for reconstructing
the more abstract views of the WA
– It includes three main components
• A Translator that populates the relational database from the IRF
• A Query Executor that implements predefined queries over the
database
• The UML Diagram Abstractor that produces the class diagrams of a
WA at various degrees of detail and other relevant information
CSMR 2002, Budapest, Hungary
11
12. The conceptual schema of the Repository
Java Applet Media Flash Object Mail Address Other Object
Web Object redirect Server Class
Server Function
Client Function
Static Page Built Page redirect Server Script
Web Page
Session Variable
Client Script Parameter
Modify Tag Client Page
Server Page include Server Cookie
event
Anchor
Client Module HTML Tag
Interface Object
Download Hyperlink
Submits
Form
DB Interface Mail Interface Server File Interface Other Interface
Frameset
Generic File
Field
Select Button Textarea
Frame
source
CSMR 2002, Budapest, Hungary
12
13. The Interface Layer
• It provides the user interface for accessing the WARE
facilities:
– Reverse Engineering for parsing the WA and producing the
IRF
– Comprehension for executing comprehension-related
activities, such as exploring the inventory of the WA
components and their source code, computing the
reachability set of a component, creating clusters of related
components according to a given clustering criterion
– Query (Predefined query and Parametric query) for
activating the Abstractor’s functions, and graphically
visualizing the recovered information
CSMR 2002, Budapest, Hungary
13
14. Computation of
the Reachability
set of a WA
component
Figure 4.4 – b: WA Components reachable starting from a selected
component
Grouping the WA
components into
cohesive clusters
CSMR 2002, Budapest, Hungary
14
15. An experiment
• Carried out for assessing the adequacy of the tool functions in
supporting maintenance tasks
• The tool has been used to understand and redocument existing
WAs
• An example: A WA implementing a ‘Juridical Laboratory’
– 201 files, in 19 directories, sizing 4 Mbytes
– 55 Static Pages (55 HTML files in 10 directories)
– 19 Server Pages (19 ASP files in 4 directories)
– No design documentation available
• The WARE tool was preliminarly used to perform:
– The static analysis of the application, and for producing an
inventory of the WA components
CSMR 2002, Budapest, Hungary
15
16. First step: the static analysis
• Main results:
– The population of the repository automatically produced by
the tool
– The inventory of the WA components
– Graphical representations of the class diagram of the WA at
various degrees of detail
• Coarse-grained representations (including just the pages)
• Fine-grained representation (including inner page components)
CSMR 2002, Budapest, Hungary
16
17. The inventory of the WA components
Component type # Detected
Server page 19
Client Static page 55
Client Built page 14
External web page 3
Client script block 53
Function in Client script block 19
Form 11
Input/ output field 71
Submit Operation (POST method) 4
Submit Operation (GET method) 7
Anchor to files to be downloaded 111
Anchor to Hypertextual link 49
Data File 61
Server script block 76
Function in Server scripts 4
Database Interface Object 29
Mail Interface Object 3
Image file 65
Redirect operation in server 7
blocks
CSMR 2002, Budapest, Hungary
17
19. The second phase: formulating and validating
hypotheses about the WA behavior
• Driven by the graphical representations, notable sub-graphs
were looked for (isolated sub-graphs, sub-trees, strongly
connected components, …)
• A tentative hypothesis about the behavior implemented by each
sub-graph was formulated with the support of:
– the names of the components, their source code analysis, and by
tracing the application execution
• Hypotheses were validated by the source code execution
CSMR 2002, Budapest, Hungary
19
20. Results from the sub-graph analysis
• Four isolated sub-graphs
– Three small ones without static client pages
• Two of them implemented server-side functions for the web
administrator (management of the mailing list, and of the
registered users list)
• The third one resulted from an incorrectly made maintenance
operation
– A large one, rooted in the home page
• Nine user functions could be associated with nine notable sub-
graphs contained in it
CSMR 2002, Budapest, Hungary
20
21. The sub-graphs in the WA class diagram
Professional Yellow Pages
Mailing List
Judicial
Proceedings
Judicial
Case
News Studies
Home
Web Searching
Chat
Forum
About
CSMR 2002, Budapest, Hungary
21
22. Modeling the WA behavior
About
• A use case was defined and
associated with each notable
News
Home
sub-graph
Judicial Case Studies
• The scenarios describing the
use cases were defined with
User
Judicial Proceedings
the support of the tool (the
interactions between objects
Web Searching
Professionals Yellow Pages involved in the sub-graph
Registered User
WA Administrator were searched for and
Forum Mailing List analyzed)
Chat
CSMR 2002, Budapest, Hungary
22
23. Modeling the WA dynamic view
CSMR 2002, Budapest, Hungary
23
24. Conclusions
• The experiments we carried out showed that the
reverse engineering tool WARE can be used to
support the comprehension of Web Applications to be
maintained
• It supports the reconstruction of various UML
diagrams from undocumented applications:
– Static views are automatically produced by the tool
– Behavioral and dynamic views can be semi-automatically
obtained with the tool assistance
• A reverse engineering process based on the WARE
tool could be defined
CSMR 2002, Budapest, Hungary
24
25. Future work
• Specific clustering criteria may support a more
effective analysis of the class diagrams recovered by
the tool
• A clustering approach should take into account both
the topology of the graphs, and the typology of the
connections in the graph
• The automatic clustering of Was will be investigated
in the context of large size web applications
CSMR 2002, Budapest, Hungary
25