11. Example: a God Class centralizes too much intelligence in the system
Class uses directly more than a few attributes of other classes: ATFD > FEW
Functional complexity of the class is very high: WMC ≥ VERY HIGH
Class cohesion is low: TCC < ONE THIRD
The three conditions are combined with AND to detect a GodClass
Lanza, Marinescu 2006
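The detection rule on this slide can be sketched as a small predicate over per-class metrics. This is only an illustration: the slide names the thresholds symbolically (FEW, VERY HIGH, ONE THIRD), so the numeric values below are assumptions, and `ClassMetrics`, `is_god_class` and `find_god_classes` are hypothetical names, not part of any tool mentioned in the deck.

```python
from dataclasses import dataclass

# Assumed numeric values for the symbolic thresholds on the slide.
FEW = 5
WMC_VERY_HIGH = 47
ONE_THIRD = 1 / 3


@dataclass
class ClassMetrics:
    """Metric values for one class (hypothetical container)."""
    name: str
    atfd: int    # Access To Foreign Data
    wmc: int     # Weighted Method Count
    tcc: float   # Tight Class Cohesion, between 0.0 and 1.0


def is_god_class(m: ClassMetrics) -> bool:
    """Return True when all three conditions of the rule hold."""
    uses_foreign_data = m.atfd > FEW          # ATFD > FEW
    is_very_complex = m.wmc >= WMC_VERY_HIGH  # WMC >= VERY HIGH
    has_low_cohesion = m.tcc < ONE_THIRD      # TCC < ONE THIRD
    return uses_foreign_data and is_very_complex and has_low_cohesion


def find_god_classes(classes):
    """Return the names of all classes that satisfy the rule."""
    return [m.name for m in classes if is_god_class(m)]
```

Keeping each condition in its own named variable mirrors the slide, where every metric comparison carries a plain-language reading.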
12. Polymetric views show up to 5 metrics
Width metric
Height metric
Position metrics
Color metric
Lanza 2003
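As a rough sketch of the idea, the five metrics can be mapped onto the geometry and shade of one node. Everything here is an assumption made for illustration: the `Glyph` type, the `to_glyph` function, the metric names in the mapping, and the linear gray scale (darker for larger values) are not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One rectangle in a polymetric view (hypothetical type)."""
    label: str
    x: float
    y: float
    width: float
    height: float
    gray: int  # 0 is black, 255 is white


def to_glyph(entity: dict, mapping: dict, color_max: float) -> Glyph:
    """Map up to five metrics of an entity onto one glyph.

    mapping names the metric used for each visual property:
    "width", "height", "x", "y" and "color".
    """
    color_value = entity[mapping["color"]]
    # Clamp the ratio to [0, 1] so the shade always stays in range.
    ratio = min(max(color_value / color_max, 0.0), 1.0) if color_max > 0 else 0.0
    return Glyph(
        label=entity["name"],
        x=float(entity[mapping["x"]]),
        y=float(entity[mapping["y"]]),
        width=float(entity[mapping["width"]]),
        height=float(entity[mapping["height"]]),
        gray=round(255 * (1.0 - ratio)),
    )
```

A caller would pass, for example, a mapping such as `{"width": "NOA", "height": "NOM", "color": "LOC", "x": "order", "y": "depth"}`, where `order` and `depth` are hypothetical position metrics.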
38. Dynamix: ..., Instance, Activation
FAMIX: ..., Class, Method
39. Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
FAMIX: ..., Class, Method
40. Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
FAMIX: ..., Class, Method
Hismo: Class History, Method History, ...
41. Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
FAMIX: ..., Class, Method
Subversion: File History, File Version, ...
Hismo: Class History, Method History, ...
42. Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
CVS: File History, File Version, ...
FAMIX: ..., Class, Method
Subversion: File History, File Version, ...
Hismo: Class History, Method History, ...
43. BugsLife: ..., Activity, Bug
Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
CVS: File History, File Version, ...
FAMIX: ..., Class, Method
Subversion: File History, File Version, ...
Hismo: Class History, Method History, ...
44. BugsLife: ..., Activity, Bug
Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
CVS: File History, File Version, ...
FAMIX: ..., Class, Method
Dude: ..., Duplication
Subversion: File History, File Version, ...
Hismo: Class History, Method History, ...
45. BugsLife: ..., Activity, Bug
Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
CVS: File History, File Version, ...
FAMIX: ..., Class, Method
Dude: ..., Duplication
Subversion: File History, File Version, ...
Hismo: Class History, Method History, ...
...
46. FAMIX is a family of meta-models
BugsLife: ..., Activity, Bug
Dynamix: ..., Instance, Activation
ObjectFlow: ..., Alias
CVS: File History, File Version, ...
FAMIX Core: ..., Class, Method
Dude: ..., Duplication
Subversion: File History, File Version, ...
Hismo: Class History, Method History, ...
...
47. FM3 is the meta-meta-model
FM3.Element
    name: String
    fullName: String
Subclasses of FM3.Element: FM3.Package, FM3.Class, FM3.Property
FM3.Property
    derived: Boolean
    keyed: Boolean
    multivalued: Boolean
Associations: superclass, opposite, type, extensions
Kuhn, Verwaest 2008
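A minimal sketch of the diagram as Python dataclasses may help to read it. The attribute names come from the slide; the ends of the associations (`superclass` on classes, `type` and `opposite` on properties, `extensions` on packages) are assumptions, since the flattened diagram does not show them. The `FM3Class`, `FM3Property` and `FM3Package` class names, the `example_model` function, the `methods` and `NOM` properties and the `Metrics` package are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Element:
    """FM3.Element: every meta-model element has a name and a full name."""
    name: str
    full_name: str


@dataclass(eq=False)
class FM3Class(Element):
    # Assumed end of the "superclass" association.
    superclass: Optional["FM3Class"] = None


@dataclass(eq=False)
class FM3Property(Element):
    derived: bool = False
    keyed: bool = False
    multivalued: bool = False
    # Assumed ends of the "type" and "opposite" associations.
    type: Optional[FM3Class] = None
    opposite: Optional["FM3Property"] = None


@dataclass(eq=False)
class FM3Package(Element):
    # Assumed end of the "extensions" association.
    extensions: List[FM3Property] = field(default_factory=list)


def link_opposites(a: FM3Property, b: FM3Property) -> None:
    """Make two properties each other's opposite."""
    a.opposite = b
    b.opposite = a


def example_model():
    """Describe a tiny, hypothetical slice of a FAMIX-like meta-model."""
    famix_class = FM3Class("Class", "FAMIX.Class")
    famix_method = FM3Class("Method", "FAMIX.Method")
    number = FM3Class("Number", "Number")
    parent_class = FM3Property(
        "parentClass", "FAMIX.Method.parentClass", type=famix_class)
    methods = FM3Property(
        "methods", "FAMIX.Class.methods", multivalued=True, type=famix_method)
    link_opposites(parent_class, methods)
    nom = FM3Property("NOM", "FAMIX.Class.NOM", derived=True, type=number)
    package = FM3Package("Metrics", "Metrics", extensions=[nom])
    return package, parent_class, methods
```

The classes use `eq=False` so that elements compare by identity, which avoids endless recursion when two properties reference each other as opposites.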
48. MSE is the exchange format
(FAMIX.Class (id: 100)
    (name 'Server')
    (container (ref: 82))
    (isAbstract false)
    (isInterface false)
    (package (ref: 624))
    (stub false)
    (NOM 9)
    (WLOC 124))
(FAMIX.Method (id: 101)
    (name 'accept')
    (signature 'accept(Visitor v)')
    (parentClass (ref: 100))
    (accessControl 'public')
    (hasClassScope false)
    (stub false)
    (LOC 7)
    (CYCLO 3))
Kuhn, Verwaest 2008
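To make the structure of the excerpt concrete, here is a small reader for it, written from the excerpt alone rather than from the MSE specification. It is a sketch under stated assumptions: it only handles the constructs visible on the slide (nested parentheses, quoted strings, integers, booleans, `id:` and `ref:`), and the function names `tokenize`, `parse`, `to_element` and `load` are hypothetical.

```python
import re

# A token is a parenthesis, a single-quoted string, or a bare word.
TOKEN = re.compile(r"\(|\)|'(?:[^']|'')*'|[^\s()']+")

SAMPLE = """
(FAMIX.Class (id: 100)
    (name 'Server')
    (container (ref: 82))
    (isAbstract false)
    (NOM 9))
(FAMIX.Method (id: 101)
    (name 'accept')
    (signature 'accept(Visitor v)')
    (parentClass (ref: 100))
    (LOC 7))
"""


def tokenize(text):
    return TOKEN.findall(text)


def atom(token):
    """Convert one non-parenthesis token to a Python value."""
    if token.startswith("'"):
        return token[1:-1].replace("''", "'")
    if token in ("true", "false"):
        return token == "true"
    try:
        return int(token)
    except ValueError:
        return token


def parse(tokens):
    """Turn the token list into nested Python lists."""
    stack = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unexpected closing parenthesis")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(atom(token))
    if len(stack) != 1:
        raise ValueError("missing closing parenthesis")
    return stack[0]


def to_element(node):
    """Convert one parsed element to a dict of its attributes."""
    kind, *parts = node
    element = {"type": kind}
    for key, *values in parts:
        if key == "id:":
            element["id"] = values[0]
            continue
        # A nested (ref: N) list becomes {"ref": N}.
        values = [{"ref": v[1]} if isinstance(v, list) else v for v in values]
        element[key] = values[0] if len(values) == 1 else values
    return element


def load(text):
    nodes = parse(tokenize(text))
    # Assumption: a complete file may wrap all elements in one outer pair
    # of parentheses, which the slide excerpt does not show.
    if len(nodes) == 1 and nodes[0] and isinstance(nodes[0][0], list):
        nodes = nodes[0]
    return [to_element(node) for node in nodes]
```

Calling `load(SAMPLE)` yields one dict per element; references stay as `{"ref": id}` placeholders, so resolving them against the `id` values would be a separate pass.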
49. Moose
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
65. Moose
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
67. Université Catholique de Louvain
INRIA Lille
Politehnica University of Timisoara
University of Berne
University of Lugano
68. Current Team: Stéphane Ducasse, Tudor Gîrba
Previous Team: Serge Demeyer, Adrian Kuhn, Michele Lanza, Sander Tichelaar
> 100 men years
Current and Previous Contributors: Hani Abdeen, Tobias Aebi, Ilham Alloui, Gabriela Arevalo, Mihai Balint, Alexandre Bergel, Johan Brichau, Frank Buchli, Thomas Bühler, Philipp Bunge, Calogero Butera, Marco D’Ambros, Simon Denier, Daniel Frey, Georges Golomingi, Orla Greevy, David Gurtner, Reinout Heeck, Markus Hofstetter, Matthias Junker, Markus Kobel, Jannik Laval, Adrian Lienhard, Michael Locher, Martin von Löwis, Mircea Lungu, Pietro Malorgio, Michael Meer, Michael Meyer, Oscar Nierstrasz, Damien Pollet, Laura Ponisio, Daniel Ratiu, Azadeh Razavizadeh, Jorge Ressia, Matthias Rieger, Andreas Schlapbach, Daniel Schweizer, Mauricio Seeberger, Sara Sellos, Lukas Steiger, Lucas Streit, Daniele Talerico, Herve Verjus, Toon Verwaest, Violeta Voinescu, Richard Wettel, Roel Wuyts
69. Moose
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
70. Moose
is an analysis tool
is a modeling platform
is a visualization platform
is a tool building platform
is a collaboration
is an idea
72. Research should be more than papers
First pages of six papers shown on the slide:
Michael Meyer and Tudor Gîrba, "Scripting Visualizations with Mondrian"
Adrian Kuhn, Stéphane Ducasse and Tudor Gîrba, "Semantic Clustering: Identifying Topics in Source Code" (preprint submitted to Elsevier Science, 11 October 2006)
Adrian Lienhard, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz, "Test Blueprints - Exposing Side Effects in Execution Traces to Support Writing Unit Tests"
Oscar Nierstrasz, Stéphane Ducasse and Tudor Gîrba, "The Story of Moose: an Agile Reengineering Environment" (Proceedings ESEC-FSE'05, pp. 1-10, ISBN 1-59593-014-0, Lisbon, Portugal, 2005)
Adrian Lienhard, Tudor Gîrba and Oscar Nierstrasz, "Practical Object-Oriented Back-in-Time Debugging"
Andrea Brühlmann, Tudor Gîrba, Orla Greevy and Oscar Nierstrasz, "Enriching Reverse Engineering with Annotations" (Models 2008, LNCS vol. 5301, Springer-Verlag, September 2008, pp. 660-674)