Automatic Detection of
Performance Design and
Deployment Antipatterns in
Component Based Enterprise
Systems
by
Trevor Parsons
The thesis is submitted to
University College Dublin
for the degree of PhD
in the
College of Engineering, Mathematical and Physical Sciences.
November 2007
School of Computer Science and Informatics
Dr. J. Carthy. (Head of Department)
Under the supervision of
Dr. J. Murphy
CONTENTS
Abstract vii
Acknowledgements ix
List of Publications xi
1 Introduction 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Thesis Contributions and Statement . . . . . . . . . . . . . . . . . . . . . 5
1.4 Key Assumptions and Scope . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Background 8
2.1 Performance of Software Systems . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Component Based Software . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Software Components . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Component Frameworks . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 The Java Enterprise Edition . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Web Tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.2 Business Tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.3 Enterprise Information System Tier . . . . . . . . . . . . . . . . . 14
2.4 The Enterprise JavaBean Technology . . . . . . . . . . . . . . . . . . . . 15
2.4.1 The EJB Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.2 The EJB Component Model . . . . . . . . . . . . . . . . . . . . . 16
2.4.3 EJB Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.4 Deployment Settings . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Software Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Software Antipatterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.8 Performance Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.8.1 Workload Generation . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.8.2 Profiling Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Reverse Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.10 Design Pattern and Antipattern Detection . . . . . . . . . . . . . . . . . 36
2.11 Knowledge Discovery in Databases and Data Mining . . . . . . . . . . . 39
2.11.1 Frequent Pattern Mining and Clustering . . . . . . . . . . . . . . 40
3 Overview of Approach 42
3.1 Approach for the Automatic Detection of Performance Antipatterns . . 44
3.1.1 Research Methodology and Validation Criteria . . . . . . . . . . 45
3.1.2 Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.4 Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2 Antipatterns Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.1 Antipattern Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . 51
4 Monitoring Required for Antipattern Detection 54
4.1 Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Run-Time Path Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.1 Run-Time Paths Overview . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2 Run-Time Path Tracing Motivation . . . . . . . . . . . . . . . . . 60
4.2.3 Run-Time Path Tracing Considerations . . . . . . . . . . . . . . 61
4.2.4 COMPAS Monitoring Framework . . . . . . . . . . . . . . . . . 62
4.2.5 COMPAS Extensions . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3 Monitoring Server Resource Usage and Extracting Component Meta-
Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.1 Using Java Management Extensions to Monitoring Server Re-
source Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.2 Automatically Extracting Component Meta-Data . . . . . . . . . 80
4.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4.1 Applications of Run-Time Paths . . . . . . . . . . . . . . . . . . . 82
4.4.2 Alternative Representations for Component Interactions . . . . 83
4.4.3 Run-Time Interaction Tracing Approaches . . . . . . . . . . . . . 84
5 Reconstructing the Systems Design for Antipattern Detection 88
5.1 Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.2 Automatically Extracting Component Relationships and Object Usage
Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.3 Reconstructing Run-time Container Services . . . . . . . . . . . . . . . . 96
5.4 Identifying Component Communication Patterns in Run-Time Paths
using Frequent Sequence Mining . . . . . . . . . . . . . . . . . . . . . . . 97
5.4.1 Frequent Itemset Mining and Frequent Sequence Mining . . . . 97
5.4.2 Support Counting for Run-Time Paths . . . . . . . . . . . . . . . 99
5.4.3 Further Criteria for Interestingness . . . . . . . . . . . . . . . . . 102
5.4.4 Preprocessing for FSM Performance Improvement . . . . . . . . 102
5.4.5 Closed Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4.6 PostProcessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.4.7 Component Communication Information for the Extracted De-
sign Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.5 Data Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5.1 Clustering Run-time Paths . . . . . . . . . . . . . . . . . . . . . . 106
5.5.2 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.6 The Reconstructed Design Model . . . . . . . . . . . . . . . . . . . . . . 107
5.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.7.1 Reverse Engineering . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.7.2 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6 Detecting Performance Design and Deployment Antipatterns 112
6.1 Antipatterns Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.2 A Rule Engine Approach for Antipattern Detection . . . . . . . . . . . . 114
6.3 Example Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3.1 Antipattern Library . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.3.2 Filtering Using Threshold Values . . . . . . . . . . . . . . . . . . 118
6.4 PAD Tool User Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.5.1 Antipattern Categorisation . . . . . . . . . . . . . . . . . . . . . 119
6.5.2 Performance Testing . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.5.3 Detection Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.5.4 Antipattern Detection . . . . . . . . . . . . . . . . . . . . . . . . 121
7 Results and Evaluation 124
7.1 Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.2 COMPAS JEEM Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.2.1 Deducing System structure . . . . . . . . . . . . . . . . . . . . . 127
7.2.2 Portability Assessment . . . . . . . . . . . . . . . . . . . . . . . . 132
7.2.3 Performance Overhead . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3 Analysis Module Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.3.1 FSM Performance Tests . . . . . . . . . . . . . . . . . . . . . . . . 135
7.3.2 Applying FSM to Identify Design Flaws . . . . . . . . . . . . . . 140
7.3.3 Data Reduction Results . . . . . . . . . . . . . . . . . . . . . . . . 142
7.4 PAD Tool Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.1 Antipatterns Detected in the Duke’s Bank Application . . . . . 144
7.4.2 Antipatterns Detected in the IBM Workplace Application - Beta
Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.5 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.5.1 Overview of Contributions and Evaluation Criteria . . . . . . . 151
7.5.2 Validation of Contributions . . . . . . . . . . . . . . . . . . . . . 152
8 Conclusions 153
8.1 Thesis Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
References 171
A Antipattern Rule Library 172
A.1 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
A.1.1 Category1: Antipatterns Across or Within Run-Time Paths . . . 172
A.1.2 Category2: Inter-Component Relationship Antipatterns . . . . . 174
A.1.3 Category3: Antipatterns Related to Component Communica-
tion Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
A.1.4 Category4: Data Tracking Antipatterns . . . . . . . . . . . . . . 177
A.1.5 Category5: Pooling Antipatterns . . . . . . . . . . . . . . . . . . 177
A.1.6 Category6: Intra-Component Antipatterns . . . . . . . . . . . . 178
A.1.7 Adding Rules to The Rule Library . . . . . . . . . . . . . . . . . 178
A.2 Jess User Defined Functions provided by the PAD Tool . . . . . . . . . . 178
A.3 Configuration Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
B JEEM and FSM Implementation Source Code 180
LIST OF FIGURES
1.1 Typical Enterprise Architecture . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Typical JEE Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Client Invoking an EJB . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Example EJB Deployment Descriptor . . . . . . . . . . . . . . . . . . . 20
2.4 Stateless Session Bean Lifecycle . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Patterns, Antipatterns and their Relationship . . . . . . . . . . . . . . 27
2.6 JVMPI Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.7 The KDD Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.1 PAD Tool Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Run-time Design Meta Model . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Hierarchy of Antipatterns . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1 Dynamic Call Trace (a) and Corresponding Dynamic Call Tree (b) . . 58
4.2 Example Run-Time Path . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Run-Time Path Data Structure . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 A Run-Time Path’s PathNode Data Structure . . . . . . . . . . . . . . 60
4.5 COMPAS Probe Insertion Process . . . . . . . . . . . . . . . . . . . . . 65
4.6 COMPAS JEEM Architecture . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7 Intercepting Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.8 Remote Method Invocation . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.9 The Sample Bean’s Home Interface . . . . . . . . . . . . . . . . . . . . 75
4.10 A Wrapper for the Sample Bean’s Home Interface . . . . . . . . . . . . 75
4.11 A Sample Bean Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.12 Run-Time Path with Tracked Object, as a Sequence Diagram . . . . . . 78
4.13 JEEManagedObject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.14 JDBCStats, JDBCConnectionStats, JDBCConnectionPoolStats . . . . . 81
5.1 Run-time Design Meta Model from Chapter 3 . . . . . . . . . . . . . . 91
5.2 Example Run-Time Path (a), Example Deployment Descriptors (b),
Extract of Component Data Structure (c) and Data Extracted to Pop-
ulate Data Structure (d) . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3 Example Run-Time Path (a), Extract of the TrackedObject Data Struc-
ture (b) and Information Extracted to Populate the TrackedObject
Data Structure (c) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.4 Example Run-Time Path (a), Extract of the RunTimeContainerSer-
vice Data Structure (b), and Information Extracted to Populate the
RunTimeContainerService Data Structure (c) . . . . . . . . . . . . . . . 95
5.5 Class Diagram Showing Components Relationships . . . . . . . . . . 98
5.6 Example Transaction with Different Support Counting Approaches . 100
5.7 Hidden Elements in Transaction and Corresponding Support Counts 101
6.1 Rule to Detect Simultaneous Interfaces Antipattern . . . . . . . . . . . 115
6.2 Rule to Detect Needless Session Antipattern . . . . . . . . . . . . . . . 116
6.3 Rule to Detect Bulky or Unusual Levels of Database Communication 117
7.1 AccountList Run-Time Path and UML sequence diagram . . . . . . . 128
7.2 Diagram Showing Components in Duke’s Bank . . . . . . . . . . . . . 129
7.3 Diagram Showing Components in PlantsByWebsphere . . . . . . . . . 131
7.4 Portability Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.5 Performance Overhead Test Results . . . . . . . . . . . . . . . . . . . . 133
7.6 Test Results on K2 10 2 Database . . . . . . . . . . . . . . . . . . . . . 136
7.7 Test Results on K2 100 2 Database . . . . . . . . . . . . . . . . . . . . . 138
7.8 Test Results on Sun Database . . . . . . . . . . . . . . . . . . . . . . . . 139
7.9 Test Results on IBM Database . . . . . . . . . . . . . . . . . . . . . . . 140
7.10 Class Diagram of a Modified Version of Duke’s Bank with Commu-
nication Patterns Highlighted . . . . . . . . . . . . . . . . . . . . . . . 143
A.1 The Transactions-A-Plenty Rule . . . . . . . . . . . . . . . . . . . . . . 172
A.2 The Conversational-Baggage Rule . . . . . . . . . . . . . . . . . . . . . 173
A.3 The Sessions-A-Plenty Rule . . . . . . . . . . . . . . . . . . . . . . . . . 173
A.4 The Needless-Session Rule . . . . . . . . . . . . . . . . . . . . . . . . . 174
A.5 The Remote-Calls-Locally Rule . . . . . . . . . . . . . . . . . . . . . . . 175
A.6 The Accessing-Entities-Directly Rule . . . . . . . . . . . . . . . . . . . 175
A.7 The Bloated-Session Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 175
A.8 The Unusual-or-Bulky-Session-Entity-Communication Rule . . . . . . 176
A.9 The Fine-Grained-Remote-Calls Rule . . . . . . . . . . . . . . . . . . . 176
A.10 The Unused-Data-Object Rule . . . . . . . . . . . . . . . . . . . . . . . 177
A.11 The Incorrect-Pool-Size Rule . . . . . . . . . . . . . . . . . . . . . . . . 177
A.12 The Local-and-Remote-Interfaces-Simultaneously Rule . . . . . . . . . . 178
List of Acronyms
ADL: Architecture Description Language
AOP: Aspect Oriented Programming
API: Application Programming Interface
AST: Abstract Syntax Tree
BCI: Byte Code Instrumentation
CCM: CORBA Component Model
CPI: COMPAS Probe Insertion
DTO: Data Transfer Object
EIS: Enterprise Information Systems
EJB: Enterprise Java Beans
ERP: Enterprise Resource Planning
FCA: Formal Concept Analysis
FIM: Frequent Itemset Mining
FSM: Frequent Sequence Mining
HTML: HyperText Markup Language
HTTP: HyperText Transfer Protocol
J2EE: Java 2 Enterprise Edition
J2SE: Java 2 Standard Edition
JDBC: Java Database Connectivity
JEE: Java Enterprise Edition
JMS: Java Message Service
JMX: Java Management Extensions
JNDI: Java Naming and Directory Interface
JSP: Java Server Pages
JSR: Java Specification Request
JVM: Java Virtual Machine
JVMPI: Java Virtual Machine Profiler Interface
JVMTI: Java Virtual Machine Tools Interface
KDD: Knowledge Discovery in Databases
LQN: Layered Queuing Networks
MTS: Microsoft Transaction Server
OCL: Object Constraint Language
OS: Operating System
PAD: Performance Antipattern Detection
PMI: Performance Monitoring Infrastructure
POJO: Plain Old Java Object
QN: Queuing Networks
RDBMS: Relational Database Management Systems
RMI: Remote Method Invocation
RML: Relational Manipulation Language
SOA: Service Oriented Architecture
SPA: Stochastic Process Algebras
SPN: Stochastic Petri Nets
SQL: Structured Query Language
UML: Unified Modelling Language
XML: Extensible Markup Language
ABSTRACT
Enterprise applications are becoming increasingly complex. In recent times they have
moved away from monolithic architectures to more distributed systems made up
of a collection of heterogeneous servers. Such servers generally host numerous soft-
ware components that interact to service client requests. Component based enterprise
frameworks (e.g. JEE or CCM) have been extensively adopted for building such ap-
plications. Enterprise technologies provide a range of reusable services that can assist
developers in building these systems. Consequently, developers no longer need to spend
time developing the underlying infrastructure of such applications, and can instead
concentrate their efforts on functional requirements.
Poor performance design choices, however, are common in enterprise applications
and have been well documented in the form of software antipatterns. Design mistakes
generally result from the fact that these multi-tier, distributed systems are extremely
complex and often developers do not have a complete understanding of the entire ap-
plication. As a result developers can be oblivious to the performance implications of
their design decisions. Current performance testing tools fail to address this lack of
system understanding. Most merely profile the running system and present large vol-
umes of data to the tool user. Consequently developers can find it extremely difficult
to identify design issues in their applications. Fixing serious design level performance
problems late in development is expensive and cannot be achieved through "code
optimizations". In fact, often performance requirements can only be met by modifying
the design of the application which can lead to major project delays and increased
costs.
This thesis presents an approach for the automatic detection of performance design
and deployment antipatterns in enterprise applications built using component based
frameworks. Our main aim is to take the onus away from developers having to sift
through large volumes of data, in search of performance bottlenecks in their applica-
tions. Instead we automate this process. Our approach works by automatically recon-
structing the run-time design of the system using advanced monitoring and analysis
techniques. Well known (predefined) performance design and deployment antipat-
terns that exist in the reconstructed design are automatically detected. Results of ap-
plying our technique to two enterprise applications are presented.
The main contributions of this thesis are (a) an approach for automatic detection
of performance design and deployment antipatterns in component based enterprise
frameworks, (b) a non-intrusive, portable, end-to-end run-time path tracing approach
for JEE and (c) the advanced analysis of run-time paths using frequent sequence
mining to automatically identify interesting communication patterns between com-
ponents.
Dedicated to my parents, Tommy and Kay.
ACKNOWLEDGEMENTS
Firstly I would like to thank my supervisor, John Murphy for giving me the oppor-
tunity to pursue this research, and also for all his help, encouragement and good
humour along the way. I would also like to thank Liam Murphy who was always
available for dialog and who has effectively acted as a second supervisor over the
years. I would like to thank Andrew Lee for initially suggesting the notion of "detect-
ing antipatterns", when I was back searching for research ideas. Also thanks to Peter
Hughes for his input and feedback during the early days of my work.
Next, I would like to thank my colleagues in Dublin City University, where this jour-
ney first began. In particular, I would like to thank Ada Diaconescu, Mircea Trofin
and Adrian Mos, from whom I learned so much during my initial two years as a re-
searcher, for being fun colleagues, for always being available to bounce ideas off (even
now that you have all unfortunately left Dublin) and for teaching me the important
basics of the Romanian language. Furthermore I would like to thank Adrian for as-
sisting me in extending some of his research ideas and for inviting me to INRIA for
interesting discussions on my work. I would also like to thank Colm Devine, Adrian
Fernandes and Cathal Furey (three of the four horsemen) for their engaging lunch
time discussions (in the early days) on the human anatomy and other such topics.
Thanks also to my DCU/UCD colleagues, Dave "the brickie" McGuinness, Jenny Mc-
Manis, Gabriel Muntean, Christina Muntean, Christina Thorpe, Alex Ufimtsev, Oc-
tavian Ciuhandu, Lucian Patcas, Olga Ormand, Jimmy Noonan, Hamid Nafaa, Petr
Hnetynka, Sean Murphy, John Fitzpatrick (for allowing me to introduce him to Las Ve-
gas), John Bergin, Omar Ashagi (for teaching me basic Arabic) and Philip McGovern
for all being fun colleagues and to all those who I have had the pleasure of working
with over the years. Also thanks again to Sean Murphy for taking my questions over
the years and especially for his help with some of the mathematical aspects of my
research.
Furthermore, thanks to all those in IBM who helped me during my work and granted
me access to their environments. Thanks especially to Pat O’Sullivan and Simon Piz-
zoli for their help, interest and invaluable feedback on my research.
A special thanks to Claire Breslin for her endless support and patience, and for re-
minding me about the more important things in life.
Finally I would like to thank my parents, Tommy and Kay and brother, Tom, for their
constant encouragement. I would especially like to thank my parents, to whom this
work is dedicated. Without their unwavering love and support this work would not
have been possible.
LIST OF PUBLICATIONS
Trevor Parsons, John Murphy. Detecting Performance Antipatterns in Component Based
Enterprise Systems. Accepted for publication in the Journal of Object Technology.
Trevor Parsons, John Murphy, Patrick O’Sullivan, Applying Frequent Sequence Mining
to Identify Design Flaws in Enterprise Software Systems. In Proceedings of the 5th Inter-
national Conference on Machine Learning and Data Mining, Leipzig, Germany, July
18-20, 2007.
Trevor Parsons, John Murphy, Simon Pizzoli, Patrick O’Sullivan, Adrian Mos, Reverse
Engineering Distributed Enterprise Applications to Identify Common Design Flaws. Pre-
sented at the Software Engineering Tools For Tomorrow (SWEFT) 2006 Conference,
T.J. Watson, New York, Oct 17 - 19, 2006.
Liang Chen, Patrick O'Sullivan, Laurence P. Bergman, Vittorio Castelli, Eric Labadie,
Peter Sohn, Trevor Parsons. Problem Determination in Large Enterprise Systems. Pre-
sented at the Software Engineering Tools For Tomorrow (SWEFT) 2006 conference,
T.J. Watson, New York, Oct 17 - 19, 2006. (Abstract only available)
Trevor Parsons, Adrian Mos, John Murphy. Non-Intrusive End to End Run-time Path
Tracing for J2EE Systems. IEE Proceedings Software, August 2006
Trevor Parsons, John Murphy. The 2nd International Middleware Doctoral Symposium:
Detecting Performance Antipatterns in Component-Based Enterprise Systems. IEEE Dis-
tributed Systems Online, vol. 7, no. 3, March, 2006
Trevor Parsons. A Framework for Detecting Performance Design and Deployment Antipat-
terns in Component Based Enterprise Systems. In Proceedings 2nd International Middle-
ware Doctoral Symposium, ACM Press, art. no. 7, Grenoble, France, 2005
Trevor Parsons. A Framework for Detecting, Assessing and Visualizing Performance An-
tipatterns in Component Based Systems. First Place at ACM SIGPLAN Student Research
Competition Graduate Division, In OOPSLA’04: Companion to the 19th annual ACM
SIGPLAN conference on Object-oriented programming systems, languages, and ap-
plications, pages 316-317, Vancouver, BC, Canada, 2004.
Trevor Parsons, John Murphy. A Framework for Automatically Detecting and Assessing
Performance Antipatterns in Component Based Systems using Run-Time Analysis. The 9th
International Workshop on Component Oriented Programming, part of the 18th Eu-
ropean Conference on Object Oriented Programming. Oslo, Norway, June 2004.
Trevor Parsons, John Murphy. Data Mining for Performance Antipatterns in Component
Based Systems Using Run-Time and Static Analysis. Transactions on Automatic Control
and Control Science, Vol. 49 (63), No. 3, pp. 113-118 - ISSN 1224-600X, May 2004.
CHAPTER
ONE
Introduction
Main Points
• Performance is a major issue during the development of enterprise applications.
• System complexity leads to a lack of understanding and consequently poor de-
sign decisions are commonly made by developers.
• Poor system design is often responsible for a badly performing system.
• Current performance testing tools do not address performance design issues and
are limited.
• There are a large number of well known design issues for enterprise systems.
• Antipatterns document well known design issues and their corresponding solu-
tion.
• Thesis Contributions:
– An approach for the automatic detection of performance design and de-
ployment antipatterns in systems built on component based enterprise
frameworks.
– A portable, low overhead, non-intrusive, end-to-end run time path tracer
for distributed JEE systems.
– A technique for the identification of interesting communication patterns in
a collection of run-time paths.
1.1 Background and Motivation
In the past software developers had to be extremely careful when developing their ap-
plications as resources were often scarce and the management of such scarce resources
was a complex issue. Modern advances in software technologies, however, have allowed
developers to concentrate less on issues such as performance and resource
management, and instead developers have been able to spend more time developing
the functionality of their applications. An example of this can be seen in modern lan-
guages (Java1, C#2) that provide garbage collection facilities, freeing developers from
the task of having to manage memory, which had typically been a complex and time
consuming exercise. Freeing developers from having to worry about what is happen-
ing "under the hood" allows them to concentrate more of their efforts on developing
the functionality of a system. This is even more obvious with enterprise level com-
ponent frameworks (e.g. JEE3 or CCM4) whereby the framework can be expected to
handle complex underlying issues such as security, persistence, performance and con-
currency to name but a few. Again the idea is to allow developers to concentrate on
the application functionality such that the time to market is reduced. A downside
of this advance in software technologies is that developers become less familiar with
the mechanics of the underlying system, and as a result, can make decisions during
development that have an adverse effect on the system.
Performance is a major issue for developers building large scale multi user enterprise
applications. In fact recent surveys have shown that a high percentage of enterprise
projects fail to meet their performance requirements on time or within budget 5 6. This
leads to project delays and higher development costs, and results from the fact that de-
velopers often do not have a complete understanding of the overall system behaviour.
Figure 1.1 shows a typical enterprise application made up of a number of different
physically distributed servers. Each server can in turn be made up of a large number
of software components that interact to service different client requests. Understand-
ing the run-time behaviour of such systems can be a difficult task and consequently
it is common that developers are unaware of the performance implications of their
design decisions.
1 The Java Technology, Sun Microsystems, http://java.sun.com/
2 The C# language, Microsoft, http://msdn2.microsoft.com/en-us/vcsharp/aa336809.aspx
3 Java Enterprise Edition, Sun Microsystems, http://java.sun.com/javaee/
4 The CORBA Component Model specification, The Object Management Group,
http://www.omg.org/technology/documents/formal/components.htm
5 Ptak, Noel and Associates, "The State of J2EE Application Management: Analysis of 2005 Benchmark
Survey", http://www.ptaknoelassociates.com/members/J2EEBenchmarkSurvey2005.pdf
6 Jasmine Noel, "J2EE Lessons Learned", SoftwareMag.com, The Software IT Journal, January, 2006.
http://www.softwaremag.com/L.cfm?doc=2006-01/2006-01j2ee
Figure 1.1: Typical Enterprise Architecture
Current development and testing tools fail to address this issue of understanding enterprise
system behaviour. For example, most of today's performance tools merely
profile the running system and present performance metrics to the tool user. The
volume of data produced when profiling even a single user system can be extremely
large. When profiling multi-user enterprise applications, where a typical load may be
in the order of thousands, the amount of data produced can be truly overwhelming.
Often developers are required to sift through and correlate this information looking
for bottlenecks in their systems. Furthermore, even when developers find issues in
their applications using these tools, it is common that they are unsure as to how to go
about rectifying the issue. There is a clear need for more advanced performance tools
that not only profile the running system, but that also analyse the data produced to
identify potential issues in the application. While there has been research in the area of
debugging tools (e.g. [95] [145] [55] [14] [47]) which allow for automatic low-level bug
detection, often it is the case that low-level optimizations or fixes will not be enough to
enhance the system efficiency such that performance requirements are met. In many
situations an overhaul of the system design is required.
There are a large number of well known design mistakes that are consistently made
by developers building these systems. Such issues have been documented in the form
of software design antipatterns [36]. Similar to software design patterns, which doc-
ument best practices in software development, software antipatterns document com-
mon mistakes made by developers when building software systems. However, as well
as documenting the mistake, antipatterns also document the corresponding solution
to the problem. Thus not only can they be used to identify issues in software systems,
but they can also be used to rectify these issues by applying the solution provided. A
more complete and detailed definition of software patterns and antipatterns is given
in sections 2.6 and 2.7 respectively.
1.2 Thesis Overview
In light of the limitations of current performance tools and of the benefits of software
antipatterns, we have developed an approach to automatically identify performance
design and deployment antipatterns in systems built on enterprise component-based
frameworks. This approach takes from developers the burden of having to
sift through large volumes of monitoring data in search of design flaws, and instead
automates this process. Well known performance design flaws can be identified au-
tomatically. Identified issues are presented with related contextual information and a
corresponding solution to the problem such that the problem can be easily addressed.
The approach works by reconstructing the run-time design of the application under
test. The reconstructed design can be subsequently checked for well known pre-
defined antipatterns. From a high level this is achieved (a) by monitoring the run-
ning system to collect information required for antipattern detection, (b) by perform-
ing analysis on the monitoring data to summarise it and to identify relationships and
patterns in the data that might suggest potential design flaws, (c) by representing the
analysed data in a design model of the system and (d) by loading the design into a rule
engine such that antipatterns (pre-defined as rules) can be detected. The approach has
been realised in the Performance Antipattern Detection (PAD) tool. The tool has been
designed for the Java Enterprise Edition (JEE) technology.
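To make the rule-based detection step concrete, the following minimal Java sketch shows the general shape of step (d): a library of antipattern rules evaluated against the reconstructed design model. It is purely illustrative; the names (DesignModel, AntipatternRule, AntipatternDetector) are invented here, and the actual PAD tool expresses its rules in a rule engine (see Chapter 6 and Appendix A) rather than in plain Java.

// Illustrative sketch only -- not the actual PAD implementation.
// All names (DesignModel, AntipatternRule, AntipatternDetector) are hypothetical.
import java.util.ArrayList;
import java.util.List;

interface DesignModel { }                        // reconstructed run-time design (step c)

interface AntipatternRule {
    String name();
    boolean matches(DesignModel model);          // predicate over the design model (step d)
    String suggestedSolution();
}

class AntipatternDetector {
    private final List<AntipatternRule> rules = new ArrayList<>();

    void register(AntipatternRule rule) { rules.add(rule); }

    List<String> detect(DesignModel model) {
        List<String> findings = new ArrayList<>();
        for (AntipatternRule rule : rules) {
            if (rule.matches(model)) {
                findings.add(rule.name() + ": " + rule.suggestedSolution());
            }
        }
        return findings;
    }
}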
The remainder of the thesis is structured as follows: Chapter 2 gives background
information on related technologies and related work. Chapter 3 gives a more de-
tailed overview of our approach, discusses our research methodology and outlines a
number of criteria that we use to validate our work. In this chapter we also give an
overview of software design antipatterns, with particular focus on performance an-
tipatterns. Chapter 4 outlines the different monitoring approaches that are required
for antipattern detection in a component based enterprise system, and how they can
be performed in a portable manner. Chapter 5 details a number of advanced anal-
ysis techniques that are applied to identify interesting relationships and patterns in
the run-time data. In particular it presents an approach for identifying frequent or re-
source intensive communication patterns between components using techniques from
the field of data mining. In this chapter we also show how the data collected from en-
terprise systems under load can be reduced and summarised. Chapter 6 shows how
a rule engine approach can be used to identify antipatterns in the reconstructed run-
time design of the system. In this chapter we also categorise JEE performance design
and deployment antipatterns into groups based on the data required to detect them.
Chapter 7 presents different sets of results from a range of tests that we have per-
formed to validate our research. Finally chapter 8 gives our conclusions and ideas on
future work in this area.
1.3 Thesis Contributions and Statement
The first major contribution of this thesis is an approach for the automatic detection of
design and deployment antipatterns in systems built using component based enter-
prise frameworks [125] [129] [130] [131] [132] [133]. This approach builds on current
performance tools by performing analysis on the data collected (i.e. run-time infor-
mation and component meta-data). The analysis reconstructs the system design and
identifies performance design flaws within it. The approach has been implemented
for the JEE technology in the form of the PAD tool, however it could potentially be
applied to other component based enterprise frameworks (e.g. CCM). This solution
has been successfully applied to both a sample and a real JEE application and has a
number of key advantages.
Firstly, it reduces and makes sense of the data collected by many of today’s perfor-
mance profilers. This work makes use of statistical analysis and data mining tech-
niques to summarise the data collected and to find patterns of interest that might
suggest performance problems. Thus, it takes the onus away from developers who
currently have to carry out this tedious task manually.
Secondly, while most of today’s performance tools tend to focus on identifying low
level hotspots and programming errors (e.g. memory leaks, deadlocks), this work
focuses on analysing the system from a performance design perspective. Since design
has such a significant effect on performance [43] it is essential that work is carried out
in this area.
Thirdly, unlike many of today's performance tools, problems identified are anno-
tated with descriptions of the issue detected, as well as a solution that can be applied
to alleviate the problem. This approach of identifying and presenting antipatterns to
developers helps them understand the mistakes that have been made, and the under-
lying reason as to why performance was affected. Developers can learn from using
our tool, and thus it may be less likely that the same mistakes are made in the future.
This approach also allows developers to easily rectify the situation by applying the
solution provided. In fact, the antipatterns presented provide a high level language
that developers and management alike can use to discuss such problems when they
occur.
The second major contribution of this work is a portable, low overhead, non-
intrusive, end-to-end run-time path tracer for JEE systems [128]. This is the first com-
pletely portable approach for collecting end-to-end run-time paths across all server
side tiers of a distributed JEE application. It is non-intrusive insofar as it does not re-
quire any modifications to the application or middleware source code. The monitor-
ing approach instead makes use of standard JEE mechanisms to intercept calls made
to the instrumented components. A run-time path [44] contains the control flow (i.e.
the ordered sequence of methods called required to service a user request), resources
and performance characteristics associated with servicing a request. Such information
is utilised to detect antipatterns by our PAD tool. By analysing run-time paths one can
easily see how system resources are being used, how the different components in the
system interact and how user requests traverse through the different tiers that make
up the system. In fact these paths can also be used for Object Tracking, i.e. to monitor
particular objects’ life cycles across the different user requests. In this work we show
how run-time paths can be used to manually and automatically reverse engineer a
JEE application. We also show how the reconstructed design can be used for either
manual or automatic identification of performance design flaws. For example, the
PAD tool makes use of run-time paths to identify the (run-time) component relation-
ships, communication patterns and object usage patterns in a JEE system. Results are
given for this monitoring approach which show that it produces a low overhead on
the instrumented system and that it can be applied in a portable manner.
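As a rough illustration of what a run-time path carries, the sketch below models a path as a tree of method invocations annotated with timing data. The class layout and field names are assumptions made for this example only; the actual COMPAS JEEM data structures are described in Chapter 4 (Figures 4.3 and 4.4).

// Hypothetical sketch of a run-time path record; field names are assumptions,
// not the COMPAS JEEM data structures described in Chapter 4.
import java.util.ArrayList;
import java.util.List;

class PathNode {
    String componentName;                          // e.g. an EJB or servlet
    String methodName;                             // method invoked on that component
    long durationMillis;                           // time spent servicing this call
    List<PathNode> children = new ArrayList<>();   // calls made while servicing it
}

class RunTimePath {
    String requestName;            // the user request that triggered the path
    PathNode root;                 // entry point, e.g. a servlet or session bean method

    /** Flatten the call tree into the ordered method sequence used by later analysis. */
    List<String> methodSequence() {
        List<String> sequence = new ArrayList<>();
        collect(root, sequence);
        return sequence;
    }

    private void collect(PathNode node, List<String> sequence) {
        if (node == null) return;
        sequence.add(node.componentName + "." + node.methodName);
        for (PathNode child : node.children) collect(child, sequence);
    }
}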
The third and final major contribution of this work is a technique for the identifica-
tion of interesting communication patterns in a collection of run-time paths [126].
More precisely, we have applied a data mining technique, Frequent Sequence Mining
(FSM) to identify sequences of interest (e.g. frequently repeating method sequences
and resource intensive loops) across a transactional database of run-time paths by us-
ing alternative support counting techniques. In this work we also discuss scalability
problems (in terms of both the algorithm runtime and the amount of data produced)
related to applying FSM to run-time paths and give solutions to these issues. We
show how the sequences identified can be used to highlight design flaws in enterprise
applications that lead to poor system performance. The PAD tool makes use of
this analysis technique to identify interesting component communication patterns in
a JEE system that may indicate the presence of particular antipatterns.
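The following toy example, under the simplifying assumption that each path is reduced to its ordered method-call sequence, shows what counting the support of one candidate sequence across a collection of paths looks like. The real analysis in Chapter 5 relies on dedicated FSM algorithms and the alternative support-counting schemes discussed there; this sketch is only meant to make the notion of support concrete, and the bean and method names are invented.

// Toy illustration of support counting for a candidate call sequence across
// run-time paths; names and data are hypothetical.
import java.util.Arrays;
import java.util.List;

class SupportCounter {
    /** Counts paths whose call sequence contains the candidate as a (gapped) subsequence. */
    static int support(List<List<String>> paths, List<String> candidate) {
        int count = 0;
        for (List<String> path : paths) {
            int i = 0;
            for (String call : path) {
                if (i < candidate.size() && call.equals(candidate.get(i))) i++;
            }
            if (i == candidate.size()) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> paths = Arrays.asList(
            Arrays.asList("AccountBean.getBalance", "AccountBean.getBalance", "TxBean.log"),
            Arrays.asList("AccountBean.getBalance", "TxBean.log"));
        // The candidate sequence occurs in both paths, so its support is 2.
        System.out.println(support(paths, Arrays.asList("AccountBean.getBalance", "TxBean.log")));
    }
}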
Following the above contributions the fundamental thesis of this work can be stated
as follows:
Performance design and deployment antipatterns can be automatically detected in component
based enterprise systems by analysing run-time data and component meta-data.
1.4 Key Assumptions and Scope
The work in this thesis is focused on component based systems as defined in section
2.2. As such, it is highly likely that the source code of the application to be analysed
is not available in its entirety, as components may have been developed by third par-
ties. Thus we assume source code is not available for analysis of the system. For such
systems bytecode analysis may also be problematic due to security restrictions or li-
censing constraints. Instead, we assume that a running implementation of the system
to be analysed is available such that dynamic data can be collected and utilised for
analysis.
We also assume that a realistic testing scenario is available which reflects how the sys-
tem will be used in production. We do not address the issue of how such testing sce-
narios can be obtained in this work, however, research in this area already exists. For
example Weyuker and Voklos have outlined an approach for the development of per-
formance test cases [176]. In this literature five typical steps required to develop per-
formance test cases are outlined. Alternatively, Ho et al. [92] propose an evolutionary
approach to performance test design based on their Performance Requirements Evolu-
tion Model. The authors claim that more precise and realistic performance tests can be
created incrementally during the development process through customer communi-
cation or performance model solving. In addition, agile development techniques such
as test-driven development [25] promote the design of test cases before developers
begin to code. Recently work has been presented which discusses how performance
tests can be incorporated into the test driven development process [96] allowing for
early availability of performance testing scenarios.
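As an informal illustration of such an early performance test (not an example taken from the cited work), the following JUnit-style sketch asserts an average response-time requirement against a deployed endpoint; the URL, request count and threshold are invented for this example.

// Hedged sketch of a simple performance test case written before or alongside the code
// it exercises; the endpoint and the 200 ms requirement are assumptions for illustration.
import static org.junit.Assert.assertTrue;
import org.junit.Test;
import java.net.HttpURLConnection;
import java.net.URL;

public class AccountListPerformanceTest {

    private static final int REQUESTS = 50;
    private static final long MAX_AVG_MILLIS = 200;   // assumed requirement

    @Test
    public void averageResponseTimeStaysWithinRequirement() throws Exception {
        long total = 0;
        for (int i = 0; i < REQUESTS; i++) {
            long start = System.currentTimeMillis();
            HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8080/bank/accountList").openConnection();
            conn.getInputStream().close();             // read (and discard) the response
            total += System.currentTimeMillis() - start;
        }
        long average = total / REQUESTS;
        assertTrue("average response time was " + average + " ms", average <= MAX_AVG_MILLIS);
    }
}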
Our approach is applicable to applications built on component based enterprise
frameworks. However our research has thus far only been applied to synchronous
components, and has not, for example, been applied to message driven beans which
are asynchronous components in the JEE technology. Thus, our scope is limited to
synchronous applications. Our plans for future work outline how this approach could
potentially be applied to asynchronous components (see section 8.3).
CHAPTER
TWO
Background
In this chapter we introduce related research areas and technologies. We begin by
discussing the research area of performance engineering. Next we give an overview
of component based software, giving a definition for a software component and dis-
cussing component frameworks. We also give background information on the Java
Enterprise Edition technology which is the enterprise framework that our work has
been applied to. We focus specifically on the Enterprise Java Bean component tech-
nology and give details in this area related to our research. We present an overview of
the state of the art in performance tools discussing techniques for load generation and
performance profiling. We particularly focus on performance profiling tools for the
Java technology. Furthermore we give an overview of software architecture, software
patterns, and software antipatterns. An overview of research in the area of reverse
engineering is also presented. In this section we outline why previous approaches are
less suitable for distributed component based applications. The current state of the
art of research in the area of software pattern/antipattern detection is also discussed.
Finally we introduce the area of knowledge discovery in databases, and data mining
techniques relevant in this work.
Main Points
• Current performance analysis techniques, e.g., modelling, are inaccurate and
time consuming when applied to component based enterprise systems. Thus,
in industry performance analysis is usually deferred until performance testing
begins using the currently available performance testing tools.
• Component technologies such as EJB are increasingly being adopted to provide
for flexible, manageable and reusable solutions for complex software systems.
However poor system performance is common in these systems.
• System Architecture focuses on issues related to the overall system structure and
is said to be non-local, whereas software design focuses on local issues.
• Enterprise design plays a significant role in a system’s overall performance. Best
practices in design have been well documented in the form of design patterns.
• Well known design issues consistently occur in enterprise applications and have
been well documented, along with their corresponding solution, in the form of
design antipatterns.
• Performance testing tools for complex multi-user enterprise applications are lim-
ited and merely profile the running system, presenting vast amounts of data to
the tool user. There is a clear need for more advanced tools that take the onus
away from the developer of having to sift through this data, and that automati-
cally analyse the data produced.
• Detailed documentation is generally not available for enterprise applications.
Thus, it can be difficult for developers to comprehend the overall application
design.
• Current reverse engineering/design pattern detection/antipattern detection
techniques are heavily based on static analysis and are unsuitable for compo-
nent based systems.
• Data Mining techniques can be applied to extract knowledge from vast volumes
of data.
2.1 Performance of Software Systems
The performance of a software system has been described as an indicator of how well
the system meets its requirements for timeliness [154]. Smith and Williams [154] de-
scribe timeliness as being measured in either response time or throughput, where re-
sponse time is defined as the time required to respond to a request and throughput
is defined as the number of requests that can be processed in some specific time in-
terval. Furthermore they define two important dimensions to software performance
timeliness, responsiveness and scalability. Responsiveness is defined as the ability of
a system to meet its objectives for response time or throughput. The ability to con-
tinue to meet these objectives, as the demand on the system increases, is defined as
the systems scalability. The aim of performance engineering is to build systems that
are both responsive and scalable.
To date a vast amount of performance engineering research has focused on system
analysis through performance models. Performance models are created based on sys-
tem artifacts and various relevant estimates. Some of the most common performance
model classes are Queuing Networks (QN) (or extensions, such as Extended QN and
Layered QN), Stochastic Petri Nets (SPN), and Stochastic Process Algebras (SPA). Per-
formance models can be evaluated using simulation techniques or analytical methods,
in order to predict performance indices, such as throughput, response times, or re-
source utilization. A comprehensive survey of modelling approaches for performance
prediction is presented in [15].
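As a simple textbook illustration (not taken from this thesis) of how such a model yields performance indices, a single server can be modelled as an M/M/1 queue, for which the mean response time is

R = 1 / (μ − λ)

where λ is the request arrival rate and μ the service rate. With μ = 100 requests/s and λ = 80 requests/s the model predicts a mean response time of 1/20 s = 50 ms; raising λ to 95 requests/s raises the prediction to 200 ms, showing how responsiveness degrades as demand approaches capacity.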
However modelling today’s enterprise applications with a high degree of accuracy
can be a difficult and time consuming task. This results from the fact that these sys-
tems are often very large and complex and made up of black box components, the
internals of which are generally unknown (e.g. application servers). Performance
metrics required to populate performance models can thus not be easily obtained.
For enterprise applications accurate performance metrics can often only be obtained
through performance testing of a running system [58]. Recently Liu et al. [76] have
used a combination of performance modelling and benchmarking techniques to al-
low for population of performance models of enterprise applications. Their initial
results give accurate performance prediction for the small sample systems. A draw-
back of this approach is the lack of tool support which would allow for this technique
to be easily reused. From our experiences with large software houses, it seems that
performance modelling of enterprise applications is rarely performed. Most opt for
performance testing using available performance testing tools.
Work in the area of performance testing, however, has been very much lacking [176]
and thus performance testing, especially in the case of large enterprise applications,
can be a difficult task [76] [161]. This comes from the fact that today’s performance
testing tools are quite limited, insofar as they generally focus on simply collecting
data from a running system (i.e. profiling) and presenting this data to the user. These
tools tend to focus on low level programming bugs and do not address many of the
issues that lead to poor system performance (e.g. design issues).
2.2 Component Based Software
2.2.1 Software Components
There are numerous definitions of what software components are or should be 1. To be
specific, for the purpose of this thesis, we use Szyperski’s definition of a software com-
ponent: "A software component is a unit of composition with contractually specified
interfaces and explicit context dependencies only. A software component can be de-
ployed independently and is subject to composition by third parties" [157]. Defining a
software component as a unit of composition simply means that the purpose of a com-
ponent is to be composed with other components. A component based application is
assembled from a set of collaborating components. To be able to compose components
into applications, each component must provide one or more interfaces which provide
a contract between the component and its environment. The interface clearly defines
which services the component provides and therefore defines its responsibility. Usu-
ally, software depends on a specific context, such as available database connections
or other system resources, or on other components that must be available for the
component to collaborate with. In order to support com-
posability of components, component dependencies need to be explicitly specified. A
component can be independently deployed, i.e. it is self-contained and changes to
the implementation of a component do not require changes to other components. Of
course, this is only true as long as the component interface remains compatible. Fi-
nally assemblers of component based applications, are not necessarily the developers
of the different components. That is, components can be deployed by third parties
and are intended to be reused.
This definition of a software component leaves many details open, for example, how
components interact, what language(s) can be used for their development, and for what
platform. Component frameworks further define the notion of a component by de-
tailing these issues.
1 Beyond Objects column of Software Development magazine, articles by Bertrand Meyer and
Clemens Szyperski, archived at http://www.ddj.com/
2.2.2 Component Frameworks
The key goal of component technology is independent deployment and assembly of
components. Component frameworks are the most important step for achieving this
aim [157]. They support components conforming to certain standards (or component
models) and allow instances of these components to be plugged into the component
framework. The component framework establishes environmental conditions for the
component instances and regulates the interaction between component instances. A
key contribution of component frameworks is partial enforcement of architectural
principles. By forcing component instances to perform certain tasks via mechanisms
under control of a component framework the component framework can enforce its
policies on the component instances. This approach helps prevent a number of classes
of subtle errors that can otherwise occur.
There are numerous component frameworks that exist today. Examples include EJB,
CCM, SOFA [140] and Fractal [37]. Each framework contains its own component
model, i.e. a set of features that components satisfy. Component models generally con-
form to either flat component models or hierarchical models. Flat component models
(e.g. EJB, CCM) define only primitive components whereby indivisible entities are di-
rectly implemented in a programming language. Hierarchical models (SOFA, Fractal)
also define composite components which are created via nesting of other components.
The research in this thesis aims at solving issues related to flat component models. In
particular we focus on EJB. EJB is part of a wider enterprise framework (Java Enter-
prise Edition) for building enterprise level applications (see section 2.3). EJB has been
selected since it is a well established technology that is currently used in industry to
develop enterprise applications. There is also a body of work detailing best practices
and bad practices for this technology (see sections 2.6 and 2.7). On the other hand the
hierarchical component models have mainly been used by the research community
and best practices in these areas are less well defined.
EJB is considered a contextual composition framework. Contextual composition
frameworks allow components to specify boundary conditions describing properties
that the runtime context must meet [166]. Composition performed by such frame-
works is based on the creation of contexts and placement of component instances
in the appropriate contexts. For example, a component framework for transactional
computation can be formed by supporting transactional attribution of components
(for example "this component's instances need to be used in a new transaction") and
transactional enactment at the boundary of contexts. This approach can be used to cre-
ate frameworks for any other properties such as security, load balancing, management
etc.
Any component instance in a context can potentially be accessed from outside its context.
The context, however, gets an opportunity to intercept all messages crossing the
context boundaries. This interception of instances (e.g. objects) inside a context remains
invisible to instances both external and internal to the context.
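A minimal sketch of this boundary interception idea, using a JDK dynamic proxy, is shown below. It is only an analogy for what an EJB or CCM container does far more elaborately (transactions, security, pooling and so on); the ContextBoundary class is invented for this example.

// Minimal sketch of boundary interception using a JDK dynamic proxy; an EJB or CCM
// container implements this idea far more elaborately (transactions, security, etc.).
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ContextBoundary {
    /** Wraps a component instance so every call crossing the boundary can be intercepted. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> componentInterface, T instance) {
        InvocationHandler handler = (proxy, method, args) -> {
            System.out.println("entering context: " + method.getName());    // e.g. begin transaction
            try {
                return method.invoke(instance, args);
            } finally {
                System.out.println("leaving context: " + method.getName()); // e.g. commit/rollback
            }
        };
        return (T) Proxy.newProxyInstance(
                componentInterface.getClassLoader(),
                new Class<?>[] { componentInterface },
                handler);
    }
}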
Current technology support for contextual composition includes Microsoft Transaction
Server (MTS)2, EJB containers, and CCM containers. We introduce the EJB technology
in the following sections. The run-time services we discuss in section 5.3 are
created as a result of contextual composition.
2 Microsoft Corporation. Microsoft Transaction Server Transactional Component Services.
http://www.microsoft.com/com/wpaper/revguide.asp.
2.3 The Java Enterprise Edition
The Java Enterprise Edition (JEE) is a component technology which defines a standard
(or a set of standards) for developing multi-tier enterprise applications. JEE, formerly
the Java 2 Enterprise Edition (J2EE), is an enterprise component framework for the
Java technology. The specification promotes a multi-tiered distributed architecture for
enterprise applications. Figure 2.1 shows a typical JEE architecture consisting of 4
main tiers: a client tier, a presentation or web tier, a business tier and an enterprise
information systems tier. JEE specifies different component types for implementing
the various enterprise application tiers. Naturally, clients reside in the client tier and
can be in the form of stand-alone Java applications or web browsers. In the following
subsections we detail each of the server-side tiers and give details on the components
that they can consist of.
Figure 2.1: Typical JEE Architecture
2.3.1 Web Tier
The JEE web tier provides a run-time environment (or container) for web components.
JEE web components are either servlets or pages created using the JavaServer Pages
(JSP) technology 3. Servlets are Java programming language classes that dynamically
process requests and construct responses. They allow for a combination of static and
dynamic content within the web pages. JSP pages are text-based documents that ex-
ecute as servlets but allow a more natural approach to creating the static content as
they integrate seamlessly in HTML pages. JSPs and Servlets execute in a web con-
tainer and can be accessed by clients over HTTP (e.g. a web browser). The servlet
filter technology is a standard JEE mechanism that can be applied to components in
the web tier to implement common pre and post-processing logic. It is discussed in
detail in section 4.2.5.1.
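As a hedged illustration of the servlet filter mechanism just mentioned, the sketch below implements a filter that performs simple pre- and post-processing around a request by timing it. It is not the COMPAS JEEM instrumentation described in Chapter 4, and such a filter would additionally have to be declared in the application's web.xml to take effect.

// Minimal servlet filter sketch showing the pre/post-processing hook mentioned above;
// it only measures request time and is not the COMPAS JEEM instrumentation.
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class TimingFilter implements Filter {

    public void init(FilterConfig config) throws ServletException { }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        long start = System.currentTimeMillis();      // pre-processing
        try {
            chain.doFilter(request, response);        // pass control to the next filter/servlet
        } finally {
            long elapsed = System.currentTimeMillis() - start;   // post-processing
            System.out.println("request served in " + elapsed + " ms");
        }
    }

    public void destroy() { }
}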
2.3.2 Business Tier
Enterprise Java Beans (EJBs) 4 are the business tier components and are used to handle
business logic. Business logic is logic that solves or meets the needs of a particular
business domain such as banking, retail, or finance for example. EJBs run in an EJB
container and often interact with a database in the EIS tier in order to process requests.
Clients of the EJBs can be either web components or stand alone applications. EJB
is the core of the JEE platform and provides a number of complex services such as
messaging, security, transactionality and persistence. These services are provided by
the EJB container to any EJB component that requests them. More details on the EJB
component model are given in section 2.4.
2.3.3 Enterprise Information System Tier
Enterprise information systems provide the information infrastructure critical to the
business processes of an enterprise. Examples of EISs include relational databases,
enterprise resource planning (ERP) systems, mainframe transaction processing sys-
tems, and legacy database systems. The JEE Connector architecture 5 defines a stan-
dard architecture for connecting the JEE platform to heterogeneous EIS systems. For
example a Java Database Connectivity (JDBC) Connector is a JEE Connector Archi-
tecture compliant connector that facilitates integration of databases with JEE appli-
cation servers. JDBC 6 is an API and specification to which application developers
and database driver vendors must adhere. Relational Database Management Systems
(RDBMS) vendors or third party vendors develop drivers which adhere to the JDBC
specification. Application developers make use of such drivers to communicate with
the vendors’ databases using the JDBC API. The main advantage of JDBC is that it
allows for portability and avoids vendor lock-in. Since all drivers must adhere to the
same specification, application developers can replace the driver that they are using
with another one without having to rewrite their application.
3Java Servlet Technology, http://java.sun.com/products/servlet/
4Enterprise Java Bean Technology, http://java.sun.com/products/ejb/docs.html
5Java Connector Architecture, http://java.sun.com/j2ee/connector/
6Java Database Connectivity Architecture, http://java.sun.com/products/jdbc/
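As a purely illustrative sketch of this portability, the fragment below uses only the standard JDBC API; the connection URL, credentials, table and column names are hypothetical placeholders, and only the URL (and the driver available on the classpath) would change when switching database vendors.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal sketch of vendor-neutral JDBC usage.
public class CustomerLookup {
    public static String findCustomerName(int customerId) throws SQLException {
        // Hypothetical vendor-specific URL; with older drivers the driver class
        // would first be loaded explicitly via Class.forName(...).
        Connection con = DriverManager.getConnection(
                "jdbc:somevendor://localhost:1521/appdb", "appUser", "secret");
        try {
            PreparedStatement ps = con.prepareStatement(
                    "SELECT name FROM customer WHERE id = ?");
            ps.setInt(1, customerId);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getString("name") : null;
        } finally {
            con.close(); // closing the connection also releases the statement and result set
        }
    }
}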
2.4 The Enterprise JavaBean Technology
The Enterprise Java Beans architecture is a component architecture for the develop-
ment and deployment of component based distributed applications. It is designed to
simplify and reduce the costs of the development and management processes of large-
scale, distributed applications. Applications built using this technology are capable
of being scalable, transactional, and multi-user secure. EJB provides the distributed
platform support and common services such as transactions, security, persistence and
lifecycle management. EJB also defines a flexible component model which allows for
components of different types that are suitable for specific tasks. Developers make use
of the different component types to implement the application business logic. Subse-
quently, EJBs are deployed and managed by EJB containers, as part of a JEE applica-
tion server. EJB containers provide middleware services and manage the EJB lifecycle
during runtime. These processes can be configured via XML documents, referred to
as EJB deployment descriptors. Physically EJB consists of two things [148]:
The specification7 which defines:
• The distinct ”EJB Roles” that are assumed by the component architecture.
• A component model
• A set of contracts: component-platform and component-client
A set of Java Interfaces:
• Components and application servers must conform to these interfaces. This al-
lows all conforming components to interoperate. Also the application server
can manage any components that conform to the interfaces.
2.4.1 The EJB Roles
The EJB specification defines the following roles which are assumed by the component
architecture:
• Enterprise Bean Provider: The enterprise bean provider is typically an applica-
tion domain expert. The bean provider develops the reusable enterprise beans
that typically implement business tasks or business entities.
7The Enterprise Java Bean Specification version 2.0, http://java.sun.com/products/ejb/docs.html
• Application Assembler: The Application Assembler combines enterprise beans
into larger deployable application units.
• Deployer: The Deployer takes the ejb-jar files produced by either the Bean
Provider or the Application Assembler and deploys the enterprise beans con-
tained in the ejb-jar files in a specific operational environment. The operational
environment includes the EJB Server and Container.
• EJB Server Provider and EJB Container Provider: The container provider supplies
the EJB container (as part of the application server). This is the runtime environment
in which the beans live. The container provides middleware services to the beans and
manages them. The server provider is the same as the container provider; Sun
has not yet differentiated between them.
• System Administrator: The system administrator is responsible for the upkeep
and monitoring of the deployed system and may make use of runtime monitor-
ing and management tools provided by the EJB server provider.
2.4.2 The EJB Component Model
EJB is built on top of object technology (Java). An EJB component consists of a busi-
ness interface, an implementation class, a home interface and configuration settings
(defined in an XML deployment descriptor). All of these, except for the deployment
descriptor, are Java artifacts (i.e. classes or interfaces).
The EJB implementation class contains the bean business logic written by the Enter-
prise Bean Provider. The EJB implementation class is a Java object that conforms to a
well defined interface and obeys certain rules. The interface it conforms to depends
on the bean type. The rules are necessary in order for the bean to be able to run in a
container.
Access to the implementation class can be obtained using the EJB home interface.
The home interface defines methods for creating, destroying and finding EJBs (i.e.
lifecycle methods). The home interface can either be local or remote. Local interfaces
allow access from clients within the same JVM whereas remote interfaces allow for
access from remote clients (e.g. on another JVM running on the same machine or on
a JVM running on a physically distributed machine). In fact an EJB component can
have both local and remote interfaces, however this is not recommended [161].
The bean implementation business methods are exposed through the business inter-
face. Similar to the home interface the business interface can be exposed locally or
remotely (or both).
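As an illustration of these artifacts, the following sketch shows what a remote home interface and the corresponding remote business interface might look like under the EJB 2.x APIs. The OrderService name and its method are hypothetical, and in practice each interface would reside in its own source file.

import java.rmi.RemoteException;
import javax.ejb.CreateException;
import javax.ejb.EJBHome;
import javax.ejb.EJBObject;

// Hypothetical remote home interface: exposes lifecycle (creation) methods.
public interface OrderServiceHome extends EJBHome {
    OrderService create() throws CreateException, RemoteException;
}

// Hypothetical remote business interface: exposes the bean's business methods.
interface OrderService extends EJBObject {
    double calculateTotal(int orderId) throws RemoteException;
}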
An EJB component also requires configuration settings for deployment. These settings
are defined in an XML deployment descriptor. The information in the deployment
descriptor details the different container services that are required by the EJB. For
example, a deployment descriptor can be used to declare how the container should
perform lifecycle management, persistence, transaction control, and security services.
EJB 2.0 defines three different kinds of enterprise beans, namely session beans, entity
beans and message-driven beans.
Session Beans: A session bean is an action bean that performs work for its client,
shielding the client from complexity by executing business tasks inside the server. A
session bean has only one client. When the client terminates, the session bean appears to
terminate and is no longer associated with the client. The life of a session bean spans
the length of the session (or conversation) between the bean and its client. Session
beans are not persistent and typically they do not survive application server crashes,
or machine crashes. They are in memory objects that live and die with their surround-
ing environments. Session beans hold conversations with clients. A conversation is an
interaction between a client and the bean. The two subtypes of session beans are state-
ful session beans and stateless session beans. Each is used to model different types of
conversations.
Stateful Session Beans: A stateful session bean is a bean that is designed to service
business processes that span multiple method requests or transactions. Stateful ses-
sion beans retain state on behalf of an individual client. If a stateful session bean’s
state is changed during a method invocation, that same state will be available to that
same client upon the following invocation.
Stateless Session Beans: A stateless session bean is a bean that holds conversations
that span a single method call. They are stateless because they do not hold multi-
method conversations with their clients. Except during method invocation, all in-
stances of a stateless bean are equivalent, allowing the EJB container to assign an in-
stance to any client. Because stateless session beans can support multiple clients, they
can offer better scalability for applications that require a large number of clients. Typ-
ically, an application requires fewer stateless session beans than stateful session beans
to support the same number of clients.
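The following is a minimal sketch of what a stateless session bean implementation class might look like under the EJB 2.x component model; the class and method names are hypothetical and the business logic is omitted.

import javax.ejb.SessionBean;
import javax.ejb.SessionContext;

// Hypothetical stateless session bean implementation class. It holds no
// client-specific state between invocations, so the container may hand any
// pooled instance to any client.
public class OrderServiceBean implements SessionBean {

    // Business method exposed to clients through the component interface
    // (and the container-generated EJBObject).
    public double calculateTotal(int orderId) {
        // ... fetch the order and sum its line items; omitted for brevity
        return 0.0;
    }

    // Container callbacks required by the SessionBean contract.
    public void ejbCreate() {}
    public void ejbRemove() {}
    public void ejbActivate() {}   // not used for stateless beans
    public void ejbPassivate() {}  // not used for stateless beans
    public void setSessionContext(SessionContext ctx) {}
}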
Entity Beans: Entity beans are persistent data components. Entity beans are enterprise
beans that know how to persist themselves permanently to a durable storage (e.g. a
database). They are physical, storable parts of an enterprise. Entity beans differ from
session beans in a number of ways. They are persistent, and allow shared access. They
have a unique identifier, enabling a client to identify a particular entity bean. Entity
beans can also persist in relationships with other entity beans. Entity beans can be per-
sisted in two ways, either using Bean-Managed Persistence, or Container-Managed
Persistence. Container-Managed persistent beans are the simplest for the bean de-
veloper to create. All logic for synchronizing the bean’s state with the database is
handled automatically by the container. Thus, the beans do not contain any database
access calls, and as a result the bean’s code is not tied to a specific persistent storage
mechanism (database). A Bean-managed persistent entity bean is an entity bean that
must be persisted by hand. The component developer must write code to translate
the in-memory fields into an underlying data store.
Message-driven Beans: A message-driven bean is an enterprise bean that allows EJB
applications to process messages asynchronously. They rely on the Java Message Ser-
vice (JMS) technology 8. Message-driven beans act as JMS message listeners. The
messages may be sent by any JEE component: an application client, another enter-
prise bean, a Web component or by a JMS application or system that does not use JEE
technology. A message-driven bean does not have component interfaces. The com-
ponent interfaces are absent because the message-driven bean is not accessible via the
Java RMI API; it responds only to asynchronous messages. One of the most impor-
tant aspects of message-driven beans is that they can consume and process messages
concurrently, because numerous instances of the MDB can execute concurrently in
the container. This capability provides a significant advantage over traditional JMS
clients.
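A hedged sketch of a message-driven bean under the EJB 2.x APIs is shown below; the bean name and message handling are hypothetical, and the JMS destination it listens on would be declared in the deployment descriptor.

import javax.ejb.MessageDrivenBean;
import javax.ejb.MessageDrivenContext;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Hypothetical message-driven bean. The container delivers JMS messages to
// onMessage(); several instances may process messages concurrently.
public class OrderEventBean implements MessageDrivenBean, MessageListener {

    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                // ... process the asynchronous request; omitted for brevity
            }
        } catch (JMSException e) {
            // a real bean would log the error or cause the transaction to roll back
        }
    }

    // Container callbacks required by the MessageDrivenBean contract.
    public void ejbCreate() {}
    public void ejbRemove() {}
    public void setMessageDrivenContext(MessageDrivenContext ctx) {}
}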
As discussed in section 1.4 we have not applied our research to asynchronous components.
This is a direct result of the fact that our run-time path tracing approach (see
section 4.2) cannot currently be used to monitor message-driven beans. Our plans
for future work suggest how this problem may be addressed (see section 8.3).
2.4.3 EJB Runtime
An EJB component contains a bean implementation class, a business interface, a home
interface and an XML deployment descriptor all of which are supplied by the bean
provider. To integrate the component into the container environment the container
automatically generates ”glue-code” that allows for the component to implicitly make
use of the container services. In fact, enterprise beans are not fully-fledged remote ob-
jects. When a client accesses an EJB, the client never invokes the methods directly on
the actual bean instance. Instead, the invocation is intercepted by the EJB container
and delegated to the bean instance. The interception is performed by the EJBObject.
The EJBObject is generated by the container (either during deployment or at run-time)
and provides the enterprise bean with networking capabilities and container services
such as transactions and security. The EJBObject replicates and exposes every business
method that the bean exposes. It is generated from the business interface supplied by
the bean provider. Similarly an EJBHome object is generated from the home interface.
The EJBHome object exposes the same methods as this interface and acts as a factory
object for EJBObjects. That is, the EJBHome Object is responsible for creating and de-
stroying EJBObjects. In order to understand how the various component constituents
work together we give an example of the various steps that are performed by a client
8Java Message Service, from Sun Microsystems: http://java.sun.com/products/jms/
and by the container when a bean is invoked.
To create an instance of an EJB a client must first obtain an instance of an EJBHome
object, generated by the container. The EJBHome object is bound to the component name
and available at run-time through the system’s naming directory, accessed through the
Java Naming and Directory Interface (JNDI)9. Thus to invoke an EJB a client performs
the following steps (see figure 2.2):
(1) The client first obtains a reference to the EJBHome object that the container has
generated. The reference is looked up in the system naming directory via JNDI. The
client then calls the required creation method on the home object.
(2) The EJBHome object instructs the container to create a new instance, or retrieve an
existing instance, of the component, and returns it to the client. The actual Java object
returned is an instance of the container-generated EJBObject class that corresponds to
the bean's component interface.
(3) The client invokes the business method on the returned EJBObject, transparently,
through the component interface. The EJBObject performs the required container services
(specified in the XML deployment descriptor) and calls the corresponding business
method on the bean's implementation object, an instance of the bean provider's bean
class.
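A minimal sketch of these three steps from a remote client's perspective is given below. It reuses the hypothetical OrderService interfaces sketched in section 2.4.2; the JNDI name is likewise illustrative and would in practice be taken from the deployment configuration.

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;

// Hypothetical remote client performing the three steps described above.
public class OrderClient {
    public static void main(String[] args) throws Exception {
        // Step 1: look up the container-generated EJBHome object via JNDI.
        Context ctx = new InitialContext();
        Object ref = ctx.lookup("ejb/OrderService");
        OrderServiceHome home = (OrderServiceHome)
                PortableRemoteObject.narrow(ref, OrderServiceHome.class);

        // Step 2: ask the home object for a reference to an EJBObject.
        OrderService service = home.create();

        // Step 3: invoke the business method through the component interface;
        // the EJBObject applies the configured container services and then
        // delegates to the bean implementation instance.
        double total = service.calculateTotal(42);
        System.out.println("Order total: " + total);
    }
}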
Figure 2.2: Client Invoking an EJB
9Java Naming and Directory Interface (JNDI), http://java.sun.com/products/jndi/
Figure 2.3: Example EJB Deployment Descriptor
2.4.4 Deployment Settings
As shown in figure 2.2 the container generated EJBObject intercepts and delegates
all calls to the bean implementation. The EJBObject supplies the bean implementation
with any required services as specified in the deployment descriptor. Figure 2.3 shows
an extract from a deployment descriptor which specifies transactional attributes for a
bean's methods. Such settings can have a major impact on the system performance 10
and should be carefully considered.
The container is also responsible for the management of the bean’s life cycle events.
10Performance Tuning EJB Applications - Part I by Mihir Kulkarni, February 2005,
http://dev2dev.bea.com/pub/a/2005/02/perf tune session beans.html
Figure 2.4: Stateless Session Bean Lifecycle
The management of an EJB’s lifecycle is a complex process and differs from bean type
to bean type. Factors which influence the bean lifecycle management include the load
in the system and the container configuration settings 11. Figure 2.4 illustrates the
lifecycle of a stateless session bean.
When the container starts the application it creates a pool of bean instances. The initial
pool size can be set in the container configuration settings. If a bean
instance is required by a client, an instance is assigned from the bean pool. If no in-
stances are available the container can create more instances until the pool has reached
its maximum size (which is also configurable). If the bean pool has already reached
its maximum size and there are still no instances available the client will be put in a
queue until an instance becomes available. The pool configuration settings can have
a major impact on the system performance and should be tuned according to the ex-
pected load on the system.
The lifecycles of a stateful session bean and of an entity bean are similar to, but slightly
more complicated than, that of the stateless session bean (since they can both be passivated).
More details on these lifecycles can be found in the literature [148]. It is sufficient to
say, for the purposes of this thesis, that the configuration settings relating to EJB
lifecycles can have a major impact on the system performance and need to be carefully
considered.
2.5 Software Architecture
A large number of definitions exist for the term software architecture 12. One of the most
cited definitions is by Bass et al. [21] and states that:
”The software architecture of a program or computing system is the structure or struc-
tures of the system, which comprise software elements, the externally visible proper-
ties of those elements, and the relationships among them.”
Bass et al. [21] also outline a number of implications of this definition: Firstly, a soft-
ware architecture is essentially an abstraction since it embodies information about the
relationship between elements, and externally visible properties that are exposed to
other elements, but it omits internal element information or information that does not
pertain to the elements’ interactions. Secondly, the definition makes it evident that
systems can and do consist of more than one structure. Thirdly, it is implied by the
definition that every software system has an architecture, since every system can be
shown to be composed of elements and the relationships between them. Fourthly, the exter-
nal behaviour of each element is part of the architecture and, finally, the definition is
indifferent as to whether the architecture for a system is a good or a bad one.
11PreciseJava, http://www.precisejava.com/
12Software Engineering Institute, Carnegie Mellon, list of software architecture definitions,
http://www.sei.cmu.edu/architecture/definitions.html
A software architecture is important for a number of reasons. Firstly it becomes a
vehicle for communication among the system’s stakeholders [21]. System stakehold-
ers are those concerned with the system (e.g. users, customers, software developers,
management etc.). A software architecture is a common abstraction of the system
and can serve as a lingua franca, i.e. an intermediate language that all stakeholders
can use to discuss various aspects of the system. The different stakeholders of the
system are often concerned with different system characteristics. An architecture pro-
vides a common language in which these different concerns can be expressed. Since
stakeholders can be interested in different system characteristics it is important for
the architecture to provide different views [51] that let them consider the architecture
from different perspectives. For example, a functional view might contain an abstrac-
tion of the different system functions and their relations whereas a code view may
give an abstraction of the code in terms of objects or classes (or higher level subsys-
tems or modules) and their relationships. Different stakeholders make use of different
views to analyse the architecture according to their needs. Typical views include a
functional view, a concurrency view, a code view, a physical view etc [52]. Kruchten
[104] introduced the 4+1 view model to describe a software architecture using five
concurrent views. Views are essentially a mechanism that allow for the separation of
concerns within the architecture allowing for the analysis of the architecture from dif-
ferent perspectives. Architecture description languages (ADLs) [50] can be utilised to
describe a software architecture. There have been many attempts to design such lan-
guages; however, while some have been employed in real-world projects, none have
been widely adopted [21]. The literature [115] provides a comparison of ADLs.
Another important reason for system architecture is that it creates a realisation of early
design decisions and allows for system architects to analyse the suitability of these de-
cisions in relation to the system requirements (e.g. performance, security, flexibility)
[21]. These early design decisions manifested in the system architecture can not only
impact the quality attributes of the system but can also place constraints on the actual
system implementation i.e. some technologies may be more suitable for particular
architectures. The initial architecture can even have an impact on the organisational
structure of the team (or teams) building the application [21]. One of the earliest de-
sign decisions is often to choose a suitable architectural style. An architectural style
defines a vocabulary of components (e.g. clients, servers, databases) and connector
types (e.g. procedure calls, database protocols), and a set of constraints on how they
can be combined [152]. Architectural styles are found repeatedly in practice to address
similar sets of demands.
Finally, software architectures are also reusable assets that can be applied repeatedly
to other systems exhibiting similar requirements [21].
2.6 Software Patterns
The current use of the term pattern in software engineering is derived from work by
Christopher Alexander [168] in the field of contemporary architecture. Alexander’s
notion of a pattern was adopted by a number of software engineering researchers [26]
[71] and became popular in this field mainly after work published by Gamma et al.
[72]. Gabriel 13 gives the following definition of a pattern: ”Each pattern is a three-part
rule, which expresses a relation between a certain context, a certain system of forces
which occurs repeatedly in that context, and a certain software configuration which
allows these forces to resolve themselves.” This definition is consistent with Alexan-
der’s original definition [168] which states that a ”pattern is a three part rule which
expresses a relation between a certain context, a problem and a solution.” Alexander
expands his definition to say that a problem relates to a certain system of forces which
occurs repeatedly in a context and that the problem solution can be considered as a
certain configuration which allows these forces to resolve themselves. While patterns
have been documented for a number of different domains (such as patterns for con-
temporary architecture [168] or organisational patterns [54]) we are mainly interested
in software patterns. Software patterns are usually documented according to a pattern
template. Common templates for describing patterns include the Alexandrian form
[168] and the GoF form [72]. A given template contains a number of elements that
describe the pattern e.g. name, problem, context, forces, solution, examples, resulting
context, rationale, related patterns and known uses 14.
Buschmann et al. [42] document a number of properties or benefits of patterns. While
they focus on patterns for software architecture many of the properties hold for soft-
ware patterns in general e.g.:
• A pattern addresses a recurring problem that arises in specific situations, and
presents a solution to it [42].
• Patterns document existing, well proven experience. That is, they document
solutions learned through experience and avoid the need for less experienced
developers to ”reinvent the wheel” time and time again [72].
• Patterns provide a common vocabulary and understanding for design principles
[72]. Similar to the way a software architecture can serve as a vehicle for com-
munication (see section 2.5 above), pattern names can become part of a design
language and can act as a lingua franca facilitating discussion of design issues
and their solutions [42].
• Patterns support the construction of software with defined properties [42]. Pat-
terns assist developers in meeting both functional and non-functional require-
13The Hillside Group, Pattern Definitions, http://www.hillside.net/patterns/definition.html
14Patterns and Software: Essential Concepts and Terminology, by Brad Appleton,
http://www.cmcrossroads.com/bradapp/docs/patterns-intro.html
ments, since they can provide a skeleton of functional behaviour while at the
same time explicitly addressing non-functional requirements, e.g. reusability,
maintainability etc.
Software patterns can be documented at various levels of abstraction. For example,
Buschmann et al. [42] discuss patterns at three different levels of abstraction, i.e.,
architectural patterns, design patterns and coding patterns or idioms. Architectural
level patterns are concerned with system structure. They describe predefined sets of
subsystems, specify their responsibilities and include rules and guidelines for organ-
ising the relationships between them. Design patterns on the other hand tend to be at
the level of objects and classes (or micro-architectures) and are used for refining sub-
systems or components of a software system. In the literature [72] they are defined
as ”descriptions of communicating objects and classes that are customized to solve a
general design problem in a particular context.” Eden and Kazman have also distin-
guished between architecture and design stating that architecture is concerned with
non-local issues whereas design is concerned with local issues [62]. Coding patterns
or idioms are lower level patterns specific to a programming language [53].
Since their introduction in the area of object oriented software development [72], pat-
terns have been documented for a range of systems and technologies 15. For example,
pattern catalogs exist in areas such as enterprise systems [67] [93], embedded systems
[142], and telecommunication systems [171], to name but a few. Many technology-specific
patterns (or idioms) also exist (e.g. for Java [80], Ajax [110] and Microsoft technolo-
gies 16). In fact pattern catalogs even exist with particular quality attributes in mind
(e.g. security [150], performance [154]). Alur et al. [7] provide a catalog of patterns for
the JEE technology which document best practices for the design and implementation
of JEE applications. Other literature in this area also exists [113] 17. The design of a
JEE application plays a major role in the overall system performance. For example, it
has previously been shown how the system design can influence a JEE system’s scal-
ability [43]. In fact, it is well known, and recent reports 18 19 also indicate, that poor
system design is a major reason why JEE systems often fail to meet performance
requirements. Another reason why poor software design is particularly undesirable
is that, unlike lower level software bugs, for example, poor design can
be very difficult to rectify late in development and as such can lead to major
project delays. Software design best practices documented in the form of patterns can
be used to help avoid design issues when developing JEE applications.
15Handbook of Software Architecture, http://www.booch.com/architecture/index.jsp
16Enterprise Solution Patterns Using Microsoft .NET, http://msdn2.microsoft.com/en-
us/library/ms998469.aspx
17The Server Side Pattern Repository, http://www.theserverside.com/patterns/index.tss,
18Ptak, Noel and Associates, ”The State of J2EE Application Management: Analysis of 2005 Benchmark
Survey”, http://www.ptaknoelassociates.com/members/J2EEBenchmarkSurvey2005.pdf,
19Jasmine Noel, ”J2EE Lessons Learned ”, SoftwareMag.com, The Software IT Journal, January, 2006.
http://www.softwaremag.com/L.cfm?doc=2006-01/2006-01j2ee
2.7 Software Antipatterns
Antipatterns, first suggested by Koenig [101], have been defined by Brown et al. [36]
as: ”a literary form that describes a commonly occurring solution to a problem that
generates decidedly negative consequences.” The authors [36] also state that when
documented properly an ”antipattern describes a general form, the primary causes
which led to the general form; symptoms describing how to recognize the general
form; the consequences of the general form; and a refactored solution describing how
to change the antipattern into a healthier situation.” Software design antipatterns thus
provide the opportunity for developers to learn from past experiences. They docu-
ment software design mistakes that tend to consistently reoccur. However, as well
as documenting the mistake, antipatterns also document the corresponding solution.
Thus they allow developers to identify design issues in their system, and to rectify
these issues with the corresponding solution provided in the antipattern description.
Antipatterns are complementary to software patterns and often show situations where
patterns are misused. In fact, as technologies evolve, patterns can often become stale
[60], i.e., what was once a best practice can in some instances become a bad practice.
Examples from the JEE technology include caching in the Service Locator pat-
tern [7], which was recommended for J2EE 1.2 but is not recommended for J2EE 1.3
20. Another example is the Composite Entity pattern [7] which has become obsolete
since EJB version 2.x [61]. Figure 2.5 [36] shows the relationship between patterns and
antipatterns.
Software design antipatterns, like software design patterns, have been documented
at a number of different levels. For example Brown et al. [36] introduced a num-
ber of technology independent object oriented development antipatterns (as well as
higher level architectural and management antipatterns). Technology specific antipat-
terns have also been documented (e.g. Java [160], J2EE [161] [61]). Antipatterns for
systems built on service oriented architectures (SOA) have also been recently doc-
umented 21. As with software design patterns, some antipattern catalogs focus on
particular software quality attributes only. For example Smith and Williams have pre-
sented a number of performance related antipatterns [154], while Kis has presented
antipatterns focusing on security [99]. For the purpose of this thesis we focus mainly
on performance related antipatterns for enterprise systems. In particular we focus on
performance antipatterns related to design and deployment for JEE applications.
Similar to software design antipatterns, Fowler and Beck introduced the notion of
code smells [68]. Code smells are lower level symptoms of problems at the code level.
20B. Woolf. IBM WebSphere Developer Technical Journal: Eliminate caching in service locator
implementations in J2EE 1.3.,
http://www-128.ibm.com/developerworks/websphere/techjournal/0410 woolf/0410 woolf.html,
October 2004
21SOA antipatterns,Jenny Ang, Luba Cherbakov and Mamdouh Ibrahim, November 2005,
http://www-128.ibm.com/developerworks/webservices/library/ws-antipatterns/
Figure 2.5: Patterns, Antipatterns and their Relationship
They may not necessarily be a problem, but often their presence in the code indicates
that problems exist. Catalogs of code smells are available in the literature 22 23.
2.8 Performance Tools
When design issues lead to poor performance, developers require testing tools to iden-
tify why the system is performing poorly. Developers use performance testing tools
to try to understand system behaviour and to discover how their system makes use of
the system resources. Application level performance testing tools fall into two main
categories, i.e. workload generation tools and performance profilers.
2.8.1 Workload Generation
In order to evaluate the performance characteristics of an application under devel-
opment a realistic workload is required to mimic how the system would be utilised
by clients in a production environment. To achieve this a synthetic workload can be
automatically generated using a workload generator. Workload generators fall into
two main categories, traced based approaches and analytical approaches [17]. Traced
based approaches make use of server log files to characterise the workload of an ap-
plication, whereas analytical approaches are based on mathematical models which are
22A Taxonomy of Code Smells, http://www.soberit.hut.fi/mmantyla/BadCodeSmellsTaxonomy.htm
23Smells within Classes, http://wiki.java.net/bin/view/People/SmellsToRefactorings
usually based on statistical methods [124]. There are advantages and disadvantages
associated with both approaches. For example, trace-based approaches are consid-
ered relatively easy to implement and are based on activity from a known system.
However, a disadvantage is that this approach treats the workload as a
black box and, as such, insight into the workload characteristics can be difficult to ob-
tain. Also, it can be difficult to modify the workload to simulate future or alternative
conditions. Furthermore, during development, realistic logs may not be available on
which to base the trace-based workload generation. Analytical approaches, on the other
hand can be used to create synthetic workloads and do not suffer from the drawbacks
outlined above. However they can be more difficult to construct as an understanding
of the characteristics of the expected workload is required. The most commonly used
workload generators are analytically based tools such as Apache's
JMeter 24 or Mercury LoadRunner 25. The literature [136] gives a representative subset
of the workload generators currently available in the open literature.
2.8.2 Profiling Tools
Next we explain what we mean by the term profiling and discuss the different ways
and levels of granularity in which profiling information can be collected (see section
2.8.2.1). We also give an overview of the different categories of profilers that are avail-
able for the Java technology (see section 2.8.2.2).
Profiling [169] is the ability to monitor and trace events that occur during run time.
This includes the ability to track the cost of these events, as well as the ability to at-
tribute the cost of the events to specific parts of the program. A profiler, for example,
may obtain information about what part of the program consumes the most CPU time,
or about the parts of the program which allocate the most amount of memory. Perfor-
mance profilers can be used in conjunction with load generators to monitor a running
system and obtain the required information for performance analysis. Profilers are
often described as either exact profilers or sampling-based profilers [135] [30]. Exact
profiling, also referred to as full profiling [64] or full instrumentation 26, captures
all events of a given type that are produced during program execution (e.g. method
invocations). Sampling based profilers on the other hand select a part of the entire
event population with the aim of determining the characteristics of the whole pro-
gram. Sampling usually involves selecting a subset of events for profiling based on
certain criteria (e.g. hot paths) or time intervals [64]. Exact profiling has the advantage
of being more precise than sampling but carries a higher performance overhead.
24Jakarta Apache JMeter http://jakarta.apache.org/jmeter/index.html.
25Mercury Loadrunner. http://mercury.com
26http://profiler.netbeans.org/docs/help/5.5/custom instrumetation
2.8.2.1 Recording Information
Regardless of the profiling approach however, information must be recorded by the
profiling tool. Performance metrics can be recorded at different levels of granular-
ity. At the lowest level hardware counters can be utilised to obtain performance met-
rics from the underlying hardware on which the program executes [8] [10] [156] [88].
Hardware counters can be utilised to record events, such as instructions executed, cy-
cles executed, pipeline stalls, cache misses, etc. One of the main advantages of using
hardware counters is that an application can be profiled without the need to modify or
instrument it. Also the overhead associated with using hardware counters for profil-
ing is generally quite low [10]. A disadvantage of hardware counters is that they rely
on platform specific features and thus they are generally not portable across different
hardware. Another issue with this type of profiling is that the information may be
too low level for higher level program analysis. Information such as virtual memory
management requests, and signals caused by segmentation violations can be obtained
at the operating system (OS) level [88]. OS level information can be recorded by sys-
tem level tools 27 or libraries 28. In situations where hardware counters or OS level
information is unavailable, or the information they produce is undesirable, informa-
tion can be obtained at a higher level. For today’s enterprise Java applications such
information can be recorded at a number of different levels i.e. at the JVM level, the
middleware level or the application level.
JVM level information is generally recorded by either instrumenting the JVM or by
using an agent-based approach that requests notification of events from the virtual
machine. The former very often requires access to the JVM source code such that it
can be modified to record the information required [12] [33]. A disadvantage of this
approach is that it requires an understanding of the complex JVM internals. Also this
approach generally ties the user to a particular JVM and is thus not portable. One of
the main advantages of instrumenting the JVM is that access to JVM level information
is not restricted as with the agent-based approaches.
Agent based approaches have been made popular through standard interfaces that
allow for a profiler agent to request performance related information from a running
JVM. The Java Virtual Machine Profiler Interface (JVMPI) [169] is an example of such
an interface (see figure 2.6). The JVMPI is a two-way function call interface between
the JVM and an in-process profiler agent. The profiler agent is responsible for commu-
nication between the JVM and the profiler front end. The profiler agent can register
with the JVM to be notified when particular events occur and upon notification can
call back into the JVM to obtain additional information. For example a notification
may be received when a method is entered (or exited) and a call back may be made to
27Performance Monitoring Tools for Linux, David Gavin Jan, 1998,
http://www.linuxjournal.com/article/2396
28Windows Management Instrumentation, http://www.microsoft.com/whdc/system/pnppwr/wmi/default.mspx
Figure 2.6: JVMPI Architecture
obtain the current stack trace at this point. The main advantage of standard interfaces
is that they are implemented by different JVM vendors. While the JVMPI was an ex-
perimental interface for Java 1.2, it was implemented by most JVM vendors and effec-
tively became standard. Thus profilers built using JVMPI are portable across different
JVM implementations. A disadvantage of standard interfaces is that they are fixed
interfaces and, as such, can only enable predefined types of profiling [135] or event
notifications. Another major issue with JVMPI in particular was that when using a
JVMPI agent the JVM could not run at full speed and was required to run in a debug-
ging mode. As such this profiling approach was not generally suitable for production
systems. Another major drawback of the JVMPI approach was that notifications could
not be tailored to profile selectively. If, for example, the profiler agent requested to be
notified on method entry events, all method entry events would be reported to the
agent. This led to high-overhead profiling tools. The Java Virtual Machine Tools
Interface (JVMTI) 29 will replace JVMPI in Java 1.6 (the JVMPI is currently available
in Java 1.5). While at an architectural level the JVMTI looks similar to the JVMPI (i.e.
it also consists of call back functions and a profiler agent) it is quite different and im-
proves upon many of the limitations of JVMPI. Firstly, it allows for the JVM to run
at full speed and does not require it to run in debug mode. It also promotes the use
of bytecode instrumentation for many of the event based capabilities of the JVMPI.
Using bytecode instrumentation one can be more selective when profiling the appli-
cation and can instrument only the parts of the application that require analysis. This
avoids the ”all or nothing” approach of the JVMPI and thus reduces the profiler over-
head. In fact the JVMTI allows for dynamic bytecode instrumentation 30, which means
that the application can be instrumented as it runs. An issue that remains, however,
with both JVMPI and JVMTI is that they are native interfaces and while the profil-
ing agents (which must be written in native code) are portable across different JVMs
29The Java Virtual Machine Tools Interface, http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html
30Ian Formanek and Gregg Sporar, Dynamic Bytecode Instrumentation A new way to profile Java
applications, December 15, 2005, http://www.ddj.com/dept/java/184406433
they are not portable across different platforms. The java.lang.instrument API 31
is another standard mechanism (as of Java 1.5) which allows for the interception of the
JVM classloading process through a non-native agent. Since the agent is non-native
it is portable across different platforms. java.lang.instrument allows the agent to
monitor the classloading process and to instrument the classes such that they can call
back into the agent libraries.
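A minimal sketch of such an agent is given below, assuming it is packaged in a jar whose manifest names ProfilerAgent as the Premain-Class and that it is started with the -javaagent option; the class name and the (empty) transformation are purely illustrative.

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical java.lang.instrument agent. The premain method runs before the
// application's main method and registers a transformer that is offered every
// class as it is loaded.
public class ProfilerAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                // A real profiler would rewrite the bytecode here, for example to
                // insert calls back into the agent's recording library. Returning
                // null tells the JVM to load the class unchanged.
                return null;
            }
        });
    }
}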
Recording performance information for Java applications at the middleware level can
also be achieved through standard mechanisms. Java Management Extensions (JMX)
technology 32 is a standard technology that allows for the management of Java re-
sources through so called MBeans. MBeans, also known as managed beans, are Java
objects that are used to represent and manage JMX resources. A JMX resource can be
any application, device or Java object. In order to manage an MBean it must be reg-
istered with a JMX agent. JMX agents directly control registered MBeans and make
them available to remote management applications. While the JMX technology can
potentially be used to manage a wide range of different resources it has been heavily
used, in particular, to manage the resources of JEE application servers. In fact, ac-
cording to the JEE Management specification (Java Service Request 77 33) application
servers are required to expose this data through the JMX technology. Profilers built us-
ing JMX can collect data on the state of the different system resources (e.g. object pool
sizes, thread queues, database connectivity information) and because JMX is standard
they are portable across the different application server implementations.
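As an illustration of the MBean mechanism, the sketch below registers a hypothetical standard MBean with the platform MBean server; the interface, class, attribute and object name are all invented for this example and are not part of any application server's management model.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical standard MBean exposing a simple performance counter. The
// *MBean interface naming convention is mandated by the JMX specification.
interface RequestStatsMBean {
    long getRequestCount();
}

public class RequestStats implements RequestStatsMBean {
    private volatile long requestCount;

    public void recordRequest() { requestCount++; }        // called by the application
    public long getRequestCount() { return requestCount; } // read by management tools

    public static void main(String[] args) throws Exception {
        // Register the MBean so that remote management applications (or a
        // profiler) can read its attributes, e.g. via a JMX connector.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        RequestStats stats = new RequestStats();
        server.registerMBean(stats, new ObjectName("example:type=RequestStats"));

        stats.recordRequest();
        System.out.println(server.getAttribute(
                new ObjectName("example:type=RequestStats"), "RequestCount"));
    }
}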
Non-standard hooks or mechanisms have also been used to collect information at the
middleware level. Often middleware vendors provide these mechanisms to enhance
the capabilities of their products. For example, IBM provides non standard features
for the Websphere application server in the form of the Performance Monitoring In-
frastructure (PMI). PMI is available for the Websphere application server and allows
for the collection of performance information on the server resources. The informa-
tion can be exposed to performance profiling tools through a number of different in-
terfaces 34. The main issue with non-standard features that allow for the collection
of performance information is that they are not portable across different vendors' im-
plementations of the middleware, and thus can result in vendor lock-in.
Where the information required is not available through standard or non-standard
features the middleware itself can be the subject of instrumentation. This can be
achieved by manually modifying the source code if it is available [45] [105]. However
31J2SE 5.0 in a Nutshell, Calvin Austin, May 2004,http://java.sun.com/developer/technicalArticles/releases/j2se15,
32Java Management Extensions Technology,http://java.sun.com/javase/technologies/core/mntr-
mgmt/javamanagement/
33Java Service Request 77, J2EE management, http://www.jcp.org/en/jsr/detail?id=77
34Srini Rangaswamy, Ruth Willenborg and Wenjian Qiao, IBM WebSphere De-
veloper Technical Journal: Writing a Performance Monitoring Tool Using Web-
Sphere Application Server’s Performance Monitoring Infrastructure API, 13 Feb 2002,
http://www.ibm.com/developerworks/websphere/techjournal/0202 rangaswamy/rangaswamy.html
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems
Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems

  • 1. Automatic Detection of Performance Design and Deployment Antipatterns in Component Based Enterprise Systems by Trevor Parsons The thesis is submitted to University College Dublin for the degree of PhD in the College of Engineering, Mathematical and Physical Sciences. November 2007 School of Computer Science and Informatics Dr. J. Carthy. (Head of Department) Under the supervision of Dr. J. Murphy
  • 2. CONTENTS Abstract vii Acknowledgements ix List of Publications xi 1 Introduction 1 1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Thesis Contributions and Statement . . . . . . . . . . . . . . . . . . . . . 5 1.4 Key Assumptions and Scope . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Background 8 2.1 Performance of Software Systems . . . . . . . . . . . . . . . . . . . . . . 10 2.2 Component Based Software . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.1 Software Components . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.2 Component Frameworks . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 The Java Enterprise Edition . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.1 Web Tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.2 Business Tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.3.3 Enterprise Information System Tier . . . . . . . . . . . . . . . . . 14 2.4 The Enterprise JavaBean Technology . . . . . . . . . . . . . . . . . . . . 15 2.4.1 The EJB Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.4.2 The EJB Component Model . . . . . . . . . . . . . . . . . . . . . 16 2.4.3 EJB Runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.4.4 Deployment Settings . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.5 Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.6 Software Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.7 Software Antipatterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.8 Performance Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.8.1 Workload Generation . . . . . . . . . . . . . . . . . . . . . . . . . 27 2.8.2 Profiling Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.9 Reverse Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 2.10 Design Pattern and Antipattern Detection . . . . . . . . . . . . . . . . . 36 2.11 Knowledge Discovery in Databases and Data Mining . . . . . . . . . . . 39 2.11.1 Frequent Pattern Mining and Clustering . . . . . . . . . . . . . . 40 i
  • 3. 3 Overview of Approach 42 3.1 Approach for the Automatic Detection of Performance Antipatterns . . 44 3.1.1 Research Methodology and Validation Criteria . . . . . . . . . . 45 3.1.2 Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.1.3 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.1.4 Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.2 Antipatterns Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.2.1 Antipattern Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . 51 4 Monitoring Required for Antipattern Detection 54 4.1 Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.2 Run-Time Path Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.2.1 Run-Time Paths Overview . . . . . . . . . . . . . . . . . . . . . . 57 4.2.2 Run-Time Path Tracing Motivation . . . . . . . . . . . . . . . . . 60 4.2.3 Run-Time Path Tracing Considerations . . . . . . . . . . . . . . 61 4.2.4 COMPAS Monitoring Framework . . . . . . . . . . . . . . . . . 62 4.2.5 COMPAS Extensions . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.3 Monitoring Server Resource Usage and Extracting Component Meta- Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.3.1 Using Java Management Extensions to Monitoring Server Re- source Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 4.3.2 Automatically Extracting Component Meta-Data . . . . . . . . . 80 4.4 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.4.1 Applications of Run-Time Paths . . . . . . . . . . . . . . . . . . . 82 4.4.2 Alternative Representations for Component Interactions . . . . 83 4.4.3 Run-Time Interaction Tracing Approaches . . . . . . . . . . . . . 84 5 Reconstructing the Systems Design for Antipattern Detection 88 5.1 Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 5.2 Automatically Extracting Component Relationships and Object Usage Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 5.3 Reconstructing Run-time Container Services . . . . . . . . . . . . . . . . 96 5.4 Identifying Component Communication Patterns in Run-Time Paths using Frequent Sequence Mining . . . . . . . . . . . . . . . . . . . . . . . 97 5.4.1 Frequent Itemset Mining and Frequent Sequence Mining . . . . 97 5.4.2 Support Counting for Run-Time Paths . . . . . . . . . . . . . . . 99 5.4.3 Further Criteria for Interestingness . . . . . . . . . . . . . . . . . 102 5.4.4 Preprocessing for FSM Performance Improvement . . . . . . . . 102 5.4.5 Closed Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 5.4.6 PostProcessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 5.4.7 Component Communication Information for the Extracted De- sign Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 5.5 Data Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.5.1 Clustering Run-time Paths . . . . . . . . . . . . . . . . . . . . . . 106 5.5.2 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.6 The Reconstructed Design Model . . . . . . . . . . . . . . . . . . . . . . 107 5.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.7.1 Reverse Engineering . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.7.2 Data Mining . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . 110 6 Detecting Performance Design and Deployment Antipatterns 112 ii
  • 4. 6.1 Antipatterns Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.2 A Rule Engine Approach for Antipattern Detection . . . . . . . . . . . . 114 6.3 Example Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 6.3.1 Antipattern Library . . . . . . . . . . . . . . . . . . . . . . . . . . 117 6.3.2 Filtering Using Threshold Values . . . . . . . . . . . . . . . . . . 118 6.4 PAD Tool User Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.5.1 Antipattern Categorisation . . . . . . . . . . . . . . . . . . . . . 119 6.5.2 Performance Testing . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.5.3 Detection Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.5.4 Antipattern Detection . . . . . . . . . . . . . . . . . . . . . . . . 121 7 Results and Evaluation 124 7.1 Chapter Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 7.2 COMPAS JEEM Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 7.2.1 Deducing System structure . . . . . . . . . . . . . . . . . . . . . 127 7.2.2 Portability Assessment . . . . . . . . . . . . . . . . . . . . . . . . 132 7.2.3 Performance Overhead . . . . . . . . . . . . . . . . . . . . . . . . 133 7.3 Analysis Module Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 7.3.1 FSM Performance Tests . . . . . . . . . . . . . . . . . . . . . . . . 135 7.3.2 Applying FSM to Identify Design Flaws . . . . . . . . . . . . . . 140 7.3.3 Data Reduction Results . . . . . . . . . . . . . . . . . . . . . . . . 142 7.4 PAD Tool Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 7.4.1 Antipatterns Detected in the Duke’s Bank Application . . . . . 144 7.4.2 Antipatterns Detected in the IBM Workplace Application - Beta Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 7.5 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 7.5.1 Overview of Contributions and Evaluation Criteria . . . . . . . 151 7.5.2 Validation of Contributions . . . . . . . . . . . . . . . . . . . . . 152 8 Conclusions 153 8.1 Thesis Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 8.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 References 171 A Antipattern Rule Library 172 A.1 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 A.1.1 Category1: Antipatterns Across or Within Run-Time Paths . . . 172 A.1.2 Category2: Inter-Component Relationship Antipatterns . . . . . 174 A.1.3 Category3: Antipatterns Related to Component Communica- tion Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176 A.1.4 Category4: Data Tracking Antipatterns . . . . . . . . . . . . . . 177 A.1.5 Category5: Pooling Antipatterns . . . . . . . . . . . . . . . . . . 177 A.1.6 Category6: Intra-Component Antipatterns . . . . . . . . . . . . 178 A.1.7 Adding Rules to The Rule Library . . . . . . . . . . . . . . . . . 178 A.2 Jess User Defined Functions provided by the PAD Tool . . . . . . . . . . 178 A.3 Configuration Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 B JEEM and FSM Implementation Source Code 180 iii
  • 5. LIST OF FIGURES 1.1 Typical Enterprise Architecture . . . . . . . . . . . . . . . . . . . . . . 3 2.1 Typical JEE Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Client Invoking an EJB . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3 Example EJB Deployment Descriptor . . . . . . . . . . . . . . . . . . . 20 2.4 Stateless Session Bean Lifecycle . . . . . . . . . . . . . . . . . . . . . . 21 2.5 Patterns, Antipatterns and their Relationship . . . . . . . . . . . . . . 27 2.6 JVMPI Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.7 The KDD Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.1 PAD Tool Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.2 Run-time Design Meta Model . . . . . . . . . . . . . . . . . . . . . . . 46 3.3 Hierarchy of Antipatterns . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.1 Dynamic Call Trace (a) and Corresponding Dynamic Call Tree (b) . . 58 4.2 Example Run-Time Path . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.3 Run-Time Path Data Structure . . . . . . . . . . . . . . . . . . . . . . . 60 4.4 A Run-Time Path’s PathNode Data Structure . . . . . . . . . . . . . . 60 4.5 COMPAS Probe Insertion Process . . . . . . . . . . . . . . . . . . . . . 65 4.6 COMPAS JEEM Architecture . . . . . . . . . . . . . . . . . . . . . . . . 66 4.7 Intercepting Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.8 Remote Method Invocation . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.9 The Sample Bean’s Home Interface . . . . . . . . . . . . . . . . . . . . 75 4.10 A Wrapper for the Sample Bean’s Home Interface . . . . . . . . . . . . 75 4.11 A Sample Bean Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.12 Run-Time Path with Tracked Object, as a Sequence Diagram . . . . . . 78 4.13 JEEManagedObject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.14 JDBCStats, JDBCConnectionStats, JDBCConnectionPoolStats . . . . . 81 5.1 Run-time Design Meta Model from Chapter 3 . . . . . . . . . . . . . . 91 5.2 Example Run-Time Path (a), Example Deployment Descriptors (b), Extract of Component Data Structure (c) and Data Extracted to Pop- ulate Data Structure (d) . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.3 Example Run-Time Path (a), Extract of the TrackedObject Data Struc- ture (b) and Information Extracted to Populate the TrackedObject Data Structure (c) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 iv
  • 6. 5.4 Example Run-Time Path (a), Extract of the RunTimeContainerSer- vice Data Structure (b), and Information Extracted to Populate the RunTimeContainerService Data Structure (c) . . . . . . . . . . . . . . . 95 5.5 Class Diagram Showing Components Relationships . . . . . . . . . . 98 5.6 Example Transaction with Different Support Counting Approaches . 100 5.7 Hidden Elements in Transaction and Corresponding Support Counts 101 6.1 Rule to Detect Simultaneous Interfaces Antipattern . . . . . . . . . . . 115 6.2 Rule to Detect Needless Session Antipattern . . . . . . . . . . . . . . . 116 6.3 Rule to Detect Bulky or Unusual Levels of Database Communication 117 7.1 AccountList Run-Time Path and UML sequence diagram . . . . . . . 128 7.2 Diagram Showing Components in Duke’s Bank . . . . . . . . . . . . . 129 7.3 Diagram Showing Components in PlantsByWebsphere . . . . . . . . . 131 7.4 Portability Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 7.5 Performance Overhead Test Results . . . . . . . . . . . . . . . . . . . . 133 7.6 Test Results on K2 10 2 Database . . . . . . . . . . . . . . . . . . . . . 136 7.7 Test Results on K2 100 2 Database . . . . . . . . . . . . . . . . . . . . . 138 7.8 Test Results on Sun Database . . . . . . . . . . . . . . . . . . . . . . . . 139 7.9 Test Results on IBM Database . . . . . . . . . . . . . . . . . . . . . . . 140 7.10 Class Diagram of a Modified Version of Duke’s Bank with Commu- nication Patterns Highlighted . . . . . . . . . . . . . . . . . . . . . . . 143 A.1 The Transactions-A-Plenty Rule . . . . . . . . . . . . . . . . . . . . . . 172 A.2 The Conversational-Baggage Rule . . . . . . . . . . . . . . . . . . . . . 173 A.3 The Sessions-A-Plenty Rule . . . . . . . . . . . . . . . . . . . . . . . . . 173 A.4 The Needless-Session Rule . . . . . . . . . . . . . . . . . . . . . . . . . 174 A.5 The Remote-Calls-Locally Rule . . . . . . . . . . . . . . . . . . . . . . . 175 A.6 The Accessing-Entities-Directly Rule . . . . . . . . . . . . . . . . . . . 175 A.7 The Bloated-Session Rule . . . . . . . . . . . . . . . . . . . . . . . . . . 175 A.8 The Unusual-or-Bulky-Session-Entity-Communication Rule . . . . . . 176 A.9 The Fine-Grained-Remote-Calls Rule . . . . . . . . . . . . . . . . . . . 176 A.10 The Unused-Data-Object Rule . . . . . . . . . . . . . . . . . . . . . . . 177 A.11 The Incorrect-Pool-Size Rule . . . . . . . . . . . . . . . . . . . . . . . . 177 A.12 The Local-and-Remote-Intefaces-Simultaneously Rule . . . . . . . . . 178 v
  • 7. List of Acronyms ADL: Architecture Description Language AOP: Aspect Oriented Programming API: Application Programming Interface AST: Abstract Syntax Tree BCI: Byte Code Instrumentation CCM: CORBA Component Model CPI: COMPAS Probe Insertion DTO: Data Transfer Object EIS: Enterprise Information Systems EJB: Enterprise Java Beans ERP: Enterprise Resource Planning FCA: Formal Concept Analysis FIM: Frequent Itemset Mining FSM: Frequent Sequence Mining HTML: HyperText Markup Language HTTP: HyperText Transfer Protocol J2EE: Java 2 Enterprise Edition J2SE: Java 2 Standard Edition JDBC: Java Database Connectivity JEE: Java Enterprise Edition JMS: Java Message Service JMX: Java Management Extensions JNDI: Java Naming and Directory Interface JSP: Java Server Pages JSR: Java Service Request JVM: Java Virtual Machine JVMPI: Java Virtual Machine Profiler Interface JVMTI: Java Virtual Machine Tools Interface KDD: Knowledge Discovery in Databases LQN: Layered Queuing Networks MTS: Microsoft Transaction Server OCL: Object Constraint Language OS: Operating System PAD: Performance Antipattern Detection PMI: Performance Monitoring Infrastructure POJO: Plain Old Java Object QN: Queuing Networks RDBMS: Relational Database Management Systems RMI: Remote Method Invocation RML Relational Manipulation Language SOA: Service Oriented Architecture SPA: Stochastic Process Algebras SPN: Stochastic Petri Nets SQL: Structured Query Language UML: Unified Modelling Language XML: Extensible Markup Language vi
  • 8. ABSTRACT

Enterprise applications are becoming increasingly complex. In recent times they have moved away from monolithic architectures to more distributed systems made up of a collection of heterogeneous servers. Such servers generally host numerous software components that interact to service client requests. Component based enterprise frameworks (e.g. JEE or CCM) have been extensively adopted for building such applications. Enterprise technologies provide a range of reusable services that assist developers in building these systems. Consequently developers no longer need to spend time developing the underlying infrastructure of such applications, and can instead concentrate their efforts on functional requirements.

Poor performance design choices, however, are common in enterprise applications and have been well documented in the form of software antipatterns. Design mistakes generally result from the fact that these multi-tier, distributed systems are extremely complex and developers often do not have a complete understanding of the entire application. As a result, developers can be oblivious to the performance implications of their design decisions. Current performance testing tools fail to address this lack of system understanding. Most merely profile the running system and present large volumes of data to the tool user. Consequently developers can find it extremely difficult to identify design issues in their applications. Fixing serious design-level performance problems late in development is expensive and cannot be achieved through "code optimizations". In fact, performance requirements can often only be met by modifying the design of the application, which can lead to major project delays and increased costs.

This thesis presents an approach for the automatic detection of performance design and deployment antipatterns in enterprise applications built using component based frameworks. Our main aim is to take the onus away from developers having to sift through large volumes of data in search of performance bottlenecks in their applications. Instead we automate this process. Our approach works by automatically reconstructing the run-time design of the system using advanced monitoring and analysis techniques. Well known (predefined) performance design and deployment antipatterns that exist in the reconstructed design are automatically detected. Results of applying our technique to two enterprise applications are presented.

The main contributions of this thesis are (a) an approach for the automatic detection of performance design and deployment antipatterns in component based enterprise frameworks, (b) a non-intrusive, portable, end-to-end run-time path tracing approach for JEE and (c) the advanced analysis of run-time paths using frequent sequence mining to automatically identify interesting communication patterns between components.
  • 9. Dedicated to my parents, Tommy and Kay.
  • 10. ACKNOWLEDGEMENTS

Firstly I would like to thank my supervisor, John Murphy, for giving me the opportunity to pursue this research, and also for all his help, encouragement and good humour along the way. I would also like to thank Liam Murphy, who was always available for dialog and who has effectively acted as a second supervisor over the years. I would like to thank Andrew Lee for initially suggesting the notion of "detecting antipatterns", when I was back searching for research ideas. Also thanks to Peter Hughes for his input and feedback during the early days of my work.

Next, I would like to thank my colleagues in Dublin City University, where this journey first began. In particular, I would like to thank Ada Diaconescu, Mircea Trofin and Adrian Mos, from whom I learned so much during my initial two years as a researcher, for being fun colleagues, for always being available to bounce ideas off (even now that you have all unfortunately left Dublin) and for teaching me the important basics of the Romanian language. Furthermore I would like to thank Adrian for assisting me in extending some of his research ideas and for inviting me to INRIA for interesting discussions on my work. I would also like to thank Colm Devine, Adrian Fernandes and Cathal Furey (three of the four horsemen) for their engaging lunch time discussions (in the early days) on the human anatomy and other such topics.

Thanks also to my DCU/UCD colleagues, Dave "the brickie" McGuinness, Jenny McManis, Gabriel Muntean, Christina Muntean, Christina Thorpe, Alex Ufimtsev, Octavian Ciuhandu, Lucian Patcas, Olga Ormand, Jimmy Noonan, Hamid Nafaa, Petr Hnetynka, Sean Murphy, John Fitzpatrick (for allowing me to introduce him to Las Vegas), John Bergin, Omar Ashagi (for teaching me basic Arabic) and Philip McGovern, for all being fun colleagues, and to all those who I have had the pleasure of working with over the years. Also thanks again to Sean Murphy for taking my questions over the years and especially for his help with some of the mathematical aspects of my research.

Furthermore, thanks to all those in IBM who helped me during my work and granted me access to their environments. Thanks especially to Pat O'Sullivan and Simon Pizzoli for their help, interest and invaluable feedback on my research.

A special thanks to Claire Breslin for her endless support and patience, and for reminding me about the more important things in life.

Finally I would like to thank my parents, Tommy and Kay, and brother, Tom, for their constant encouragement. I would especially like to thank my parents, to whom this work is dedicated. Without their unwavering love and support this work would not have been possible.
  • 11. LIST OF PUBLICATIONS

Trevor Parsons, John Murphy. Detecting Performance Antipatterns in Component Based Enterprise Systems. Accepted for publication in the Journal of Object Technology.

Trevor Parsons, John Murphy, Patrick O'Sullivan. Applying Frequent Sequence Mining to Identify Design Flaws in Enterprise Software Systems. In Proceedings of the 5th International Conference on Machine Learning and Data Mining, Leipzig, Germany, July 18-20, 2007.

Trevor Parsons, John Murphy, Simon Pizzoli, Patrick O'Sullivan, Adrian Mos. Reverse Engineering Distributed Enterprise Applications to Identify Common Design Flaws. Presented at the Software Engineering Tools For Tomorrow (SWEFT) 2006 Conference, T.J. Watson, New York, Oct 17-19, 2006.

Liang Chen, Patrick O'Sullivan, Laurence P. Bergman, Vittorio Castelli, Eric Labadie, Peter Sohn, Trevor Parsons. Problem Determination in Large Enterprise Systems. Presented at the Software Engineering Tools For Tomorrow (SWEFT) 2006 Conference, T.J. Watson, New York, Oct 17-19, 2006. (Abstract only available)

Trevor Parsons, Adrian Mos, John Murphy. Non-Intrusive End to End Run-time Path Tracing for J2EE Systems. IEE Proceedings Software, August 2006.

Trevor Parsons, John Murphy. The 2nd International Middleware Doctoral Symposium: Detecting Performance Antipatterns in Component-Based Enterprise Systems. IEEE Distributed Systems Online, vol. 7, no. 3, March 2006.

Trevor Parsons. A Framework for Detecting Performance Design and Deployment Antipatterns in Component Based Enterprise Systems. In Proceedings of the 2nd International Middleware Doctoral Symposium, ACM Press, art. no. 7, Grenoble, France, 2005.

Trevor Parsons. A Framework for Detecting, Assessing and Visualizing Performance Antipatterns in Component Based Systems. First Place at the ACM SIGPLAN Student Research Competition, Graduate Division. In OOPSLA '04: Companion to the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 316-317, Vancouver, BC, Canada, 2004.
  • 12. Trevor Parsons, John Murphy. A Framework for Automatically Detecting and Assessing Performance Antipatterns in Component Based Systems using Run-Time Analysis. The 9th International Workshop on Component Oriented Programming, part of the 18th European Conference on Object Oriented Programming, Oslo, Norway, June 2004.

Trevor Parsons, John Murphy. Data Mining for Performance Antipatterns in Component Based Systems Using Run-Time and Static Analysis. Transactions on Automatic Control and Control Science, Vol. 49 (63), No. 3, pp. 113-118, ISSN 1224-600X, May 2004.
  • 13. CHAPTER ONE
Introduction

Main Points
• Performance is a major issue during the development of enterprise applications.
• System complexity leads to a lack of understanding, and consequently poor design decisions are commonly made by developers.
• Poor system design is often responsible for a badly performing system.
• Current performance testing tools do not address performance design issues and are limited.
• There are a large number of well known design issues for enterprise systems.
• Antipatterns document well known design issues and their corresponding solution.
• Thesis Contributions:
  – An approach for the automatic detection of performance design and deployment antipatterns in systems built on component based enterprise frameworks.
  – A portable, low overhead, non-intrusive, end-to-end run-time path tracer for distributed JEE systems.
  – A technique for the identification of interesting communication patterns in a collection of run-time paths.
  • 14. 1.1 Background and Motivation

In the past software developers had to be extremely careful when developing their applications, as resources were often scarce and the management of such scarce resources was a complex issue. Modern advances in software technologies, however, have allowed developers to concentrate less on issues such as performance and resource management, and instead to spend more time developing the functionality of their applications. An example of this can be seen in modern languages (Java¹, C#²) that provide garbage collection facilities, freeing developers from the task of having to manage memory, which had typically been a complex and time consuming exercise. Freeing developers from having to worry about what is happening "under the hood" allows them to concentrate more of their efforts on developing the functionality of a system. This is even more obvious with enterprise level component frameworks (e.g. JEE³ or CCM⁴), whereby the framework can be expected to handle complex underlying issues such as security, persistence, performance and concurrency, to name but a few. Again the idea is to allow developers to concentrate on the application functionality such that the time to market is reduced. A downside of this advance in software technologies is that developers become less familiar with the mechanics of the underlying system and, as a result, can make decisions during development that have an adverse effect on the system.

Performance is a major issue for developers building large scale multi-user enterprise applications. In fact, recent surveys have shown that a high percentage of enterprise projects fail to meet their performance requirements on time or within budget⁵ ⁶. This leads to project delays and higher development costs, and results from the fact that developers often do not have a complete understanding of the overall system behaviour. Figure 1.1 shows a typical enterprise application made up of a number of different physically distributed servers. Each server can in turn be made up of a large number of software components that interact to service different client requests. Understanding the run-time behaviour of such systems can be a difficult task, and consequently it is common that developers are unaware of the performance implications of their design decisions.

¹ The Java Technology, Sun Microsystems, http://java.sun.com/
² The C# language, Microsoft, http://msdn2.microsoft.com/en-us/vcsharp/aa336809.aspx
³ Java Enterprise Edition, Sun Microsystems, http://java.sun.com/javaee/
⁴ The CORBA Component Model specification, The Object Management Group, http://www.omg.org/technology/documents/formal/components.htm
⁵ Ptak, Noel and Associates, "The State of J2EE Application Management: Analysis of 2005 Benchmark Survey", http://www.ptaknoelassociates.com/members/J2EEBenchmarkSurvey2005.pdf
⁶ Jasmine Noel, "J2EE Lessons Learned", SoftwareMag.com, The Software IT Journal, January 2006. http://www.softwaremag.com/L.cfm?doc=2006-01/2006-01j2ee
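To make the earlier point about container services concrete, the sketch below shows a component whose transaction demarcation and persistence are handled entirely by the framework. This is a minimal illustration only, not code from the thesis: it uses the later EJB 3 annotation style rather than the deployment-descriptor style shown in Figure 2.3, and the AccountService and Account names are invented for the example.

    import javax.ejb.Stateless;
    import javax.ejb.TransactionAttribute;
    import javax.ejb.TransactionAttributeType;
    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.Id;
    import javax.persistence.PersistenceContext;

    // The container supplies transaction demarcation, persistence, pooling and
    // security for this component; the developer writes only the business logic.
    @Stateless
    public class AccountService {

        @PersistenceContext
        private EntityManager em;   // injected and managed by the container

        @TransactionAttribute(TransactionAttributeType.REQUIRED)
        public void transfer(long fromId, long toId, double amount) {
            // No explicit JDBC or transaction code appears here: the framework
            // begins, commits or rolls back the transaction around this method.
            Account from = em.find(Account.class, fromId);
            Account to = em.find(Account.class, toId);
            from.debit(amount);
            to.credit(amount);
        }
    }

    @Entity
    class Account {
        @Id
        private long id;
        private double balance;

        void debit(double amount)  { balance -= amount; }
        void credit(double amount) { balance += amount; }
    }

The convenience is real, but it is also exactly what hides the cost of each container service from the developer, which is the downside discussed above.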
  • 15. [Figure 1.1: Typical Enterprise Architecture]

Current development and testing tools fail to address this issue of understanding enterprise system behaviour. For example, most of today's performance tools merely profile the running system and present performance metrics to the tool user. The volume of data produced when profiling even a single-user system can be extremely large. When profiling multi-user enterprise applications, where a typical load may be in the order of thousands, the amount of data produced can be truly overwhelming. Often developers are required to sift through and correlate this information looking for bottlenecks in their systems. Furthermore, even when developers find issues in their applications using these tools, it is common that they are unsure as to how to go about rectifying the issue. There is a clear need for more advanced performance tools that not only profile the running system, but that also analyse the data produced to identify potential issues in the application. While there has been research in the area of debugging tools (e.g. [95] [145] [55] [14] [47]) which allow for automatic low-level bug detection, it is often the case that low-level optimizations or fixes will not be enough to enhance the system efficiency such that performance requirements are met. In many situations an overhaul of the system design is required.

There are a large number of well known design mistakes that are consistently made by developers building these systems. Such issues have been documented in the form of software design antipatterns [36]. Similar to software design patterns, which document best practices in software development, software antipatterns document common mistakes made by developers when building software systems. However, as well as documenting the mistake, antipatterns also document the corresponding solution to the problem. Thus not only can they be used to identify issues in software systems, but they can also be used to rectify these issues by applying the solution provided. A more complete and detailed definition of software patterns and antipatterns is given in Sections 2.6 and 2.7 respectively.
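As an illustration of this "problem plus documented solution" structure, the sketch below contrasts the widely documented fine-grained remote calls antipattern with its Data Transfer Object (DTO) remedy; both notions reappear later in the rule library of Appendix A. The CustomerRemote and CustomerDTO types are invented for this sketch and are not taken from the thesis.

    import java.io.Serializable;
    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Antipattern form: the client pays a remote round trip for every attribute.
    class FineGrainedClient {
        String describe(CustomerRemote customer) throws RemoteException {
            return customer.getName() + ", " + customer.getStreet() + ", " + customer.getCity();
        }
    }

    // Documented solution: one coarse-grained call returning a Data Transfer Object.
    class CoarseGrainedClient {
        String describe(CustomerRemote customer) throws RemoteException {
            CustomerDTO dto = customer.getCustomerData();   // single remote call
            return dto.name + ", " + dto.street + ", " + dto.city;
        }
    }

    interface CustomerRemote extends Remote {
        String getName() throws RemoteException;
        String getStreet() throws RemoteException;
        String getCity() throws RemoteException;
        CustomerDTO getCustomerData() throws RemoteException;
    }

    class CustomerDTO implements Serializable {
        final String name;
        final String street;
        final String city;
        CustomerDTO(String name, String street, String city) {
            this.name = name;
            this.street = street;
            this.city = city;
        }
    }

The point is that an antipattern entry carries both halves: the recurring mistake (many fine-grained remote calls) and the refactoring that removes it (a single coarse-grained call).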
  • 16. 1.2 Thesis Overview

In light of the limitations of current performance tools and of the benefits of software antipatterns, we have developed an approach to automatically identify performance design and deployment antipatterns in systems built on enterprise component-based frameworks. This approach takes away from developers the burden of having to sift through large volumes of monitoring data in search of design flaws, and instead automates this process. Well known performance design flaws can be identified automatically. Identified issues are presented with related contextual information and a corresponding solution, such that the problem can be easily addressed.

The approach works by reconstructing the run-time design of the application under test. The reconstructed design can subsequently be checked for well known predefined antipatterns. From a high level this is achieved (a) by monitoring the running system to collect the information required for antipattern detection, (b) by performing analysis on the monitoring data to summarise it and to identify relationships and patterns in the data that might suggest potential design flaws, (c) by representing the analysed data in a design model of the system and (d) by loading the design into a rule engine such that antipatterns (pre-defined as rules) can be detected. The approach has been realised in the Performance Antipattern Detection (PAD) tool. The tool has been designed for the Java Enterprise Edition (JEE) technology.

The remainder of the thesis is structured as follows: Chapter 2 gives background information on related technologies and related work. Chapter 3 gives a more detailed overview of our approach, discusses our research methodology and outlines a number of criteria that we use to validate our work. In this chapter we also give an overview of software design antipatterns, with particular focus on performance antipatterns. Chapter 4 outlines the different monitoring approaches that are required for antipattern detection in a component based enterprise system, and how they can be performed in a portable manner. Chapter 5 details a number of advanced analysis techniques that are applied to identify interesting relationships and patterns in the run-time data. In particular it presents an approach for identifying frequent or resource intensive communication patterns between components using techniques from the field of data mining. In this chapter we also show how the data collected from enterprise systems under load can be reduced and summarised. Chapter 6 shows how a rule engine approach can be used to identify antipatterns in the reconstructed run-time design of the system. In this chapter we also categorise JEE performance design and deployment antipatterns into groups based on the data required to detect them. Chapter 7 presents different sets of results from a range of tests that we have performed to validate our research. Finally, Chapter 8 gives our conclusions and ideas on future work in this area.
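The four steps (a) to (d) above can be pictured as a simple pipeline. The interfaces below are purely illustrative assumptions and are not the PAD tool's actual API (the real monitoring, analysis and rule-engine components, including the Jess rules listed in Appendix A, are described in Chapters 4 to 6); all type names here are invented for the sketch.

    import java.util.List;

    // Illustrative pipeline only; names and signatures are not the PAD tool's real API.
    interface Monitor {
        List<RunTimePath> collectPaths();                        // (a) monitor the running system
    }

    interface Analyser {
        DesignModel reconstructDesign(List<RunTimePath> paths);  // (b) analyse data, (c) build the design model
    }

    interface RuleEngine {
        List<AntipatternMatch> detect(DesignModel design);       // (d) apply antipattern rules to the model
    }

    final class AntipatternDetectionPipeline {
        private final Monitor monitor;
        private final Analyser analyser;
        private final RuleEngine ruleEngine;

        AntipatternDetectionPipeline(Monitor monitor, Analyser analyser, RuleEngine ruleEngine) {
            this.monitor = monitor;
            this.analyser = analyser;
            this.ruleEngine = ruleEngine;
        }

        List<AntipatternMatch> run() {
            List<RunTimePath> paths = monitor.collectPaths();
            DesignModel design = analyser.reconstructDesign(paths);
            return ruleEngine.detect(design);
        }
    }

    // Placeholders for the data passed between stages.
    class RunTimePath { }
    class DesignModel { }
    class AntipatternMatch { }

The separation of the three stages mirrors the structure of the thesis itself: monitoring (Chapter 4), analysis and design reconstruction (Chapter 5) and rule-based detection (Chapter 6).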
  • 17. 1.3 Thesis Contributions and Statement

The first major contribution of this thesis is an approach for the automatic detection of design and deployment antipatterns in systems built using component based enterprise frameworks [125] [129] [130] [131] [132] [133]. This approach builds on current performance tools by performing analysis on the data collected (i.e. run-time information and component meta-data). The analysis reconstructs the system design and identifies performance design flaws within it. The approach has been implemented for the JEE technology in the form of the PAD tool; however, it could potentially be applied to other component based enterprise frameworks (e.g. CCM). This solution has been successfully applied to both a sample and a real JEE application and has a number of key advantages.

Firstly, it reduces and makes sense of the data collected by many of today's performance profilers. This work makes use of statistical analysis and data mining techniques to summarise the data collected and to find patterns of interest that might suggest performance problems. Thus, it takes the onus away from developers, who currently have to carry out this tedious task manually.

Secondly, while most of today's performance tools tend to focus on identifying low level hotspots and programming errors (e.g. memory leaks, deadlocks), this work focuses on analysing the system from a performance design perspective. Since design has such a significant effect on performance [43], it is essential that work is carried out in this area.

Thirdly, unlike with many of today's performance tools, problems identified are annotated with descriptions of the issue detected, as well as a solution that can be applied to alleviate the problem. This approach of identifying and presenting antipatterns to developers helps them understand the mistakes that have been made, and the underlying reason as to why performance was affected. Developers can learn from using our tool, and thus it may be less likely that the same mistakes are made in the future. This approach also allows developers to easily rectify the situation by applying the solution provided. In fact, the antipatterns presented provide a high level language that developers and management alike can use to discuss such problems when they occur.

The second major contribution of this work is a portable, low overhead, non-intrusive, end-to-end run-time path tracer for JEE systems [128]. This is the first completely portable approach for collecting end-to-end run-time paths across all server side tiers of a distributed JEE application. It is non-intrusive insofar as it does not require any modifications to the application or middleware source code. The monitoring approach instead makes use of standard JEE mechanisms to intercept calls made to the instrumented components. A run-time path [44] contains the control flow (i.e. the ordered sequence of methods called to service a user request), together with the resources and performance characteristics associated with servicing that request.
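A hedged sketch of what such a run-time path might look like as a data structure is given below. The thesis defines its actual path and PathNode structures in Chapter 4 (Figures 4.3 and 4.4), so the field names here are assumptions made purely for illustration; this also fills in the RunTimePath placeholder used in the pipeline sketch above.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative only: field names are assumptions, not the structures of Figures 4.3/4.4.
    class RunTimePath {
        String requestName;          // the client request that triggered the path
        PathNode root;               // first component method invoked for the request
    }

    class PathNode {
        String componentName;        // e.g. the servlet, JSP or EJB that was invoked
        String methodName;           // method executed on that component
        long executionTimeMillis;    // performance characteristic recorded for this call
        final List<PathNode> children = new ArrayList<PathNode>();  // nested calls, in invocation order
    }

Because each node records both the call ordering and its cost, a collection of such paths is enough to recover component relationships, communication patterns and per-request resource usage, which is how the PAD tool uses them.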
  • 18. Such information is utilised to detect antipatterns by our PAD tool. By analysing run-time paths one can easily see how system resources are being used, how the different components in the system interact and how user requests traverse the different tiers that make up the system. In fact these paths can also be used for Object Tracking, i.e. to monitor particular objects' life cycles across the different user requests. In this work we show how run-time paths can be used to manually and automatically reverse engineer a JEE application. We also show how the reconstructed design can be used for either manual or automatic identification of performance design flaws. For example, the PAD tool makes use of run-time paths to identify the (run-time) component relationships, communication patterns and object usage patterns in a JEE system. Results are given for this monitoring approach which show that it produces a low overhead on the instrumented system and that it can be applied in a portable manner.

The third and final major contribution of this work is a technique for the identification of interesting communication patterns in a collection of run-time paths [126]. More precisely, we have applied a data mining technique, Frequent Sequence Mining (FSM), to identify sequences of interest (e.g. frequently repeating method sequences and resource intensive loops) across a transactional database of run-time paths by using alternative support counting techniques. In this work we also discuss scalability problems (in terms of both the algorithm runtime and the amount of data produced) related to applying FSM to run-time paths, and give solutions to these issues. We show how the sequences identified can be used to highlight design flaws in enterprise applications that lead to poor system performance. The PAD tool makes use of this analysis technique to identify interesting component communication patterns in a JEE system that may indicate the presence of particular antipatterns.

Following the above contributions, the fundamental thesis of this work can be stated as follows: Performance design and deployment antipatterns can be automatically detected in component based enterprise systems by analysing run-time data and component meta-data.

1.4 Key Assumptions and Scope

The work in this thesis is focused on component based systems as defined in Section 2.2. As such, it is highly likely that the source code of the application to be analysed is not available in its entirety, as components may have been developed by third parties. Thus we assume source code is not available for analysis of the system. For such systems bytecode analysis may also be problematic due to security restrictions or licensing constraints. Instead, we assume that a running implementation of the system to be analysed is available, such that dynamic data can be collected and utilised for analysis.
We also assume that a realistic testing scenario is available which reflects how the system will be used in production. We do not address in this work the issue of how such testing scenarios can be obtained; however, research in this area already exists. For example, Weyuker and Vokolos have outlined an approach for the development of performance test cases [176]. That work outlines five typical steps required to develop performance test cases. Alternatively, Ho et al. [92] propose an evolutionary approach to performance test design based on their Performance Requirements Evolution Model. The authors claim that more precise and realistic performance tests can be created incrementally during the development process through customer communication or performance model solving. In addition, agile development techniques such as test-driven development [25] promote the design of test cases before developers begin to code. Recently, work has been presented which discusses how performance tests can be incorporated into the test-driven development process [96], allowing for early availability of performance testing scenarios.
Our approach is applicable to applications built on component based enterprise frameworks. However, our research has thus far only been applied to synchronous components, and has not, for example, been applied to message-driven beans, which are asynchronous components in the JEE technology. Thus, our scope is limited to synchronous applications. Our plans for future work outline how this approach could potentially be applied to asynchronous components (see section 8.3).
  • 20. CHAPTER TWO Background In this chapter we introduce related research areas and technologies. We begin by discussing the research area of performance engineering. Next we give an overview of component based software, giving a definition for a software component and dis- cussing component frameworks. We also give background information on the Java Enterprise Edition technology which is the enterprise framework that our work has been applied to. We focus specifically on the Enterprise Java Bean component tech- nology and give details in this area related to our research. We present an overview of the state of the art in performance tools discussing techniques for load generation and performance profiling. We particularly focus on performance profiling tools for the Java technology. Furthermore we give an overview of software architecture, software patterns, and software antipatterns. An overview of research in the area of reverse engineering is also presented. In this section we outline why previous approaches are less suitable for distributed component based applications. The current state of the art of research in the area of software pattern/antipattern detection is also discussed. Finally we introduce the area of knowledge discovery in databases, and data mining techniques relevant in this work. 8
Main Points
• Current performance analysis techniques, e.g. modelling, are inaccurate and time consuming when applied to component based enterprise systems. Thus, in industry, performance analysis is usually deferred until performance testing begins using the currently available performance testing tools.
• Component technologies such as EJB are increasingly being adopted to provide flexible, manageable and reusable solutions for complex software systems. However, poor system performance is common in these systems.
• System architecture focuses on issues related to the overall system structure and is said to be non-local, whereas software design focuses on local issues.
• Enterprise design plays a significant role in a system’s overall performance. Best practices in design have been well documented in the form of design patterns.
• Well known design issues consistently occur in enterprise applications and have been well documented, along with their corresponding solutions, in the form of design antipatterns.
• Performance testing tools for complex multi-user enterprise applications are limited and merely profile the running system, presenting vast amounts of data to the tool user. There is a clear need for more advanced tools that take the onus away from the developer of having to sift through this data, and that automatically analyse the data produced.
• Detailed documentation is generally not available for enterprise applications. Thus, it can be difficult for developers to comprehend the overall application design.
• Current reverse engineering, design pattern detection and antipattern detection techniques are heavily based on static analysis and are unsuitable for component based systems.
• Data mining techniques can be applied to extract knowledge from vast volumes of data.
  • 22. 2.1 Performance of Software Systems The performance of a software system has been described as an indicator of how well the system meets its requirements for timeliness [154]. Smith and Williams [154] de- scribe timeliness as being measured in either response time or throughput, where re- sponse time is defined as the time required to respond to a request and throughput is defined as the number of requests that can be processed in some specific time in- terval. Furthermore they define two important dimensions to software performance timeliness, responsiveness and scalability. Responsiveness is defined as the ability of a system to meet its objectives for response time or throughput. The ability to con- tinue to meet these objectives, as the demand on the system increases, is defined as the systems scalability. The aim of performance engineering is to build systems that are both responsive and scalable. To date a vast amount of performance engineering research has focused on system analysis through performance models. Performance models are created based on sys- tem artifacts and various relevant estimates. Some of the most common performance model classes are Queuing Networks (QN) (or extensions, such as Extended QN and Layered QN), Stochastic Petri Nets (SPN), and Stochastic Process Algebras (SPA). Per- formance models can be evaluated using simulation techniques or analytical methods, in order to predict performance indices, such as throughput, response times, or re- source utilization. A comprehensive survey of modelling approaches for performance prediction is presented in [15]. However modelling today’s enterprise applications with a high degree of accuracy can be a difficult and time consuming task. This results from the fact that these sys- tems are often very large and complex and made up of black box components, the internals of which are generally unknown (e.g. application servers). Performance metrics required to populate performance models can thus not be easily obtained. For enterprise applications accurate performance metrics can often only be obtained through performance testing of a running system [58]. Recently Liu et al. [76] have used a combination of performance modelling and benchmarking techniques to al- low for population of performance models of enterprise applications. Their initial results give accurate performance prediction for the small sample systems. A draw- back of this approach is the lack of tool support which would allow for this technique to be easily reused. From our experiences with large software houses, it seems that performance modelling of enterprise applications is rarely performed. Most opt for performance testing using available performance testing tools. Work in the area of performance testing, however, has been very much lacking [176] and thus performance testing, especially in the case of large enterprise applications, can be a difficult task [76] [161]. This comes from the fact that today’s performance testing tools are quite limited, insofar as they generally focus on simply collecting 10
  • 23. data from a running system (i.e. profiling) and presenting this data to the user. These tools tend to focus on low level programming bugs and do not address many of the issues that lead to poor system performance (e.g. design issues). 2.2 Component Based Software 2.2.1 Software Components There are numerous definitions of what software components are or should be 1. To be specific, for the purpose of this thesis, we use Szyperski’s definition of a software com- ponent: ”A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be de- ployed independently and is subject to composition by third parties” [157]. Defining a software component as a unit of composition simply means that the purpose of a com- ponent is to be composed with other components. A component based application is assembled from a set of collaborating components. To be able to compose components into applications, each component must provide one or more interfaces which provide a contract between the component and its environment. The interface clearly defines which services the component provides and therefore defines its responsibility. Usu- ally, software depends on a specific context, such as available database connections or other system resources being available. For example, other components that must be available for a specific component to collaborate with. In order to support com- posability of components, component dependencies need to be explicitly specified. A component can be independently deployed, i.e. it is self-contained and changes to the implementation of a component do not require changes to other components. Of course, this is only true as long as the component interface remains compatible. Fi- nally assemblers of component based applications, are not necessarily the developers of the different components. That is, components can be deployed by third parties and are intended to be reused. This definition of a software component leaves many details open, for example, how components interact, what language(s) can be used for their development, for what platform. Component frameworks further define the notion of a component, by de- tailing these issues. 2.2.2 Component Frameworks The key goal of component technology is independent deployment and assembly of components. Component frameworks are the most important step for achieving this 1Beyond Objects column of Software Development magazine, articles by Bertrand Meyer and Clemens Szyperski, archived at http://www.ddj.com/ 11
  • 24. aim [157]. They support components conforming to certain standards (or component models) and allow instances of these components to be plugged into the component framework. The component framework establishes environmental conditions for the component instances and regulates the interaction between component instances. A key contribution of component frameworks is partial enforcement of architectural principles. By forcing component instances to perform certain tasks via mechanisms under control of a component framework the component framework can enforce its policies on the component instances. This approach helps prevent a number of classes of subtle errors that can otherwise occur. There are numerous component frameworks that exist today. Examples include EJB, CCM, SOFA [140] and Fractal [37]. Each framework contains its own component model, i.e. a set of features that components satisfy. Component models generally con- form to either flat component models or hierarchical models. Flat component models (e.g. EJB, CCM) define only primitive components whereby indivisible entities are di- rectly implemented in a programming language. Hierarchical models (SOFA, Fractal) also define composite components which are created via nesting of other components. The research in this thesis aims at solving issues related to flat component models. In particular we focus on EJB. EJB is part of a wider enterprise framework (Java Enter- prise Edition) for building enterprise level applications (see section 2.3). EJB has been selected since it is a well established technology that is currently used in industry to develop enterprise applications. There is also a body of work detailing best practices and bad practices for this technology (see sections 2.6 and 2.7). On the other hand the hierarchical component models have mainly been used by the research community and best practices in these areas are less well defined. EJB is considered a contextual composition framework. Contextual composition frameworks allow components to specify boundary conditions describing properties that the runtime context must meet [166]. Composition performed by such frame- works is based on the creation of contexts and placement of component instances in the appropriate contexts. For example, a component framework for transactional computation can be formed by supporting transactional attribution of components (for example ”this component’s instances need to be used in a new transaction”) and transactional enactment at the boundary of contexts. This approach can be used to cre- ate frameworks for any other properties such as security, load balancing, management etc. Any component instance in a context can potentially be accessed from outside its con- text. This context however, gets an opportunity to intercept all messages crossing the context boundaries. Intercepting instances (e.g. objects) inside a context remains in- visible to instances both external and internal to this context. Current technology support for contextual composition includes Microsoft Transac- 12
  • 25. tion Server (MTS)2, EJB containers, and CCM containers. We introduce the EJB tech- nology in the following sections. The run-time services we discuss in section 5.3 are created as a result of contextual composition. 2.3 The Java Enterprise Edition The Java Enterprise Edition (JEE) is a component technology which defines a standard (or a set of standards) for developing multi-tier enterprise applications. JEE, formerly the Java 2 Enterprise Edition (J2EE), is an enterprise component framework for the Java technology. The specification promotes a multi-tiered distributed architecture for enterprise applications. Figure 2.1 shows a typical JEE architecture consisting of 4 main tiers: a client tier, a presentation or web tier, a business tier and an enterprise information systems tier. JEE specifies different component types for implementing the various enterprise application tiers. Naturally, clients reside in the client tier and can be in the form of stand-alone Java applications or web browsers. In the following subsections we detail each of the server-side tiers and give details on the components that they can consist of. Figure 2.1: Typical JEE Architecture 2.3.1 Web Tier The JEE web tier provides a run-time environment (or container) for web components. JEE web components are either servlets or pages created using the Java Servlet Pages 2Microsoft Corporation. Microsoft Transaction Server Transactional Component Services. http://www.microsoft.com/com/wpaper/revguide.asp. 13
  • 26. technology (JSPs) 3. Servlets are Java programming language classes that dynamically process requests and construct responses. They allow for a combination of static and dynamic content within the web pages. JSP pages are text-based documents that ex- ecute as servlets but allow a more natural approach to creating the static content as they integrate seamlessly in HTML pages. JSPs and Servlets execute in a web con- tainer and can be accessed by clients over HTTP (e.g. a web browser). The servlet filter technology is a standard JEE mechanism that can be applied to components in the web tier to implement common pre and post-processing logic. It is discussed in detail in section 4.2.5.1. 2.3.2 Business Tier Enterprise Java Beans (EJBs) 4 are the business tier components and are used to handle business logic. Business logic is logic that solves or meets the needs of a particular business domain such as banking, retail, or finance for example. EJBs run in an EJB container and often interact with a database in the EIS tier in order to process requests. Clients of the EJBs can be either web components or stand alone applications. EJB is the core of the JEE platform and provides a number of complex services such as messaging, security, transactionality and persistence. These services are provided by the EJB container to any EJB component that requests them. More details on the EJB component model are given in section 2.4 2.3.3 Enterprise Information System Tier Enterprise information systems provide the information infrastructure critical to the business processes of an enterprise. Examples of EISs include relational databases, enterprise resource planning (ERP) systems, mainframe transaction processing sys- tems, and legacy database systems. The JEE Connector architecture 5 defines a stan- dard architecture for connecting the JEE platform to heterogeneous EIS systems. For example a Java Database Connectivity (JDBC) Connector is a JEE Connector Archi- tecture compliant connector that facilitates integration of databases with JEE appli- cation servers. JDBC 6 is an API and specification to which application developers and database driver vendors must adhere. Relational Database Management Systems (RDBMS) vendors or third party vendors develop drivers which adhere to the JDBC specification. Application developers make use of such drivers to communicate with the vendors’ databases using the JDBC API. The main advantage of JDBC is that it allows for portability and avoids vendor lock-in. Since all drivers must adhere to the 3Java Servlet Technology, http://java.sun.com/products/servlet/ 4Enterprise Java Bean Technology, http://java.sun.com/products/ejb/docs.html 5Java Connector Architecture, http://java.sun.com/j2ee/connector/ 6Java Database Connectivity Architecture, http://java.sun.com/products/jdbc/ 14
  • 27. same specification, application developers can replace the driver that they are using with another one without having to rewrite their application. 2.4 The Enterprise JavaBean Technology The Enterprise Java Beans architecture is a component architecture for the develop- ment and deployment of component based distributed applications. It is designed to simplify and reduce the costs of the development and management processes of large- scale, distributed applications. Applications built using this technology are capable of being scalable, transactional, and multi-user secure. EJB provides the distributed platform support and common services such as transactions, security, persistence and lifecycle management. EJB also defines a flexible component model which allows for components of different types that are suitable for specific tasks. Developers make use of the different component types to implement the application business logic. Subse- quently, EJBs are deployed and managed by EJB containers, as part of a JEE applica- tion server. EJB containers provide middleware services and manage the EJB lifecycle during runtime. These processes can be configured via XML documents, referred to as EJB deployment descriptors. Physically EJB consists of two things [148]: The specification7 which defines: • The distinct ”EJB Roles” that are assumed by the component architecture. • A component model • A set of contracts: component-platform and component-client A set of Java Interfaces: • Components and application servers must conform to these interfaces. This al- lows all conforming components to inter-operate. Also the application server can manage any components that conform to the interfaces. 2.4.1 The EJB Roles The EJB specification defines the following roles which are assumed by the component architecture: • Enterprise Bean Provider: The enterprise bean provider is typically an applica- tion domain expert. The bean provider develops the reusable enterprise beans that typically implement business tasks or business entities. 7The Enterprise Java Bean Specification version 2.0, http://java.sun.com/products/ejb/docs.html 15
• Application Assembler: The Application Assembler combines enterprise beans into larger deployable application units.
• Deployer: The Deployer takes the ejb-jar files produced by either the Bean Provider or the Application Assembler and deploys the enterprise beans contained in the ejb-jar files in a specific operational environment. The operational environment includes the EJB Server and Container.
• EJB Service Provider and EJB Container Provider: The container provider supplies an EJB container (the application server). This is the runtime environment in which the beans live. The container provides middleware services to the beans and manages them. The server provider is the same as the container provider; Sun has not yet differentiated between them.
• System Administrator: The system administrator is responsible for the upkeep and monitoring of the deployed system and may make use of runtime monitoring and management tools provided by the EJB server provider.
2.4.2 The EJB Component Model
EJB is built on top of object technology (Java). An EJB component consists of a business interface, an implementation class, a home interface and configuration settings (defined in an XML deployment descriptor). All of these, except for the deployment descriptor, are Java artifacts (i.e. classes or interfaces). The EJB implementation class contains the bean business logic written by the Enterprise Bean Provider. The EJB implementation class is a Java object that conforms to a well defined interface and obeys certain rules. The interface it conforms to depends on the bean type. The rules are necessary in order for the bean to be able to run in a container. Access to the implementation class can be obtained using the EJB home interface. The home interface defines methods for creating, destroying and finding EJBs (i.e. lifecycle methods). The home interface can either be local or remote. Local interfaces allow access from clients within the same JVM, whereas remote interfaces allow for access from remote clients (e.g. on another JVM running on the same machine or on a JVM running on a physically distributed machine). In fact, an EJB component can have both local and remote interfaces; however, this is not recommended [161]. The bean implementation business methods are exposed through the business interface. Similar to the home interface, the business interface can be exposed locally or remotely (or both). An EJB component also requires configuration settings for deployment. These settings are defined in an XML deployment descriptor. The information in the deployment
  • 29. descriptor details the different container services that are required by the EJB. For example, a deployment descriptor can be used to declare how the container should perform lifecycle management, persistence, transaction control, and security services. EJB 2.0 defines three different kinds of enterprise beans, namely session beans, entity beans and message-driven beans. Session Beans: A session bean is an action bean that performs work for its client, shielding the client from complexity by executing business tasks inside the server. A session bean has only one client. When the client terminates the session appears to terminate and is no longer associated with the client. The life of a session bean spans the length of the session (or conversation) between the session and the client. Session beans are not persistent and typically they do not survive application server crashes, or machine crashes. They are in memory objects that live and die with their surround- ing environments. Session beans hold conversations with clients. A conversation is an interaction between a client and the bean. The two subtypes of session beans are state- ful session beans and stateless session beans. Each is used to model different types of conversations. Stateful Session Beans: A stateful session bean is a bean that is designed to service business processes that span multiple method requests or transactions. Stateful ses- sion beans retain state on behalf of an individual client. If a stateful session bean’s state is changed during a method invocation, that same state will be available to that same client upon the following invocation. Stateless Session Beans: A stateless session bean is a bean that holds conversations that span a single method call. They are stateless because they do not hold multi- method conversations with their clients. Except during method invocation, all in- stances of a stateless bean are equivalent, allowing the EJB container to assign an in- stance to any client. Because stateless session beans can support multiple clients, they can offer better scalability for applications that require a large number of clients. Typ- ically, an application requires fewer stateless session beans than stateful session beans to support the same number of clients. Entity Beans: Entity beans are persistent data components. Entity beans are enterprise beans that know how to persist themselves permanently to a durable storage (e.g. a database). They are physical, storable parts of an enterprise. Entity beans differ from session beans in a number of ways. They are persistent, and allow shared access. They have a unique identifier, enabling a client to identify a particular entity bean. Entity beans can also persist in relationships with other entity beans. Entity beans can be per- sisted in two ways, either using Bean-Managed Persistence, or Container-Managed Persistence. Container-Managed persistent beans are the simplest for the bean de- veloper to create. All logic for synchronizing the bean’s state with the database is handled automatically by the container. Thus, the beans do not contain any database access calls, and as a result the bean’s code is not tied to a specific persistent storage 17
  • 30. mechanism (database). A Bean-managed persistent entity bean is an entity bean that must be persisted by hand. The component developer must write code to translate the in-memory fields into an underlying data store. Message-driven Beans: A message-driven bean is an enterprise bean that allows EJB applications to process messages asynchronously. They rely on the Java Message Ser- vice (JMS) technology 8. Message-driven beans act as JMS message listeners. The messages may be sent by any JEE component: an application client, another enter- prise bean, a Web component or by a JMS application or system that does not use JEE technology. A message-driven bean does not have component interfaces. The com- ponent interfaces are absent because the message-driven bean is not accessible via the Java RMI API; it responds only to asynchronous messages. One of the most impor- tant aspects of message-driven beans is that they can consume and process messages concurrently, because numerous instances of the MDB can execute concurrently in the container. This capability provides a significant advantage over traditional JMS clients. As discussed in section 1.4 we have not applied our research to asynchronous compo- nents. This is a direct result of the fact that our run-time path tracing approach (see section 4.2) can not currently be used to monitoring message driven beans. Our plans for future work suggest how this problem may be addressed (see section 8.3). 2.4.3 EJB Runtime An EJB component contains a bean implementation class, a business interface, a home interface and an XML deployment descriptor all of which are supplied by the bean provider. To integrate the component into the container environment the container automatically generates ”glue-code” that allows for the component to implicitly make use of the container services. In fact, enterprise beans are not fully-fledged remote ob- jects. When a client accesses an EJB, the client never invokes the methods directly on the actual bean instance. Instead, the invocation is intercepted by the EJB container and delegated to the bean instance. The interception is performed by the EJBObject. The EJBObject is generated by the container (either during deployment or at run-time) and provides the enterprise bean with networking capabilities and container services such as transactions and security. The EJBObject replicates and exposes every business method that the bean exposes. It is generated from the business interface supplied by the bean provider. Similarly an EJBHome object is generated from the home interface. The EJBHome object exposes the same methods as this interface and acts as a factory object for EJBObjects. That is, the EJBHome Object is responsible for creating and de- stroying EJBObjects. In order to understand how the various component constituents work together we give an example of the various steps that are performed by a client 8Java Message Service, from Sun Microsystems: http://java.sun.com/products/jms/ 18
and by the container when a bean is invoked. To create an instance of an EJB a client must first obtain an instance of an EJBHome object, generated by the container. The EJBHome object is bound to the component name and available at run-time through the system's naming directory, accessed through the Java Naming and Directory Interface (JNDI) 9. Thus, to invoke an EJB, a client performs the following steps (see figure 2.2):
(1) It first obtains a reference to the EJBHome object that the container has generated. The reference is looked up in the system naming directory via JNDI. The client then calls the required construction method on the home object.
(2) The EJBHome object instructs the container to create a new instance or retrieve an existing instance of the component, and returns it to the client. The actual Java object returned is an instance of the container-generated EJBObject class that corresponds to the bean's component interface.
(3) The client invokes the business method on the returned EJBObject, transparently, through the component interface. The EJBObject performs the required container services (specified in the XML deployment descriptor) and calls the corresponding business method on the bean's implementation object, an instance of the bean provider's bean class.
Figure 2.2: Client Invoking an EJB
9 Java Naming and Directory Interface (JNDI), http://java.sun.com/products/jndi/
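To make these steps concrete, the following minimal sketch shows what such a remote client might look like. The interface names (OrderManagerHome, OrderManager), the JNDI name and the business method are invented for illustration only and do not correspond to any application discussed in this thesis; only the standard JNDI and EJB 2.x APIs used here (InitialContext, PortableRemoteObject.narrow, create, remove) are real.

    import java.rmi.RemoteException;
    import javax.ejb.CreateException;
    import javax.ejb.EJBHome;
    import javax.ejb.EJBObject;
    import javax.naming.InitialContext;
    import javax.rmi.PortableRemoteObject;

    // Hypothetical home interface (would be written by the bean provider).
    interface OrderManagerHome extends EJBHome {
        OrderManager create() throws CreateException, RemoteException;
    }

    // Hypothetical remote component (business) interface.
    interface OrderManager extends EJBObject {
        void submitOrder(String itemId, int quantity) throws RemoteException;
    }

    // Hypothetical client performing the three steps described above.
    public class OrderClient {
        public static void main(String[] args) throws Exception {
            // Step 1: look up the container-generated EJBHome object via JNDI.
            InitialContext ctx = new InitialContext();
            Object ref = ctx.lookup("ejb/OrderManager");
            OrderManagerHome home =
                    (OrderManagerHome) PortableRemoteObject.narrow(ref, OrderManagerHome.class);

            // Step 2: ask the home object for an EJBObject; the container
            // creates or reuses a bean instance behind the scenes.
            OrderManager orderManager = home.create();

            // Step 3: invoke a business method on the EJBObject; the container
            // applies the declared services before delegating to the bean class.
            orderManager.submitOrder("item-42", 3);

            // Release the EJBObject when the client is finished with it.
            orderManager.remove();
        }
    }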
Figure 2.3: Example EJB Deployment Descriptor
2.4.4 Deployment Settings
As shown in figure 2.2, the container-generated EJBObject intercepts and delegates all calls to the bean implementation. The EJBObject supplies the bean implementation with any required services as specified in the deployment descriptor. Figure 2.3 shows an extract from a deployment descriptor which specifies transactional attributes for a bean's methods. Such settings can have a major impact on the system performance 10 and should be carefully considered. The container is also responsible for the management of the bean's life cycle events.
10 Performance Tuning EJB Applications - Part I by Mihir Kulkarni, February 2005, http://dev2dev.bea.com/pub/a/2005/02/perf tune session beans.html
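The interception described above can be pictured with a small, purely conceptual sketch of the kind of wrapper a container might generate for a business method whose transaction attribute is declared as Required in the deployment descriptor. All names here (OrderManagerEJBObject, TransactionService, OrderManagerBean, submitOrder) are hypothetical stand-ins and do not represent any actual container implementation; real containers generate considerably more elaborate code and use their own internal transaction machinery.

    /**
     * Conceptual sketch of the interception performed by a container-generated
     * EJBObject for a method whose transaction attribute is "Required".
     */
    public class OrderManagerEJBObject {

        /** Stand-in for the container's transaction machinery. */
        public interface TransactionService {
            boolean beginIfAbsent();   // start a transaction only if none is active
            void commit();
            void rollback();
        }

        /** Stand-in for the bean implementation class written by the bean provider. */
        public interface OrderManagerBean {
            void submitOrder(String itemId, int quantity) throws Exception;
        }

        private final OrderManagerBean bean;
        private final TransactionService tx;

        public OrderManagerEJBObject(OrderManagerBean bean, TransactionService tx) {
            this.bean = bean;
            this.tx = tx;
        }

        public void submitOrder(String itemId, int quantity) throws Exception {
            // "Required": join the caller's transaction or start a new one,
            // as declared in the deployment descriptor.
            boolean startedHere = tx.beginIfAbsent();
            try {
                bean.submitOrder(itemId, quantity);   // delegate to the bean instance
                if (startedHere) {
                    tx.commit();
                }
            } catch (Exception e) {
                if (startedHere) {
                    tx.rollback();
                }
                throw e;
            }
        }
    }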
Figure 2.4: Stateless Session Bean Lifecycle
The management of an EJB's lifecycle is a complex process and differs from bean type to bean type. Factors which influence the bean lifecycle management include the load on the system and the container configuration settings 11. Figure 2.4 illustrates the lifecycle of a stateless session bean. When the container starts the application, it creates a pool of bean instances. The pool size can be determined by setting the value in the configuration settings. If a bean instance is required by a client, an instance is assigned from the bean pool. If no instances are available, the container can create more instances until the pool has reached its maximum size (which is also configurable). If the bean pool has already reached its maximum size and there are still no instances available, the client will be put in a queue until an instance becomes available. The pool configuration settings can have a major impact on the system performance and should be tuned according to the expected load on the system. The lifecycles of a stateful session bean and of an entity bean are similar but slightly more complicated than that of the stateless session bean (since they can both be passivated). More details on these lifecycles can be found in the literature [148]. It is sufficient to say, for the purposes of this thesis, that the configuration settings in relation to EJB lifecycles can have a major impact on the system performance and need to be carefully considered.
2.5 Software Architecture
A large number of definitions exist for the term software architecture 12. One of the most cited definitions is by Bass et al. [21] and states that: "The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them." Bass et al. [21] also outline a number of implications of this definition: Firstly, a software architecture is essentially an abstraction since it embodies information about the relationship between elements, and externally visible properties that are exposed to other elements, but it omits internal element information or information that does not pertain to the elements' interactions. Secondly, the definition makes it evident that systems can and do consist of more than one structure. Thirdly, it is implied by the definition that every software system has an architecture, since every system can be shown to be composed of elements and relationships between them. Fourthly, the external behaviour of each element is part of the architecture and, finally, the definition is indifferent as to whether the architecture for a system is a good or bad one.
11 PreciseJava, http://www.precisejava.com/
12 Software Engineering Institute, Carnegie Mellon, list of software architecture definitions, http://www.sei.cmu.edu/architecture/definitions.html
  • 35. A software architecture is important for a number of reasons. Firstly it becomes a vehicle for communication among the system’s stakeholders [21]. System stakehold- ers are those concerned with the system (e.g. users, customers, software developers, management etc.). A software architecture is a common abstraction of the system and can serve as a lingua franca, i.e. an intermediate language that all stakeholders can use to discuss various aspects of the system. The different stakeholders of the system are often concerned with different system characteristics. An architecture pro- vides a common language in which these different concerns can be expressed. Since stakeholders can be interested in different system characteristics it is important for the architecture to provide different views [51] that let them consider the architecture from different perspectives. For example, a functional view might contain an abstrac- tion of the different system functions and their relations whereas a code view may give an abstraction of the code in terms of objects or classes (or higher level subsys- tems or modules) and their relationships. Different stakeholders make use of different views to analyse the architecture according to their needs. Typical views include a functional view, a concurrency view, a code view, a physical view etc [52]. Kruchten [104] introduced the 4+1 view model to describe a software architecture using five concurrent views. Views are essentially a mechanism that allow for the separation of concerns within the architecture allowing for the analysis of the architecture from dif- ferent perspectives. Architecture description languages (ADLs) [50] can be utilised to describe a software architecture. There have been many attempts to design such lan- guages. However while some have been employed in real word projects none have been widely adopted [21]. The literature [115] provides a comparison of ADLS. Another important reason for system architecture is that it creates a realisation of early design decisions and allows for system architects to analyse the suitability of these de- cisions in relation to the system requirements (e.g. performance, security, flexibility) [21]. These early design decisions manifested in the system architecture can not only impact the quality attributes of the system but can also place constraints on the actual system implementation i.e. some technologies may be more suitable for particular architectures. The initial architecture can even have an impact on the organisational structure of the team (or teams) building the application [21]. One of the earliest de- sign decisions is often to choose a suitable architectural style. An architectural style defines a vocabulary of components (e.g. clients, servers, databases) and connector types (e.g. procedure calls, database protocols), and a set of constraints on how they can be combined [152]. Architectural styles are found repeatedly in practice to address similar sets of demands. Finally, software architectures are also reusable assets that can be applied repeatedly to other systems exhibiting similar requirements [21]. 23
  • 36. 2.6 Software Patterns The current use of the term pattern in software engineering is derived from work by Christopher Alexander [168] in the field of contemporary architecture. Alexander’s notion of a pattern was adopted by a number of software engineering researchers [26] [71] and became popular in this field mainly after work published by Gamma et. al [72]. Gabriel 13 gives the following definition of a pattern: ”Each pattern is a three-part rule, which expresses a relation between a certain context, a certain system of forces which occurs repeatedly in that context, and a certain software configuration which allows these forces to resolve themselves.” This definition is consistent with Alexan- der’s original definition [168] which states that a ”pattern is a three part rule which expresses a relation between a certain context a problem and a solution.” Alexander expands his definition to say that a problem relates to a certain system of forces which occurs repeatedly in a context and that the problem solution can be considered as a certain configuration which allows these forces to resolve themselves. While patterns have been documented for a number of different domains (such as patterns for con- temporary architecture [168] or organisational patterns [54]) we are mainly interested in software patterns. Software patterns are usually documented according to a pattern template. Common templates for describing patterns include the Alexandrian form [168] and the GoF form [72]. A given template contains a number of elements that describe the pattern e.g. name, problem, context, forces, solution, examples, resulting context, rationale, related patterns and known uses 14. Buschmann et al. [42] document a number of properties or benefits of patterns. While they focus on patterns for software architecture many of the properties hold for soft- ware patterns in general e.g.: • A pattern addresses a recurring problem that arises in specific situations, and presents a solution to it [42]. • Patterns document existing, well proven experience. That is, they document solutions learned through experience and avoid the need for less experienced developers to ”reinvent the wheel” time and time again [72]. • Patterns provide a common vocabulary and understanding for design principles [72]. Similar, to the way a software architecture can serve as a vehicle for com- munication (see section 2.5 above), pattern names can become part of a design language and can act as a lingua franca facilitating discussion of design issues and their solutions [42]. • Patterns support the construction of software with defined properties [42]. Pat- terns assist developers in meeting both functional and non-functional require- 13The Hillside Group, Pattern Definitions, http://www.hillside.net/patterns/definition.html 14Patterns and Software: Essential Concepts and Terminology, by Brad Appleton, http://www.cmcrossroads.com/bradapp/docs/patterns-intro.html 24
  • 37. ments since they can provide a skeleton of functional behaviour while at the same time they can explicitly address non-functional requirements e.g. reuse- ability, maintainability etc. Software patterns can be documented at various levels of abstraction. For example, Buschmann et al. [42] discuss patterns at three different levels of abstraction, i.e., architectural patterns, design patterns and coding patterns or idioms. Architectural level patterns are concerned with system structure. They describe predefined sets of subsystems, specify their responsibilities and include rules and guidelines for organ- ising the relationships between them. Design patterns on the other hand tend to be at the level of objects and classes (or micro-architectures) and are used for refining sub- systems or components of a software system. In the literature [72] they are defined as ”descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context.” Eden and Kazman have also distin- guished between architecture and design stating that architecture is concerned with non-local issues whereas design is concerned with local issues [62]. Coding patterns or idioms are lower level patterns specific to a programming language [53]. Since their introduction in the area of object oriented software development [72], pat- terns have been documented for a range of systems and technologies 15. For examples pattern catalogs exits in areas such as enterprise systems [67] [93], embedded-systems [142], telecommunication systems [171] to name but a few. Many technology specific patterns (or idioms) also exist (e.g. for Java [80], Ajax [110] and Microsoft technolo- gies 16). In fact pattern catalogs even exist with particular quality attributes in mind (e.g. security [150], performance [154]). Alur et al. [7] provide a catalog of patterns for the JEE technology which document best practices for the design and implementation of JEE applications. Other literature in this area also exists [113] 17. The design of a JEE application plays a major role in the overall system performance. For example, it has previously been shown how the system design can influence a JEE system’s scal- ability [43]. In fact it is well known and recent reports 18 19 also indicate that poor system design is a major reason as to why JEE systems often fail to meet performance requirements. Another reason why poor software design is particularly undesirable is that unlike with lower level software bugs, for example, poor software design can be particularly difficult to rectify late in development and as such can lead to major project delays. Software design best practices documented in the form of patterns can be used to help avoid design issues when developing JEE applications. 15Handbook of Software Architecture, http://www.booch.com/architecture/index.jsp 16Enterprise Solution Patterns Using Microsoft .NET, http://msdn2.microsoft.com/en- us/library/ms998469.aspx 17The Server Side Pattern Repository, http://www.theserverside.com/patterns/index.tss, 18Ptak, Noel and Associates, ”The State of J2EE Application Management: Analysis of 2005 Benchmark Survey”, http://www.ptaknoelassociates.com/members/J2EEBenchmarkSurvey2005.pdf, 19Jasmine Noel, ”J2EE Lessons Learned ”, SoftwareMag.com, The Software IT Journal, January, 2006. http://www.softwaremag.com/L.cfm?doc=2006-01/2006-01j2ee 25
  • 38. 2.7 Software Antipatterns Antipatterns, first suggested by Koenig [101], have been defined by Brown et al. [36] as: ”a literary form that describes a commonly occurring solution to a problem that generates decidedly negative consequences.” The authors [36] also state that when documented properly an ”antipattern describes a general form, the primary causes which led to the general form; symptoms describing how to recognize the general form; the consequences of the general form; and a refactored solution describing how to change the antipattern into a healthier situation.” Software design antipatterns thus provide the opportunity for developers to learn from past experiences. They docu- ment software design mistakes that tend to consistently reoccur. However, as well as documenting the mistake, antipatterns also document the corresponding solution. Thus they allow developers to identify design issues in their system, and to rectify these issues with the corresponding solution provided in the antipattern description. Antipatterns are complementary to software patterns and often show situations where patterns are misused. In fact, as technologies evolve often patterns can become stale [60], i.e., what was once a best practice can in some instances become a bad practice. Examples from the JEE technology include the caching with a Service Locator pat- tern [7], which was recommended for J2EE 1.2 but is not recommended for J2EE 1.3 20. Another example is the Composite Entity pattern [7] which has become obsolete since EJB version 2.x [61]. Figure 2.5 [36] shows the relationship between patterns and antipatterns. Software design antipatterns, like software design patterns, have been documented at a number of different levels. For example Brown et al. [36] introduced a num- ber of technology independent object oriented development antipatterns (as well as higher level architectural and management antipatterns). Technology specific antipat- terns have also been documented (e.g. Java [160], J2EE [161] [61]). Antipatterns for systems built on service oriented architectures (SOA) have also been recently doc- umented 21. As with software design patterns, some antipattern catalogs focus on particular software quality attributes only. For example Smith and Williams have pre- sented a number of performance related antipatterns [154], while Kis has presented antipatterns focusing on security [99]. For the purpose of this thesis we focus mainly on performance related antipatterns for enterprise systems. In particular we focus on performance antipatterns related to design and deployment for JEE applications. Similar to software design antipatterns, Fowler and Beck introduced the notion of code smells [68]. Code smells are lower level symptoms of problems at the code level. 20B. Woolf. IBM WebSphere Developer Technical Journal: Eliminate caching in service locator implementations in J2EE 1.3., http://www-128.ibm.com/developerworks/websphere/techjournal/0410 woolf/0410 woolf.html, October 2004 21SOA antipatterns,Jenny Ang, Luba Cherbakov and Mamdouh Ibrahim, November 2005, http://www-128.ibm.com/developerworks/webservices/library/ws-antipatterns/ 26
  • 39. Figure 2.5: Patterns, Antipatterns and their Relationship They may not necessarily be a problem but often their presence in the code indicate that problems exist. Catalogs of code smells are available in the literature 22 23. 2.8 Performance Tools When design issues lead to poor performance, developers require testing tools to iden- tify why the system is performing poorly. Developers use performance testing tools to try to understand system behaviour and to discover how their system makes use of the system resources. Application level performance testing tools fall into two main categories, i.e. workload generation tools and performance profilers. 2.8.1 Workload Generation In order to evaluate the performance characteristics of an application under devel- opment a realistic workload is required to mimic how the system would be utilised by clients in a production environment. To achieve this a synthetic workload can be automatically generated using a workload generator. Workload generators fall into two main categories, traced based approaches and analytical approaches [17]. Traced based approaches make use of server log files to characterise the workload of an ap- plication, whereas analytical approaches are based on mathematical models which are 22A Taxonomy of Code Smells, http://www.soberit.hut.fi/mmantyla/BadCodeSmellsTaxonomy.htm 23Smells within Classes, http://wiki.java.net/bin/view/People/SmellsToRefactorings 27
usually based on statistical methods [124]. There are advantages and disadvantages associated with both approaches. For example, traced based approaches are considered relatively easy to implement and are based on activity from a known system. However, a disadvantage is that this approach treats the workload as a black box, and as such insight into the workload characteristics can be difficult to obtain. It can also be difficult to modify the workload to simulate future or alternative conditions. Furthermore, during development, realistic logs on which to base the traced based workload generation may not be available. Analytical approaches, on the other hand, can be used to create synthetic workloads and do not suffer from the drawbacks outlined above. However, they can be more difficult to construct, as an understanding of the characteristics of the expected workload is required. The most commonly used workload generators are analytically based tools, e.g. Apache's JMeter 24 or Mercury LoadRunner 25. The literature [136] gives a representative subset of the workload generators currently available in the open literature.
2.8.2 Profiling Tools
Next we explain what we mean by the term profiling and discuss the different ways and levels of granularity in which profiling information can be collected (see section 2.8.2.1). We also give an overview of the different categories of profilers that are available for the Java technology (see section 2.8.2.2). Profiling [169] is the ability to monitor and trace events that occur during run time. This includes the ability to track the cost of these events, as well as the ability to attribute the cost of the events to specific parts of the program. A profiler, for example, may obtain information about what part of the program consumes the most CPU time, or about the parts of the program which allocate the most memory. Performance profilers can be used in conjunction with load generators to monitor a running system and obtain the required information for performance analysis. Profilers are often described as either exact profilers or sampling based profilers [135] [30]. Exact profiling, also referred to as full profiling [64] or full instrumentation 26, captures all events of a given type that are produced during program execution (e.g. method invocations). Sampling based profilers, on the other hand, select a part of the entire event population with the aim of determining the characteristics of the whole program. Sampling usually involves selecting a subset of events for profiling based on certain criteria (e.g. hot paths) or time intervals [64]. Exact profiling has the advantage of being more precise than sampling but carries a higher performance overhead.
24 Jakarta Apache JMeter, http://jakarta.apache.org/jmeter/index.html
25 Mercury Loadrunner, http://mercury.com
26 http://profiler.netbeans.org/docs/help/5.5/custom instrumetation
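As a simple illustration of the sampling idea (and not of any of the tools discussed in this thesis, nor of the monitoring approach developed later), the following sketch periodically captures the stack traces of all live threads using the standard Thread.getAllStackTraces() API (available since Java 1.5) and counts how often each method is observed at the top of a stack. An exact profiler would instead instrument every method entry and exit. The class name and sampling interval are arbitrary choices for the example.

    import java.util.HashMap;
    import java.util.Map;

    /** Minimal sketch of a sampling-based profiler. */
    public class SamplingProfiler implements Runnable {

        private final long intervalMillis;
        private final Map<String, Integer> samples = new HashMap<String, Integer>();
        private volatile boolean running = true;

        public SamplingProfiler(long intervalMillis) {
            this.intervalMillis = intervalMillis;
        }

        public void run() {
            while (running) {
                for (Map.Entry<Thread, StackTraceElement[]> entry
                        : Thread.getAllStackTraces().entrySet()) {
                    StackTraceElement[] stack = entry.getValue();
                    if (stack.length == 0) {
                        continue;
                    }
                    // Attribute this sample to the method currently executing
                    // at the top of the thread's stack.
                    String method = stack[0].getClassName() + "." + stack[0].getMethodName();
                    Integer count = samples.get(method);
                    samples.put(method, count == null ? 1 : count + 1);
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }

        public void stop() {
            running = false;
        }

        public Map<String, Integer> getSamples() {
            return samples;
        }
    }

Such a sampler could be started in a background thread, for example with new Thread(new SamplingProfiler(10)).start(), and its counts inspected once the load test completes; methods with high sample counts approximate the hotspots an exact profiler would report, at a fraction of the overhead.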
  • 41. 2.8.2.1 Recording Information Regardless of the profiling approach however, information must be recorded by the profiling tool. Performance metrics can be recorded at different levels of granular- ity. At the lowest level hardware counters can be utilised to obtain performance met- rics from the underlying hardware on which the program executes [8] [10] [156] [88]. Hardware counters can be utilised to record events, such as instructions executed, cy- cles executed, pipeline stalls, cache misses, etc. One of the main advantages of using hardware counters is that an application can be profiled without the need to modify or instrument it. Also the overhead associated with using hardware counters for profil- ing is generally quite low [10]. A disadvantage of hardware counters is that they rely on platform specific features and thus they are generally not portable across different hardware. Another issue with this type of profiling is that the information may be too low level for higher level program analysis. Information such as virtual memory management requests, and signals caused by segmentation violations can be obtained at the operating system (OS) level [88]. OS level information can be recorded by sys- tem level tools 27 or libraries 28. In situations where hardware counters or OS level information is unavailable, or the information they produce is undesirable, informa- tion can be obtained at a higher level. For today’s enterprise Java applications such information can be recorded at a number of different levels i.e. at the JVM level, the middleware level or the application level. JVM level information is generally recorded by either instrumenting the JVM or by using an agent-based approach that requests notification of events from the virtual machine. The former very often requires access to the JVM source code such that it can be modified to record the information required [12] [33]. A disadvantage of this approach is that it requires an understanding of the complex JVM internals. Also this approach generally ties the user to a particular JVM and is thus not portable. One of the main advantages of instrumenting the JVM is that access to JVM level information is not restricted as with the agent-based approaches. Agent based approaches have been made popular through standard interfaces that allow for a profiler agent to request performance related information from a running JVM. The Java Virtual Machine Profiler Interface (JVMPI) [169] is an example of such an interface (see figure 2.6). The JVMPI is a two-way function call interface between the JVM and an in-process profiler agent. The profiler agent is responsible for commu- nication between the JVM and the profiler front end. The profiler agent can register with the JVM to be notified when particular events occur and upon notification can call back into the JVM to obtain additional information. For example a notification may be received when a method is entered (or exited) and a call back may be made to 27Performance Monitoring Tools for Linux, David Gavin Jan, 1998, http://www.linuxjournal.com/article/2396 28Windows Management Instrumentation, http://www.microsoft.com/whdc/system/pnppwr/wmi/default.mspx 29
obtain the current stack trace at this point.
Figure 2.6: JVMPI Architecture
The main advantage of standard interfaces is that they are implemented by different JVM vendors. While the JVMPI was an experimental interface for Java 1.2, it was implemented by most JVM vendors and effectively became standard. Thus, profilers built using JVMPI are portable across different JVM implementations. A disadvantage of standard interfaces is that they are fixed interfaces and, as such, can only enable predefined types of profiling [135] or event notifications. Another major issue with JVMPI in particular was that, when using a JVMPI agent, the JVM could not run at full speed and was required to run in a debugging mode. As such, this profiling approach was not generally suitable for production systems. Another major drawback of the JVMPI approach was that notifications could not be tailored to profile selectively. If, for example, the profiler agent requested to be notified on method entry events, all method entry events would be reported to the agent. This led to high-overhead performance tools. The Java Virtual Machine Tools Interface (JVMTI) 29 will replace JVMPI in Java 1.6 (the JVMPI is still available in Java 1.5). While at an architectural level the JVMTI looks similar to the JVMPI (i.e. it also consists of call back functions and a profiler agent), it is quite different and improves upon many of the limitations of JVMPI. Firstly, it allows the JVM to run at full speed and does not require it to run in debug mode. It also promotes the use of bytecode instrumentation for many of the event based capabilities of the JVMPI. Using bytecode instrumentation, one can be more selective when profiling the application and can instrument only the parts of the application that require analysis. This avoids the "all or nothing" approach of the JVMPI and thus reduces the profiler overhead. In fact, the JVMTI allows for dynamic bytecode instrumentation 30, which means that the application can be instrumented as it runs. An issue that remains, however, with both JVMPI and JVMTI is that they are native interfaces and while the profiling agents (which must be written in native code) are portable across different JVMs
29 The Java Virtual Machine Tools Interface, http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/jvmti.html
30 Ian Formanek and Gregg Sporar, Dynamic Bytecode Instrumentation: A new way to profile Java applications, December 15, 2005, http://www.ddj.com/dept/java/184406433
  • 43. they are not portable across different platforms. The java.lang.instrument interface 31 is another standard interface (as of Java 1.5) which allows for the interception of the JVM classloading process through a non-native agent. Since the agent is non-native it is portable across different platforms. Java.lang.instrument allows for the agent to monitor the classloading process and to instrument the classes such that they can call back into the agent libraries. Recording performance information for Java applications at the middleware level can also be achieved through standard mechanisms. Java Management Extensions (JMX) technology 32 is a standard technology that allows for the management of Java re- sources through so called MBeans. MBeans, also known as managed beans, are Java objects that are used to represent and manage JMX resources. A JMX resource can be any application, device or Java object. In order to manage an MBean it must be reg- istered with a JMX agent. JMX agents directly control registered MBeans and make them available to remote management applications. While the JMX technology can potentially be used to manage a wide range of different resources it has been heavily used, in particular, to manage the resources of JEE application servers. In fact, ac- cording to the JEE Management specification (Java Service Request 77 33) application servers are required to expose this data through the JMX technology. Profilers built us- ing JMX can collect data on the state of the different system resources (e.g. object pool sizes, thread queues, database connectivity information) and because JMX is standard they are portable across the different application server implementations. Non standard hooks or mechanisms have also been used to collect information at the middleware level. Often middleware vendors provide these mechanisms to enhance the capabilities of their products. For example, IBM provides non standard features for the Websphere application server in the form of the Performance Monitoring In- frastructure (PMI). PMI is available for the Websphere application server and allows for the collection of performance information on the server resources. The informa- tion can be exposed to performance profiling tools through a number of different in- terfaces 34. The main issue with non-standard features, that allow for the collection of performance information, is that they are not portable across different vendors’ im- plementations of the middleware, and thus can result in vendor lock in. Where the information required is not available through standard or non-standard features the middleware itself can be the subject of instrumentation. This can be achieved by manually modifying the source code if it is available [45] [105]. However 31J2SE 5.0 in a Nutshell, Calvin Austin, May 2004,http://java.sun.com/developer/technicalArticles/releases/j2se15, 32Java Management Extensions Technology,http://java.sun.com/javase/technologies/core/mntr- mgmt/javamanagement/ 33Java Service Request 77, J2EE management, http://www.jcp.org/en/jsr/detail?id=77 34Srini Rangaswamy, Ruth Willenborg and Wenjian Qiao, IBM WebSphere De- veloper Technical Journal: Writing a Performance Monitoring Tool Using Web- Sphere Application Server’s Performance Monitoring Infrastructure API, 13 Feb 2002, http://www.ibm.com/developerworks/websphere/techjournal/0202 rangaswamy/rangaswamy.html 31