1. A Fast Algorithm to Locate Concepts in Execution Traces
Soumaya Medini, Philippe Galinier, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, Giuliano Antoniol
SOCCER Lab. & Ptidej Team, École Polytechnique de Montréal, Québec, Canada
University of Sannio, Italy
September 1, 2011
2. Outline
Introduction
Motivation and Problem Statement
Background
Our approach
Empirical study
Conclusion
Future work
References
3. Introduction
Concept location is an important task during program comprehension.
A typical scenario in which concept location takes part:
- a failure has been observed in a software system under certain execution conditions;
- such conditions are hard to reproduce;
- the execution trace was saved during the failure.
Result: An execution trace containing the failure is saved, but we cannot re-execute the same scenario.
4. Motivation and Problem Statement
The concept location process was recently defined as a trace segmentation problem.
- The textual content of the methods in a trace is used to split it into segments that participate in the implementation of concepts.
Few works attempt to automatically identify concepts in a single execution trace.
5. Motivation and Problem Statement
Asadi et al. [Asadi et al., 2010] proposed an approach to identify concepts in an execution trace by finding cohesive and decoupled fragments of the trace.
Limitations
- Not scalable (thousands of methods).
- Based on metaheuristic search: each run may produce a different concept assignment.
- Does not assign a label to the identified segments.
Purpose: To propose a novel approach to overcome these limitations.
6. Background
Steps
Step 1: System instrumentation and trace collection.
Step 2: Pruning and compressing traces.
Step 3: Textual analysis of method source code.
Step 4: Search-based concept location.
7. Background
Step 1: System instrumentation and trace collection
System instrumentation: the system is instrumented using MODEC, a tool to extract and model sequence diagrams.
Trace collection: parts of the code are labeled manually during execution of the instrumented system.
Trace
A trace is represented as an ordered list of invoked methods with labels.
8. Background
Step 2: Pruning and compressing traces
Pruning: Remove too-frequent methods, i.e., those whose number of invocations is greater than Q3 + 2 × IQR, where Q3 is the third quartile (75th percentile) and IQR is the interquartile range.
Compression: Remove repetitions of method invocations using the Run-Length Encoding (RLE) algorithm.
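The pruning and compression of Step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `prune` and `compress` are hypothetical, the quartiles are assumed to be taken over the per-method invocation counts, and the "inclusive" quartile method is an arbitrary choice since the slides do not specify one.

```python
from collections import Counter
from itertools import groupby
from statistics import quantiles


def prune(trace, k=2.0):
    """Drop every method invoked more than Q3 + k * IQR times (k = 2 on the slide)."""
    counts = Counter(trace)
    if len(counts) < 2:
        # quantiles() needs at least two data points; nothing to prune here.
        return list(trace)
    # Assumption: quartiles are computed over per-method invocation counts.
    q1, _, q3 = quantiles(counts.values(), n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return [m for m in trace if counts[m] <= threshold]


def compress(trace):
    """Collapse each run of consecutive identical invocations into one entry."""
    return [method for method, _run in groupby(trace)]
```

For example, `compress(prune(trace))` applied to a trace dominated by a single logging method first removes that method entirely and then collapses any remaining consecutive repetitions.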
9. Background
Step 3: Textual analysis of method source code
Textual analysis of source code:
- Extract terms from the source code: identifiers and comments.
- Split terms by camel case.
- Remove programming-language keywords and English stop words.
- Perform stemming.
- Index the terms using the tf-idf indexing mechanism.
- Apply Latent Semantic Indexing (LSI) to reduce the term-document matrix.
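Part of the textual analysis above can be sketched with the Python standard library only. The sketch covers term extraction, camel-case splitting, stop-word removal, tf-idf weighting, and a cosine similarity between two methods; stemming and LSI are deliberately omitted because they need external libraries. The stop list `STOP` and the function names are illustrative assumptions, and the tf-idf variant (raw term frequency times log(N/df)) is one common choice, not necessarily the one used by the authors.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (language keywords + English stop words).
STOP = {"the", "a", "of", "to", "and", "public", "void", "return", "new", "int"}


def terms(source):
    """Extract lower-cased terms from source text, splitting camel case."""
    out = []
    for word in re.findall(r"[A-Za-z]+", source):
        # "XMLParser" -> ["XML", "Parser"], "createNote" -> ["create", "Note"]
        out += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
    return [t.lower() for t in out if t.lower() not in STOP]


def tfidf(docs):
    """Map each document (list of terms) to a {term: weight} vector."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Each method body is treated as one document; the resulting pairwise similarities are the kind of input the segmentation step consumes.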
10. Background
Step 4: Search-based concept location
The approach is built upon a dynamic programming algorithm to:
- Compute the exact split of the execution trace into segments.
- Improve scalability.
11. Our approach
Dynamic programming: exploits overlapping sub-problems by dividing a problem into sub-problems that are solved recursively.
The solution of the problem is obtained by combining the solutions of the sub-problems.
We compute the quality of the segmentation of a trace split into K segments using the fitness function.

$$\mathrm{COH}_l = \frac{\sum_{i=\mathrm{begin}(l)}^{\mathrm{end}(l)-1} \sum_{j=i+1}^{\mathrm{end}(l)} \sigma(m_i, m_j)}{(\mathrm{end}(l)-\mathrm{begin}(l)+1)\cdot(\mathrm{end}(l)-\mathrm{begin}(l))/2} \tag{1}$$

$$\mathrm{COU}_l = \frac{\sum_{i=\mathrm{begin}(l)}^{\mathrm{end}(l)} \sum_{j=1,\; j<\mathrm{begin}(l) \text{ or } j>\mathrm{end}(l)}^{N} \sigma(m_i, m_j)}{(N-(\mathrm{end}(l)-\mathrm{begin}(l)+1))\cdot(\mathrm{end}(l)-\mathrm{begin}(l)+1)} \tag{2}$$

$$\mathrm{fitness} = \frac{1}{K}\cdot\sum_{i=1}^{K}\frac{\mathrm{COH}_i}{\mathrm{COU}_i+1} \tag{3}$$
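Equations (1) to (3) can be evaluated directly from a pairwise similarity matrix. The sketch below reads σ(m_i, m_j) as the similarity between the i-th and j-th methods of the trace and N as the trace length. The function names are hypothetical, indices are 0-based and inclusive, and returning 0 for the cohesion of a single-method segment (where the denominator of (1) vanishes) is an assumption.

```python
def cohesion(sim, begin, end):
    """Equation (1): average similarity over all pairs inside [begin, end]."""
    size = end - begin + 1
    if size < 2:
        return 0.0  # assumption: a one-method segment has no internal pairs
    total = sum(sim[i][j]
                for i in range(begin, end)
                for j in range(i + 1, end + 1))
    return total / (size * (size - 1) / 2)


def coupling(sim, begin, end):
    """Equation (2): average similarity between the segment and the rest."""
    n = len(sim)
    size = end - begin + 1
    if size == n:
        return 0.0  # assumption: a segment spanning the whole trace has no outside
    total = sum(sim[i][j]
                for i in range(begin, end + 1)
                for j in range(n) if j < begin or j > end)
    return total / ((n - size) * size)


def fitness(sim, segments):
    """Equation (3): mean of COH / (COU + 1) over the K segments."""
    return sum(cohesion(sim, b, e) / (coupling(sim, b, e) + 1)
               for b, e in segments) / len(segments)
```

With a block-diagonal similarity matrix, splitting exactly at the block boundary gives cohesion 1 and coupling 0 for every segment, hence the maximal fitness of 1.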
12. Our approach
Figure: Trace splitting
Possibilities
- a new segment is added, or
- the method is attached to the last segment of the solution.
Coh(i) = Coh(i-1) + delta
Cou(i) = Cou(i-1) - delta
delta = sim(m1, m3) + sim(m1, m4)
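One possible dynamic-programming formulation of the segmentation is sketched below. It is not necessarily the authors' exact algorithm. It assumes a symmetric similarity matrix, scores each segment as COH / (COU + 1), and searches over every number of segments K, keeping the split with the best average score. The intra-segment sum is updated incrementally each time a method is attached to the current segment, in the spirit of the delta update above; all names are hypothetical.

```python
def segment_scores(sim):
    """score[b][e] = COH / (COU + 1) for the segment [b, e] (0-based, inclusive)."""
    n = len(sim)
    row_sum = [sum(sim[i]) - sim[i][i] for i in range(n)]
    score = [[0.0] * n for _ in range(n)]
    for b in range(n):
        intra = 0.0  # sum of similarities over pairs inside the segment
        rows = 0.0   # sum of row_sum over the methods of the segment
        for e in range(b, n):
            intra += sum(sim[i][e] for i in range(b, e))  # delta for method e
            rows += row_sum[e]
            size = e - b + 1
            coh = intra / (size * (size - 1) / 2) if size > 1 else 0.0
            cross = rows - 2 * intra  # valid because sim is symmetric
            outside = n - size
            cou = cross / (outside * size) if outside > 0 else 0.0
            score[b][e] = coh / (cou + 1)
    return score


def best_segmentation(sim, max_k=None):
    """Return (fitness, segments) maximizing the mean segment score."""
    n = len(sim)
    max_k = min(max_k or n, n)
    score = segment_scores(sim)
    neg = float("-inf")
    # best[k][j]: best total score for the first j methods split into k segments
    best = [[neg] * (n + 1) for _ in range(max_k + 1)]
    back = [[0] * (n + 1) for _ in range(max_k + 1)]
    best[0][0] = 0.0
    for k in range(1, max_k + 1):
        for j in range(k, n + 1):
            for s in range(k - 1, j):  # last segment is [s, j - 1]
                if best[k - 1][s] == neg:
                    continue
                cand = best[k - 1][s] + score[s][j - 1]
                if cand > best[k][j]:
                    best[k][j], back[k][j] = cand, s
    k_best = max(range(1, max_k + 1), key=lambda k: best[k][n] / k)
    segments, j = [], n
    for k in range(k_best, 0, -1):  # walk the back-pointers from the end
        s = back[k][j]
        segments.append((s, j - 1))
        j = s
    return best[k_best][n] / k_best, segments[::-1]
```

Because the score of a segment depends only on its own boundaries and on the fixed trace, sub-solutions for a prefix can be reused for every longer prefix, which is what makes the split exact and deterministic, unlike a metaheuristic search.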
13. Empirical study
RQ1: How do the performances of the GA (genetic algorithm) and DP (dynamic programming) approaches compare in terms of fitness values, convergence times, and numbers of segments?
RQ2: How do the GA and DP approaches perform in terms of overlaps between the automatic segmentation and the manually-built oracle, i.e., recall?
RQ3: How do the precision values of the GA and DP approaches compare when splitting execution traces?
14. Empirical study
Table: Data of the empirical study.
System            Scenario                                         Original size  Cleaned size  Compressed size
ArgoUML v0.18.1   (1) Start, Create note, Stop                     34,746         821           588
ArgoUML v0.18.1   (2) Start, Create class, Create note, Stop       64,947         1,066         764
JHotDraw v1.5     (1) Start, Draw rectangle, Stop                  6,668          447           240
JHotDraw v1.5     (2) Start, Add text, Draw rectangle, Stop        13,841         753           361
JHotDraw v1.5     (3) Start, Draw rectangle, Cut rectangle, Stop   11,215         1,206         414
JHotDraw v1.5     (4) Start, Spawn window, Draw circle, Stop       16,366         670           433
15. Empirical study
RQ1 Results
Comparison of the GA approach proposed by Asadi et al. [Asadi et al., 2010] with our approach.
Table: Number of segments, values of fitness function/segmentation score, and times required by the GA and DP approaches.
System    Scenario  Segments GA  Segments DP  Fitness GA  Fitness DP  Time GA (s)  Time DP (s)
ArgoUML   (1)       24           13           0.54        0.58        7,080        2.13
ArgoUML   (2)       73           19           0.52        0.60        10,800       4.33
JHotDraw  (1)       17           21           0.39        0.67        2,040        0.13
JHotDraw  (2)       21           21           0.38        0.69        1,260        0.64
JHotDraw  (3)       56           20           0.46        0.72        1,200        0.86
JHotDraw  (4)       63           26           0.34        0.69        240          1.00
The DP approach performs significantly better than the GA approach: it reaches a higher fitness in every scenario and takes at most 4.33 s, against 240 s to 10,800 s for the GA.
16. Empirical study
RQ2 and RQ3 Results
Table: Jaccard overlaps and precision values between segments identified by the GA and DP approaches.
System    Scenario  Feature         Jaccard GA  Jaccard DP  Precision GA  Precision DP
ArgoUML   (1)       Create Note     0.33        0.87        1.00          0.99
ArgoUML   (2)       Create Class    0.26        0.53        1.00          1.00
ArgoUML   (2)       Create Note     0.34        0.56        1.00          1.00
JHotDraw  (1)       Draw Rectangle  0.90        0.75        0.90          1.00
JHotDraw  (2)       Add Text        0.31        0.33        0.36          0.39
JHotDraw  (2)       Draw Rectangle  0.62        0.52        0.62          1.00
JHotDraw  (3)       Draw Rectangle  0.74        0.24        0.79          0.24
JHotDraw  (3)       Cut Rectangle   0.22        0.31        1.00          1.00
JHotDraw  (4)       Draw Circle     0.82        0.82        0.82          1.00
JHotDraw  (4)       Spawn window    0.42        0.44        1.00          1.00
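The two measures reported in the table can be computed from sets of trace positions. The sketch below uses the standard set-based definitions of Jaccard overlap and precision; how an identified segment is paired with an oracle segment is not described on the slides, so that pairing is left to the caller, and the function names are hypothetical.

```python
def jaccard(found, oracle):
    """Jaccard overlap: |found ∩ oracle| / |found ∪ oracle|."""
    union = found | oracle
    return len(found & oracle) / len(union) if union else 0.0


def precision(found, oracle):
    """Fraction of the identified segment that lies inside the oracle segment."""
    return len(found & oracle) / len(found) if found else 0.0
```

For instance, a segment covering positions 10 to 19 compared with an oracle covering positions 12 to 29 shares 8 positions out of a union of 20, so most of the segment is correct even though the overlap is modest.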
17. Conclusion
We reformulated the trace segmentation problem as a dynamic programming (DP) problem:
- we showed that we can benefit from the overlapping sub-problems;
- we obtained a dramatic gain in performance by reusing computed scores of intervals and segmentation scores.
The DP approach reuses pre-computed cohesion and coupling values among subsequent segments of an execution trace, which is not possible using the GA.
Results showed that the DP approach significantly outperformed the GA approach in terms of:
- the time required to produce the segmentations;
- the scalability.
18. Future work
- Validate the scalability of the DP trace segmentation approach using larger traces.
- Use more traces obtained from different systems to verify the generality of our findings.
- Complement the approach with segment labeling to make the produced segments more suitable for program-comprehension activities.
19. Thanks!
20. References
Asadi, F., Penta, M. D., Antoniol, G., and Guéhéneuc, Y.-G. (2010). A heuristic-based approach to identify concepts in execution traces. In Proceedings of the European Conference on Software Maintenance and Reengineering, pages 31–40. IEEE Computer Society Press.
Gethers, M. and Poshyvanyk, D. (2010). Using relational topic models to capture coupling among classes in object-oriented software systems. In Proceedings of the 2010 IEEE International Conference on Software Maintenance, pages 1–10. IEEE Computer Society.
Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., and Chrisochoides, N. (2009). Modeling class cohesion as mixtures of latent topics. In International Conference on Software Maintenance, pages 233–242.
21. References (continued)
Lukins, S. K., Kraft, N. A., and Etzkorn, L. H. (2010). Bug localization using latent Dirichlet allocation. Information and Software Technology, 52(9):972–990.