The document discusses reference propagation profiling, a technique for uncovering performance problems in Java applications. It is implemented in the Jikes RVM compiler by instrumenting code to track data dependencies and propagate references between memory locations. This allows analyzing applications to find inefficiencies like objects not being assigned to the heap or imbalance between operation costs and benefits. The profiling has high overhead but provides insights to assist with manual performance tuning.
Uncovering Performance Problems in Java Applications with Reference Propagation Profiling
1. Uncovering Performance Problems in Java Applications with Reference Propagation Profiling
Dacong Yan1, Guoqing Xu2, Atanas Rountev1
1 Ohio State University
2 University of California, Irvine
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
2. Overview
• Performance inefficiencies
– Often exist in Java applications
– Excessive memory usage
– Long running times, even for simple tasks
• Challenges
– Limited compiler optimizations
– Complicated behavior
– Large libraries and frameworks
• Solution: manual tuning assisted with performance analysis tools
3. An Example
class Vec {
  double x, y;
  sub(v) {
    res = new Vec(x-v.x, y-v.y);
    return res;
  }
}
for (i = 0; i < N; i++) {
  t = in[i+2].sub(a[i-1]);
  q[i] = t;
  t = in[i+1].sub(a[i-2]);
  // use of fields of t
}
……
t = q[*];
// use of fields of t
9. Implementation
• Reference propagation profiling
– Implemented in Jikes RVM 3.1.1
– Modify the runtime compiler for code instrumentation
– Create shadow locations to track data dependence
– Instrument method calls to track interprocedural propagation
• Overheads
– Space: 2-3×
– Time: 30-50×
10. Reference Propagation Profiling
• Intraprocedural propagation
– Shadows for every memory location (stack and heap) to record the last assignment that writes to it
– Update shadows and the graph accordingly
Code              Shadow
6  a = new A;     aʹ = RefAssign(6, 6)
7  b = a;         bʹ = RefAssign(6, 7)
8  c = new C;     cʹ = RefAssign(8, 8)
9  b.fld = c;     b.fldʹ = RefAssign(8, 9)
(Graph column: figure not reproduced)
• Interprocedural propagation
– Per-thread scratch space to save and restore shadows for parameters and return variables
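The shadow updates in the Code/Shadow table can be sketched in plain Java. This is only an illustrative model under stated assumptions, not the Jikes RVM instrumentation: the class `ShadowSketch`, its methods, and the choice to key shadows by location name are all hypothetical, and the `ThreadLocal` list is just one possible reading of "per-thread scratch space". A real profiler would attach shadows to actual stack slots and object fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; all names here are hypothetical, not Jikes RVM APIs.
final class ShadowSketch {
    // Graph node RefAssign(source, site): the reference was created at line
    // `source` and most recently assigned at line `site`.
    static final class RefAssign {
        final int source, site;
        RefAssign(int source, int site) { this.source = source; this.site = site; }
        @Override public String toString() { return "RefAssign(" + source + "," + site + ")"; }
    }

    // Shadow of each location, keyed by name for simplicity ("b", "b.fld").
    final Map<String, RefAssign> shadows = new HashMap<>();
    // Graph edges {from, to}: the reference flowed from one assignment to the next.
    final List<RefAssign[]> edges = new ArrayList<>();
    // Assumed per-thread scratch space for passing shadows across a call.
    private final ThreadLocal<List<RefAssign>> scratch =
            ThreadLocal.withInitial(ArrayList::new);

    // `target = new T` at `line`: the new node is its own producer.
    void alloc(String target, int line) {
        shadows.put(target, new RefAssign(line, line));
    }

    // `target = source` at `line`: keep the producer, record the new site.
    void copy(String target, String source, int line) {
        RefAssign from = shadows.get(source);
        RefAssign to = new RefAssign(from.source, line);
        edges.add(new RefAssign[] { from, to });
        shadows.put(target, to);
    }

    // Caller side: save the shadows of the actual arguments.
    void saveArgs(String... args) {
        List<RefAssign> slots = scratch.get();
        slots.clear();
        for (String arg : args) slots.add(shadows.get(arg));
    }

    // Callee side: restore the saved shadows into the formal parameters.
    void restoreParams(String... params) {
        List<RefAssign> slots = scratch.get();
        for (int i = 0; i < params.length; i++) shadows.put(params[i], slots.get(i));
    }

    // Replays lines 6 to 9 of the slide's code.
    static ShadowSketch runSlideExample() {
        ShadowSketch s = new ShadowSketch();
        s.alloc("a", 6);          // 6  a = new A;
        s.copy("b", "a", 7);      // 7  b = a;
        s.alloc("c", 8);          // 8  c = new C;
        s.copy("b.fld", "c", 9);  // 9  b.fld = c;
        return s;
    }

    public static void main(String[] args) {
        ShadowSketch s = runSlideExample();
        System.out.println("b'     = " + s.shadows.get("b"));      // RefAssign(6,7)
        System.out.println("b.fld' = " + s.shadows.get("b.fld"));  // RefAssign(8,9)
    }
}
```

The sketch passes a shadow across a call unchanged (`saveArgs` in the caller, `restoreParams` in the callee). Whether the actual tool adds a graph node at that point is not stated on the slide, so that detail is left open here.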
19. Client Analyses
• Not-assigned-to-heap (NATH) analysis
– Locate producer nodes that do not reach heap propagation nodes (heap reads and writes)
– Variant: mostly-NATH analysis
• Cost-benefit imbalance analysis
– Detect imbalance between the cost of interesting operations and the benefits they produce
– For example, analysis of write-read imbalance
• Analysis of never-used allocations
– Identify producer nodes that do not reach the consumer node
– Variant: analysis of rarely-used allocations
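Both the NATH analysis and the never-used-allocation analysis are phrased as reachability questions over the propagation graph, so they can be illustrated with a small graph search. The sketch below is a hypothetical model, not the tool's implementation: the node kinds, the string node ids, and the class `ClientAnalysisSketch` are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative model only; node kinds and names are assumptions for this sketch.
final class ClientAnalysisSketch {
    enum Kind { PRODUCER, STACK_COPY, HEAP_WRITE, HEAP_READ, CONSUMER }

    private final Map<String, Kind> kinds = new LinkedHashMap<>();
    private final Map<String, List<String>> succ = new HashMap<>();

    void node(String id, Kind kind) { kinds.put(id, kind); }

    void edge(String from, String to) {
        succ.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Depth-first search: is any node of a wanted kind reachable from `start`?
    private boolean reaches(String start, Predicate<Kind> wanted) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(start);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (!seen.add(n)) continue;
            if (!n.equals(start) && wanted.test(kinds.get(n))) return true;
            for (String m : succ.getOrDefault(n, Collections.<String>emptyList())) work.push(m);
        }
        return false;
    }

    private List<String> producersNotReaching(Predicate<Kind> wanted) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Kind> e : kinds.entrySet()) {
            if (e.getValue() == Kind.PRODUCER && !reaches(e.getKey(), wanted)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // NATH: producers that never reach a heap read or heap write.
    List<String> notAssignedToHeap() {
        return producersNotReaching(k -> k == Kind.HEAP_WRITE || k == Kind.HEAP_READ);
    }

    // Never-used allocations: producers that never reach a consumer.
    List<String> neverUsed() {
        return producersNotReaching(k -> k == Kind.CONSUMER);
    }

    static ClientAnalysisSketch sample() {
        ClientAnalysisSketch g = new ClientAnalysisSketch();
        g.node("allocA", Kind.PRODUCER);   g.node("copyA", Kind.STACK_COPY);
        g.node("useA", Kind.CONSUMER);
        g.node("allocB", Kind.PRODUCER);   g.node("storeB", Kind.HEAP_WRITE);
        g.node("loadB", Kind.HEAP_READ);   g.node("useB", Kind.CONSUMER);
        g.node("allocC", Kind.PRODUCER);   g.node("copyC", Kind.STACK_COPY);
        g.edge("allocA", "copyA");  g.edge("copyA", "useA");
        g.edge("allocB", "storeB"); g.edge("storeB", "loadB"); g.edge("loadB", "useB");
        g.edge("allocC", "copyC");
        return g;
    }

    public static void main(String[] args) {
        ClientAnalysisSketch g = sample();
        System.out.println(g.notAssignedToHeap()); // [allocA, allocC]
        System.out.println(g.neverUsed());         // [allocC]
    }
}
```

In the sample graph, `allocA` is used but only through stack copies, so it is a NATH candidate; `allocC` is never used at all. The "mostly" and "rarely" variants on the slide would presumably need frequency information on nodes or edges, which this sketch does not model.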
22. A Real Tuning Session
class Vec {
  double x, y;
  sub(v) {   // (1) (2)
    res = new Vec(x-v.x, y-v.y);
    return res;
  }
}
for (i = 0; i < N; i++) {
  t = in[i+2].sub(a[i-1]);   // (1)
  q[i] = t;
  t = in[i+1].sub(a[i-2]);   // (2)
  // use of fields of t
}
……
t = q[*];
// use of fields of t
25. A Real Tuning Session
Before tuning:
class Vec {
  double x, y;
  sub(v) {
    res = new Vec(x-v.x, y-v.y);
    return res;
  }
}
for (i = 0; i < N; i++) {
  t = in[i+2].sub(a[i-1]);
  q[i] = t;
  t = in[i+1].sub(a[i-2]);
  // use of fields of t
}
……
t = q[*];
// use of fields of t
After tuning:
class Vec {
  double x, y;
  sub_rev(v, res) {
    res.x = x-v.x;
    res.y = y-v.y;
  }
}
nt = new Vec; // reusable
for (i = 0; i < N; i++) {
  t = in[i+2].sub(a[i-1]);
  q[i] = t;
  in[i+1].sub_rev(a[i-2], nt);
  // use of fields of nt
}
……
t = q[*];
// use of fields of t
Reductions: 13% in running time and 73% in #allocated objects
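The tuning step can be tried out with a runnable Java variant of the slide's pseudocode. Several details are assumptions added to make it compile and run: the constructor, the loop bounds (starting at 2 so that `a[i-2]` stays in range), the accumulator standing in for "use of fields of t", the helper `vecs`, and the name `subRev` for the slide's `sub_rev`. The first result is stored in `q` and read later, so it keeps its own object; only the second, loop-local result is redirected into the reusable `nt`.

```java
// Illustrative, runnable variant of the slide's code; bounds and helpers are assumptions.
final class Vec {
    double x, y;
    Vec(double x, double y) { this.x = x; this.y = y; }

    // Original: allocates a fresh Vec for every result.
    Vec sub(Vec v) { return new Vec(x - v.x, y - v.y); }

    // Tuned: writes the result into a caller-supplied Vec, allocating nothing.
    void subRev(Vec v, Vec res) { res.x = x - v.x; res.y = y - v.y; }
}

final class TuningSketch {
    // `in` must hold at least a.length + 2 elements, `q` at least a.length.
    static double original(Vec[] in, Vec[] a, Vec[] q) {
        double acc = 0;
        for (int i = 2; i < a.length; i++) {
            Vec t = in[i + 2].sub(a[i - 1]);
            q[i] = t;                        // stored in the heap and read later
            t = in[i + 1].sub(a[i - 2]);     // temporary, used only in this iteration
            acc += t.x + t.y;                // stands in for "use of fields of t"
        }
        return acc;
    }

    static double tuned(Vec[] in, Vec[] a, Vec[] q) {
        double acc = 0;
        Vec nt = new Vec(0, 0);              // reusable
        for (int i = 2; i < a.length; i++) {
            Vec t = in[i + 2].sub(a[i - 1]);
            q[i] = t;
            in[i + 1].subRev(a[i - 2], nt);  // no allocation here
            acc += nt.x + nt.y;              // stands in for "use of fields of nt"
        }
        return acc;
    }

    // Hypothetical helper that builds test data.
    static Vec[] vecs(int n, double step) {
        Vec[] v = new Vec[n];
        for (int i = 0; i < n; i++) v[i] = new Vec(i * step, i * step + 1);
        return v;
    }

    public static void main(String[] args) {
        int n = 1000;
        Vec[] in = vecs(n + 2, 1.5);
        Vec[] a = vecs(n, 0.5);
        double before = original(in, a, new Vec[n]);
        double after = tuned(in, a, new Vec[n]);
        System.out.println(before == after);  // prints "true"
    }
}
```

Both versions perform the same arithmetic in the same order, so the results match exactly. The tuned loop allocates one `Vec` per iteration instead of two; the percentages on the slide are measurements of the real benchmark and are not reproduced by this toy program.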
26. Examples of Inefficiency Patterns
• Temporary objects for method returns
– Reductions for euler: 13% in running time and 73% in #allocated objects
• Redundant data representation
– mst: 63% and 40%
• Unnecessary eager object creation
– chart: 8% and 8%
– jflex: 3% and 27%
• Expensive specialization for sanity checks
– bloat: 10% and 11%
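As a generic illustration of the "unnecessary eager object creation" pattern, the sketch below contrasts a field that is allocated in every object with one that is allocated on first use. It is not taken from chart or jflex; the classes `EagerNode` and `LazyNode`, the allocation counter, and the lazy-initialization fix are all assumptions chosen to show the general idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of the pattern; not code from any of the benchmarks.
final class LazyCreationSketch {
    static int listsAllocated = 0;

    private static List<String> newList() {
        listsAllocated++;
        return new ArrayList<>();
    }

    // Eager: every node pays for a list, even if no note is ever added.
    static final class EagerNode {
        private final List<String> notes = newList();
        void addNote(String s) { notes.add(s); }
        int noteCount() { return notes.size(); }
    }

    // Lazy: the list is created only when the first note arrives.
    static final class LazyNode {
        private List<String> notes;
        void addNote(String s) {
            if (notes == null) notes = newList();
            notes.add(s);
        }
        int noteCount() { return notes == null ? 0 : notes.size(); }
    }

    // Creates n nodes, adds a note to the first one only, and
    // returns how many lists were allocated along the way.
    static int allocationsFor(boolean lazy, int n) {
        listsAllocated = 0;
        for (int i = 0; i < n; i++) {
            if (lazy) {
                LazyNode node = new LazyNode();
                if (i == 0) node.addNote("x");
            } else {
                EagerNode node = new EagerNode();
                if (i == 0) node.addNote("x");
            }
        }
        return listsAllocated;
    }

    public static void main(String[] args) {
        System.out.println(allocationsFor(false, 100)); // 100
        System.out.println(allocationsFor(true, 100));  // 1
    }
}
```

In graph terms, the eagerly created lists that are never filled are the kind of allocation that a never-used or rarely-used analysis would be expected to surface.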
27. Conclusions
• Reference propagation profiling in Jikes RVM
• Understanding reference propagation is a good starting point for performance tuning
• Client analyses can uncover performance inefficiencies, and lead to effective tuning solutions