1) Performance issues often stem from architectural decisions, disconnected teams, flawed implementations, pushing changes without proper planning, blindly reusing components, and lack of agile deployment practices.
2) Common metrics that help identify performance problems include number of requests/user, log messages, exceptions, objects allocated/in cache and cache hit ratio, images, SQL statements, SQLs per request, HTTP status codes, and page size.
3) Tracking key performance indicators and metrics across automated unit and performance tests can help identify regressions and keep performance/architecture in check.
17. #2: “Teamwork” between Dev and Ops
SEV1 Problem in Production:
Need access to log files. Where are they? Can’t get them.
Need to increase the log level. Can’t do! Can’t change config files in prod!
21. #2: Root Cause: A Special WebSphere Setting!
The Log Service provides a synchronized log file across ALL JVMs.
22. Metrics: # Log Messages, # Exceptions
Share: Same Server Settings
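Metrics like “# log messages” and “# exceptions” can be extracted from the log output itself. A minimal sketch in Python; the log format, logger names, and sample lines are invented for illustration, not taken from the incident described here:

```python
import re

# Hypothetical log excerpt; the timestamp/level format and class names
# are assumptions for this sketch.
LOG = """\
2014-02-01 10:00:01 INFO  OrderService - order placed
2014-02-01 10:00:01 ERROR OrderService - lookup failed
java.lang.NullPointerException: item was null
    at com.example.OrderService.lookup(OrderService.java:42)
2014-02-01 10:00:02 DEBUG CacheService - cache miss
"""

def log_metrics(text):
    """Count log messages per level plus exception stack traces."""
    levels = {}
    exceptions = 0
    for line in text.splitlines():
        m = re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (\w+)", line)
        if m:
            levels[m.group(1)] = levels.get(m.group(1), 0) + 1
        elif re.match(r"\w[\w.]*(Exception|Error)\b", line):
            exceptions += 1
    return levels, exceptions

levels, exceptions = log_metrics(LOG)
print(levels, exceptions)   # {'INFO': 1, 'ERROR': 1, 'DEBUG': 1} 1
```

Tracking these two counters build over build is enough to spot a sudden flood of messages or exceptions like the one behind this problem.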
32. #4: Mobile Landing Page of Super Bowl Ad
434 resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …
Total size of ~20MB
33. #4: m.store.com redirects to www.store.com
ALL CSS and JS files are redirected to the www domain.
This is a lot of time “wasted”, especially on high-latency mobile connections.
34. #4: Critical Pages not Optimized!
Browse, Search and Product Info perform well.
Critical pages such as Shopping Cart are very slow … because they don’t follow best practices: 87 Requests, 28 Redirects, …
35. Metrics: Load Time, # Resources (Images, …), # HTTP 3xx, 4xx, 5xx
Dev: Build for Mobile
Test: Test on Mobile
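These page metrics can be computed from any capture of the requests a page makes. A minimal sketch in Python; the URLs, status codes, and sizes below are invented illustrations, not the actual measurements from the Super Bowl page:

```python
from collections import Counter

# Hypothetical captured resources: (url, http_status, bytes transferred).
resources = [
    ("http://m.store.com/app.css",    301, 0),
    ("http://www.store.com/app.css",  200, 48_000),
    ("http://m.store.com/app.js",     301, 0),
    ("http://www.store.com/app.js",   200, 310_000),
    ("http://www.store.com/hero.jpg", 200, 1_200_000),
    ("http://www.store.com/old.gif",  404, 0),
]

# Group statuses into classes (2xx, 3xx, 4xx, 5xx) and sum the page weight.
status_classes = Counter(f"{status // 100}xx" for _, status, _ in resources)
total_kb = sum(size for _, _, size in resources) / 1024

print(len(resources), "requests,", dict(status_classes), f"{total_kb:.0f} KB")
```

Every 3xx in that tally is a round trip wasted on the redirect chain from m.store.com to www.store.com; on a high-latency mobile connection each one is directly visible in load time.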
39. #5: Using Hibernate Results in 4k+ SQL Statements to Display 3 Items!
Hibernate executes 4k+ statements. Each individual execution is VERY FAST, but the total SUM takes 6s.
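The pattern behind “thousands of fast statements that sum to seconds” is classic N+1 lazy loading. A minimal sketch using Python’s built-in sqlite3 instead of Hibernate; the table and column names are invented:

```python
import sqlite3

# Tiny invented schema to demonstrate the N+1 query pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price (item_id INTEGER, amount REAL);
    INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO price VALUES (1, 9.99), (2, 19.99), (3, 4.99);
""")

executed = []
conn.set_trace_callback(executed.append)   # record every SQL sent to the DB

# Lazy-loading style: one query for the list, then one per item.
items = conn.execute("SELECT id, name FROM item").fetchall()
for item_id, _ in items:
    conn.execute("SELECT amount FROM price WHERE item_id = ?", (item_id,))
n_plus_1 = len(executed)          # 1 + 3 = 4 statements

# Eager/join style: everything in a single statement.
executed.clear()
conn.execute(
    "SELECT i.name, p.amount FROM item i JOIN price p ON p.item_id = i.id"
).fetchall()
print(n_plus_1, len(executed))    # 4 1
```

With 3 items the difference is 4 statements vs. 1; with the object graph on this slide it is 4k+ statements vs. a handful, which is exactly where the 6 seconds went.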
41. #5: Using Telerik Controls Results in 9s for Data-Binding of UI Controls
#1: Slow Stored Procedure: depending on the request, the execution time of this SP varies between 1 and 7.5s.
#2: 240(!) Similar SQL Statements: most of these 240 statements are not prepared and differ only in things like column names.
42. Metrics: # Total SQLs, # SQLs / Web Request, # Same SQLs / Request, Transferred Rows
Test: With Realistic Data
Dev: “Learn” Frameworks
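The “# Same SQLs / Request” metric can be approximated by normalizing statements down to their structure, so near-duplicates that should have been one prepared statement group together. A minimal sketch, only normalizing numeric literals; the statements are invented examples:

```python
import re
from collections import Counter

# Hypothetical statements captured during one request.
stmts = [
    "SELECT price FROM products WHERE id = 17",
    "SELECT price FROM products WHERE id = 42",
    "SELECT stock FROM products WHERE id = 42",
    "SELECT price FROM products WHERE id = ?",
]

def shape(sql):
    """Replace numeric literals with '?' so only the structure remains."""
    return re.sub(r"\b\d+\b", "?", sql)

shapes = Counter(shape(s) for s in stmts)
for s, n in shapes.items():
    print(n, "x", s)
```

A shape appearing many times per request is a strong hint that the code is building SQL strings instead of binding parameters to one prepared statement, which is the root cause called out on the previous slide.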
54. # of Requests / User
# of Log Messages
# of Exceptions
# Objects Allocated
# Objects In Cache
Cache Hit Ratio
# of Images
# of SQLs
# SQLs per Request
Availability
# HTTP 3xx, 4xx
Page Size
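Several of the metrics in this list are ratios derived from raw counters. A minimal sketch of that derivation; the counter values are invented, not measurements from the case studies:

```python
# Hypothetical raw counters collected during one test run.
counters = {
    "requests": 50,
    "sql_statements": 600,
    "cache_hits": 180,
    "cache_misses": 20,
}

# Derived KPIs: SQLs per request and cache hit ratio.
sqls_per_request = counters["sql_statements"] / counters["requests"]
hit_ratio = counters["cache_hits"] / (counters["cache_hits"] + counters["cache_misses"])

print(f"{sqls_per_request:.0f} SQLs/request, cache hit ratio {hit_ratio:.0%}")
```

Absolute counts vary with load, but ratios like these stay comparable between test runs and builds, which is what makes them usable as regression signals.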
57. How about this idea? Combine Test Framework Results with Architectural Data:

Build #   Test Case      Status   # SQL   # Excep   CPU
17        testPurchase   OK       12      0         120ms
          testSearch     OK       3       1         68ms
18        testPurchase   FAILED   12      5         60ms
          testSearch     OK       3       1         68ms
19        testPurchase   OK       75      0         230ms
          testSearch     OK       3       1         68ms
20        testPurchase   OK       12      0         120ms
          testSearch     OK       3       1         68ms

Build 18: We identified a regression. Let’s look behind the scenes: the exceptions are probably the reason for the failed test.
Build 19: Problem fixed, but now we have an architectural regression (75 SQLs, 230ms CPU).
Build 20: Problem solved. Now we have both the functional and the architectural confidence.
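The idea on this slide can be automated: compare each test’s architectural metrics against the previous build and flag jumps even when the functional result is green. A minimal sketch, with the build data mirroring the testPurchase rows above and the 2x threshold being an assumption:

```python
# Per-build architectural data for one test, taken from the slide's table.
builds = {
    17: {"testPurchase": {"status": "OK",     "sql": 12, "exc": 0, "cpu_ms": 120}},
    18: {"testPurchase": {"status": "FAILED", "sql": 12, "exc": 5, "cpu_ms": 60}},
    19: {"testPurchase": {"status": "OK",     "sql": 75, "exc": 0, "cpu_ms": 230}},
    20: {"testPurchase": {"status": "OK",     "sql": 12, "exc": 0, "cpu_ms": 120}},
}

def regressions(builds, metric, factor=2.0):
    """Flag builds where `metric` grew by more than `factor` vs. the previous build."""
    flagged = []
    nums = sorted(builds)
    for prev, cur in zip(nums, nums[1:]):
        for test, data in builds[cur].items():
            before = builds[prev][test][metric]
            if before and data[metric] / before > factor:
                flagged.append((cur, test, before, data[metric]))
    return flagged

print(regressions(builds, "sql"))   # [(19, 'testPurchase', 12, 75)]
```

This catches exactly the Build 19 situation: the test suite reports ALL GREEN while the SQL count jumps from 12 to 75.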
58. How? Performance Focus in Test Automation
Analyze all Unit / Performance Tests: analyze performance metrics, identify regressions, and correlate the cross-impact of KPIs.
59. More Info
• My Blog: http://apmblog.compuware.com
• Tweet about it: @grabnerandi
• dynaTrace Enterprise
– Full End-to-End Visibility into your Java, .NET, and PHP Apps
– Sign up for a 15-Day Free Trial at http://compuwareapm.com
• dynaTrace AJAX Edition
– Browser Diagnostics for IE + FF
– Download @ http://ajax.dynatrace.com
When we look at the results of your testing framework from build to build we can easily spot functional regressions. In our example we see that testPurchase fails in Build 18. We notify the developer, the problem gets fixed, and with Build 19 we are back to functional correctness.

Looking behind the scenes: the problem is that functional testing only verifies the functionality to the caller of the tested function. Using dynaTrace we are able to analyze the internals of the tested code. We analyze metrics such as the number of executed SQL statements, number of exceptions thrown, time spent on CPU, memory consumption, number of remoting calls, transferred bytes, …

In Build 18 we see a clear correlation of exceptions to the failed functional test. We can assume that one of these exceptions caused the problem. For a developer it would be very helpful to get this exception information, which helps to quickly identify the root cause of the problem and fix it faster.

In Build 19 the testing framework indicates ALL GREEN. When we look behind the scenes we see a big jump in SQL statements as well as CPU usage. What just happened? The developer fixed the functional problem but introduced an architectural regression. This needs to be looked into; otherwise this change will have a negative impact on the application once it is tested under load.

In Build 20 all these problems are fixed. We are still meeting our functional goals and are back to an acceptable number of SQL statements, exceptions, and CPU usage.
Web Architectural Metrics: # of JS Files, # of CSS Files, # of Redirects, Size of Images