Heavenly hell – automated tests at scale wojciech seliga

Heavenly Hell
Automated Tests at Scale
WOJCIECH SELIGA • SENIOR DEV MANAGER • ATLASSIAN • @WSELIGA

About me
• Coding since 6 yo
• Agile Practices (inc. TDD) since 2003
• Dev Nerd, Tech Leader, Agile Coach, Speaker, PHB
• 7 years with Atlassian (JIRA Senior Dev Manager)
• Spartez Co-founder & CEO

XP Promise
[Chart: Cost of Change over Time; curves: Waterfall, XP]

Almost 10 years
of accumulating
legacy automatic tests

About 20 000 tests
on all levels of abstraction
*just in core JIRA

Very slow (even hours)
and fragile feedback loop

Serious performance and
reliability issues

Dispirited devs
accepting RED as a norm

Feedback Speed
Test Quality

Test Code is Not Trash
Design, Restructure, Share, Respect, Prune, Refactor, Maintain, Review, Discuss, Rewrite

Test Pyramid
Selenium: 1% (slowest, highest overall confidence)
REST / HTML Tests: 9%
Unit Tests (including QUnit): 90% (fastest, lowest overall confidence)

Optimum Balance
Isolation, Speed, Coverage, Level, Access, Effort

Dangerous to tamper with
Quality / Determinism, Maintainability

People - Motivation
Making GREEN the norm

Build Tiers and Policy
Tier A1 - green soon after all commits
unit tests and functional* tests
Tier A2 - green at the end of the day
WebDriver and bundled plugins tests
Tier A3 - green at the end of the iteration
supported platforms tests, compatibility tests
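
A tier policy like the one above could be encoded so that tooling knows how quickly a red build must be fixed. This is only a minimal sketch: the enum, the method name and the category labels are hypothetical and do not come from the talk.

```java
import java.util.Map;

// Hypothetical encoding of the three build tiers named on the slide.
enum BuildTier {
    A1("green soon after all commits"),
    A2("green at the end of the day"),
    A3("green at the end of the iteration");

    final String policy;

    BuildTier(String policy) {
        this.policy = policy;
    }

    // Illustrative mapping from test category to tier (category labels are assumptions).
    private static final Map<String, BuildTier> BY_CATEGORY = Map.of(
            "unit", A1,
            "functional", A1,
            "webdriver", A2,
            "bundled-plugins", A2,
            "supported-platforms", A3,
            "compatibility", A3);

    static BuildTier forCategory(String category) {
        BuildTier tier = BY_CATEGORY.get(category);
        if (tier == null) {
            throw new IllegalArgumentException("Unknown test category: " + category);
        }
        return tier;
    }
}
```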

Wallboards:
Constant
Awareness

Training
• Favouring assertThat over assertTrue/False and assertEquals
• Avoiding races - Atlassian Selenium with its TimedElement
• Favouring unit tests over functional tests (including QUnit over WebDriver)
• Promoting Page Objects
• Brownbags, blog posts, code reviews

Re-run failed tests and see if they pass
Automatic Flakiness Detection
Quarantine
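
The re-run idea can be sketched in a few lines. The talk does not show how the detection was implemented, so the class below is a hypothetical illustration: a test that fails and then passes on a re-run is reported as flaky (a quarantine candidate), while one that never passes is a genuine failure.

```java
import java.util.function.BooleanSupplier;

// Hypothetical classifier: re-run a failed test and see whether it passes.
final class FlakinessDetector {
    enum Verdict { PASSED, FLAKY, FAILED }

    // Runs the test once; on failure re-runs it up to maxReruns times.
    // The supplier returns true when the test passes.
    static Verdict classify(BooleanSupplier test, int maxReruns) {
        if (test.getAsBoolean()) {
            return Verdict.PASSED;
        }
        for (int i = 0; i < maxReruns; i++) {
            if (test.getAsBoolean()) {
                return Verdict.FLAKY; // failed first, passed later
            }
        }
        return Verdict.FAILED; // never passed
    }
}
```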

Selenium ditching
Sky did not fall in

Ditching - benefits
• Freed build agents - better system throughput
• Boosted morale
• Gazillion of developer hours saved
• Money saved on infrastructure

Ditching - due diligence
• conducting the audit - analysis of the coverage we lost
• determining which tests need to be rewritten (e.g. security related)
• rewriting the tests (good job for new hires + a senior mentor)

Flaky Browser-based Tests
Races between test code and asynchronous page logic
Playing with "loading" CSS class does not really help

Races Removal with Tracing
// In the browser:
function mySearchClickHandler() {
    doSomeXhr().always(function() {
        // This executes when the XHR has completed (either success or failure)
        JIRA.trace("search.completed");
    });
}
// In production code JIRA.trace is a no-op
// In my page object:
@Inject
TraceContext traceContext;
public SearchResults doASearch() {
    Tracer snapshot = traceContext.checkpoint();
    getSearchButton().click(); // causes mySearchClickHandler to be invoked
    // This waits until the "search.completed"
    // event has been emitted, *after* previous snapshot
    traceContext.waitFor(snapshot, "search.completed");
    return pageBinder.bind(SearchResults.class);
}
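
To make the checkpoint/waitFor idea concrete, here is a simplified stand-in written from scratch. It is not the Atlassian TraceContext API; the class name, the integer checkpoint and the timeout parameter are assumptions of this sketch. The point is that the wait only accepts events recorded after the snapshot, so an earlier "search.completed" cannot satisfy it by accident.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified trace context: NOT the Atlassian API.
final class SimpleTraceContext {
    private final List<String> events = new ArrayList<>();

    // Called when an asynchronous action in the application has finished.
    synchronized void trace(String key) {
        events.add(key);
        notifyAll();
    }

    // Remembers how many events have been recorded so far.
    synchronized int checkpoint() {
        return events.size();
    }

    // Blocks until `key` is traced after the checkpoint, or the timeout expires.
    synchronized boolean waitFor(int checkpoint, String key, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (!events.subList(checkpoint, events.size()).contains(key)) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```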

Parallel Execution - Theory
[Diagram labels: Start of Build, Batches, End of Build]

Parallel Execution - Reality Bites
[Diagram labels: Start of Build, Batches, Agent availability, End of Build]

Dynamic Test Execution
Dispatch - Hallelujah
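
The slides do not describe how dispatch was implemented. The simulation below is a hypothetical sketch of why it can help: with fixed batches the build ends when the slowest batch ends, whereas with dynamic dispatch whichever agent becomes free first takes the next test. Durations are in arbitrary time units, and all names are illustrative.

```java
import java.util.PriorityQueue;

// Hypothetical simulation comparing fixed batches with dynamic dispatch.
final class DispatchSimulation {
    // Static batching: tests are split up front into contiguous batches, one per agent.
    static long staticBatches(long[] durations, int agents) {
        int batchSize = (durations.length + agents - 1) / agents;
        long slowest = 0;
        for (int start = 0; start < durations.length; start += batchSize) {
            long sum = 0;
            int end = Math.min(start + batchSize, durations.length);
            for (int i = start; i < end; i++) {
                sum += durations[i];
            }
            slowest = Math.max(slowest, sum);
        }
        return slowest;
    }

    // Dynamic dispatch: the agent that becomes free first takes the next test.
    static long dynamicDispatch(long[] durations, int agents) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < agents; i++) {
            freeAt.add(0L);
        }
        long buildEnd = 0;
        for (long d : durations) {
            long finish = freeAt.poll() + d;
            freeAt.add(finish);
            buildEnd = Math.max(buildEnd, finish);
        }
        return buildEnd;
    }
}
```

With two agents and test durations 10, 10, 1, 1, 1, 1, the fixed split puts both slow tests in the first batch, while dynamic dispatch spreads them across the agents.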

"You can't manage what
you can't measure."
not by W. Edwards Deming
If you believe in that alone,
you are doomed.

You can't improve the system
if you can't measure it
Profiler, Build statistics, Logs, statsd → Graphite

Anatomy of Build*
Agent Availability/Setup
SCM Update
Fetching Dependencies
Compilation
Packaging
Executing Tests
Publishing Results
*Any resemblance to maven build is entirely accidental

JIRA Unit Tests Build
Agent Availability/Setup (mean 10min)
SCM Update (2min)
Fetching Dependencies (1.5min)
Compilation (7min)
Packaging (0min)
Executing Tests (7min)
Publishing Results (1min)

Decreasing test
execution time to
ZERO
alone would not let
us achieve our goal!
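
The arithmetic behind this claim can be checked with the phase timings from the JIRA Unit Tests Build slides. The sketch assumes the phases run one after another, which the slides imply but do not state; the class and constant names are illustrative.

```java
// Phase durations in minutes, taken from the "JIRA Unit Tests Build" slides.
final class BuildAnatomy {
    static final double AGENT_SETUP = 10.0; // mean
    static final double SCM_UPDATE = 2.0;
    static final double FETCH_DEPENDENCIES = 1.5;
    static final double COMPILATION = 7.0;
    static final double PACKAGING = 0.0;
    static final double EXECUTING_TESTS = 7.0;
    static final double PUBLISHING_RESULTS = 1.0;

    // Total build time for a given test execution time (assumes sequential phases).
    static double total(double testMinutes) {
        return AGENT_SETUP + SCM_UPDATE + FETCH_DEPENDENCIES + COMPILATION
                + PACKAGING + testMinutes + PUBLISHING_RESULTS;
    }
}
```

Under that assumption the build takes 28.5 minutes, and with test execution cut to zero it still takes 21.5 minutes.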

• Starved builds due to busy agents building very long builds
• Time synchronization issue - NTPD problem

SCM Update - Checkout time
• Proximity of SCM repo
• Shallow git clones are not so fast and lightweight, and they generate extra git server CPU load
• git clone per agent/plan + git pull + git clone per build (hard links!)
• Much less load on Stash server (no need to queue up)
2 min → 5 seconds
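
The clone strategy above might translate into commands like the ones below. This is a hedged sketch: the helper class and the directory parameters are hypothetical, and the exact commands used on the build agents are not given in the talk. It relies on the fact that cloning from a local path lets git hard-link object files instead of copying them.

```java
import java.util.List;

// Hypothetical helper that lists git commands for the cached-clone strategy.
final class CheckoutPlan {
    // Once per agent/plan: a full clone into a local cache directory.
    static List<String> primeCache(String remoteUrl, String cacheDir) {
        return List.of("git", "clone", remoteUrl, cacheDir);
    }

    // Every build: refresh the cache, then clone it locally into the build directory.
    // A local clone can hard-link objects, so it is fast and does not hit the server.
    static List<List<String>> perBuild(String cacheDir, String buildDir) {
        return List.of(
                List.of("git", "-C", cacheDir, "pull"),
                List.of("git", "clone", "--local", cacheDir, buildDir));
    }
}
```

Each command list can be handed to `new ProcessBuilder(command)` to run it.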

• Fix Predator
• Sandboxing/isolation agent trade-off:
rm -rf $HOME/.m2/repository/com/atlassian/*
into
find $HOME/.m2/repository/com/atlassian/ -name "*SNAPSHOT*" | xargs rm
• Network hardware failure found (dropping packets)
1.5 min → 10 seconds

Compilation
• Restructuring multi-pom maven project and dependencies
• Maven 3 parallel compilation FTW!
-T 1.5C
*optimal factor thanks to scientific trial and error research
7 min → 1 min
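
In Maven 3, `-T 1.5C` asks for 1.5 build threads per available CPU core, while a plain number such as `-T 4` asks for a fixed thread count. The sketch below shows how such an argument can be interpreted. It is not Maven's own code: the class name is hypothetical, and the truncation and the minimum of one thread are assumptions.

```java
// Sketch of interpreting a thread-count argument such as "1.5C" or "4".
final class ThreadCountArg {
    static int threads(String arg, int cores) {
        if (arg.endsWith("C")) {
            double perCore = Double.parseDouble(arg.substring(0, arg.length() - 1));
            // Truncation and the floor of one thread are assumptions of this sketch.
            return Math.max(1, (int) (perCore * cores));
        }
        return Integer.parseInt(arg);
    }
}
```

On an 8-core agent, `threads("1.5C", 8)` gives 12 threads.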

Unit Test Execution
• Splitting unit tests into 2 buckets: good and legacy (much longer)
• Maven 3 parallel test execution (-T 1.5C)
3000 poor tests (5min)
11000 good tests (1.5min)
Rewritten entirely over next year
7 min → 5 min

Functional Tests
• Selenium 1 removal did help
• Faster reset/restore (avoid unnecessary stuff, intercepting SQL operations for debug purposes - building stacktraces is costly)
• Restoring via Backdoor REST API (JIRA TestKit)
• Using REST API for common setup/teardown operations

Publishing Results
• Server log allocation per test → now using Backdoor REST API (was Selenium)
• Bamboo DB performance degradation for rich build history
1 min → 40 s

Unexpected Problem
• Stability Issues with our CI server (hardware)
• The bottleneck changed from I/O to CPU
• Too many agents per physical machine

JIRA Unit Tests Build Improved
Agent Availability/Setup (3min)*
SCM Update (5sec)
Fetching Dependencies (10sec)
Compilation (1min)
Packaging (0min)
Publishing Results (40sec)

Improvements Summary
Tests             Before    After    Improvement %
Unit tests        29 min    17 min   41%
Functional tests  56 min    34 min   39%
WebDriver tests   39 min    21 min   46%
Overall           124 min   72 min   42%
* Additional ca. 5% improvement expected once new git clone strategy is consistently rolled out everywhere
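
The improvement column can be reproduced from the before and after times as (before - after) / before. A small sketch, with an illustrative class name and rounding to the nearest whole percent:

```java
// Recomputes the "Improvement %" column from the before/after build times.
final class Improvement {
    // Percentage reduction in build time, rounded to the nearest whole percent.
    static long percent(double beforeMinutes, double afterMinutes) {
        return Math.round((beforeMinutes - afterMinutes) / beforeMinutes * 100.0);
    }
}
```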

Better speed increases responsibility
Fewer commits (authors) per single build

But that's still bad
We want CI feedback loop in a few minutes maximum

Inevitable Split - Fears
• Organizational concerns - understanding, managing, integrating, releasing, coordinating
• Mindset change - if something worked for 10+ years, why change it?
• Trust - does this library still work?
• We damned ourselves with big buckets for all tests - where do they belong?

Splitting code base
• Step 0 - JIRA Importers Plugin (3.5 years ago)
• Step 1 - New Issue View and Navigator
• Step 2 - now everything else follows (e.g. Workflow Designer)
JIRA 6.0

Getting back from hell to heaven is difficult.
Hell sucks in your soul.

Key takeaways:
• Visibility and problem awareness help
• Maintaining a huge testbed is difficult and costly
• Measure the problem - to baseline
• No prejudice - no sacred cows
• Automated tests are not a one-off investment; they are a continuous journey
• Performance is a damn important feature

Test performance
is a damn
important feature!

XP vs Sad Reality
[Chart: Cost of Change over Time; curves: Waterfall, XP - ideal, Sad Reality]

Images - Credits
• Green Traffic Light - by flrnt, CC-BY-SA-2.0
• Turtle - by Jonathan Zander, CC-BY-SA-3.0
• Loading - by MatthewJ13, CC-SA-3.0
• Merlin Tool - by L. Mahin, CC-BY-SA-3.0
• Flashing Red Light - by Chris Phan, CC BY 2.0
• In Heaven - by Daniel Pascoal, CC BY-NC-ND 2.0

Thank you!
