Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Advanced Java Testing @ POSS 2019
1. New types of tests for
Java projects
Vincent Massol, CTO XWiki SAS, December 2019
2. Agenda
•Context & Current status quo
•Coverage testing
•Mutation testing
•Environment testing
•Crash reproduction
3. Context: XWiki
• Open source wiki
• 15 years
• 10-15 active committers
• Very extensible, scripting in wiki pages
• Platform for developing ad-hoc web applications
• Strong build practices using Maven and lots of “Quality” plugins
• Using Jenkins & custom pipeline library for the CI
https://xwiki.org
4. Context: STAMP
• Automatic Test Amplification
• XWiki SAS participating, 3 years
• Experiments on XWiki project
Mutation testing Environment Testing Production Testing
5. Current Testing Status
• 10986 automated tests (in 2.5 hours):
• Unit tests (using Mockito)
• Integration tests (using Mockito)
• Functional (UI) tests (using Selenium/Webdriver)
6. New questions
• Are my tests testing enough? Coverage
• How good are my tests? Mutation testing
• Do my software work in various setups?
Environment testing
• How can I reproduce bugs found in production?
Production testing
= in place w/ strategy = in progress
7. Test Coverage - Local
• Using Jacoco and Clover
• Strategy - “Ratchet effect”:
• Each Maven module has a threshold
• Jacoco Maven plugin fails if new code
has less coverage than before in %
• Dev is allowed to increase threshold
Of course TPC is not panacea. You
could have 100% and app not
working. Also need functional tests.
Aim for 80%.
8. Test Coverage - Global
• Issue: Local coverage can increase and
global decrease
• Removed code with high TPC
• Code tested indirectly by functional
tests and code refactoring led to
different paths used
• New module with lower TPC than
average
Global TPC evolution
9. Test Coverage - Global
• Strategy:
• Global Clover TPC computed automatically every night on
Jenkins for all repos combined, using a pipeline
• Email sent to developers with report in email (see next slide)
• Developers fix module they have been working on
• Release Manager (RM) ensures that report passes before
release & we add one step in our Release Plan check list.
Source: http://massol.myxwiki.org/xwiki/bin/view/Blog/ComparingCloverReports
11. Mutation Testing
• Using PIT/Gregor, PIT/Descartes
• Concepts of PIT
• Modify code under test (mutants) and run tests
• Good tests kill mutants
• Generates a mutation score similar to the coverage %
• Descartes = extreme mutations that execute fast and have high
values
https://massol.myxwiki.org/xwiki/bin/view/Blog/MutationTestingDescartes
15. Mutation - Example
@Test
public void testEquality()
{
MacroId id1 = new MacroId("id", Syntax.XWIKI_2_0);
MacroId id2 = new MacroId("id", Syntax.XWIKI_2_0);
MacroId id3 = new MacroId("otherid", Syntax.XWIKI_2_0);
MacroId id4 = new MacroId("id", Syntax.XHTML_1_0);
MacroId id5 = new MacroId("otherid", Syntax.XHTML_1_0);
MacroId id6 = new MacroId("id");
MacroId id7 = new MacroId("id");
Assert.assertEquals(id2, id1);
// Equal objects must have equal hashcode
Assert.assertTrue(id1.hashCode() == id2.hashCode());
Assert.assertFalse(id3 == id1);
Assert.assertFalse(id4 == id1);
Assert.assertFalse(id5 == id3);
Assert.assertFalse(id6 == id1);
Assert.assertEquals(id7, id6);
// Equal objects must have equal hashcode
Assert.assertTrue(id6.hashCode() == id7.hashCode());
}
Not testing
for inequality!
Improved thanks to Descartes!
16. Mutation - Limitations
• Takes time to find interesting things to look at and decide if that’s an issue
to handle or not. Need better categorisation in report (now reported by
Descartes):
• Strong pseudo-tested methods:The worst! No matter what the return
values are the tests always fail
• Pseudo-tested methods: Grey area.The tests pass with at least one
modified value.
• Multi module support - PITmp
• But slow on large projects (e.g. 7+ hours just for xwiki-rendering)
17. Mutation - Strategy
• Seems to be working ok so far (6+ months of feedback now)
• But still young and not enough data about evolution
• Fail the build when the mutation score of a given module is below
a defined threshold in the pom.xml
• The idea is that new tests should, in average, be of quality equal or
better than past tests.
• Other idea: hook on CI to run it only on modified code/tests.
General goal with coverage + mutation: maintain quality
18. Mutation: Going further
• Using DSpot
• Uses PIT/Descartes but injects
results to generate new tests
• Adds assertions to existing tests
• Generate new test methods
• Selector can be PIT/Gregor, PIT/
Descartes, Jacoco (instruction
coverage), Clover (Branch
coverage)
https://massol.myxwiki.org/xwiki/bin/view/Blog/TestGenerationDspot
19. Mutation: Dspot Example 1
public void escapeAttributeValue2() {
String escapedText = XMLUtils.escapeAttributeValue("a < a' && a' < a" => a < a" {");
// AssertGenerator add assertion
Assert.assertEquals("a < a' && a' < a" => a < a" {", escapedText);
// AssertGenerator create local variable with return value of invocation
boolean o_escapeAttributeValue__3 = escapedText.contains("<");
// AssertGenerator add assertion
Assert.assertFalse(o_escapeAttributeValue__3);
// AssertGenerator create local variable with return value of invocation
boolean o_escapeAttributeValue__4 = escapedText.contains(">");
// AssertGenerator add assertion
Assert.assertFalse(o_escapeAttributeValue__4);
// AssertGenerator create local variable with return value of invocation
boolean o_escapeAttributeValue__5 = escapedText.contains("'");
// AssertGenerator add assertion
Assert.assertFalse(o_escapeAttributeValue__5);
// AssertGenerator create local variable with return value of invocation
boolean o_escapeAttributeValue__6 = escapedText.contains(""");
// AssertGenerator create local variable with return value of invocation
boolean o_escapeAttributeValue__7 = escapedText.contains("&&");
// AssertGenerator add assertion
Assert.assertFalse(o_escapeAttributeValue__7);
// AssertGenerator create local variable with return value of invocation
boolean o_escapeAttributeValue__8 = escapedText.contains("{");
// AssertGenerator add assertion
Assert.assertFalse(o_escapeAttributeValue__8);
}
Generated test
New test
@Test
public void escapeAttributeValue()
{
String escapedText = XMLUtils.escapeAttributeValue("a < a' && a' < a" => a < a" {");
assertFalse("Failed to escape <", escapedText.contains("<"));
assertFalse("Failed to escape >", escapedText.contains(">"));
assertFalse("Failed to escape '", escapedText.contains("'"));
assertFalse("Failed to escape "", escapedText.contains("""));
assertFalse("Failed to escape &", escapedText.contains("&&"));
assertFalse("Failed to escape {", escapedText.contains("{"));
}
Original test
20. Mutation: Dspot Example 2
Generated test
Original test
Also increase coverage
Before: 70.5%
After: 71.2%
21. Mutation: Dspot Strategy
• DSpot is very slow to execute (between 3 to 20mn on
small modules)
• One strategy is to run it on CI and in the pipeline commit
generated tests in a different source root.
• And run it only on Tests affected by commit changeset
• Configure Maven to add a new test directory source using
the Maven Build Helper plugin.
• Work in progress: small coverage and mutation score
improvements on XWiki so far.
22. Environment Testing
• Environment = combination of Servlet
container & version, DB & version, OS,
Browser & version
• Future: cluster mode, LibreOffice
integration, external SOLR, etc
• Need: Be able to run/debug functional
tests on local dev machines as well as on
CI
• Using Docker / TestContainers
25. Environment Testing
• Feedback: takes about 3 minutes to deploy all (and 1 minute for the
test)
• Strategy
• Run on CI (Jenkins)
• 3 jobs
• “latest”: latest versions of all elements (DB, Servlet Container,
Browser, etc). Once per day
• “all”: all supported versions. Once per week
• “unsupported”: what we want to support in the future. Once
per month.
• Future: IE/Edge + Docker in Docker
• Some instability with Docker and DinD/DooD.
26. Crash Reproduction
• Tool: Botsing
• Concept:Take a stack trace
and generates a test that,
when executed, leads to this
stack trace
• i.e. find the conditions that
leads to the problem
28. Botsing - Feedback
• Can take a long time to reproduce, doesn’t always succeed
• Generates a test that reproduces the problem, not the fix!
• Often you’d write a test at a different level (usually up in the call
chain, to be more meaningful to the use case)
• Is useful for newcomers who don’t know the codebase well as it
helps pinpoint the problem. Acts as a timesaver.
29. Parting words
• Experiment, push the limit!
• Some other types of tests not covered and that also need
automation
• Backward compatibility testing
• Performance/Stress testing
• Usability testing
• others?