This document discusses using machine learning to improve testing and quality assurance processes. It describes collecting historical data from various sources like requirements, development, testing, and operations. This data is then used to train predictive models using machine learning platforms like IBM Watson and SPSS Modeler. The trained models can predict metrics like defects, test cases needed, and component risks to help schedule testing and prioritize resources. The presenter shares their experience piloting this approach and lessons learned around data needs, model training, and demonstrating value to convince stakeholders.
2. About this talk
► To share our motivation and experience in adopting Machine Learning for
Predictive Analytics supporting Test&QA, including advice.
► To inspire others to experiment along the same lines, improving test
effectiveness and thereby saving costs.
► To establish a community for exchanging ideas and practical experience
with models and technology adoption.
3. About the speaker
► Managing Consultant / Test Manager @ Sogeti Norway.
► 21 years of working experience in Test & QA.
• Consultant / Advisor
• Line Manager
• Board member in DND Software Testing
► Areas of interest:
• Test & QA, Methodology, Measurement, Automation, Machine Learning, …
► 1992 – MSc. Computer Science, NTNU
► 1997 – PhD. "Software Process Improvement by Empirical Data", NTNU
4. Motivation
► Machine Learning (ML) has been widely adopted in business, but it is not
yet fully exploited in the software development lifecycle (SDLC), especially in Test&QA.
► A huge amount of data is collected and available in the SDLC.
► Easy-to-use ML technology is now available, requiring no extensive skills
in mathematics or statistics.
Is it possible to adopt ML to improve the test process for a real
case, based on present knowledge and past data?
6. Business Use Case for Client X
► Known inputs:
• Release: duration, number of changes, components, …
► Expected outcome:
• Predicted answers to the use case below, along with a degree of confidence.
As Test and Release Manager, I want to know for a given Release:
• How many Defects can be expected? (Severity, Avg. FixedTime)
• In which Components are they expected to be found?
• Which Test Cases need to be executed?
So that I can schedule the test phase, prioritize my test cases and allocate
resources to achieve optimal test performance.
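The defect-count question above can be sketched as a simple supervised regression. Note: the talk itself used IBM Watson and SPSS Modeler, not scikit-learn; the feature set, the synthetic data, and the linear relation below are all invented purely for illustration.

```python
# Illustrative sketch only: the talk used IBM Watson / SPSS Modeler.
# All features, data, and relations here are invented for demonstration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic release history: duration (days), number of changes, components touched.
n = 200
X = np.column_stack([
    rng.integers(10, 90, n),    # release duration in days
    rng.integers(5, 300, n),    # number of changes
    rng.integers(1, 20, n),     # components touched
])
# Fabricated assumption: defects grow roughly with change volume, plus noise.
y = 0.05 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 2, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Predict the expected defect count for a hypothetical upcoming release.
upcoming = np.array([[30, 120, 8]])  # 30 days, 120 changes, 8 components
predicted_defects = model.predict(upcoming)[0]
print(f"Expected defects: {predicted_defects:.1f}")
print(f"R^2 on held-out data: {model.score(X_test, y_test):.2f}")
```

The held-out R² plays the role of the "degree of confidence" the use case asks for: a low score is a signal that the available data does not yet support reliable predictions.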
8. Data sources for our case
[Diagram] Data-source categories (Requirements, Development, Project/Release, Test, Operation, Defect, Statistics) mapped to the tools they live in: Confluence, Jira, Zephyr, Remedy, Splunk. Each variable in the diagram is marked as known, unknown, or N/A.
10. Data snapshot
► Projects: 4; Releases: 33
► REQUIREMENT: Projects 265, Maintenance 123
► DEFECT: Projects 1053, Maintenance 345, Found in Prod 40
► TEST: Test cases 265, Regression TC 378, Executed tests 1759 / 302,
Failed tests 106 / 32, Unexecuted tests 431 / 550
► OPERATION: Recorded Incidents 68, System-initiated Incidents
► Plus unstructured text documents.
11. Ongoing training phase
[Diagram] Historical data (structured and unstructured) is split into a training data set and a test data set. Supervised and unsupervised training produces the ML predictive models, with a Tester in the loop.
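The unsupervised side of the training phase can be sketched in a few lines. The talk does not name specific algorithms, so as an assumption this uses k-means clustering on an invented per-component defect profile to surface high-risk groups of components.

```python
# Sketch of unsupervised training (assumption: the talk does not specify
# algorithms; k-means and the component data below are invented examples).
import numpy as np
from sklearn.cluster import KMeans

# Invented per-component features: [defect count, avg. fix time in days].
components = np.array([
    [40, 5.0], [38, 4.5], [35, 6.0],   # high-defect components
    [3, 1.0], [5, 1.5], [2, 0.8],      # low-defect components
])

# Cluster components into two groups; no labels are needed.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(components)
labels = kmeans.labels_
print(labels)
```

The resulting clusters separate high-risk from low-risk components, which is one way to answer "in which components are defects expected?" without labeled training data.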
12. ML platforms
1. IBM Watson services:
• Natural Language Understanding
• Machine Learning predictive models
2. IBM SPSS (Statistical Package for the Social Sciences):
• SPSS Modeler: drag-and-drop data exploration to machine
learning, without coding.
► Easy to use and a quick enabler.
► No need to be a skilled Data Scientist.
► Access to specialists in Capgemini/Sogeti.
https://www.ibm.com/watson/
https://www.ibm.com/products/spss-modeler
14. Lessons learned
► An exciting journey exploring PA/ML and its capability to improve Test&QA.
► A closer, hands-on look at ML technology.
Needs
► Challenging to convince business stakeholders.
► An iterative process to align the use case with data availability.
► Test&QA knowledge is required to identify and assess data quality.
► The Agile Manifesto value "Working software over comprehensive
documentation" vs. the data volume required for PA.
15. Lessons learned
Analysis
► Report generation in Jira and Remedy is useful.
► A mix of different languages in the data sets is a challenging issue.
► Missing or inconsistent essential data inputs in the samples lead to poorly
recognized correlations/patterns/trends, and thus a lower degree of confidence.
► Do not underestimate the value of information in unstructured text.
Training
► IBM Watson Cloud is rich in services, with access to skilled people.
► Still unsure how many data samples the training phase needs
to fully empower ML to discover the "Unknowns".
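The lesson about unstructured text can be illustrated with a minimal sketch: turning free-text defect descriptions into numeric TF-IDF features so similar reports can be related. The defect texts and the use of scikit-learn below are invented for illustration; in the talk this kind of analysis was done with IBM Watson Natural Language Understanding.

```python
# Illustrative sketch (not the talk's tooling): vectorizing unstructured
# defect descriptions with TF-IDF. The defect texts below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

defects = [
    "Login page crashes when password field is empty",
    "Crash on login screen with blank password",
    "Report export to PDF produces corrupted file",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(defects)

# Pairwise similarity: the two login-crash reports should score highest.
sim = cosine_similarity(tfidf)
print(f"similarity(defect 0, defect 1) = {sim[0, 1]:.2f}")
print(f"similarity(defect 0, defect 2) = {sim[0, 2]:.2f}")
```

Such features let duplicate or related defects be grouped automatically, which is one way the information locked in free text can feed the predictive models.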
16. Take-aways
► Just jump into it: trial and error.
• Identify prediction needs in your Test&QA activities.
• Analyze and assess the data available in your own organization.
• Pick an "easy-to-start" ML technology.
• A few data samples are better than none.
• Demonstrate benefits with outcomes, even at low confidence.
► Improve along the learning path:
• Change processes to collect relevant data; enrich collected data to improve
predictive capability.
• Evaluate the use of other data sources or ML platforms.
► Keep searching for external knowledge and experience.
17. Thank you for listening
Minh Nguyen
+47 982 28 460
minh.nguyen@sogeti.no
www.linkedin.com/in/minhng67/