European initiatives where academia and industry get together

Tanja E.J. Vos
Centro de Investigación en Métodos de Producción de Software
Universidad Politecnica de Valencia
Valencia, Spain
tvos@pros.upv.es

The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Overview

•  EU projects (with SBST) that I have been coordinating:
   –  EvoTest (2006-2009)
   –  FITTEST (2010-2013)
•  What it means to coordinate them and how they are structured
•  How we evaluate their results through academia-industry projects
EvoTest

•  Evolutionary Testing for Complex Systems
•  September 2006 – September 2009
•  Total costs: 4,300,000 euros
•  Partners:
   –  Universidad Politecnica de Valencia (Spain)
   –  University College London (United Kingdom)
   –  DaimlerChrysler (Germany)
   –  Berner & Mattner (Germany)
   –  Fraunhofer FIRST (Germany)
   –  Motorola (UK)
   –  Rila Solutions (Bulgaria)
•  Website does not work anymore
EvoTest objectives/results

•  Apply Evolutionary Search Based Testing techniques to solve testing problems from a wide spectrum of complex real-world systems in an industrial context (a toy sketch follows below).
•  Improve the power of evolutionary algorithms for searching important test scenarios, hybridising with other techniques:
   –  other general-purpose search techniques,
   –  other advanced software engineering techniques, such as slicing and program transformation.
•  An extensible and open Automated Evolutionary Testing Architecture and Framework will be developed. This will provide general components and interfaces to facilitate the automatic generation, execution, monitoring and evaluation of effective test scenarios.
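To make "evolutionary testing" concrete: the sketch below evolves test inputs towards covering a hard-to-reach branch, guided by a branch-distance fitness function. It is an illustrative toy, not EvoTest's actual framework; the SUT branch predicate and all parameters are invented.

```python
import random

def branch_distance(x: int, y: int) -> float:
    """Distance to covering the hypothetical branch `if x == 2 * y:` (0 = covered)."""
    return abs(x - 2 * y)

def evolve(pop_size: int = 50, generations: int = 200):
    # random initial population of (x, y) test inputs
    population = [(random.randint(-1000, 1000), random.randint(-1000, 1000))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: branch_distance(*ind))
        if branch_distance(*population[0]) == 0:
            break  # target branch covered
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            (x1, _), (_, y2) = random.sample(parents, 2)   # crossover
            children.append((x1 + random.randint(-3, 3),   # mutation
                             y2 + random.randint(-3, 3)))
        population = parents + children
    return min(population, key=lambda ind: branch_distance(*ind))

best = evolve()
print(best, branch_distance(*best))
```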
FITTEST

•  Future Internet Testing
•  September 2010 – December 2013
•  Total costs: 5,845,000 euros
•  Partners:
   –  Universidad Politecnica de Valencia (Spain)
   –  University College London (United Kingdom)
   –  Berner & Mattner (Germany)
   –  IBM (Israel)
   –  Fondazione Bruno Kessler (Italy)
   –  Universiteit Utrecht (The Netherlands)
   –  Softeam (France)
•  http://www.pros.upv.es/fittest/
FITTEST objectives/results

•  Future Internet Applications
   –  Characterized by an extremely high level of dynamism
   –  Adaptation to usage context (context awareness)
   –  Dynamic discovery and composition of services
   –  Etc.
•  Testing of these applications gets extremely important
   –  Society depends more and more on them
   –  Critical activities such as social services, learning, finance, business
•  Traditional testing is not enough
   –  Testwares are fixed
•  Continuous testing is needed
   –  Testwares that automatically adapt to the dynamic behavior of the Future Internet application
   –  This is the objective of FITTEST
How does it work?

[Process diagram, repeated on the next three slides: LOGGING -> TEST-WARE GENERATION -> TEST EXECUTION -> TEST EVALUATION, plus OPTIONAL MANUAL EXTENSIONS. Components: INSTRUMENT, RUN SUT, COLLECT & PREPROCESS LOGS, ANALYSE & INFER MODELS, ANALYSE & INFER ORACLES, GENERATE TEST CASES, AUTOMATE TEST CASES, EXECUTE TEST CASES, EVALUATE TEST CASES -> TEST RESULTS. Inputs: domain experts, end-users, Domain Input Specifications. Oracle types: model-based, log-based, human, properties-based, pattern-based, and free oracles.]

1.  Run the target System Under Test (SUT)
2.  Collect the logs it generates

This can be done by:
•  real usage by end-users of the application in the production environment
•  test case execution in the test environment
How does it work?

[Same process diagram]

1.  Analyse the logs
2.  Generate different testwares:
   •  Models
   •  Domain Input Specification
   •  Oracles
3.  Use these to generate and automate a test suite consisting of:
   •  Abstract test cases
   •  Concrete test cases
   •  Pass/Fail evaluation criteria
How does it work?

[Same process diagram]

Execute the test cases and start a new test cycle for continuous testing and adaptation of the testwares!
What does it mean to be an EU project coordinator

•  Understand what the project is about, what needs to be done and what is most important.
•  Do NOT be afraid to get your hands dirty.
•  Do NOT assume people are working as hard on the project as you are (or are as enthusiastic as you ;-)
•  Do NOT assume that the people that ARE responsible for some tasks TAKE this responsibility.
•  Get CC-ed in all emails and deal with it.
•  Stalk people (email, sms, whatsapp, skype, voicemail messages).
•  Be patient when explaining the same things over and over again.
EU project structures

•  We have: Work Packages (WP)
•  These are composed of: Tasks
•  These result in: Deliverables

[Diagram of a typical project structure:]
•  WP: Project Management
•  WP: Exploitation and Dissemination
•  WP: Do some research
•  WP: Do more research
•  WP: And some more
•  WP: And more……..
•  WP: Integrate it all together in a superduper solution that industry needs
•  WP: Evaluate Your Results through Case Studies
EU project: how to evaluate your results

•  You need to do studies that evaluate the resulting testing tools/techniques within a real industrial environment
WE NEED MORE THAN …

Triangle Problem (Glenford Myers, 1979!): test a program which can return the type of a triangle based on the widths of the 3 sides (sketched below).

[Figure: a triangle with sides a, b, c and angles α, β, γ]

For evaluation of testing tools we need more!
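For reference, here is a minimal sketch of the triangle program itself; the slide only names the problem, so this version is one common formulation of it, useful as a toy testing exercise but far removed from the industrial systems these projects target.

```python
def triangle_type(a: float, b: float, c: float) -> str:
    """Classic triangle classification program (Myers, 1979)."""
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    # triangle inequality: each side must be shorter than the sum of the others
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

assert triangle_type(3, 3, 3) == "equilateral"
assert triangle_type(3, 4, 5) == "scalene"
assert triangle_type(1, 2, 3) == "invalid"  # degenerate triangle
```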
WE NEED….

Empirical studies with
•  Real people
•  Real faults
•  Real systems
•  Real Testing Processes
•  Real Testing Environments
What are empirical studies

•  [Wikipedia] Empirical research is a way of gaining knowledge by means of direct and indirect observation or experience.
•  [PPV00] An empirical study is really just a test that compares what we believe to what we observe. Such tests, when wisely constructed and executed, play a fundamental role in software engineering, helping us understand how and why things work.
•  Collecting data:
   –  Quantitative data -> numeric data
   –  Qualitative data -> observations, interviews, opinions, diaries, etc.
•  Different kinds are distinguished in the literature [WRH+00, RH09]:
   –  Controlled experiments
   –  Surveys
   –  Case Studies
   –  Action research
Controlled experiments

What
Experimental investigation of a hypothesis in a laboratory setting, in which conditions are set up to isolate the variables of interest ("independent variables") and test how they affect certain measurable outcomes (the "dependent variables").

Good for
–  Quantitative analysis of benefits of a testing tool or technique
–  We can use methods showing statistical significance
–  We can demonstrate how scientific we are! [EA06]

Disadvantages
–  Limited confidence that the laboratory set-up reflects the real situation
–  Ignores contextual factors (e.g. social/organizational/political factors)
–  Extremely time-consuming

See: [BSH86, CWH05, PPV00, Ple95]
Surveys

What
Collecting information to describe, compare or explain knowledge, attitudes and behaviour over large populations using interviews or questionnaires.

Good for
–  Quantitative and qualitative data
–  Investigating the nature of a large population
–  Testing theories where there is little control over the variables

Disadvantages
–  Difficulties of sampling and selection of participants
–  Collected information tends to be subjective opinion

See: [PK01-03]
Case Studies

What
A technique for detailed exploratory investigations that attempt to understand and explain phenomena or test theories within their context.

Good for
–  Quantitative and qualitative data
–  Investigating the capability of a tool within a specific real context
–  Gaining insights into chains of cause and effect
–  Testing theories in complex settings where there is little control over the variables (companies!)

Disadvantages
–  Hard to find good appropriate case studies
–  Hard to quantify findings
–  Hard to build generalizations (only context)

See: [RH09, EA06, Fly06]
Action Research

What
Research initiated to solve an immediate problem (or a reflective process of progressive problem solving) involving a process of actively participating in an organization's change situation whilst conducting research.

Good for
–  Quantitative and qualitative data
–  When the goal is solving a problem
–  When the goal of the study is change

Disadvantages
–  Hard to quantify findings
–  Hard to build generalizations (only context)

See: [RH09, EA06, Fly06, Rob02]
EU project: how to evaluate your results

•  You need to do empirical studies that evaluate the resulting testing tools/techniques within a real industrial environment
•  The empirical study that best fits our purposes is the Case Study
   –  Evaluate the capability of our testing techniques/tools
   –  In a real industrial context
   –  Comparing to current testing practice
Case studies: not only for justifying research projects

•  We need to apply our results in industry to understand the problems they have and whether we are going in the right direction to solve them.
•  There is a real need in the software engineering industry for general guidelines on what testing techniques and tools to use for different testing objectives, and how usable these techniques are.
•  To date, these guidelines do not exist.
•  What we need is a body of documented experiences and knowledge from which the needed guidelines can be extracted.
•  With these guidelines, testing practitioners could make informed decisions about which techniques to use and estimate the time/effort that is needed.
The challenge

To create a body of evidence consisting of evaluative studies of testing techniques and tools that can be used to understand the needs of industry and derive general guidelines about their usability and applicability in industry.

TO DO THIS WE HAVE TO:
•  Perform more evaluative empirical case studies in industrial environments
•  Carry out these studies by following the same methodology, to enhance the comparison among the testing techniques and tools
•  Involve realistic systems, environments and subjects (and not toy programs and students as is the case in most current work)
•  Do the studies thoroughly to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied, and also to ensure that different studies can be compared

FOR THIS WE NEED:
•  A general methodological evaluation framework that can simplify the design of case studies for comparing software testing techniques and make the results more precise, reliable, and easy to compare.
The idea

[Diagram: a General Framework for Evaluating Testing Techniques and Tools is instantiated into primary case studies. For each testing technique or tool (TT1, TT2, … TTi, … TTn) a series of case studies (CS1, CS2, … CSn) is carried out; together these build up a Body of Evidence, which feeds a secondary study (EBSE).]

Obtain answers to general questions about adopting different software testing techniques and/or tools.
Very brief… what is EBSE

Evidence Based Software Engineering
•  The essence of the evidence-based paradigm is that of systematically collecting and analysing all of the available empirical data about a given phenomenon in order to obtain a much wider and more complete perspective than would be obtained from an individual study, not least because each study takes place within a particular context and involves a specific set of participants.
•  The core tool of the evidence-based paradigm is the Systematic Literature Review (SLR)
   –  Secondary study
   –  Gathering and analysing primary studies
•  See: http://www.dur.ac.uk/ebse/about.php
The idea (recap)

[Same diagram as before: the general framework is instantiated into primary case studies that build a Body of Evidence for a secondary (EBSE) study, to obtain answers to general questions about adopting different software testing techniques and/or tools.]
Empirical research with industry = difficult

•  Not "rocket science"-difficult
Empirical research with industry = difficult

•  But "communication science"-difficult
"communication science"-difficult

[Cartoon: a researcher and a practitioner in a company talk past each other about how long things take — "Only takes 1 hour", "Not so long", "Not so long", "5 minutes".]
Communication is extremely difficult

[Image slide]
Examples…..

Academia:
•  Wants to empirically evaluate T
•  What techniques/tools can we compare with?
•  Why don't you know that?
•  That does not take so much time!
•  Finding real faults would be great!
•  Can we then inject faults?
•  How many people can use it?
•  Is there historical data?
•  But you do have that information?

Industry:
•  Wants to execute and use T to see what happens.
•  We use intuition!
•  You want me to know all that?
•  That much time!?
•  We cannot give this information.
•  Not artificial ones, we really need to know if this would work for real faults.
•  We can assign 1 person.
•  That is confidential.
•  Oh.., I thought you did not need that.
With the objective to improve and reduce some barriers

•  Use a general methodological framework
•  To use as a vehicle of communication
•  To simplify the design
•  To make sure that studies can be compared and aggregated
Existing work

•  By Lott & Rombach, Eldh, Basili, Do et al., Kitchenham et al.
•  Describe organizational frameworks, i.e.:
   –  General steps
   –  Warnings when designing
   –  Confounding factors that should be minimized
•  We intended to define a methodological framework that defines how to evaluate software testing techniques, i.e.:
   –  The research questions that can be posed
   –  The variables that can be measured
   –  Etc.
The methodological framework

•  Imagine a company C wants to evaluate a testing technique/tool T to see whether it is useful and worthwhile to incorporate in the company.
•  Components of the framework (each case study will be an instantiation):
   –  Objectives: effectiveness, efficiency, satisfaction
   –  Cases or treatments (= the testing techniques/tools)
   –  The subjects (= practitioners that will do the study)
   –  The objects or pilot projects: selection criteria
   –  The variables and metrics: which data to collect?
   –  Protocol that defines how to execute and collect data
   –  How to analyse the data
   –  Threats to validity
   –  Toolbox
Definition of the framework – research questions

•  RQ1: How does T contribute to the effectiveness of testing when it is being used in real testing environments of C and compared to the current practices of C?
•  RQ2: How does T contribute to the efficiency of testing when it is being used in real testing environments of C and compared to the current practices of C?
•  RQ3: How satisfied (subjective satisfaction) are testing practitioners of C during the learning, installing, configuring and usage of T when used in real testing environments?
Definition of the framework – the cases

•  Reused a taxonomy from Vegas and Basili, adapted to software testing tools and augmented with results from Tonella
•  The case (the testing technique or tool) should be described by:
   –  Prerequisites: type, life-cycle, environment (platform and languages), scalability, input, knowledge needed, experience needed
   –  Results: output, completeness, effectiveness, defect types, number of generated test cases
   –  Operation: interaction modes, user guidance, maturity, etc.
   –  Obtaining the tool: license, cost, support
Definition of the framework – subjects

•  Workers of C that normally use the techniques and tools with which T is being compared.
Definition of the framework – the objects/pilot projects

•  In order to see how we can compare and what type of study we will do, we need answers to the following questions:
   A.  Will we have access to a system with known faults? What information is present about these faults?
   B.  Are we allowed/able to inject faults into the system?
   C.  Does your company gather data from projects as standard practice? What data is this? Can this data be made available for comparison? Do you have a company baseline? Do we have access to a sister project?
   D.  Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
      •  Is company C willing to make a new test suite TSna with some technique/tool Ta already used in the company?
      •  Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
      •  Can we use an existing test suite TSe to compare with? (Do we know the techniques that were used to create that test suite, and how much time it took?)
 

Definition of the framework – Protocol

[Decision-tree flowchart. After training on T, T is applied to make a test suite TST while data is collected. The branch points are: Can we inject faults? Do we have a version with known faults? Do we have a company baseline? Can we compare with an existing test suite from company C (TSC)? Can company C make another test suite TSN for comparison, using Tknown or Tunknown (applying Tknown or Tunknown to make TSN and collecting data)? The answers lead to one of the seven scenarios (1–7) listed on the next slide.]
 

Definition of the framework – Scenarios

•  Remember:
   o  Does company C have enough time and resources to execute various rounds of tests? Or, more concretely:
      •  Is company C willing to make a new test suite TSna with some technique/tool Ta already used in the company C?
      •  Is company C willing to make a new test suite TSnn with some technique/tool Tn that is also new to company C?
      •  Can we use an existing test suite TSe to compare with? (Do we know the techniques that were used to create that test suite, and how much time it took?)
•  Scenario 1 (qualitative assessment only) (Qualitative Effects Analysis)
•  Scenario 2 (Scenario 1 / quantitative analysis based on company baseline)
•  Scenario 3 ((Scenario 1 / Scenario 2) / quantitative analysis of FDR)
•  Scenario 4 ((Scenario 1 / Scenario 2) / quantitative comparison of T and TSe)
•  Scenario 5 (Scenario 4 / FDR of T and TSe)
•  Scenario 6 ((Scenario 1 / Scenario 2) / quantitative comparison of T and (Ta or Tn))
•  Scenario 7 (Scenario 6 / FDR of T and (Ta or Tn))
If we can inject faults, take care that

1.  The artificially seeded faults are similar to real faults that naturally occur in real programs due to mistakes made by developers.
   –  To identify realistic fault types, a history-based approach can be used, i.e. "real" faults can be fetched from the bug tracking system, making sure that these reported faults are a good representative of faults that are introduced by developers during implementation.
2.  The faults should be injected in code that is covered by an adequate number of test cases (see the sketch after this list).
   –  e.g., they may be seeded in code that is executed by more than 20 percent and less than 80 percent of the test cases.
3.  The faults should be injected "fairly," i.e., an adequate number of instances of each fault type is seeded.
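A minimal sketch of how criterion 2 could be checked automatically. The 20/80 thresholds follow the slide; the coverage-map layout, names and data are illustrative assumptions.

```python
def injection_candidates(coverage: dict[str, set[str]], n_tests: int,
                         lo: float = 0.20, hi: float = 0.80) -> list[str]:
    """Keep locations executed by more than lo and less than hi of the suite."""
    return [loc for loc, tests in coverage.items()
            if lo * n_tests < len(tests) < hi * n_tests]

# hypothetical coverage data: code location -> ids of test cases executing it
coverage = {"parser.py:12": {"t1", "t2", "t3"},
            "parser.py:40": {"t1"},                        # too rarely covered
            "core.py:7": {"t1", "t2", "t3", "t4", "t5"}}   # covered by everything

print(injection_candidates(coverage, n_tests=5))  # ['parser.py:12']
```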
Definition of the framework – data to collect

•  Effectiveness
   –  Number of test cases designed or generated
   –  How many invalid test cases are generated
   –  How many repeated test cases are generated
   –  Number of failures observed
   –  Number of faults found
   –  Number of false positives (the test is marked as Failed when the functionality is working)
   –  Number of false negatives (the test is marked as Passed when the functionality is not working)
   –  Type and cause of the faults that were found
   –  Estimated (or, when possible, measured) coverage reached
•  Efficiency
•  Subjective satisfaction
Definition of the framework – data to collect

•  Effectiveness
•  Efficiency
   –  Time needed to learn the testing method
   –  Time needed to design or generate the test cases
   –  Time needed to set up the testing infrastructure (install, configure, develop test drivers, etc.) (quantitative). (Note: if software is to be developed, or other major configuration/installation efforts are needed, it might be a good idea to maintain working diaries.)
   –  Time needed to test the system and observe the failures (i.e. planning, implementation and execution) in hours (quantitative)
   –  Time needed to identify the fault type and cause for each observed failure (i.e. time to isolate) (quantitative)
•  Subjective satisfaction
Definition of the framework – data to collect

•  Effectiveness
•  Efficiency
•  Subjective satisfaction
   –  SUS score (10-question questionnaire with a 5-point Likert scale and a total score)
   –  5 reactions (through reaction cards) that will be used to create a word cloud and Venn diagrams
   –  Emotional face reactions during semi-structured interviews (faces will be evaluated on a 5-point Likert scale from "not at all like this" to "very much like this")
   –  Subjective opinions about the tool

A mixture of methods for evaluating subjective satisfaction.
System Usability Scale (SUS)

[Questionnaire: each item is rated from Strongly disagree (1) to Strongly agree (5)]

1.  I think that I would like to use this system frequently
2.  I found the system unnecessarily complex
3.  I thought the system was easy to use
4.  I think that I would need the support of a technical person to be able to use this system
5.  I found the various functions in this system were well integrated
6.  I thought there was too much inconsistency in this system
7.  I would imagine that most people would learn to use this system very quickly
8.  I found the system very cumbersome to use
9.  I felt very confident using the system
10. I needed to learn a lot of things before I could get going with this system

© Digital Equipment Corporation, 1986
www.usability.serco.com/trump/documents/Suschapt.doc
Why SUS

•  Studies (e.g. [TS04, BKM08]) have shown that this simple questionnaire gives the most reliable results.
•  SUS is technology agnostic, making it flexible enough to assess a wide range of interface technologies.
•  The survey is relatively quick and easy to use by both study participants and administrators.
•  The survey provides a single score on a scale that is easily understood by the wide range of people (from project managers to computer programmers) who are typically involved in the development of products and services and who may have little or no experience in human factors and usability.
•  The survey is nonproprietary, making it a cost-effective tool as well.
SUS Scoring

•  SUS yields a single number representing a composite measure of the overall usability of the system being studied. Note that scores for individual items are not meaningful on their own.
•  To calculate the SUS score, first sum the score contributions from each item.
   –  Each item's score contribution will range from 0 to 4.
   –  For items 1, 3, 5, 7 and 9 the score contribution is the scale position minus 1.
   –  For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position.
   –  Multiply the sum of the scores by 2.5 to obtain the overall SUS value (see the sketch below).
•  SUS scores have a range of 0 to 100.
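The scoring rule above translates directly into code; this small sketch follows the slide's definition exactly.

```python
def sus_score(responses: list[int]) -> float:
    """Responses are the ten scale positions (1-5), in questionnaire order."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,7,9 vs 2,4,6,8,10
                     for i, r in enumerate(responses)]
    return sum(contributions) * 2.5  # scale the 0-40 sum to 0-100

print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0
```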
SUS is not enough…

•  In a literature review of 180 published usability studies, Hornbaek [Horn06] concludes that measures of satisfaction should be extended beyond questionnaires.
•  So we add more….
Reaction Cards

[Image: the product reaction cards]
Developed by and © 2002 Microsoft Corporation. All rights reserved.
Emotional face reactions

•  The idea is to elicit feedback about the product, particularly emotions that arose for the participants while talking about the product (e.g. frustration, happiness).
•  We will videotape the users when they respond to the following two questions during a semi-structured interview:
   –  Would you recommend this tool to other colleagues?
      •  If not, why?
      •  If yes, what arguments would you use?
   –  Do you think you can persuade your management to invest in a tool like this?
      •  If not, why?
      •  If yes, what arguments would you use?

[Rating scale: faces are rated from 1 ("Not at all like this") to 7 ("Very much like this")]
Analysing and interpreting the data

•  Depends on the amount of data we have.
•  If we only have 1 value for each variable, no analysis techniques are available and we just present and interpret the data.
•  If we have sets of values for a variable, then we need to use statistical methods:
   –  Descriptive statistics
   –  Statistical tests (or significance testing). In statistics a result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.
•  Evaluating SBST tools, we will always have sets of values for most of the variables, to deal with randomness.
Descriptive statistics

•  Mean, median, mode, standard deviation, frequency, correlation, etc.
•  Graphical visualisation: scatter plot, box plot, histogram, pie charts

[Example plots from http://science.leidenuniv.nl/index.php/ibl/pep/people/Tom_de_Jong/Teaching]
Statistical test

Most important is finding the right method (an example follows below).

[Decision chart for selecting a statistical test]
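As an illustration (not part of the framework itself): in a typical SBST comparison each tool is run many times to deal with randomness, and a non-parametric test checks whether the difference in, e.g., coverage is statistically significant. The data below is made up, and the Mann-Whitney U test is just one common choice.

```python
from scipy.stats import mannwhitneyu

# hypothetical branch coverage over 10 independent runs per tool
coverage_tool_a = [0.81, 0.79, 0.84, 0.80, 0.82, 0.78, 0.83, 0.81, 0.80, 0.85]
coverage_tool_b = [0.74, 0.77, 0.73, 0.76, 0.75, 0.78, 0.72, 0.74, 0.76, 0.73]

stat, p = mannwhitneyu(coverage_tool_a, coverage_tool_b, alternative="two-sided")
# reject the null hypothesis if p is below the chosen significance level (e.g. 0.05)
print(f"U={stat}, p={p:.4f}")
```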
 

Definition of the framework – Threats

•  Threats to validity (of confounding factors) have to be minimized
•  These are the effects or situations that might jeopardize the validity of your results….
•  Those that cannot be prevented have to be reported
•  When working with people we have to consider many sociological effects. Let us just look at a couple of well-known ones to give you an idea….
   –  The learning curve effect
   –  The Hawthorne effect
   –  The placebo effect
   –  Etc….
The learning curve effect

•  When using new methods/tools people gain familiarity with their application over time (= learning curve).
   –  Initially they are likely to use them more ineffectively than they might after a period of familiarisation/learning.
•  Thus the learning curve effect will tend to counteract any positive effects inherent in the new method/tool.
•  In the context of evaluating methods/tools there are two basic strategies to minimise the learning curve effect (that are not mutually exclusive):
   1.  Provide appropriate training before undertaking an evaluation exercise;
   2.  Separate pilot projects aimed at gaining experience of using a method/tool from pilot projects that are part of an evaluation exercise.
The Hawthorne effect

•  When an evaluation is performed, staff working on the pilot project(s) may have the perception that they are working under more management scrutiny than normal and may therefore work more conscientiously.
•  The name comes from the Hawthorne Works factory (lights low or high?).
•  The Hawthorne effect would tend to exaggerate positive effects inherent in a new method/tool.
•  A strategy to minimise the Hawthorne effect is to ensure that a similar level of management scrutiny is applied to control projects in your case study (i.e. project(s) using the current method/tool) as is applied to the projects that are using the new method/tool.
The placebo effect

•  In medical research, patients who are deliberately given ineffectual treatments recover if they believe that the treatment will cure them.
•  Likewise, a software engineer who believes that adopting some practice (e.g., wearing a pink t-shirt) will improve the reliability of his code may succeed in producing more reliable code.
•  Such a result could not be generalised.
•  In medicine, the placebo effect is minimized by not informing the subjects.
•  This cannot be done in the context of testing tool evaluations.
•  When evaluating methods and tools the best you can do is to:
   –  Assign staff to pilot projects using your normal project staffing methods, and hope that the actual selection of staff includes the normal mix of enthusiasts, cynics and no-hopers that normally comprise your project teams.
   –  Make a special effort to avoid staffing pilot projects with staff who have a vested interest in the method/tool (i.e. staff who developed or championed it) or a vested interest in seeing it fail (i.e. staff who really hate change rather than just resent it).
•  This is a bit like selecting a jury. Initially the selection of potential jurors is at random, but there is additional screening to avoid jurors with identifiable bias.
 

Definition of the framework – Threats

•  There are many more factors that all need to be identified
•  Not only the people but also the technology:
   –  Did the measurement tools (e.g. coverage) really measure what we thought?
   –  Are the injected faults really representative?
   –  Were the faults injected fairly?
   –  Are the pilot project and software representative?
   –  Were the used oracles reliable?
   –  Etc….
 

Definition of the framework – Toolbox

•  Toolbox
   –  Demographic questionnaire: this questionnaire must be answered by the testers before performing the test. It aims to capture the testers' characteristics: level of experience in using the tool, years, job, knowledge of similar tools.
   –  Satisfaction questionnaire (SUS): questionnaire to extract the testers' satisfaction when they perform the evaluation.
   –  Reaction cards
   –  Questions for (taped) semi-structured interviews
   –  Process for investigating the face reactions in videos
   –  A fault taxonomy to classify the found faults
   –  A software testing technique and tools classification/taxonomy
   –  Working diaries
   –  A fault template to classify each fault detected by means of the test. The template contains the following information:
      •  Time spent to detect the fault
      •  Test case that found the fault
      •  Cause of the fault: mistake in the implementation, mistake in the design, mistake in the analysis
      •  Manifestation of the fault in the code
 

Applying or instantiating the framework

•  Search Based Structural Testing tool [Vos et al. 2012]
•  Search Based Functional Testing tool [Vos et al. 2013]
•  Web testing techniques for AJAX applications [MRT08]
•  Commercial combinatorial testing tool at Sulake (to be presented at the ISSTA workshop next week)
•  Automated Test Case Generation at IBM (has been sent to ESEM)
•  We have finished other instances:
   –  Commercial combinatorial testing tool at SOFTEAM (has been sent to ESEM)
•  Currently we are working on more instantiations:
   –  Regression testing prioritization technique
   –  Continuous testing tool at SOFTEAM
   –  Rogue User Testing at SOFTEAM
   –  …
Combinatorial Testing – example case study Sulake: Context

•  Sulake is a Finnish company
•  Develops social entertainment games
•  Main product: Habbo Hotel
   –  World's largest virtual community for teenagers
   –  Millions of teenagers a week all over the world (ages 13-18)
   –  Access directly through the browser or Facebook
   –  11 languages available
   –  218,000,000 registered users
   –  11,000,000 visitors / month
•  The system can be accessed through a wide variety of browsers and flash players (and their versions) that run on different operating systems!
•  Which combinations to use when testing the system?!
Can a tool help?

•  What about the CTE XL Professional, or CTE for short?
•  Classification Tree Editor:
   –  Model your combinatorial problem in a tree
   –  Indicate the priorities of the combinations
   –  The tool automatically generates the best test cases! (a sketch of the underlying all-pairs idea follows below)
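To illustrate the kind of reduction such tools achieve: the sketch below greedily selects configurations until every pair of parameter values is covered once (all-pairs coverage). The parameter values are invented, not Sulake's real browser matrix, and this is not the CTE's actual algorithm.

```python
from itertools import combinations, product

params = {"browser": ["Firefox", "Chrome", "IE", "Safari"],
          "os": ["Windows", "OSX", "Linux"],
          "flash": ["10.3", "11.0"]}
names = list(params)

# every pair of values from two different parameters that must be covered
all_pairs = {((n1, v1), (n2, v2))
             for n1, n2 in combinations(names, 2)
             for v1 in params[n1] for v2 in params[n2]}

def pairs_of(config):
    return set(combinations(list(zip(names, config)), 2))

candidates = list(product(*params.values()))
suite, covered = [], set()
while covered != all_pairs:
    # greedy: take the configuration covering the most uncovered pairs
    best = max(candidates, key=lambda c: len(pairs_of(c) - covered))
    suite.append(best)
    covered |= pairs_of(best)

print(f"{len(candidates)} exhaustive configs vs {len(suite)} pairwise configs")
```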
Combinatorial Testing – example case study Sulake: Research Questions

•  What do we want to find out?
•  Research questions:
   –  RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
   –  RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
   –  RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
   –  RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
The testing tools evaluated (the cases or treatments)

•  Current combinatorial testing practice at Sulake
   –  Exploratory testing with feature coverage as objective
   –  Based on real user information (i.e. browsers, OS, Flash, etc.), combinatorial aspects are taken into account

COMPARED TO

•  Classification Tree Editor (CTE)
   –  Classify the combinatorial aspects as a classification tree
   –  Generate prioritized test cases and select
Who is doing the study (the subjects)

•  Subjects: 1 senior tester from Sulake (6 years software development, 8 years testing experience)
Systems Under Test (objects or pilot projects)

•  Objects:
   –  2 nightly builds from Habbo
   –  Existing test suite that Sulake uses (TSsulake) with 42 automated test cases
   –  No known faults, no injection of faults
The protocol (scenario 4)

[The same protocol flowchart as before, now instantiated: faults cannot be injected, there is no version with known faults and no company baseline, but an existing test suite from the company (TSC) is available for comparison, leading to scenario 4.]
Collected data

Variables                                        | TSSulake | TSCTE
-------------------------------------------------|----------|--------------------------------
Measuring effectiveness:                         |          |
  Number of test cases                           | 42       | 68 (selected 42 high priority)
  Number of invalid test cases                   | 0        | 0
  Number of repeated test cases                  | 0        | 26
  Number of failures observed                    | 0        | 12
  Number of faults found                         | 0        | 2
  Type and cause of the faults                   | N/A      | 1. critical, browser hang; 2. minor, broken UI element
  Feature coverage reached                       | 100%     | 40%
  All-pairs coverage reached                     | N/A      | 80%
Measuring efficiency:                            |          |
  Time needed to learn the CTE testing method    | N/A      | 116 min
  Time needed to design and generate the test    | N/A      | 95 min (62 for tree, 33 for
  suite with the CTE                             |          | removing duplicates)
  Time needed to set up testing infrastructure   | N/A      | 74 min
  specific to the CTE                            |          |
  Time needed to automate the test suite         | N/A      | 357 min
  generated by the CTE                           |          |
  Time needed to execute the test suite          | 114 min  | 183 min (both builds)
  Time needed to identify fault types and causes | 0 min    | 116 min
Measuring subjective satisfaction:               |          |
  SUS                                            | N/A      | 50
  Reaction cards                                 | N/A      | Comprehensive, Dated, Old,
                                                 |          | Sterile, Unattractive
  Informal interview                             | N/A      | video
Descriptive statistics for efficiency (1)

[Chart slide]
Descriptive statistics for efficiency (2)

[Chart slide]
Emotional face reactions

[Four stills from the taped interview, commenting on:]
1.  Appearance of the tool
2.  Duplicate test cases
3.  Readability of the CTE trees
4.  Technical support and user manuals
Combinatorial Testing – example case study Sulake: Conclusions

•  RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
   –  2 new faults were found!!
   –  The need for more structured combinatorial testing was confirmed by Sulake.
•  RQ2: How much effort would be required to introduce the CTE into the testing processes currently implanted at Sulake?
   –  Effort for learning and installing is medium, but can be justified within Sulake since it is needed only once.
   –  Designing and generating test cases suffers from duplicates that cost time to remove. The total effort can be accepted within Sulake because critical faults were discovered.
   –  Executing the test suite generated by the CTE takes 1 hour more than the Sulake test suite (due to more combinatorial aspects being tested). This means that Sulake cannot include these tests in the daily build, but will have to add them to nightly builds only.
•  RQ3: How much effort would be required to add the generated test cases into the testing infrastructure currently used at Sulake?
   –  There is effort for automating the generated test cases, but it can be justified within Sulake.
•  RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
   –  The tool seems to have everything that is needed but looks unattractive.
Automated Test Case Generation – example case study IBM: Context

•  IBM Research Labs in Haifa, Israel
•  Develops a system (denoted IMP ;-) for resource management in a networked environment (servers, virtual machines, switches, storage devices, etc.)
•  The IBM Lab has a designated team that is responsible for testing new versions of the product.
•  The testing is being done within a simulated testing environment developed by this testing team.
•  The IBM Lab is interested in evaluating the Automated Test Case Generation tools developed in the FITTEST project!
Automated Test Case Generation – example case study IBM: the research questions

•  What do we want to find out?
•  Research questions:
   –  RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
   –  RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
   –  RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently implanted at IBM Research?
The testing tools evaluated (the cases or treatments)

•  Current test case design practice at IBM
   –  Exploratory test case design
   –  The objective of test cases is to maximise the coverage of the system use-cases

COMPARED TO

•  FITTEST Automated Test Case Generation tools
   –  Only part of the whole continuous testing approach of FITTEST
FITTEST Automated Test Case Generation

Logs2FSM
•  Infers FSM models from the logs
•  Applies an event-based model inference approach (read more in [MTR08])
•  The model-based oracles that also result from this tool refer to the use of the paths generated from the inferred FSM as oracles. If these paths, when transformed to test cases, cannot be fully executed, then the tester needs to inspect the failing paths to see whether that is due to some faults, or the paths themselves are infeasible.

FSM2Tests
•  Takes FSMs and a Domain Input Specification (DIS) file created by a tester for the IBM Research SUT to generate concrete test cases
•  This component implements a technique that combines model-based and combinatorial testing (see [NMT12]):
   1.  generate test paths from the FSM (using various simple and advanced graph visit algorithms; a sketch follows this slide);
   2.  transform these paths into classification trees using the CTE XL format, enriched with the DIS, such as data types and partitions;
   3.  generate test combinations from those trees using combinatorial criteria.
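A rough sketch of step 1 (test-path generation by a graph visit) under stated assumptions: the FSM fragment and all names are invented (loosely modelled on the REST events shown later), and the real FSM2Tests uses more advanced visit algorithms plus the CTE XL tooling.

```python
def transition_cover(fsm: dict[str, list[tuple[str, str]]], start: str) -> list[list[str]]:
    """fsm maps state -> [(event, next_state)]; returns event paths covering each transition once."""
    uncovered = {(s, e, t) for s, lst in fsm.items() for e, t in lst}
    paths = []
    while uncovered:
        state, path = start, []
        while True:
            fresh = [(e, t) for e, t in fsm.get(state, []) if (state, e, t) in uncovered]
            if not fresh:
                break
            event, nxt = fresh[0]  # naive visit: always take the first uncovered transition
            uncovered.discard((state, event, nxt))
            path.append(event)
            state = nxt
        if not path:
            break  # remaining transitions unreachable by this naive visit
        paths.append(path)
    return paths

fsm = {"S0": [("list_appliances", "S1")],
       "S1": [("get_appliance", "S2"), ("list_appliances", "S1")],
       "S2": [("get_progress", "S2"), ("get_targets", "S1")]}
for p in transition_cover(fsm, "S0"):
    print(" -> ".join(p))
```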
Who is doing the study (the subjects)

•  Subjects:
   –  1 senior tester from IBM (10 years software development, 5 years testing experience, of which 4 years with the IMP system)
   –  1 researcher from FBK (10 years of experience with software development, 5 years of experience with research in testing)
Pilot project

•  Objects:
   –  SUT: distributed application for managing system resources in a networked environment
      •  A management server that communicates with multiple managed clients (= physical or virtual resources)
      •  Important product for IBM with real customers
      •  The case study will be performed on a new version of this system
   –  Existing test suite that IBM uses (TSibm)
      •  Selected from what they call System Validation Tests (SVT) -> tests for high-level complex customer use-cases
      •  Manually designed
      •  Automatically executed through activation scripts
   –  10 representative faults to inject into the system
The protocol (scenario 5)

[The same protocol flowchart, now instantiated: faults can be injected, and an existing test suite from the company (TSC) is available for comparison, leading to scenario 5.]
More detailed steps

1.  Configure the simulated environment and create the logs [IBM]
2.  Select test suite TSibm [IBM]
3.  Write activation scripts for each test case in TSibm [IBM]
4.  Generate TSfittest [FBK subject]:
    a.  Instantiate the FITTEST components for the IMP
    b.  Generate the FSM with Logs2FSM
    c.  Define the Domain Input Specification (DIS); a hypothetical sketch follows below
    d.  Generate the concrete test data with FSM2Tests
5.  Select and inject the faults [IBM]
6.  Develop a tool that transforms the concrete test cases generated by the FITTEST tool FSM2Tests to an executable format [IBM]
7.  Execute TSibm [IBM]
8.  Execute TSfittest [IBM]

The FITTEST project is funded by the European Commission (FP7-ICT-257574)
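
The slides do not show the concrete DIS file format, so the following is only a hypothetical sketch (kept as a Python structure for uniformity with the other sketches) of the kind of information a DIS carries: data types and input partitions for the parameters of one event of the IMP REST interface.

# Hypothetical DIS fragment (illustrative only; the real FITTEST DIS
# format is not shown in these slides): type and input partitions for
# one parameterised event of the IMP REST interface.
dis = {
    "GET/VMControl/virtualAppliances/{id}": {
        "id": {
            "type": "string",
            "partitions": [
                "id of an existing virtual appliance",  # valid case
                "id of a removed virtual appliance",    # stale reference
                "syntactically malformed id",           # robustness case
            ],
        },
    },
}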


85
Collected Measures

TABLE IV. DESCRIPTIVE MEASURES FOR THE TEST SUITES TSibm AND TSfittest

size
  number of abstract test cases:   TSibm: NA     TSfittest: 84
  number of concrete test cases:   TSibm: 4      TSfittest: 3054
  number of commands (or events):  TSibm: 1814   TSfittest: 18520
construction
  design of the test cases:        TSibm: manual (cf. Section III-B1)
                                   TSfittest: automated (cf. Section III-B2)
effort to create the test suite
  TSibm:
    design: 5 hours
    activation scripts: 4 hours
  TSfittest:
    set up FITTEST tools: 8 hours
    generate the FSM: automated, less than 1 second CPU time
    specify the DIS: 2 hours
    generate concrete tests: automated, less than 1 minute CPU time
    transform into executable format: 20 hours
[Figure: the FSM inferred from the IMP logs. States (S2, S15, S19, S20, S22, S26, S31, S33, S34, S36, S37, S38, S40, S41, S42, S48) are connected by transitions labelled with REST calls such as GET/VMControl, GET/VMControl/virtualAppliances, GET/VMControl/virtualAppliances/{id}, GET/VMControl/virtualAppliances/{id}/progress, GET/VMControl/virtualAppliances/{id}/targets, GET/VMControl/workloads, GET/VMControl/workloads/{id}, GET/isdsettings/restcompatibilities, GET/resources/OperatingSystem?Props=IPv4Address and GET/resources/System/{id}/access/a315017165998.]

TABLE II. DESCRIPTIVE MEASURES FOR THE FSM THAT IS USED TO GENERATE TSfittest

Number of traces used to infer FSM          6
Average trace length                        100
Number of nodes in generated FSM            51
Number of transitions in generated FSM      134

[Clipped excerpts from the underlying paper: the types of faults seeded may not be representative enough of real faults; to mitigate this threat, the IBM team identified representative faults based on real faults found at an earlier time of the development, a selection made by the senior tester at IBM and revised by the whole IBM team that participated in this case study. Regarding RQ3, the FITTEST tools have shown to be useful within the context of a real industrial case and are able to automate the testing process there.]

86
As can be seen from Table IV, TSibm is substantially smaller in size than TSfittest on all parameters; this is one of the evident results of automation. However, not only is TSfittest bigger in size, its effectiveness, measured by injected-fault coverage (see Figure 4, shown as the coverage matrix below), is also significantly higher (70% vs. 50%). Moreover, if we combine the TSibm and TSfittest suites, the effectiveness increases to 80%. Therefore, within the context of the studied environment, the FITTEST technologies can contribute to the effectiveness of testing for IBM Research, and IBM Research has decided that, for optimizing fault-finding capability, the two techniques can best be combined.

Collected measures

RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST technologies contribute to the efficiency of testing when used in the testing environments at IBM Research?

Figure 4: injected-fault coverage (1 = fault detected)

             IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8  IF9  IF10
TS_ibm        1    0    0    0    1    1    0    0    1    1
TS_fittest    1    1    1    0    0    1    0    1    1    1

TABLE III. EFFICIENCY MEASURES FOR EXECUTION OF BOTH TEST SUITES

Variable                                  TSibm    normalized    TSfittest    normalized
                                                   by size                    by size
Execution Time with fault injection       36.75    9.18          127.87       1.52
Execution Time without fault injection    27.97    6.99          50.72        0.60

It can be seen from Table III that the time to execute TSibm is smaller than the time to execute TSfittest. This is due to the larger number of concrete tests in TSfittest. When we normalize the execution times by test suite size, however, TSfittest is the faster of the two per test case.

The FITTEST project is funded by the European Commission (FP7-ICT-257574)
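
As a quick worked check of the effectiveness and efficiency numbers (note: the divisors in the "normalized by size" columns, 4 and 84 test cases, are inferred from the data, not stated on the slide):

# Injected-fault coverage from the matrix above (IF1..IF10).
ts_ibm     = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]
ts_fittest = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]

def coverage(found):
    """Percentage of injected faults detected."""
    return 100 * sum(found) / len(found)

union = [a | b for a, b in zip(ts_ibm, ts_fittest)]
print(coverage(ts_ibm))      # 50.0 -> TSibm finds 50% of the faults
print(coverage(ts_fittest))  # 70.0 -> TSfittest finds 70%
print(coverage(union))       # 80.0 -> combined they find 80%

# Normalization in Table III (inferred divisors: 4 and 84 test cases).
print(round(36.75 / 4, 2), round(127.87 / 84, 2))  # 9.19 1.52 (table shows 9.18, apparently truncated)
print(round(27.97 / 4, 2), round(50.72 / 84, 2))   # 6.99 0.6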

[Clipped excerpts from the paper's validity discussion. Internal validity: the inferred models depend on the logs and the environment used to produce them, so IBM was asked to check the content of the logs; incomplete logs lower the quality of TSfittest, and with better logs the number of detected faults can only improve, leaving the conclusion unchanged. The subjects had no prior working knowledge of the FITTEST tools, which was mitigated by close cooperation to avoid mistakes or misunderstandings. External validity: it is unclear how far the findings generalize beyond this case, and the results are relative to the given set of artificial faults.]

87
Automated Test Case Generation
example case study IBM: Conclusions
•  RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
–  TSibm finds 50% of the injected faults, TSfittest finds 70%
–  Together they find 80%! -> IBM will consider combining the techniques

•  RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
–  FITTEST test cases execute faster because they are smaller
–  Shorter tests were good for IBM -> easier to identify faults

•  RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
–  Found reasonable by IBM, considering that the manual tasks need to be done only once and more faults were found.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
Threats to validity

•  The learning curve effect (and basically all the other human factors ;-)
•  Missing information in the logs leads to weak FSMs.
•  Incomplete specification of the DIS leads to weak concrete test cases.
•  The representativeness of the injected faults with respect to real faults.

The FITTEST project is funded by the European Commission (FP7-ICT-257574) 	


89
Final Things ……

•  As researchers, we should concentrate on the future problems the industry will face. Right?

•  How can we claim to know future needs without understanding current ones?

•  Go to industry and evaluate your results!

The FITTEST project is funded by the European Commission (FP7-ICT-257574) 	


90
Need any help?
More information?
Want to do an instantiation?

Contact:

Tanja E. J. Vos

email: tvos@pros.upv.es
skype: tanja_vos
web: http://tanvopol.webs.upv.es/
project: http://www.facebook.com/FITTESTproject
telephone: +34 690 917 971
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
References
empirical research in software engineering

[BSH86] V. R. Basili, R. W. Selby, and D. H. Hutchens. Experimentation in software engineering. IEEE Trans. Softw. Eng., 12:733-743, July 1986.
[BSL99] Victor R. Basili, Forrest Shull, and Filippo Lanubile. Building knowledge through families of experiments. IEEE Trans. Softw. Eng., 25(4):456-473, 1999.
[CWH05] C. Wohlin, M. Höst, and K. Henningsson. Empirical research methods in software and web engineering. In E. Mendes and N. Mosley, editors, Web Engineering - Theory and Practice of Metrics and Measurement for Web Development, pages 409-429, 2005.
[Fly04] Bent Flyvbjerg. Five misunderstandings about case-study research. In Clive Seale, Giampietro Gobo, Jaber F. Gubrium, and David Silverman, editors, Qualitative Research Practice, pages 420-434. London and Thousand Oaks, CA, 2004.
[KLL97] B. Kitchenham, S. Linkman, and D. Law. DESMET: a methodology for evaluating software engineering methods and tools.
[KPP+02] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Peter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg. Preliminary guidelines for empirical research in software engineering. IEEE Trans. Softw. Eng., 28(8):721-734, 2002.
[LSS05] Timothy C. Lethbridge, Susan Elliott Sim, and Janice Singer. Studying software engineers: Data collection techniques for software field studies. Empirical Software Engineering, 10:311-341, 2005. 10.1007/s10664-005-1290-x.
[TTDBS07] Paolo Tonella, Marco Torchiano, Bart Du Bois, and Tarja Systä. Empirical studies in reverse engineering: state of the art and future trends. Empirical Softw. Engg., 12(5):551-571, 2007.
[WRH+00] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in software engineering: an introduction. Kluwer Academic Publishers, Norwell, MA, USA, 2000.
[Rob02] Colin Robson. Real World Research. Blackwell Publishing Limited, 2002.
[RH09] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Engg., 14(2):131-164, 2009.
[PPV00] Dewayne E. Perry, Adam A. Porter, and Lawrence G. Votta. Empirical studies of software engineering: a roadmap. In Proceedings of the Conference on The Future of Software Engineering (ICSE '00), pages 345-355. ACM, New York, NY, USA, 2000.
[EA06] http://www.cs.toronto.edu/~sme/case-studies/case_study_tutorial_slides.pdf
[Ple95] S. L. Pfleeger. Experimental design and analysis in software engineering. Annals of Software Engineering, 1:219-253, 1995.
[PK01-03] S. L. Pfleeger and B. A. Kitchenham. Principles of Survey Research. Software Engineering Notes, (6 parts), Nov 2001 - Mar 2003.
[Fly06] B. Flyvbjerg. Five Misunderstandings about Case Study Research. Qualitative Inquiry, 12(2):219-245, April 2006.
[KPP95] B. Kitchenham, L. Pickard, and Shari Lawrence Pfleeger. Case studies for method and tool evaluation. IEEE Software, 12(4):52-62, July 1995.

The FITTEST project is funded by the European Commission (FP7-ICT-257574)

92
References
empirical research in software testing

[BS87] Victor R. Basili and Richard W. Selby. Comparing the effectiveness of software testing strategies. IEEE Trans. Softw. Eng., 13(12):1278-1296, 1987.
[DRE04] Hyunsook Do, Gregg Rothermel, and Sebastian Elbaum. Infrastructure support for controlled experimentation with software testing and regression testing techniques. In Proc. Int. Symp. on Empirical Software Engineering, ISESE '04, 2004.
[EHP+06] Sigrid Eldh, Hans Hansson, Sasikumar Punnekkat, Anders Pettersson, and Daniel Sundmark. A framework for comparing efficiency, effectiveness and applicability of software testing techniques. Testing: Academic & Industrial Conference on Practice And Research Techniques (TAIC PART), 0:159-170, 2006.
[JMV04] Natalia Juristo, Ana M. Moreno, and Sira Vegas. Reviewing 25 years of testing technique experiments. Empirical Softw. Engg., 9(1-2):7-44, 2004.
[RAT+06] Per Runeson, Carina Andersson, Thomas Thelin, Anneliese Andrews, and Tomas Berling. What do we know about defect detection methods? IEEE Softw., 23(3):82-90, 2006.
[VB05] Sira Vegas and Victor Basili. A characterisation schema for software testing techniques. Empirical Softw. Engg., 10(4):437-466, 2005.
[LR96] Christopher M. Lott and H. Dieter Rombach. Repeatable software engineering experiments for comparing defect-detection techniques. Empirical Software Engineering, 1:241-277, 1996. 10.1007/BF00127447.

The FITTEST project is funded by the European Commission (FP7-ICT-257574)

93
References
descriptions of empirical studies done in software testing

[Vos et al. 2010] T. E. J. Vos, A. I. Baars, F. F. Lindlar, P. M. Kruse, A. Windisch, and J. Wegener. Industrial scaled automated structural testing with the evolutionary testing tool. In ICST, 2010, pp. 175-184.
[Vos et al. 2012] Tanja E. J. Vos, Arthur I. Baars, Felix F. Lindlar, Andreas Windisch, Benjamin Wilmes, Hamilton Gross, Peter M. Kruse, and Joachim Wegener. Industrial Case Studies for Evaluating Search Based Structural Testing. International Journal of Software Engineering and Knowledge Engineering, 22(8):1123- (2012).
[Vos et al. 2013] T. E. J. Vos, F. Lindlar, B. Wilmes, A. Windisch, A. Baars, P. Kruse, H. Gross, and J. Wegener. Evolutionary functional black-box testing in an industrial setting. Software Quality Journal, 21(2):259-288 (2013).
[MRT08] A. Marchetto, F. Ricca, and P. Tonella. A case study-based comparison of web testing techniques applied to Ajax web applications. International Journal on Software Tools for Technology Transfer (STTT), vol. 10, pp. 477-492, 2008.

The FITTEST project is funded by the European Commission (FP7-ICT-257574)

94
Other references

[TS04] T. S. Tullis and J. N. Stetson. A comparison of questionnaires for assessing website usability. In Proceedings of the Usability Professionals Association Conference, 2004.
[BKM08] Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation of the system usability scale. International Journal of Human-Computer Interaction, 24:574-594, 2008.
[Horn06] K. Hornbæk. Current Practice in Measuring Usability: Challenges to Usability Studies and Research. International Journal of Human-Computer Studies, 64(2):79-102, February 2006.
[MTR08] Alessandro Marchetto, Paolo Tonella, and Filippo Ricca. State-based testing of Ajax web applications. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pp. 121-130, IEEE, 2008.
[NMT12] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100-110, ACM, 2012.

The FITTEST project is funded by the European Commission (FP7-ICT-257574)

95

More Related Content

Similar to TAROT summerschool slides 2013 - Italy

SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...
SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...
SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...
Maghrenov
 
Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...
Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...
Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...
Invest Northern Ireland
 

Similar to TAROT summerschool slides 2013 - Italy (20)

Julie Marguerite - Tefis open calls (fia dec 2010)
Julie Marguerite - Tefis open calls  (fia dec 2010)Julie Marguerite - Tefis open calls  (fia dec 2010)
Julie Marguerite - Tefis open calls (fia dec 2010)
 
Tien3
Tien3Tien3
Tien3
 
IoT.est Project ID Card
IoT.est Project ID CardIoT.est Project ID Card
IoT.est Project ID Card
 
About IRT Nanoelec
About IRT NanoelecAbout IRT Nanoelec
About IRT Nanoelec
 
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptxECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
ECoVEM DL IEEE EdSoc Tunis UNED 2022 07 final.pptx
 
resume
resumeresume
resume
 
Fire at Net Futures2015
Fire at Net Futures2015Fire at Net Futures2015
Fire at Net Futures2015
 
COMNET as a Capacity-Building Partner for Two African Institutions
COMNET as a Capacity-Building Partner for Two African InstitutionsCOMNET as a Capacity-Building Partner for Two African Institutions
COMNET as a Capacity-Building Partner for Two African Institutions
 
Tien3
Tien3Tien3
Tien3
 
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
[OOFHEC2018] Manuel Castro: Identifying the best practices in e-engineering t...
 
SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...
SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...
SOPHIA by Philippe Malbranch - Maghrenov workshop on research infrastructures...
 
EADTU 2018 conference e-LIVES project
EADTU 2018 conference e-LIVES project EADTU 2018 conference e-LIVES project
EADTU 2018 conference e-LIVES project
 
L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)L4MS Webinar - All you need to know (25th Oct)
L4MS Webinar - All you need to know (25th Oct)
 
Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...
Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...
Horizon2020 - Engagement in EU projects, Professor Roger Woods, Analytics Eng...
 
Testing Challenges and Approaches in Edge Computing
Testing Challenges and Approaches in Edge ComputingTesting Challenges and Approaches in Edge Computing
Testing Challenges and Approaches in Edge Computing
 
Advancing the JISC Access & Identity Management Programme
Advancing the JISC Access & Identity Management ProgrammeAdvancing the JISC Access & Identity Management Programme
Advancing the JISC Access & Identity Management Programme
 
Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...
 
Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...Process assessment for use in very small enterprises the noemi assessment met...
Process assessment for use in very small enterprises the noemi assessment met...
 
JISC's AIM programme
JISC's AIM programmeJISC's AIM programme
JISC's AIM programme
 
An Open and Improved VISIR System Through PILAR Federation for Electrical/Ele...
An Open and Improved VISIR System Through PILAR Federation for Electrical/Ele...An Open and Improved VISIR System Through PILAR Federation for Electrical/Ele...
An Open and Improved VISIR System Through PILAR Federation for Electrical/Ele...
 

More from Tanja Vos

SBST 2015 - 3rd Tool Competition for Java Junit test Tools
SBST 2015 - 3rd Tool Competition for Java Junit test ToolsSBST 2015 - 3rd Tool Competition for Java Junit test Tools
SBST 2015 - 3rd Tool Competition for Java Junit test Tools
Tanja Vos
 

More from Tanja Vos (8)

Impress project: Goals and Achievements @ cseet2020
Impress project: Goals and Achievements @ cseet2020Impress project: Goals and Achievements @ cseet2020
Impress project: Goals and Achievements @ cseet2020
 
FormalZ @ cseet2020
FormalZ @ cseet2020FormalZ @ cseet2020
FormalZ @ cseet2020
 
A-TEST2017
A-TEST2017A-TEST2017
A-TEST2017
 
SBST 2015 - 3rd Tool Competition for Java Junit test Tools
SBST 2015 - 3rd Tool Competition for Java Junit test ToolsSBST 2015 - 3rd Tool Competition for Java Junit test Tools
SBST 2015 - 3rd Tool Competition for Java Junit test Tools
 
Software Testing Innovation Alliance
Software Testing Innovation AllianceSoftware Testing Innovation Alliance
Software Testing Innovation Alliance
 
Testar2014 presentation
Testar2014 presentationTestar2014 presentation
Testar2014 presentation
 
Esem2014 presentation
Esem2014 presentationEsem2014 presentation
Esem2014 presentation
 
Testar
TestarTestar
Testar
 

Recently uploaded

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Recently uploaded (20)

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 

TAROT summerschool slides 2013 - Italy

  • 1. European initiatives where academia and industry get together.   Tanja E.J. Vos Centro de Investigación en Métodos de Producción de Software Universidad Politecnica de Valencia Valencia, Spain tvos@pros.upv.es The FITTEST project is funded by the European Commission (FP7-ICT-257574) 1
  • 2. Overview   •  EU  projects  (with  SBST)  that  I  have  been  coordina;ng:   –  EvoTest  (2006-­‐2009)   –  FITTEST  (2010-­‐2013)   •  What  is  means  to  coordinate  them  and  how  they  are  structured   •  How  do  we  evaluate  their  results  through  academia-­‐industry   projects.   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 2
  • 3. EvoTest   •  •  •  •  Evolutionary Testing for Complex Systems September 2006– September 2009 Total costs: 4.300.000 euros Partners:   –  –  –  –  –  –  –  Universidad  Politecnica  de  Valencia  (Spain)   University  College  London  (United  Kingdom)   Daimler  Chrysler  (Germany)   Berner  &  MaVner  (Germany)   Fraunhofer  FIRST  (Germany)   Motorola  (UK)   Rila  Solu;ons  (Bulgaria)   •  Website  does  not  work  anymore The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 4. EvoTest  objec;ves/results   •  Apply  Evolu;onary  Search  Based  Tes;ng  techniques  to  solve  tes;ng  problems   from  a  wide  spectrum  of  complex  real  world  systems    in  an  industrial  context.     •  Improve  the  power  of  evolu;onary  algorithms  for  searching  important  test   scenarios,  hybridising  with  other  techniques:   –  other  general-­‐purpose  search  techniques,   –  other  advanced  so_ware  engineering  techniques,  such  as  slicing  and  program  transforma;on.     •  An  extensible  and  open  Automated  Evolu;onary  Tes;ng  Architecture  and   Framework  will  be  developed.  This  will  provide  general  components  and   interfaces  to  facilitate  the  automa;c  genera;on,  execu;on,  monitoring  and   evalua;on  of  effec;ve  test  scenarios.       The FITTEST project is funded by the European Commission (FP7-ICT-257574) 4
  • 5. FITTEST   •  •  •  •  Future Internet Testing September 2010 – December 2013 Total costs: 5.845.000 euros Partners:   –  –  –  –  –  –  –  Universidad  Politecnica  de  Valencia  (Spain)   University  College  London  (United  Kingdom)   Berner  &  MaVner  (Germany)   IBM  (Israel)   Fondazione  Bruno  Kessler  (Italy)   Universiteit  Utrecht  (The  Netherlands)   So_team  (France)     •   hVp://www.pros.upv.es/fiVest/   The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 6. FITTEST  objec;ves/results   •  Future Internet Applications –  Characterized by an extreme high level of dynamism –  Adaptation to usage context (context awareness) –  Dynamic discovery and composition of services –  Etc.. •  Testing of these applications gets extremely important •  Society depends more and more on them •  Critical activities such as social services, learning, finance, business. •  Traditional testing is not enough –  Testwares are fixed •  Continuous testing is needed –  Testwares that automatically adapt to the dynamic behavior of the Future Internet application –  This is the objective of FITTEST The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 7. How  does  it  work?   LOGGING TEST-WARE GENERATION LOGGING TEST EXECUTION TEST EVALUATION OPTIONAL MANUAL EXTENSIONS Domain 1.  Run Input target System that is Under Test the Specifications (SUT) 2.  Collect the logs it generates Domain experts End-users INSTRUMENT ANALYSE & INFER MODELS RUN SUT SUT GENERATE TEST CASES This can be done by: Model based MANUALLY oracles COLLECT & PREPROCESS LOGS •  real usage by end users of the application AUTOMATE TEST CASES in the production environment ANALYSE & INFER ORACLES EXECUTE TEST CASES Log based oracles EVALUATE TEST CASES •  test case execution in the test environment. TEST RESULTS HUMAN ORACLES PROPERTIES BASED ORACLES PATTERN BASED ORACLES MODEL BASED ORACLES FREE ORACLES ORACLES The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 8. How  does  it  work?     GENERATION LOGGING TEST-WARE GENERATION 2.  Generate different testwares: Domain Input Specifications sers ANALYSE & INFER MODELS GENERATE TEST CASES RUN SUT MANUALLY Model based oracles COLLECT & PREPROCESS LOGS ANALYSE & INFER ORACLES TEST EVALUATION 1.  Analyse the logs OPTIONAL MANUAL EXTENSIONS experts UMENT TEST EXECUTION AUTOMATE TEST CASES Log based oracles •  Models •  Domain Input Specification •  Oracles 3.  Use these to generate and automate a test suite consisting EXECUTE EVALUATE TEST CASES off: TEST CASES TEST RESULTS HUMAN ORACLES PROPERTIES BASED ORACLES PATTERN BASED ORACLES MODEL BASED ORACLES FREE ORACLES ORACLES The FITTEST project is funded by the European Commission (FP7-ICT-257574) •  Abstract test cases •  Concrete test cases •  Pass/Fail Evaluation criteria
  • 9. How  does  it  work?     LOGGING TEST-WARE GENERATION TEST EXECUTION TEST EVALUATION OPTIONAL MANUAL EXTENSIONS Domain experts Domain Input Specifications End-users INSTRUMENT ANALYSE & INFER MODELS GENERATE TEST CASES RUN SUT MANUALLY Model based oracles SUT COLLECT & PREPROCESS LOGS ANALYSE & INFER ORACLES Execute the test cases and start a new test cycle for continuous testing and adaptation of the test wares! AUTOMATE TEST CASES EXECUTE TEST CASES Log based oracles EVALUATE TEST CASES TEST RESULTS HUMAN ORACLES PROPERTIES BASED ORACLES PATTERN BASED ORACLES MODEL BASED ORACLES FREE ORACLES ORACLES The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 10. What  does  it  mean  to  be  an  EU     project  coordinator   •  Understand what the project is about, what needs to be done and what is most important. •  Do NOT be afraid to get your hands dirty. •  Do NOT assume people are working as hard on the project as you are (or are as enthusiastic as you ;-) •  Do NOT assume that the people that ARE responsible for some tasks TAKE this responsibility •  Get CC-ed in all emails and deal with it •  Stalk people (email, sms, whatsapp, skype, voicemail messages) •  Be patient when explaining the same things over and over again The FITTEST project is funded by the European Commission (FP7-ICT-257574) 10
  • 11. EU  project  structures   •  We have: WorkPackages (WP) •  These are composed of: Tasks •  These result in: Deliverables WP   Project  Management   WP:  Do  more  research   WP:  And  some  more   WP:  Integrated  it   all  together  in  a   superduper   solu;on  that   industry  needs   WP:  And  more……..   WP:  Evaluate  Your  Results  through  Case  Studies   The FITTEST project is funded by the European Commission (FP7-ICT-257574) WP   Exploita;on  and  Dissemina;on   WP:  Do  some  research  
  • 12. EU  project   how  to  evaluate  your  results   •  You need to do studies that evaluate the resulting testing tools/techniques within a real industrial environment The FITTEST project is funded by the European Commission (FP7-ICT-257574) 12
  • 13. WE  NEED  MORE  THAN  …   γ b α Glenford Myers 1979! Triangle Problem Test a program which can return the type of a triangle based on the widths of the 3 sides c The FITTEST project is funded by the European Commission (FP7-ICT-257574) a β For  evalua;on   of  tes;ng  tools   we  need  more!  
  • 14. WE  NEED….   Real  people   Real  faults   Empirical  studies   with   Real  Tes;ng   Processes   Real  systems   Real  Tes;ng   Environments   The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 15. What  are  empirical  studies   •  [Wikipedia]  Empirical  research  is  a  way  of  gaining  knowledge  by  means  of   direct  and  indirect  observa8on  or  experience.   •  [PPV00]  An  empirical  study  is  really  just  a  test  that  compares  what  we   believe  to  what  we  observe.  Such  tests  when  wisely  constructed  and   executed  play  a  fundamental  role  in  so?ware  engineering,  helping  us   understand  how  and  why  things  work.   •  Collec;ng  data:   –  Quan;ta;ve  data  -­‐>  numeric  data   –  Qualita;ve  data  -­‐>  observa;ons,  interviews,  opinions,  diaries,  etc.   •  Different  kinds  are  dis;nguished  in  literature  [WRH+00,  RH09]   –  –  –  –  Controlled  experiments   Surveys   Case  Studies   Ac;on  research   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 15
  • 16. Controlled  experiments   What   Experimental   inves;ga;on   of   hypothesis   in   a   laboratory   semng,   in   which   condi;ons  are  set  up  to  isolate  the  variables  of  interest  ("independent  variables")   and   test   how   they   affect   certain   measurable   outcomes   (the   "dependent   variables")   Good  for   –  Quan;ta;ve  analysis  of  benefits  of  a  tes;ng  tool  or  technique   –  We  can  use  methods  showing  sta;s;cal  significance   –  We  can  demonstrate  how  scien;fic  we  are!  [EA06]   Disadvantages   –  Limited  confidence  that  laboratory  set-­‐up  reflects  the  real  situa;on   –  ignores  contextual  factors  (e.g.  social/organiza;onal/poli;cal  factors)   –  extremely  ;me-­‐consuming   See:  [BSH86,  CWH05,  PPV00,  Ple95]    The FITTEST project is funded by the European Commission (FP7-ICT-257574) 16
  • 17. Surveys   What   Collec;ng  informa;on  to  describe,  compare  or  explain  knowledge,  amtudes  and   behaviour  over  large  popula;ons  using  interviews  or  ques0onnaires.   Good  for   –  Quan;ta;ve  and  qualita;ve  data   –  Inves;ga;ng  the  nature  of  a  large  popula;on   –  Tes;ng  theories  where  there  is  liVle  control  over  the  variables   Disadvantages   –  Difficul;es  of  sampling  and  selec;on  of  par;cipants   –  Collected  informa;on  tends  to  subjec;ve  opinion   See:  [PK01-­‐03]     The FITTEST project is funded by the European Commission (FP7-ICT-257574) 17
  • 18. What   Case  Studies   A  technique  for  detailed  exploratory  inves;ga;ons  that  aVempt  to  understand   and  explain  phenomenon  or  test  theories  within  their  context   Good  for   –  –  –  –  Quan;ta;ve  and  qualita;ve  data   Inves;ga;ng  capability  of  a  tool  within  a  specific  real  context   Gaining  insights  into  chains  of  cause  and  effect   Tes;ng  theories  in  complex  semngs  where  there  is  liVle  control  over  the   variables  (Companies!)   Disadvantages   –  Hard  to  find  good  appropriate  case  studies   –  Hard  to  quan;fy  findings   –  Hard  to  build  generaliza;ons  (only  context)   See:  [RH09,  EA06,  Fly06]    The FITTEST project is funded by the European Commission (FP7-ICT-257574)   18
  • 19. Ac;on  Research   What   research   ini;ated   to   solve   an   immediate   problem   (or   a   reflec;ve   process   of   progressive   problem   solving)   involving   a   process   of   ac;vely   par;cipa;ng   in   an   organiza;on’s  change  situa;on  whilst  conduc;ng  research   Good  for   –  Quan;ta;ve  and  qualita;ve  data   –  When  the  goal  is  solving  a  problem.   –  When  the  goal  of  the  study  is  change.   Disadvantages   –  Hard  to  quan;fy  findings   –  Hard  to  build  generaliza;ons  (only  context)   See:  [RH09,  EA06,  Fly06,  Rob02]       The FITTEST project is funded by the European Commission (FP7-ICT-257574) 19
  • 20. EU  project   how  to  evaluate  your  results   •  You need to do empirical studies that evaluate the resulting testing tools/techniques within a real industrial environment •  The empirical study that best fits our purposes is Case Study –  Evaluate the capability of our testing techiques/tools –  In a real industrial context –  Comparing to current testing practice The FITTEST project is funded by the European Commission (FP7-ICT-257574) 20
  • 21. Case  studies:  not  only  for  jus;fying   research  projects   •  We  need  to  apply  our  results  in  industry  to  understand  the  problems   they  have  and  if  we  are  going  the  right  direc;on  to  solve  them.   •  Real   need   in   so_ware   engineering   industry   to   have   general   guidelines   on   what   tes;ng   techniques   and   tools   to   use   for   different   tes;ng   objec;ves,  and  how  usable  these  techniques  are.   •  Up  to  date  these  guidelines  do  not  exist.   •  If  we  would  have  a  body  of  documented  experiences  and  knowledge   from  which  the  needed  guidelines  can  be  extracted   •  With   these   guidelines,   tes;ng   prac;cioners   might   make   informed   decisions  about  which  techniques  to  use  and  es;mate  the  ;me/effort   that  is  needed.   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 21
  • 22. The  challenge   To create a body of evidence consisting of evaluative studies of testing techniques and tools that can be used to understand the needs of industry and derive general guidelines about their usability and applicability in industry. TO DO THIS WE HAVE TO: •  •  •  •  Perform more evaluative empirical case studies in industrial environments Carry out these studies by following the same methodology, to enhance the comparison among the testing techniques and tools, Involve realistic systems, environments and subjects (and not toy-programs and students as is the case in most current work). Do the studies thoroughly to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied, and also to ensure that different studies can be compared. FOR THIS WE NEED: •  A general methodological evaluation framework that can simplify the design of case studies for comparing software testing techniques and make the results more precise, reliable, and easy to compare. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 23. The  idea   InstanDate     Primary  case  studies   General  Framework  for  EvaluaDng  TesDng   Techniques  and  Tools         Tes;ng  Techniques  and  Tools  (TT)   TT1   TT2   TTi   TTn   CS1   CS1   CS1   CS1   CS2   CS2   CS2   CS2   CSn   CSn   CSn   Body of Evidence CSn   Secondary  study   (EBSE)   Obtain  answers  to  general  ques;ons  about  adop;ng  different  so_ware   tes;ng  techniques  and  or  tools.   The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 24. Very  brief…what  is  EBSE   Evidence  Based  So_ware  Engineering   •  The   essence   of   the   evidence-­‐based   paradigm   is   that   of   systema;cally   collec0ng   and   analysing   all   of   the   available   empirical   data   about   a   given   phenomenon   in   order   to   obtain   a   much   wider   and   more   complete   perspec;ve   than   would   be   obtained   from   an   individual   study,   not   least   because  each  study  takes  place  within  a  par;cular  context  and  involves  a   specific  set  of  par;cipants.   •  The   core   tool   of   the   evidence-­‐based   paradigm   is   the   Systema;c   Literature   Review  (SLR)   –  Secondary  study   –  Gathering  an  analysing  primary  stydies.     •  See:  hVp://www.dur.ac.uk/ebse/about.php   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 24
  • 25. The  idea   InstanDate     Primary  case  studies   General  Framework  for  EvaluaDng  TesDng   Techniques  and  Tools         Tes;ng  Techniques  and  Tools  (TT)   TT1   TT2   TTi   TTn   CS1   CS1   CS1   CS1   CS2   CS2   CS2   CS2   CSn   CSn   CSn   Body of Evidence CSn   Secondary  study   (EBSE)   Obtain  answers  to  general  ques;ons  about  adop;ng  different  so_ware   tes;ng  techniques  and  or  tools.   The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 26. Empirical  research  with  industry  =  difficult   •  Not “rocket science”-difficult The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 27. Empirical  research  with  industry  =  difficult   •  But “communication science”-difficult The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 28. “communica;on  science”-­‐difficult   Practitioner in a company Researcher Only takes 1 hour Not so long The FITTEST project is funded by the European Commission (FP7-ICT-257574) Not so long 5 minutes 28
  • 29. Communica;on  is  extremely  difficult   The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 30. The FITTEST project is funded by the European Commission (FP7-ICT-257574) 30
  • 31. Examples…..   Academia   Industry   •  Wants  to  empirically  evaluate  T   •  Wants  to  execute  and  use  T  to  see  what   happens.   •  We  use  intui;on!     •  You  want  me  to  know  all  that?   •  That  much  ;me!?   •  We  cannot  give  this  informa;on.   •  Not  ar;ficial  ones,  we  really  need  to   know  if  this  would  work  for  real  faults.   •  We  can  assign  1  person.   •  That  is  confiden;al.   •  Oh..,  I  thought  you  did  not  need  that.   •  What  techniques/tools  can  we   compare  with?   •  Why  don´t  you  know  that?   •  That  does  not  take  so  much  ;me!   •  Finding  real  faults  would  be  great!   •  Can  we  then  inject  faults?   •  How  many  people  can  use  it?   •  Is  there  historical    data?   •  But  you  do  have  that  informa;on?   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 31
  • 32. With  the  objec;ve  to  improve  and   reduce  some  barriers   •  •  •  •  Use  a  general  methodological  framework.   To  use  as  a  vehicle  of  communica;on   To  simplify  the  design   To  make  sure  that  studies  can  be  compared  and  aggregated   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 32
  • 33. Exis;ng  work     •  By Lott & Rombach, Eldh, Basili, Do et al, Kitchenham et al •  Describe organizational frameworks, i.e.: –  General steps –  Warnings when designing –  Confounding factors that should be minimized •  We pretended to define a methodological framework that defined how to evaluate software testing techiques, i.e.: –  The research questions that can be posed –  The variables that can be measured –  Etc. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 34. The  methodological  framework   •  Imagine a company C wants to evaluate T to see whether it is useful and worthwhile to incorporate in its company. •  Components of the Framework (each case study will be an instantiation) –  Objectives: effectiveness, efficiency, satisfaction –  Cases or treatments (= the testing techniques/tools) –  The subjects (= practitioners that will do the study) –  The objects or pilot projects: selection criteria. –  The variables and metrics: which data to collect? –  Protocol that defines how to execute and collect data. –  How to analyse the data –  Threats to validity –  Toolbox The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 35. Defini;on  of  the  framework  –  research questions •  RQ1: How does T contribute to the effectiveness of testing when it is being used in real testing environments of C and compared to the current practices of C? •  RQ2: How does T contribute to the efficiency of testing when it is being used in real testing environments of C and compared to the current practices of C? •  RQ3: How satisfied (subjective satisfaction) are testing practitioners of C during the learning, installing, configuring and usage of T when used in real testing environments? The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 36. Defini;on  of  the  framework  –  the cases •  Reused a taxonomy from Vegas and Basili adapted to software testing tools and augmented with results from Tonella •  The case or the testing technique or tool should be described by: –  Prerequisites: type, life-cycle, environment (platform and languages), scalability, input, knowledge needed, experience needed. –  Results: output, completeness, effectiveness, defect types, number of generated test cases. –  Operation: Interaction modes, user guidance, maturity, etc. –  Obtaining the tool: License, cost, support. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 37. Defini;on  of  the  framework  –  subjects •  Workers of C that normally use the techniques and tools with which T is being compared. The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 38. Defini;on  of  the  framework  – the objects/pilot projects •  In order to see how we can compare and what type of study we will do, we need answers to the following questions: A.  Will  we  have  access  to  a  system  with  known  faults?  What  informa;on  is  present   about  these  faults?   B.  Are  we  allowed/able  to  inject  faults  into  the  system?   C.  Does  your  company  gather  data  from  projects  as  standard  prac;ce?  What  data  is   this?  Can  this  data  be  made  available  for  comparison?  Do  you  have  a  company   baseline?  Do  we  have  access  to  a  sister  project?   D.  Does  company  C  have  enough  ;me  and  resources  to  execute  various  rounds  of   tests?,  or  more  concrete:   •  Is  company  C  willing  to  make  a  new  testsuite  TSna  with  some  technique/tool   Ta  already  used  in  the  company  C?   •  Is  company  C  is  willing  to  make  a  new  testsuite  TSnn  with  some  technique/ tool  Tn  that  is  also  new  to  company  C?   •  Can  we  use  an  exis;ng  testsuite  TSe  that  we  can  use  to  compare?  (Do  we   know  the  techniques  that  were  used  to  create  that  test  suite,  and  how  much   ;me  it  took?)   The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 39.   Defini;on  of  the  framework  -­‐  Protocol   Can we inject faults? YES NO Training on T Inject faults do T to make TST; collect data Can we compare with an existing test suite from company C (i.e. TSC) NO Can company C make another test suite TS N for comparison using Tknown or Tunknown NO NO NO Do we have a version with known faults? Do we have a company baseline? YES YES Do we have a version with known faults? NO YES 3 Do we have a version with known faults? YES 2 6 The FITTEST project is funded by the European Commission (FP7-ICT-257574) 7 YES 4 do Tknown o Tunknown to make TSN ; collect data NO 1 YES 5
  • 40.   Defini;on  of  the  framework  -­‐  Scenarios   •  •  •  •  •  •  •  •  Remember: o  Does company C have enough time and resources to execute various rounds of tests?, or more concrete: •  Is company C willing to make a new testsuite TSna with some technique/tool Tal already used in the company C? •  Is company C is willing to make a new testsuite TSnn with some technique/tool Tn that is also new to company C? •  Can we use an existing testsuite TSe that we can use to compare? (Do we know the techniques that were used to create that test suite, and how much time it took?) Scenario 1 (qualitative assessment only) (Qualitative Effects Analysis) Scenario 2 (Scenario 1 / quantitative analysis based on company baseline) Scenario 3 ((Scenario 1 / Scenario 2) / quantitative analysis of FDR) Scenario 4 ((Scenario 1 / Scenario 2) / quantitative comparison of T and TSe) Scenario 5 (Scenario 4 / FDR of T and TSe) Scenario 6 ((Scenario 1 / Scenario 2) / quantitative comparison of T and (Ta or Tn)) Scenario 7 (Scenario 6 / FDR of T and (Ta or Tn)) The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 41. If  we  can  inject  faults,  take  care  that   1.  The  ar;ficially  seeded  faults  are  similar  to  real  faults  that   naturally  occur  in  real  programs  due  to  mistakes  made  by   developers.   –  To  iden;fy  realis;c  fault  types,  a  history-­‐based  approach  can  be  used,   i.e.  “real”  faults  can  be  fetched  from  the  bug  tracking  system  and   made  sure  that  these  reported  faults  are  an  excellent  representa;ve   of  faults  that  are  introduced  by  developers  during  implementa;on.     2.  The  faults  should  be  injected  in  code  that  is  covered  by  an   adequate  number  of  test  cases   –   e.g.,  they  may  be  seeded  in  code  that  is  executed  by  more  than  20   percent  and  less  than  80  percent  of  the  test  cases.   3.  The  faults  should  be  injected  “fairly,”  i.e.,  an  adequate   number  of  instances  of  each  fault  type  is  seeded.   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 41
  • 42. Defini;on  of  the  framework  –  data to collect •  Effectiveness –  –  –  –  –  –  Number of test cases designed or generated. How many invalid test cases are generated. How many repeated test cases are generated Number of failures observed. Number of faults found. Number of false positives (The test is marked as Failed, when the functionality is working). –  Number of false negatives (The test is marked as Passed, when the functionality is not working). –  Type and cause of the faults that were found. –  Estimation (or when possible measured) of coverage reached. •  Efficiency •  Subjective satisfaction The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 43. Defini;on  of  the  framework  –  data to collect •  Effectiveness •  Efficiency –  Time needed to learn the testing method. –  Time needed to design or generate the test cases. –  Time needed to set up the testing infrastructure (install, configure, develop test drivers, etc.) (quantitative). (Note: if software is to be developed or other mayor configuration/ installation efforts, it might be a good idea to maintain working diaries). –  Time needed to test the system and observe the failure (i.e. planning, implementation and execution) in hours (quantitative). –  Time needed to identify the fault type and cause for each observed failure (i.e. time to isolate) (quantitative). •  Subjective satisfaction The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 44. Defini;on  of  the  framework  –  data to collect •  Effectiveness •  Efficiency •  Subjective satisfaction –  SUS score (10 question questionnaire with 5 likert-scale and a total score) –  5 reactions (through reaction cards) that will be used to create a word cloud and ven diagrams) –  Emotional face reactions during semi-structured interviews (faces will be evaluated on a 5 likert-scale from “not at all like this” to “very much like this”). –  Subjective opinions about the tool. A mixture of methods for evaluating subjective satisfaction The FITTEST project is funded by the European Commission (FP7-ICT-257574)
  • 45. The FITTEST project is funded by the European Commission (FP7-ICT-257574) ! www.usability.serco.com/trump/documents/Suschapt.doc !!!!!!!!!!!!!!!!!!!!!!Strongly!agree! !!!!!!! © Digital Equipment Corporation, 1986 ! ! System  Usability  Scale  (SUS)   ! ! ! !!!!!!!!!!!!!!!!!!!!!!!!!!!Strongly!disagree! ! ! ! ! ! ! ! ! 1.!I!think!that!I!would!like!to! ! !!!use!this!system!frequently! ! ! ! ! 2.!I!found!the!system!unnecessarily! !!!complex! ! ! 3.!I!thought!the!system!was!easy! !!!to!use!!!!!!!!!!!!!!!!!!!!!!! ! ! 4.!I!think!that!I!would!need!the! !!!support!of!a!technical!person!to! !!!be!able!to!use!this!system! ! ! 5.!I!found!the!various!functions!in! !!!this!system!were!well!integrated! ! ! ! ! ! 6.!I!thought!there!was!too!much! !!!inconsistency!in!this!system! ! ! ! ! ! 7.!I!would!imagine!that!most!people! !!!would!learn!to!use!this!system! !!!very!quickly! ! ! ! ! 8.!I!found!the!system!very! !!!cumbersome!to!use! ! ! ! ! 9.!I!felt!very!confident!using!the! !!!system! ! ! 10.!I!needed!to!learn!a!lot!of! !!!things!before!I!could!get!going! !!!with!this!system!! ! ! ! ! 45
  • 46. Why  SUS   •  studies  (e.g.  [TS04,  BKM08])  have  shown  that  this  simple   ques;onnaire  gives  most  reliable  results.   •  SUS  is  technology  agnos;c,  making  it  flexible  enough  to  assess  a   wide  range  of  interface  technologies.     •  The  survey  is  rela;vely  quick  and  easy  to  use  by  both  study   par;cipants  and  administrators.   •  The  survey  provides  a  single  score  on  a  scale  that  is  easily   understood  by  the  wide  range  of  people  (from  project  managers   to  computer  programmers)  who  are  typically  involved  in  the   development  of  products  and  services  and  who  may  have  liVle  or   no  experience  in  human  factors  and  usability.   •  The  survey  is  nonproprietary,  making  it  a  cost  effec;ve  tool  as   well.   The FITTEST project is funded by the European Commission (FP7-ICT-257574) 46
• 47. SUS scoring
• SUS yields a single number representing a composite measure of the overall usability of the system being studied. Note that scores for individual items are not meaningful on their own.
• To calculate the SUS score, first sum the score contributions from each item.
– Each item's score contribution will range from 0 to 4.
– For items 1, 3, 5, 7 and 9 the score contribution is the scale position minus 1.
– For items 2, 4, 6, 8 and 10, the contribution is 5 minus the scale position.
– Multiply the sum of the scores by 2.5 to obtain the overall SUS value.
• SUS scores have a range of 0 to 100.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
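Since the scoring rule is purely mechanical, a short sketch makes it concrete (plain Python; the function name and input format are our own):

```python
def sus_score(responses):
    """Compute the SUS score from ten 1-5 Likert responses (item 1 first).

    Odd items (1,3,5,7,9): contribution = scale position - 1.
    Even items (2,4,6,8,10): contribution = 5 - scale position.
    The summed contributions (0-40) are multiplied by 2.5 -> 0-100.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based index: even index = odd item
        for i, r in enumerate(responses)
    ]
    return 2.5 * sum(contributions)

# A respondent answering 3 everywhere lands exactly in the middle:
print(sus_score([3] * 10))  # -> 50.0
```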
• 48. SUS is not enough…
• In a literature review of 180 published usability studies, Hornbæk [Horn06] concludes that measures of satisfaction should be extended beyond questionnaires.
• So we add more…
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 49. Reaction cards
Developed by and © 2002 Microsoft Corporation. All rights reserved.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 50. Emotional face reactions
• The idea is to elicit feedback about the product, particularly emotions that arose for the participants while talking about the product (e.g. frustration, happiness).
• We will videotape the users while they respond to the following two questions during a semi-structured interview:
• Would you recommend this tool to other colleagues?
– If not, why not?
– If yes, what arguments would you use?
• Do you think you can persuade your management to invest in a tool like this?
– If not, why not?
– If yes, what arguments would you use?
(The taped faces are rated on a 7-point scale from “not at all like this” (1) to “very much like this” (7).)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 51. Analysing and interpreting the data
• Depends on the amount of data we have.
• If we only have 1 value for each variable, no analysis techniques are available and we just present and interpret the data.
• If we have sets of values for a variable, then we need to use statistical methods:
– Descriptive statistics
– Statistical tests (or significance testing). In statistics a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability: the significance level.
• When evaluating SBST tools we will always have sets of values for most of the variables, to deal with randomness.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 52. Descriptive statistics
• Mean, median, mode, standard deviation, frequency, correlation, etc.
• Graphical visualisation: scatter plot, box plot, histogram, pie charts
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 53. Statistical tests
• Most important is finding the right method.
(The slide shows a decision chart for choosing the right statistical test; source: http://science.leidenuniv.nl/index.php/ibl/pep/people/Tom_de_Jong/Teaching)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
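As an illustration of what such a test looks like for SBST results, where repeated runs give sets of values, here is a sketch; the sample data is invented, and the choice of the non-parametric Mann-Whitney U test is ours, not prescribed by the slides:

```python
# A minimal sketch: descriptive statistics plus a significance test when
# comparing two testing tools across repeated runs. Data is made up.
from statistics import mean, median, stdev
from scipy.stats import mannwhitneyu

# Hypothetical branch-coverage percentages from 10 runs of each tool.
coverage_tool_a = [61, 64, 59, 66, 63, 60, 65, 62, 64, 61]
coverage_tool_b = [68, 71, 66, 73, 69, 70, 72, 67, 70, 71]

# Descriptive statistics first...
for name, data in [("A", coverage_tool_a), ("B", coverage_tool_b)]:
    print(name, mean(data), median(data), round(stdev(data), 2))

# ...then a non-parametric test (no normality assumption, which suits the
# randomness of search-based tools). The significance level is fixed in advance.
stat, p = mannwhitneyu(coverage_tool_a, coverage_tool_b, alternative="two-sided")
print(f"U={stat}, p={p:.4f}, significant at 0.05: {p < 0.05}")
```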
• 54. Definition of the framework – threats
• Threats to validity (confounding factors) have to be minimized.
• These are the effects or situations that might jeopardize the validity of your results…
• Those that cannot be prevented have to be reported.
• When working with people we have to consider many sociological effects.
• Let us just look at a couple of well-known ones to give you an idea:
– The learning curve effect
– The Hawthorne effect
– The placebo effect
– Etc.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 55. The learning curve effect
• When using new methods/tools, people gain familiarity with their application over time (= learning curve).
– Initially they are likely to use them less effectively than they will after a period of familiarisation/learning.
• Thus the learning curve effect will tend to counteract any positive effects inherent in the new method/tool.
• In the context of evaluating methods/tools there are two basic strategies to minimise the learning curve effect (they are not mutually exclusive):
1. Provide appropriate training before undertaking an evaluation exercise;
2. Separate pilot projects aimed at gaining experience with a method/tool from pilot projects that are part of an evaluation exercise.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 56. The Hawthorne effect
• When an evaluation is performed, staff working on the pilot project(s) may have the perception that they are working under more management scrutiny than normal and may therefore work more conscientiously.
• The name comes from the Hawthorne Works factory, where productivity rose regardless of how the experimenters changed the working conditions (lights low or high?).
• The Hawthorne effect would tend to exaggerate positive effects inherent in a new method/tool.
• A strategy to minimise the Hawthorne effect is to ensure that a similar level of management scrutiny is applied to the control projects in your case study (i.e. the project(s) using the current method/tool) as is applied to the projects that are using the new method/tool.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 57. The placebo effect
• In medical research, patients who are deliberately given ineffectual treatments may recover if they believe that the treatment will cure them.
• Likewise, a software engineer who believes that adopting some practice (e.g. wearing a pink t-shirt) will improve the reliability of his code may succeed in producing more reliable code.
• Such a result could not be generalised.
• In medicine, the placebo effect is minimized by not informing the subjects.
• This cannot be done in the context of testing tool evaluations.
• When evaluating methods and tools the best you can do is to:
– Assign staff to pilot projects using your normal project staffing methods, and hope that the actual selection of staff includes the normal mix of enthusiasts, cynics and no-hopers that usually make up your project teams.
– Make a special effort to avoid staffing pilot projects with staff who have a vested interest in the method/tool (e.g. staff who developed or championed it) or a vested interest in seeing it fail (e.g. staff who really hate change rather than just resent it).
• This is a bit like selecting a jury. Initially the selection of potential jurors is random, but there is additional screening to avoid jurors with identifiable bias.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 58. Definition of the framework – threats
• There are many more factors that all need to be identified.
• Not only the people but also the technology:
– Did the measurement tools (e.g. for coverage) really measure what we thought?
– Are the injected faults really representative?
– Were the faults injected fairly?
– Are the pilot project and its software representative?
– Were the oracles that were used reliable?
– Etc.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 59. Definition of the framework – toolbox
• Toolbox:
– Demographic questionnaire: answered by the testers before performing the test. It aims to capture the characteristics of the testers: level of experience in using the tool, years on the job, knowledge of similar tools, …
– Satisfaction questionnaire (SUS): used to extract the testers’ satisfaction when they perform the evaluation.
– Reaction cards.
– Questions for (taped) semi-structured interviews.
– A process for investigating the face reactions in the videos.
– A fault taxonomy to classify the faults that are found.
– A software testing technique and tool classification/taxonomy.
– Working diaries.
– A fault template to describe each fault detected by the tests. The template contains the following information:
• Time spent to detect the fault
• Test case that found the fault
• Cause of the fault: mistake in the implementation, mistake in the design, mistake in the analysis
• Manifestation of the fault in the code
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 60. Applying or instantiating the framework
• Instances so far:
– Search-based structural testing tool [Vos et al. 2012]
– Search-based functional testing tool [Vos et al. 2013]
– Web testing techniques for AJAX applications [MRT08]
– Commercial combinatorial testing tool at Sulake (to be presented at the ISSTA workshop next week)
– Automated test case generation at IBM (has been sent to ESEM)
• We have finished other instances:
– Commercial combinatorial testing tool at SOFTEAM (has been sent to ESEM)
• Currently we are working on more instantiations:
– Regression testing prioritization technique
– Continuous testing tool at SOFTEAM
– Rogue User Testing at SOFTEAM
– …
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 61. Combinatorial testing – example case study Sulake: context
• Sulake is a Finnish company
• Develops social entertainment games
• Main product: Habbo Hotel
– World’s largest virtual community for teenagers
– Millions of teenagers a week, all over the world (ages 13–18)
– Accessed directly through the browser or via Facebook
– Available in 11 languages
– 218,000,000 registered users
– 11,000,000 visitors / month
• The system can be accessed through a wide variety of browsers and Flash players (and their versions) that run on different operating systems!
• Which combinations to use when testing the system?!
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 62. Can a tool help?
• What about CTE XL Professional (CTE for short)?
• Classification Tree Editor:
– Model your combinatorial problem as a tree
– Indicate the priorities of the combinations
– The tool automatically generates the best test cases!
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
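To see why such a tool pays off, consider a sketch of greedy pairwise selection over a Sulake-like configuration space; this is a generic illustration, not the CTE's actual algorithm, and the parameter values are invented:

```python
# A minimal sketch of greedy pairwise (all-pairs) test selection for a
# browser/OS/Flash matrix. Parameter names and values are made up; the CTE
# uses its own tree-based generation, not this algorithm.
from itertools import combinations, product

parameters = {
    "browser": ["Firefox", "Chrome", "IE", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "flash": ["10.3", "11.0", "11.2"],
}
names = list(parameters)

def pairs_of(test):
    """All parameter-value pairs covered by one concrete test case."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

# Every value pair that an all-pairs suite must cover.
uncovered = set()
for a, b in combinations(names, 2):
    for va, vb in product(parameters[a], parameters[b]):
        uncovered.add(((a, va), (b, vb)))

suite = []
while uncovered:
    # Greedily pick the full combination covering the most uncovered pairs.
    best = max(
        (dict(zip(names, values)) for values in product(*parameters.values())),
        key=lambda t: len(pairs_of(t) & uncovered),
    )
    suite.append(best)
    uncovered -= pairs_of(best)

print(len(suite), "pairwise test cases instead of", 4 * 3 * 3, "exhaustive ones")
```

Even on this toy matrix the pairwise suite is roughly a third of the exhaustive one; on Sulake's real browser/Flash/OS space the reduction is what makes structured combinatorial testing feasible at all.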
• 63. Combinatorial testing – example case study Sulake: research questions
• What do we want to find out?
• Research questions:
– RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
– RQ2: How much effort would be required to introduce the CTE into the testing processes currently in place at Sulake?
– RQ3: How much effort would be required to add the generated test cases to the testing infrastructure currently used at Sulake?
– RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 64. The testing tools evaluated (the cases or treatments)
• Current combinatorial testing practice at Sulake:
– Exploratory testing with feature coverage as the objective
– Combinatorial aspects are taken into account based on real user information (i.e. browsers, OS, Flash, etc.)
COMPARED TO
• Classification Tree Editor (CTE):
– Classify the combinatorial aspects as a classification tree
– Generate prioritized test cases and select
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 65. Who is doing the study (the subjects)
• Subjects: 1 senior tester from Sulake (6 years software development, 8 years testing experience)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 66. Systems under test (objects or pilot projects)
• Objects:
– 2 nightly builds of Habbo
– The existing test suite that Sulake uses (TSsulake), with 42 automated test cases
– No known faults, no injection of faults
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 67. The protocol (scenario 4)
(The slide shows the protocol decision chart: after training on technique T, yes/no questions – Can we inject faults? Do we have a version with known faults? Do we have a company baseline? Can we compare with an existing test suite from company C (i.e. TSC)? Can company C make another test suite TSN for comparison, using Tknown or Tunknown? – lead to one of seven numbered scenarios. The Sulake study follows scenario 4: no fault injection, no version with known faults, and comparison against the existing company test suite.)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 68. Collected data
Variable | TSsulake | TSCTE
Measuring effectiveness: | |
Number of test cases | 42 | 68 (selected 42 high priority)
Number of invalid test cases | 0 | 0
Number of repeated test cases | 0 | 26
Number of failures observed | 0 | 12
Number of faults found | 0 | 2
Type and cause of the faults | N/A | 1. critical, browser hang; 2. minor, broken UI element
Feature coverage reached | 100% | 40%
All-pairs coverage reached | N/A | 80%
Measuring efficiency: | |
Time needed to learn the CTE testing method | N/A | 116 min
Time needed to design and generate the test suite with the CTE | N/A | 95 min (62 for tree, 33 for removing duplicates)
Time needed to set up testing infrastructure specific to the CTE | N/A | 74 min
Time needed to automate the test suite generated by the CTE | N/A | 357 min
Time needed to execute the test suite | 114 min | 183 min (both builds)
Time needed to identify fault types and causes | 0 min | 116 min
Measuring subjective satisfaction: | |
SUS | N/A (informal interview) | 50
Reaction cards | N/A | Comprehensive, Dated, Old, Sterile, Unattractive
Face reactions | N/A | video
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 69. Descriptive statistics for efficiency (1)
(charts)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 70. Descriptive statistics for efficiency (2)
(charts)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 71. Emotional face reactions
(The slide shows four video stills with the themes that emerged: 1. appearance of the tool; 2. readability of the CTE trees; 3. technical support and user manuals; 4. duplicate test cases.)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 72. Combinatorial testing – example case study Sulake: conclusions
• RQ1: Compared to the current test suites used for testing at Sulake, can the test cases generated by the CTE contribute to the effectiveness of testing when it is used in real testing environments at Sulake?
– 2 new faults were found!
– The need for more structured combinatorial testing was confirmed by Sulake.
• RQ2: How much effort would be required to introduce the CTE into the testing processes currently in place at Sulake?
– The effort for learning and installing is medium, but can be justified within Sulake since it is needed only once.
– Designing and generating test cases suffers from duplicates that cost time to remove. The total effort is acceptable within Sulake because critical faults were discovered.
– Executing the test suite generated by the CTE takes 1 hour more than the Sulake test suite (due to more combinatorial aspects being tested). This means that Sulake cannot include these tests in the daily build, but will have to add them to the nightly builds only.
• RQ3: How much effort would be required to add the generated test cases to the testing infrastructure currently used at Sulake?
– Effort is needed for automating the generated test cases, but it can be justified within Sulake.
• RQ4: How satisfied are Sulake testing practitioners during the learning, installing, configuring and usage of the CTE when it is used in a real testing environment?
– The tool seems to have everything that is needed, but looks unattractive.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 73. Automated test case generation – example case study IBM: context
• IBM Research Labs in Haifa, Israel
• Develops a system (denoted IMP ;-) for resource management in a networked environment (servers, virtual machines, switches, storage devices, etc.)
• The IBM lab has a designated team that is responsible for testing new versions of the product.
• The testing is done within a simulated testing environment developed by this testing team.
• The IBM lab is interested in evaluating the automated test case generation tools developed in the FITTEST project!
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 74. Automated test case generation – example case study IBM: the research questions
• What do we want to find out?
• Research questions:
– RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
– RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
– RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 75. The testing tools evaluated (the cases or treatments)
• Current test case design practice at IBM:
– Exploratory test case design
– The objective of the test cases is to maximise the coverage of the system use-cases
COMPARED TO
• FITTEST automated test case generation tools:
– Only part of the whole continuous testing approach of FITTEST
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 76. FITTEST automated test case generation
• Logs2FSM:
– Infers FSM models from the logs.
– Applies an event-based model inference approach (read more in [MTR08]).
– The model-based oracles that also result from this tool refer to the use of the paths generated from the inferred FSM as oracles. If these paths, when transformed to test cases, cannot be fully executed, the tester needs to inspect the failing paths to see whether that is due to faults, or whether the paths themselves are infeasible.
• FSM2Tests:
– Takes the FSMs and a Domain Input Specification (DIS) file, created by a tester for the IBM Research SUT, and generates concrete test cases.
– This component implements a technique that combines model-based and combinatorial testing (see [NMT12]):
1. generate test paths from the FSM (using various simple and advanced graph visit algorithms);
2. transform these paths into classification trees in the CTE XL format, enriched with the DIS, such as data types and partitions;
3. generate test combinations from those trees using combinatorial criteria.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
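A deliberately simplified sketch of the two steps this pipeline describes may help; the log format, the crude "state = last event" abstraction and all names here are our own inventions, and the real tools ([MTR08], [NMT12]) use far richer state information and path criteria:

```python
# A minimal sketch of (1) inferring an FSM from event logs and (2) deriving
# abstract test paths from it via a graph visit. Everything here is an
# illustration, not the Logs2FSM/FSM2Tests implementation.
from collections import defaultdict

# Each log is a sequence of events, e.g. REST calls observed on the SUT.
logs = [
    ["GET /virtualAppliances", "GET /virtualAppliances/{id}",
     "GET /virtualAppliances/{id}/targets"],
    ["GET /virtualAppliances", "GET /virtualAppliances/{id}",
     "GET /virtualAppliances/{id}/progress"],
]

# 1. Event-based inference: here a state is simply named after the last
#    event seen (a crude abstraction chosen to keep the sketch short).
START = "<start>"
fsm = defaultdict(set)                 # state -> {(event, next_state)}
for trace in logs:
    state = START
    for event in trace:
        fsm[state].add((event, event))
        state = event

# 2. Simple graph visit: walk unvisited transitions depth-first until every
#    transition is covered; each maximal walk becomes one abstract test path.
def transition_cover(fsm, start=START):
    paths, visited = [], set()
    def dfs(state, path):
        extended = False
        for event, nxt in sorted(fsm.get(state, ())):
            if (state, event) not in visited:
                visited.add((state, event))
                dfs(nxt, path + [event])
                extended = True
        if not extended and path:
            paths.append(path)
        return paths
    return dfs(start, [])

for p in transition_cover(fsm):
    print(" -> ".join(p))
```

Each printed path would then be enriched with the DIS (data types, partitions) and expanded into concrete test combinations, which is the part the sketch leaves out.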
• 77. Who is doing the study (the subjects)
• Subjects:
– 1 senior tester from IBM (10 years software development, 5 years testing experience, of which 4 years with the IMP system)
– 1 researcher from FBK (10 years of experience with software development, 5 years of experience with research in testing)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 78. Pilot project
• Objects:
– SUT: a distributed application for managing system resources in a networked environment
• A management server that communicates with multiple managed clients (= physical or virtual resources)
• An important product for IBM, with real customers
• The case study will be performed on a new version of this system
– The existing test suite that IBM uses (TSibm)
• Selected from what they call System Validation Tests (SVT) -> tests for high-level, complex customer use-cases
• Manually designed
• Automatically executed through activation scripts
– 10 representative faults to inject into the system
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 79. The protocol (scenario 5)
(The slide shows the same protocol decision chart as before: after training on technique T, yes/no questions – Can we inject faults? Do we have a version with known faults? Do we have a company baseline? Can we compare with an existing test suite from company C (i.e. TSC)? Can company C make another test suite TSN for comparison, using Tknown or Tunknown? – lead to one of seven numbered scenarios. The IBM study follows scenario 5: faults are injected, and the resulting suites are compared against the existing company test suite TSibm.)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 80. More detailed steps
1. Configure the simulated environment and create the logs [IBM]
2. Select test suite TSibm [IBM]
3. Write activation scripts for each test case in TSibm [IBM]
4. Generate TSfittest [FBK subject]
a. Instantiate the FITTEST components for the IMP
b. Generate the FSM with Logs2FSM
c. Define the Domain Input Specification (DIS)
d. Generate the concrete test data with FSM2Tests
5. Select and inject the faults [IBM]
6. Develop a tool that transforms the concrete test cases generated by the FITTEST tool FSM2Tests into an executable format [IBM]
7. Execute TSibm [IBM]
8. Execute TSfittest [IBM]
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 81. Collected measures
Table: descriptive measures for the test suites TSibm and TSfittest.
Variable | TSibm | TSfittest
Size: number of abstract test cases | N/A | 84
Size: number of concrete test cases | 4 | 3054
Size: number of commands (or events) | 1814 | 18520
Construction | manual (cf. Section III-B1) | automated (cf. Section III-B2)
Effort: design of the test cases | 5 hours | –
Effort: design activation scripts | 4 hours | –
Effort: set up FITTEST tools | – | 8 hours
Effort: generate the FSM | – | automated, less than 1 second CPU time
Effort: specify the DIS | – | 2 hours
Effort: generate concrete tests | – | automated, less than 1 minute CPU time
Effort: transform into executable format | – | 20 hours
Table: descriptive measures for the FSM that is used to generate TSfittest.
Number of traces used to infer the FSM | 6
Average trace length | 100
Number of nodes in the generated FSM | 51
Number of transitions in the generated FSM | 134
(The slide also shows a fragment of the inferred FSM, whose transitions are REST calls such as GET /VMControl/virtualAppliances, GET /VMControl/virtualAppliances/{id}, GET /VMControl/virtualAppliances/{id}/progress and GET /VMControl/virtualAppliances/{id}/targets.)
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 82. Collected measures
Injected fault detection (1 = detected):
| IF1 | IF2 | IF3 | IF4 | IF5 | IF6 | IF7 | IF8 | IF9 | IF10
TS_ibm | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1
TS_fittest | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1
Table III: efficiency measures for the execution of both test suites.
Variable | TSibm | normalized by size | TSfittest | normalized by size
Execution time with fault injection | 36.75 | 9.18 | 127.87 | 1.52
Execution time without fault injection | 27.97 | 6.99 | 50.72 | 0.60
As can be seen from Table IV, TSibm is substantially smaller in size than TSfittest in all parameters; this is one of the evident results of automation. However, not only is TSfittest bigger, its effectiveness, measured by injected-fault coverage, is also significantly higher (50% vs 70%). Moreover, if we combine the TSibm and TSfittest suites, the effectiveness increases to 80%. Therefore, within the context of the studied environment, the FITTEST technologies can contribute to the effectiveness of testing for IBM Research, and IBM Research has decided that, to optimize fault-finding capability, the two techniques are best combined.
The time to execute TSibm is smaller than the time to execute TSfittest; this is due to the larger number of concrete tests in TSfittest. Normalized by size, however, TSfittest executes faster.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
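The effectiveness figures follow directly from the fault matrix, as a small sketch shows (the matrix values are the slide's own data):

```python
# Recomputing the slide's effectiveness numbers from the injected-fault
# matrix (1 = the suite detected that injected fault, IF1..IF10).
ts_ibm     = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1]
ts_fittest = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]

def detection_rate(hits):
    """Percentage of injected faults detected by a suite."""
    return 100 * sum(hits) / len(hits)

# A fault counts as found by the combined suite if either suite finds it.
combined = [a or b for a, b in zip(ts_ibm, ts_fittest)]

print(detection_rate(ts_ibm), detection_rate(ts_fittest),
      detection_rate(combined))  # -> 50.0 70.0 80.0
```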
• 83. Automated test case generation – example case study IBM: conclusions
• RQ1: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the effectiveness of testing when used in real testing environments at IBM?
– TSibm finds 50% of the injected faults, TSfittest finds 70%.
– Together they find 80%! -> IBM will consider combining the techniques.
• RQ2: Compared to the current test suite used for testing at IBM Research, can the FITTEST tools contribute to the efficiency of testing when used in real testing environments at IBM?
– The FITTEST test cases execute faster because they are smaller.
– Shorter tests were good for IBM -> easier to identify faults.
• RQ3: How much effort would be required to deploy the FITTEST tools within the testing processes currently in place at IBM Research?
– Found reasonable by IBM, considering that the manual tasks need to be done only once and more faults were found.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 84. Threats to validity
• The learning curve effect (and basically all the other human factors ;-)
• Missing information in the logs leads to weak FSMs.
• An incomplete specification of the DIS leads to weak concrete test cases.
• The representativeness of the injected faults with respect to real faults.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 85. Final things…
• As researchers, we should concentrate on the future problems industry will face. No?
• How can we claim to know future needs without understanding current ones?
• Go to industry and evaluate your results!
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 86. Need any help? More information? Want to do an instantiation?
Contact:
Tanja E. J. Vos
email: tvos@pros.upv.es
skype: tanja_vos
web: http://tanvopol.webs.upv.es/
project: http://www.facebook.com/FITTESTproject
telephone: +34 690 917 971
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 89. References – descriptions of empirical studies done in software testing
[Vos et al. 2010] T. E. J. Vos, A. I. Baars, F. F. Lindlar, P. M. Kruse, A. Windisch, and J. Wegener, “Industrial scaled automated structural testing with the evolutionary testing tool,” in ICST, 2010, pp. 175–184.
[Vos et al. 2012] Tanja E. J. Vos, Arthur I. Baars, Felix F. Lindlar, Andreas Windisch, Benjamin Wilmes, Hamilton Gross, Peter M. Kruse, Joachim Wegener. Industrial case studies for evaluating search based structural testing. International Journal of Software Engineering and Knowledge Engineering, 22(8):1123–, 2012.
[Vos et al. 2013] T. E. J. Vos, F. Lindlar, B. Wilmes, A. Windisch, A. Baars, P. Kruse, H. Gross, and J. Wegener, “Evolutionary functional black-box testing in an industrial setting,” Software Quality Journal, 21(2):259–288, 2013.
[MRT08] A. Marchetto, F. Ricca, and P. Tonella, “A case study-based comparison of web testing techniques applied to AJAX web applications,” International Journal on Software Tools for Technology Transfer (STTT), vol. 10, pp. 477–492, 2008.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)
• 90. Other references
[TS04] T. S. Tullis and J. N. Stetson. A comparison of questionnaires for assessing website usability. In Proceedings of the Usability Professionals Association Conference, 2004.
[BKM08] Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24:574–594, 2008.
[Horn06] Hornbæk, K. (2006), “Current Practice in Measuring Usability: Challenges to Usability Studies and Research,” International Journal of Human-Computer Studies, Volume 64, Issue 2, February 2006, Pages 79–102.
[MTR08] Marchetto, Alessandro; Tonella, Paolo; and Ricca, Filippo. State-based testing of Ajax web applications. In Software Testing, Verification, and Validation, 2008 1st International Conference on, pp. 121–130, IEEE, 2008.
[NMT12] C. D. Nguyen, A. Marchetto, and P. Tonella. Combining model-based and combinatorial testing for effective test case generation. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, pages 100–110, ACM, 2012.
The FITTEST project is funded by the European Commission (FP7-ICT-257574)