Slides from the International Advanced School on Empirical Software Engineering 2015, held as part of the Empirical Software Engineering International Week in Beijing. The slides are posted with the permission of the main organiser Roel Wieringa.
5. 4. Methodology (the study of research methods)
a. Notion of conceptual framework; statements about them
b. Notion of generalization; statements about them
3. Theory (statement about many research results)
a. Conceptual framework
b. Generalization
2. Research questions (what, how, when, where, …, why) aimed at generalizable knowledge, research method, and research result
1. Practice domain: SW, methods, tools, processes (as is / to be)
(Figure annotations: looking at research from the sky; general knowledge is the gold we are after; hard work to grow knowledge; grass roots)
• Everything on the slides in this talk, except the examples, is at level 4.
• The examples on these slides contain explicit level indications.
• The separate example slides report on research that contains levels 2 and 3.
• The reported research studies some aspect of level 1.
6. Agenda
Time Topic
09:00 – 10:30 Opening and Introduction
10:30 – 11:00 Coffee break
11:00 – 12:30 Inferring Theories from Data
12:30 – 13:30 Lunch
13:30 – 15:00 Designing Research based on Theories
15:00 – 15:30 Coffee break
15:30 – 16:30 Hands-on Working Session and Q&A
16:30 – 17:00 Wrap up (all)
8. Scientific theories
• A theory is a belief that there is a pattern in phenomena
• A scientific theory is a theory that
– Has survived tests against experience
• Observation, measurement
• Possibly experiment, simulation, trials
– Has survived criticism by critical peers
• Anonymous peer review
• Publication
• Replication
9. Examples (level 3)
• Theory of cognitive dissonance
• Theory of electromagnetism
• The Balance theorem in social networks
• Theories X, Y, Z, and W of (project) management
• Technology Acceptance Model
• Hannay et al. “A Systematic Review of Theory Use in Software Engineering Experiments”. IEEE TSE 33(2), February 2007
• Lim et al. “Theories Used in Information Systems Research: Identifying Theory Networks in Leading IS Journals”. ICIS 2009, paper 91.
• Non-examples
– Speculations based on imagination rather than fact: Conspiracy theories
about who killed John Kennedy
– Opinions that cannot be refuted: The Dutch lost the World Championship
because they play like prima donnas
12. The structure of scientific theories
1. Conceptual framework
– Constructs used to express beliefs about patterns in phenomena
– E.g. the concepts of beamforming, of multi-agent planning, of data location compliance. (level 3)
2. Generalizations
– Statements in terms of these concepts that express beliefs about patterns in phenomena.
– E.g. the relation between angle of incidence and phase difference,
– a statement about delay reduction at airports. (level 3)
• Generalizations have a scope, a.k.a. the target of generalization
13. The structure of design theories
1. Conceptual framework
2. Generalizations
– Artifact specification X Context assumptions → Effects
– Effects satisfy a requirement to some extent
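As a rough illustration, the schema above can be written down as a small data structure; the sketch below is hypothetical, and all names and example values in it are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DesignTheory:
    """Hypothetical encoding of: Artifact specification X Context assumptions -> Effects."""
    artifact_spec: str              # what the designed artifact is specified to be/do
    context_assumptions: List[str]  # conditions assumed to hold where the artifact is used
    predicted_effects: List[str]    # effects expected when the spec and assumptions both hold
    requirement: str                # stakeholder requirement the effects should satisfy

    def summary(self) -> str:
        # "Effects satisfy a requirement to some extent": the extent is a judgement,
        # not something a data structure can decide, so we only state the claim.
        return (f"If {self.context_assumptions} hold and '{self.artifact_spec}' is used, "
                f"then {self.predicted_effects} are expected, which should (to some extent) "
                f"satisfy '{self.requirement}'.")

# Invented example instance, for illustration only.
theory = DesignTheory(
    artifact_spec="daily stand-up meeting format",
    context_assumptions=["co-located team", "fewer than 10 developers"],
    predicted_effects=["earlier detection of blocked tasks"],
    requirement="keep the project on schedule",
)
print(theory.summary())
```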
14. Two kinds of conceptual structures
1. Architectural structures: Class of systems, components with capabilities, interactions
– E.g. entities, (de)composition, taxonomies, cardinality, events, processes, procedures, constraints, … (level 4)
– Useful for case-based research (observational case studies, case experiments, simulations, technical action research)
– Typically qualitative
2. Statistical structures: Population, variables with probability distributions, relations among variables (see the sketch below)
– Useful for sample-based research (surveys, statistical difference-making experiments)
– Typically quantitative
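As an illustration of item 2 above, a minimal sketch of a statistical conceptual structure made concrete: a hypothetical population, two variables with distributions, and an assumed relation among them. All numbers and names are invented.

```python
import random

random.seed(1)

# Hypothetical statistical structure: a population of software projects with
# two variables (team_size, defect_density) and an assumed relation between them.
def sample_project():
    team_size = random.randint(2, 20)                # variable with a probability distribution
    noise = random.gauss(0, 0.5)
    defect_density = 1.0 + 0.1 * team_size + noise   # assumed relation among the variables
    return team_size, defect_density

# A sample drawn from the population, and one sample statistic.
projects = [sample_project() for _ in range(100)]
mean_density = sum(d for _, d in projects) / len(projects)
print(f"mean defect density in the sample: {mean_density:.2f}")
```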
15. • Prechelt: What is a theory, the structure of
theories
• Vriezekolk: The structure of theories
• Méndez: The structure of theories
17. Uses of a conceptual framework
• Framing a problem or artifact: choosing which concepts to
use
– Using the theory of infectious diseases to understand a patient’s symptoms
– Using concepts of force & energy to understand the behavior of a machine
– Using the concept of a coordination gatekeeper to understand a distributed SE project (all three examples at level 1)
• Describe a problem or specify an artifact: using the concepts
• Generalize about the problem or artifact
• Analyze a problem or artifact (i.e. analyze the framework)
18. Functions of generalizations
• Explanation: explain phenomena by identifying causes, mechanisms, or reasons
• Prediction: state what will happen in the future
• Design: use generalizations to justify a design choice
19. • Prechelt: the use of theories
• Vriezekolk: the use of theories
• Méndez: the use of theories
20. Usability of theories
• When is a design theory
Context assumptions X Artifact design → Effects
usable by a practitioner?
1. He/she is able to recognize the Context assumptions,
2. and to acquire/build the Artifact under the constraints of practice,
3. the Effects will indeed occur, and
4. he/she can observe this, and
5. they will contribute to stakeholder goals / satisfy requirements
• The practitioner has to assess the risk that each of these fails
21. • Prechelt: the usability of theories
• Vriezekolk: the usability of theories
• Méndez: the usability of theories
22. Agenda
Time Topic
09:00 – 10:30 Opening and Introduction
10:30 – 11:00 Coffee break
11:00 – 12:30 Inferring Theories from Data
12:30 – 13:30 Lunch
13:30 – 15:00 Designing Research based on Theories
15:00 – 15:30 Coffee break
15:30 – 16:30 Hands-on Working Session and Q&A
16:30 – 17:00 Wrap up (all)
25. • Architectural explanation must be the basis of the
analogic generalization;
• Otherwise, we engage in wishful/magical thinking
– You have observed that some small companies did not put a customer representative on site in an agile project;
– you explain this as a result of tight resources (level 3);
– you generalize by analogy that this will happen in (almost)
all small companies (level 3).
[Diagram: Data → (description) → Observations → (abduction, architectural) → Explanations → (analogy, architectural) → Generalizations]
26. Sample-based inference
• Descriptive inference: Describe sample statistics
• Statistical inference: Generalize to population parameters
• Abductive inference: Provide an explanation
• Analogic inference: Expand the scope of a theory based on similarity
[Diagram: Data → (description) → Observations → (statistical inference) → Generalizations; Observations → (abduction) → Explanations → (analogy) → Generalizations]
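A small numeric sketch of the first two inference steps, using invented data: descriptive inference summarizes the sample, and statistical inference generalizes to a population parameter (here via an approximate confidence interval). Abductive and analogic inference remain argumentative rather than computational steps.

```python
import math
import statistics

# Invented sample: defect-fix times (hours) observed in 12 projects.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.9, 5.6, 4.4, 5.0, 4.2, 5.8]

# Descriptive inference: describe sample statistics.
mean = statistics.mean(sample)
sd = statistics.stdev(sample)
print(f"sample mean = {mean:.2f}, sample sd = {sd:.2f}")

# Statistical inference: generalize to the population parameter.
# Approximate 95% confidence interval for the population mean,
# assuming normality; 2.20 is the t-value for 11 degrees of freedom.
half_width = 2.20 * sd / math.sqrt(len(sample))
print(f"95% CI for the population mean: [{mean - half_width:.2f}, {mean + half_width:.2f}]")

# Abductive inference (why is it like this?) and analogic inference
# (does it also hold for similar populations?) are argued, not computed.
```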
27. • Causal explanations can be supported by sample-based designs (treatment group / control group)
• Generalization from a population to similar populations must be based on architectural explanation
– In an experiment with a sample of students you observe a difference between treatment group and control group;
– by randomness you generalize to the population of students;
– your explanation: this difference is caused by the treatment (level 3);
– in turn explained by cognitive processes of students (level 3);
– generalized by analogy to novice software engineers (level 3).
[Diagram: Data → (description) → Observations → (statistical inference) → Generalizations; Observations → (abduction, causal & architectural) → Explanations → (analogy, architectural) → Generalizations]
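A minimal sketch, with invented scores, of the statistical step in such a treatment-group/control-group comparison. The causal explanation (the difference is caused by the treatment) and the architectural explanation needed for wider generalization are separate, argued steps.

```python
from math import sqrt
from statistics import mean, stdev

# Invented scores from a hypothetical student experiment.
treatment = [72, 68, 75, 80, 70, 77, 74, 69]
control   = [65, 62, 70, 66, 64, 68, 63, 67]

# Statistical inference: Welch's t statistic for the difference in group means.
n1, n2 = len(treatment), len(control)
m1, m2 = mean(treatment), mean(control)
standard_error = sqrt(stdev(treatment) ** 2 / n1 + stdev(control) ** 2 / n2)
t = (m1 - m2) / standard_error
print(f"mean difference = {m1 - m2:.1f}, Welch t = {t:.2f}")

# A large |t| supports generalizing the difference to the sampled population
# (here: students). Generalizing by analogy to novice software engineers
# additionally needs the architectural explanation (the students' cognitive
# processes) discussed on the slide.
```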
28. • Vriezekolk: Inferring theories from data
• Méndez: inferring theories from data
• Prechelt: Applying/inferring theories to/from
data
29. Agenda
Time Topic
09:00 – 10:30 Opening and Introduction
10:30 – 11:00 Coffee break
11:00 – 12:30 Inferring Theories from Data
12:30 – 13:30 Lunch
13:30 – 15:00 Designing Research based on Theories
15:00 – 15:30 Coffee break
15:30 – 16:30 Hands-on Working Session and Q&A
16:30 – 17:00 Wrap up (all)
31. The research setup
• In experiments we are interested in the effect of the
treatment on the OoS
– Requires the capability to apply treatment and control
• In observational studies we are interested in the structure and
dynamics of the OoS itself
– Only weak support for causality
[Diagram, research setup: population; sample of objects of study, each representing one or more population elements; treatment instruments; measurement instruments]
32. • Case-based designs
– provide architectural explanations
– generalize by architectural analogy
– Nondeterminism across cases is not quantified
• Sample-based designs
– Collect sample statistics
– Infer properties of a distribution over the population
– May be purely descriptive!
– Possibly a causal explanation
– To generalize further, an architectural explanation is needed too
– Nondeterminism within the population is quantified, but not across analogous populations
33. Field versus lab
• If a phenomenon cannot be (re)produced in the lab, it can only be investigated in the field
• Which of the following designs can be done in a lab?
– No treatment (observational study): observational case study (case-based inference); survey (sample-based inference)
– Treatment (experimental study): single-case mechanism experiment (e.g. simulation, test of an individual OoS) or technical action research (e.g. test with a client, pilot project) for case-based inference; statistical difference-making experiment (treatment group / control group designs) for sample-based inference
37. Hands-on Working Session
1. What is your research question?
2. Describe a research setup to answer it
3. What inferences do you plan to base on this setup?
Groups of 3
• 15:30 Each person first drafts a flipchart with his/her answers for own research
• 15:45 Each group member comments on the two flipcharts of others in his/her group, in particular on:
– Are the answers clear?
– Are the answers defensible?
• 16:30 Each person finalizes (for now) his/her flipchart
• 16:31 Paste to the wall. See what you can learn from other designs.
• 16:45 Plenary wrap-up
41. • International online survey of requirements engineering professionals’ opinions about causes and effects of RE problems
• Research questions
– RQ 1 What are the expectations on a good RE?
– RQ 2 How is RE defined, applied, and controlled?
– RQ 3 How is RE continuously improved?
– RQ 4 Which contemporary problems exist in RE, and what implications
do they have?
– RQ 5 Are there observable patterns of expectations, status quo, and
problems in RE?
• Observational research
47. • The conceptual structure of social mechanisms in
the previous two slides is architectural:
– Components
– Interactions
• Conceptual structure of the causal theories on
the next slides is statistical:
– Variables
– Distribution over population
51. Usability of theories
• The theory of 34 hypotheses is not intended to be used by professionals to improve their practice. Consider the theory “improving RE skills reduces requirements incompleteness”:
1. Professional is able to recognize the Context assumptions
– Yes: recognizable when there is requirements engineering
2. Capable of acquiring/building the Artifact under the constraints of practice
– That depends on the available budget (time, money) for RE training
3. The effects will indeed occur
– That depends on the training, and on other factors causing requirements incompleteness
4. He/she can observe this
– Hard to say whether requirements are more complete
5. They will contribute to stakeholder goals/satisfy requirements
– Hard to say whether RE completeness will contribute to stakeholder goals
52. Inferring theories from data
– Description
• Interpretation of the answers of the respondents
• Descriptive statistics
– Statistical inference
• No statistical inference
– Abductive inference
• The assumed explanation of the respondents’ answers is that they base them on experience
– Analogic inference
• Other professionals will answer similarly; but possibly different
across countries/cultures
58. • Theory 2, proposed by Prechelt and Pepper based
on the case study:
– R1: …
– …
– R5: There is no affordable method to assess the
reliability of the results of MSR in DICA
– R6: The reliability of MSR results in DICA is low
– R5 and R6 are the major reasons why MSR is not used
for DICA
• Artifact: MSR
• Context: organizations that develop web
applications for a long period of time, confuse defects with issues, and have no dedicated staff to maintain bug trackers (sect 8.1)
(Margin notes: descriptive generalizations; rational explanation of a phenomenon (= architectural explanation, where some components are actors that have goals and may have reasons for actions))
61. Usability of theories
1. Professional is able to recognize the Context assumptions
– yes
2. Capable of acquiring/building the Artifact under the constraints of practice
– Prechelt & Pepper: considerable effort in their case
3. The effects will indeed occur
– No evidence that reliable information about processes will be produced
4. He/she can observe this
– No: considerable uncertainty whether the effects have occurred
5. They will contribute to stakeholder goals/satisfy
requirements
– No evidence that process improvements will occur
62. Applying existing theories to data and
Inferring new or updated theories from data
• Description
– Case descriptions of every step
– Interpretation of every step in terms of R1 – R6
• Statistical inference
– Not possible from a case
– (but there is one inside this case to investigate the
relation between defect descriptions and issue
descriptions)
• Abductive inference
– Explanation of non-use in terms of R1 – R6
– Rational explanation in terms of reasons of actors
• Analogic inference
– Descriptions and explanation generalized by analogy
– Discussion of external validity
(Margin note: How did it happen? Existing theory 1 was assumed, and falsified. New theory 2 emerged from the data and from opinions of actors in the OoS. Or were the propositions R1–R6 specified before the case study was started?)
63. The research setup
[Research setup diagram, annotated for this case:]
• One complex object of study: Infopark and its software repositories
• Population it represents: other software development organizations and their repositories
• Treatment: the 4-step procedure listed in sect 2.3, performed by Pepper at Infopark
• Treatment instruments: MSR tools
• Measurement instruments: MSR tools providing data; Pepper’s work notes; Pepper’s memory (sect 8.3)
• Sources of evidence (p. 5): context information, raw data of the version archive and bugtracker, analysis steps taken and not taken, issues and arguments of those steps, data provided by MSR tools, Infopark’s interpretation of the outcomes of the steps
66. • Lab experiment to test reliability of a method,
RASTER, to assess risk of telecom availability
– Research question: How reliable is RASTER?
– Research setup: Six groups of three students each
had to estimate likelihood and impact of a list of
non-availability risks for an email service, using
the RASTER method
69. The use of theories
• “RASTER x Professionals → risk assessments”
– Frame a phenomenon: risk assessments are made by professionals
– Describe it: describe telco infrastructure architecture and its
vulnerabilities
– Specify a treatment: use RASTER to assess risks
– Analyze it: Trace risks to architecture components
– Generalize about it: claim that other professionals would find the
same risks of similar telco architectures
– Predict an effect: predict that this will happen in the next project
– Explain an effect: Explain assessments in terms of RASTER method and
ToA
70. Usability of theories
1. Professional is able to recognize the Context assumptions
– Yes
2. Capable of acquiring/building the Artifact under the constraints of practice
– RASTER requires relatively little training; RA is expensive, but not due to RASTER
3. The effects will indeed occur
– Has been shown in experiments and pilots
4. He/she can observe this
– Plain for all to see
5. They will contribute to stakeholder goals/satisfy requirements
– Goal is to obtain accurate and reliable assessments
71. Inferring theories from data
– Description
• Outcomes of the RAs on paper
• Krippendorff’s alpha to measure interrater agreement
• Outcome of exit questionnaires to assess sources of variability
– Statistical inference
• Sample non-random, and too small
– Abductive inference
• Observed variability explained by
1. lack of expert knowledge,
2. differences in assumptions,
3. difficulty of choosing between adjacent ordinal values for likelihood
– Analogic inference
• 1 and 2 absent/reduced in the field, so less variability there
• 3 motivates improvement of the method to reduce this phenomenon
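A sketch of how an interrater-agreement coefficient such as the one mentioned above can be computed: the nominal-scale variant of Krippendorff's alpha on invented ratings. The study itself may well have used an ordinal variant and, of course, different data.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(ratings_by_unit):
    """Nominal-scale Krippendorff's alpha.

    ratings_by_unit: one list of category labels per unit (e.g. per risk item),
    one label per rater. Units with fewer than two ratings are not pairable
    and are skipped.
    """
    coincidences = Counter()                    # coincidence matrix o_ck
    for unit in ratings_by_unit:
        m = len(unit)
        if m < 2:
            continue
        for c, k in permutations(unit, 2):      # all ordered pairs of ratings in a unit
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()                          # marginal totals n_c per category
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())                    # number of pairable values

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k) / (n - 1)
    return 1.0 - observed / expected            # alpha = 1 - D_o / D_e

# Invented data: 3 raters judging the likelihood class of 4 risks.
ratings = [["low", "low", "medium"],
           ["high", "high", "high"],
           ["medium", "low", "medium"],
           ["low", "low", "low"]]
print(f"alpha = {krippendorff_alpha_nominal(ratings):.2f}")
```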
72. The research setup
[Research setup diagram, annotated for this case:]
• Population: RA professionals in telco, doing RA in a quiet room
• Sample of objects of study: self-selected sample of students, in a quiet room
• Treatment: application of RASTER to a small case
• Treatment instruments: oral instruction, written case description, and RASTER help
• Measurement instruments: personal observation, exit questionnaire, RASTER forms
• Similarities and dissimilarities! Both were used to reason from sample to population:
1. Theory of variability formulated;
2. Designed a research setup that minimized the impact of these sources;
3. Explained observed variation in terms of this theory;
4. Used this to generalize to the population and to improve RASTER