SlideShare a Scribd company logo
1 of 34
Download to read offline
Dependable Systems
!
Dependability Attributes
Dr. Peter Tröger

!
Sources: 

!
J.C. Laprie. Dependability: Basic Concepts and Terminology

Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008

Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany : Springer Verlag, 1990.

Pfister, Gregory F.: High Availability. In: In Search of Clusters. , S. 379-452

!
Dependable Systems Course PT 2014
Dependability
• Umbrella term for operational requirements on a system

• IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance
to be justifiably placed on the service it delivers [..]"

• IEC IEV: "dependability (is) the collective term used to describe the availability
performance and its influencing factors : reliability performance, maintainability
performance and maintenance support performance"

• Laprie: „ Trustworthiness of a computer system such that 

reliance can be placed on the service it delivers to the user “

• Adds a third dimension to system quality

• General question: How to deal with unexpected events ?

• In German: ,Verlässlichkeit‘ vs. ,Zuverlässigkeit‘

2
Dependable Systems Course PT 2014
Dependability Tree (Laprie)
3
Dependable Systems Course PT 2014
Attributes of Dependability
• Non-functional attributes such as reliability and maintainability

• Complementary nature of viewpoints in the area of dependability

• In comparison to functional properties

• ... hard to define

• ... hard to abstract

• ... ,Divide and conquer‘ does not work as good

• ... difficult interrelationships

• ... often probabilistic dependencies
4
Dependable Systems Course PT 2014
Attributes of Dependability
• Reliability (,Zuverlässigkeit‘) - Continuity of service

• Initial goal for computer system trustworthiness

• Other disciplines have different understanding

• „Reliability is not doing the wrong thing.“ [Gray85]

• „Reliability: Ability of a system or component to perform its required functions under
stated conditions for a specified period of time“ [IEEE]

• „Reliability is the probability that an item will not fail.“ [Misra]

• Availability (,Verfügbarkeit‘) - Readiness for usage

• „Probability that a system is able to deliver correctly its service at any given
time.“ [Goloubeva]

• „Maintainability is the probability that the item can be successfully restored to operation
after failure; and availability ... is a function of reliability and maintainability .“ [Misra]

5
Dependable Systems Course PT 2014
Observations on Dependability Attributes
• Availability is always required

• Reliability, safety, and security may be optional

• Reliability might be analyzed for hardware / software components

• Availability is always from the system view point
6
Dependable Systems Course PT 2014
Attributes of Dependability
• Safety - Avoidance of catastrophic consequences on the environment

• Critical applications

• Specification needs to describe things that should not happen

• Security - Prevention of unauthorized access and / or information handling

• Became especially relevant with distributed systems

• Confidentiality - Absence of unauthorized disclosure of information

• Integrity - Absence of improper system alteration

• With respect to either accidental or intentional faults

• Maintainability - Ability to undergo modifications 

and repairs
7
Dependable Systems Course PT 2014
In Detail
• Reliability - Function R(t) 

• Probability that a system is functioning properly and constantly over time period t

• Assumes that system was fully operational at t=0

• Denotes failure-free interval of operation
• Availability - Statement if a system is operational at a point in time / fraction of time

• Describe system behavior in presence of error treatment mechanisms 

• Instantaneous availability (at t) - Probability that a system is performing
correctly at time t; equal to reliability for non-repairable systems: Ai(t) = R(t)

• Steady-state availability - Probability that a system will be operational at any
random point of time, expressed as the fraction of time a system is operational
during its expected lifetime: As = Uptime / Lifetime
8
Dependable Systems Course PT 2014
Probability of Events
9
(C) mathforum.org
Dependable Systems Course PT 2014
PDF & CDF
• Probability density function pdf for random variable X

• Discrete random variable: Probability that X will be x

• Continuous variable: Probability that X is in [a, b]
• Computed as area under the density function in 

this range



• Cumulative distribution function cdf(x): Probability that 

the value of the random variable is at most x



• Limits of integration depend on the nature of the distribution function

• Value of cdf at x is always area under pdf from 0 to x
10
(C) weibull.com
Dependable Systems Course PT 2014
PDF Examples
• Well-known statistical distributions, each describing a random variable behavior

• Continuous version described by PDF (discrete pendant would be histogram)
11
Normal distribution

(mean, variance)
Exponential distribution

(rate parameter)
Probability density

function
Cumulative distribution

function
Dependable Systems Course PT 2014
The Reliability Function R(t)
• Reliability: Probability R(t) that a component 

works for time period [0,t]
• Idea: Express time period of correct operation

as continuos random variable X 

-> time to failure

• cdf(t) of this variable: Describes probability of 

failure before t -> Unreliability Function F(t)

• 1-cdf(t): Describes probability of a 

failure after t -> time to failure -> Reliability Function R(t)

• This works since (A) working / non-working is a binary decision, (B) the area under
the complete pdf is 1 and (C) the ,red‘ area is the result of the cdf function

• Typically, failures are modeled as Poisson process

• Poisson properties lead to exponential distribution for the time between events

• This time therefore only depends on failure rate parameter
12
(C) weibull.com
Dependable Systems Course PT 2014
Failure Rate
!
• Time to failure is not always measured in calendar time, might be discrete

• Number of kilometers driven with a car

• Number of rotations / cycles for a mechanical component

• Treat pdf for time-to-failure random variable X as failure density function
• Can be computed as derivative of the unreliability function



• Failure rate / hazard rate function - mean frequency of failures at time t

• Conditional probability of a failure between a and b, given the survival until t
13
f(t) = dF(t)/dt
(t) = f(t)
R(t) = for constant failure rate
Dependable Systems Course PT 2014
Why Exponential ?
• Distribution function that models the memoryless property of the Poisson process
• P(T > t + s|T > t) = P(T > s), e.g. PFailure(5 years|T > 2 years) = PFailure(3 years)

• Failure is not the result of wear-out

• Models ,intrinsic failure‘ behavior, assumed for the majority of hardware life time

• Weibull distribution as alternative, can also model tear-in and wear-out

• Some natural phenomena have constant failure rate (e.g. cosmic ray particles)
14
• Failures occur continuously and 

independently at a constant 

average rate (Poisson process)

• Increasing probability of failure 

with increasing t - cdt function

• Failure rate from experience

or complexity measures

• Cumulative distribution function:

!
• Reliability function (survival probability) for exponential failure distribution:
Dependable Systems Course PT 2014
The Reliability Function R(t)
15
R(t) = P(X > t) = 1 F(t) = e x
with F(x) = 1 e x
CDF:Probabilityofa
failurebeforet
Dependable Systems Course PT 2014
Variable Failure Rate in Real World
16
Burn in Use Wear out Integration
& Test
Use Obsolete
Hardware Software
• Failure rate is treated as constant parameter of the exponential distribution

• (maybe invalid) simplification, mostly combined solution in practice:

• Exponential distribution when failure rate is constant

• Weibull distribution when failure rate is monotonic decreasing / increasing
Dependable Systems Course PT 2014
Weibull Distribution
• Most widely used life distribution beside exponential

• Named after Swedish professor Waloddi Weibull (1887 - 1979)

• Originally invented for modeling material strength

• Very flexible through parametrization, can model many other
probability distribution types

• k: Shape parameter

• k=1: constant hazard rate, like exponential distribution

• k<1: Hazard rate decreases over time (,infant mortality‘)

• k>1: Hazard rate increases with time (,aging‘), 

like with (log)normal distribution

• : Scale parameter
17
pdf
cdf
Dependable Systems Course PT 2014
Hardware Failure Rate
18
DS – SR&C - 6
SW Failure Rates – Industrial Practice
Dependable Systems Course PT 2014
Software Failure Rate
• Industrial practice

• When do you stop testing ? - No more time, or no more money ...
19
(C)Malek
Dependable Systems Course PT 2014
Failure Rate Examples
• Standards from experience provide base data for component reliability

• Society of Automotive Engineers (SAE) reliability model





• Predicted failure rate

• Base failure rate for the component

• Various modification factors

• Component composition

• Ambient temperature

• Location in the vehicle
20
p = b
b
i=1⇥i
p
b
i
Dependable Systems Course PT 2014
Life-Stress Relationship
• Formulate a model that includes the life
distribution and how outside factors
(such as stress) change this distribution

• Example - load sharing redundancy:
Component reliability depends
indirectly on the number of previousely
failed components

• Single component failure distribution
is no longer sufficient for describing
reliability

• Model must describe the effect of
load and the failure probability

• Typical approach: Define load-
dependent parameter of the
distribution function
21
(C) weibull.com
Dependable Systems Course PT 2014
Other Reliability Distributions
• (Mixed) Weibull distribution

• Normal distribution

• Lognormal distribution

• (Generalized) Gamma distribution

• Logistic distribution

• Loglogistic distribution
22
• Mean time to failure (MTTF) - 

Average time it takes to fail

-> average uptime

• Mean time to recover / repair (MTTR) -
Average time it takes to recover

• Mean time between failures (MTBF) -
Average time between two successive failures

• Availability = Uptime / Lifetime

= MTTF / MTBF
Dependable Systems Course PT 2014
Steady-State Availability
23
up down up
MTBF
down up
up
C1
up
C3
up
C2
MTTF
• Expressing availability with MTTF/MTBF describes a repairable systems behavior
under infinite time assumption

• MTTF and MTTR get stable over longer time periods

• Fulfils also the steady-state condition

• Expressing dependability with MTTF ,should‘ imply a non-repairable system, 

expressing dependability with MTBF ,should‘ imply a repairable system

• Sometimes MTBF means mean time BEFORE failure = MTTF 

-> typical source of confusion

• Exponential distribution: 

Reciprocal of the rate parameter is equivalent to the distribution mean

(Example: With 4 events per hour, you can expect one roughly every 15 minutes).
Dependable Systems Course PT 2014
Steady-State Availability and MTBF
24
= 1
MT T F
R(100hours) = e 100
= 0.96
= ln0.96
100 = 0, 000408
MTTF = 1
= 2449, 66hours
Dependable Systems Course PT 2014
Example
• Test population with 50 HDDs and 100 hours of testing, 2 drives fail during the test

• As usual, we assume exponential distribution of the time to failure

• Reliability at t=100 is known to be 96% (48/50)

• Reciprocal of the according failure rate is the MTTF
25
R(t) = P(X > t) = 1 F(t) = e x
with F(x) = 1 e x
Dependable Systems Course PT 2014
MTBF / MTTF in Practice
• Often express average failure behavior (statistics) for a component population

• Good for relative comparison, not for expected life time expectation of one unit

• Example: Hard disk with MTTF of 500.000 hours and 5 years of expected operation 

(,service life‘)

• Drive of this type is expected to run 5 years without problems

• Large group of such drives will (on average) have one failed drive after 500.000
hours of accumulated life time

• MTBF in practice is a weak approximation - sum of up-phases / number of failures

• Assumes renewable system with homogeneous failure distribution over lifetime
26
Dependable Systems Course PT 2014
Steady-State Availability
27
Availability Downtime per year Downtime per week
90.0 % (1 nine) 36.5 days 16.8 hours
99.0 % (2 nines) 3.65 days 1.68 hours
99.9 % (3 nines) 8.76 hours 10.1 min
99.99 % (4 nines) 52.6 min 1.01 min
99.999 % (5 nines) 5.26 min 6.05 s
99.9999 % (6 nines) 31.5 s 0.605 s
99.99999 % (7 nines) 0.3 s 6 ms
A = Uptime
Uptime+Downtime = MT T F
MT T F +MT T R
Dependable Systems Course PT 2014
Amazon EC2 SLA (2012)
28
Dependable Systems Course PT 2014
Operational Availability Calculation [Misra]
• Uptime elements: Standby time, operating time

• Downtime elements

• Logistic: Spares availability, spares location, transportation of spares

• Preventive maintenance: Inspection, servicing

• Administrative delay
• Finding personnel, reviewing manuals, complying with supply procedures,
locating tools, setting up test equipment

• Corrective maintenance
• Preparation time, fault location diagnosis, getting parts, correcting faults,
testing
29
Dependable Systems Course PT 2014
Example: Item-Level Sparing Analysis [Misra]
• Sparing analysis challenges

• How many spares do you need to keep the system available at the desired rate ?

• When are you going to need to spares (manufacturing time) ?

• Where the spares should be kept ?

• What system level you want to spare at ?
30
Dependable Systems Course PT 2014
MTTR Examples
• Hardware MTTR with spares onsite

• Operator available - 30min

• Operator on call - 2 hours

• Operator available during working hours - 14h

• Without spares - at least 24h

• SW MTTR with watchdog

• Reboot from ROM - 30s

• Reboot from disk - 3 min

• Reboot from network - 10 min
31
Dependable Systems Course PT 2014
MTTR << MTTF [Fox]
• Armando Fox on ,Recovery-Oriented Computing‘

• A = MTTF / (MTTF + MTTR)

• 10x decrease of MTTR as good as 10x increase of MTTF ?

• MTTF‘s are not claimable, but MTTR claims are verifiable

• Proving MTTF numbers demands system-years of observation and experience

• Lowering MTTR directly improves user experience of one specific outage, 

since MTTF is typically longer than one user session

• HCI factor of failed system

• Miller, 1968: >1sec “sluggish”, >10sec “distracted” (user moves away)

• 2001 Web user study: ~5sec „acceptable”, ~10sec „excessively slow“
32
Dependable Systems Course PT 2014
MTTR << MTTF [Fox]
• Proposal: Utility curve for recovery time

• Factors: Length of recovery time, level of service availability during error state

• Key distinction between interactive (session-based) and non-interactive systems

• If error state leads to some steady-state latency

• For how long will users tolerate temporary degradation ?

• How much degradation is acceptable ?

• Do they show a preference for increased latency vs. worse QOS vs. being turned
away and incentivized to return?

• Long recovery times are often reasoned by stateful components

• Utilize alternative architecture concepts
33
Dependable Systems Course PT 2014
Availability
34

More Related Content

What's hot

Chapter 08
Chapter 08Chapter 08
Chapter 08
guru3188
 
Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...
Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...
Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...
RahulSharma4566
 

What's hot (20)

Anomaly detection using One-Class Neural Networks (OCNN)
Anomaly detection using One-Class Neural Networks (OCNN)Anomaly detection using One-Class Neural Networks (OCNN)
Anomaly detection using One-Class Neural Networks (OCNN)
 
Chapter 08
Chapter 08Chapter 08
Chapter 08
 
Multiple Access
Multiple AccessMultiple Access
Multiple Access
 
Ch 8 configuration management
Ch 8 configuration managementCh 8 configuration management
Ch 8 configuration management
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
EC 8004 wireless networks -Two marks with answers
EC 8004   wireless networks -Two marks with answersEC 8004   wireless networks -Two marks with answers
EC 8004 wireless networks -Two marks with answers
 
V model presentation
V model presentationV model presentation
V model presentation
 
SPACE DIVISION MULTIPLE ACCESS (SDMA) SATELLITE COMMUNICATION
SPACE DIVISION MULTIPLE ACCESS (SDMA) SATELLITE COMMUNICATION  SPACE DIVISION MULTIPLE ACCESS (SDMA) SATELLITE COMMUNICATION
SPACE DIVISION MULTIPLE ACCESS (SDMA) SATELLITE COMMUNICATION
 
Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...
Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...
Artificial Intelligence MCQ Part 1 | 50 AI MCQs | Multiple Choice Questions &...
 
Nature-Inspired Metaheuristic Algorithms
Nature-Inspired Metaheuristic AlgorithmsNature-Inspired Metaheuristic Algorithms
Nature-Inspired Metaheuristic Algorithms
 
state space representation,State Space Model Controllability and Observabilit...
state space representation,State Space Model Controllability and Observabilit...state space representation,State Space Model Controllability and Observabilit...
state space representation,State Space Model Controllability and Observabilit...
 
Software process
Software processSoftware process
Software process
 
Spiral model
Spiral modelSpiral model
Spiral model
 
Modelling simulation (1)
Modelling simulation (1)Modelling simulation (1)
Modelling simulation (1)
 
Sdlc
SdlcSdlc
Sdlc
 
Deadbeat Response Design _8th lecture
Deadbeat Response Design _8th lectureDeadbeat Response Design _8th lecture
Deadbeat Response Design _8th lecture
 
Reference for z and inverse z transform
Reference for z and inverse z transformReference for z and inverse z transform
Reference for z and inverse z transform
 
DSDM
DSDMDSDM
DSDM
 
Defect removal effectiveness
Defect removal effectivenessDefect removal effectiveness
Defect removal effectiveness
 
SE CHAPTER 2 PROCESS MODELS
SE CHAPTER 2 PROCESS MODELSSE CHAPTER 2 PROCESS MODELS
SE CHAPTER 2 PROCESS MODELS
 

Similar to Dependable Systems -Dependability Attributes (5/16)

Reliability.pptx related to quality related
Reliability.pptx related to quality relatedReliability.pptx related to quality related
Reliability.pptx related to quality related
nikhilyadav365577
 
Reliability Engineering intro.pptx
Reliability Engineering intro.pptxReliability Engineering intro.pptx
Reliability Engineering intro.pptx
BakiyalakshmiR1
 

Similar to Dependable Systems -Dependability Attributes (5/16) (20)

Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)Dependable Systems -Dependability Means (3/16)
Dependable Systems -Dependability Means (3/16)
 
basic concepts of reliability
basic concepts of reliabilitybasic concepts of reliability
basic concepts of reliability
 
Reliability.pptx related to quality related
Reliability.pptx related to quality relatedReliability.pptx related to quality related
Reliability.pptx related to quality related
 
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
Dependable Systems - Structure-Based Dependabiilty Modeling (6/16)
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)
 
BASIC RELIABILITY MATHEMATICS.pptx
BASIC RELIABILITY MATHEMATICS.pptxBASIC RELIABILITY MATHEMATICS.pptx
BASIC RELIABILITY MATHEMATICS.pptx
 
Reliability Engineering intro.pptx
Reliability Engineering intro.pptxReliability Engineering intro.pptx
Reliability Engineering intro.pptx
 
EMOSH Reliability.ppt
EMOSH Reliability.pptEMOSH Reliability.ppt
EMOSH Reliability.ppt
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 
Ch15 software reliability
Ch15 software reliabilityCh15 software reliability
Ch15 software reliability
 
Non Functional Requirement.
Non Functional Requirement.Non Functional Requirement.
Non Functional Requirement.
 
Laser 3-incremental
Laser 3-incrementalLaser 3-incremental
Laser 3-incremental
 
Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)
 
Real Time Systems
Real Time SystemsReal Time Systems
Real Time Systems
 
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W4 Reliability
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W4 Reliability Javier Garcia - Verdugo Sanchez - Six Sigma Training - W4 Reliability
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W4 Reliability
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)
 
SERENE 2014 School: Resilience in Cyber-Physical Systems: Challenges and Oppo...
SERENE 2014 School: Resilience in Cyber-Physical Systems: Challenges and Oppo...SERENE 2014 School: Resilience in Cyber-Physical Systems: Challenges and Oppo...
SERENE 2014 School: Resilience in Cyber-Physical Systems: Challenges and Oppo...
 
SERENE 2014 School: Gabor karsai serene2014_school
SERENE 2014 School: Gabor karsai serene2014_schoolSERENE 2014 School: Gabor karsai serene2014_school
SERENE 2014 School: Gabor karsai serene2014_school
 
chapter 8 discussabout reliability .pptx
chapter 8 discussabout reliability .pptxchapter 8 discussabout reliability .pptx
chapter 8 discussabout reliability .pptx
 

More from Peter Tröger

Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - Concurrency
Peter Tröger
 

More from Peter Tröger (20)

WannaCry - An OS course perspective
WannaCry - An OS course perspectiveWannaCry - An OS course perspective
WannaCry - An OS course perspective
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and Virtualization
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissions
 
Design of Software for Embedded Systems
Design of Software for Embedded SystemsDesign of Software for Embedded Systems
Design of Software for Embedded Systems
 
Humans should not write XML.
Humans should not write XML.Humans should not write XML.
Humans should not write XML.
 
What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.What activates a bug? A refinement of the Laprie terminology model.
What activates a bug? A refinement of the Laprie terminology model.
 
Dependable Systems - Hardware Dependability with Redundancy (14/16)
Dependable Systems - Hardware Dependability with Redundancy (14/16)Dependable Systems - Hardware Dependability with Redundancy (14/16)
Dependable Systems - Hardware Dependability with Redundancy (14/16)
 
Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)Dependable Systems - System Dependability Evaluation (8/16)
Dependable Systems - System Dependability Evaluation (8/16)
 
Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)
 
Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)Dependable Systems -Reliability Prediction (9/16)
Dependable Systems -Reliability Prediction (9/16)
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
 
Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0Verteilte Software-Systeme im Kontext von Industrie 4.0
Verteilte Software-Systeme im Kontext von Industrie 4.0
 
Operating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - SummaryOperating Systems 1 (12/12) - Summary
Operating Systems 1 (12/12) - Summary
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - Concurrency
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Operating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - ThreadsOperating Systems 1 (7/12) - Threads
Operating Systems 1 (7/12) - Threads
 
Operating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - HistoryOperating Systems 1 (1/12) - History
Operating Systems 1 (1/12) - History
 
Operating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware BasicsOperating Systems 1 (2/12) - Hardware Basics
Operating Systems 1 (2/12) - Hardware Basics
 

Recently uploaded

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 

Dependable Systems -Dependability Attributes (5/16)

  • 1. Dependable Systems ! Dependability Attributes Dr. Peter Tröger ! Sources: ! J.C. Laprie. Dependability: Basic Concepts and Terminology Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing, 2008 Echtle, Klaus: Fehlertoleranzverfahren. Heidelberg, Germany : Springer Verlag, 1990. Pfister, Gregory F.: High Availability. In: In Search of Clusters. , S. 379-452 !
  • 2. Dependable Systems Course PT 2014 Dependability • Umbrella term for operational requirements on a system • IFIP WG 10.4: "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]" • IEC IEV: "dependability (is) the collective term used to describe the availability performance and its influencing factors : reliability performance, maintainability performance and maintenance support performance" • Laprie: „ Trustworthiness of a computer system such that 
 reliance can be placed on the service it delivers to the user “ • Adds a third dimension to system quality • General question: How to deal with unexpected events ? • In German: ,Verlässlichkeit‘ vs. ,Zuverlässigkeit‘
 2
  • 3. Dependable Systems Course PT 2014 Dependability Tree (Laprie) 3
  • 4. Dependable Systems Course PT 2014 Attributes of Dependability • Non-functional attributes such as reliability and maintainability • Complementary nature of viewpoints in the area of dependability • In comparison to functional properties • ... hard to define • ... hard to abstract • ... ,Divide and conquer‘ does not work as good • ... difficult interrelationships • ... often probabilistic dependencies 4
  • 5. Dependable Systems Course PT 2014 Attributes of Dependability • Reliability (,Zuverlässigkeit‘) - Continuity of service • Initial goal for computer system trustworthiness • Other disciplines have different understanding • „Reliability is not doing the wrong thing.“ [Gray85] • „Reliability: Ability of a system or component to perform its required functions under stated conditions for a specified period of time“ [IEEE] • „Reliability is the probability that an item will not fail.“ [Misra] • Availability (,Verfügbarkeit‘) - Readiness for usage • „Probability that a system is able to deliver correctly its service at any given time.“ [Goloubeva] • „Maintainability is the probability that the item can be successfully restored to operation after failure; and availability ... is a function of reliability and maintainability .“ [Misra]
 5
  • 6. Dependable Systems Course PT 2014 Observations on Dependability Attributes • Availability is always required • Reliability, safety, and security may be optional • Reliability might be analyzed for hardware / software components • Availability is always from the system view point 6
  • 7. Dependable Systems Course PT 2014 Attributes of Dependability • Safety - Avoidance of catastrophic consequences on the environment • Critical applications • Specification needs to describe things that should not happen • Security - Prevention of unauthorized access and / or information handling • Became especially relevant with distributed systems • Confidentiality - Absence of unauthorized disclosure of information • Integrity - Absence of improper system alteration • With respect to either accidental or intentional faults • Maintainability - Ability to undergo modifications 
 and repairs 7
  • 8. Dependable Systems Course PT 2014 In Detail • Reliability - Function R(t) • Probability that a system is functioning properly and constantly over time period t • Assumes that system was fully operational at t=0 • Denotes failure-free interval of operation • Availability - Statement if a system is operational at a point in time / fraction of time • Describe system behavior in presence of error treatment mechanisms • Instantaneous availability (at t) - Probability that a system is performing correctly at time t; equal to reliability for non-repairable systems: Ai(t) = R(t) • Steady-state availability - Probability that a system will be operational at any random point of time, expressed as the fraction of time a system is operational during its expected lifetime: As = Uptime / Lifetime 8
  • 9. Dependable Systems Course PT 2014 Probability of Events 9 (C) mathforum.org
  • 10. Dependable Systems Course PT 2014 PDF & CDF • Probability density function pdf for random variable X • Discrete random variable: Probability that X will be x • Continuous variable: Probability that X is in [a, b] • Computed as area under the density function in 
 this range
 
 • Cumulative distribution function cdf(x): Probability that 
 the value of the random variable is at most x
 
 • Limits of integration depend on the nature of the distribution function • Value of cdf at x is always area under pdf from 0 to x 10 (C) weibull.com
  • 11. Dependable Systems Course PT 2014 PDF Examples • Well-known statistical distributions, each describing a random variable behavior • Continuous version described by PDF (discrete pendant would be histogram) 11 Normal distribution
 (mean, variance) Exponential distribution
 (rate parameter) Probability density
 function Cumulative distribution
 function
  • 12. Dependable Systems Course PT 2014 The Reliability Function R(t) • Reliability: Probability R(t) that a component 
 works for time period [0,t] • Idea: Express time period of correct operation
 as continuos random variable X 
 -> time to failure • cdf(t) of this variable: Describes probability of 
 failure before t -> Unreliability Function F(t) • 1-cdf(t): Describes probability of a 
 failure after t -> time to failure -> Reliability Function R(t) • This works since (A) working / non-working is a binary decision, (B) the area under the complete pdf is 1 and (C) the ,red‘ area is the result of the cdf function • Typically, failures are modeled as Poisson process • Poisson properties lead to exponential distribution for the time between events • This time therefore only depends on failure rate parameter 12 (C) weibull.com
  • 13. Dependable Systems Course PT 2014 Failure Rate ! • Time to failure is not always measured in calendar time, might be discrete • Number of kilometers driven with a car • Number of rotations / cycles for a mechanical component • Treat pdf for time-to-failure random variable X as failure density function • Can be computed as derivative of the unreliability function
 
 • Failure rate / hazard rate function - mean frequency of failures at time t • Conditional probability of a failure between a and b, given the survival until t 13 f(t) = dF(t)/dt (t) = f(t) R(t) = for constant failure rate
  • 14. Dependable Systems Course PT 2014 Why Exponential ? • Distribution function that models the memoryless property of the Poisson process • P(T > t + s|T > t) = P(T > s), e.g. PFailure(5 years|T > 2 years) = PFailure(3 years) • Failure is not the result of wear-out • Models ,intrinsic failure‘ behavior, assumed for the majority of hardware life time • Weibull distribution as alternative, can also model tear-in and wear-out • Some natural phenomena have constant failure rate (e.g. cosmic ray particles) 14
  • 15. • Failures occur continuously and 
 independently at a constant 
 average rate (Poisson process) • Increasing probability of failure 
 with increasing t - cdt function • Failure rate from experience
 or complexity measures • Cumulative distribution function:
 ! • Reliability function (survival probability) for exponential failure distribution: Dependable Systems Course PT 2014 The Reliability Function R(t) 15 R(t) = P(X > t) = 1 F(t) = e x with F(x) = 1 e x CDF:Probabilityofa failurebeforet
  • 16. Dependable Systems Course PT 2014 Variable Failure Rate in Real World 16 Burn in Use Wear out Integration & Test Use Obsolete Hardware Software • Failure rate is treated as constant parameter of the exponential distribution • (maybe invalid) simplification, mostly combined solution in practice: • Exponential distribution when failure rate is constant • Weibull distribution when failure rate is monotonic decreasing / increasing
  • 17. Dependable Systems Course PT 2014 Weibull Distribution • Most widely used life distribution beside exponential • Named after Swedish professor Waloddi Weibull (1887 - 1979) • Originally invented for modeling material strength • Very flexible through parametrization, can model many other probability distribution types • k: Shape parameter • k=1: constant hazard rate, like exponential distribution • k<1: Hazard rate decreases over time (,infant mortality‘) • k>1: Hazard rate increases with time (,aging‘), 
 like with (log)normal distribution • : Scale parameter 17 pdf cdf
  • 18. Dependable Systems Course PT 2014 Hardware Failure Rate 18
  • 19. DS – SR&C - 6 SW Failure Rates – Industrial Practice Dependable Systems Course PT 2014 Software Failure Rate • Industrial practice • When do you stop testing ? - No more time, or no more money ... 19 (C)Malek
  • 20. Dependable Systems Course PT 2014 Failure Rate Examples • Standards from experience provide base data for component reliability • Society of Automotive Engineers (SAE) reliability model
 
 
 • Predicted failure rate • Base failure rate for the component • Various modification factors • Component composition • Ambient temperature • Location in the vehicle 20 p = b b i=1⇥i p b i
  • 21. Dependable Systems Course PT 2014 Life-Stress Relationship • Formulate a model that includes the life distribution and how outside factors (such as stress) change this distribution • Example - load sharing redundancy: Component reliability depends indirectly on the number of previousely failed components • Single component failure distribution is no longer sufficient for describing reliability • Model must describe the effect of load and the failure probability • Typical approach: Define load- dependent parameter of the distribution function 21 (C) weibull.com
  • 22. Dependable Systems Course PT 2014 Other Reliability Distributions • (Mixed) Weibull distribution • Normal distribution • Lognormal distribution • (Generalized) Gamma distribution • Logistic distribution • Loglogistic distribution 22
  • 23. • Mean time to failure (MTTF) - 
 Average time it takes to fail
 -> average uptime • Mean time to recover / repair (MTTR) - Average time it takes to recover • Mean time between failures (MTBF) - Average time between two successive failures • Availability = Uptime / Lifetime
 = MTTF / MTBF Dependable Systems Course PT 2014 Steady-State Availability 23 up down up MTBF down up up C1 up C3 up C2 MTTF
  • 24. • Expressing availability with MTTF/MTBF describes a repairable systems behavior under infinite time assumption • MTTF and MTTR get stable over longer time periods • Fulfils also the steady-state condition • Expressing dependability with MTTF ,should‘ imply a non-repairable system, 
 expressing dependability with MTBF ,should‘ imply a repairable system • Sometimes MTBF means mean time BEFORE failure = MTTF 
 -> typical source of confusion • Exponential distribution: 
 Reciprocal of the rate parameter is equivalent to the distribution mean
 (Example: With 4 events per hour, you can expect one roughly every 15 minutes). Dependable Systems Course PT 2014 Steady-State Availability and MTBF 24 = 1 MT T F
  • 25. R(100hours) = e 100 = 0.96 = ln0.96 100 = 0, 000408 MTTF = 1 = 2449, 66hours Dependable Systems Course PT 2014 Example • Test population with 50 HDDs and 100 hours of testing, 2 drives fail during the test • As usual, we assume exponential distribution of the time to failure • Reliability at t=100 is known to be 96% (48/50) • Reciprocal of the according failure rate is the MTTF 25 R(t) = P(X > t) = 1 F(t) = e x with F(x) = 1 e x
  • 26. Dependable Systems Course PT 2014 MTBF / MTTF in Practice • Often express average failure behavior (statistics) for a component population • Good for relative comparison, not for expected life time expectation of one unit • Example: Hard disk with MTTF of 500.000 hours and 5 years of expected operation 
 (,service life‘) • Drive of this type is expected to run 5 years without problems • Large group of such drives will (on average) have one failed drive after 500.000 hours of accumulated life time • MTBF in practice is a weak approximation - sum of up-phases / number of failures • Assumes renewable system with homogeneous failure distribution over lifetime 26
  • 27. Dependable Systems Course PT 2014 Steady-State Availability 27 Availability Downtime per year Downtime per week 90.0 % (1 nine) 36.5 days 16.8 hours 99.0 % (2 nines) 3.65 days 1.68 hours 99.9 % (3 nines) 8.76 hours 10.1 min 99.99 % (4 nines) 52.6 min 1.01 min 99.999 % (5 nines) 5.26 min 6.05 s 99.9999 % (6 nines) 31.5 s 0.605 s 99.99999 % (7 nines) 0.3 s 6 ms A = Uptime Uptime+Downtime = MT T F MT T F +MT T R
  • 28. Dependable Systems Course PT 2014 Amazon EC2 SLA (2012) 28
  • 29. Dependable Systems Course PT 2014 Operational Availability Calculation [Misra] • Uptime elements: Standby time, operating time • Downtime elements • Logistic: Spares availability, spares location, transportation of spares • Preventive maintenance: Inspection, servicing • Administrative delay • Finding personnel, reviewing manuals, complying with supply procedures, locating tools, setting up test equipment • Corrective maintenance • Preparation time, fault location diagnosis, getting parts, correcting faults, testing 29
  • 30. Dependable Systems Course PT 2014 Example: Item-Level Sparing Analysis [Misra] • Sparing analysis challenges • How many spares do you need to keep the system available at the desired rate ? • When are you going to need to spares (manufacturing time) ? • Where the spares should be kept ? • What system level you want to spare at ? 30
  • 31. Dependable Systems Course PT 2014 MTTR Examples • Hardware MTTR with spares onsite • Operator available - 30min • Operator on call - 2 hours • Operator available during working hours - 14h • Without spares - at least 24h • SW MTTR with watchdog • Reboot from ROM - 30s • Reboot from disk - 3 min • Reboot from network - 10 min 31
  • 32. Dependable Systems Course PT 2014 MTTR << MTTF [Fox] • Armando Fox on ,Recovery-Oriented Computing‘ • A = MTTF / (MTTF + MTTR) • 10x decrease of MTTR as good as 10x increase of MTTF ? • MTTF‘s are not claimable, but MTTR claims are verifiable • Proving MTTF numbers demands system-years of observation and experience • Lowering MTTR directly improves user experience of one specific outage, 
 since MTTF is typically longer than one user session • HCI factor of failed system • Miller, 1968: >1sec “sluggish”, >10sec “distracted” (user moves away) • 2001 Web user study: ~5sec „acceptable”, ~10sec „excessively slow“ 32
  • 33. Dependable Systems Course PT 2014 MTTR << MTTF [Fox] • Proposal: Utility curve for recovery time • Factors: Length of recovery time, level of service availability during error state • Key distinction between interactive (session-based) and non-interactive systems • If error state leads to some steady-state latency • For how long will users tolerate temporary degradation ? • How much degradation is acceptable ? • Do they show a preference for increased latency vs. worse QOS vs. being turned away and incentivized to return? • Long recovery times are often reasoned by stateful components • Utilize alternative architecture concepts 33
  • 34. Dependable Systems Course PT 2014 Availability 34