SlideShare uma empresa Scribd logo
1 de 23
Dependable systems
                             architectures
                                         Lecture 2




Dependable systems architectures, 2013               Slide 1
System fault tolerance
    •       Fault tolerance means that the system can continue
            in operation in spite of software fault i.e. the fault
            does not lead to a failure
    •       Fault tolerance is required where there are high
            availability requirements, no ‘fail safe’ state or where
            system failure costs are very high.
    •       Even if the system has been proved to conform to its
            specification, it must also be fault tolerant as there
            may be specification errors or the validation may be
            incorrect.



Dependable systems architectures, 2013                         Slide 2
Dependable system
                               architectures
    •      Dependable systems architectures are used in
           situations where fault tolerance is essential. These
           architectures are generally all based on redundancy
           and diversity.
    •      Examples of situations where dependable
           architectures are used:
         –       Flight control systems, where system failure could threaten
                 the safety of passengers
         –       Reactor systems where failure of a control system could lead
                 to a chemical or nuclear emergency
         –       Telecommunication systems, where there is a need for 24/7
                 availability.
Dependable systems architectures, 2013                                  Slide 3
Protection systems
    •      A specialized system that is associated with some
           other control system, which can take emergency
           action if a failure occurs.
         –       System to stop a train if it passes a red light
         –       System to shut down a reactor if temperature/pressure are
                 too high

    •      Protection systems independently monitor the
           controlled system and the environment.
    •      If a problem is detected, it issues commands to take
           emergency action to shut down the system and avoid
           a catastrophe.
Dependable systems architectures, 2013                                 Slide 4
Sizewell B reactor
                                         •   Software controlled
                                             protection system in
                                             Sizewell B reactor
                                         •   Go-live in 1993
                                         •   Software protection
                                             system has been claimed
                                             to have a 1/10000 PFD




Dependable systems architectures, 2013                              Slide 5
Protection system architecture




Dependable systems architectures, 2013   Slide 6
Protection system functionality
    •      Protection systems are redundant because they
           include monitoring and control capabilities that
           replicate those in the control software.
    •      Protection systems should be diverse and use
           different technology from the control software.
    •      They are simpler than the control system so more
           effort can be expended in validation and
           dependability assurance.
    •      Aim is to ensure that there is a low probability of
           failure on demand for the protection system.

Dependable systems architectures, 2013                           Slide 7
Self-monitoring architectures
    •      Multi-channel architectures where the system
           monitors its own operations and takes action if
           inconsistencies are detected.
    •      The same computation is carried out on each channel
           and the results are compared. If the results are
           identical and are produced at the same time, then it is
           assumed that the system is operating correctly.
    •      If the results are different, then a failure is assumed
           and a failure exception is raised.



Dependable systems architectures, 2013                         Slide 8
Self-monitoring architecture




Dependable systems architectures, 2013       Slide 9
Self-monitoring systems
    •      Hardware in each channel has to be diverse so that
           common mode hardware failure will not lead to each
           channel producing the same results.
    •      Software in each channel must also be diverse,
           otherwise the same software error would affect each
           channel.
    •      If high-availability is required, you may use several
           self-checking systems in parallel.
         –       This is the approach used in the Airbus family of aircraft for
                 their flight control systems.


Dependable systems architectures, 2013                                     Slide 10
Airbus 340
                                               •   Airbus were the first
                                                   commercial aircraft
                                                   manufacturer to use ‘fly
                                                   by wire’ flight control
                                                   systems
                                               •   Fly by wire systems are
                                                   lighter (saving fuel),
                                                   can be more fuel
                                                   efficient and can detect
                                                   and prevent pilot
                                                   actions that are
                                                   potentially dangerous.
Dependable systems architectures, 2013                                Slide 11
Airbus flight control system
                       architecture




Dependable systems architectures, 2013        Slide 12
Airbus architecture discussion
    •      The Airbus FCS has 5 separate computers, any one
           of which can run the control software.
    •      Extensive use has been made of diversity
         –       Primary and secondary systems use different processors.
         –       Primary and secondary systems use chipsets from different
                 manufacturers.
         –       Software in secondary systems is less complex than in
                 primary system – provides only critical functionality.
         –       Software in each channel is developed in different
                 programming languages by different teams.
         –       Different programming languages used in primary and
                 secondary systems.
Dependable systems architectures, 2013                                    Slide 13
Hardware fault tolerance
                                         •   Depends on triple-modular
                                             redundancy (TMR).
                                         •   Three identical components that
                                             receive the same input and whose
                                             outputs are compared.
                                         •   If one output is different, it is
                                             ignored and component failure is
                                             assumed.
                                         •   Based on most faults resulting
                                             from component failures rather
                                             than design faults and a low
Dependable systems architectures, 2013       probability of simultaneous Slide 14
Triple modular redundancy




Dependable systems architectures, 2013     Slide 15
N-version programming
                                         •   Multiple versions of a software
                                             system carry out computations at
                                             the same time. There should be an
                                             odd number of computers
                                             involved, typically 3.
                                         •   The results are compared using a
                                             voting system and the majority
                                             result is taken to be the correct
                                             result.
                                         •   Approach derived from the notion
                                             of triple-modular redundancy, as
                                             used in hardware systems.
Dependable systems architectures, 2013                                    Slide 16
N-version programming




Dependable systems architectures, 2013       Slide 17
N-version programming
                                         •   The different system versions
                                             are designed and implemented
                                             by different teams. It is
                                             assumed that there is a low
                                             probability that they will make
                                             the same mistakes. The
                                             algorithms used should but may
                                             not be different.
                                         •   There is some empirical
                                             evidence that teams commonly
                                             misinterpret specifications in the
                                             same way and chose the same
                                             algorithms in their systems.
Dependable systems architectures, 2013                                    Slide 18
Software diversity
    •      Approaches to software fault tolerance embedded in
           the system architecture depend on software diversity
           where it is assumed that different implementations of
           the same software specification will fail in different
           ways.
    •      It is assumed that implementations are (a)
           independent and (b) do not include common errors.
    •      Strategies to achieve diversity
         –       Different programming languages
         –       Different design methods and tools
         –       Explicit specification of different algorithms
Dependable systems architectures, 2013                            Slide 19
Problems with design diversity
    •       Teams are not culturally diverse so they tend to
            tackle problems in the same way
    •       Characteristic errors
          –      Different teams make the same mistakes. Some parts of an
                 implementation are more difficult than others so all teams
                 tend to make mistakes in the same place;

    •       Specification errors;
          –      If there is an error in the specification then this is reflected in
                 all implementations;
          –      This can be addressed to some extent by using multiple
                 specification representations.

Dependable systems architectures, 2013                                        Slide 20
Specification dependency
    •       Both approaches to software redundancy are
            susceptible to specification errors. If the specification
            is incorrect, the system could fail
    •       This is also a problem with hardware but software
            specifications are usually more complex than
            hardware specifications and harder to validate.
    •       This has been addressed in some cases by
            developing separate software specifications from the
            same user specification.



Dependable systems architectures, 2013                          Slide 21
Improvements in practice
  •   In principle, if diversity and independence can be
      achieved, multi-version programming leads to very
      significant improvements in reliability and availability.
  •   In practice, observed improvements are much less
      significant but the approach seems leads to reliability
      improvements of between 5 and 9 times.
  •   Are such improvements are worth the considerable
      extra development costs for multi-version
      programming?
  •       Is it cheaper to increase the reliability of a single
          version using fault avoidance and fault detection
Dependabletechniques?
          systems architectures, 2013                           Slide 22
Key points
    •      Dependable system architectures are system
           architectures that are designed for fault tolerance.
    •      Architectural styles that support fault tolerance
           include protection systems, self-monitoring
           architectures and N-version programming.
    •      Software diversity is difficult to achieve because it is
           practically impossible to ensure that each version of
           the software is truly independent.




Dependable systems architectures, 2013                         Slide 23

Mais conteúdo relacionado

Mais procurados

CS 5032 L12 security testing and dependability cases 2013
CS 5032 L12  security testing and dependability cases 2013CS 5032 L12  security testing and dependability cases 2013
CS 5032 L12 security testing and dependability cases 2013Ian Sommerville
 
CS5032 L9 security engineering 1 2013
CS5032 L9 security engineering 1 2013CS5032 L9 security engineering 1 2013
CS5032 L9 security engineering 1 2013Ian Sommerville
 
Reliability and security specification (CS 5032 2012)
Reliability and security specification (CS 5032 2012)Reliability and security specification (CS 5032 2012)
Reliability and security specification (CS 5032 2012)Ian Sommerville
 
Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)Ian Sommerville
 
Security Engineering 2 (CS 5032 2012)
Security Engineering 2 (CS 5032 2012)Security Engineering 2 (CS 5032 2012)
Security Engineering 2 (CS 5032 2012)Ian Sommerville
 
Security case buffer overflow
Security case buffer overflowSecurity case buffer overflow
Security case buffer overflowIan Sommerville
 
Ch11-Software Engineering 9
Ch11-Software Engineering 9Ch11-Software Engineering 9
Ch11-Software Engineering 9Ian Sommerville
 
Norman Patch and Remediation
Norman Patch and  RemediationNorman Patch and  Remediation
Norman Patch and RemediationKavlieBorge
 
Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)Ian Sommerville
 
Ch12-Software Engineering 9
Ch12-Software Engineering 9Ch12-Software Engineering 9
Ch12-Software Engineering 9Ian Sommerville
 
Software security engineering
Software security engineeringSoftware security engineering
Software security engineeringaizazhussain234
 
Critical systems specification
Critical systems specificationCritical systems specification
Critical systems specificationAryan Ajmer
 
Ch13-Software Engineering 9
Ch13-Software Engineering 9Ch13-Software Engineering 9
Ch13-Software Engineering 9Ian Sommerville
 
Software Security Engineering
Software Security EngineeringSoftware Security Engineering
Software Security EngineeringMuhammad Asim
 

Mais procurados (20)

CS 5032 L12 security testing and dependability cases 2013
CS 5032 L12  security testing and dependability cases 2013CS 5032 L12  security testing and dependability cases 2013
CS 5032 L12 security testing and dependability cases 2013
 
CS5032 L9 security engineering 1 2013
CS5032 L9 security engineering 1 2013CS5032 L9 security engineering 1 2013
CS5032 L9 security engineering 1 2013
 
Reliability and security specification (CS 5032 2012)
Reliability and security specification (CS 5032 2012)Reliability and security specification (CS 5032 2012)
Reliability and security specification (CS 5032 2012)
 
Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)Safety specification (CS 5032 2012)
Safety specification (CS 5032 2012)
 
Security Engineering 2 (CS 5032 2012)
Security Engineering 2 (CS 5032 2012)Security Engineering 2 (CS 5032 2012)
Security Engineering 2 (CS 5032 2012)
 
Security case buffer overflow
Security case buffer overflowSecurity case buffer overflow
Security case buffer overflow
 
Ch11-Software Engineering 9
Ch11-Software Engineering 9Ch11-Software Engineering 9
Ch11-Software Engineering 9
 
Ch3
Ch3Ch3
Ch3
 
Ch13 security engineering
Ch13 security engineeringCh13 security engineering
Ch13 security engineering
 
Norman Patch and Remediation
Norman Patch and  RemediationNorman Patch and  Remediation
Norman Patch and Remediation
 
Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)Security Engineering 1 (CS 5032 2012)
Security Engineering 1 (CS 5032 2012)
 
Ch12-Software Engineering 9
Ch12-Software Engineering 9Ch12-Software Engineering 9
Ch12-Software Engineering 9
 
Software security engineering
Software security engineeringSoftware security engineering
Software security engineering
 
Ch12 safety engineering
Ch12 safety engineeringCh12 safety engineering
Ch12 safety engineering
 
Ispe Article
Ispe ArticleIspe Article
Ispe Article
 
Critical systems specification
Critical systems specificationCritical systems specification
Critical systems specification
 
Ch13-Software Engineering 9
Ch13-Software Engineering 9Ch13-Software Engineering 9
Ch13-Software Engineering 9
 
Ch14 resilience engineering
Ch14 resilience engineeringCh14 resilience engineering
Ch14 resilience engineering
 
DSDConference07
DSDConference07DSDConference07
DSDConference07
 
Software Security Engineering
Software Security EngineeringSoftware Security Engineering
Software Security Engineering
 

Destaque

CS5032 L20 cybersecurity 2
CS5032 L20 cybersecurity 2CS5032 L20 cybersecurity 2
CS5032 L20 cybersecurity 2Ian Sommerville
 
CS5032 Case study Kegworth air disaster
CS5032 Case study Kegworth air disasterCS5032 Case study Kegworth air disaster
CS5032 Case study Kegworth air disasterIan Sommerville
 
CS 5032 L18 Critical infrastructure 2: SCADA systems
CS 5032 L18 Critical infrastructure 2: SCADA systemsCS 5032 L18 Critical infrastructure 2: SCADA systems
CS 5032 L18 Critical infrastructure 2: SCADA systemsIan Sommerville
 
CS5032 Case study Maroochy water breach
CS5032 Case study Maroochy water breachCS5032 Case study Maroochy water breach
CS5032 Case study Maroochy water breachIan Sommerville
 
CS5032 L19 cybersecurity 1
CS5032 L19 cybersecurity 1CS5032 L19 cybersecurity 1
CS5032 L19 cybersecurity 1Ian Sommerville
 
CS 5032 L3 socio-technical systems 2013
CS 5032 L3 socio-technical systems 2013CS 5032 L3 socio-technical systems 2013
CS 5032 L3 socio-technical systems 2013Ian Sommerville
 

Destaque (12)

CS5032 L20 cybersecurity 2
CS5032 L20 cybersecurity 2CS5032 L20 cybersecurity 2
CS5032 L20 cybersecurity 2
 
CS5032 Case study Kegworth air disaster
CS5032 Case study Kegworth air disasterCS5032 Case study Kegworth air disaster
CS5032 Case study Kegworth air disaster
 
CS 5032 L18 Critical infrastructure 2: SCADA systems
CS 5032 L18 Critical infrastructure 2: SCADA systemsCS 5032 L18 Critical infrastructure 2: SCADA systems
CS 5032 L18 Critical infrastructure 2: SCADA systems
 
CS5032 Case study Maroochy water breach
CS5032 Case study Maroochy water breachCS5032 Case study Maroochy water breach
CS5032 Case study Maroochy water breach
 
CS5032 L19 cybersecurity 1
CS5032 L19 cybersecurity 1CS5032 L19 cybersecurity 1
CS5032 L19 cybersecurity 1
 
Critical systems intro
Critical systems introCritical systems intro
Critical systems intro
 
System dependability
System dependabilitySystem dependability
System dependability
 
Critical systems engineering
Critical systems engineeringCritical systems engineering
Critical systems engineering
 
CS 5032 L3 socio-technical systems 2013
CS 5032 L3 socio-technical systems 2013CS 5032 L3 socio-technical systems 2013
CS 5032 L3 socio-technical systems 2013
 
Availability and reliability
Availability and reliabilityAvailability and reliability
Availability and reliability
 
System security
System securitySystem security
System security
 
Airbus Flight Control System
Airbus Flight Control SystemAirbus Flight Control System
Airbus Flight Control System
 

Semelhante a CS 5032 L8 dependability engineering 2 2013

Dependability Engineering 2 (CS 5032 2012)
Dependability Engineering 2 (CS 5032 2012)Dependability Engineering 2 (CS 5032 2012)
Dependability Engineering 2 (CS 5032 2012)Ian Sommerville
 
Dependablity Engineering 1 (CS 5032 2012)
Dependablity Engineering 1 (CS 5032 2012)Dependablity Engineering 1 (CS 5032 2012)
Dependablity Engineering 1 (CS 5032 2012)Ian Sommerville
 
High dependability of the automated systems
High dependability of the automated systemsHigh dependability of the automated systems
High dependability of the automated systemsAlan Tatourian
 
Introduction to Embedded Systems.pptx
Introduction to Embedded Systems.pptxIntroduction to Embedded Systems.pptx
Introduction to Embedded Systems.pptxKIRAN CHIPPALA
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance Systemprakashjjaya
 
Safety control systems: Some essential considerations
Safety control systems: Some essential considerationsSafety control systems: Some essential considerations
Safety control systems: Some essential considerationsOptima Control Solutions
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computingPalani murugan
 
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...vtunotesbysree
 
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptx
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptxUNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptx
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptxSKILL2021
 
AWS Community Day - Vitaliy Shtym - Pragmatic Container Security
AWS Community Day - Vitaliy Shtym - Pragmatic Container SecurityAWS Community Day - Vitaliy Shtym - Pragmatic Container Security
AWS Community Day - Vitaliy Shtym - Pragmatic Container SecurityAWS Chicago
 
Trends in Embedded system Design
Trends in Embedded system DesignTrends in Embedded system Design
Trends in Embedded system DesignRaman Deep
 
Section01 overview (1)
Section01 overview (1)Section01 overview (1)
Section01 overview (1)Vimarsh Padha
 
Automotive communication systems: from dependability to security
Automotive communication systems: from dependability to securityAutomotive communication systems: from dependability to security
Automotive communication systems: from dependability to securityNicolas Navet
 
Automotive communication systems: from dependability to security
Automotive communication systems: from dependability to securityAutomotive communication systems: from dependability to security
Automotive communication systems: from dependability to securityRealTime-at-Work (RTaW)
 

Semelhante a CS 5032 L8 dependability engineering 2 2013 (20)

Dependability Engineering 2 (CS 5032 2012)
Dependability Engineering 2 (CS 5032 2012)Dependability Engineering 2 (CS 5032 2012)
Dependability Engineering 2 (CS 5032 2012)
 
Ch13.pptx
Ch13.pptxCh13.pptx
Ch13.pptx
 
Ch13
Ch13Ch13
Ch13
 
Dependablity Engineering 1 (CS 5032 2012)
Dependablity Engineering 1 (CS 5032 2012)Dependablity Engineering 1 (CS 5032 2012)
Dependablity Engineering 1 (CS 5032 2012)
 
High dependability of the automated systems
High dependability of the automated systemsHigh dependability of the automated systems
High dependability of the automated systems
 
Introduction to Embedded Systems.pptx
Introduction to Embedded Systems.pptxIntroduction to Embedded Systems.pptx
Introduction to Embedded Systems.pptx
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Os
OsOs
Os
 
Safety control systems: Some essential considerations
Safety control systems: Some essential considerationsSafety control systems: Some essential considerations
Safety control systems: Some essential considerations
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computing
 
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
 
Coud discovery chap 5
Coud discovery chap 5Coud discovery chap 5
Coud discovery chap 5
 
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptx
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptxUNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptx
UNIT 1C CHARACTERISTICS _ QUALITY ATT OF ES.pptx
 
AWS Community Day - Vitaliy Shtym - Pragmatic Container Security
AWS Community Day - Vitaliy Shtym - Pragmatic Container SecurityAWS Community Day - Vitaliy Shtym - Pragmatic Container Security
AWS Community Day - Vitaliy Shtym - Pragmatic Container Security
 
Trends in Embedded system Design
Trends in Embedded system DesignTrends in Embedded system Design
Trends in Embedded system Design
 
Section01 overview (1)
Section01 overview (1)Section01 overview (1)
Section01 overview (1)
 
Section01 overview
Section01 overviewSection01 overview
Section01 overview
 
Embedded operating systems
Embedded operating systemsEmbedded operating systems
Embedded operating systems
 
Automotive communication systems: from dependability to security
Automotive communication systems: from dependability to securityAutomotive communication systems: from dependability to security
Automotive communication systems: from dependability to security
 
Automotive communication systems: from dependability to security
Automotive communication systems: from dependability to securityAutomotive communication systems: from dependability to security
Automotive communication systems: from dependability to security
 

Mais de Ian Sommerville

Ultra Large Scale Systems
Ultra Large Scale SystemsUltra Large Scale Systems
Ultra Large Scale SystemsIan Sommerville
 
Dependability requirements for LSCITS
Dependability requirements for LSCITSDependability requirements for LSCITS
Dependability requirements for LSCITSIan Sommerville
 
Conceptual systems design
Conceptual systems designConceptual systems design
Conceptual systems designIan Sommerville
 
Requirements Engineering for LSCITS
Requirements Engineering for LSCITSRequirements Engineering for LSCITS
Requirements Engineering for LSCITSIan Sommerville
 
An introduction to LSCITS
An introduction to LSCITSAn introduction to LSCITS
An introduction to LSCITSIan Sommerville
 
Internet worm-case-study
Internet worm-case-studyInternet worm-case-study
Internet worm-case-studyIan Sommerville
 
Designing software for a million users
Designing software for a million usersDesigning software for a million users
Designing software for a million usersIan Sommerville
 
CS5032 Case study Ariane 5 launcher failure
CS5032 Case study Ariane 5 launcher failureCS5032 Case study Ariane 5 launcher failure
CS5032 Case study Ariane 5 launcher failureIan Sommerville
 
L17 CS5032 critical infrastructure
L17 CS5032 critical infrastructureL17 CS5032 critical infrastructure
L17 CS5032 critical infrastructureIan Sommerville
 

Mais de Ian Sommerville (13)

Ultra Large Scale Systems
Ultra Large Scale SystemsUltra Large Scale Systems
Ultra Large Scale Systems
 
Resp modellingintro
Resp modellingintroResp modellingintro
Resp modellingintro
 
Resilience and recovery
Resilience and recoveryResilience and recovery
Resilience and recovery
 
LSCITS-engineering
LSCITS-engineeringLSCITS-engineering
LSCITS-engineering
 
Requirements reality
Requirements realityRequirements reality
Requirements reality
 
Dependability requirements for LSCITS
Dependability requirements for LSCITSDependability requirements for LSCITS
Dependability requirements for LSCITS
 
Conceptual systems design
Conceptual systems designConceptual systems design
Conceptual systems design
 
Requirements Engineering for LSCITS
Requirements Engineering for LSCITSRequirements Engineering for LSCITS
Requirements Engineering for LSCITS
 
An introduction to LSCITS
An introduction to LSCITSAn introduction to LSCITS
An introduction to LSCITS
 
Internet worm-case-study
Internet worm-case-studyInternet worm-case-study
Internet worm-case-study
 
Designing software for a million users
Designing software for a million usersDesigning software for a million users
Designing software for a million users
 
CS5032 Case study Ariane 5 launcher failure
CS5032 Case study Ariane 5 launcher failureCS5032 Case study Ariane 5 launcher failure
CS5032 Case study Ariane 5 launcher failure
 
L17 CS5032 critical infrastructure
L17 CS5032 critical infrastructureL17 CS5032 critical infrastructure
L17 CS5032 critical infrastructure
 

CS 5032 L8 dependability engineering 2 2013

  • 1. Dependable systems architectures Lecture 2 Dependable systems architectures, 2013 Slide 1
  • 2. System fault tolerance • Fault tolerance means that the system can continue in operation in spite of software fault i.e. the fault does not lead to a failure • Fault tolerance is required where there are high availability requirements, no ‘fail safe’ state or where system failure costs are very high. • Even if the system has been proved to conform to its specification, it must also be fault tolerant as there may be specification errors or the validation may be incorrect. Dependable systems architectures, 2013 Slide 2
  • 3. Dependable system architectures • Dependable systems architectures are used in situations where fault tolerance is essential. These architectures are generally all based on redundancy and diversity. • Examples of situations where dependable architectures are used: – Flight control systems, where system failure could threaten the safety of passengers – Reactor systems where failure of a control system could lead to a chemical or nuclear emergency – Telecommunication systems, where there is a need for 24/7 availability. Dependable systems architectures, 2013 Slide 3
  • 4. Protection systems • A specialized system that is associated with some other control system, which can take emergency action if a failure occurs. – System to stop a train if it passes a red light – System to shut down a reactor if temperature/pressure are too high • Protection systems independently monitor the controlled system and the environment. • If a problem is detected, it issues commands to take emergency action to shut down the system and avoid a catastrophe. Dependable systems architectures, 2013 Slide 4
  • 5. Sizewell B reactor • Software controlled protection system in Sizewell B reactor • Go-live in 1993 • Software protection system has been claimed to have a 1/10000 PFD Dependable systems architectures, 2013 Slide 5
  • 6. Protection system architecture Dependable systems architectures, 2013 Slide 6
  • 7. Protection system functionality • Protection systems are redundant because they include monitoring and control capabilities that replicate those in the control software. • Protection systems should be diverse and use different technology from the control software. • They are simpler than the control system so more effort can be expended in validation and dependability assurance. • Aim is to ensure that there is a low probability of failure on demand for the protection system. Dependable systems architectures, 2013 Slide 7
  • 8. Self-monitoring architectures • Multi-channel architectures where the system monitors its own operations and takes action if inconsistencies are detected. • The same computation is carried out on each channel and the results are compared. If the results are identical and are produced at the same time, then it is assumed that the system is operating correctly. • If the results are different, then a failure is assumed and a failure exception is raised. Dependable systems architectures, 2013 Slide 8
  • 10. Self-monitoring systems • Hardware in each channel has to be diverse so that common mode hardware failure will not lead to each channel producing the same results. • Software in each channel must also be diverse, otherwise the same software error would affect each channel. • If high-availability is required, you may use several self-checking systems in parallel. – This is the approach used in the Airbus family of aircraft for their flight control systems. Dependable systems architectures, 2013 Slide 10
  • 11. Airbus 340 • Airbus were the first commercial aircraft manufacturer to use ‘fly by wire’ flight control systems • Fly by wire systems are lighter (saving fuel), can be more fuel efficient and can detect and prevent pilot actions that are potentially dangerous. Dependable systems architectures, 2013 Slide 11
  • 12. Airbus flight control system architecture Dependable systems architectures, 2013 Slide 12
  • 13. Airbus architecture discussion • The Airbus FCS has 5 separate computers, any one of which can run the control software. • Extensive use has been made of diversity – Primary and secondary systems use different processors. – Primary and secondary systems use chipsets from different manufacturers. – Software in secondary systems is less complex than in primary system – provides only critical functionality. – Software in each channel is developed in different programming languages by different teams. – Different programming languages used in primary and secondary systems. Dependable systems architectures, 2013 Slide 13
  • 14. Hardware fault tolerance • Depends on triple-modular redundancy (TMR). • Three identical components that receive the same input and whose outputs are compared. • If one output is different, it is ignored and component failure is assumed. • Based on most faults resulting from component failures rather than design faults and a low Dependable systems architectures, 2013 probability of simultaneous Slide 14
  • 15. Triple modular redundancy Dependable systems architectures, 2013 Slide 15
  • 16. N-version programming • Multiple versions of a software system carry out computations at the same time. There should be an odd number of computers involved, typically 3. • The results are compared using a voting system and the majority result is taken to be the correct result. • Approach derived from the notion of triple-modular redundancy, as used in hardware systems. Dependable systems architectures, 2013 Slide 16
  • 17. N-version programming Dependable systems architectures, 2013 Slide 17
  • 18. N-version programming • The different system versions are designed and implemented by different teams. It is assumed that there is a low probability that they will make the same mistakes. The algorithms used should but may not be different. • There is some empirical evidence that teams commonly misinterpret specifications in the same way and chose the same algorithms in their systems. Dependable systems architectures, 2013 Slide 18
  • 19. Software diversity • Approaches to software fault tolerance embedded in the system architecture depend on software diversity where it is assumed that different implementations of the same software specification will fail in different ways. • It is assumed that implementations are (a) independent and (b) do not include common errors. • Strategies to achieve diversity – Different programming languages – Different design methods and tools – Explicit specification of different algorithms Dependable systems architectures, 2013 Slide 19
  • 20. Problems with design diversity • Teams are not culturally diverse so they tend to tackle problems in the same way • Characteristic errors – Different teams make the same mistakes. Some parts of an implementation are more difficult than others so all teams tend to make mistakes in the same place; • Specification errors; – If there is an error in the specification then this is reflected in all implementations; – This can be addressed to some extent by using multiple specification representations. Dependable systems architectures, 2013 Slide 20
  • 21. Specification dependency • Both approaches to software redundancy are susceptible to specification errors. If the specification is incorrect, the system could fail • This is also a problem with hardware but software specifications are usually more complex than hardware specifications and harder to validate. • This has been addressed in some cases by developing separate software specifications from the same user specification. Dependable systems architectures, 2013 Slide 21
  • 22. Improvements in practice • In principle, if diversity and independence can be achieved, multi-version programming leads to very significant improvements in reliability and availability. • In practice, observed improvements are much less significant but the approach seems leads to reliability improvements of between 5 and 9 times. • Are such improvements are worth the considerable extra development costs for multi-version programming? • Is it cheaper to increase the reliability of a single version using fault avoidance and fault detection Dependabletechniques? systems architectures, 2013 Slide 22
  • 23. Key points • Dependable system architectures are system architectures that are designed for fault tolerance. • Architectural styles that support fault tolerance include protection systems, self-monitoring architectures and N-version programming. • Software diversity is difficult to achieve because it is practically impossible to ensure that each version of the software is truly independent. Dependable systems architectures, 2013 Slide 23