CprE 458/558: Real-Time Systems

Lecture 17: Fault-tolerant design techniques

G. Manimaran (ISU)

Fault Tolerant Strategies
    Fault tolerance in a computer system is achieved through
     redundancy in hardware, software, information, and/or
     computations. Such redundancy can be implemented in
     static, dynamic, or hybrid configurations.
    Fault tolerance can be achieved by many techniques:
        Fault masking is any process that prevents faults in a system
         from introducing errors. Examples: error-correcting memories and
         majority voting.
        Reconfiguration is the process of eliminating a faulty component
         from a system and restoring the system to some operational state.




Reconfiguration Approach
    Fault detection is the process of recognizing that a fault
     has occurred. Fault detection is often required before any
     recovery procedure can be initiated.
    Fault location is the process of determining where a fault
     has occurred so that an appropriate recovery can be
     initiated.
    Fault containment is the process of isolating a fault and
     preventing the effects of that fault from propagating
     throughout the system.
    Fault recovery is the process of remaining operational or
     regaining operational status via reconfiguration even in the
     presence of faults.

The Concept of Redundancy
    Redundancy is simply the addition of information,
     resources, or time beyond what is needed for normal
     system operation.
    Hardware redundancy is the addition of extra
     hardware, usually for the purpose of either detecting or
     tolerating faults.
    Software redundancy is the addition of extra software,
     beyond what is needed to perform a given function, to
     detect and possibly tolerate faults.
    Information redundancy is the addition of extra
     information beyond that required to implement a given
     function; for example, error detection codes.
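
As an illustration of information redundancy, the sketch below appends a single even-parity bit to a data word. This is only one simple error detection code, chosen here for brevity; the function names `add_parity` and `parity_ok` are hypothetical and not part of the lecture.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """Return True if the word (data bits + parity bit) has an even number of 1s."""
    return sum(word) % 2 == 0


word = add_parity([1, 0, 1, 1])   # three 1s in the data, so the parity bit is 1
corrupted = word.copy()
corrupted[2] ^= 1                 # a single-bit fault flips one data bit

print(parity_ok(word))       # True
print(parity_ok(corrupted))  # False
```

A single parity bit detects any odd number of flipped bits but cannot locate or correct them; error-correcting memories use stronger codes for that.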


The Concept of Redundancy (Cont’d)
    Time redundancy uses additional time to perform the
     functions of a system such that fault detection and often
     fault tolerance can be achieved. Time redundancy
     tolerates transient faults.

    The use of redundancy can provide additional capabilities
     within a system. However, redundancy can have a significant
     impact on a system's performance, size, weight, power
     consumption, and reliability.
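
Time redundancy for transient faults can be sketched as simple re-execution: the same operation is attempted again, spending extra time instead of extra hardware. The example below is a minimal sketch under stated assumptions; `run_with_retries` and `FlakySensor` are hypothetical names, and a fault is assumed to surface as a `RuntimeError`.

```python
def run_with_retries(task, max_attempts=3):
    """Re-execute task until it succeeds or the attempts are used up."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except RuntimeError as err:   # assumption: faults surface as RuntimeError
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


class FlakySensor:
    """Hypothetical sensor whose first `failures` reads hit a transient fault."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("transient fault")
        return 42.0


sensor = FlakySensor(failures=2)
value = run_with_retries(sensor.read)
print(value, sensor.calls)  # 42.0 3
```

The cost shows up as time: in a real-time system the worst-case number of re-executions has to fit within the task's deadline. A permanent fault is not tolerated by this scheme, since every retry fails the same way.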




Hardware Redundancy
    Passive techniques use the concept of fault masking.
     These techniques are designed to achieve fault tolerance
     without requiring any action on the part of the system.
     They rely on voting mechanisms.
    Active techniques achieve fault tolerance by detecting
     the existence of faults and performing some action to
     remove the faulty hardware from the system. That is,
     active techniques use fault detection, fault location, and
     fault recovery in an attempt to achieve fault tolerance.
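
One common form of active redundancy is standby sparing, sketched below. This is an illustrative assumption rather than a scheme named on the slide: the class `StandbySystem`, the `is_faulty` check, and the two example modules are all hypothetical. Fault location is trivial here because only the active module produces output.

```python
class StandbySystem:
    """One active module plus spares that are switched in when a fault is detected."""

    def __init__(self, modules, is_faulty):
        self.spares = list(modules)
        self.active = self.spares.pop(0)
        self.is_faulty = is_faulty        # fault detection check on each result

    def compute(self, x):
        while True:
            result = self.active(x)
            if not self.is_faulty(result):
                return result
            if not self.spares:
                raise RuntimeError("no spares left")
            # Recovery: remove the faulty module and switch in the next spare.
            self.active = self.spares.pop(0)


def stuck_module(x):
    return None          # hypothetical faulty hardware: produces no valid output


def healthy_module(x):
    return 2 * x


system = StandbySystem([stuck_module, healthy_module], is_faulty=lambda r: r is None)
print(system.compute(4))  # 8
```

Unlike fault masking, the faulty module's output is briefly visible inside the system before the switch, so the detection check has to run before the result is used.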




Hardware Redundancy (Cont’d)


    Hybrid techniques combine the attractive features of
     both the passive and active approaches.
        Fault masking is used in hybrid systems to prevent erroneous
         results from being generated.
        Fault detection, location, and recovery are also used to improve
         fault tolerance by removing faulty hardware and replacing it with
         spares.




Hardware Redundancy - A Taxonomy

     [Figure: hard-fault.fig (image not included in this transcript)]




Triple Modular Redundancy (TMR)


     [Figure: tmr1.fig (image not included in this transcript)]
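
Since the TMR figure is not available here, the following is a minimal sketch of the usual arrangement: three modules compute the same function and a majority voter masks the output of a single faulty module. The names `majority_vote` and `tmr` and the example modules are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules, or raise if none."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among module outputs")
    return value


def tmr(modules, x):
    """Run all three modules on the same input and vote on their outputs."""
    return majority_vote([module(x) for module in modules])


def good(x):
    return x * x


def faulty(x):
    return x * x + 1     # hypothetical faulty module: wrong result


print(tmr([good, faulty, good], 7))  # 49
```

The voter masks one faulty module without any detection or reconfiguration step. Two modules failing in the same way would win the vote, and the voter itself is a single point of failure in this sketch.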




Software Redundancy - to Detect Software Faults
    There are two popular approaches: N-Version
     Programming (NVP) and Recovery Blocks (RB).

    NVP is a forward recovery scheme - it masks faults.
    NVP: multiple versions of the same task are executed
     concurrently.
    NVP relies on voting.

    RB is a backward error recovery scheme.
    RB: the versions of a task are executed serially.
    RB relies on an acceptance test.

N-Version Programming (NVP)
    NVP is based on the principle of design diversity, that is, having
     different teams of programmers code the same software module to
     obtain multiple versions.

    Diversity can also be introduced by employing different algorithms for
     obtaining the same solution or by choosing different programming
     languages.

    NVP can tolerate both hardware and software faults.

    Correlated faults are not tolerated by NVP.

    In NVP, deciding the number of versions required to ensure acceptable
     levels of software reliability is an important design consideration.
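
The NVP idea can be sketched as follows: three independently written versions of the same function run concurrently and a voter picks the majority result. This is a toy illustration, not the lecture's implementation; the function names are hypothetical, diversity is represented by three different algorithms for the sum 1 + 2 + ... + n, and one version carries a deliberate off-by-one bug.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def version_formula(n):
    return n * (n + 1) // 2          # closed-form sum


def version_builtin(n):
    return sum(range(1, n + 1))      # built-in sum over the range


def version_buggy(n):
    return sum(range(1, n))          # deliberate software fault: omits n


def nvp(versions, x):
    """Execute all versions concurrently and return the majority result."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        results = list(pool.map(lambda version: version(x), versions))
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("voter found no majority")
    return value


# One faulty version is masked by the other two.
print(nvp([version_formula, version_builtin, version_buggy], 10))  # 55

# Correlated faults: two versions share the same bug and out-vote the correct one.
print(nvp([version_buggy, version_buggy, version_formula], 10))    # 45
```

The second call shows why correlated faults defeat NVP: the voter has no way to tell an agreeing wrong majority from a correct one.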


N-Version Programming (Cont’d)

     [Figure: nvp.fig (image not included in this transcript)]




Recovery Blocks (RB)
    RB uses multiple alternates (backups) to perform the same
     function; one module (task) is primary and the others are
     secondary.

    The primary task executes first. When the primary task
     completes execution, its outcome is checked by an
     acceptance test.

    If the output is not acceptable, the effects of the primary
     are undone (i.e., the system rolls back to the state at
     which the primary was invoked) and a secondary task
     executes. This repeats until either an acceptable output
     is obtained or the alternates are exhausted.
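
The control flow described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the state is a plain dictionary, the checkpoint is a deep copy taken when the block is entered, and the names `recovery_block`, `primary`, and `secondary` are hypothetical.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try the alternates serially, rolling back the state after each rejected result."""
    checkpoint = copy.deepcopy(state)        # state at which the primary is invoked
    for alternate in alternates:             # primary first, then the secondaries
        result = alternate(state)
        if acceptance_test(result):
            return result
        # Backward recovery: undo the effects of the rejected alternate.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates exhausted")


def primary(state):
    state["log"].append("primary")
    return -1.0          # hypothetical faulty result, outside the accepted range


def secondary(state):
    state["log"].append("secondary")
    return 21.5


state = {"log": []}
result = recovery_block(state, [primary, secondary], lambda t: 0.0 <= t <= 100.0)
print(result, state["log"])  # 21.5 ['secondary']
```

The log shows the rollback: the primary's side effect is gone by the time the secondary runs. Because the alternates execute one after another, the worst-case execution time is the sum over all alternates plus the rollbacks.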

Recovery Blocks (Cont’d)
     [Figure: rblocks.fig (image not included in this transcript)]




Recovery Blocks (Cont’d)
    The acceptance tests are usually sanity checks; these
     consist of making sure that the output is within a certain
     acceptable range or that the output does not change at
     more than the allowed maximum rate.

    Selecting the range for the acceptance test is crucial. If the
     allowed ranges are too small, the acceptance tests may
     label correct outputs as bad. If they are too large,
     incorrect outputs are more likely to be accepted.

    RB can tolerate software faults because the alternates are
     usually implemented with different approaches; RB is also
     known as the Primary-Backup approach.
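
A sanity-check acceptance test of the kind described above might combine a range check with a rate-of-change check. The sketch below is illustrative only; `make_acceptance_test` and all thresholds are hypothetical values picked for the example.

```python
def make_acceptance_test(low, high, max_step):
    """Build a test that checks the output range and its maximum rate of change."""

    def accept(previous, current):
        in_range = low <= current <= high
        # The first output has no predecessor, so only the range is checked.
        rate_ok = previous is None or abs(current - previous) <= max_step
        return in_range and rate_ok

    return accept


accept = make_acceptance_test(low=0.0, high=150.0, max_step=5.0)
print(accept(None, 72.0))    # True
print(accept(72.0, 74.5))    # True
print(accept(72.0, 95.0))    # False: in range, but changes too fast
print(accept(72.0, 200.0))   # False: out of range

# A range that is too tight labels a correct output as bad.
too_tight = make_acceptance_test(low=70.0, high=73.0, max_step=5.0)
print(too_tight(72.0, 74.5))  # False
```

The last call shows the trade-off in choosing the range: the same output that the first test accepts is rejected once the allowed range is narrowed.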


Fault tolerance

  • 1. CprE 458/558: Real-Time Systems Lecture 17 Fault-tolerant design techniques CprE 458/558 G. Manimaran (ISU)
  • 2. Fault Tolerant Strategies  Fault tolerance in computer system is achieved through redundancy in hardware, software, information, and/or computations. Such redundancy can be implemented in static, dynamic, or hybrid configurations.  Fault tolerance can be achieved by many techniques:  Fault masking is any process that prevents faults in a system from introducing errors. Example: Error correcting memories and majority voting.  Reconfiguration is the process of eliminating faulty component from a system and restoring the system to some operational state. CprE 458/558 G. Manimaran (ISU) 2
  • 3. Reconfiguration Approach
      • Fault detection is the process of recognizing that a fault has occurred. Fault detection is often required before any recovery procedure can be initiated.
      • Fault location is the process of determining where a fault has occurred so that an appropriate recovery can be initiated.
      • Fault containment is the process of isolating a fault and preventing the effects of that fault from propagating throughout the system.
      • Fault recovery is the process of remaining operational or regaining operational status via reconfiguration even in the presence of faults.
  • 4. The Concept of Redundancy
      • Redundancy is simply the addition of information, resources, or time beyond what is needed for normal system operation.
      • Hardware redundancy is the addition of extra hardware, usually for the purpose of either detecting or tolerating faults.
      • Software redundancy is the addition of extra software, beyond what is needed to perform a given function, to detect and possibly tolerate faults.
      • Information redundancy is the addition of extra information beyond that required to implement a given function; for example, error detection codes.
  • 5. The Concept of Redundancy (Cont’d)
      • Time redundancy uses additional time to perform the functions of a system such that fault detection and often fault tolerance can be achieved. Transient faults can be tolerated this way.
      • The use of redundancy can provide additional capabilities within a system. However, redundancy can have a significant impact on a system's performance, size, weight, power consumption, and reliability.
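
As a rough illustration of time redundancy, the sketch below re-executes a computation and accepts a result only when two consecutive runs agree. The function name `run_with_time_redundancy` and the `max_attempts` parameter are assumptions made for this example; they do not come from the lecture.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(compute: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-run `compute` until two consecutive runs return the same value.

    Agreement between two runs is taken as evidence that no transient
    fault corrupted the result; a mismatch triggers another execution.
    (Hypothetical helper, not part of the original slides.)
    """
    previous = compute()
    for _ in range(max_attempts - 1):
        current = compute()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("no two consecutive runs agreed")
```

The extra executions are the price of the technique: the time spent grows with the number of attempts. A permanent fault that corrupts every run in the same way would go undetected by this comparison.
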
  • 6. Hardware Redundancy
      • Passive techniques use the concept of fault masking. These techniques are designed to achieve fault tolerance without requiring any action on the part of the system. They rely on voting mechanisms.
      • Active techniques achieve fault tolerance by detecting the existence of faults and performing some action to remove the faulty hardware from the system. That is, active techniques use fault detection, fault location, and fault recovery in an attempt to achieve fault tolerance.
  • 7. Hardware Redundancy (Cont’d)
      • Hybrid techniques combine the attractive features of both the passive and active approaches.
      • Fault masking is used in hybrid systems to prevent erroneous results from being generated.
      • Fault detection, location, and recovery are also used to improve fault tolerance by removing faulty hardware and replacing it with spares.
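
The hybrid idea can be sketched in software terms as voting plus sparing. In the minimal example below, modules are modeled as plain Python functions, a majority vote masks the fault, disagreement with the majority serves as fault detection and location, and swapping in a spare stands in for recovery. The name `voted_step_with_spares` and the assumption that spares are fault-free are illustrative choices, not part of the lecture.

```python
from collections import Counter
from typing import Callable, List


def voted_step_with_spares(
    active: List[Callable[[int], int]],
    spares: List[Callable[[int], int]],
    x: int,
) -> int:
    """Run one voted computation and replace any disagreeing module.

    Hypothetical sketch: `active` and `spares` are modified in place.
    """
    outputs = [module(x) for module in active]

    # Fault masking: the majority value is the result delivered to the system.
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(active):
        raise RuntimeError("no majority among active modules")

    # Detection and location: a module that disagrees with the majority
    # is treated as faulty. Recovery: swap in a spare if one is left.
    for index, output in enumerate(outputs):
        if output != value and spares:
            active[index] = spares.pop(0)
    return value
```
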
  • 8. Hardware Redundancy - A Taxonomy [Figure: hard-fault.fig]
  • 9. Triple Modular Redundancy (TMR) [Figure: tmr1.fig]
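
Since the TMR figure did not survive extraction, a small software analogue may help: three replicas of the same module receive the same input and a majority voter selects the output. This is only a sketch under the assumption that modules can be modeled as functions; `majority_vote`, `tmr`, `square`, and `stuck_at_zero` are names invented for the example.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by more than half of the modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among module outputs")
    return value


def tmr(modules: Sequence[Callable[[int], int]], x: int) -> int:
    """Feed the same input to three replicated modules and vote on the outputs."""
    if len(modules) != 3:
        raise ValueError("TMR needs exactly three modules")
    return majority_vote([module(x) for module in modules])


def square(x: int) -> int:
    return x * x


def stuck_at_zero(x: int) -> int:
    return 0  # hypothetical faulty replica


print(tmr([square, square, stuck_at_zero], 3))  # prints 9: the single fault is masked
```

With three modules the voter masks one faulty output. If two replicas fail in the same way, the wrong value wins the vote; if all three disagree, this voter raises an error because no majority exists.
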
  • 10. Software Redundancy - to Detect Software Faults
      • There are two popular approaches: N-Version Programming (NVP) and Recovery Blocks (RB).
      • NVP is a forward recovery scheme: it masks faults.
      • NVP: multiple versions of the same task are executed concurrently.
      • NVP relies on voting.
      • RB is a backward error recovery scheme.
      • RB: the versions of a task are executed serially.
      • RB relies on an acceptance test.
  • 11. N-Version Programming (NVP)
      • NVP is based on the principle of design diversity, that is, having different teams of programmers code the same software module to obtain multiple versions.
      • Diversity can also be introduced by employing different algorithms for obtaining the same solution or by choosing different programming languages.
      • NVP can tolerate both hardware and software faults.
      • Correlated faults are not tolerated by NVP.
      • In NVP, deciding the number of versions required to ensure acceptable levels of software reliability is an important design consideration.
  • 12. N-Version Programming (Cont’d) [Figure: nvp.fig]
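
The NVP scheme described above can be sketched as follows: independently written versions of one task run concurrently and a voter picks the majority result. The three summation versions, including the deliberately faulty `sum_off_by_one`, and the helper names are assumptions made for illustration; real NVP versions would be developed by separate teams.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def sum_formula(n: int) -> int:
    return n * (n + 1) // 2  # closed-form version


def sum_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):  # iterative version
        total += i
    return total


def sum_off_by_one(n: int) -> int:
    return sum(range(1, n))  # deliberately faulty version (omits n)


def run_version(version: Callable[[int], int], arg: int) -> Tuple[str, object]:
    """Run one version; a crash is reported instead of propagated."""
    try:
        return ("ok", version(arg))
    except Exception:
        return ("failed", None)


def n_version_execute(versions: Sequence[Callable[[int], int]], arg: int) -> int:
    """Execute all versions concurrently and return the majority result."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        outcomes = list(pool.map(lambda v: run_version(v, arg), versions))
    votes = Counter(result for status, result in outcomes if status == "ok")
    if votes:
        value, count = votes.most_common(1)[0]
        if count * 2 > len(versions):  # strict majority of all versions
            return value
    raise RuntimeError("voter found no majority")
```

Calling `n_version_execute([sum_formula, sum_loop, sum_off_by_one], 10)` returns 55, because the two correct versions outvote the faulty one. If two of the three versions shared the same fault, the voter would return their common wrong value, which is why correlated faults are not tolerated.
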
  • 13. Recovery Blocks (RB)
      • RB uses multiple alternates (backups) to perform the same function; one module (task) is primary and the others are secondary.
      • The primary task executes first. When the primary task completes execution, its outcome is checked by an acceptance test.
      • If the output is not acceptable, the effects of the primary are undone (i.e., the state is rolled back to the point at which the primary was invoked) and a secondary task executes. This repeats until either an acceptable output is obtained or the alternates are exhausted.
  • 14. Recovery Blocks (Cont’d) [Figure: rblocks.fig]
  • 15. Recovery Blocks (Cont’d)
      • The acceptance tests are usually sanity checks; these consist of making sure that the output is within a certain acceptable range or that the output does not change at more than the allowed maximum rate.
      • Selecting the range for the acceptance test is crucial. If the allowed ranges are too small, the acceptance tests may label correct outputs as bad. If they are too large, the probability that incorrect outputs will be accepted is higher.
      • RB can tolerate software faults because the alternates are usually implemented with different approaches; RB is also known as the Primary-Backup approach.
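
The recovery block scheme can be sketched as a checkpoint, a serial run of alternates, and an acceptance test that combines a range check with a rate-of-change check. Everything named here (`recovery_block`, `make_acceptance_test`, the sample `primary` and `secondary` tasks, and the numeric limits) is a hypothetical illustration and not taken from the lecture.

```python
import copy
from typing import Any, Callable, Dict, Sequence


def make_acceptance_test(low: float, high: float,
                         last_output: float, max_step: float) -> Callable[[float], bool]:
    """Sanity check: output must be in range and must not change too fast."""
    def test(output: float) -> bool:
        return low <= output <= high and abs(output - last_output) <= max_step
    return test


def recovery_block(state: Dict[str, Any],
                   alternates: Sequence[Callable[[Dict[str, Any]], Any]],
                   acceptance_test: Callable[[Any], bool]) -> Any:
    """Try the primary, then each secondary, rolling back between attempts."""
    checkpoint = copy.deepcopy(state)  # state at which the primary is invoked
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # a crash is treated like a failed acceptance test
        # Undo the effects of the rejected alternate before trying the next one.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("all alternates exhausted")


def primary(state: Dict[str, Any]) -> float:
    state["samples"].append(-1.0)  # side effect that must be rolled back
    return 1e9                     # faulty, out-of-range output


def secondary(state: Dict[str, Any]) -> float:
    samples = state["samples"]
    return sum(samples) / len(samples)  # simpler, independently written backup
```

For example, with `state = {"samples": [20.0, 21.0, 22.0]}` and `make_acceptance_test(0.0, 100.0, last_output=20.5, max_step=5.0)`, the primary's output is rejected, the appended sample is rolled back, and the secondary's result of 21.0 is accepted. Tightening `max_step` too far would reject this correct output as well, which is the trade-off noted on the slide.
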