APPLICATION LEVEL CHECKPOINT-BASED
APPROACH FOR CRUSH FAILURE IN
DISTRIBUTED SYSTEM
Presented By
Moh Moh Khaing
OUTLINES
• Abstract
• Introduction
• Objectives
• Background Theory
• Proposed System
• System flow of proposed system
• Two phases of proposed system
• Implementation
• Conclusion
ABSTRACT
• Fault tolerance for computing node failures is an important and critical issue in distributed and parallel processing systems.
• As the number of computing nodes in a network grows concurrently and dynamically, node failures may occur more often.
• This work proposes an application-level, checkpoint-based fault tolerance approach for distributed computing.
• The proposed system uses coordinated checkpointing and systematic process logging as its global monitoring mechanism.
• The proposed system is implemented on a distributed multiple sequence alignment (MSA) application that uses a genetic algorithm (GA).
DISTRIBUTED MULTIPLE SEQUENCE ALIGNMENT WITH GENETIC ALGORITHM (MSAGA)
[Figure: DNA Sequences (2 … n) → Head Node (Division) → three parallel "MSA with GA" workers → Aligned Sequence Result (one per worker) → Combine Alignment Result → Display Result]
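The division and combination steps in the figure can be sketched as follows. This is a minimal sketch, assuming a round-robin split across workers. The names `divide_sequences`, `align_stub`, `combine_results`, and `run` are hypothetical, and `align_stub` is only a placeholder for the MSA-with-GA step, which the slides do not specify.

```python
from concurrent.futures import ThreadPoolExecutor


def divide_sequences(sequences, n_workers):
    """Head node: split the input sequences round-robin, one group per worker."""
    groups = [[] for _ in range(n_workers)]
    for i, seq in enumerate(sequences):
        groups[i % n_workers].append(seq)
    return groups


def align_stub(group):
    """Placeholder for a worker's MSA-with-GA step (assumption).

    It only pads sequences with '-' to a common length; a real worker
    would run the genetic algorithm here.
    """
    width = max((len(s) for s in group), default=0)
    return [s.ljust(width, "-") for s in group]


def combine_results(results):
    """Head node: concatenate the per-worker results in worker order."""
    return [seq for result in results for seq in result]


def run(sequences, n_workers=3):
    """Divide, align in parallel (threads stand in for worker nodes), combine."""
    groups = divide_sequences(sequences, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(align_stub, groups))
    return combine_results(results)
```

Threads stand in for worker nodes here only to keep the sketch self-contained; in the deck the workers are separate machines on a local area network.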
SEQUENCES ALIGNMENT EXAMPLE
Input multiple DNA Sequences
>DNAseq1: AAGGAAGGAAGGAAGGAAGGAAGG
>DNAseq2: AAGGAAGGAATGGAAGGAAGGAAGG
>DNAseq3: AAGGAACGGAATGGTAGGAAGGAAGG
Output for aligned DNA Sequences
>DNAseq1: A-AGGA-AGGA-AGGAA-------GG-----AA-GGAAGG
>DNAseq2: ----------------AAGGAAGGAATGGAAGGAAGGAAGG
>DNAseq3: ----------------AAGGAACGGAATGGTAGGAAGGAAGG
NODE FAILURE CONDITION
• A node failure can occur at three points: when the worker node connects to the head node, when the worker node accepts the input sequence, and when the worker node sends the resulting sequence to the head node. The failure conditions are:
1. The worker node is denied as soon as it has connected to the head node, before doing any job.
2. The worker node rejects the input sequence after the head node and worker node have connected and the head node has prepared the input sequence for the worker node.
3. The worker node sends a “No Send” message to the head node when it should return the result sequence.
4. The worker node crashes when it cannot connect to the head node at the correct address.
5. The worker node crashes when it disconnects from the head node.
COORDINATED CHECKPOINTING
• Checkpointing is used as a fault tolerance mechanism in distributed systems.
• A checkpoint is a snapshot of the current state of a process and assists in monitoring the process.
• Coordinated checkpointing takes checkpoints periodically and saves them in a log file.
• This monitoring information is used when a node failure occurs.
• If a node fails during distributed computing, another available node can reconstruct the process state from the checkpoint information saved for the failed node.
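The save-and-reconstruct idea can be illustrated with a short sketch. It assumes one JSON object per line as the log format, and the state fields `generation` and `best_score` are hypothetical; the slides do not say what a snapshot contains or how the log file is laid out.

```python
import json
import os
import tempfile
import time


def save_checkpoint(log_path, node_id, state):
    """Append one snapshot of a process state to the log file (one JSON object per line)."""
    record = {"node": node_id, "time": time.time(), "state": state}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def restore_latest(log_path, node_id):
    """Return the most recent saved state of node_id, or None if it has none."""
    latest = None
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if record["node"] == node_id:
                latest = record["state"]
    return latest


# worker1 checkpoints twice and then fails; another node resumes from the last snapshot.
log_path = os.path.join(tempfile.mkdtemp(), "checkpoint.log")
save_checkpoint(log_path, "worker1", {"generation": 10, "best_score": 41})
save_checkpoint(log_path, "worker1", {"generation": 20, "best_score": 57})
resumed = restore_latest(log_path, "worker1")
```

Appending rather than overwriting keeps the full checkpoint history, so a replacement node reads only the newest record for the failed node.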
SYSTEMATIC PROCESS LOGGING
• Systematic Process Logging (SPL) is derived from a log-based method.
• The motivation for SPL is to reduce the amount of computation that can be lost, which is bounded by the execution time of a single failed task.
• SPL saves the checkpoint information from coordinated checkpointing in log file format, with the exact time and contents of each checkpoint.
• Depending on the fault, it uses the stored log file to decide which node can accept the job from the failed node.
PROPOSED FAULT TOLERANCE SYSTEM
• The checkpoint-based fault tolerance approach is implemented at the application layer without any operating system support.
• In the distributed multiple sequence alignment application, one head node and one or more worker nodes are connected over a local area network.
• All worker nodes run MSAGA and align the input sequences from the head node independently.
• The proposed fault tolerance system takes a local checkpoint of the MSA process on each worker node, taken by the worker itself, and a global checkpoint of the events of all workers' conditions, taken by the head node.
ARCHITECTURE OF PROPOSED FAULT TOLERANCE SYSTEM
[Figure: the Head Node (with GRM and GCS) is connected over a Local Area Network to Worker 1, Worker 2, and Worker 3, each with its own LCS and LC]
GRM – Global Resource Monitor
GCS – Global Checkpoint Storage
LCS – Local Checkpoint Storage
LC – Local Checkpoint
SYSTEM FLOW OF PROPOSED SYSTEM
[Figure: flowchart from Start to End through two phases. Checkpointing Phase: Coordinated Checkpointing (GRM on the HN, LC on the WN) and Systematic Process Logging (GCS on the HN, LCS on the WN). Load Balancing Phase: GRM and GCS on the HN.]
HN – Head Node
WN – Worker Node
IMPLEMENTATION OF HEAD NODE
Checkpointing Phase
• The global resource monitor (GRM) plays the main role in both the coordinated checkpointing phase and the systematic process logging phase.
• GRM takes the global checkpoint of all worker nodes' events during the coordinated checkpointing phase.
• GCS saves the global checkpoint information in log file format during the systematic process logging phase.
GLOBAL CHECKPOINT
Global Resource Monitor (GRM)
Begin
1. Take global checkpoints of the current condition of each WN, with the WN's IP, port, status, and time duration
2. Detect the failure condition of WNs
3. Find the available worker nodes and decide which node is suitable to continue the failed WN's jobs
End
TYPES OF CHECKPOINT
Checkpoint No | Checkpoint Name | Checkpoint Content
1 | Available | Worker node is connected to the head node and waits for jobs from the head node
2 | Denied | Worker node is disconnected from the server
3 | Busy | Worker node is processing jobs
4 | Receive | Worker node sends the result to the head node and exits, or worker node sends an error message and exits
5 | Crush | Worker node sends the crush message to the head node
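The five checkpoint types map naturally onto an enumeration. The sketch below is one possible encoding, not the presenter's implementation. The helper `is_failure` and its `no_send` flag are hypothetical names; the rule it applies (Denied, Crush, and Receive with “No Send” count as failures) is the one stated in the load balancing phase of this deck.

```python
from enum import Enum


class Checkpoint(Enum):
    AVAILABLE = 1  # connected to the head node, waiting for jobs
    DENIED = 2     # disconnected
    BUSY = 3       # processing jobs
    RECEIVE = 4    # sent a result or an error message, then exited
    CRUSH = 5      # sent the crush message to the head node


def is_failure(checkpoint, no_send=False):
    """True when the checkpoint signals a worker failure.

    Denied and Crush always count as failures; Receive counts only when
    the worker sent the "No Send" error instead of a result.
    """
    if checkpoint in (Checkpoint.DENIED, Checkpoint.CRUSH):
        return True
    return checkpoint is Checkpoint.RECEIVE and no_send
```

For example, `is_failure(Checkpoint.RECEIVE, no_send=True)` is true, while a Receive that carries a result is not treated as a failure.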
CHECKPOINT INFORMATION
• Each checkpoint records five fields:
  ◦ Worker Type shows the worker number,
  ◦ IP Address identifies the WN,
  ◦ Checkpoint Name shows the worker node's condition,
  ◦ Current Time shows the current time of the process,
  ◦ Time Duration shows the time from each worker's running state to its accept and receive states, or from its running state to its reject state.
Worker Type | IP Address | Checkpoint Name | Current Time | Time Duration
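A checkpoint entry with these five fields could be represented as a small record type. This is a sketch under assumptions: the class name `CheckpointRecord`, the field types, the pipe-separated row format, and the sample values are all illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CheckpointRecord:
    worker_type: str          # worker number, e.g. "Worker 1"
    ip_address: str           # identifies the WN
    checkpoint_name: str      # Available, Denied, Busy, Receive, or Crush
    current_time: datetime    # when the checkpoint was taken
    time_duration: timedelta  # time elapsed since the worker's running state began

    def as_row(self):
        """Format the record as one pipe-separated row (format is an assumption)."""
        return " | ".join([
            self.worker_type,
            self.ip_address,
            self.checkpoint_name,
            self.current_time.isoformat(sep=" ", timespec="seconds"),
            f"{self.time_duration.total_seconds():.0f} s",
        ])


# Illustrative values only.
started = datetime(2024, 1, 1, 9, 0, 0)
now = datetime(2024, 1, 1, 9, 0, 42)
row = CheckpointRecord("Worker 1", "192.168.0.11", "Busy", now, now - started).as_row()
```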
AVAILABLE CHECKPOINT OF ALL WORKERS
• GRM takes the checkpoint as Available when all worker nodes are connected to the head node.
CHECKPOINT CHANGES FROM AVAILABLE
GlobalCheckpoint_Available ( )
Begin
1. IF HN and WNs are connected THEN
       GRM takes checkpoint as Available
   END IF
2. IF checkpoint is Available THEN
       IF WN is continuously connected to HN THEN
           HN selects a sequence and sends it to the WN
           IF WN does not accept the sequence THEN
               GRM takes checkpoint as Crush
               The sequence goes to the crush queue
           ELSE
               GRM takes checkpoint as Busy
               WN runs the MSA application
           END IF
       ELSE
           GRM takes checkpoint as Denied
       END IF
   END IF
End
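Step 2 of the pseudocode above is a three-way decision, which can be sketched as a single function. The function name, its parameters, and the use of a `deque` for the crush queue are assumptions made for illustration; the checkpoint names and the branching follow the pseudocode.

```python
from collections import deque


def checkpoint_from_available(still_connected, accepts_sequence, sequence, crush_queue):
    """Return the next checkpoint name for a WN whose checkpoint is Available.

    still_connected  -- the WN is still connected to the HN
    accepts_sequence -- the WN accepted the sequence sent by the HN
    sequence         -- the sequence the HN selected for this WN
    crush_queue      -- queue of sequences waiting to be reassigned
    """
    if not still_connected:
        return "Denied"
    if not accepts_sequence:
        crush_queue.append(sequence)  # the rejected sequence goes to the crush queue
        return "Crush"
    return "Busy"  # the WN now runs the MSA application


crush_queue = deque()
results = [
    checkpoint_from_available(True, True, "seq1", crush_queue),
    checkpoint_from_available(True, False, "seq2", crush_queue),
    checkpoint_from_available(False, False, "seq3", crush_queue),
]
```

Only the rejected sequence ends up in the crush queue; a Denied worker never received one, so nothing is queued for it.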
DETECTING NODE FAILURE BY GRM
BUSY CHECKPOINT OF ALL WORKERS
CHECKPOINT CHANGES FROM BUSY
GlobalCheckpoint_Busy ( )
Begin
1. IF WN accepted the input sequence from HN THEN
       GRM takes checkpoint as Busy
   END IF
2. IF the checkpoint is Busy THEN
       IF WN sends an error message to HN THEN
           GRM takes checkpoint as Receive for error
       ELSE
           GRM takes checkpoint as Receive for result
       END IF
   END IF
End
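The Busy-to-Receive transition in the pseudocode above can be sketched as follows. The message shape (a dict with either an `error` key or a `result` key) and the function name are assumptions; the slides only say that the worker sends either an error message or a result.

```python
def checkpoint_from_busy(message):
    """Return (checkpoint name, kind) for a WN whose checkpoint is Busy.

    `message` is what the WN sent to the HN: a dict holding either an
    "error" entry (for example "No Send") or a "result" entry.
    This message shape is an assumption made for illustration.
    """
    if "error" in message:
        return ("Receive", "error")
    return ("Receive", "result")


ok = checkpoint_from_busy({"result": "AAGG-AAGG"})
failed = checkpoint_from_busy({"error": "No Send"})
```

Keeping the error/result distinction next to the checkpoint name lets a later step treat only the error case as a node failure.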
RECEIVE CHECKPOINT WITH RESULT
RECEIVE CHECKPOINT WITH NO SEND MESSAGE
GLOBAL CHECKPOINT STORAGE (GCS)
Global_Checkpoint_Storage ( )
Begin
1. GCS stores the current condition of every WN in the network, as checkpointed by GRM
2. GCS records the detailed condition of each WN
3. GCS creates the log file for all checkpoints of the nodes
End
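The three GCS steps can be sketched as a small class that keeps the latest condition of each worker and appends every checkpoint to a log file. The class and method names, the tab-separated log format, and the sample addresses are assumptions; the slides do not show the real log layout in text form.

```python
import os
import tempfile
from datetime import datetime


class GlobalCheckpointStorage:
    """Keeps the latest checkpoint of every WN and appends each one to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.current = {}  # WN IP address -> latest checkpoint name

    def store(self, ip, checkpoint, detail="", when=None):
        """Record the current condition of one WN and log it with a timestamp."""
        when = when or datetime.now()
        self.current[ip] = checkpoint
        stamp = when.strftime("%Y-%m-%d %H:%M:%S")
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{stamp}\t{ip}\t{checkpoint}\t{detail}\n")

    def available_nodes(self):
        """IP addresses whose latest checkpoint is Available."""
        return sorted(ip for ip, cp in self.current.items() if cp == "Available")


gcs = GlobalCheckpointStorage(os.path.join(tempfile.mkdtemp(), "gcs.log"))
gcs.store("192.168.0.11", "Available")
gcs.store("192.168.0.12", "Available")
gcs.store("192.168.0.11", "Busy", detail="aligning sequence 1")
```

The in-memory map answers "which nodes are available now", while the log file preserves the full history of checkpoints.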
GCS LOG FILE
LOAD BALANCING PHASE
GRM_LoadBalancing( )
BEGIN
IF (GRM detects Denied or Crush or Receive “No Send”) THEN
    1. The condition is treated as a worker node failure.
    2. GRM finds the available nodes using GCS and decides which node is suitable to receive the job.
    3. If a suitable node exists, the HN sends the failed node's jobs to that available node.
    4. Call the Available and Busy algorithms.
END IF
END
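The reassignment logic above can be sketched as a function over two dictionaries. Several details are assumptions: the label "Receive (No Send)" for the error case, the dictionary-based bookkeeping, and above all the choice of the first available worker as the "suitable" node, since the slides do not state the selection rule.

```python
from collections import deque

FAILURE_CHECKPOINTS = {"Denied", "Crush", "Receive (No Send)"}


def load_balance(checkpoints, jobs):
    """Move jobs from failed WNs to available WNs.

    checkpoints -- dict: WN name -> latest checkpoint name
    jobs        -- dict: WN name -> job currently assigned (or None)
    Returns a list of (job, failed_wn, new_wn) reassignments; both dicts
    are updated in place. Taking the first available WN is an assumption.
    """
    available = deque(wn for wn, cp in checkpoints.items() if cp == "Available")
    moved = []
    for wn, cp in list(checkpoints.items()):
        if cp in FAILURE_CHECKPOINTS and jobs.get(wn) is not None and available:
            target = available.popleft()
            job = jobs[wn]
            jobs[wn] = None
            jobs[target] = job
            checkpoints[target] = "Busy"  # the target now works on the job
            moved.append((job, wn, target))
    return moved


checkpoints = {"WN1": "Crush", "WN2": "Busy", "WN3": "Available"}
jobs = {"WN1": "seq1", "WN2": "seq2", "WN3": None}
moved = load_balance(checkpoints, jobs)
```

If no worker is available, the failed worker's job stays where it is, so a later call can retry once a worker reports Available.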
LOAD BALANCING ACCORDING TO NODE FAILURE
AS DENIED CHECKPOINT
LOAD BALANCING ACCORDING TO NODE FAILURE
AS CRUSH CHECKPOINT
LOAD BALANCING ACCORDING TO NODE FAILURE
AS RECEIVE CHECKPOINT(NO SEND)
IMPLEMENTATION OF WORKER NODE
• The worker node processes the DNA sequences into aligned sequences using the MSAGA application.
• The worker node takes local checkpoints at the application level of MSAGA.
• The worker node implements the checkpointing phase of the proposed fault tolerance system.
• The local checkpoint (LC) and the local checkpoint storage (LCS) play the main role in that phase.
• Every worker node takes its own local checkpoints and has its own local checkpoint storage.
• The local checkpoint (LC) takes all checkpoints of its worker node.
• The local checkpoint storage (LCS) stores the processing state of its worker.
LOCAL CHECKPOINT
• The local checkpoint (LC) is responsible for taking local checkpoints of the worker's process states.
• LC starts taking checkpoints of the worker's processing state when the worker node (WN) connects to the head node.
• LC continues until all of the worker's processes finish normally or the worker leaves the local area network because of a node failure.
LOCAL CHECKPOINT OF EACH WORKER
LocalCheckpoint( )
BEGIN
1. Record WN starting time, ending time, and connection time
2. Record all process states of MSA for the sequence
END
LOCAL CHECKPOINT STORAGE (LCS)
• SPL produces the checkpoint log file and the processing log file for the local condition of each node.
• All local checkpoint monitoring information is therefore stored in the local checkpoint storage (LCS).
• Each WN keeps its own LCS.
LocalCheckpointStorage( )
BEGIN
1. Store WN starting time, ending time, and connection time
2. Store all process states of MSA for the sequence
END
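On the worker side, recording and storing can be sketched together. The class name, the JSON file used as the LCS, and the state fields `generation` and `best_score` are hypothetical; the slides specify only that connection, start, and end times and the MSA process states are recorded and stored per worker.

```python
import json
import os
import tempfile
import time


class LocalCheckpoint:
    """Records a worker's timing and MSA process states and stores them in its own LCS file."""

    def __init__(self, lcs_path):
        self.lcs_path = lcs_path
        # Connection time is taken when the LC is created (assumption).
        self.data = {"connected": time.time(), "started": None, "ended": None, "states": []}

    def start(self):
        self.data["started"] = time.time()

    def record_state(self, state):
        """Record one process state of the MSA run and persist the LCS."""
        self.data["states"].append(state)
        self._store()

    def end(self):
        self.data["ended"] = time.time()
        self._store()

    def _store(self):
        # Rewrite the whole LCS file so it always holds the latest state.
        with open(self.lcs_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)


lc = LocalCheckpoint(os.path.join(tempfile.mkdtemp(), "lcs_worker1.json"))
lc.start()
lc.record_state({"generation": 1, "best_score": 12})  # hypothetical GA state fields
lc.record_state({"generation": 2, "best_score": 15})
lc.end()
with open(lc.lcs_path, encoding="utf-8") as f:
    stored = json.load(f)
```

Persisting after every recorded state means the LCS file is usable even if the worker fails before `end()` is reached.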
LCS LOG FILE
CONCLUSION
• The GRM takes correct checkpoints regardless of the number of worker nodes.
• GRM can tell an old worker node from a new one exactly when a worker node connects to the head node again.
• While GRM takes the checkpoint for one worker node, the remaining workers do not need to stop their operation. Checkpointing therefore does not block the worker nodes.
• With this approach, the distributed multiple sequence alignment can keep running to the final result when a node failure occurs in the network.
• The system computes the exact execution time of each worker node and of the whole system. Its checkpoints are portable and require no operating system support.
THANK YOU!!

Mais conteúdo relacionado

Mais procurados

SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsSAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsQin Liu
 
OSMC 2021 | Scaling Naemon deployments to Kubernetes with Merlin
OSMC 2021 | Scaling Naemon deployments to Kubernetes with MerlinOSMC 2021 | Scaling Naemon deployments to Kubernetes with Merlin
OSMC 2021 | Scaling Naemon deployments to Kubernetes with MerlinNETWAYS
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating SystemsDr Sandeep Kumar Poonia
 
Chapter 18 - Distributed Coordination
Chapter 18 - Distributed CoordinationChapter 18 - Distributed Coordination
Chapter 18 - Distributed CoordinationWayne Jones Jnr
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsZbigniew Jerzak
 
Non integer order controller based robust performance analysis of a conical t...
Non integer order controller based robust performance analysis of a conical t...Non integer order controller based robust performance analysis of a conical t...
Non integer order controller based robust performance analysis of a conical t...Editor Jacotech
 
Process Migration in Heterogeneous Systems
Process Migration in Heterogeneous SystemsProcess Migration in Heterogeneous Systems
Process Migration in Heterogeneous Systemsijsrd.com
 
Gsm kpi optimization
Gsm kpi optimizationGsm kpi optimization
Gsm kpi optimizationBernard Sqa
 
Traffic Based Malicious Switch and DDoS Detection in Software Defined Network
Traffic Based Malicious Switch and DDoS Detection in Software Defined NetworkTraffic Based Malicious Switch and DDoS Detection in Software Defined Network
Traffic Based Malicious Switch and DDoS Detection in Software Defined NetworkAkshaya Arunan
 
Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System ManagementIbrahim Amer
 
Communication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed SystemsCommunication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed Systemsguest61205606
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems SHATHAN
 
resource management
  resource management  resource management
resource managementAshish Kumar
 
Synchronization Pradeep K Sinha
Synchronization Pradeep K SinhaSynchronization Pradeep K Sinha
Synchronization Pradeep K SinhaJawwad Rafiq
 

Mais procurados (20)

SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic AnalyticsSAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics
 
OSMC 2021 | Scaling Naemon deployments to Kubernetes with Merlin
OSMC 2021 | Scaling Naemon deployments to Kubernetes with MerlinOSMC 2021 | Scaling Naemon deployments to Kubernetes with Merlin
OSMC 2021 | Scaling Naemon deployments to Kubernetes with Merlin
 
Resource management
Resource managementResource management
Resource management
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
Chapter 18 - Distributed Coordination
Chapter 18 - Distributed CoordinationChapter 18 - Distributed Coordination
Chapter 18 - Distributed Coordination
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed Systems
 
Non integer order controller based robust performance analysis of a conical t...
Non integer order controller based robust performance analysis of a conical t...Non integer order controller based robust performance analysis of a conical t...
Non integer order controller based robust performance analysis of a conical t...
 
Process Migration in Heterogeneous Systems
Process Migration in Heterogeneous SystemsProcess Migration in Heterogeneous Systems
Process Migration in Heterogeneous Systems
 
Distributed System
Distributed SystemDistributed System
Distributed System
 
Gsm kpi optimization
Gsm kpi optimizationGsm kpi optimization
Gsm kpi optimization
 
Chapter05 new
Chapter05 newChapter05 new
Chapter05 new
 
Traffic Based Malicious Switch and DDoS Detection in Software Defined Network
Traffic Based Malicious Switch and DDoS Detection in Software Defined NetworkTraffic Based Malicious Switch and DDoS Detection in Software Defined Network
Traffic Based Malicious Switch and DDoS Detection in Software Defined Network
 
Distributed System Management
Distributed System ManagementDistributed System Management
Distributed System Management
 
Communication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed SystemsCommunication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed Systems
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
 
resource management
  resource management  resource management
resource management
 
Chapter 6 synchronization
Chapter 6 synchronizationChapter 6 synchronization
Chapter 6 synchronization
 
Process Synchronization
Process SynchronizationProcess Synchronization
Process Synchronization
 
Process Management-Process Migration
Process Management-Process MigrationProcess Management-Process Migration
Process Management-Process Migration
 
Synchronization Pradeep K Sinha
Synchronization Pradeep K SinhaSynchronization Pradeep K Sinha
Synchronization Pradeep K Sinha
 

Semelhante a Grds conferences icst and icbelsh (9)

Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...
Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...
Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...Eswar Publications
 
CS304PC:Computer Organization and Architecture Session 15 program control.pptx
CS304PC:Computer Organization and Architecture Session 15 program control.pptxCS304PC:Computer Organization and Architecture Session 15 program control.pptx
CS304PC:Computer Organization and Architecture Session 15 program control.pptxAsst.prof M.Gokilavani
 
Ch17 OS
Ch17 OSCh17 OS
Ch17 OSC.U
 
Streaming systems - Part 2
Streaming systems - Part 2Streaming systems - Part 2
Streaming systems - Part 2Sandeep Malhotra
 
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...CSCJournals
 
Software rejuvenation based fault tolerance
Software rejuvenation based fault toleranceSoftware rejuvenation based fault tolerance
Software rejuvenation based fault tolerancewww.pixelsolutionbd.com
 
Computer Organization
Computer OrganizationComputer Organization
Computer OrganizationAnish Goel
 
Capturing Monotonic Components from Input Patterns
Capturing Monotonic Components from Input PatternsCapturing Monotonic Components from Input Patterns
Capturing Monotonic Components from Input Patternsijeukens
 
Motorola BSC Overview
Motorola BSC OverviewMotorola BSC Overview
Motorola BSC OverviewFarhan Ahmed
 
Where is my MQ message on z/OS?
Where is my MQ message on z/OS?Where is my MQ message on z/OS?
Where is my MQ message on z/OS?Matt Leming
 
Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"Ra'Fat Al-Msie'deen
 
Formal Verification of Distributed Checkpointing Using Event-B
Formal Verification of Distributed Checkpointing Using Event-BFormal Verification of Distributed Checkpointing Using Event-B
Formal Verification of Distributed Checkpointing Using Event-Bijcsit
 
Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...ijics
 
[White paper] detecting problems in industrial networks though continuous mon...
[White paper] detecting problems in industrial networks though continuous mon...[White paper] detecting problems in industrial networks though continuous mon...
[White paper] detecting problems in industrial networks though continuous mon...TI Safe
 
Operating system Interview Questions
Operating system Interview QuestionsOperating system Interview Questions
Operating system Interview QuestionsKuntal Bhowmick
 

Semelhante a Grds conferences icst and icbelsh (9) (20)

Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...
Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...
Review of Some Checkpointing Schemes for Distributed and Mobile Computing Env...
 
CS304PC:Computer Organization and Architecture Session 15 program control.pptx
CS304PC:Computer Organization and Architecture Session 15 program control.pptxCS304PC:Computer Organization and Architecture Session 15 program control.pptx
CS304PC:Computer Organization and Architecture Session 15 program control.pptx
 
Ch17 OS
Ch17 OSCh17 OS
Ch17 OS
 
OS_Ch17
OS_Ch17OS_Ch17
OS_Ch17
 
Streaming systems - Part 2
Streaming systems - Part 2Streaming systems - Part 2
Streaming systems - Part 2
 
Module3 part1
Module3 part1Module3 part1
Module3 part1
 
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
Hierarchical Non-blocking Coordinated Checkpointing Algorithms for Mobile Dis...
 
Software rejuvenation based fault tolerance
Software rejuvenation based fault toleranceSoftware rejuvenation based fault tolerance
Software rejuvenation based fault tolerance
 
Computer Organization
Computer OrganizationComputer Organization
Computer Organization
 
p2 p grid
 p2 p grid  p2 p grid
p2 p grid
 
Capturing Monotonic Components from Input Patterns
Capturing Monotonic Components from Input PatternsCapturing Monotonic Components from Input Patterns
Capturing Monotonic Components from Input Patterns
 
Real Time System
Real Time SystemReal Time System
Real Time System
 
Motorola BSC Overview
Motorola BSC OverviewMotorola BSC Overview
Motorola BSC Overview
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
Where is my MQ message on z/OS?
Where is my MQ message on z/OS?Where is my MQ message on z/OS?
Where is my MQ message on z/OS?
 
Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"
 
Formal Verification of Distributed Checkpointing Using Event-B
Formal Verification of Distributed Checkpointing Using Event-BFormal Verification of Distributed Checkpointing Using Event-B
Formal Verification of Distributed Checkpointing Using Event-B
 
Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...Integrating fault tolerant scheme with feedback control scheduling algorithm ...
Integrating fault tolerant scheme with feedback control scheduling algorithm ...
 
[White paper] detecting problems in industrial networks though continuous mon...
[White paper] detecting problems in industrial networks though continuous mon...[White paper] detecting problems in industrial networks though continuous mon...
[White paper] detecting problems in industrial networks though continuous mon...
 
Operating system Interview Questions
Operating system Interview QuestionsOperating system Interview Questions
Operating system Interview Questions
 

Mais de Global R & D Services (20)

Wb june ictel
Wb june ictelWb june ictel
Wb june ictel
 
Wb june icrst
Wb june icrstWb june icrst
Wb june icrst
 
Wb june ecg
Wb june ecgWb june ecg
Wb june ecg
 
Wb june icssh
Wb  june icsshWb  june icssh
Wb june icssh
 
Wb june icpbs
Wb  june icpbsWb  june icpbs
Wb june icpbs
 
Wb june icnm
Wb  june icnmWb  june icnm
Wb june icnm
 
Wb june icllr
Wb  june icllrWb  june icllr
Wb june icllr
 
Wb june ichlsr
Wb  june ichlsrWb  june ichlsr
Wb june ichlsr
 
Wb june icbmls
Wb  june icbmlsWb  june icbmls
Wb june icbmls
 
Rome icpbs 2017
Rome icpbs 2017Rome icpbs 2017
Rome icpbs 2017
 
Rome icnm
Rome icnmRome icnm
Rome icnm
 
Romei ecg 2017
Romei ecg 2017Romei ecg 2017
Romei ecg 2017
 
Rome ictel 2017
Rome  ictel 2017Rome  ictel 2017
Rome ictel 2017
 
Rome icssh, ppt
Rome icssh, pptRome icssh, ppt
Rome icssh, ppt
 
Rome icrst 2017
Rome  icrst 2017Rome  icrst 2017
Rome icrst 2017
 
Rome icllr 2017
Rome  icllr 2017Rome  icllr 2017
Rome icllr 2017
 
Rome ichlsr
Rome  ichlsr Rome  ichlsr
Rome ichlsr
 
Rome icbmls 2017
Rome icbmls 2017Rome icbmls 2017
Rome icbmls 2017
 
Wb june ictel
Wb june ictelWb june ictel
Wb june ictel
 
Wb june icrst
Wb june icrstWb june icrst
Wb june icrst
 

Último

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 

Último (20)

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...

Grds conferences icst and icbelsh (9)

  • 1. APPLICATION LEVEL CHECKPOINT-BASED APPROACH FOR CRUSH FAILURE IN DISTRIBUTED SYSTEM Presented By Moh Moh Khaing
  • 2. OUTLINES
      - Abstract
      - Introduction
      - Objectives
      - Background Theory
      - Proposed System
      - System flow of proposed system
      - Two phases of proposed system
      - Implementation
      - Conclusion
  • 3. ABSTRACT
      - Fault tolerance against computing-node failure is an important and critical issue in distributed and parallel processing systems.
      - As the number of computing nodes in a network grows concurrently and dynamically, node failures may occur more often.
      - This system proposes an application-level, checkpoint-based fault tolerance approach for distributed computing.
      - The proposed system uses coordinated checkpointing and systematic process logging as a global monitoring mechanism.
      - The proposed system is implemented on a distributed multiple sequence alignment (MSA) application that uses a genetic algorithm (GA).
  • 4. DISTRIBUTED MULTIPLE SEQUENCE ALIGNMENT WITH GENETIC ALGORITHM (MSAGA)
      [Figure: DNA sequences (2 … n) are divided at the head node; each of three workers runs MSA with GA and returns an aligned sequence result; the alignment results are combined and the result is displayed.]
  • 5. SEQUENCES ALIGNMENT EXAMPLE
      Input multiple DNA sequences:
        >DNAseq1: AAGGAAGGAAGGAAGGAAGGAAGG
        >DNAseq2: AAGGAAGGAATGGAAGGAAGGAAGG
        >DNAseq3: AAGGAACGGAATGGTAGGAAGGAAGG
      Output for aligned DNA sequences:
        >DNAseq1: A-AGGA-AGGA-AGGAA-------GG-----AA-GGAAGG
        >DNAseq2: ----------------AAGGAAGGAATGGAAGGAAGGAAGG
        >DNAseq3: ----------------AAGGAACGGAATGGTAGGAAGGAAGG
  • 6. NODE FAILURE CONDITION
      - A node failure can occur at three points: when the worker node connects to the head node, when it accepts the input sequence, and when it sends the resulting sequence to the head node. The failure conditions are:
        1. The worker node is denied as soon as it has connected to the head node, before doing any job.
        2. The worker node rejects the input sequence after the two nodes have connected and the head node has prepared the input sequence for it.
        3. The worker node sends a “No Send” message to the head node instead of the result sequence.
        4. The worker node crashes when it cannot connect to the head node with the correct address.
        5. The worker node crashes when it disconnects from the head node.
  • 7. COORDINATED CHECKPOINTING
      - Checkpointing is used as a fault tolerance mechanism in distributed systems.
      - A checkpoint is a snapshot of the current state of a process; it also assists in monitoring the process.
      - Coordinated checkpointing takes checkpoints periodically and saves them in a log file.
      - This monitoring information is used when a node fails.
      - If a node fails in distributed computing, another available node can reconstruct the process state from the checkpoint information saved for the failed node.
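The save-and-reconstruct idea on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the JSON-lines log format, the function names `save_checkpoint` and `restore_latest`, and the `generation` field are all assumptions made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(log_path, state):
    """Append one snapshot of the process state to the log file as a JSON line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(state) + "\n")


def restore_latest(log_path):
    """Return the most recent snapshot, or None when the log is empty."""
    latest = None
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if line.strip():
                latest = json.loads(line)
    return latest


log_path = os.path.join(tempfile.mkdtemp(), "worker1.log")

# Hypothetical state: the GA generation the worker has reached.
for generation in (10, 20, 30):
    save_checkpoint(log_path, {"worker": 1, "generation": generation})

# Another available node rebuilds the state from the failed node's log.
resumed = restore_latest(log_path)
print(resumed)
```

Appending one line per checkpoint keeps earlier snapshots available, so the log doubles as monitoring history.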
  • 8. SYSTEMATIC PROCESS LOGGING
      - Systematic Process Logging (SPL) is derived from a log-based method.
      - The motivation for SPL is to reduce the amount of computation that can be lost, which is bounded by the execution time of a single failed task.
      - SPL saves the checkpoint information from coordinated checkpointing in log-file format, with the exact time and contents of each checkpoint.
      - Depending on the fault, it uses the stored log file to decide which node can accept the job from the failed node.
  • 9. PROPOSED FAULT TOLERANCE SYSTEM
      - The checkpoint-based fault tolerance approach is implemented at the application layer without using any operating system support.
      - In the distributed multiple sequence alignment application, one head node and one or more worker nodes are connected over a local area network.
      - All worker nodes run MSAGA and independently align the input sequences from the head node.
      - Each worker node takes a local checkpoint of its own MSA process, and the head node takes a global checkpoint of the events describing all workers' conditions.
  • 10. ARCHITECTURE OF PROPOSED FAULT TOLERANCE SYSTEM
      [Figure: the head node, with the GRM and GCS, is connected over a local area network to Worker 1, Worker 2 and Worker 3, each with its own LC and LCS.]
      GRM – Global Resource Monitor
      GCS – Global Checkpoint Storage
      LCS – Local Checkpoint Storage
      LC – Local Checkpoint
  • 11. SYSTEM FLOW OF PROPOSED SYSTEM
      [Figure: flow from Start to End through the Checkpointing Phase (Coordinated Checkpointing with GRM and LC; Systematic Process Logging with GCS and LCS; both on HN and WN) and the Load Balancing Phase (GRM, GCS, HN).]
      HN – Head Node
      WN – Worker Node
  • 12. IMPLEMENTATION OF HEAD NODE
      Checkpointing Phase
      - The global resource monitor (GRM) plays the main role in both the coordinated checkpointing phase and the systematic process logging phase.
      - The GRM takes a global checkpoint of every worker node's events in the coordinated checkpointing phase.
      - The GCS saves the global checkpoint information in log-file format in the systematic process logging phase.
  • 13. GLOBAL CHECKPOINT
      Global Resource Monitor (GRM)
      Begin
        1. Take global checkpoints of the current condition of each WN, with the WN's IP, port, status, and time duration
        2. Detect the failure conditions of WNs
        3. Find the available worker nodes and decide which node is suitable to continue the failed WN's jobs
      End
  • 14. TYPES OF CHECKPOINT
      Checkpoint No | Checkpoint Name | Checkpoint Content
      1 | Available | Worker node is connected to the head node and waits for jobs from the head node
      2 | Denied | Worker node is disconnected from the server
      3 | Busy | Worker node is processing the jobs
      4 | Receive | Worker node sends the result to the head node and exits, or worker node sends an error message and exits
      5 | Crush | Worker node sends the Crush message to the head node
  • 15. CHECKPOINT INFORMATION
      - Each checkpoint records five fields:
        - Worker Type: the worker number
        - IP Address: identifies the WN
        - Checkpoint Name: the worker node's condition
        - Current Time: the current time of the process
        - Time Duration: the time a worker takes to go from its running state to the accept and receive states, or from its running state to the reject state
      Record layout: Worker Type | IP Address | Checkpoint Name | Current Time | Time Duration
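The five checkpoint types and the five-field record could be modelled as below. This is only a sketch: the class names, the tab-separated log line, and the sample IP address are assumptions for illustration, since the slides do not show the actual log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Checkpoint(Enum):
    """Checkpoint numbers and names from the TYPES OF CHECKPOINT slide."""
    AVAILABLE = 1
    DENIED = 2
    BUSY = 3
    RECEIVE = 4
    CRUSH = 5


@dataclass
class CheckpointRecord:
    worker_type: int          # worker number
    ip_address: str           # identifies the WN
    checkpoint: Checkpoint    # the worker node's condition
    current_time: datetime    # when the checkpoint was taken
    duration: timedelta       # time spent reaching this state

    def to_log_line(self):
        """Format the record as one tab-separated line (assumed format)."""
        return "\t".join([
            f"Worker{self.worker_type}",
            self.ip_address,
            self.checkpoint.name.capitalize(),
            self.current_time.isoformat(),
            f"{self.duration.total_seconds():.1f}s",
        ])


record = CheckpointRecord(1, "192.168.0.11", Checkpoint.BUSY,
                          datetime(2024, 1, 1, 9, 0, 0), timedelta(seconds=2.5))
line = record.to_log_line()
print(line)
```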
  • 16. AVAILABLE CHECKPOINT OF ALL WORKERS
      - The GRM takes the checkpoint as Available when all worker nodes are connected to the head node.
  • 17. CHECKPOINT CHANGES FROM AVAILABLE
      GlobalCheckpoint_Available ( )
      Begin
        1. IF HN and WNs are connected THEN
             GRM takes checkpoint as Available
           END IF
        2. IF checkpoint is Available THEN
             IF WN is continuously connected to HN THEN
               HN selects a sequence and sends it to the WN
               IF WN has not accepted the sequence THEN
                 GRM takes checkpoint as Crush
                 The sequence goes to the crush queue
               ELSE
                 GRM takes checkpoint as Busy
                 WN runs the MSA application
               END IF
             ELSE
               GRM takes checkpoint as Denied
             END IF
           END IF
      End
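Step 2 of the algorithm above can be written as a small transition function. The function name, its boolean parameters, and the list used as the crush queue are assumptions for this sketch; in the real system these conditions would come from the network connection between HN and WN.

```python
def checkpoint_from_available(still_connected, sequence_accepted, sequence, crush_queue):
    """Return the next checkpoint name for a WN whose checkpoint is Available."""
    if not still_connected:
        return "Denied"
    # At this point the HN has selected `sequence` and sent it to the WN.
    if not sequence_accepted:
        crush_queue.append(sequence)   # the sequence goes to the crush queue
        return "Crush"
    return "Busy"                      # the WN now runs the MSA application


crush_queue = []
results = [
    checkpoint_from_available(True, True, "AAGG", crush_queue),
    checkpoint_from_available(True, False, "AATG", crush_queue),
    checkpoint_from_available(False, False, "AACG", crush_queue),
]
print(results, crush_queue)
```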
  • 19. BUSY CHECKPOINT OF ALL WORKERS
  • 20. CHECKPOINT CHANGES FROM BUSY
      GlobalCheckpoint_Busy ( )
      Begin
        1. IF WN accepted the input sequence from HN THEN
             GRM takes checkpoint as Busy
           END IF
        2. IF the checkpoint is Busy THEN
             IF WN sends an error message to HN THEN
               GRM takes checkpoint as Receive for error
             ELSE
               GRM takes checkpoint as Receive for result
             END IF
           END IF
      End
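A sketch of step 2 of the Busy algorithm follows. It assumes that the error message is the “No Send” text named in the node failure conditions and that anything else is a result; the function name and the tuple return value are illustrative only.

```python
def checkpoint_from_busy(message):
    """Return (checkpoint, kind) for a WN whose checkpoint is Busy.

    `message` is what the WN sent to the HN. Treating "No Send" as the
    error message is an assumption based on the failure-condition slide.
    """
    if message == "No Send":
        return ("Receive", "error")
    return ("Receive", "result")


on_error = checkpoint_from_busy("No Send")
on_result = checkpoint_from_busy("A-AGGA-AGG")
print(on_error, on_result)
```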
  • 22. RECEIVE CHECKPOINT WITH NO SEND MESSAGE
  • 23. GLOBAL CHECKPOINT STORAGE (GCS)
      Global_Checkpoint_Storage ( )
      Begin
        1. GCS stores the current condition of every WN in the network as a checkpoint taken by the GRM
        2. GCS records the detailed condition of each WN
        3. Create the GCS log file for all checkpoints of the nodes
      End
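One way to picture the three GCS steps is a small class that keeps the latest condition of each WN and appends every checkpoint to a log file. The class name, method names, and log line layout are assumptions; the slides do not specify them.

```python
import os
import tempfile
from datetime import datetime


class GlobalCheckpointStorage:
    """Keeps the latest condition of every WN and logs each checkpoint."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.current = {}   # worker IP -> latest checkpoint name

    def store(self, ip, port, checkpoint, when):
        """Record a checkpoint taken by the GRM and append it to the log file."""
        self.current[ip] = checkpoint
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{when.isoformat()} {ip}:{port} {checkpoint}\n")

    def workers_with(self, checkpoint):
        """List the workers whose latest checkpoint has the given name."""
        return sorted(ip for ip, name in self.current.items() if name == checkpoint)


gcs = GlobalCheckpointStorage(os.path.join(tempfile.mkdtemp(), "gcs.log"))
now = datetime(2024, 1, 1, 9, 0, 0)
gcs.store("192.168.0.11", 5000, "Available", now)
gcs.store("192.168.0.12", 5000, "Available", now)
gcs.store("192.168.0.11", 5000, "Busy", now)

with open(gcs.log_path, encoding="utf-8") as log:
    log_lines = log.read().splitlines()
print(gcs.workers_with("Available"), len(log_lines))
```

Keeping the in-memory view separate from the append-only log lets the GRM query the current condition quickly while the file preserves the full history.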
  • 25. LOAD BALANCING PHASE
      GRM_LoadBalancing ( )
      BEGIN
        IF GRM detects Denied, Crush, or Receive with “No Send” THEN
          1. The condition is treated as a worker node failure.
          2. The GRM finds the available nodes using the GCS and decides which node is suitable to receive the job.
          3. If such a node exists, the HN sends the failed node's jobs to that available node.
          4. Call the Available and Busy algorithms.
        END IF
      END
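The load balancing steps above might look like this in code. The selection policy (take the first available node) is an assumption, since the slides only say that the GRM decides which node is suitable; the dictionary shapes and names are likewise illustrative.

```python
FAILURE_CHECKPOINTS = {"Denied", "Crush", "Receive (No Send)"}


def rebalance(checkpoints, jobs):
    """Move jobs from failed WNs to available WNs.

    checkpoints: worker -> latest checkpoint name (as kept in the GCS)
    jobs: worker -> sequence currently assigned to it
    Returns the list of (failed_worker, new_worker) reassignments.
    """
    available = [w for w, c in checkpoints.items() if c == "Available"]
    moves = []
    for worker, checkpoint in list(checkpoints.items()):
        if checkpoint in FAILURE_CHECKPOINTS and worker in jobs and available:
            target = available.pop(0)        # assumed policy: first available node
            jobs[target] = jobs.pop(worker)  # HN resends the failed node's job
            checkpoints[target] = "Busy"     # Available -> Busy once the job is accepted
            moves.append((worker, target))
    return moves


checkpoints = {"WN1": "Crush", "WN2": "Busy", "WN3": "Available"}
jobs = {"WN1": "AAGGAATGG", "WN2": "AAGGAACGG"}
moves = rebalance(checkpoints, jobs)
print(moves, jobs)
```

When no node is available, the failed job simply stays where it is; a fuller version would queue it until a worker reconnects.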
  • 26. LOAD BALANCING ACCORDING TO NODE FAILURE AS DENIED CHECKPOINT
  • 27. LOAD BALANCING ACCORDING TO NODE FAILURE AS CRUSH CHECKPOINT
  • 28. LOAD BALANCING ACCORDING TO NODE FAILURE AS RECEIVE CHECKPOINT (NO SEND)
  • 29. IMPLEMENTATION OF WORKER NODE
      - The worker node processes the DNA sequences into aligned sequences using the MSAGA application.
      - The worker node takes local checkpoints at the application level of MSAGA.
      - The worker node implements the checkpointing phase of the proposed fault tolerance system.
      - The local checkpoint (LC) and the local checkpoint storage (LCS) play the main role in that phase.
      - Every worker node takes its own local checkpoints and has its own local checkpoint storage.
      - The local checkpoint (LC) takes all checkpoints of its worker node.
      - The local checkpoint storage (LCS) stores the processing state of one worker.
  • 30. LOCAL CHECKPOINT
      - The local checkpoint (LC) is responsible for taking local checkpoints of the worker's process states.
      - The LC starts taking checkpoints of the worker's processing state when the worker node (WN) connects to the head node.
      - The LC keeps doing so until the worker's processes finish normally or the worker leaves the local area network because of a node failure.
  • 31. LOCAL CHECKPOINT OF EACH WORKER
      LocalCheckpoint ( )
      BEGIN
        1. Record the WN's starting time, ending time, and connection time
        2. Record every process state of MSA for the sequence
      END
  • 32. LOCAL CHECKPOINT STORAGE (LCS)
      - SPL produces the checkpoint log file and the processing log file for the local condition of each node.
      - All local checkpoint monitoring information is therefore stored in the local checkpoint storage (LCS).
      - Each LCS is kept by its corresponding WN.
      LocalCheckpointStorage ( )
      BEGIN
        1. Store the WN's starting time, ending time, and connection time
        2. Store every process state of MSA for the sequence
      END
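The LocalCheckpoint and LocalCheckpointStorage steps can be sketched together as a worker-side recorder that writes to a per-worker file. The JSON file standing in for the LCS, the class and method names, and the sample state strings are assumptions for illustration.

```python
import json
import os
import tempfile
import time


class LocalCheckpoint:
    """Records a worker's connection, start and end times and its MSA process states."""

    def __init__(self, storage_path):
        self.storage_path = storage_path   # this file plays the role of the LCS
        self.record = {"connected": time.time(), "start": None, "end": None, "states": []}

    def start(self):
        self.record["start"] = time.time()
        self._store()

    def log_state(self, state):
        self.record["states"].append(state)
        self._store()

    def finish(self):
        self.record["end"] = time.time()
        self._store()

    def _store(self):
        # Rewrite the whole record so the stored copy is always current.
        with open(self.storage_path, "w", encoding="utf-8") as f:
            json.dump(self.record, f)


lc = LocalCheckpoint(os.path.join(tempfile.mkdtemp(), "wn1_lcs.json"))
lc.start()
lc.log_state("population initialised")   # hypothetical MSA process states
lc.log_state("generation 10 evaluated")
lc.finish()

with open(lc.storage_path, encoding="utf-8") as f:
    stored = json.load(f)
print(stored["states"])
```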
  • 34. CONCLUSION
      - The GRM does not take a wrong checkpoint, whatever the number of worker nodes.
      - The GRM can distinguish exactly between an old worker node and a new one when a worker node connects to the head node again.
      - While the GRM takes the checkpoint for one worker node, the remaining workers do not need to stop their operation. Therefore, worker nodes are never blocked.
      - This approach lets distributed multiple sequence alignment continue to the final result even when a node failure occurs within the network.
      - The system computes the exact time of each worker node and the execution time of the whole system. Its checkpoints are portable, and it needs no operating system support.