Fault Tolerant Platforms for Manufacturing Applications
By Greg Gorbach, September, ARC Insights



KEYWORDS
Fault Tolerance, High Availability, Cluster, Collaborative Manufacturing


SUMMARY
New, low-cost technology for fault-tolerant platforms is now available for Microsoft Windows 2000 environments. Manufacturers should revisit some old assumptions about where they might benefit from deploying these platforms. Collaboration puts a premium on real-time manufacturing information, and these systems can help ensure that the information is always available. Next-generation automation systems, production management systems, business systems, and collaborative systems can all benefit from this technology.

"The cost of the new fault tolerant systems has fallen so far that manufacturers must rethink how they ensure the availability of their critical information."


ANALYSIS
Two approaches bring fault tolerance to Windows 2000 environments. The first is the fully replicated, fault-tolerant hardware solution from Stratus Computer Systems, with duplicate components operating in lockstep. In the event of a component failure, there is no interruption in processing, no lost data, and no slowdown in performance. The second approach, offered by Marathon Technologies, isolates all I/O from both the user operating system and the application by placing these tasks on different computers connected through proprietary interface cards, software, and a high-speed interconnect.

Comparison of Fault Tolerant Solutions

Description                 Stratus               Marathon                  Cluster
Recovery Time               Zero                  Milliseconds              Minutes
Copies of OS                One                   Multiple                  Multiple
Symmetric Multi-Processing  Available             No                        Available
System Operation            Single System Image   Split Architecture        Multi-System Cluster
Implementation              No work required      Integrate FT components   Script development and testing
Disaster Tolerance          3rd party             Available                 3rd party
Single Support Contact      Yes                   3rd party                 3rd party

Beyond Clusters
The traditional clustering approach does provide enhanced availability, but it has significant limitations. Cluster solutions do not provide fault tolerance, in which failure and repair/recovery are transparent to the user. They provide only failover, in which a backup system automatically restarts the applications and logs the users back on. Implementation requires several things: the development, testing, and support of custom failover scripts; licensing and installation of multiple copies of software; and possibly application modifications for a cluster environment. A hardware failure that triggers a cluster failover always loses the entire memory contents of the failed server, and recovery takes several minutes. Cluster solutions offer 99.9 percent availability (about 8.8 hours of downtime per year). Fault-tolerant solutions offer 99.999 percent availability (about 5 minutes of downtime per year).
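The downtime figures follow directly from the availability percentages. A year has 8,760 hours, so 0.1 percent unavailability is about 8.8 hours and 0.001 percent is about 5.3 minutes. The sketch below performs that conversion in Python. The function name is illustrative, and leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of downtime per year at the given availability level."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("cluster", 99.9), ("fault tolerant", 99.999)]:
        hours = annual_downtime_hours(pct)
        print(f"{label}: {pct}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes/year)")
```

Each additional "nine" cuts expected downtime by a factor of ten. That is why the two extra nines separate hours of cluster downtime from minutes of fault-tolerant downtime.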


Hardware Fault Tolerance
                     The first requirement for high availability systems is hardware fault tolerance. Stratus
                     and Marathon each take a different approach.

                     Stratus ftServer
ftServer uses standard Intel server components and designs. Stratus, however, designs its own motherboard (following standard Intel server design guidelines), removes the PCI I/O from the motherboard, and adds fault-detection logic. That logic is key to fault isolation in a Dual Modular Redundancy (DMR) configuration. The system contains two motherboards for DMR or three motherboards for Triple Modular Redundancy (TMR). All motherboards run in lockstep from a single system clock, backed by redundant clock cards. Fault-detection and isolation logic (a custom ASIC) compares the I/O output from all motherboards. DMR systems rely on the fault-detection logic on each motherboard to determine which board is in error. If no motherboard signals an error, a software algorithm decides which board to remove. In a TMR system, 3-way voting isolates the failed board. ftServer runs a single copy of all software, which lowers licensing costs and simplifies administration.

[Figure: Stratus' ftServer Architecture Ensures Zero Switchover Time, No Single Point of Failure and a Single Software Image. The DMR and TMR panels show each motherboard with lockstep 1-N way SMP CPUs, memory, and chipset. Fault detection and isolation logic sits between the motherboards and the disk and PCI I/O.]
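The isolation rules described above can be sketched as follows. This is an illustrative model only, not Stratus code. Board outputs are compared as byte strings, and `board_fault_flags` stands in for the per-motherboard fault-detection signals. `fallback_choice` is a hypothetical placeholder for the unspecified software algorithm that picks a board when no hardware error is signaled.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional, Sequence


def tmr_vote(outputs: Sequence[bytes]) -> tuple[bytes, list[int]]:
    """Three-way majority vote over the I/O output of three lockstep boards.

    Returns the majority output and the indices of any boards that disagree.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three board outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three boards disagree")
    failed = [i for i, out in enumerate(outputs) if out != value]
    return value, failed


def dmr_isolate(outputs: Sequence[bytes],
                board_fault_flags: Sequence[bool],
                fallback_choice: int = 1) -> Optional[int]:
    """Pick the board to remove in a two-board (DMR) system.

    Returns None when both boards agree. On a mismatch, trusts the per-board
    fault-detection flags if exactly one board reports an error; otherwise
    defers to fallback_choice, a placeholder for the software algorithm.
    """
    if len(outputs) != 2 or len(board_fault_flags) != 2:
        raise ValueError("DMR expects exactly two boards")
    if outputs[0] == outputs[1]:
        return None
    flagged = [i for i, flag in enumerate(board_fault_flags) if flag]
    if len(flagged) == 1:
        return flagged[0]
    return fallback_choice
```

With three boards, the faulty one is simply outvoted. With two boards, a mismatch alone cannot say which board is wrong, so DMR needs the extra per-board detection logic.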

                     Marathon Endurance System
Marathon physically and logically separates the two basic operations of a computer. The first is manipulating and transforming data (computing). The second is moving data to and from mass storage, networks, and other I/O devices (I/O processing). The computing function runs on one server (the compute element, or CE), and the I/O processing function runs on another server (the I/O processor, or IOP). These CE/IOP pairs (tuples) connect through proprietary high-speed PCI interfaces and fiber optics. The Marathon Interface Card (MIC) sends and receives data from both systems simultaneously. The MIC also provides the comparison and test logic that ensures both systems are identical. Each tuple is a complete system in which both the CE and the IOP run a Windows server OS. All CE I/O requests go to the IOP for handling. Marathon software runs as an application on the IOP and controls all fault management, disk mirroring, system management, and resynchronization. Because fault management is done in software, it can affect performance. Depending on the applications running, system performance may degrade by 10 to 20 percent or more.

© ARC Advisory Group • 3 Allied Drive • Dedham MA 02026 USA • 781-471-1000 • ARCweb.com
USA • UK • Germany • Japan • India

It takes two tuples to configure an assured availability system. The IOPs run in parallel, but not in lockstep. If an IOP fails, the other IOP continues to run the system, and the failed IOP can then be physically removed. Once the Marathon software starts running on the repaired IOP, that IOP automatically rejoins the configuration. The mirrored disks are re-mirrored in the background over the private Ethernet link between the IOPs. The same process handles the failure of a mirrored disk.

[Figure: Marathon Tuple: Building Block for Assured Availability. A compute element (CPU, memory, MIC) runs the applications and operating system. An I/O processor (CPU, memory, MIC, I/O adapters) handles all I/O and the network connection.]

Software Availability
The second requirement is maximizing software availability. Clusters rely on standard hardware, software, and service models that do not help prevent, isolate, or resolve failures; they simply recover from them. Once again, Marathon and Stratus take different approaches.

                      Stratus
                      Software availability features seek to prevent outages, minimize those that cannot be
                      prevented, and resolve problems so that they do not happen again. Stratus does not
                      change any of the core Windows code. This guarantees 100 percent binary compatibility
                      of all Windows applications. Stratus does change the Windows 2000 environment, but
                      only in areas designed to be customized by hardware and software partners and sepa-
                      rated from the main body of Windows code by documented, well-defined interfaces.

Drivers cause a significant percentage of NT failures. Stratus driver hardening goes beyond the Windows 2000 improvements to further reduce driver-induced OS failures. Each hardened driver defines its memory boundaries and works with Stratus hardware to automatically block any memory transfer beyond those boundaries. This prevents a bad PCI card from crashing the system. The new Microsoft driver model for Windows 2000 uses WMI (Windows Management Instrumentation) for management, control, and reporting functions. Stratus hardened drivers are fully compatible with WMI.
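The boundary check described above can be modeled in a few lines. This is a hypothetical sketch: `DmaWindow` and `DmaGuard` are illustrative names, not Stratus or Windows driver APIs. In the real system, the enforcement happens in Stratus hardware rather than in software.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DmaWindow:
    """A memory region a (hypothetical) hardened driver declares for its device."""
    base: int
    size: int

    def contains(self, addr: int, length: int) -> bool:
        """True if [addr, addr + length) lies entirely inside this window."""
        return length > 0 and self.base <= addr and addr + length <= self.base + self.size


class DmaGuard:
    """Toy model of hardware that refuses transfers outside declared windows."""

    def __init__(self) -> None:
        self._windows: dict[str, list[DmaWindow]] = {}
        self.blocked: list[tuple[str, int, int]] = []

    def declare(self, device: str, window: DmaWindow) -> None:
        """Record a memory boundary announced by the device's driver."""
        self._windows.setdefault(device, []).append(window)

    def transfer(self, device: str, addr: int, length: int) -> bool:
        """Allow a transfer only if it fits inside one declared window."""
        if any(w.contains(addr, length) for w in self._windows.get(device, [])):
            return True
        # Block and record the stray transfer instead of letting it corrupt memory.
        self.blocked.append((device, addr, length))
        return False
```

A device whose driver declared no window at all gets every transfer refused. This mirrors why the text asks for hardened drivers on all installed adapters.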



Stratus recommends that all drivers be hardened. Hardened drivers for all installed
adapters are required in order to receive Stratus’ 100 percent availability guarantee.

Incompatibilities between hardware and software versions from different suppliers are a well-known source of problems. The Resource Inventory Manager (RIM) identifies all system hardware and software configuration elements, along with their revision levels, at initial installation and at every configuration change. This information is stored and is also sent to the Stratus Customer Assistance Center (CAC). The CAC can check it against known conflicts and help diagnose any problems.
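This is a hypothetical sketch of the inventory check. Each configuration element is a (name, revision) pair, and a support center's known-conflict list is a set of combinations that misbehave together. The item names and the conflict entry are invented for illustration; the source does not describe RIM's data format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigItem:
    """One hardware or software configuration element and its revision level."""
    name: str
    revision: str


# Hypothetical known-conflict list: each entry is a combination of items
# reported to misbehave together. The entries here are invented examples.
KNOWN_CONFLICTS: list[frozenset[ConfigItem]] = [
    frozenset({ConfigItem("scsi_driver", "2.1"), ConfigItem("raid_firmware", "1.0")}),
]


def diff(before: set[ConfigItem], after: set[ConfigItem]) -> tuple[set[ConfigItem], set[ConfigItem]]:
    """Return (added, removed) items between two inventory snapshots."""
    return after - before, before - after


def find_conflicts(inventory: set[ConfigItem],
                   known: list[frozenset[ConfigItem]] = KNOWN_CONFLICTS) -> list[frozenset[ConfigItem]]:
    """Return every known-bad combination fully present in the inventory."""
    return [combo for combo in known if combo <= inventory]
```

Taking a snapshot at install time and after each change lets `diff` show exactly what changed. `find_conflicts` then flags any combination the support side already knows to be bad.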

Marathon
Marathon's architecture provides hardware fault tolerance and protection against transient OS bugs; it also detects OS failures and automatically restarts the system. Because the IOPs run Marathon's I/O management and fault-handling software, they are isolated from the loads that the user's applications and operating system place on the CEs. The IOPs run in parallel, but not in lockstep. Since the IOPs handle all interrupts, the CEs are free to run the OS and user applications without the usual stream of asynchronous events. Interrupts are managed through a structured process that eliminates a major source of asynchrony-induced software failures. The IOPs are still subject to these asynchronous events. However, a fully fault-tolerant system has two autonomous IOPs, so an interrupt-induced software fault affects only one of them. If an IOP goes down, the surviving IOP carries on until the failed IOP has automatically rebooted.


Service
The third requirement for high availability systems is designed-in serviceability. Again,
Stratus and Marathon have different approaches.

Stratus
Serviceability is built into the ftServer hardware design in several forms: customer-replaceable modules, automatic fault isolation, and remote management and reporting through the Stratus remote management card. The Stratus Service Network (SSN) enables remote access to every customer system, and the Stratus Customer Assistance Center provides 24/7 critical support.

ftServer automatically isolates failures to the component level while continuing opera-
tion on a second component. Failures are automatically reported to the CAC via a dial
connection. A replacement component is shipped from Stratus for next-day arrival. The
customer replaces the component while the system continues to operate. The new com-
ponent is automatically integrated into the running system. The system and application
continue to run normally through this entire process.




Each ftServer comes with two ftServer Management PCI adapters. These adapters are themselves board-level computers: they run independently of the host system and remain powered even when the rest of the system is powered off. Either of the redundant adapters provides full control over the ftServer. Access is controlled through a TCP/IP interface via dial-up modem or local Ethernet.

If a customer calls, Stratus troubleshoots the problem. If the problem lies in Microsoft Windows 2000 code, Stratus calls in Microsoft under its service contract with Microsoft. Stratus has also licensed the Windows 2000 source code and has a staff of kernel-trained engineers. In addition, Microsoft has given Stratus access to its OS debugging tools.

Marathon
The Marathon Assured Availability system has three states: operational, vulnerable, and down. When the system enters the vulnerable state, which is invisible to users, the system manager is notified that a repair/resynchronization cycle can be initiated. Marathon provides two notification methods: the system console and the event log. The console presents a graphical model on the system monitor, on remote systems over the network, or through a serial line to the system manager. Color-coded components indicate their state, and a point-and-click interface is used to examine and manage system components. The second method uses the Windows server event log to log all events, including Marathon system events. Several third-party tools use the event log to send specified events to the system manager via beepers, fax, e-mail, and similar channels.
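The three-state model can be sketched as a small monitor. The mapping from the number of healthy tuples to a state is an assumption made for illustration. Python's `logging` module stands in for the Windows event log. None of these names come from Marathon's software.

```python
from __future__ import annotations

import logging
from enum import Enum

# Stand-in for the Windows server event log that third-party tools would watch.
log = logging.getLogger("assured_availability")


class SystemState(Enum):
    OPERATIONAL = "operational"  # both tuples healthy: full redundancy
    VULNERABLE = "vulnerable"    # still serving users, but running on one tuple
    DOWN = "down"                # no healthy tuple left


def classify(healthy_tuples: int) -> SystemState:
    """Map the number of healthy tuples (assumed 0-2) to a system state."""
    if healthy_tuples >= 2:
        return SystemState.OPERATIONAL
    if healthy_tuples == 1:
        return SystemState.VULNERABLE
    return SystemState.DOWN


class Monitor:
    """Tracks state changes and records a notification for each one."""

    def __init__(self) -> None:
        self.state = SystemState.OPERATIONAL
        self.events: list[str] = []

    def update(self, healthy_tuples: int) -> SystemState:
        new_state = classify(healthy_tuples)
        if new_state is not self.state:
            message = f"state change: {self.state.value} -> {new_state.value}"
            if new_state is SystemState.VULNERABLE:
                message += " (repair/resynchronization cycle can be initiated)"
            self.events.append(message)
            log.warning(message)
            self.state = new_state
        return self.state
```

Separating "vulnerable" from "down" lets the manager schedule a repair while users keep working, which is the point of the state the text describes.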


RECOMMENDATIONS
•    All systems supporting real-time collaboration throughout the enterprise and value
     chain should be deployed on fault-tolerant platforms.

•    When it comes to control-level, real-time, batch, and process control applications,
     Stratus ftServer has the advantage because its architecture has no single point of
     failure and zero switchover time.

•    When selecting fault tolerant solutions, consider the whole solution, including
     hardware fault tolerance, software availability, performance, implementation costs,
     and serviceability.

For further information, contact your account manager or the author at ggorbach@arcweb.com.
Recommended circulation: All EAS and MAS clients.




    ‹Ã!   ÇÃ6S8Ã6q‰v†‚…’ÃB…‚ˆƒÃ‡ÃÃ6yyvrqÃ9…v‰rÇÃ9rquh€ÃH6Ã!!%ÃVT6ÇÃ'   #      ÇÃ6S8rip‚€Ã

                                    VT6ÇÃVFÇÃBr…€h’ÇÃEhƒhÃ‡ÃDqvhÃ

More Related Content

What's hot

Nvidia’s tegra line of processors for mobile devices2 2
Nvidia’s tegra line of processors for mobile devices2 2Nvidia’s tegra line of processors for mobile devices2 2
Nvidia’s tegra line of processors for mobile devices2 2Sukul Yarraguntla
 
I/O仮想化最前線〜ネットワークI/Oを中心に〜
I/O仮想化最前線〜ネットワークI/Oを中心に〜I/O仮想化最前線〜ネットワークI/Oを中心に〜
I/O仮想化最前線〜ネットワークI/Oを中心に〜Ryousei Takano
 
Motherboard manual ga-8i945gmf_e
Motherboard manual ga-8i945gmf_eMotherboard manual ga-8i945gmf_e
Motherboard manual ga-8i945gmf_erveiga100
 
SoM with Zynq UltraScale device
SoM with Zynq UltraScale deviceSoM with Zynq UltraScale device
SoM with Zynq UltraScale devicenie, jack
 
Expanding The Micro Blaze System
Expanding  The Micro Blaze  SystemExpanding  The Micro Blaze  System
Expanding The Micro Blaze Systemiuui
 
Motherboard manual 8vm533m-rz_e
Motherboard manual 8vm533m-rz_eMotherboard manual 8vm533m-rz_e
Motherboard manual 8vm533m-rz_eJeferson Camargo
 
Nexys4ddr rm FPGA board Datasheet
Nexys4ddr rm  FPGA board DatasheetNexys4ddr rm  FPGA board Datasheet
Nexys4ddr rm FPGA board DatasheetOmkar Rane
 
M2 m ehs6t_hardware
M2 m ehs6t_hardwareM2 m ehs6t_hardware
M2 m ehs6t_hardwareMarianoPenna
 
Blue Line Supermicro Aplus
Blue Line Supermicro AplusBlue Line Supermicro Aplus
Blue Line Supermicro AplusBlue Line
 

What's hot (20)

P drive schneider
P drive schneiderP drive schneider
P drive schneider
 
Nvidia’s tegra line of processors for mobile devices2 2
Nvidia’s tegra line of processors for mobile devices2 2Nvidia’s tegra line of processors for mobile devices2 2
Nvidia’s tegra line of processors for mobile devices2 2
 
I/O仮想化最前線〜ネットワークI/Oを中心に〜
I/O仮想化最前線〜ネットワークI/Oを中心に〜I/O仮想化最前線〜ネットワークI/Oを中心に〜
I/O仮想化最前線〜ネットワークI/Oを中心に〜
 
Afl test-inspection-catalog
Afl test-inspection-catalogAfl test-inspection-catalog
Afl test-inspection-catalog
 
Motherboard manual ga-8i945gmf_e
Motherboard manual ga-8i945gmf_eMotherboard manual ga-8i945gmf_e
Motherboard manual ga-8i945gmf_e
 
Vigor Evo Plus
Vigor Evo PlusVigor Evo Plus
Vigor Evo Plus
 
NVIDIA Tegra K1
NVIDIA Tegra K1 NVIDIA Tegra K1
NVIDIA Tegra K1
 
SoM with Zynq UltraScale device
SoM with Zynq UltraScale deviceSoM with Zynq UltraScale device
SoM with Zynq UltraScale device
 
Expanding The Micro Blaze System
Expanding  The Micro Blaze  SystemExpanding  The Micro Blaze  System
Expanding The Micro Blaze System
 
50hz eng m7
50hz eng m750hz eng m7
50hz eng m7
 
An1930
An1930An1930
An1930
 
Mhdd advanced-diag
Mhdd advanced-diagMhdd advanced-diag
Mhdd advanced-diag
 
Vigor Ex
Vigor ExVigor Ex
Vigor Ex
 
Chipsets amd
Chipsets amdChipsets amd
Chipsets amd
 
P21gv31
P21gv31P21gv31
P21gv31
 
Csdap
CsdapCsdap
Csdap
 
Motherboard manual 8vm533m-rz_e
Motherboard manual 8vm533m-rz_eMotherboard manual 8vm533m-rz_e
Motherboard manual 8vm533m-rz_e
 
Nexys4ddr rm FPGA board Datasheet
Nexys4ddr rm  FPGA board DatasheetNexys4ddr rm  FPGA board Datasheet
Nexys4ddr rm FPGA board Datasheet
 
M2 m ehs6t_hardware
M2 m ehs6t_hardwareM2 m ehs6t_hardware
M2 m ehs6t_hardware
 
Blue Line Supermicro Aplus
Blue Line Supermicro AplusBlue Line Supermicro Aplus
Blue Line Supermicro Aplus
 

Viewers also liked

Semana cultural ii
Semana cultural iiSemana cultural ii
Semana cultural iieli
 
Marcos georgalos narnia
Marcos georgalos narniaMarcos georgalos narnia
Marcos georgalos narniamarcosgeo1
 
Diverticulitis cirugia
Diverticulitis   cirugia Diverticulitis   cirugia
Diverticulitis cirugia juliomayol
 
Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211
Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211
Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211Avila Spaces
 
COEM 4205 COMUNICACION INTERPERSONAL
COEM 4205 COMUNICACION INTERPERSONALCOEM 4205 COMUNICACION INTERPERSONAL
COEM 4205 COMUNICACION INTERPERSONALIlia E. Lopez-Jimenez
 
M Biz M Profile In Ppt
M Biz M Profile In PptM Biz M Profile In Ppt
M Biz M Profile In PptHarbans8
 
Little mix
Little mixLittle mix
Little mixanamdq99
 
[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing
[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing
[CXL Live 16] How to Give Your Data an Annual Checkup by Annie CushingCXL
 
Huizinga, El otoño en la Edad Media, pp. 9-45
Huizinga, El otoño en la Edad Media, pp. 9-45 Huizinga, El otoño en la Edad Media, pp. 9-45
Huizinga, El otoño en la Edad Media, pp. 9-45 Rodrigo Diaz
 
Microbioligia Micobacterias
Microbioligia Micobacterias Microbioligia Micobacterias
Microbioligia Micobacterias ezequiel bolaños
 
Información faringitis
Información faringitisInformación faringitis
Información faringitisJesus Castro
 
Engineering & Piping design
Engineering & Piping designEngineering & Piping design
Engineering & Piping designMusa Sabri
 
Equilibrios ácido-base y equilibrio de solubilidad
Equilibrios ácido-base y equilibrio de solubilidad Equilibrios ácido-base y equilibrio de solubilidad
Equilibrios ácido-base y equilibrio de solubilidad Ângel Noguez
 

Viewers also liked (20)

Semana cultural ii
Semana cultural iiSemana cultural ii
Semana cultural ii
 
Marcos georgalos narnia
Marcos georgalos narniaMarcos georgalos narnia
Marcos georgalos narnia
 
CCC_2015eng
CCC_2015engCCC_2015eng
CCC_2015eng
 
Diverticulitis cirugia
Diverticulitis   cirugia Diverticulitis   cirugia
Diverticulitis cirugia
 
Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211
Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211
Pt ignite-o-escritorio-do-futuro-escritorio-virtual-avila-business-center-120211
 
Servicio tecnico fontanería | fontanero sevilla
Servicio tecnico fontanería | fontanero sevillaServicio tecnico fontanería | fontanero sevilla
Servicio tecnico fontanería | fontanero sevilla
 
COEM 4205 COMUNICACION INTERPERSONAL
COEM 4205 COMUNICACION INTERPERSONALCOEM 4205 COMUNICACION INTERPERSONAL
COEM 4205 COMUNICACION INTERPERSONAL
 
Tesen asanza video de reflexion
Tesen asanza  video de reflexionTesen asanza  video de reflexion
Tesen asanza video de reflexion
 
M Biz M Profile In Ppt
M Biz M Profile In PptM Biz M Profile In Ppt
M Biz M Profile In Ppt
 
Cibernauta y sitios web
Cibernauta y sitios webCibernauta y sitios web
Cibernauta y sitios web
 
Alcaloides y drogas
Alcaloides y drogasAlcaloides y drogas
Alcaloides y drogas
 
1 principios de funcionamiento
1 principios de funcionamiento1 principios de funcionamiento
1 principios de funcionamiento
 
Little mix
Little mixLittle mix
Little mix
 
[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing
[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing
[CXL Live 16] How to Give Your Data an Annual Checkup by Annie Cushing
 
El estado peruano
El estado peruanoEl estado peruano
El estado peruano
 
Huizinga, El otoño en la Edad Media, pp. 9-45
Huizinga, El otoño en la Edad Media, pp. 9-45 Huizinga, El otoño en la Edad Media, pp. 9-45
Huizinga, El otoño en la Edad Media, pp. 9-45
 
Microbioligia Micobacterias
Microbioligia Micobacterias Microbioligia Micobacterias
Microbioligia Micobacterias
 
Información faringitis
Información faringitisInformación faringitis
Información faringitis
 
Engineering & Piping design
Engineering & Piping designEngineering & Piping design
Engineering & Piping design
 
Equilibrios ácido-base y equilibrio de solubilidad
Equilibrios ácido-base y equilibrio de solubilidad Equilibrios ácido-base y equilibrio de solubilidad
Equilibrios ácido-base y equilibrio de solubilidad
 

Similar to Fault tolerant platforms for manufacturing applications

Pumping stationone20140628 real-timeprogrammingwithbeaglebonepr_us.pptx
Pumping stationone20140628 real-timeprogrammingwithbeaglebonepr_us.pptxPumping stationone20140628 real-timeprogrammingwithbeaglebonepr_us.pptx
Pumping stationone20140628 real-timeprogrammingwithbeaglebonepr_us.pptxAmit Tripathi
 
SCADA Strangelove: взлом во имя
SCADA Strangelove: взлом во имяSCADA Strangelove: взлом во имя
SCADA Strangelove: взлом во имяEkaterina Melnik
 
SCADA Strangelove: Hacking in the Name
SCADA Strangelove: Hacking in the NameSCADA Strangelove: Hacking in the Name
SCADA Strangelove: Hacking in the NamePositive Hack Days
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldCan Ozdoruk
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievVolodymyr Saviak
 
A15 prepaid-energy-meter
A15 prepaid-energy-meterA15 prepaid-energy-meter
A15 prepaid-energy-meterAayush Patidar
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015: Linux Performance Profiling and Monitoring by Werner FischerNETWAYS
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerOSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner FischerNETWAYS
 
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...
Performance analysis in a multitenant cloud environment Using Hadoop Cluster ...Orgad Kimchi
 
Introduction to Java Profiling
Introduction to Java ProfilingIntroduction to Java Profiling
Introduction to Java ProfilingJerry Yoakum
 
Q2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLEQ2.12: Existing Linux Mechanisms to Support big.LITTLE
Q2.12: Existing Linux Mechanisms to Support big.LITTLELinaro
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringOSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoringNETWAYS
 
Polyteda: Power DRC/LVS, October 2016
Polyteda: Power DRC/LVS, October 2016Polyteda: Power DRC/LVS, October 2016
Polyteda: Power DRC/LVS, October 2016Oleksandra Nazola
 
Introduction to FreeRTOS
Introduction to FreeRTOSIntroduction to FreeRTOS
Introduction to FreeRTOSICS
 
CodeWarrior, Linux; OrCad and Hyperlynx; QMS Tools
CodeWarrior, Linux; OrCad and Hyperlynx; QMS ToolsCodeWarrior, Linux; OrCad and Hyperlynx; QMS Tools
CodeWarrior, Linux; OrCad and Hyperlynx; QMS Toolsdjerrybellott
 
Network switch router
Network switch routerNetwork switch router
Network switch routerMuthu Pandi B
 


More from ARC Advisory Group

Information Driven Enterprise for the Connected World

Stork Presentation on Migration (Willem Hazenberg)

Asset Information Management (AIM) Presentation @ ARC's 2011 Industry Forum

Three market trends drive collaborative value networks to the next level

Mobile Technologies and Supply Chain @ ARC's 2011 Industry Forum

Enterprise Mobility - Current Practices and Future Plans for Mobility Systems...

Energy Management Strategies for Operational Excellence @ ARC's 2011 Industry...

Energy Management and the Evolution of Intelligent Motor Control and Drives @...

Driving Innovation, Sustainability and Performance @ ARC's 2011 Industry Forum

Anti-counterfeiting and Brand Protection (ABP) Workshop @ ARC's 2011 Industry...

Strategies for Asset Performance Management @ ARC's 2011 Industry Forum

Current Automation Purchasing Strategies Fall Short

CPM Identified as RPM Engine at ARC Forum

Controls to CPM Connection: Are We There?

Conoco on Path to Reliability Centered Loop Management: Enhancing ROA on the Way

Component Based Solutions Well Aligned with Needs of Service Logistics Providers

Combined Fluid Power and Mechatronic Technology Optimizes Solutions

Collaborative Asset Lifecycle Management Vision and Strategies

Closing the Gap on Digital Manufacturing
 

Eam guide-video-2015
 

Fault tolerant platforms for manufacturing applications

Fault Tolerant Platforms for Manufacturing Applications
By Greg Gorbach, September
ARC Insights: Enterprise and Manufacturing Strategies for Industry Executives

KEYWORDS
Fault Tolerance, High Availability, Cluster, Collaborative Manufacturing

SUMMARY
New, low-cost technology for fault-tolerant platforms is now available for Microsoft Windows 2000 environments. Manufacturers should revisit some old assumptions about where they might benefit from deploying these platforms. Collaboration puts a premium on real-time manufacturing information, and these systems can help ensure that the information is always available. Next-generation automation systems, production management systems, business systems, and collaborative systems can all benefit from this technology.

The cost of the new fault-tolerant systems has fallen so far that manufacturers must rethink how they ensure the availability of their critical information.

ANALYSIS
There are two approaches to low-cost fault-tolerant platforms for Windows 2000. The first is the fully replicated, fault-tolerant hardware solution from Stratus Computer Systems, with duplicate components operating in lockstep. If a component fails, there is no interruption in processing, no lost data, and no slowdown in performance. The second approach, offered by Marathon Technologies, isolates all I/O from both the user operating system and the application by placing these tasks on different computers connected through proprietary interface cards, software, and a high-speed interconnect.

Comparison of Fault Tolerant Solutions

Description                 Stratus              Marathon                 Cluster
Availability                …                    …                        …
Recovery Time               Zero                 Milliseconds             Minutes
Copies of OS                1                    Multiple                 Multiple
Symmetric Multi-Processing  Available            No                       Available
System Operation            Single System Image  Split Architecture       Multi-System Cluster
Implementation              No work required     Integrate FT Components  Script Development and Testing
Disaster Tolerance          3rd Party            Available                3rd Party
Single Support Contact      Yes                  3rd Party                3rd Party

Beyond Clusters
The traditional clustering approach to fault tolerance does provide enhanced availability, but it has significant limitations. Cluster solutions do not provide fault tolerance, in which failure and repair/recovery are transparent to the user. They provide only failover, in which a backup system automatically restarts the applications and logs the users back on. Implementation requires developing, testing, and supporting custom failover scripts; licensing and installing multiple copies of software; and possibly modifying applications for a cluster environment. When a hardware failure occurs, a cluster failover always loses all memory contents, and recovery takes several minutes. Cluster solutions offer 99.9 percent availability (nearly 9 hours of downtime per year), whereas fault-tolerant solutions offer 99.999 percent availability (about 5 minutes of downtime per year).

Hardware Fault Tolerance
The first requirement for high-availability systems is hardware fault tolerance. Stratus and Marathon each take a different approach.

Stratus ftServer
ftServer uses standard Intel server components and designs. However, Stratus designs its own motherboard (following standard Intel server design guidelines), removes the PCI I/O, and adds the fault-detection logic that is key to fault isolation in a DMR configuration. The system contains two motherboards for Dual Modular Redundancy (DMR) or three for Triple Modular Redundancy (TMR). All motherboards run in lockstep from a single system clock, backed by redundant clock cards. Fault-detection and isolation logic (a custom ASIC) compares the I/O output from all motherboards. DMR systems rely on the fault-detection logic on each motherboard to determine which board is in error. If no motherboard signals an error, a software algorithm decides which board to remove. In a TMR system, 3-way voting isolates the failed board.
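The DMR and TMR isolation rules described above can be illustrated with a toy Python model. It is a sketch under stated assumptions and does not represent Stratus' ASIC logic. The model compares each board's I/O output for one lockstep cycle. TMR isolates a dissenting board by majority vote, and DMR falls back on each board's own fault signal. The function `isolate_failed_board` is hypothetical. Its final fallback is also hypothetical and stands in for the unspecified software algorithm.

```python
from collections import Counter
from typing import Optional, Sequence


def isolate_failed_board(outputs: Sequence[bytes],
                         fault_signals: Sequence[bool]) -> Optional[int]:
    """Choose which lockstep board to remove after one cycle (toy model).

    outputs:       each board's I/O output for the same lockstep cycle
    fault_signals: each board's own fault-detection flag for that cycle
    Returns the index of the board to remove, or None if all boards agree.
    """
    if len(outputs) not in (2, 3) or len(outputs) != len(fault_signals):
        raise ValueError("expected 2 (DMR) or 3 (TMR) boards, one signal each")

    if len(set(outputs)) == 1:
        return None  # all outputs match: keep running in lockstep

    if len(outputs) == 3:
        # TMR: 3-way vote. If two boards agree, the dissenting board is removed.
        majority, votes = Counter(outputs).most_common(1)[0]
        if votes == 2:
            return next(i for i, out in enumerate(outputs) if out != majority)
        # All three differ: fall through to the per-board fault signals.

    # DMR: a mismatch alone cannot tell which board is wrong, so rely on the
    # fault-detection logic on each motherboard.
    flagged = [i for i, bad in enumerate(fault_signals) if bad]
    if len(flagged) == 1:
        return flagged[0]

    # No single board signaled an error. HYPOTHETICAL placeholder for the
    # unspecified software algorithm: remove the highest-numbered board.
    return len(outputs) - 1
```

Take `isolate_failed_board([b"a", b"a", b"b"], [False, False, False])`, which returns `2`. The two agreeing boards outvote the third, so no fault signal is needed. With only two boards, `isolate_failed_board([b"a", b"b"], [True, False])` returns `0` because board 0 flagged its own error.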
[Figure: Stratus' ftServer Architecture Ensures Zero Switchover Time, No Single Point of Failure and a Single Software Image. The figure shows DMR and TMR configurations, each with lockstep 1-N way SMP CPUs, memory, chipset, and fault detection/isolation logic, plus PCI and disk I/O.]

ftServer runs a single copy of all software, which lowers licensing costs and simplifies administration.

Marathon Endurance System
Marathon physically and logically separates the two basic operations of a computer: manipulating and transforming data (computing), and moving data to and from mass storage, networks, and other I/O devices (I/O processing). The computing function runs on one server (the compute element, or CE), and the I/O processing function runs on another server (the I/O processor, or IOP). These CE/IOP pairs (tuples) connect through proprietary high-speed PCI interfaces and fiber optics. The Marathon Interface Card (MIC) sends data to and receives data from both systems simultaneously. The MIC also provides the comparison and test logic that ensures both systems are identical.

© ARC Advisory Group • Allied Drive • Dedham MA 02026 USA • ARCweb.com • USA • UK • Germany • Japan • India

Each tuple is a complete system, with a Windows server OS running on both the CE and the IOP. All CE I/O task requests go to the IOP for handling. Marathon software runs as an application on the IOP and controls all fault management, disk mirroring, system management, and resynchronization. Because fault management is done in software, it can affect performance. Depending on the applications running, system performance may degrade by 10 to 20 percent or more.

It takes two tuples to configure an assured availability system. The IOPs run in parallel, but not in lockstep. If an IOP fails, the other IOP continues to run the system, and the failed IOP can then be physically removed. After the Marathon software starts running, the repaired IOP automatically rejoins the configuration. The mirrored disks are re-mirrored in background mode over the private Ethernet linking the IOPs. The same process handles the failure of a mirrored disk.

[Figure: Marathon Tuple, Building Block for Assured Availability. The Compute Element contains a CPU, memory, and MIC and runs the applications and operating system. The I/O Processor contains a CPU, memory, MIC, and I/O adapters and handles all I/O and the network.]

Software Availability
The second requirement is maximizing software availability. Clusters rely on standard hardware, software, and service models that do not help prevent, isolate, or resolve failures; they simply recover from them. Once again, Marathon and Stratus take different approaches.

Stratus
Stratus software availability features seek to prevent outages, minimize those that cannot be prevented, and resolve problems so that they do not happen again. Stratus does not change any of the core Windows code, which guarantees 100 percent binary compatibility with all Windows applications.
Stratus does change the Windows 2000 environment, but only in areas designed to be customized by hardware and software partners. These areas are separated from the main body of Windows code by documented, well-defined interfaces.

Drivers cause a significant percentage of NT failures. Stratus driver hardening goes beyond the Windows 2000 improvements to further reduce driver-induced OS failures. A hardened driver defines its memory boundaries and works with Stratus hardware to automatically block memory transfers beyond those boundaries. This prevents a bad PCI card from crashing the system. The new Microsoft driver model for Windows 2000 uses WMI (Windows Management Instrumentation) for management, control, and reporting functions. Stratus hardened drivers are fully compatible with WMI.
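The boundary check behind driver hardening can be sketched in Python under a simple assumed model. Each driver declares one or more address windows, and every device-initiated transfer is checked against them before it proceeds. The names `DmaWindow`, `HardenedDriver`, and `BoundaryViolation` are hypothetical. The article does not describe Stratus' actual driver interface, and real enforcement happens in hardware rather than in software like this.

```python
from __future__ import annotations

from dataclasses import dataclass


class BoundaryViolation(Exception):
    """Raised when a transfer falls outside every window the driver declared."""


@dataclass(frozen=True)
class DmaWindow:
    base: int  # first address the driver allows the device to touch
    size: int  # window length in bytes

    def contains(self, addr: int, length: int) -> bool:
        # The whole transfer [addr, addr + length) must fit inside the window.
        return length > 0 and self.base <= addr and addr + length <= self.base + self.size


class HardenedDriver:
    """Toy model (hypothetical): memory boundaries are declared up front, and
    each device-initiated transfer is checked before it may proceed."""

    def __init__(self, name: str, windows: list[DmaWindow]) -> None:
        self.name = name
        self.windows = list(windows)
        self.rejected = 0  # count of blocked transfers, e.g. for reporting

    def check_transfer(self, addr: int, length: int) -> None:
        if any(w.contains(addr, length) for w in self.windows):
            return  # inside a declared window: allow it
        self.rejected += 1
        raise BoundaryViolation(
            f"{self.name}: {length} bytes at {addr:#x} outside declared memory"
        )
```

A driver that declares `DmaWindow(base=0x1000, size=0x1000)` accepts `check_transfer(0x1800, 0x100)`. It rejects `check_transfer(0x1F00, 0x200)` because that transfer would run past address `0x2000`. The bad transfer is stopped and counted instead of corrupting memory outside the window.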
Stratus recommends that all drivers be hardened. Hardened drivers for all installed adapters are required to qualify for Stratus' 100 percent availability guarantee.

Incompatibilities between hardware and software versions from different suppliers are a well-known problem. The Resource Inventory Manager (RIM) identifies all system hardware and software configuration elements, along with their revision levels, at initial installation and at every configuration change. This information is stored and is also sent to the Stratus Customer Assistance Center (CAC), which can check for known conflicts and help diagnose any problems.

Marathon
Marathon's architecture provides hardware fault tolerance and protection against transient OS bugs. It also detects OS failures and automatically restarts the system. Because the IOPs run Marathon's I/O management and fault-handling software, they are isolated from the loads that the user's applications and operating system place on the CEs. The IOPs run in parallel, but not in lockstep. Because the IOPs handle all interrupts, the CEs are free to run the OS and user applications without the usual stream of asynchronous events. Interrupts are managed through a structured process that eliminates a major source of asynchrony-induced software failures. The IOPs are still subject to these asynchronous events. However, a fully fault-tolerant system has two autonomous IOPs, so an interrupt-induced software failure affects only one of them. If an IOP goes down, the surviving IOP carries on until the automatic reboot of the failed IOP is complete.

Service
The third requirement for high-availability systems is designed-in serviceability. Again, Stratus and Marathon take different approaches.

Stratus
Serviceability is built into the ftServer hardware design in several forms: customer-replaceable modules, automatic fault isolation, and remote management and reporting through the Stratus remote management card.
The Stratus Service Network (SSN) enables remote access to every customer system, and the Stratus Customer Assistance Center provides 24/7 critical support. ftServer automatically isolates failures to the component level while continuing operation on a second component. Failures are automatically reported to the CAC over a dial-up connection, and Stratus ships a replacement component for next-day arrival. The customer replaces the component while the system continues to operate, and the new component is automatically integrated into the running system. The system and application continue to run normally throughout this entire process.
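The isolate-and-replace sequence for a redundant ftServer component can be modeled as a pair with two states. The pair is duplex while both components are in service and simplex while one is isolated and awaiting its replacement. This is a hypothetical sketch, and `RedundantPair`, `PairState`, and the slot names `"A"` and `"B"` are illustrative rather than a Stratus API. It only shows that the pair never stops serving while a single component is removed and reintegrated.

```python
from __future__ import annotations

from enum import Enum


class PairState(Enum):
    DUPLEX = "both components in service"
    SIMPLEX = "one component in service, partner isolated"


class RedundantPair:
    """Toy model of one redundant component pair (hypothetical, not a Stratus API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.in_service: set[str] = {"A", "B"}
        self.log: list[str] = []

    @property
    def state(self) -> PairState:
        return PairState.DUPLEX if len(self.in_service) == 2 else PairState.SIMPLEX

    def isolate(self, slot: str) -> None:
        """Take a failed component out of service; its partner keeps running."""
        if slot not in self.in_service:
            raise ValueError(f"{self.name}/{slot} is not in service")
        if len(self.in_service) == 1:
            raise RuntimeError(f"{self.name}: isolating {slot} would stop the pair")
        self.in_service.remove(slot)
        self.log.append(f"isolated {self.name}/{slot}, failure reported")

    def integrate(self, slot: str) -> None:
        """Bring a hot-swapped replacement back into service beside its partner."""
        if slot in self.in_service or slot not in {"A", "B"}:
            raise ValueError(f"{self.name}/{slot} cannot be integrated")
        self.in_service.add(slot)
        self.log.append(f"integrated replacement {self.name}/{slot}")
```

Calling `isolate("A")` moves the pair to `PairState.SIMPLEX`, and `integrate("A")` returns it to `PairState.DUPLEX`. At no point is `in_service` empty, which mirrors the claim that the system and application keep running throughout.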
Each ftServer comes with two ftServer Management PCI adapters. These adapters are themselves board-level computers. They run independently of the host system and remain powered even when the rest of the system is powered off. Either of the redundant ftServer Management adapters provides full control over the ftServer. Access is controlled through a TCP/IP interface, via dial-up modem or local Ethernet.

If a customer calls, Stratus troubleshoots the problem. If the problem is in Microsoft Windows 2000 code, Stratus calls in Microsoft under its service contract with Microsoft. Stratus has also licensed the Windows 2000 source code and has a staff of kernel-trained engineers. In addition, Microsoft has given Stratus access to its OS debugging tools.

Marathon
The Marathon Assured Availability system has three states: operational, vulnerable, and down. The vulnerable state is invisible to users. It notifies the system manager that a repair/resynchronization cycle can be initiated. Marathon provides two notification methods: the system console and the event log. The console presents a graphical model of the system to the system manager. It can be viewed on the system monitor, on remote systems over the network, or over a serial line. Color-coded components indicate their state, and a point-and-click interface is used to examine and manage system components. The second method uses the Windows server event log to record all events, including Marathon system events. Several third-party tools can use the event log to send specified events to the system manager via pager, fax, e-mail, and so on.

RECOMMENDATIONS
• All systems supporting real-time collaboration throughout the enterprise and value chain should be deployed on fault-tolerant platforms.
• For control-level, real-time, batch, and process control applications, Stratus ftServer has the advantage because its architecture has no single point of failure and zero switchover time.
• When selecting fault-tolerant solutions, consider the whole solution, including hardware fault tolerance, software availability, performance, implementation costs, and serviceability.

For further information, contact your account manager or the author at ggorbach@arcweb.com. Recommended circulation: All EAS and MAS clients.