1. While alarm systems are important for maintaining plant operations, poorly designed alarm systems can decrease productivity and increase costs. As plants change over time, the number of potential alarms increases, overloading operators.
2. The paper proposes a spiral approach to continuously optimize alarm systems involving regular analysis of alarm and operator response data, implementing countermeasures to identified problems, and evaluating the effectiveness of countermeasures over time.
3. Analysis involves quantifying and visualizing alarm and response frequencies to identify imbalances signaling problems. Countermeasures aim to reduce unnecessary alarms through techniques like parameter tuning. Evaluation assesses if countermeasures maintain effectiveness as plants evolve.
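The comparison of alarm and operator-response frequencies can be sketched as follows. This is a minimal illustration, not the paper's tool: the `(timestamp, category)` record layout, the function names, the `ratio` and `min_alarms` thresholds, and the sample log are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(events, category):
    """Count events of one category per hour of day."""
    return Counter(ts.hour for ts, cat in events if cat == category)


def event_balance(events):
    """Return {hour: (alarm_count, action_count)} for every hour seen."""
    alarms = hourly_counts(events, "alarm")
    actions = hourly_counts(events, "action")
    hours = sorted(set(alarms) | set(actions))
    return {h: (alarms[h], actions[h]) for h in hours}


def imbalanced_hours(balance, ratio=3.0, min_alarms=5):
    """Hours in which alarms far outnumber operator actions.

    `ratio` and `min_alarms` are hypothetical thresholds chosen for
    this sketch; a real plant would tune them from its own history.
    """
    return [h for h, (a, op) in balance.items()
            if a >= min_alarms and a > ratio * max(op, 1)]


# Made-up sample log: a quiet hour at 10:00 and an alarm burst at 11:00.
log = (
    [(datetime(2002, 1, 7, 10, m), "alarm") for m in (5, 20, 40)]
    + [(datetime(2002, 1, 7, 10, m), "action") for m in (6, 22, 41)]
    + [(datetime(2002, 1, 7, 11, m), "alarm") for m in range(0, 60, 5)]
    + [(datetime(2002, 1, 7, 11, m), "action") for m in (10, 30)]
)

balance = event_balance(log)
flagged = imbalanced_hours(balance)

# Text rendering of the two frequencies, one row per hour.
for hour, (a, op) in balance.items():
    print(f"{hour:02d}:00  alarms {'#' * a:<12}  actions {'#' * op}")
```

An hour flagged this way is only a candidate for inspection. Whether its alarms were actually unneeded still has to be judged from the messages themselves.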
Yoshitaka Yuki / ISA Transactions 41 (2002) 383–387
erator’s prompt reaction to unexpected situations. If an operator does not recognize an abnormal situation, it may not only affect product quality, but may also waste time and lead to increased production costs, as well as potentially compromise plant safety. On the other hand, if alarm messages are too frequent, operators are overloaded with alarms and may miss an important message buried among spurious messages. This may also adversely affect quality, productivity, and safety. In this sense, maintaining a well-tuned alarm system which generates an adequate and balanced frequency of messages is essential to reduce costs, improve quality, and maintain a safe operating environment.

The problem of alarm flooding has long been recognized, and most initial designs and startups attempt to address it. Some of the countermeasures typically applied include alarm suppression, which suppresses messages under defined conditions; an alarm integration block, which combines several alarm conditions, thereby reducing the number of alarms presented to the operator; and an intelligent alarm function block, which utilizes an expert knowledge base or fuzzy logic. These countermeasures reduce message repetition, thereby freeing operators from the burden of alarm flooding. However, even with the best technology and engineering, the effect of alarm system optimization does not last forever. Whenever there are changes in process, equipment, or system, additional alarm points are usually added. Hence an intelligent alarm can be buried among those newly added alarms and messages.

The continual addition of alarms is a natural result of a plant’s evolution. When alarms are added, it is often done in the context of a work order or a change to a small part of a plant. Within these limited scopes, the alarm addition may be acceptable. However, when considering a larger context, these changes may not be appropriate. As the number of alarm conditions increases, the operators’ workload also increases. This cyclical addition of alarms makes it difficult to maintain an optimum frequency of alarm notifications.

3. Maintaining optimal alarm frequency

Alarm system optimization must be treated as an ongoing activity in order to escape the creeping addition of alarms that end up masking and degrading previous alarm optimization efforts. As shown in Fig. 1, the ongoing alarm system optimization effort consists of three steps: (1) analysis; (2) countermeasures; and (3) evaluation.

Fig. 1. Spiral improvement cycle.

3.1. Analysis

Finding alarm system problems is not easy. In a typical chemical plant, more than 5000 alarms and events are recorded each day. Operators may notice when and where a rush of alarm notifications occurred. However, looking through an enormous number of messages in a log file is like finding a needle in a haystack.

Focusing on the interrelation of alarm messages and operator actions makes the process of finding problems easier. Alarm messages are sent to operators to prompt them to react in some way. In this sense, they are “process request” messages, by which a plant process asks for the operators’ reaction. The operators usually start their actions by watching the console messages. Therefore, the frequency of the operators’ actions should be related to the frequency of message notifications. There are also cases where the alarm message frequency increases when an operator’s action is not adequate. In both cases, the alarm notifications and the operator actions are interrelated. This interrelation can be quantified by counting message notifications over a time period. The message frequency can be quantified and visualized as a bar chart. The frequency of the operators’ actions can also be quantified in a similar manner. By comparing the alarm message notification frequency and the operator action frequency on an event balance trend graph, as shown in Fig. 2, one can easily tell whether the operator was busy dealing with messages. In this example, the alarm frequency increased abruptly at 11:00, but the operators’ action frequency did not increase. In this case, the increase was found to be a result of unneeded alarm messages. Once the frequency and the source of spurious alarm messages are found, it is possible to apply countermeasures to prevent a recurrence.

Fig. 2. Event balance trend graph.

3.2. Plan

It is important to apply improvement efforts in problem areas where the most benefit is expected. The largest impact can usually be obtained by improving the most frequently occurring imbalances between alarm messages and operator actions. This can be accomplished by finding repetitive spurious alarms and concentrated manual operations that can be automated. For example, in a batch plant, many products are produced repeatedly according to a “recipe.” Each recipe consists of several procedures, and each procedure is executed in a defined order. A repeating event balance pattern for a specific recipe leads to repeating problem areas that can be improved.

Fig. 3 shows an example of an event balance trend of a batch process. It shows each unit procedure’s running duration, and the frequency of alarm/guidance messages and operator actions. By comparing the balance of these two items with the unit recipe’s relative time, one can find a specific batch phase that generates more messages, or requires more manual operation, than expected.

Fig. 3. Batch event balance trend graph (1) [batch ID: 22-9DAP].

Fig. 4 shows another batch balance trend graph for the same product. The procedure (recipe) of this batch is the same as that shown in Fig. 3; therefore the balance peak pattern is similar. As shown in these figures, comparing several batch event balance patterns can lead to finding a repeating alarm message or frequent operator operations. With this approach, we could find substantial productivity bottleneck problems in the material transfer phase between reactor 1 and reactor 2, and in the material cake removal phase of a centrifuge.

Fig. 4. Batch event balance trend graph (2) [batch ID: 23-9DAP].

3.3. Countermeasures

The next step is to apply countermeasures for each problem. Countermeasures vary depending on the nature of the problem. Some examples of countermeasures are:
• Set an adequate alarm range;
• Tune the watchdog timer depending on the procedure;
• Adjust tuning parameters;
• Integrate/combine redundant manual operations;
• Add a timely guidance message prior to unstable sequences.

Countermeasures can be determined and applied to each problem. Countermeasures may require trial-and-error iterations, but once a problem area is narrowed down the countermeasures become less complex. Our experience shows that a timely guidance message sent to an operator before the first warning of an abnormal state can help to prevent a rush of alarm messages caused by an alarm tripping. As shown in Fig. 5, one timely guidance message helps the operator to prevent this type of situation.

Fig. 5. An earlier guidance message prevents alarm rush afterwards.

3.4. Evaluate

As already discussed, it is difficult to maintain the effect of countermeasures. To learn whether the effect of a countermeasure continues, a longer-term quantitative analysis is required. A daily alarm notification summary before and after applying a countermeasure is shown in Fig. 6. In this way, it is easy to determine when alarm messages begin to increase. When the total daily alarm message count increases, the event balance trend analysis for a day/batch should be investigated again. Summarizing the count for each week, each month, or each batch also helps to monitor whether the effect continues.

Fig. 6. Alarm reduction effort result.

4. Alarm system requisites

In order to make alarm optimization effective, an alarm system should have a comprehensive database. The alarms can then be analyzed from various aspects, and the system should also provide flexibility for easy configuration changes. The following features are required in an optimal alarm system.

(1) Basic alarm and event database features for supporting analysis activity:
• Time-stamped event description (message);
• Identifier for showing the origin of the message (tag-ID, etc.);
• Plant hierarchical ID for grouping messages of each unit/area;
• Alarm and event category (process alarm, guidance, tracking record, operation record, etc.).

(2) Flexibility and tolerance for optimization effort:
• Alarm grouping so that only one integrated alarm message is presented to the operator when many interrelated alarms have been generated in the control package at the same time;
• Intelligent alarm blocks which capture status changes and predict abnormal situations in advance;
• Security system to prevent operators from performing invalid operations, such as privilege protection for acknowledgement and re-alarming when an alarm condition persists.

5. Conclusion

A plant’s alarm and event database is a live record of a production process. It contains key information useful in improving operations productivity by solving production bottlenecks, performing production planning, and optimizing operation. One method to achieve these improvements is to optimize the alarm system. However, achieving a well-tuned alarm system is difficult because of the huge number of daily events and the lack of statistical analysis methods to detect problems. Quantifying and visualizing alarms and events in an event balance trend graph makes it easy to grasp problem areas. Overlaying unit recipe schedule results from a batch process on an event balance trend can identify specific portions of the batch process that can be improved. Once a problem area is identified and analyzed numerically, countermeasures can be applied to eliminate production bottlenecks. It is also important to iterate the cycles of analysis-plan-countermeasure-evaluation in order to maintain the effect of improvement. A reliable alarm system and an efficient analysis tool are indispensable to support this spiral activity.
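
The evaluation step of the cycle, comparing daily alarm counts before and after a countermeasure and watching for the count creeping back up, might be sketched as below. The function name, the `regress_ratio` threshold, and the daily counts are assumptions invented for this example. They are not figures from the paper.

```python
from datetime import date
from statistics import mean


def evaluate_countermeasure(daily_counts, applied_on, regress_ratio=0.8):
    """Compare daily alarm counts before and after a countermeasure.

    daily_counts: {date: number of alarm notifications that day}
    applied_on:   first day on which the countermeasure was active
    regress_ratio: hypothetical threshold; a later day is reported as
        regressed when its count reaches this fraction of the mean
        count before the countermeasure.
    """
    before = [n for d, n in daily_counts.items() if d < applied_on]
    after = {d: n for d, n in daily_counts.items() if d >= applied_on}
    if not before or not after:
        raise ValueError("need counts on both sides of applied_on")

    mean_before = mean(before)
    mean_after = mean(after.values())
    limit = regress_ratio * mean_before
    regressed = sorted(d for d, n in after.items() if n >= limit)
    return mean_before, mean_after, regressed


# Made-up daily summary: countermeasure applied on 4 March.
counts = {
    date(2002, 3, 1): 100,
    date(2002, 3, 2): 120,
    date(2002, 3, 3): 110,
    date(2002, 3, 4): 40,
    date(2002, 3, 5): 45,
    date(2002, 3, 6): 50,
    date(2002, 3, 7): 95,
}

mean_before, mean_after, regressed = evaluate_countermeasure(
    counts, applied_on=date(2002, 3, 4)
)
```

A day that appears in `regressed` is a prompt to re-run the event balance analysis for that day or batch. It is not a verdict that the countermeasure has failed.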