A case study paper on equipment availability data analysis.
Equipment Availability Analysis
Fred Schenkelberg, Ops A La Carte, LLC
Angela Lo, Kaiser Permanente
Key Words: Availability, Data Analysis, Repairable System
SUMMARY & CONCLUSIONS

Bottling line uptime and downtime are common metrics for bottling production lines. The runtime and downtime, along with the reasons for being down, are routinely and semi-automatically recorded. The data is often summarized using the exponential distribution and reported as MTBF and MTTR.

During the design of a new bottling line, the design team used the recorded data from existing lines and equipment to estimate the proposed line availability. If the new line could shorten the run time to accommodate a high mix of products while improving line availability, and thus throughput, it would permit significant warehouse savings.

The experienced operators and the maintenance and engineering teams knew that line availability improved as the run duration increased. After the initial setup, the line operator and maintenance crew continued to adjust and improve the operation of the bottling line, improving line availability over time. Availability was therefore not a constant value independent of run duration, and the existing calculations based on MTBF and MTTR did not reflect this behavior.

This paper examines the use of the expected values of fitted distributions for uptime and downtime, rather than MTBF and MTTR. The expected values permit the analysis to study how availability changes as the run duration changes. As a result, the design team’s analysis could trade off run duration and the associated throughput against the expected warehouse requirements and cost savings to arrive at an optimal bottling line design. This paper primarily explores the equipment analysis and availability calculations.

1 INTRODUCTION

The plethora of bottle sizes and flavors, even for a single brand of beverage, necessitates flexible bottling equipment capable of ‘change overs’ between flavors and bottle sizes. Bottling equipment originally worked with only one bottle size and shape. As market demands increased, the equipment continued to evolve and now permits the same bottling line to fill, label and box a relatively large selection of bottle sizes. A flavor change requires only cleaning the filling equipment and changing the labels, which creates a preference for filling many flavors of one bottle size whenever possible. In contrast, a bottle size change tended to require 2 to 4 hours to reset the bottle alignment guides, chutes, and other equipment and supplies.

A scheduling team worked out the production schedule well in advance with the intent to maximize line uptime by avoiding bottle size changes. Yet the bottling line design team was asked to explore increasing throughput by increasing the availability of the overall line through both engineering and layout changes. For example, one consideration was whether purchasing dedicated equipment for each bottle size would increase throughput enough to offset the cost of the additional (and often idle) equipment. Another consideration was the use of redundant pieces of equipment, especially those prone to extended downtime due to a major repair.

While exploring how effectively improved overall line availability increases throughput, we also need to consider the tradeoff between throughput and inventory costs. For example, to increase line availability and hence throughput, we should prioritize minimizing bottlenecks in the process. Therefore, the focus of this paper is the ‘filler’ equipment, as it is the line bottleneck. Increasing the throughput of the filler permits the line to produce the same quantity in less time. This frees up the line for other production and reduces the quantity of finished goods inventory required.

The design team had a line throughput modeling software package, which modeled buffer sizing and the permitted dwell times of the contents at specific temperatures or between the bottling and sterilization equipment. They also knew from experience and simple data analysis that longer runs with a single bottle size tended to have better throughput (equipment availability) during the later stages of the run. Anecdotally, they knew that the first 12 hours of a run include a significant number of adjustments, which improve the ability of the line to run smoothly.

The existing method within the plant to determine equipment availability used MTBF and MTTR and the underlying assumption of the exponential distribution. The design team recognized the lack of time dependence and therefore asked us to perform the data analysis.

1.1 Project Question

The basic question explored in this paper is just one of many analyses performed in support of the design team. One question was how to properly model the equipment data such that the design team could explore the differences in equipment availability over time. For example, with no equipment design changes, was it possible to achieve suitable throughput with only 4-hour runs rather than 12-hour runs? Another question was whether the throughput demonstrated after extended runs suggested what would be possible if the equipment design made ‘change overs’ that did not require subsequent adjustments to improve its performance.

This paper explores one piece of equipment, the filler, and fits appropriate distributions to its data. The fitted distributions for the uptime (operating) and the downtime (under repair) permit the calculation of equipment availability at various run durations.

1.2 Data

The data has been disguised to shield the equipment manufacturer and bottling plant from identification. While the actual data has been linearly transformed, the trends remain the same. Furthermore, the downtime codes, which included blockage, jams, alignment issues, fill sensor readings, and many more, have also been altered to represent generic reasons unrelated to the actual reasons. For the purpose of this discussion the downtime reasons are immaterial.

The actual raw data included downtime for shift changes, meetings, scheduled maintenance, and lack of raw materials. We removed such data, since the purpose of the analysis was to focus only on the individual piece of equipment.

Condition                               Start                  End
Supply Tank Low Level                   Sep/24/2007 04:50:18   Sep/24/2007 04:52:23
Capper Infeed Star Jam                  Sep/24/2007 05:04:19   Sep/24/2007 05:08:29
Capper Infeed Star Jam                  Sep/24/2007 05:08:42   Sep/24/2007 05:17:28
Blocked - Discharge Conveyor Stopped    Sep/24/2007 05:51:19   Sep/24/2007 05:51:51
Discharge Jam Alarm At S203             Sep/24/2007 05:52:28   Sep/24/2007 05:52:58
Discharge Jam Alarm At S203             Sep/24/2007 05:52:59   Sep/24/2007 05:54:30
Jog Mode Selected                       Sep/24/2007 05:55:34   Sep/24/2007 05:58:31
Discharge Jam Alarm At S204             Sep/24/2007 06:00:27   Sep/24/2007 06:00:32
Filler Run Switch Off                   Sep/24/2007 06:33:54   Sep/24/2007 07:17:03
Jog Mode Selected                       Sep/24/2007 07:47:39   Sep/24/2007 07:53:02
Jog Mode Selected                       Sep/24/2007 07:56:55   Sep/24/2007 07:58:56
Door 6 Open                             Sep/24/2007 08:34:11   Sep/24/2007 08:42:50

2 CURRENT MEASURES

The following analysis illustrates the plant’s methods for calculating equipment availability and throughput.

2.1 MTBF

The unbiased estimator for the exponential distribution’s single fitting parameter, θ, is

$$\hat{\theta} = \frac{T}{r} \tag{1}$$

where $\hat{\theta}$ is called the MTBF by definition within the factory, $T$ is the total operating time, and $r$ is the number of downtime events. The operating time is determined by summing all the time segments when the filler equipment was actually filling or ready to fill bottles. The number of downtime events is simply the count of events that occurred. Because the filtered data only counts events associated with the filler equipment, the ratio provides the filler equipment’s average uptime.

As is practice within the factory, the MTBF value is determined over many similar bottle size runs. As an example, the data for the ‘small bottles’ provides an MTBF estimate of 46.5 minutes.

2.2 MTTR

Using the same formula with downtime substituted for run time, and again assuming an exponential distribution, factory personnel calculate what they define as the MTTR, or average downtime:

$$\text{MTTR} = \frac{D}{r} \tag{2}$$

where $D$ is the total downtime of the $r$ downtime events. Using the same dataset as for MTBF, we find an MTTR of 2.45 minutes.

2.3 Availability

Factory personnel gave the well-known availability formula as the reason for estimating the MTBF and MTTR values:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \tag{3}$$

Using the values provided and the availability formula (3), we find an average filler availability of 95% over the most recent 6 months of operation.

2.4 Throughput

The filler equipment can fill bottles at a rate of approximately 425 bottles per minute. The equipment can also run much faster for short periods of time. When restarting (after clearing a bottle jam, for example) or when troubleshooting, the filler uses a much slower run mode. On average, the filler is considered to have a fill rate of 400 bottles per minute. The throughput calculation is:

$$\text{Throughput} = \text{Fill Rate} \times \text{Availability} \tag{4}$$
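Equations (1) through (4) are simple enough to script. The minimal sketch below assumes the uptime and downtime segments have already been extracted from the event log in minutes; the function names are hypothetical, and the published small-bottle values (MTBF 46.5 minutes, MTTR 2.45 minutes, fill rate 400 bottles per minute) are used as inputs.

```python
def mtbf(uptime_min: list[float], n_events: int) -> float:
    """Eq. (1): total operating time divided by the number of downtime events."""
    return sum(uptime_min) / n_events


def mttr(downtime_min: list[float]) -> float:
    """Eq. (2): total downtime divided by the number of downtime events."""
    return sum(downtime_min) / len(downtime_min)


def availability(mtbf_min: float, mttr_min: float) -> float:
    """Eq. (3): availability from MTBF and MTTR."""
    return mtbf_min / (mtbf_min + mttr_min)


def throughput(fill_rate: float, avail: float) -> float:
    """Eq. (4): average fill rate scaled by availability (bottles per minute)."""
    return fill_rate * avail


if __name__ == "__main__":
    # Published small-bottle values: MTBF 46.5 min, MTTR 2.45 min, fill rate 400/min
    a = availability(46.5, 2.45)
    print(f"availability = {a:.3f}")                           # prints availability = 0.950
    print(f"throughput = {throughput(400.0, a):.0f} per min")  # prints throughput = 380 per min
```

Because every quantity is a single average, the result is the same for a 2-hour run as for a 24-hour run, which is the limitation the rest of the paper addresses.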
Thus, for small bottles and this particular filler, the simple average throughput from equation (4) is 380 bottles per minute.

Finally, to schedule the line to produce a desired quantity of filled bottles, the scheduling department would divide the quantity desired by the average throughput. The department would then apply ‘historical knowledge’ to adjust the schedule, making short runs slightly longer and long runs slightly shorter than the average-based duration, before publishing the factory schedule.

3 THE DILEMMA

Anecdotally, the design team and factory personnel know that longer runs tend to produce more bottles per hour than short runs. Yet the values used to calculate equipment and line availability do not reflect the changing nature of the equipment operation.

The use of exponentially based distributions and the resulting availability calculation does not permit the team to consider different run times and the availability associated with each. Knowing the equipment’s capability when operated over a long run may suggest to the design team that altering the equipment set-up methods could reduce downtime enough to permit shorter runs. Or they may find that even the better equipment availability in the latter parts of long runs is not sufficient to provide the anticipated cost savings, suggesting the use of redundant sets of equipment to improve line availability.

Another troublesome unknown is the rate of change of equipment and line availability. A rapid or a slow change would suggest different strategies for designing the improved line. The same information on the time dependency of availability would also permit more accurate line scheduling, even for the current line configuration.

The current data analysis methods do not provide sufficient information on the changing equipment availability. Therefore, the design team decided to employ a data analysis that included the time element and the associated changes in equipment availability.

4 GRAPHICAL ANALYSIS

The Mean Cumulative Function (MCF) is a non-parametric graph of the cumulative failures plotted versus time. The following plot covers 6 months of operation for one piece of equipment on the production line. There are approximately 40 different runs (different bottle size/flavor configurations or ‘setups’).

Overall, from this plot, which appears to be a fairly straight line, the conclusion would be that the system is neither improving nor degrading over time as the repairs occur. It remains at approximately the same condition, or failure rate, over runs of various lengths (Trindade and Nathan 2006).

This conflicts with the common knowledge within the factory that, over the course of a run, the equipment tends to run with fewer failures. Furthermore, the straight-line MCF supports the use of constant failure rate estimates for scheduling and for the improved line design decisions.

Taking a closer look at the underlying data, we noticed that only a few of the runs lasted more than one or two shifts. Some flavors required only a small quantity of filled bottles to keep up with demand, while only a few commanded a large demand. The same equipment serves short and long runs, and the design team wanted information that quantified the changing failure rates for planned runs of various lengths.

5 GENERAL RENEWAL PROCESS

Advances in the analysis of repairable systems’ data permit fitting a parametric model to the factory data (Mettas and Wenbiao 2005). The data provided by the factory meet the two primary assumptions:

1. The time to first failure (TTFF) distribution is known and can be estimated from the data. There are over 2000 failure events within the dataset.

The Weibull probability plot shows a beta of approximately 0.6. The two-parameter Weibull fit was done with rank regression on X using median ranks.
A beta less than one indicates a system with a decreasing failure rate over time. This suggests that the repairs made during the earlier part of the run help prevent future failures.

2. The repair time is negligible relative to the run time. Most repairs are completed within 1 minute of the failure, which is negligible compared to the average runtime of approximately 45 minutes.

The repair times were fit to a lognormal distribution within Weibull++ using rank regression on X and median ranks. The plot shows that approximately 50% of the repairs are accomplished within one minute and approximately 90% within 10 minutes. While a larger difference between runtime and repair time would be desirable, the single order of magnitude difference is sufficient for this analysis.

The general renewal process model uses a concept of virtual age. Let $t_1, t_2, \ldots, t_n$ represent the successive failure times, and let $x_1, x_2, \ldots, x_n$ represent the times between failures, where

$$t_i = \sum_{j=1}^{i} x_j \tag{5}$$

For the Type II model of the General Renewal Process, the virtual age is determined with equation (6):

$$v_i = q\,(v_{i-1} + x_i) = q^{i} x_1 + q^{i-1} x_2 + \cdots + q\,x_i \tag{6}$$

where $v_i$ is the virtual age of the system right after the $i$th repair. Depending on the value of $q$, the model permits partial improvement of the system by adjusting the apparent system age.

The power law function models the rate of recurrent failures within the system:

$$\lambda(t) = \lambda \beta\, t^{\beta - 1} \tag{7}$$

and the conditional pdf is

$$f(t_i \mid t_{i-1}) = \lambda \beta\, (x_i + v_{i-1})^{\beta - 1}\, e^{-\lambda\left[(x_i + v_{i-1})^{\beta} - v_{i-1}^{\beta}\right]} \tag{8}$$

For further details on the derivation and fitting algorithms for this model, see (Mettas and Wenbiao 2005).
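Equations (6) and (8) translate directly into code. The sketch below is a hypothetical illustration, not the Weibull++ implementation: it computes the Type II virtual ages and the conditional density for a single sequence of times between failures $x_i$, and sums the log densities into a log-likelihood. It assumes all $x_i > 0$ and that observation stops at the last failure, so no censored interval is added after it.

```python
import math


def virtual_ages(x: list[float], q: float) -> list[float]:
    """Type II GRP virtual ages, eq. (6): v_i = q * (v_{i-1} + x_i), with v_0 = 0.

    Returns [v_0, v_1, ..., v_n]; v_i is the virtual age right after the i-th repair.
    """
    v = [0.0]
    for xi in x:
        v.append(q * (v[-1] + xi))
    return v


def conditional_pdf(xi: float, v_prev: float, lam: float, beta: float) -> float:
    """Eq. (8): density of the next time between failures x_i, given virtual age v_{i-1}."""
    age = xi + v_prev
    return lam * beta * age ** (beta - 1) * math.exp(-lam * (age ** beta - v_prev ** beta))


def log_likelihood(x: list[float], lam: float, beta: float, q: float) -> float:
    """Log-likelihood of one failure-truncated sequence of times between failures.

    Assumes every x_i > 0; no survival term is added after the last failure.
    """
    v = virtual_ages(x, q)
    # x[i] is paired with the virtual age reached after the previous repair, v[i]
    return sum(math.log(conditional_pdf(xi, v[i], lam, beta)) for i, xi in enumerate(x))


if __name__ == "__main__":
    # Hypothetical inter-failure times; parameter values are those reported in 5.1,
    # with the time unit of lambda assumed (not stated in the source) to be minutes
    print(log_likelihood([12.0, 30.5, 8.2], lam=2.09, beta=0.27, q=0.38))
```

Setting q = 1 makes the virtual age equal the elapsed time (the NHPP case), and q = 0 restarts the clock after every repair. Maximizing `log_likelihood` over λ, β and q with a general-purpose numerical optimizer is one way to fit the model; the fitting procedure actually used by Weibull++ is described in Mettas and Wenbiao (2005).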
5.1 Analysis

Within the Weibull++ software algorithms for modeling recurrent event data, two models are available. The Type I model assumes the repair only addresses the immediate failure, whereas the Type II model assumes the repair partially or completely repairs, or possibly improves, the system rather than just fixing the immediate fault. Given the nature of fixes on the production line, which often include equipment adjustments (alignment, timing, etc.), we use the Type II model for this analysis.

Weibull++ accomplishes the fit using the General Renewal Process, Type II, three-parameter model. The results are:

Beta = 0.27
Lambda = 2.09
q = 0.38

The third parameter, q, may be considered an index of repair effectiveness. A value of q = 0 represents a perfect repair, an ‘as good as new’ state. A value of q = 1 represents a minimal repair, in which the system is considered to be in an ‘as bad as old’ state, permitting the use of non-homogeneous Poisson process (NHPP) analysis. Values 0 < q < 1 represent an imperfect repair, which makes the system only partially better. The value q = 0.38 indicates that, in general, the repairs make a slight improvement.

5.2 Discussion

The plot of cumulative failure intensity vs. time shows the rapid improvement in equipment performance after the early failures receive attention. Note the upward jog in the data at approximately 500 minutes, where two plant behaviors contribute. First, a significant number of runs are scheduled to occur over one shift, which is 480 minutes long. Second, the shift change brings a change of personnel, and during the shift briefing the line is administratively shut down. The restart incurs additional failures and adjustments.

After approximately two shifts, or 1000 minutes of running, the equipment tends to run smoothly and repairs neither improve nor degrade the equipment performance.

5.3 Model Use

The GRP model permits us to determine the cumulative, instantaneous and conditional failure intensities at any time and duration of our choosing. This addresses the need to determine equipment availability and throughput for specific run durations.

Using the quick calculation pad within Weibull++ for the fitted data, we can calculate the cumulative and instantaneous failure intensities at selected durations or times, respectively. The following table summarizes the failure intensity calculations:

Run duration (minutes)                       120      240      480      960      1440
Cumulative failure intensity (per min)       0.1395   0.1077   0.0865   0.0754   0.0706
Instantaneous failure intensity (per min)    0.0482   0.0377   0.0307   0.0267   0.0288

Because the MTBF is the inverse of the failure intensity, we can calculate the MTBF values for specific durations or instants.

Run duration (minutes)                       120      240      480      960      1440
Cumulative MTBF (minutes)                    7.17     9.29     11.56    13.26    14.16
Instantaneous MTBF (minutes)                 20.75    26.53    32.57    37.45    34.72

Using the MTBF values above along with the MTTR value of 2.45 minutes, determined as the expected value of the fitted lognormal distribution, we can apply the availability formula (3) to determine the expected availability for selected durations or instants.

Run duration (minutes)                       120      240      480      960      1440
Cumulative availability                      0.75     0.79     0.83     0.84     0.85
Instantaneous availability                   0.89     0.92     0.93     0.94     0.93

Finally, using the throughput equation (4), we can determine the expected production for runs of various durations. The instantaneous throughput shows the improving nature of the system over time.

Run duration (minutes)                       120      240      480      960      1440
Cumulative throughput (bottles/min)          283      301      314      321      324
Instantaneous throughput (bottles/min)       340      348      353      357      355

6 ANALYSIS

With the improved calculation of the changing nature of the filler’s MTBF, we are now able to determine the potential impact on finished goods inventory reduction.
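The calculation chain behind the Section 5.3 tables, extended to the time to build 1000 bottles used in Section 6.1, can be sketched as follows: invert the failure intensity to get MTBF, apply equation (3) with the 2.45-minute MTTR, and multiply by a fill rate per equation (4). This is a hypothetical helper, not Weibull++ output; the intensities are copied from the failure intensity table and taken to be failures per minute. The fill rate is left as a parameter because the tabulated throughput values appear to correspond to 380 bottles per minute, whereas the 400 bottles per minute from Section 2.4 gives values about 5% higher.

```python
# Values copied from the failure intensity table (assumed to be failures per minute)
DURATIONS_MIN = [120, 240, 480, 960, 1440]
CUM_INTENSITY = [0.1395, 0.1077, 0.0865, 0.0754, 0.0706]
INST_INTENSITY = [0.0482, 0.0377, 0.0307, 0.0267, 0.0288]
MTTR_MIN = 2.45   # expected value of the fitted lognormal repair time
RATE = 380.0      # bottles per minute; reproduces the tabulated throughput


def summarize(intensities, mttr=MTTR_MIN, rate=RATE, bottles=1000):
    """Return (duration, MTBF, availability, throughput, build time, % improvement) rows."""
    rows = []
    for duration, lam in zip(DURATIONS_MIN, intensities):
        mtbf = 1.0 / lam                      # MTBF is the inverse of the failure intensity
        avail = mtbf / (mtbf + mttr)          # eq. (3)
        tput = rate * avail                   # eq. (4), bottles per minute
        build = bottles / tput                # minutes to fill `bottles` bottles
        ideal = bottles / rate                # minutes at the full rate
        improvement = 100.0 * (build - ideal) / build
        rows.append((duration, mtbf, avail, tput, build, improvement))
    return rows


if __name__ == "__main__":
    for d, mtbf, avail, tput, build, imp in summarize(CUM_INTENSITY):
        print(f"{d:5d} min  MTBF {mtbf:5.2f}  A {avail:.2f}  "
              f"{tput:3.0f}/min  1000 in {build:.2f} min  {imp:4.1f}%")
```

Running it on `CUM_INTENSITY` reproduces the cumulative MTBF, availability and throughput rows, as well as the time-to-build and percentage-improvement rows of the Section 6.1 table, to the rounding shown; passing `INST_INTENSITY` gives the instantaneous rows.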
The comparison of the current short-run performance to the potential performance provides a basis for estimating the potential inventory reduction.

6.1 Inventory vs. Throughput

When analyzing the opportunity to increase throughput by improving line availability, we can determine the potential inventory savings using an application of Little’s Law:

$$\text{Finished Goods Inventory} = \text{Throughput} \times \text{Flow Time} \tag{9}$$

Little’s Law (Silver et al. 1998) can be applied to evaluate the tradeoff between throughput and inventory cost. Increasing throughput while holding flow time constant takes less runtime to build the same amount of finished goods.

Length of run (minutes)                      120      240      480      960      1440
Time to build 1000 units (minutes)           3.53     3.33     3.19     3.12     3.09
% improvement at 380 bottles/min             25.5     20.9     17.5     15.6     14.7

This suggests a 20% reduction in the time to produce the same amount of finished goods for a four-hour run. Of course, this is only possible if equipment improvements permit the filler to achieve, over a 4-hour run, the same average throughput as the long-run average of 380 bottles per minute. The reduced runtime permits a reduction in finished goods, as the increased capacity of the factory permits it to replenish the inventory more often.

The cost savings in inventory provide a basis for the engineering improvement project. If the engineering team expects to make improvements that achieve four-hour runs with a throughput of 380 bottles/minute, they may achieve at least a 20% reduction in inventory. Assume the cost to carry the inventory for a year is $20 million at this site. This suggests the engineering team can spend $5 million on improvements and achieve a one-year payback on the investment.

7 CONCLUSION

The results show the lack of accuracy of the existing method when evaluating equipment availability using traditionally calculated MTBF and MTTR. The traditional method has only one, non-time-dependent estimate of MTBF. To provide a better overall analysis of equipment availability and throughput, include a time-dependent variable such as run duration within the analysis. The GRP model permits such an analysis.

As seen in the calculations using the GRP model, it takes approximately 4 hours (240 minutes) of runtime to stabilize the instantaneous availability and throughput. Engineering changes to the equipment that accelerate or improve the initial performance, effectively eliminating the first four hours of adjustments, will permit the line to run more efficiently with short runs. Simply implementing shorter runs will not achieve the goal without fundamental changes to the production equipment.

Running more effectively permits a reduction of finished goods inventory by as much as 20% for a 4-hour run. Further inventory reduction is also possible due to the additional capacity of the factory, which is not considered in this analysis. The cost savings associated with the inventory reduction provide a boundary for the improvement costs.
8 REFERENCES

1. Mettas, A. and Z. Wenbiao (2005). Modeling and analysis of repairable systems with general repair. Reliability and Maintainability Symposium, 2005. Proceedings, Annual.
2. Trindade, D. and S. Nathan (2006). Simple plots for monitoring the field reliability of repairable systems. Reliability and Maintainability Symposium, 2006. Proceedings, Annual.
3. Silver, E., D. Pyke, and R. Peterson (1998). Inventory Management and Production Planning and Scheduling, 3rd Ed. Wiley, New York, 1998.

9 BIOGRAPHIES

Fred Schenkelberg
Ops A La Carte, LLC
990 Richard Avenue, Suite 101
Santa Clara, CA 95050, USA
e-mail: fms@opsalacarte.com

Fred Schenkelberg is a reliability engineering and management consultant with Ops A La Carte, with areas of focus including reliability engineering management training and accelerated life testing. Previously, he co-founded and built the HP corporate reliability program, including consulting on a broad range of HP products. He is a lecturer with the University of Maryland, teaching a graduate-level course on reliability engineering management. He earned a Master of Science degree in statistics at Stanford University in 1996 and his bachelor’s degree in Physics at the United States Military Academy in 1983. Fred is an active volunteer with the management committee of RAMS and is currently the Chair of the American Society for Quality Reliability Division. He is active at the local level with the Society of Reliability Engineers and IEEE’s Reliability Society, participates in IEEE reliability standards development teams, and recently joined the US delegation as a voting member of the IEC TAG 56 - Durability. He is a Senior Member of ASQ and IEEE and an ASQ Certified Quality and Reliability Engineer.

Angela Lo
7313 Shelter Creek Lane
San Bruno, CA 94066, USA
e-mail: angelalo928@gmail.com

Angela Lo is a Senior Financial Analyst at Kaiser Permanente – South San Francisco Medical Office. In her current role, she provides operational analysis and process improvement recommendations to front office operations. Prior to this position, she worked for several domestic and international companies with focus areas in supply chain management, operations improvement, and Six Sigma initiatives. Her process improvement knowledge has been applied not only in manufacturing operations but also in service environments. She earned her bachelor’s degree in Industrial Engineering and Operations Research at the University of California, Berkeley in 2005 and her master’s degree in Industrial and Systems Engineering at San Jose State University in 2007. She also obtained her Six Sigma Black Belt Certification through the American Society for Quality in 2009. Angela is currently an active member of the American Society for Quality.