2. 2 of
Failure After Overhaul
[Chart: failures versus time after overhaul. Vertical axis: 0 to 2000. Horizontal axis (time after overhaul): < 1 wk, 1-2 wk, 2-3 wk, 3-4 wk, 1-2 mo, 2-3 mo, > 3 mo]
4. 4 of
Each Technology Is Optimized
Machine Healthcare is sub-optimized
Spreads cost over as many
machines as possible
Minimizes cost
per data point
Maximizes utilization of
test equipment
Provides evidence of full
work load to supervision
5. 5 of
Is This the Best Way?
Would you be happy with your doctor
if on your annual physical he only
tested your pulse rate?
And then sent you out to contract
your own blood work and interpret
the results?
Then based on that limited
information, he makes the
decision to do surgery.
A pump overhaul is surgery!
6. 6 of
Technology Centered:
Optimizes individual
technology program
spreads cost over as
many pieces as
possible
Minimizes cost
per measurement
Provides full
workload
Keeps equipment in
use
Machine centered:
Optimizes machine
health
Provide all needed
information to assess
machine's health
Decide what PM's
actually improve or
maintain machine's
health
Family physician
model
7. 7 of
Application to Machinery Healthcare
To get a complete picture of machine
health, you need to run a number of
tests.
And when that PM for overhaul
(surgery) comes up, you can make
an informed decision on whether to
perform or defer it.
You’re more likely to catch
something early.
8. 8 of
Advantages of a Machine
Centered Approach
Optimize Machine Healthcare
Defer routine overhauls
Collect complete data on each trip
to machine
Fewer machines per day
More valuable information
Manage failure
9. 9 of
Machine Centered Process
It’s nothing new
Reliability Centered Maintenance formalizes it
But the fact is not everyone can do RCM
• They can’t afford it
• They don’t have the manpower
• They can’t get approval
But they still have to maintain the machine
10. 10 of
Machine Centered
Thought Process
What are the
possible
failures?
Which of
these
failures are
significant?
How can
we avoid
these
failures?
When we can't
avoid failure, how
can we get an
early warning?
Tailor a suite of
tests to get
early warnings.
Collect all
information at one
decision point.
12. 12 of
First ask:
What are the possible failures
Think about the function of the
machine
How can it fail to meet that function?
13. 13 of
Function
Downstream
(Load side)
Start motor
Stop motor
Deliver specified
torque at
specified RPM
Specified speed
ramp rate up
Specified speed
ramp rate down
accelerate load
from stop to
operating speed
adjust torque and
speed on
demand
Motor-Drive System Functions
14. 14 of
Types of Failures
• Hidden
• Safety
• Environmental
• Operational
• Non-Operational
15. 12/5/97 15 of
Functional Failure Analysis
• Complete failure
• Partial failure
• Intermittent failure
• Failure over time
• Over performance of function
16. 16 of
Functional Failure of a Motor Drive System
Function | Functional Failure | Failure Mode
Start motor | Motor will not turn | Winding failure (stator); insulation failure (stator); rotor failure; bearing seized; contactor failed; loss of power; VFD malfunction (start)
Stop motor | Motor will not stop | Contactor failed; VFD malfunction (stop)
Deliver specified torque at specified RPM | Motor turns at wrong speed | VFD malfunction (speed control); motor fault; load fault
Specified speed ramp rate up | Motor ramps up at wrong rate | VFD malfunction (ramp up); winding failure (stator); insulation failure (stator); rotor failure; bearing seized
17. 17 of
Next ask:
Which of these failures are
significant?
How often it happens - frequency
What’s the impact when it does - consequence
Risk = frequency times consequence
18. 18 of
Machine History
Machine | Failures | Total Downtime
Casing failure | 3 times in 10 years | 24 hours
Seal failure | Twice in the last six months | 105 hours
NPSH failure | Couple of times a week | Stops line for 10 minutes each time
Bearing failure | About once a year | 6 hours
• $10,000/hr downtime cost
• 2080 hours / yr
19. 19 of
Calculations
• Pump Set 1:
Total time = 10 yr x 2080 hr/yr = 20,800 hrs
Downtime = 24 hrs
MTBF = 20,800/3 = 6933.3 hrs (German)
MTTR = 24/3 = 8 hrs (German)
Cost = (3 fail x 8 hr/fail x $10,000/hr)/10 yr = $24,000/yr (French)
20. 20 of
Application Of Ranking
Machine | MTBF (hr) | MTTR (hr) | Annual Downtime Cost ($)
Casing failure | 6933.3 | 8 | $24,000
Seal failure | 520 | 52.5 | $2,100,000
NPSH failure | 20 | 0.16 | $173,000
Bearing failure | 2080 | 6 | $60,000
Note that the 10-minute failure adds up to more loss than the 6 hr or 8 hr failures.
21. 21 of
Criticality Survey
Score | Frequency | Effect
1 | 1/10 yrs | None
2 | 1/yr | A little
3 | 1/month | Some
4 | 1/week | A lot
5 | 1/day | Complete
22. 22 of
Then ask:
How can we avoid these failures?
Design changes
Adjust, lubricate, …
Preventive replacement
23. 23 of
Task Types
• Time directed
• Condition directed
• Failure finding
• Run to failure
• Regulatory compliance
27. 27 of
Condition Assessment Techniques
• Process Parameters
• Vibration Analysis
• Infrared Thermography
• Ultrasonic
• Lubricating Oil Analysis
• 30+ Other NDT technologies
It’s a way of using information, not a
specific technology
28. 28 of
Then:
Tailor a suite of tests to get early
warnings.
Only do the tests needed
29. 29 of
Motor Failure
Failure Mode | Failure Cause | Symptom | Measurement
Winding failure (stator) | Conductor failure | vibration > ips | vibration monitoring
 | | Various | MCSA
 | | Various | MCE
 | Excessive vibration | vibration > ips | vibration monitoring
Insulation failure (stator) | Breakdown | | Polarization index
 | | R to gnd < ohms | Megger
 | Excessive current | temperature > °F | thermometer
 | | amperes > A | ammeter
 | | voltage spike | power quality monitor
 | Excessive temperature | Motor temperature > °F | thermometer
 | | Ambient temperature > °F | thermometer
 | | | thermography
 | Excessive vibration | vibration > ips | vibration monitoring
 | Phase imbalance | phase angle >± ° | power quality monitor
 | | | MCSA
 | | | MCE
 | | temperature > °F | thermometer
Rotor failure | Broken rotor bars | vibration > ips | vibration monitoring
Bearing seized | Fatigue | vibration > ips | vibration monitoring
 | | shock pulse > db | shock pulse meter
 | Improper lubrication | shock pulse > db | shock pulse meter
 | | Lube deterioration | lube monitoring
31. 31 of
Machine Centered
Thought Process
What are the
possible
failures?
Which of
these
failures are
significant?
How can
we avoid
these
failures?
When we can't
avoid failure, how
can we get an
early warning?
Tailor a suite of
tests to get
early warnings.
Collect all
information at one
decision point.
This is a graph showing time until failure after overhaul. Notice that the failure rate is highest immediately after an overhaul.
This matches my experience with shipboard equipment during the testing period after a ship overhaul. We usually found that the majority of the failures occurred within a couple of days of startup.
There are many reasons for this early failure, including misassembly, lack of clean conditions during overhaul that lets dirt get into critical clearances, and infant mortality of newly installed components.
The bottom line here is “minimize tasks that break the system boundary”. Any time you break that boundary, it’s like a surgeon making a cut. You have an increased risk of contamination, improper procedures, and the like.
Many plants have a condition assessment program in place. But usually those programs operate in relative isolation, concentrating on only one or two technologies. The people responsible for them work to maximize the efficiency of the application of the technology. Therefore the application of the technology is optimized, rather than the results.
A machine-centered, as opposed to a technology-centered, approach to condition assessment will maximize your effectiveness in improving machine reliability. This approach focuses on those tests that are most cost effective when it comes to machine reliability.
Let’s consider the typical vibration program. After some research, a cost justification is approved to purchase vibration equipment and software. Then one or two technicians are trained and designated to manage the program. They’re told to make the vibration program run. In the absence of any measure of cost-benefit, they decide to apply vibration monitoring to as many pieces of equipment as possible. From their perspective, it’s a smart move.
CLICK
It spreads the cost of equipment and training over as many pieces of equipment as possible,
CLICK
it minimizes cost per measurement,
CLICK
it keeps the equipment in use,
CLICK
and it provides a full work load. They’ve optimized the individual technology program.
CLICK
But the healthcare for that machine is sub-optimized.
But is this the best strategy to improve machine reliability? Would you be happy with your doctor if on your annual physical he only tested your pulse rate?
And then maybe he makes a decision to do surgery based on that?
Probably not! You’d like to see him make a number of tests — blood work, EKG, chest x-ray, etc.
Then he’ll get a complete picture of your health. And have a lot better basis to make a decision on surgery.
We want all the information possible to assess the machine’s health. And we must do it economically. This may include high tech methods such as vibration, IR, ultrasound or the like. And it also includes inspections and tests that are more basic. Visual inspection, trending operating parameters, and failure finding tasks are examples.
We must design our PM tasks to collect the right information at the right intervals.
We also should only do those tasks that can improve the reliability of the machine, not damage it. Time based overhauls and replacements should get a very close look. Some components need to be replaced on a set interval. But many can be replaced as an “on condition” task. Examples that come to mind are drive belts, fluorescent tubes, or oil changes.
Time based overhauls are expensive and can lead to further failure. They should almost always be based on the condition of the machine. You wouldn’t have your appendix out just for the fun of it.
What I’m saying is we should approach our machines’ health like we approach our own. We do the right inspections and tests (vital signs, blood pressure, MRI, etc.) at the right times to detect potential problems early. And we do preventive tasks like exercise and diet to maintain our health.
The same principle applies to machinery. To get a complete picture of machine health, you need to run a number of tests. Then you’ll have complete information to manage the machine.
Then you assess the machine based on those tests. If the machine is OK, you’ll see that. But you might just catch a problem before it starts affecting operation. Then you can plan action to correct the problem with minimum impact on production or customer service.
We had a good example of this a couple of years ago. A client had a 250 hp fan that showed misalignment and over-tensioning of the belt drive. The bearings also showed signs of deterioration. We recommended taking the fan down to replace the bearings and properly align and tension the drive system. The client decided not to.
After about three months the bearing failed, the shaft dropped and broke. That caused the fan to contact the shrouding and destroy it. When all was said and done, what should have been a $2500 repair cost over $45,000.
Complete information about the machine also means when that time based PM for overhaul, which is surgery, comes up, you can make an informed decision to perform or defer it.
If you have a technician going to a machine to collect data for one technology, why not collect all the data you need?
Instead of just vibration, how about trending bearing temperatures and fluid pressures? RPM and other parameters also contribute to a complete picture of the machine’s health.
It means that more time will be spent at each machine, and fewer machines will be assessed in a day. But you have more valuable information.
You’ll also save transit time, prep time, and administrative time associated with multiple trips to the machine. And you’ll save time by just applying a technology to those machines where it’s cost effective. You haven’t optimized the technology, but you have optimized the machine’s healthcare.
Isn’t that what we really want?
I want to propose an approach that’s not new. Many maintenance people have been doing it for years. It’s formalized by Reliability Centered Maintenance. I call it Machine Centered Healthcare.
I believe that Reliability Centered Maintenance is the best approach for critical machines. But not every plant can afford it, can get approval for it, or has the manpower for an RCM program. It’s expensive in the short run.
I’m proposing a thought process that will help you decide how to maintain your machines in a less formal manner with less paperwork than reliability centered maintenance.
A machinery-centered approach looks at the machine first, and by asking a series of questions, helps you decide how to maintain the machine’s health. What tests should be done? What routine PM should be done? How can we make the overhaul/no-overhaul decision?
What we want to do is to maximize our effectiveness in improving machinery reliability. We need to assess machine health based on several measures. And we need to maintain the reliability by doing those tasks that prolong or improve machine health. But we should only do those tasks and tests that are cost effective from the point of view of the machine. The question is, how do we decide what to do?
I propose we follow a systematic process to identify that.
The questioning process is:
First ask, what are the possible failures?
Next ask, which of these failures are significant?
Next ask, how can we avoid these failures?
Then ask, when we can’t avoid failure, how can we get an early warning?
Then, tailor a suite of tests to detect those early warning signs.
Finally, collect the results of the tests at one decision point.
Let’s look at each step in detail.
Our first step is to decide at what level we are going to address the machine. We could look at the whole VFD – motor – pump system. Or the motor – pump system. Or the pump alone. I think it’s probably simplest to just look at the pump. Then we could do the motor and VFD separately.
To ask what the failures are, first we need to know what the machine is supposed to do. What is its primary function?
At first glance, you might say that a pump’s primary function is to pump a liquid. In reality, its primary function may be to keep a supply tank full. As the process draws liquid from the tank, the pump replaces it. If the pump can’t pump at a sufficient rate, the supply tank will go empty. That minimum rate will vary from process to process. Look beyond the obvious to the real function of the machine.
Machines can also have secondary functions. For example, the pump has a secondary function of keeping the pumped fluid contained. This is especially critical if the liquid is flammable, acidic or caustic. If the seal leaks, or the casing develops a pinhole due to cavitation, the pump fails to meet that function. It may still pump sufficient liquid to meet the needs of the process, but by letting fluid leak, it has failed at its secondary function.
Once you’ve decided what the machine’s function is, ask what can happen to prevent it from meeting that function.
In the case of the pump, the answer might be a worn impeller reducing available head, a bearing failure lowering RPM and reducing flow, a cracked casing or worn-out seal letting liquid escape, or any of a number of other possible failures.
At this point, you’re just brainstorming. Don’t consider whether the failure is likely or has much impact. We’ll do that in the next step. For now, just get a complete list of potential failures.
This is a partial table of the possible functions of a motor-drive system. Your particular systems may have different functions.
Any machine may have both primary and secondary functions. Be sure to consider all of them.
This busy table is a partial list of the functions, functional failures and the failure modes that may cause them. We’ve taken the list of functions and then asked how the machine can fail to meet those functions. Each function may have one or more potential functional failures.
Then we ask what failure modes (or causes) might cause that functional failure. Again, each functional failure may have one or more associated failure modes.
This is the starting point for looking for failure symptoms and maintenance tasks. But for most machines, you’ll have a long list. There will probably be too many to address in a reasonable amount of time.
Because it’s so long, we need to trim it to a more manageable size. That’s the next step.
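The function, functional failure, failure mode hierarchy can be kept in a simple nested structure before trimming. Here is a minimal Python sketch of one way to do that, not a prescribed format. The dictionary layout and the helper name `list_failure_modes` are assumptions for illustration; the entries are taken from the first two rows of the motor-drive table.

```python
# Hypothetical layout: function -> functional failure -> list of failure modes.
# Entries come from the motor-drive table; the structure itself is an assumption.
failure_analysis = {
    "Start motor": {
        "Motor will not turn": [
            "Winding failure (stator)",
            "Insulation failure (stator)",
            "Rotor failure",
            "Bearing seized",
            "Contactor failed",
            "Loss of power",
            "VFD malfunction (start)",
        ],
    },
    "Stop motor": {
        "Motor will not stop": [
            "Contactor failed",
            "VFD malfunction (stop)",
        ],
    },
}


def list_failure_modes(analysis):
    """Flatten the hierarchy into (function, functional failure, mode) rows."""
    rows = []
    for function, functional_failures in analysis.items():
        for functional_failure, modes in functional_failures.items():
            for mode in modes:
                rows.append((function, functional_failure, mode))
    return rows


rows = list_failure_modes(failure_analysis)
print(len(rows), "failure modes to rank in the next step")
```

A flat list of rows like this is convenient because each row can later carry its own frequency and consequence figures.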
Here’s an example of what you might find in machinery history. This is for several failure modes but you should do the same thing for all the failure modes on your machine.
I’ve taken the downtime cost to be $10,000 per hour and assumed a one shift operation.
As you can see, we have a casing failure that has happened three times in ten years for a total downtime of 24 hours. What we want to do next is to find the mean time between failures and the annual downtime cost due to casing failure.
These are the calculations.
We’ll define MTBF as the total time divided by the number of failures. Our total time is 10 years times 2080 hours per year, or 20,800 hours. The casing failed three times in that period, so the average or mean time between failures is 20,800 divided by three, or 6933.3 hours.
Now us folks in maintenance and engineering understand that. But the folks in management may not. So we have to put it in their terms. We’re technical people, we speak a technical language (German). Management speaks in a more general language (French). Because they have no incentive to learn German, we have to learn French. In this case French means dollars.
So let’s convert our figures to dollars. If we do, we find that casing failures cost on average $24,000 per year. They don’t happen every year, but when they do, they’re expensive.
We also know that a total of 24 hours was required to complete the repairs. The mean time to repair is the total downtime divided by the number of failures, or 8 hours.
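The casing-failure arithmetic can be written out as a short Python sketch. The function name `failure_stats` and its parameters are made up for illustration; the 2080 hours per year and $10,000 per hour figures are the ones assumed on the slides.

```python
HOURS_PER_YEAR = 2080            # one-shift operation, from the slide
DOWNTIME_COST_PER_HOUR = 10_000  # dollars per hour, from the slide


def failure_stats(failures, years, total_downtime_hours):
    """Return (MTBF in hours, MTTR in hours, annual downtime cost in dollars)."""
    total_hours = years * HOURS_PER_YEAR
    mtbf = total_hours / failures            # total time / number of failures
    mttr = total_downtime_hours / failures   # total downtime / number of failures
    annual_cost = total_downtime_hours * DOWNTIME_COST_PER_HOUR / years
    return mtbf, mttr, annual_cost


# Casing failure: 3 failures in 10 years, 24 hours of downtime in total.
mtbf, mttr, annual_cost = failure_stats(failures=3, years=10, total_downtime_hours=24)
print(f"MTBF = {mtbf:.1f} hr, MTTR = {mttr:.1f} hr, cost = ${annual_cost:,.0f}/yr")
# prints: MTBF = 6933.3 hr, MTTR = 8.0 hr, cost = $24,000/yr
```

The same function can be fed the history for any other failure mode once its count, observation period and downtime are known.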
These are the results for four failure modes. The seal failure is obviously the most expensive. But look at the NPSH failure. It happens so often that no one thinks much about it. Plus we might be able to blame it on production. Each time it happens it only costs $1600, but it’s costing us $173,000 per year.
We need to look at all the potential failures. Some might seem minor because they don’t cost much when they happen, but they happen so often that the cost adds up.
What we’ve done here is define risk as consequence times frequency. No matter what information you have, if it can be put in those terms, we can use it for ranking.
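As a rough sketch of that ranking, the Python below annualizes each failure from the machine history and sorts by risk. The names `history` and `annual_risk` are illustrative, and “a couple of times a week” is taken as two per week (104 per year), which is an assumption consistent with the 20 hour MTBF in the table.

```python
DOWNTIME_COST_PER_HOUR = 10_000  # dollars per hour, from the slide

# (failures per year, hours lost per failure), annualized from the machine history.
# "A couple of times a week" is taken as 2 per week, i.e. 104 per year (assumption).
history = {
    "Casing failure": (3 / 10, 8.0),
    "Seal failure": (4, 52.5),
    "NPSH failure": (104, 10 / 60),
    "Bearing failure": (1, 6.0),
}


def annual_risk(failures_per_year, hours_per_failure):
    """Risk = frequency x consequence, expressed in dollars per year."""
    return failures_per_year * hours_per_failure * DOWNTIME_COST_PER_HOUR


# Highest annual cost first.
ranked = sorted(history, key=lambda mode: annual_risk(*history[mode]), reverse=True)
for mode in ranked:
    print(f"{mode:16s} ${annual_risk(*history[mode]):>12,.0f}/yr")
```

Sorted this way, the ten-minute NPSH stoppage lands second, ahead of the bearing and casing failures.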
However, we can do it without the history. I’ve had success in the past using a subjective evaluation. Make a list of the failures and ask two questions about each one: how often does this failure occur and what’s the impact on production when it does. Make it up as a questionnaire.
Possible answers are in this table. This may sound simplistic but it works.
Now send the questionnaires to a cross section of maintenance, production and management personnel. As each person fills out the questionnaire, for each failure mode he or she will ask how often it happens and what the effect is on production or safety. Based on their answers, they’ll enter the appropriate score.
When you get them back, average the scores for each item.
The significance of a failure is the combination of the two factors: frequency and effect. By multiplying the frequency score by the effect score, you’ll get a composite score for each failure in the range of 1 to 25. Rank the list by the composite score.
The higher the composite score, the greater the significance of the failure.
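Here is a minimal Python sketch of the survey scoring, assuming each respondent returns a frequency score and an effect score from 1 to 5 for every failure. The responses are invented purely for illustration, and `composite_score` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical questionnaire responses: each respondent gives a
# (frequency, effect) score from 1 to 5. The numbers are invented.
responses = {
    "Seal failure": [(2, 5), (3, 4), (2, 5)],
    "NPSH failure": [(4, 2), (4, 3), (5, 2)],
    "Casing failure": [(1, 4), (1, 5), (1, 4)],
}


def composite_score(scores):
    """Average each factor across respondents, then multiply the averages."""
    avg_frequency = mean(freq for freq, _ in scores)
    avg_effect = mean(effect for _, effect in scores)
    return avg_frequency * avg_effect


# Highest composite score (most significant failure) first.
ranking = sorted(responses, key=lambda name: composite_score(responses[name]), reverse=True)
for name in ranking:
    print(f"{name:16s} {composite_score(responses[name]):5.1f}")
```

Averaging before multiplying keeps the composite in the 1 to 25 range regardless of how many questionnaires come back.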
There will be some failures that we can’t avoid. For those, we ask “How can we detect the failure before it occurs?”
What are the symptoms of the failure? Most failures show symptoms before they happen. A pump may have to be run faster because of a worn impeller. A motor may draw more amps because of misalignment or low supply voltage. A coupling may be hot because of misalignment or lack of lubrication. Make a list for each failure.
An understanding of the P-F curve, which tracks condition from the point where a potential failure first becomes detectable (P) to functional failure (F), is helpful.
With a list of symptoms, you’re now in the position to select tests that can detect that symptom.
For each symptom, try to get as many independent tests as possible. The more information you have, the more confident you’ll be in your call. You should have at least two tests for each failure that can confirm each other and avoid false positives or negatives.
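The “at least two tests per failure” rule is easy to check mechanically. The Python below is a sketch under assumptions: the mapping is my reading of part of the motor failure table, and `under_covered` is a hypothetical helper name.

```python
# Failure mode -> set of tests that can detect one of its symptoms.
# The pairings are read from the motor failure table; treat them as illustrative.
tests_by_failure = {
    "Winding failure (stator)": {"vibration monitoring", "MCSA", "MCE"},
    "Rotor failure": {"vibration monitoring"},
    "Bearing seized": {"vibration monitoring", "shock pulse meter", "lube monitoring"},
}

MIN_INDEPENDENT_TESTS = 2  # the rule of thumb: at least two confirming tests


def under_covered(tests, minimum=MIN_INDEPENDENT_TESTS):
    """Return the failure modes that have fewer than `minimum` distinct tests."""
    return sorted(mode for mode, names in tests.items() if len(names) < minimum)


needs_more = under_covered(tests_by_failure)
print(needs_more)  # prints: ['Rotor failure']
```

Any failure mode the check flags is a candidate for adding a second, independent test or an operator observation.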
As you’re considering tests, don’t limit yourself to high tech methods. Process parameters are also valuable. And one of the most valuable tests is the operator and maintainer. An experienced person, familiar with the machine, making a conscious effort to sense a particular effect, can be very effective at assessing the health of a machine.
This table shows some of the failure modes, their causes and the symptoms they might present in a motor. It also shows some possible tests. You want to develop a similar table for your machines. After you’ve finished doing this, you have a list of the tests needed.
Doing the tests without putting all the information together is not effective. I recommend that each machine have one or two individuals assigned to monitor its health. They should be trained in assessing all the information provided by the tests. Notice I didn’t say, “trained to evaluate the data”. They don’t have to analyze the data (for example the vibration spectra); they just have to understand the results of that analysis.
They should receive the results of the tests along with any other pertinent information on a regular basis. Then they can use that information to manage the machine. They can use it to adjust lubrication intervals, decide when adjustments are needed or part replacement is indicated. And that overhaul? They may decide it’s not needed after all.
Back in about 1990, I saw a good example of this in practice. A large power plant had been overhauling their main feed pumps on a five year interval. A new reliability engineer came on board. She changed to a condition based approach.
Each time an overhaul came due, instead of proceeding with the overhaul, she assigned a planner who was very familiar with the pump to review the data. They collected all available data: vibration, IR, motor current data, flow rates, RPMs, etc. Anything they could find. Then the planner looked at all the data together. And he made one of three recommendations:
1. Defer the overhaul for two years.
2. Defer the overhaul for one year.
3. Perform the overhaul.
As a result, they are going for more than 10 years between overhauls.
If we do all of that, we should achieve this: optimize the machine’s healthcare.
Thank you for your attention. I’ll be happy to take any questions or to listen to any comments.