Accelerated life testing (ALT) is widely used to expedite failures of a product in a short time period for predicting the product’s reliability under normal operating conditions. The resulting ALT data are often characterized by a probability distribution, such as Weibull, Lognormal, Gamma distribution, along with a life-stress relationship. However, if the selected failure time distribution is not adequate in describing the ALT data, the resulting reliability prediction would be misleading. In this talk, we provide a generic method for modeling ALT data which will assist engineers in dealing with a variety of failure time distributions. The method uses Erlang-Coxian (EC) distributions, which belong to a particular subset of phase-type (PH) distributions, to approximate the underlying failure time distributions arbitrarily closely. To estimate the parameters of such an EC-based ALT model, two statistical inference approaches are proposed. First, a mathematical programming approach is formulated to simultaneously match the moments of the EC-based ALT model to the ALT data collected at all test stress levels. This approach resolves the feasibility issue of the method of moments. In addition, the maximum likelihood estimation (MLE) approach is proposed to handle ALT data with type-I censoring. Numerical examples are provided to illustrate the capability of the generic method in modeling ALT data.
2. ASQ Reliability Division
English Webinar Series
One of the monthly webinars
on topics of interest to
reliability engineers.
To view recorded webinar (available to ASQ Reliability
Division members only) visit asq.org/reliability
To sign up for the free and available to anyone live
webinars visit reliabilitycalendar.org and select English
Webinars to find links to register for upcoming events
http://reliabilitycalendar.org/webinars/
3. Accelerated Reliability
Techniques
in the 21st Century
by
Lou LaVallee, Senior Reliability Engineer, Ops A La Carte
Doug Goodman, CEO, Ridgetop Group
Mike Silverman, Managing Partner, Ops A La Carte
Milena Krasich, Sr Principal Systems Engineer, Raytheon
4. ABSTRACT
• As product development cycles shorten, the need for
accelerated reliability tools is becoming increasingly
important.
• This webinar covers the best reliability tools for
accelerating the reliability learning process. We will
focus on four important accelerated tools:
• Robust design and Reliability Engineering Synergy
• Prognostics as a Tool for Reliable Systems
• HALT
• Accelerated Reliability Growth Testing
The audience will understand a variety of reliability tools that
can be used to accelerate product development from design
through testing. This will also provide a snapshot of the
learning that will be provided as part of the Accelerated
Stress Testing and Reliability Workshop for 2013.
5. Agenda
• Introduction 5 min
• Robust Design and Reliability Engineering
Synergy 10 min
Lou LaVallee, Ops A La Carte
• Prognostics as a Tool for Reliable Systems 10 min
Doug Goodman, Ridgetop Group
• HALT/AST – History and Trends 10 min
Mike Silverman, Ops A La Carte
• Accelerated Reliability Growth Testing 10 min
Milena Krasich, Raytheon
• Tying them all together 5 min
• Questions 10 min
6. Biography for Lou LaVallee
• Mr. LaVallee is founder of Upstate Reliability Engineering Services
an upstate New York based consulting firm delivering advanced reliability support to a wide
variety of industries. He joined forces with Ops a la Carte in 2010.
• He has a strong technical background in physics, engineering materials/polymer science and a
solid grounding in consumer product design, development, and delivery. His comprehensive
background includes electronic films, robust design, modeling & analytics, critical parameter
management, six sigma DFSS & DMAIC, optimization of product quality/reliability,
experimental design, reliability test methods, and design tool development and deployment. He
successfully managed systems engineering groups for development of ink jet print heads at
Xerox Corp.
• Mr. LaVallee has held other technical management positions in manufacturing technology,
engineering excellence (trained several thousand engineers worldwide). He also managed the
robust engineering center at Xerox for 10 years, managed a high volume printing product
quality and reliability group, and worked extensively with high volume printing product service
organization.
• He has strong validation experience of design quality and reliability through product reviews
and customer interaction. Mr. LaVallee holds a Bachelor of Science degree in Physics (BS),
and an MS from the University of Rochester in materials/polymer engineering.
• He holds several U.S. patents involving fluidics and engineering design processes. He is
currently a senior reliability engineering consultant with Ops a la Carte LLC. Mr. LaVallee is an
ASQ certified reliability engineer.
7. Biography for Doug Goodman
• Mr. Goodman is CEO and Founder of Ridgetop Group, Inc.
an Arizona-based leader of advanced diagnostic, prognostic
and health management tools, instrumentation, and rad hard
microelectronics.
• He is accustomed to being a pioneer of innovative electronic technology and
establishing engineering firsts. His comprehensive background encompasses
low-noise instrumentation design, design-for-test (DFT), fault simulation
techniques, and design tool development at firms such as Tektronix and
Honeywell.
• He was also part of the team that developed the first DSP-based IF processing
for spectrum analyzers.
• He successfully steered engineering at Analogy Inc. (electromechanical design
simulation tools) as vice president until its IPO. Afterwards, he moved to co-
found and head Opmaxx Inc., a design-for-test IP firm that later merged with
Credence Systems.
• Mr. Goodman also serves on the Board of Engineering Synthesis Design, Inc.
(ESDI), a waveform and surface metrology instrumentation firm based in
Tucson, Arizona. (ESDI.com).
8. Biography for Mike Silverman
• Mike Silverman is the founder and a managing partner at
Ops A La Carte, a Professional Consulting Company that has an
intense focus on helping customers with end-to-end reliability.
• Mike has over 25 years of experience in reliability engineering, reliability
management and reliability training. He is an experienced leader in reliability
improvement through analysis and testing.
• Through Ops A La Carte, Mike has had extensive experience as a consultant to
high-tech companies, and has consulted for over 500 companies in over 100
different industries in most of the US and 15 countries around the world.
• Mike is an expert in accelerated reliability techniques and owns HALT and
HASS Labs, one of the oldest and most experienced reliability labs in the world.
• Mike has recently completed his first book on reliability entitled “How Reliable Is
Your Product: 50 Ways to Improve Product Reliability”.
• Mike has authored and published 25 papers on reliability techniques and has
presented these around the world including Canada, China, Germany, Japan,
Korea, Singapore, Taiwan, and the USA. He has also developed and currently
teaches over 30 courses on reliability techniques.
• Mike is the chair of this year’s ASTR conference and chair of the Santa Clara
Valley IEEE Reliability Society.
9. Biography for Milena Krasich
• Milena Krasich is a Senior Principal Systems Engineer in Raytheon
Integrated Defense Systems, RAM Engineering Group in MA.
• Prior to joining Raytheon, she was a Senior Technical Lead of Reliability
Engineering in Design Quality Engineering of Bose Corporation, Automotive
Systems Division. Before joining Bose, she was a Member of Technical Staff in the
Reliability Engineering Group of General Dynamics Advanced Technology Systems
formerly Lucent Technologies, after the five year tenure at the Jet Propulsion
Laboratory in Pasadena, California.
• While in California, she was a part-time professor at the California State University
Dominguez Hills, where she taught graduate courses in System Reliability,
Advanced Reliability and Maintainability, and Statistical Process Control. At that
time, she was also a part-time professor at the California State Polytechnic
University, Pomona, teaching undergraduate courses in Engineering Statistics,
Reliability, SPC, Environmental Testing, Production Systems Design.
• She holds a BS and MS in EE from the University of Belgrade, Yugoslavia, and is a
California registered Professional Electrical Engineer.
• She is also a member of the IEEE and ASQC Reliability Society, and a Fellow and
the president Emeritus of the Institute of Environmental Sciences and Technology.
Currently, she is the Technical Advisor (Chair) to the US Technical Advisory Group
(TAG) to the International Electrotechnical Commission (IEC) Technical Committee
TC56, Dependability.
10. Accelerated Stress Testing and Reliability Workshop
October 9-11, 2013 San Diego, CA
Accelerating Reliability into the 21st Century
Keynote Presenter Day 1: Vice Admiral Walter Massenburg
Keynote Presenter Day 2: Alain Bensoussan, Thales Avionics
CALL FOR PRESENTATIONS: We are now Accepting Abstracts.
Email to: don.gerstle@gmail.com.
Guidelines on website www.ieee-astr.org
For more details, click here to join our LinkedIn Group:
IEEE/CPMT Workshop on Accelerated Stress Testing and Reliability
12. Introduction
In this webinar, we will introduce four of the most
effective reliability techniques that can accelerate
reliability learning on your product.
• Robust Design and Reliability Engineering Synergy
• Prognostics as a Tool for Reliable Systems
• Highly Accelerated Life Testing (HALT)
• Accelerated Reliability Growth Testing (RGT)
We invite you to determine which can be most
effective for your reliability program.
13. Robust Design and Reliability
Engineering Synergy
by Lou LaVallee
14. Poll Question 1
Have you ever used Design for Robustness
Techniques ?
a) We use all the time
b) We’ve used a few times
c) We tried once
d) We haven’t used but are planning to
e) We have never used
16. Abstract for full tutorial
Robust Design (RD) Methodology is discussed for hardware
development. Comparison is made with reliability engineering (RE)
tools and practices. Differences and similarities are presented.
Proximity to ideal function for robust design is presented and
compared to physics of failure and other reliability modeling and
prediction approaches. Measurement selection is shown to strongly
differentiate RD and reliability engineering methods. When and
how to get the most from each methodology is outlined. Pitfalls for
each set of practices are also covered. (This presentation is a taste
of a larger presentation to be delivered in San Diego)
18. RD and Reliability
[Figure: word map of Robust Design and Reliability tools and concepts. Terms shown: P-diagram, life tests, root cause analysis, tolerance design, experiment layout, ideal function, POF, response tuning, DOE, RCM, engineering flexibility, maintainability, CBM, lean, science, warranty $, simulation, quality loss, testing, models, reuse, FMECA, transformability, HALT/HASS, planning, S/N, RSM, ADT, life prediction, redundancy, online QC, ALT, parameter design, FTA, availability, generic function, RBD.]
19. Robustness is…
“The ability to transform input to output as closely to ideal
function as possible. Proximity to ideal function is highly
desirable. A design is more robust if ratio of useful part to
harmful part [of input energy] is large. A design is more
robust if it operates close to ideal, even when exposed to
various noise factors, including time”
Reliability is…
“The ability of a system, subsystem, assembly, or component
to perform its required functions under stated conditions
for a specified period of time”
20. Harmful Variation & Countermeasures
• Search for root cause & eliminate it
• Screen out defectives (scrap and rework)
• Feedback/feed forward control systems
• Tighten tolerances (control, noise, signal factors)
• Add a subsystem to balance the problem
• Calibration & adjustment
• Robust design (Parameter design & RSM)
• Change the concept to a better one
• Turn off or turn down the power
• Correct design mistakes (e.g. installing diodes backwards)
21. Robustness Growth
[Figure: three S/N vs. time panels showing robustness gains toward a benchmark target, for factors that can be changed today, in 1 week, and in 2 weeks.]
22. Progression of Robustness to Ideal Function Development
[Figure: distributions A, B, and C relative to LSL and USL, with the progression Zero Defects, Cpk, Static S/N, Dynamic S/N Ratio.]
When a product’s performance deviates from target, its quality is
considered inferior. Such deviations in performance cause losses to
the user of the product, and in varying degrees to the rest of
society.
23. Taxonomy of Design Function -- P Diagram
[Figure: P-diagram. Input signals (Mi) feed the main function Y=f(x)+…, which is acted on by noise factors and control factors and produces a useful output and a harmful output.]
24. Transformability & Robustness Improvement
[Figure: two response vs. signal (M) plots starting at the origin (0,0), each showing lines for noise conditions N1 and N2.]
Minimizing the effects of noise factors on transformation of input to output
improves reliability. A sensitivity increase can be used for power reduction. The noise
factor here might be fatigue cycles, or stress in one or two directions, or …
25. Typical Failure Modes and Causes for
Mechanical Springs
TYPE OF SPRING/STRESS CONDITION | FAILURE MODES | FAILURE CAUSES
Static (constant deflection or constant load) | Load loss; Creep; Compression set; Yielding | Parameter change; Hydrogen embrittlement
Cyclic (10,000 cycles or more during the life of the spring) | Fracture; Damaged spring end; Fatigue failure; Buckling; Surging; Complex stress change as a function of time … | Corrosive atmosphere; Misalignment; Excessive stress range of reverse stress **; Cycling temperature
Dynamic (intermittent occurrences of a load surge) | Fracture; Fatigue failure | Surface defects; Excessive stress range of reverse stress; Resonance surging
26. Ideal Function & Failure Modes
If data remain close to ideal function, even under
predicted stressful conditions of use, and there is no way
for failure to occur without affecting functional variation of
the data, then moving closer to ideal function is highly
desirable.
For example, spring fatigue, if it did occur, would
dramatically change force-deflection (F-D) data and inflate
variation. Similarly, for yielding, F-D results would change
and inflate the variation. Other failure modes would follow
in most cases.
27. Measurement System Ideal Function
Y= M+e
M=true value of measurand
Y=measured value
Auto Steering Ideal function
Y= M+e
M=steering wheel angle
Y=Turning radius
Communication system ideal function
Y=M+e
M=signal sent
Y=signal received
Cantilever beam Ideal Function
Y= M/M*+e
M=Load
M*=Cross sectional area
Fuel Pump Ideal Function
Y= M
Y=Fuel volumetric flow rate
M=IV/P current, voltage,& backpressure
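The ideal-function forms on this slide can be checked numerically. The following is a minimal sketch, assuming a zero-intercept linear ideal function Y = βM + e and the commonly used dynamic signal-to-noise ratio 10·log10(β²/σ²). The slope β, the function name `dynamic_sn`, and the data points are illustrative assumptions and do not come from the slides.

```python
import math

def dynamic_sn(signal, response):
    """Fit Y = beta*M through the origin and return (beta, S/N in dB)."""
    # Least-squares slope for a zero-intercept line.
    beta = sum(m * y for m, y in zip(signal, response)) / sum(m * m for m in signal)
    # Residual variance around the fitted ideal function.
    residuals = [y - beta * m for m, y in zip(signal, response)]
    sigma2 = sum(r * r for r in residuals) / (len(signal) - 1)
    return beta, 10.0 * math.log10(beta ** 2 / sigma2)

# Hypothetical data: the same three signal levels measured under two noise conditions.
M = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
Y = [2.1, 4.0, 6.2, 1.8, 3.9, 5.7]
beta, sn_db = dynamic_sn(M, Y)
print(f"beta = {beta:.3f}, S/N = {sn_db:.1f} dB")
```

A higher S/N here means the responses under both noise conditions stay closer to the fitted line, which is the sense in which a design is "closer to ideal function".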
28. Summary
• RD methods and Reliability methods both have functionality at their
core. RD methods attempt to optimize the designs toward ideal function,
diverting energy from creating problems and dysfunction. Reliability
methods attempt to minimize dysfunction through mechanistic
understanding and mitigation of the root causes for problems.
• RD methods actively change design parameters to efficiently and cost
effectively explore viable design space. Reliability methods subject the
designs to stresses, accelerating stresses, and even highly accelerated
stresses, [to improve time and cost of testing]. First principle physical
models are considered where available to predict stability.
• Both RE and RD methods have strong merits, and learning when and
how to apply each is a great advantage to product engineering teams.
29. Prognostics as a Tool
for Reliable Systems
by Doug Goodman
30. Poll Question 2
Have you ever used Prognostics ?
a) We use all the time
b) We’ve used a few times
c) We tried once
d) We haven’t used but are planning to
e) We have never used
31. Reliability “Bathtub Curve”
[Figure: failure rate vs. time across the infant mortality period, the useful life period, and the normal wearout period, with a prognostics trigger point marked. Threshold trigger points are selectable and provide advanced warning of failure (RUL).]
32. Prognostic Solutions
• Ridgetop develops
electronic prognostic
solutions for critical
systems:
– Sensor array
detectors
– Harnesses for
“prognostics-enabling”
critical systems, and
– Sentinel software to
comprise a complete
solution.
33. Electronic Prognostics
• Electronics are the keystone to successful deployment of
complex systems (50+ MPUs in an automobile)
• Large MTBF and Statistical Process Control and Centering
methods are not sufficient alone for reliability due to “outliers”
(e.g. Toyota Prius, Deepwater Horizon Drilling Rig, Boeing
787)
• Ridgetop technology exists to pinpoint degrading systems
before they fail; supporting operational readiness objectives
and cost-saving Prognostics/Health Management (PHM) and
Condition Based Maintenance (CBM) initiatives.
35. Degradation Rates Depend on
Environmental Conditions
[Figure: MTBF statistical expected life compared with the usage environment, with a red region T1 and a green region T2.]
Usage monitoring would provide a safety benefit if actual usage is more
severe than predicted (see the red region, T1).
Service life can be extended beyond normal replacement time if the
actual usage severity is known (see the green region, T2).
PHM enables replacement only upon evidence of need
36. Degradation Example
[Figure: state of health for a good power system and for a degraded power system (degraded VR), each shown against a threshold and an end-of-life marker.]
Both supplies provide regulated voltage, but one is degraded and will
soon fail.
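The idea of acting on a degradation trend before end-of-life can be sketched as a remaining-useful-life (RUL) estimate. This is a minimal illustration only: it assumes a linear degradation trend and a fixed state-of-health threshold, and it is not the method used by any product named in this webinar. The function name `estimate_rul` and the sample readings are hypothetical.

```python
def estimate_rul(times, health, threshold):
    """Fit a least-squares line to (time, health) and return the time left
    until the line crosses the threshold, or None if health is not declining."""
    n = len(times)
    t_mean = sum(times) / n
    h_mean = sum(health) / n
    num = sum((t - t_mean) * (h - h_mean) for t, h in zip(times, health))
    den = sum((t - t_mean) ** 2 for t in times)
    slope = num / den
    if slope >= 0:
        return None  # no degradation trend detected
    intercept = h_mean - slope * t_mean
    t_cross = (threshold - intercept) / slope
    # RUL is measured from the most recent observation.
    return max(0.0, t_cross - times[-1])

# Hypothetical state-of-health readings (percent) taken every 100 hours.
hours = [0, 100, 200, 300, 400]
soh = [100.0, 98.0, 96.0, 94.0, 92.0]
rul = estimate_rul(hours, soh, threshold=80.0)
print(f"Estimated RUL: {rul:.0f} h")
```

Real degradation signatures are rarely linear, so a fielded system would typically replace the straight-line fit with a model matched to the failure mechanism.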
37. Prognostic Advantages
• Prognostics provides advanced warning of impending
failure conditions on critical systems.
• Physical evidence of degradation is the basis for service,
not an arbitrary time interval.
• PHM and CBM maintenance strategies can reduce
support costs through optimized timing of service and
parts replacement.
• Autonomic logistics systems can be established, placing
spare parts and provisions where needed.
39. Sentinel Network™
• Collection and analysis hub
for PHM
• Scalable, system level State
of Health (SoH) Analysis &
Prognostics
• Automatic SNMP-based
Sensor Network Discovery
• Troubleshooter
• System stability cost
reduction for tactical
networks
40. Airborne Power System Monitoring
• PHM applied to power
systems in harsh
environment
• Apache Helicopter where
vibration, heat, shock all
can reduce lifetime of
deployed systems
• Extracts and processes
eigenvalues as a metric
of health
41. Prognostic Health Management
Ecosystem
[Figure: PHM ecosystem cycle with five numbered steps linking the subsystem / line replaceable unit (LRU), an integrated diagnostic/prognostics platform, the OEM, and a CBM scheduler. Labels shown: communicate real-time PHM sensor data, health & RUL; identified design improvements; address ECRs and improve parts; CBM actions; minimize inventory; replacement parts; maintenance.]
43. Poll Question 3
Have you ever used the technique HALT ?
a) We use all the time
b) We’ve used a few times
c) We tried once
d) We haven’t used but are planning to
e) We have never used
44. INTRODUCTION
HALT began 40 years ago with a simple idea of testing
beyond specifications in order to better understand
design margins.
Over the past 40 years, thousands of engineers around
the world have been exposed to the concepts of HALT
and have tried the techniques.
What have we learned in the past 40 Years?
45. HISTORY OF HALT/HASS
HP started performing Stress for Life (STRIFE) testing in
the early 70’s. Some people consider STRIFE the
predecessor to HALT.
Reliability cannot be achieved by adhering to
detailed specifications. Reliability cannot be
achieved by formula or by analysis. Some of
these may help to some extent, but there is only
one road to reliability. Build it, test it, and fix the
things that go wrong. Repeat the process until the
desired reliability is achieved. It is a feedback
process and there is no other way.
David Packard, 1972
46. HISTORY OF HALT/HASS
Dr. Gregg Hobbs officially coined the term HALT in 1988.
For the next two decades, Gregg traveled around the
world teaching the concepts of HALT and HASS.
Many of you in this room probably attended that seminar.
47. HISTORY OF HALT/HASS
Over the next 17 years, HALT labs popped up around the
world. Today I estimate there are about 200 HALT labs in
the world.
This has exposed literally thousands of engineers to the
processes.
However, the methodology being practiced is inconsistent.
Standards have been and are being written (more like
guidelines)
Books have been published
Conferences have been formed
48. HISTORY OF HALT/HASS
Standards/Guidelines/References to HALT/HASS
IEC 62506: “Accelerated Testing”
IEST-RP-PR003: “HALT and HASS”
IPC-9592: “Performance Parameters for Power
Conversion Devices”
49. HISTORY OF HALT/HASS
Books on HALT/HASS
“Accelerated Reliability Engineering: HALT and
HASS”, Gregg Hobbs, 2000
“HALT, HASS, and HASA Explained”, Harry W.
McLean, 2009
“Improving Product Reliability: Strategies and
Implementation”, Levin and Kalal, 2003
“Accelerated Testing and Validation”, Alex Porter of
Intertek, 2004
“How Reliable Is Your Product: 50 Ways to Improve
Your Product Reliability”, Mike Silverman, 2010
50. HISTORY OF HALT/HASS
Conferences
This ASTR Conference started in 1995 and has been
held every year since except 2001. This is the only
conference solely dedicated to accelerated testing.
Other conferences with an ALT track
Applied Reliability Symposium (ARS)
Reliability and Maintainability Symposium (RAMS)
Other conferences with a reliability focus
IRPS
Prognostics Conference (two of them)
51. WHAT IS HALT?
HALT: A design technique used to discover product
weaknesses and improve design margins. The intent is to
systematically subject a product to stress stimuli
well beyond the expected field environments in order to
determine and expand the operating and destruct limits of
your product.
- 50 Ways to Improve Your Product Reliability, Mike Silverman
52. WHAT IS NOT HALT?
What are some classic HALT misconceptions:
My product does not experience vibration so we can’t use it
The spec for this component is 70C so we can’t go above that
in HALT
We can’t drill holes in the product because it will change the
airflow
We must mount it in the same direction as it will be mounted in
the field
We don’t need to go above the first failure point because that is
what will fail first
Run to preset levels (remember this is not a pass/fail test)
Don’t stress beyond specifications
Only perform HALT at system level
Just perform HALT only when diags are fully ready
53. BASICS OF HALT
Start low and step up the
stress, testing the product
during the stressing
54. BASICS OF HALT
Gradually increase
stress level until a
failure occurs
59. BASICS OF HALT
Classic S-N Diagram
(stress vs. number of cycles)
S0 = Normal stress conditions
N0 = Projected normal life
[Figure: S-N curve with stress levels S2, S1, S0 on the stress axis and the corresponding cycle counts N2, N1, N0 on the cycles axis.]
60. BASICS OF HALT
Classic S-N Diagram
(stress vs. number of cycles)
S0 = Normal stress conditions
N0 = Projected normal life
[Figure: the same S-N curve, with the point at which failures become non-relevant marked.]
63. NEW ADVANCES IN HALT
Along with improvements in chamber
technology, there have been advances in the
methodology as well.
Harry McLean’s HALT Calculator
To determine “Guard Band” Limits during the
HALT Plan
To determine AFR after HALT
Using FMEA to determine specific areas to test for
Linking HALT to ALT
Using HALT for software/firmware issues
64. FUTURE OF HALT AND HASS
The number of companies performing HALT will
continue to rise as more labs obtain HALT equipment
The need for more education will continue to increase
Standards/guidance docs will gain more importance as
more companies and labs are doing HALT, many
incorrectly.
Chambers will need to provide stresses in addition to
temperature and vibration to keep up with the physics
of the failures (especially due to smaller packages and
MEMS devices).
Move away from people and move to process
HALT as acronym will fade away
Less HALT and more emphasis on DFR including HALT
65. CONCLUSION
In this presentation
we took you through 40 years of HALT
showed you advances that have been made
pointed out areas where improvements are
still needed
67. Poll Question 4
For the last RGT you performed, did you
have a chance to plan the duration and the
stresses?
a) Planned both
b) Planned duration only
c) Planned test environments
d) Did not plan RG test
68. Tutorial Objectives
Show a synopsis of the tutorial which will be presented at the
ASTR 2013.
Reliability Growth Test objectives
Explain traditional Reliability Growth test methodology
Show shortcomings of the traditional methods
• Entire item failure rate not calculated
• Test duration too long for the modern high reliability items
• Little or no relationship of reliability and stresses
Show principles of the Physics of Failure test methodology
Show how the Reliability growth test based on PoF is constructed
Show how the expected stresses are applied and accelerated
Show reliability measures
Show advantages of the PoF test design and acceleration
Show the considerable test cost reduction achieved.
69. Traditional RG Test Methodology
Applied stresses in test - magnitude equal to those in use
Involved assumptions of stress average magnitudes
Overall test duration determined based on the initial and goal
reliability measure: failure rates Mean Time Between
Failures, MTBF (or MTTF)
Environmental or operational stresses applied in sequence or
simultaneously at the use levels
Applied stress duration determined by engineering judgment
Overall test duration and stress application are unrelated to use profiles
or required life or mission of the product
Additional errors:
Random failure rates and those of uncorrected failure modes are not
added into the final failure rates
70. Mathematics of Traditional Reliability Growth
Failure mode types in test:
Systematic: corrected in test (Type B), not corrected (Type A); Random: constant
$\lambda_{Item}(t) = \lambda_B(t) + \lambda_A(t) + \lambda_r(t)$
$\lambda_{Item}(t) = \lambda \beta t^{\beta - 1} + \lambda_A(t) + \lambda_r(t)$
$\lambda_{Item}(t) \neq \lambda \beta t^{\beta - 1}$
Only type B failure mode failure rates are accounted for in a
reliability test program: those that show growth expressed by the
power law model; the type A and random remain constant.
[Figure: failure intensity/failure rate (failures/hour, 0 to 0.06) vs. test duration (0 to 6000 hours). Curves: $\lambda_S(t) = \lambda_A(t) + \lambda_r(t) + \lambda_B(t)$, $\lambda_r(t)$, $\lambda_A(t)$, and $\lambda_B(t)$ (power law), the only failure modes with decreasing failure rates.]
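The split between the decreasing type B term and the constant type A and random terms can be illustrated with a short sketch. It assumes the usual power-law form λβt^(β−1) for the type B failure intensity; the parameter values and the function name `failure_intensities` are illustrative assumptions and are not read from the slide's plot.

```python
def failure_intensities(t, lam, beta, lam_a, lam_r):
    """Return (type B intensity, total item intensity) at test time t > 0."""
    # Power-law term: decreases with time when beta < 1.
    lam_b = lam * beta * t ** (beta - 1.0)
    # Type A and random contributions are treated as constants.
    return lam_b, lam_b + lam_a + lam_r

# Illustrative parameters only.
LAM, BETA, LAM_A, LAM_R = 0.3, 0.5, 0.005, 0.01
for hours in (100, 1000, 6000):
    b, total = failure_intensities(hours, LAM, BETA, LAM_A, LAM_R)
    print(f"t={hours:>5} h  type B={b:.5f}  item total={total:.5f} failures/h")
```

With these numbers the type B intensity keeps falling while the item total levels off at the sum of the two constant terms, which is why a growth analysis based only on type B modes overstates item reliability.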
71. Information Needed – Information Obtained
Test duration is mathematically determined from:
$\theta_F = \theta_I \left( \frac{t_F}{t_I} \right)^{1-\beta}$
$\ln \frac{\theta_F}{\theta_I} = (1-\beta) \, \ln \frac{t_F}{t_I}$
$t_F = t_I \, e^{\frac{1}{1-\beta} \ln \frac{\theta_F}{\theta_I}}$
Where:
$\theta_F$ = final product MTBF (for mitigated, “fixed” failure modes only) – given goal
$\theta_I$ = initial product MTBF (for failure modes that will be mitigated) – assumed
$t_F$ = test duration needed to achieve the final MTBF for fixed failure modes
$t_I$ = initial test time (has various explanations) – assumed
$\beta$ = parameter initially assumed, then determined from analysis (power law) –
the test duration might change dependent on failure mitigation success
The test duration is too high for high reliability items and depends on three
assumed parameters
Failure rates of the non-mitigated failure modes and those considered random
are assumed or predicted
Information obtained from the mathematical Reliability Growth technique is
NOT product MTBF; it is only the MTBF from the improved, B, failure modes
The A type (not fixed) and random failure modes are – forgotten!
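A short sketch of the duration calculation follows. It assumes the power-law relation in which MTBF grows as (t_F/t_I) raised to the power (1 − β), so that t_F = t_I·(θ_F/θ_I)^(1/(1−β)). The MTBF values and initial test time are the ones quoted in the test example later in this deck; β = 0.4 is an assumed value chosen for illustration, and `growth_test_duration` is a hypothetical helper name.

```python
def growth_test_duration(theta_i, theta_f, t_i, beta):
    """Test time t_F needed for MTBF to grow from theta_i (at t_i) to theta_f."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1) for reliability growth")
    return t_i * (theta_f / theta_i) ** (1.0 / (1.0 - beta))

# Initial MTBF 100,000 h, goal 1,000,000 h, initial test time 100 h (from the deck);
# beta = 0.4 is an assumption.
t_f = growth_test_duration(theta_i=1.0e5, theta_f=1.0e6, t_i=100.0, beta=0.4)
print(f"t_F = {t_f:.0f} h")
```

Under this assumed β the result is roughly 4.6×10³ hours, in line with the traditional test time quoted in the later example. The result is very sensitive to β, which is one of the three assumed inputs.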
72. Physics of Failure and Reliability
Failures occur when an item is not strong enough to withstand one or
more attributes of a stress:
Level, duration, or repetitions of its application
• The higher the level, the shorter the duration or the fewer the repetitions needed to induce a failure
The area of overlap of strength and stress distributions
represents probability of failure for each of the stresses;
$\mu_L, \sigma_L$ = mean and standard deviation of the load distribution, $\sigma_L = b \, \mu_L$
$\mu_S, \sigma_S$ = mean and standard deviation of the strength distribution, $\sigma_S = a \, \mu_S$
• If the mean of strength is k times the mean of stress (load) and the
standard deviations of each are a and b times their respective mean
values, reliability of an item regarding each use stress (i), and the total reliability
will be:
$R_i(k, \mu_{L\_i}) = \Phi \left[ \frac{k \mu_{L\_i} - \mu_{L\_i}}{\sqrt{(a k \mu_{L\_i})^2 + (b \mu_{L\_i})^2}} \right]$
$R_{Item}(t_0) = \prod_{i=1}^{S} R_{Stress_i}(t_i)$
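The stress-strength relation can be evaluated directly, because the load mean cancels and leaves (k − 1)/sqrt(a²k² + b²) as the argument. The sketch below assumes normally distributed load and strength, so the reliability is the standard normal CDF of that argument; the function names and the chosen k, a, b values are for illustration only.

```python
import math

def stress_reliability(k, a, b):
    """Reliability for one stress: Phi((k - 1) / sqrt((a*k)^2 + b^2))."""
    z = (k - 1.0) / math.sqrt((a * k) ** 2 + b ** 2)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def item_reliability(per_stress):
    """Total reliability as the product of the per-stress reliabilities."""
    return math.prod(per_stress)

for k in (1.0, 1.25, 1.5):
    print(f"k={k:.2f}  R={stress_reliability(k, a=0.05, b=0.2):.4f}")
print(f"Four equal stresses at k=1.5: {item_reliability([stress_reliability(1.5, 0.05, 0.2)] * 4):.4f}")
```

At k = 1 the strength and load means coincide and the reliability is 0.5; raising the margin k moves the result toward 1 at a rate set by a and b.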
73. Physics of Failure Reliability – Margin k Selection
Allocate reliability regarding each of the expected stresses in use
The cumulative damage and ultimately failure due to a stress is proportional
to the stress level and its duration. For the stress applied at the same level
as in life, the cumulative damage model is: $D(t) = \int_t S(t) \, dt$
• For the allocated reliability regarding each stress, select the value of margin k which
would multiply its duration in use to be applied in test;
• Apply stresses simultaneously whenever possible;
• If the same stress type is applied at different levels in use, recalculate their durations
to the highest level (using acceleration factors);
• The most common values for a and b are: a = 0.05, b = 0.2
[Figure: reliability (0.50 to 1.00) vs. multiplier k (1.00 to 1.50), with curves for (a, b) = (0.05, 0.5), (0.05, 0.2), (0.05, 0.05), (0.02, 0.2), (0.02, 0.1), and (0.02, 0.05).]
74. Test Acceleration
Each of the stresses is accelerated in test to allow for shorter test
duration
Total item failure rate is the sum of its failure rates regarding each
individual stress ($\lambda_0$ is the item total failure rate in use condition and $\lambda_A$ is
the accelerated item total failure rate (in reliability growth it is equivalent
to …)):
$\lambda_A = A_{Test} \, \lambda_0 = \sum_{i=1}^{N_S} \Big( \prod_{j} A_j \Big) \lambda_i$
The product over j exists when the stresses 1 to j produce the same failure mode.
Stress acceleration models for different stresses – example:
inverse power law model (usually applicable to thermal
cycling, vibration, shock, humidity);
Arrhenius model (used for temperature acceleration using absolute temperature);
Eyring model (used also when the thermal stress is a factor in process acceleration);
step stress model, where the stress is increasing in steps;
fatigue model representing the degradation due to the repetitious stress.
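Two of the acceleration models listed above are simple enough to sketch. This is a minimal illustration, assuming the standard Arrhenius form exp[(Ea/kB)(1/T_use − 1/T_test)] with absolute temperatures and the inverse power law (S_test/S_use)^n. The activation energy, temperatures, and function names are illustrative assumptions; the code uses 273.15 for the Celsius offset where the slides round to 273.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Arrhenius acceleration factor between a use and a test temperature (deg C)."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_use_k - 1.0 / t_test_k))

def inverse_power_af(stress_use, stress_test, exponent):
    """Inverse power law acceleration factor for a stress raised from use to test level."""
    return (stress_test / stress_use) ** exponent

# Hypothetical thermal case: Ea = 0.7 eV, 55 deg C in use, 85 deg C in test.
print(f"Arrhenius AF     = {arrhenius_af(0.7, 55.0, 85.0):.2f}")
# Vibration levels from the later test example: 1.7 g rms in use, 3.2 g rms in test, exponent 4.
print(f"Inverse power AF = {inverse_power_af(1.7, 3.2, 4):.2f}")
```

An acceleration factor above 1 means one hour in test represents more than one hour of use at that stress, which is what shortens the test.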
75. Test Example B Failure Modes
Stress/Requirement/Property | Symbol/Value | Units
Product life | t0 | h
Time ON | ta | h/day
Internal temperature when ON | T_ON | °C
Internal temperature when OFF | T_OFF | °C
Temperature change | ΔT_Use | °C
Rate of temperature change | ς_Use | °C/min
Number of thermal cycles | cT | Cycles/day
Temperature rise over the ambient | ΔT | °C
Relative humidity | RH_Use | %
Distance travelled in product life | D | miles
Vibration level in use | W_Use | g
Operational (ON/OFF) cycling | c | Cycles/day
Stresses:
• Thermal cycling
• Thermal exposure (thermal dwell)
• Humidity
• Vibration
• Operational cycling
Determination of factor k for major stresses:
$R_i(t_0) = R_0(t_0)^{1/4} = 0.946$, k = 1.5
[Figure: reliability (0.5 to 1) vs. multiplier k (1.00 to 1.50) for a = 0.1, b = 0.1.]
Thermal dwell (normalize exposure when OFF to duration at ON temperature):
$t_{ON\_N} = t_{ON} + t_{OFF} \exp \left[ -\frac{E_a}{k_B} \left( \frac{1}{T_{OFF}+273} - \frac{1}{T_{ON}+273} \right) \right]$
$t_{ON\_N} = 8{,}754$ hours
Duration of accelerated exposure:
$t_{T\_Test} = t_{ON\_N} \, k \, \exp \left[ -\frac{E_a}{k_B} \left( \frac{1}{T_{ON}+273} - \frac{1}{T_{Test}+273} \right) \right]$
$t_{T\_Test} = 168.1$ h
Thermal cycling:
$A_{TC} = \left( \frac{\Delta T_{Test}}{\Delta T_{Use}} \right)^m$, $A_{Ramp\_Rate} = \left( \frac{\varsigma_{Test}}{\varsigma_{Use}} \right)^{1/3}$, $N_{TC\_Test} = \frac{N_{TC\_Use} \, k}{A_{TC} \, A_{Ramp\_Rate}}$
One thermal cycle in test = 24 hours in life
76. Test Example, Cont.
The thermal exposure is combined with the thermal cycling, distributed over the high temperature.
The test cycle profile:
$t_{TC} = 2 \times (\text{ramp time}) + (\text{temperature stabilization} + \text{thermal dwell}) + \text{dwell at cold}$
$t_{TC} = 2 \times \frac{125}{10} + 22.3 + 5 = 52.3$ min ≈ 0.875 h

Humidity: Test at 95% RH and temperature T_RH = 85 °C (65 °C chamber + 20 °C internal temperature rise)
$t_{RH\_Test} = t_{ON\_N} \left(\frac{RH_{Use}}{RH_{Test}}\right)^{h} \exp\left[-\frac{E_a}{k_B}\left(\frac{1}{T_{ON}+273} - \frac{1}{T_{RH}+273}\right)\right]$
h = 2.3
$t_{RH\_Test} = 300$ h

Vibration: 150,000 miles, 150 hours per axis vibration at 1.7 g rms. Test level: 3.2 g rms
$t_{Vib\_Test} = k \cdot t_{Vib\_Use} \left(\frac{W_{Use}}{W_{Test}}\right)^{w}$
With w = 4:
$t_{Vib\_Test} = 18$ hours per axis
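The vibration and cycle-time arithmetic above can be checked with a short sketch. It assumes k = 1.5 from the previous slide, and it reads the 125 and 10 in the cycle profile as a 125 °C temperature swing at a 10 °C/min ramp rate, which is an interpretation rather than something the slide states. Function names are hypothetical.

```python
def vibration_test_hours(k, t_use_h, w_use_g, w_test_g, w_exp):
    """Inverse power law: test hours per axis equivalent to k times the use exposure."""
    return k * t_use_h * (w_use_g / w_test_g) ** w_exp


def thermal_cycle_minutes(delta_t_c, ramp_rate_c_per_min,
                          stabilize_plus_dwell_min, cold_dwell_min):
    """One test cycle: two ramps, stabilization plus hot dwell, and the cold dwell."""
    ramp_min = delta_t_c / ramp_rate_c_per_min
    return 2.0 * ramp_min + stabilize_plus_dwell_min + cold_dwell_min


hours = vibration_test_hours(k=1.5, t_use_h=150.0, w_use_g=1.7, w_test_g=3.2, w_exp=4)
cycle_min = thermal_cycle_minutes(125.0, 10.0, 22.3, 5.0)
print(f"{hours:.1f} hours per axis, {cycle_min:.1f} min per thermal cycle")
```

The vibration result comes out just under 18 hours per axis, matching the rounded figure on the slide.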
Data for reliability plotting (n = 24):
Failure   Time to failure (h)   Cumulative time to failure (h)   θ(t) (h)      log(t)   log[θ(t)]
1         3,821.33              91,711.92                        91,711.92     4.96     4.96
2         5,781.33              138,751.92                       69,375.96     5.14     4.84
3         14,016                336,384.00                       112,128       5.53     5.05
4         18,563.44             445,522.56                       111,380.64    5.65     5.05
t_0*k     131,400               3,153,600                        788,400       6.50     5.90

Initial B failure modes MTBF: 100,000 hours, final 10^6 hours
Initial test time: 100 hours
Total traditional test time: 4.6 x 10^3 hours
Final test reliability (B failure modes): 0.99997
Final MTBF (improved failure modes): 1,431,964 hours
Total accelerated test time: 526 hours
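The plotting table can be regenerated from the four times to failure. The sketch below assumes, based only on the numbers in the table, that the cumulative time is the time to failure multiplied by the n = 24 units on test and that the next column is that cumulative time divided by the failure count (a cumulative MTBF). The function name is hypothetical.

```python
import math

N_UNITS = 24  # sample size shown on the slide (n=24)
TIMES_TO_FAILURE_H = [3821.33, 5781.33, 14016.0, 18563.44]


def plotting_rows(times_h, n_units):
    """Return (failure no., time, cumulative time, cumulative MTBF, log10 t, log10 MTBF)."""
    rows = []
    for failure_no, t in enumerate(times_h, start=1):
        cum_time = t * n_units             # total unit-hours accumulated at this failure
        cum_mtbf = cum_time / failure_no   # cumulative MTBF after this failure
        rows.append((failure_no, t, cum_time, cum_mtbf,
                     math.log10(cum_time), math.log10(cum_mtbf)))
    return rows


rows = plotting_rows(TIMES_TO_FAILURE_H, N_UNITS)
for failure_no, t, cum_time, cum_mtbf, log_t, log_mtbf in rows:
    print(f"{failure_no}  {t:>10.2f}  {cum_time:>12.2f}  {cum_mtbf:>12.2f}  "
          f"{log_t:.2f}  {log_mtbf:.2f}")
```

Plotting the last two columns against each other gives the log-log reliability growth plot the table was prepared for.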
77. Why Accelerated Reliability Growth?
The test duration covers the entire product life.
It allows detection of all design problems, not only those that appear in a small fraction of product life.
It enables an estimate of the failure rate due to random events, which is disregarded in traditional RG testing.
The failure rate achieved by design improvement, together with the random failure rate, provides a realistic estimate of total product reliability.
Test duration is determined from the required total reliability, in view of the product's physical cumulative damage from life stresses in use.
Test acceleration allows a very reasonable test duration, shorter than traditional mathematically derived testing.
Reliability improvement through test is no longer cost prohibitive.
Test failure times are projected to their appearance in real life, and the analysis uses this data.
Even though it covers the product's expected life (durability information), the test is still considerably shorter than traditional reliability growth testing.
78. Summary
What these 4 techniques have in common:
1) Each is a progressive accelerated reliability technique being used today
2) Each will be highlighted as a tutorial in our Accelerated Stress Testing and Reliability (ASTR) Workshop, Oct 9-11 in San Diego
80. To find out more about this year’s
Accelerated Stress Testing & Reliability (ASTR)
Workshop
October 9-11, 2013 San Diego, CA
EMAIL: mikes@opsalacarte.com
(MIKE IS THIS YEAR’S ASTR GENERAL CHAIR)
ALL THAT RESPOND WILL BE ENTERED INTO A DRAWING
FOR A FREE REGISTRATION TO THE CONFERENCE
82. Contact Information
Ops A La Carte, LLC
Mike Silverman, Managing Partner
(408) 472-3889
mikes@opsalacarte.com
www.opsalacarte.com

Ops A La Carte, LLC
Lou LaVallee, Senior Reliability Engineer
(585) 281-1882
loul@opsalacarte.com
www.opsalacarte.com

Raytheon
Milena Krasich, Sr. System Engineer
978-440-1578
Milena_Krasich@raytheon.com

Ridgetop Group
Doug Goodman, CEO
(520) 742-3300
doug.goodman@ridgetopgroup.com
www.ridgetopgroup.com
Editor's Notes
Started with demand for monthly webinars to increase company footprint. Topics: DFR, DRF & DFSS; Solar Reliability; Green Reliability; Root cause analysis; Medical risk based testing; DOE for Reliability; DOE for software testing; DOE for simulation.
Design for Reliability (DFR); AXD: axiomatic design.
Stress condition: static, cyclic, dynamic loading.
Failure modes (FMs): load loss, creep, set, yielding, fracture, damaged spring end, fatigue, buckling, surging, complex stress change(t), fracture fatigue, fretting corrosion.
Causes: hydrogen embrittlement, flaws, high temperature operation, stress concentrations from nicks, misalignment, low and high frequency vibration, cycling temperature, corrosive atmosphere, insufficient space for operation, resonance surging, sharp bends on spring ends.
Failures occur near spring ends. Conical, barrel, hourglass shaped.