TRAITS FOUND IN EFFECTIVE RELIABILITY PROGRAMS


         Fred Schenkelberg
         Reliability Engineering Consultant
         FMS Reliability
         Los Gatos, CA 95032
         fms@fmsreliability.com
         www.fmsreliability.com

SUMMARY

          Having had the privilege of interviewing a cross-section of more than 70 product development teams to understand
their reliability programs, I have made a few observations. Only a rare few have mature, cost-effective and efficient
reliability programs.

          A clear understanding of your organization’s reliability program along with a clear vision of what is possible is the
crucial first step to making systematic program improvements. This paper explores the key traits which separate good from
great reliability programs.

        Marketing, product volume, complexity and organizational structure do not tend to matter. A proactive approach,
statistical thinking, fact-based decision making and integrated reliability tools do tend to make a difference. This
paper outlines how to assess your organization and highlights key traits of good and simply great reliability
programs.


INTRODUCTION

         On one occasion I conducted assessments of two organizations located in the same building. Both designed and
manufactured telecommunication equipment of similar complexity and volume. The interview schedule had me going up
and down stairs almost every hour for two days, and by midday of the first day I enjoyed going upstairs and dreaded heading
down. Despite all the similarities, the two reliability programs were dramatically different, as different as their
reliability results.

          Downstairs the interviews started late and were interrupted by urgent phone calls or in-person requests:
firefighting at its best. The team employed a wide range of tools, everything listed on a checklist, for each project. The
reliability goals were not known to most of the design team, and the few who knew them also understood that the goals would
not be measured or impede getting the product to market. The people I talked to stated reliability was very important, and
they were very busy fixing issues identified in the field or in testing just before product launch. Reliability was done by
the guy who left last year.

         Upstairs the interviews started on time and ran without interruption. No one remembered the last time there was an
urgent need to resolve a field issue. The team employed reliability tools as needed, choosing those that would benefit the
project. The specific testing was tailored to the risks identified during the design phase. The goals were widely known, and
the current status was also known, both during development and after product launch. The people I talked to stated
reliability was very important and they knew what to do to meet their reliability objectives. Reliability thinking and skills
were taught by Sharon, who left last year.

         This paper touches on the key traits that separated these two groups.

BACKGROUND

         In 1983, John Young, CEO of Hewlett-Packard Company, noticed that the rate of growth of warranty expense was higher
than the rate of growth of revenue. He asked the corporation to reduce warranty expense by 10x by the end of the decade. One
of the key factors in the success of this program, which changed warranty expense from 4% to 1.5% of net revenue, was the
identification and encouragement of key reliability engineering practices. [Ireson, 5.1] Dick Moss conducted the survey of
those practices and was my mentor at HP.

         In 1996, we were unable to tally the corporate warranty expense because the systems and metrics established in the
late '80s had been dismantled. When the result was finally determined, the corporation had lost ground and looked like it
would continue to grow the warranty expense faster than revenue. Just as in the early '80s, many of the key reliability
practices were widely used, yet the results did not indicate any effectiveness. A few product divisions did have better
results with respect to warranty expenses, so it was time to conduct another survey. I dusted off the old Moss survey.

          One item that became clear as the survey progressed was that the culture of the product team, and how the team
viewed reliability, seemed directly related to the results. This is similar to quality maturity as described in “Quality is
Free” by Philip B. Crosby. Using the same approach for product reliability, the product teams with high maturity did have
significantly lower warranty expenses. Other attempts have explored this relationship between reliability activities,
effectiveness and results, including a current effort within IEEE to publish a reliability assessment standard. [Gullo]

          In my experience, product teams have asked for guidance on how to improve their product reliability (e.g., warranty
expenses), which amounts to guidance on how to move to the right on the maturity matrix and become more effective in
achieving reliable product performance in the field. A few of these engagements involved reliability programs that already
employed an assortment of practices, yet each had one or two missing elements that kept it from achieving systemic
improvements. It is specifically these experiences that form the basis for this paper.

THE TRAITS

          There are three main interconnected threads that run through very effective programs. First, the team has clearly
stated reliability goals that are routinely estimated, measured and evaluated. Second, the team makes design decisions fully
considering the impact on the program and the business. Third, the team actively seeks failures and endeavors to learn as
much as possible from each failure. Each of these traits consists of a collection of tightly interwoven reliability tools or
practices. The specific tools vary from one team to the next due to volume, market, and other business priorities.



Trait 1: STATE CLEAR GOALS

         There are plenty of really bad reliability goal statements, like 20,000-hour MTBF, 5-year life, “as good or better
than…”, 2-year warranty, or zero field failures. What very good programs have is a complete statement that permits the
organization to understand and use the goal to influence each design decision.

          A simple definition of reliability includes four elements: function, duration, probability and environment. Poor
goal statements often include only one of the four elements, forcing the team to assume values for the others. A complete
reliability goal statement includes all four elements as shown below.

         “Product FMS provides music storage and playback [key functions] for two years [duration] with 98% reliability
[probability of success over duration period] in a worldwide portable environment [environment].”

         Both the function and the environment require further definition, which is often provided by other key documents or
references. For example, many product development teams have a set of product specifications the design should meet. These
include size, color, features, and performance parameters. Generally the function element includes anything that the
customer would notice not working and, when it didn't perform as expected, would call a failure. Understanding what
constitutes a failure from the customer's point of view helps tailor risk analysis and product evaluations to the elements
most important to customers.

           The environment includes shipping, storage, installation, startup, and use. Many organizations develop a set of
documents that capture the key features of their market's environment. Many organizations rely on standards and do not
tailor, as the best do, the environmental parameters to reflect the experience of their products with their customers. For
example, the above MP3 player is likely to sit on a car dashboard in the sun. Does the internal set of environmental
requirements capture this temperature extreme and its expected duration? The better environmental statements include
nominal values and expected ranges for temperature, humidity, shock, radiated emissions, usage profiles, and possibly
numerous other environmental and usage factors, defining the most significant parameters that affect the short and long
term performance of the product with the customer. The environment statement is not a set of fixed profile tests.

           A fully stated goal often includes multiple durations with associated probability statements (out of box, first
90 days, warranty period and expected life are common durations of interest). Different failure mechanisms may cause
failures at different points in time. For example, shock and vibration from transportation to the customer may be the most
significant root cause of out of box failures, whereas mechanical fatigue may dominate the failures after the warranty period.
The full statement permits consideration of materials, assembly options, component selection and packaging approaches early
in the product design process.
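
           One way to make the four elements concrete is to record the goal as a small data structure and check it for
gaps. The following is only a sketch: the class name, field names and the out-of-box and 90-day probabilities are
hypothetical, and only the two-year, 98% target comes from the example statement above.

```python
from dataclasses import dataclass, field


@dataclass
class ReliabilityGoal:
    """A reliability goal with the four elements: function, duration,
    probability and environment. Names here are illustrative only."""
    function: str = ""
    environment: str = ""
    # Maps a duration label to the required probability of success over
    # that duration, or None when no probability has been stated.
    targets: dict = field(default_factory=dict)

    def missing_elements(self):
        """Return the names of the elements the statement leaves out."""
        missing = []
        if not self.function.strip():
            missing.append("function")
        if not self.targets:
            missing.append("duration")
        if not self.targets or any(p is None for p in self.targets.values()):
            missing.append("probability")
        if not self.environment.strip():
            missing.append("environment")
        return missing


# The example statement, extended with hypothetical early-life targets.
complete = ReliabilityGoal(
    function="music storage and playback",
    environment="worldwide portable",
    targets={"out of box": 0.999, "first 90 days": 0.995, "two years": 0.98},
)

# A "5 year life" goal states a duration and nothing else.
poor = ReliabilityGoal(targets={"five years": None})

print(complete.missing_elements())
print(poor.missing_elements())
```

           A check like this does not make a goal good, but it does make the assumed elements visible before design
decisions depend on them.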

          A reliability goal is just one of many constraints a design team must consider during product development. The
team faces a seemingly endless list of requirements, regulations, and business expectations. The three most common are
performance, schedule and cost. Performance is the function, i.e., what the product is supposed to do for the customer,
and this is often key to the value the product provides. It is immediately measurable: the product either meets the
performance requirements or doesn't. The first prototypes provide the first measures, and performance is central to nearly
every measure made and reported during development and manufacturing. Schedule refers to the time to market requirement.
The project has a target date to have the product in its final form, on the shelf, ready for sale. The calendar measures
this criterion, and a series of schedule milestones reminds the design team of the deadline. Cost is often the bill of
material cost and relates to the profitability of the product. A simple spreadsheet listing the component and assembly
costs can tally this every day for the design team. All three are readily measurable. They each provide feedback to the team.

         Reliability, specifically the probability of successful operation at later durations, is difficult at best to
measure accurately. The second element of this trait is the repeated and improving measure of reliability during the design
process. A goal without some method to track progress leaves the team guessing whether they achieved the goal or are on
target. The measure provides a means to make adjustments and to gauge readiness for the market.

         One of the best examples I've seen involved a weekly report to the design team on reliability. Each Friday, Phil
would gather the best available data or estimates for each of the major sub-systems of the product. On Monday he would
report the results of the tally against the reliability goals. Early in the program these estimates were based on historical
data from previously fielded products. As the design evolved, the estimates received adjustments from parts count and vendor
data sources. For key elements the team invested in accelerated life testing or encouraged the vendor to perform the testing.
Finally, with later prototypes, the team conducted accelerated demonstration tests on the entire system using time
compression and elevated temperature, since high temperature accelerated the most dominant high-risk failure mechanisms.
After launch, the team closely monitored the first 6 months of field performance.

         During each stage of the product lifecycle the team received the best available measure of reliability. As the
design progressed and the product became more functional, additional testing continued to improve the estimates. Just like
the other three major constraints (performance, schedule and cost), reliability measures provided regular feedback.
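
         The weekly tally described above can be sketched in a few lines. This is only an illustration under assumptions the
text does not state: the sub-systems are treated as a series system with independent failures, so the product estimate is
the product of the sub-system estimates. The sub-system names and values are hypothetical, and the 0.98 goal is borrowed
from the example goal statement in Trait 1.

```python
from math import prod

GOAL = 0.98  # reliability goal over the duration of interest

# Hypothetical best-available estimates for each major sub-system.
# Early in a program these might come from historical field data,
# later from parts count, vendor data or accelerated life tests.
estimates = {
    "power supply": 0.995,
    "storage": 0.992,
    "audio board": 0.997,
    "enclosure": 0.999,
}


def system_reliability(subsystems):
    """Series-system estimate, assuming independent sub-system failures."""
    return prod(subsystems.values())


def weekly_report(subsystems, goal):
    """Return the system estimate, its gap to the goal, and a status."""
    total = system_reliability(subsystems)
    gap = total - goal
    status = "on track" if gap >= 0 else "below goal"
    return total, gap, status


total, gap, status = weekly_report(estimates, GOAL)
print(f"estimate {total:.4f} vs goal {GOAL:.2f}: {status} ({gap:+.4f})")

# The weakest sub-system is the natural place to look first.
weakest = min(estimates, key=estimates.get)
print(f"lowest estimate: {weakest}")
```

         The value of such a tally lies less in the precision of any one number than in refreshing it every week as better
data replaces early estimates.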

          A goal without a measure, like a measure without a goal, provides limited value to the decision making process.
Clearly stating a fully expressed reliability goal and regularly measuring reliability permit the team to know where they
are going, whether they are on track, and when they have arrived.

Trait 2: ENABLE TRADEOFFS

          A single key piece of information is all that is required to enable designers to balance reliability with
performance, time to market and cost. This information exists within any company that ships products, and is nearly always
unknown to the design team. Providing the cost of a field return in dollars permits the designer to translate reliability
differences into dollars.

          For example, if the projected shipments are 1,000 units a month and a return costs the company $450 (call center,
repair or replacement, shipping, and failure analysis are examples of elements of this value), then a 1% change in
reliability (from 92% to 93%, for example) would reduce the returns cost by $4,500 per month. Taking this example a bit
further, assume it would cost $1 per unit more in bill of material cost to achieve the change in field failure rate. Is
this worth the increase in bill of material cost? Certainly: the savings is $4,500 per 1,000 units, or $4.50 per unit
shipped, which adds $3.50 to profit for each unit shipped.
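
          The same arithmetic can be written as a small calculation so that designers can try other values. The function
and parameter names below are hypothetical, and the sketch assumes every avoided failure avoids exactly one return at the
stated cost.

```python
def monthly_return_savings(units_per_month, cost_per_return, reliability_gain):
    """Dollars per month saved on returns for a given gain in reliability.

    reliability_gain is a fraction, e.g. 0.01 for a change from 92% to 93%.
    """
    avoided_returns = units_per_month * reliability_gain
    return avoided_returns * cost_per_return


def net_gain_per_unit(units_per_month, cost_per_return,
                      reliability_gain, added_bom_cost):
    """Per-unit savings on returns minus the added bill of material cost."""
    savings = monthly_return_savings(units_per_month, cost_per_return,
                                     reliability_gain)
    return savings / units_per_month - added_bom_cost


# Values from the example in the text.
savings = monthly_return_savings(1000, 450.0, 0.01)
net = net_gain_per_unit(1000, 450.0, 0.01, added_bom_cost=1.0)
print(f"returns cost avoided per month: ${savings:,.2f}")
print(f"net gain per unit shipped:      ${net:,.2f}")
```

          With the values from the text this reproduces the $4,500 per month and $3.50 per unit figures. A negative net
gain would indicate a design change that costs more than the returns it prevents.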

        For high risk areas or major elements of a design, the team may face multiple options to trade off cost, time to
market or functionality, each with associated costs. By understanding the impact on reliability, these tradeoffs can be
fully considered. Teams that do this well use it during component selection, during design solution comparisons and during
design optimization. They seek the areas with the best return on the investment, whether that is component cost,
functionality, schedule or reliability.

Trait 3: SECURE FAILURES

         “The concept of failure is central to the design process, and it is by thinking in terms of obviating failure that
successful designs are achieved.” [Petroski]

          Product teams understand the product should just work for the customer. It shouldn't fail. In my experience,
design teams tend to imagine possible failure modes and attempt to design the product to avoid or mitigate them. It may be
a point of litigation if the product fails in a manner that should have been anticipated by the design team. More often the
motivation is the business case: a product that doesn't fail will sell better and have lower warranty expenses. Hence, a
reliable product is more profitable.

          The best teams aggressively seek failures in the design over the entire product lifecycle. In early concept phases,
consider the fundamental limits of the chosen technology. Also consider the types of stresses expected during use and
project their effect on the core technology. A Failure Mode and Effect Analysis (FMEA) may help reveal high risk areas for
further analysis. With the first prototypes, the team can directly evaluate performance and discover failure mechanisms
through testing such as Highly Accelerated Life Testing (HALT). And, during the product launch, the team can either confirm
or discover the way the product fails in use. In all cases, a technical understanding of the interaction of the design with
the applied stress (use, temperature, vibration, etc.) permits the team to uncover the design flaw that revealed itself as
a failure.

         Reliability growth modeling is based on the premise that every design has an unknown and finite number of design
flaws. The product development process is the careful uncovering and resolving of as many of these flaws as possible before
shipping the products. At some point finding the remaining flaws is not worth the effort (cost and time), because with only
those flaws remaining the product has acceptable field reliability.

          Just finding the failures is a key first step in this trait. Many failures, once revealed to a design team,
highlight design changes that will reduce or eliminate the same failures in the improved design. On some occasions, the
failure is only a symptom, and treating the assumed cause of the failure does not remove the flaw. For example, an
intermittent overvoltage from a power supply may cause sensitive integrated circuits (ICs) to fail. The IC failure may seem
to indicate a faulty component, but its replacement does not change the underlying root cause of the failure. It will happen
again. Or, the faulty power supply may cause another component to fail. With careful failure analysis of the broken IC, the
finding of overvoltage would lead to investigating the power supply. Once the power supply design is fixed, the failure
symptom of blown ICs goes away.

          Another element of this trait is the pursuit of every failure. Imagine that during prototyping 100 units are
created and distributed to various parts of the team for evaluation and testing. Some failures may occur on all 100 units,
some on about half, and some on only one unit. The first two cases are obvious flaws that need attention and resolution
before shipping, as the sample failure rates approximate 100% and 50% field failure rates.

         Now let's further assume the product goal is 95% reliability over the first year, that five units each revealed a
design flaw, and that the other 95 units function without fault. The team is done, right? No. First, there is an issue with
using a sample of 100 units with five failures to estimate the population's failure rate. The nominal estimate is 5/100 or
5%, which is the same as 95% reliability. That estimate requires us to assume either that all 100 units experienced at
least a year of operation (very unlikely) or that the other functional units did not replicate the failure when exposed to
the stress that uncovered the fault (more likely). At 90% confidence that the sample represents the population, the actual
reliability could be as low as 63%.
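
         To see how sample size drives the gap between a nominal estimate and a confidence bound, the sketch below computes
a one-sided exact binomial (Clopper-Pearson) lower bound on reliability. The choice of this particular bound is an
assumption for illustration; the text does not say which method produced its figure. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def reliability_lower_bound(failures, units, confidence=0.90):
    """One-sided Clopper-Pearson lower confidence bound on reliability.

    Finds the largest failure probability p for which observing
    `failures` or fewer failures in `units` trials still has probability
    of at least (1 - confidence), then returns 1 - p.
    """
    if failures >= units:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the CDF decreases as p grows
        mid = (lo + hi) / 2
        if binom_cdf(failures, units, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 1.0 - lo


nominal = 1 - 5 / 100
full_sample = reliability_lower_bound(5, 100)
# If only some units actually saw the relevant stress or a full year of
# use, the effective sample is smaller. 50 is a hypothetical value.
half_sample = reliability_lower_bound(5, 50)

print(f"nominal estimate:          {nominal:.3f}")
print(f"90% bound, 100 units:      {full_sample:.3f}")
print(f"90% bound, 50 units:       {half_sample:.3f}")
```

         When all 100 units count as valid trials, this bound is roughly 0.91, already below the 95% goal. The bound falls
further as the number of units that truly experienced the stress or the full duration shrinks, which is the concern raised
above about assuming every unit saw a year of operation.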

         Also consider that use conditions, environment, manufacturing and components all vary, so the actual failure rate
will certainly be worse than that estimated during development. Therefore, even relatively rare failures in the development
process require careful analysis and resolution.

       In other words, each and every failure is a gift: a chance to learn about design flaws within a product. Using tools
like FMEA and HALT permits the team to uncover the faults as soon as possible.


CONCLUSION

          Product teams that regularly produce reliable products (the upstairs team) have these three traits in common.

             •    First, a complete reliability statement with regular measurement.
             •    Second, the ability to translate reliability changes into dollars.
             •    Third, the aggressive discovery and resolution of failures.

          Each of these is more than using a reliability engineering tool. Each is a collection of tools working together to
encourage and enable the engineer to develop a product that meets the customer's expectations of reliability. When all the
pieces are in place, the opportunity to meet reliability and business goals improves. The results of the upstairs team have
been repeated by other teams that carefully assessed their development program and adjusted it to include all the elements
of the three traits.


REFERENCES

Crosby, Philip B., Quality Is Free: The Art of Making Quality Certain, Mentor, New York, 1979.

Gullo, Louis J., et al., "Assessment of Organizational Reliability Capability", IEEE Transactions on Components and
Packaging Technologies, June 2006, Vol. 29, Issue 2, 425-428.

Ireson, W. Grant, Coombs, Clyde F. and Moss, Richard Y., Handbook of Reliability Engineering and Management, 2nd Ed.,
McGraw-Hill, New York, 1996.

Petroski, Henry, Design Paradigms: Case Histories of Error and Judgment in Engineering, Cambridge University Press, 1994.

 

Traits Found in Effective Reliability Programs

  • 1. TRAITS FOUND IN EFFECTIVE RELIABILITY PROGRAMS Fred Schenkelberg Reliability Engineering Consultant FMS Reliability Los Gatos, CA 95032 fms@fmsreliability.com www.fmsreliability.com SUMMARY Having had the privilege of interviewing a cross-section of more than 70 product development teams to understand their reliability programs, I have made a few observations. Only a rare few have mature, cost-effective and efficient reliability programs. A clear understanding of your organization’s reliability program, along with a clear vision of what is possible, is the crucial first step to making systematic program improvements. This paper explores the key traits that separate good from great reliability programs. Marketing, product volume, complexity and organizational structure do not tend to matter; however, a proactive approach, statistical thinking, fact-based decision making and integrated reliability tools do tend to make a difference. This paper outlines how to assess your organization and highlights key traits of good and simply great reliability programs. INTRODUCTION On one occasion I conducted assessments of two organizations located in the same building. Both designed and manufactured telecommunication equipment of similar complexity and volume. The interview schedule had me going up and down stairs almost every hour for two days, and by midday of the first day I enjoyed going upstairs and dreaded heading down. Despite all the similarities, the two reliability programs were dramatically different, as different as their reliability results. Downstairs, the interviews started late and were interrupted by urgent phone calls or in-person requests: firefighting at its best. For each project the team employed a wide range of tools, namely everything listed on a checklist. The reliability goals were not known to the design team, and the few who did know them also understood that the goals would not be measured and would not impede getting the product to market. 
The people I talked to stated that reliability was very important, and they were very busy fixing issues identified in the field or in testing just before product launch. Reliability was done by the guy who left last year. Upstairs, the interviews started on time and ran without interruption. No one remembered the last time there was an urgent need to resolve a field issue. The team employed only those reliability tools that would benefit the project. The specific testing was tailored to the risks identified during the design phase. The goals were widely known, as was the current status against them, both during development and after product launch. The people I talked to stated that reliability was very important, and they knew what to do to meet their reliability objectives. Reliability thinking and skills were taught by Sharon, who left last year. This paper touches on the key traits that separated these two groups. BACKGROUND In 1983, John Young, CEO of Hewlett-Packard Company, noticed that warranty expense was growing faster than revenue. He asked the corporation to reduce warranty by 10x by the end of the decade. One of the key factors in the success of this program, which changed warranty from 4% to 1.5% of net revenue, was the identification and encouragement of key reliability engineering practices. [Ireson, 5.1] Dick Moss conducted the survey that identified these practices and was my mentor at HP.
  • 2. In 1996, we were unable to tally the corporate warranty expense; the systems and metrics established in the late ’80s had been dismantled. When the result was finally determined, the corporation had lost ground, and it looked as though warranty expense would continue to grow faster than revenue. Just as in the early ’80s, many of the key reliability practices were widely used, yet the results did not indicate that they were effective. It was time to conduct another survey, since a few product divisions did have better results with respect to warranty expenses. So, I dusted off the old Moss survey. One thing that became clear as the survey progressed was that the culture of the product team, and how the team viewed reliability, seemed directly related to the results. This is similar to the quality maturity described in “Quality is Free” by Philip B. Crosby. Using the same approach for product reliability, the product teams with high maturity did have significantly lower warranty expenses. Other attempts have explored this relationship between reliability activities, effectiveness and results, including a current effort within IEEE to publish a reliability assessment standard. [Gullo] In my experience, product teams have asked for guidance on how to improve their product reliability (e.g., reduce warranty expenses). In effect, they are asking how to move to the right on the maturity matrix and become more effective in achieving reliable product performance in the field. A few of these engagements involved reliability programs that already employed an assortment of practices, yet each had one or two missing elements that kept it from achieving systemic improvements. It is specifically these experiences that form the basis for this paper. THE TRAITS There are three main interconnected threads that run through very effective programs. First, the teams have clearly stated reliability goals that are routinely estimated, measured and evaluated. 
Second, the teams make design decisions fully considering the impact on the program and the business. And third, the teams actively seek failures and endeavor to learn as much as possible from each failure. Each of these traits consists of a collection of tightly interwoven reliability tools or practices. The specific tools vary from one team to the next due to volume, market, and other business priorities. Trait 1: STATE CLEAR GOALS There are plenty of really bad reliability goal statements, like 20,000 hour MTBF, 5 year life, “as good or better than…”, 2 year warranty, or zero field failures. What very good programs have is a complete statement that permits the organization to understand the goal and use it to influence each design decision. A simple definition of reliability includes four elements: function, duration, probability and environment. Poor goal statements often state only one of the four elements and force the reader to assume values for the others. A complete reliability goal statement includes all four elements, as in this example: “Product FMS provides music storage and playback [key functions] for two years [duration] with 98% reliability [probability of success over duration period] in a worldwide portable environment [environment].” Both the function and the environment require further definition, which is often provided in other key documents or references. For example, many product development teams have a set of product specifications the design should meet. These include size, color, features, and performance parameters. Generally the function element includes anything that the customer would notice not working and, when it didn’t perform as expected, would call a failure. Understanding what constitutes a failure from the customer’s point of view helps tailor risk analysis and product evaluations to the elements most important to customers. The environment includes shipping, storage, installation, startup, and use. 
Many organizations develop a set of documents that capture the key features of their market’s environment. Many others rely on standards and do not tailor the environmental parameters, as the best do, to reflect the experience of their products with their customers. For example, the MP3 player above is likely to sit on a car dashboard in the sun – does the internal set of environmental requirements capture this temperature extreme and its expected duration? The better environmental statements include nominal values and expected ranges for temperature, humidity, shock, radiated emissions, usage profiles, and possibly numerous other environmental and usage factors, chosen to define the parameters that most significantly affect the short- and long-term performance of the product with the customer. It is not a set of fixed profile tests.
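As a minimal sketch of the four-element definition, the goal statement can be treated as a small record that reports which elements are left unstated. The class name `ReliabilityGoal` and its field names are hypothetical, chosen here for illustration only; the example values come from the Product FMS statement and from the "5 year life" example of a poor goal.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ReliabilityGoal:
    """Hypothetical container for the four elements of a reliability goal."""
    function: Optional[str] = None       # what the product must do
    duration: Optional[str] = None       # how long it must do it
    probability: Optional[float] = None  # chance of success over the duration
    environment: Optional[str] = None    # where and how the product is used

    def missing_elements(self) -> List[str]:
        """Return the names of the elements the statement leaves unstated."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# The complete Product FMS statement from the text.
complete = ReliabilityGoal(
    function="music storage and playback",
    duration="two years",
    probability=0.98,
    environment="worldwide portable",
)

# "5 year life" states a duration and nothing else.
poor = ReliabilityGoal(duration="5 years")

print(complete.missing_elements())  # nothing left to assume
print(poor.missing_elements())      # elements the reader is forced to guess
```

A check like this only confirms that each element is present; whether the function and environment are defined well enough still depends on the supporting documents.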
  • 3. A fully stated goal often includes multiple durations with associated probability statements (out of box, first 90 days, warranty period and expected life are common durations of interest). Different failure mechanisms may cause failures at different points in time. For example, shock and vibration from transportation to the customer may be the most significant root cause of out of box failures, whereas mechanical fatigue may dominate the failures after the warranty period. The full statement permits consideration of materials, assembly options, component selection and packaging approaches early in the product design process. A reliability goal is just one of many constraints a design team must consider during product development. The team faces a seemingly endless list of requirements, regulations, and business expectations. The three most common are performance, schedule and cost. Performance is the function, i.e., what the product is supposed to do for the customer, and it is often key to the value the product provides. It is immediately measurable: the product either meets the performance requirements or it doesn’t. The first prototypes provide the first measures, and performance is central to nearly every measure made and reported during development and manufacturing. Schedule refers to the time to market requirement. The project has a target date to have the product in its final form, on the shelf, ready for sale. The calendar measures this criterion, and a series of schedule milestones remind the design team of the deadline. Cost is often the bill of material cost and relates to the profitability of the product. A simple spreadsheet listing the components and assembly costs can tally this every day for the design team. All three are readily measurable. They each provide feedback to the team. Reliability, specifically the probability of successful operation at later durations, is difficult at best to measure accurately. 
The second element of this trait is the repeated and improving measure of reliability during the design process. Goals without some method to track progress leave the team guessing whether they have achieved the goal or are on target. The measure provides a means to make adjustments and to gauge readiness for the market. One of the best examples I’ve seen involved a weekly report to the design team on reliability. Each Friday, Phil would gather the best available data or estimates for each of the major sub-systems of the product. On Monday he would report the results of the tally against the reliability goals. Early in the program these estimates were based on historical data from previously fielded products. As the design evolved, the estimates were adjusted using parts count and vendor data sources. For key elements the team invested in accelerated life testing or encouraged the vendor to perform the testing. And finally, with later prototypes, the team conducted accelerated demonstration tests on the entire system using time compression and elevated temperature. High temperatures accelerated most of the dominant high-risk failure mechanisms, and the team closely monitored the first 6 months of field performance. During each stage of the product lifecycle the team received the best available measure of reliability. As the design progressed and the product became more functional, additional testing continued to improve the estimates. Just like the other three major constraints (performance, schedule and cost), reliability measures provided regular feedback. A goal without a measure, like a measure without a goal, provides limited value to the decision making process. Clearly stating a fully expressed reliability goal and regularly measuring reliability permit the team to know where they are going, whether they are on track, and when they have arrived. 
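The weekly tally described above can be sketched in a few lines. The paper does not say how the sub-system estimates were combined, so this sketch assumes a series system with independent failures (the system works only if every sub-system works). The sub-system names, the estimate values, and the function names are hypothetical.

```python
from math import prod
from typing import Dict


def system_reliability(subsystems: Dict[str, float]) -> float:
    """Combine sub-system estimates, ASSUMING a series system with
    independent failures: the product of the individual reliabilities."""
    return prod(subsystems.values())


def weekly_report(subsystems: Dict[str, float], goal: float) -> str:
    """Summarize the current best estimate against the reliability goal."""
    estimate = system_reliability(subsystems)
    status = "on track" if estimate >= goal else "below goal"
    return f"estimate {estimate:.3f} vs goal {goal:.3f}: {status}"


# Hypothetical best-available estimates for one week of the program.
estimates = {
    "power supply": 0.995,
    "main board": 0.990,
    "enclosure": 0.999,
    "display": 0.992,
}

print(weekly_report(estimates, goal=0.98))
```

Early in a program the dictionary values would come from historical data; later they would be replaced by parts count, vendor, and test results as those become available, while the report format stays the same.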
Trait 2: ENABLE TRADEOFFS A single key piece of information is all that is required to enable designers to balance reliability with performance, time to market and cost. This information exists within any company that ships products, and it is nearly always unknown to the design team. Providing the cost of a field return in dollars permits the designer to translate reliability differences into dollars. For example, if projected shipments are 1000 units a month and a return costs the company $450 (call center, repair or replacement, shipping, and failure analysis are examples of elements of this value), then a 1% change in reliability (from 92% to 93%, for example) would reduce the returns cost by $4,500 per month. Taking this example a bit further, assume it would cost $1 per unit more in bill of material cost to achieve the change in field failure rate. Is this worth the increase in bill of material cost? Certainly, as the savings are $4,500 per 1000 units, or $4.50 per unit shipped, which adds a net $3.50 to profit for each unit shipped. For high risk areas or major elements of a design, the team may face multiple options to trade off cost, time to market or functionality, each with associated costs. By understanding the impact on reliability, these trade offs can be fully considered. Teams that do this well use it during component selection, during design solution comparisons and during design optimization.
  • 4. Teams that do this well seek the areas with the best return for the investment, whether that is component cost, functionality, schedule or reliability. Trait 3: SECURE FAILURES “The concept of failure is central to the design process, and it is by thinking in terms of obviating failure that successful designs are achieved.” [Petroski] Product teams understand the product should just work for the customer. It shouldn’t fail. In my experience design teams tend to imagine possible failure modes and attempt to design the product to avoid or mitigate the failure. It may be a point of litigation if the product fails in a manner that should have been anticipated by the design team. More often the business case is that a product that doesn’t fail will sell better and have lower warranty expenses. Hence, a reliable product is more profitable. The best teams aggressively seek failures in the design over the entire product lifecycle. In early concept phases, consider the fundamental limits of the chosen technology. Also, consider the types of stresses expected during use and project their effect onto the core technology. A Failure Mode and Effect Analysis (FMEA) may help reveal high risk areas for further analysis. With the first prototypes, the team can directly evaluate performance and discover failure mechanisms through testing such as Highly Accelerated Life Testing (HALT). And, during the product launch, the team can either confirm or discover the way the product fails in use. In all cases, a technical understanding of the interaction of the design with the applied stress (use, temperature, vibration, etc.) permits the team to uncover the design flaw that revealed itself as a failure. Reliability growth modeling is based on the premise that every design has an unknown and finite number of design flaws. The product development process is the careful uncovering and resolving of as many of these flaws as possible before shipping the products. 
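Returning to the Trait 2 example, the returns-cost arithmetic (1000 units a month, $450 per return, a one-point reliability improvement, $1 per unit of added bill of material cost) can be checked with a short sketch. The function names are hypothetical, and the calculation keeps the text's simplification that a one-point reliability gain removes returns from 1% of the units shipped.

```python
def monthly_returns_savings(units_per_month: int,
                            cost_per_return: float,
                            reliability_gain: float) -> float:
    """Dollars saved per month when `reliability_gain` (a fraction, e.g. 0.01
    for 92% -> 93%) of the shipped units no longer come back as returns."""
    return units_per_month * reliability_gain * cost_per_return


def net_gain_per_unit(units_per_month: int,
                      cost_per_return: float,
                      reliability_gain: float,
                      added_cost_per_unit: float) -> float:
    """Savings per shipped unit minus the added bill of material cost."""
    savings = monthly_returns_savings(units_per_month, cost_per_return,
                                      reliability_gain)
    return savings / units_per_month - added_cost_per_unit


# Values from the example in the text.
savings = monthly_returns_savings(1000, 450.0, 0.01)
net = net_gain_per_unit(1000, 450.0, 0.01, added_cost_per_unit=1.0)

print(f"monthly savings: ${savings:,.2f}")
print(f"net gain per unit shipped: ${net:.2f}")
```

The same two functions let a designer compare several options at once, for example a $1 change that buys one point of reliability against a $3 change that buys two.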
At some point finding the remaining flaws is not worth the effort (cost and time). The remaining flaws result in acceptable field reliability. Just finding the failures is a key first step in this trait. Many failures, once revealed to a design team, highlight design changes that will reduce or eliminate the same failures in the improved design. On some occasions, the failure is only a symptom, and treating the assumed cause of the failure does not remove the flaw. For example, an intermittent over-voltage from a power supply may cause sensitive integrated circuits (ICs) to fail. The IC failure may appear to indicate a faulty component, but its replacement does not change the underlying root cause of the failure. It will happen again. Or, the faulty power supply may cause another component to fail. Careful failure analysis of the broken IC would identify over-voltage as the cause and lead to investigating the power supply. Once the power supply design is fixed, the failure symptom of blown ICs goes away. Another element of this trait is the pursuit of every failure. Imagine that during prototyping 100 units are created and distributed to various parts of the team for evaluation and testing. Some failures may occur with all 100 units, some occur with about half, and some occur on only one unit. The first two cases are obvious flaws that need attention and resolution before shipping, as the sample failure rates approximate 100% and 50% field failure rates. Now let’s further assume the product goal is 95% reliability over the first year, that five units revealed a design flaw each, and that the other 95 units function without fault. The team is done, right? No. First, there is an issue with using a sample of 100 units with five failures to estimate the population’s failure rate. The nominal estimate is 5/100 or 5%, which is the same as 95% reliability. 
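The nominal 5% figure carries sampling uncertainty. As a hedged sketch, one common way to express it is the exact (Clopper-Pearson) one-sided upper confidence bound on the failure fraction, found here by bisection on the binomial distribution. This is a swapped-in standard technique and not necessarily the calculation the author used; the function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def upper_failure_bound(failures: int, n: int, confidence: float) -> float:
    """Exact one-sided upper bound on the true failure fraction: the value p
    at which seeing `failures` or fewer in n units has probability
    1 - confidence."""
    if failures >= n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = failures / n, 1.0
    for _ in range(60):  # bisection; the CDF decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(failures, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Five failures in 100 prototype units, 90% one-sided confidence.
bound = upper_failure_bound(5, 100, 0.90)
print(f"failure fraction could be as high as {bound:.3f}")
print(f"reliability could be as low as {1 - bound:.3f}")
```

For five failures in 100 units this bound is roughly 9% failures, already short of a 95% reliability goal. It accounts for sampling error alone; it says nothing about how long the units ran or whether every unit saw the stress that exposed each flaw.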
For the nominal estimate to hold, we have to assume that all 100 units experienced at least a year of operation (very unlikely) or that the other functional units did not replicate the failure when exposed to the stress that uncovered the fault (more likely). Using a 90% confidence that the sample represents the population, the actual reliability could be as low as 63%. Also, consider that use conditions, environment, manufacturing and components all vary, so the actual failure rate will certainly be worse than that estimated during development. Therefore, even relatively rare failures in the development process require careful analysis and resolution. In other words, each and every failure is a gift: a chance to learn about design flaws within a product. Using tools like FMEA and HALT permits the team to uncover the faults as soon as possible. CONCLUSION
  • 5. Product teams that regularly produce reliable products (the upstairs team) have these three traits in common. • First, a complete reliability statement with regular measurement. • Second, the ability to translate reliability changes into dollars. • Third, the aggressive discovery and resolution of failures. Each of these is more than using a reliability engineering tool. They are a collection of tools working together to encourage and enable the engineer to develop a product that meets the customer’s expectations of reliability. When all the pieces are in place, the opportunity to meet reliability and business goals improves. The results of the upstairs team have been repeated by other teams that carefully assessed their development program and adjusted it to include all the elements of the three traits. REFERENCES Crosby, Philip B., Quality Is Free: The Art of Making Quality Certain, Mentor, New York, 1979. Gullo, Louis J., et al., "Assessment of Organizational Reliability Capability", IEEE Transactions on Components and Packaging Technologies, June 2006, Vol. 29, Issue 2, 425-428. Ireson, W. Grant, Coombs, Clyde F. and Moss, Richard Y., Handbook of Reliability Engineering and Management, 2nd Ed., McGraw-Hill, New York, 1996. Petroski, Henry, Design Paradigms: Case Histories of Error and Judgment in Engineering, Cambridge University Press, 1994.