SlideShare uma empresa Scribd logo
1 de 10
Weekday Dependence in Reliability Analysis of Repairable Systems

                                   Alexandre Zolotovitski, David Trindade
                                                       Sun Microsystems Inc.
                                                {alex.zolot, david.trindade} @sun.com




Abstract
A useful and informative way of representing the reliability behavior of repairable systems is plotting the Mean
Cumulative Function (MCF) and the Recurrence Rate (RR) versus the age or calendar date in days of the observed
systems. Usually this approach implicitly assumes that systems operate uniformly 24 hours per day and seven days per
week, the so-called 24/7. We analyzed weekday dependence statistics of failures and found that often frequency of
failures on weekends was significantly lower than on work days. More detailed analysis showed that all systems could
be separated into two groups: “5d systems” that never had failures on Saturdays and Sundays and “7d systems” that
had failures any day of the week.
Typically, the ratio of average 24/7 weekly RR's for these two groups is close to 5:7. The implication is that
quot;5d systemsquot; do not age during week-ends. Neglecting this fact and aggregating statistics of systems of both types can
lead to extra noise and bias in parameter estimation.

Key Words: Reliability analyses, field data, mean cumulative function, repairable system


                                       Analysis of Aggregated Systems

Plotting the average number of failures across systems, called the Mean Cumulative Function (MCF), and its
derivative, called the Recurrence Rate (RR), versus the age or calendar date in days of observed systems is a useful and
informative way of representing the reliability behavior of repairable systems.
 This approach is widely used in Sun Microsystems for analysis of statistics of failures both for hardware (datacenters,
servers, storages, network routers, and so on) and software. This method is far more informative than using common
single value metrics such as Mean Time between Failure (MTBF) and a Mean Time to Repair/Recovery (MTTR).
These summary statistics typically assume a homogeneous Poisson process (HPP), that is, a renewal process where the
times between events are derived from a single distribution, are independent, and exponentially distributed with a
constant rate of occurrence. These presumptions are often not satisfied in practice, and so MCFs, a non-parametric
approach, are increasingly being used to monitor the reliability of repairable systems in the field [2,3,4].

The reliability behaviour of a single machine can best be shown as a cumulative plot, which graphs the number of
failures (outages) versus time, where time can be age from installation or a calendar date. The MCF, which shows the
average number of failures across all systems versus time, represents the behavior of a group of machines.



There is an implicit assumption that systems operate uniformly 24 hours per day and seven days per week, the so-called
24/7. However, this assumption could be suspect for many important practical situations.
We might expect weekday dependence for several reasons:

1. Reduced work hours on weekends cause reduced loads on systems. This effect should not change the types of failure

modes observed.




                                                                                                                      1
2. Maintenance is often done on week-ends and may include changing of software and hardware configurations, with

initial application and test on Mondays of the new configurations. These procedures have the potential to introduce new

failure modes different from those in normal operation.

To investigate possible effects we analyzed the frequency distributions of numbers of failures vs. weekday from a large

database of systems in the field. Results are shown on Chart 1.



                                                   Sum of fails .total


                            2000


                            1500


                            1000


                             500


                               0
                                    Sun     Mon      Tue     Wed         Thu   Fri   Sat


                                      Chart 1: Number of failures vs. weekday

It is obvious from the Chart 1 that
a) The number of failures is much lower on weekends: about 25% of the mean workday level.
b) Considering just the workdays (Monday through Friday), the number of failures on Mondays is higher (106%) and
on Fridays is lower (92%) than the five day workday average.
A - test for homogeneity shows that a null-hypothesis of constant frequency during M-F workdays is rejectable
with a p-value of 0.0013.

It is interesting to compare the weekday dependence of the number of failures that is shown on Chart 1 with the
weekday dependence of other characteristics such as the total number of active systems (Chart 2) and the number of
systems installed on specific day (Chart 3).




                                                                                                                     2
Sum of systems.active

                                   80000


                                   60000


                                   40000


                                   20000


                                        0



                                              on




                                                                u
                                                      e




                                                                       i
                                          n




                                                                              t
                                                          ed



                                                                    Fr

                                                                           Sa
                                                               Th
                                                   Tu
                                       Su

                                              M



                                                          W
                              Chart 2: Total number of active systems vs. weekday

We see from Chart 2 that slightly fewer systems are active on weekends compared to weekdays (within +4.0%, -6.3%
of the overall average), but this small difference does not account for the observed behavior.

                                                  Sum of NewSyst

                                     400


                                     300


                                     200


                                     100


                                       0
                                                    e
                                              on




                                                                u



                                                                           t
                                                                    i
                                          n




                                                          ed



                                                                    Fr
                                                                         Sa
                                                   Tu



                                                               Th
                                       Su

                                              M



                                                        W




                         Chart 3: Number of new systems installed on specific weekday

Chart 3 shows that numbers of new systems installed are lower on Sundays and Mondays and higher on Tuesdays and
Saturdays.




It is possible that different failure modes can show weekday dependence due to varying usage and maintenance
patterns of systems. Chart 4 shows this comparison, which
 does not confirm this hypothesis.




                                                                                                              3
Chart 4: Proportions of sums of different failure modes

We see from the Chart 4 that there are no essential differences in proportions for the sums of the different types of
failures comparing weekdays to weekends. The - test for homogeneity has a p-value of 0.52. The failure
distributions are basically the same.


The weekday effect introduces extra noise (or adds a periodic component) to charts of RR and MCF. If we calculate
RR from the MCF chart and we choose a time window that does not have a whole number multiple of weeks, then we
may bias the results and add variation.
          To illustrate, let us assume we have a 5d system that has 1 failure every weekday and no failures on weekends.
We calculate a moving average RR using 3-day windows. Starting from a Sunday (that is, S-M-T), we get the
following seven point sequence for the RR: 2/3, 1, 1, 1, 2/3, 1/3, 1/3. If we use only weekdays, we get 1, 1, 1, 1, 1. In
the first case we have a bias and a periodic component that are absent in the second approach. However, if we use
weekly averaging over a 7 day window, we get the following sequence for RR (starting from a Sunday): 5/7, 5/7, 5/7,
5/7, 5/7, 5/7, 5/7.

Chart 5 compares the recurrence rates estimated using 19 and 21 day windows. There are slight differences, but
separating out the weekday effect from the smoothing effect of a larger window is not easily accomplished by looking
at the RRs.




                                                                                                                       4
Chart 5: RR calculated with 19 and 21 day windows. It is difficult to separate noise from a periodic component,
                                       related to the weekday dependence.

To separate the week-day effect from age-effect we could use trend-period-noise decomposition in the same way as it

is done for data with seasonality [6]:


                  RR(t) = Trend(t) * F(Weekday(t)) * Noise(t).

This method can be complicated. Alternatively,
we can use two simpler approaches to exclude the week-day effect:
1. Aggregate data to whole weeks and plot weekly instead of daily data. Then with or without an age effect we could
expect the same frequency of failure every week.
2. Associate the frequency of failure on different weekdays as result of different number of working hours each day and
plot on x-axis the cumulative number of work hours instead of days.

The first way is simpler and may be accurate enough, because after the aggregation we still can do trend - noise
decomposition. As we see from Chart 1, quot;by weekquot;, we can obtain the trend clear of noise by choosing the spreadsheet
functions TREND or AVERAGEA with the smoothing window at least 7 weeks. The trend line still contains the age
effect but is clear of week-day effect and noise.




                                                                                                                     5
500                                                                          fails.total.Rel
                                   Chart 1. Fails/machine vs Date in weeks                   7w Trend
                                                                                             12 w eeks trend
                400
                                                                                             12 w eeks avg


                300
        .
        Fails




                200



                100



                  0
                      0           20               40                  60             80                   100
                                                         Weeks
                               Chart 6: RR calculated with 7 and 12 weeks window.

                                             By-System Analysis

It is interesting to analyze the weekday failure distribution by systems.
 For each system we calculate a week-end index (WEI) and Monday and Friday indices:

WEI = (Number of Weekend failures*7/2) / Total # of failures

For 5d systems, WEI should be zero. For 7d systems, WEI should be near 1.

MonInd = (Number of Monday failures*5) / Total # of workday failures
FriInd = (Number of Friday failures*5) / Total # of workday failures

For no Monday or Friday effects, MonInd and FriInd should be near 1.

The data consisted of 934 systems with known serial numbers and installation dates. We do not analyze early failures
here, and so we account only for failures that happened after one week. We split the systems into three groups:
quot;5 work day systemsquot; (quot;5dquot;) - 74% of the systems that never had a week-end failure
quot;7 work day systemsquot; (quot;7dquot;) - 22% of the systems that had week-end index >= .25
quot;Intermediate systemsquot; (quot;iquot;) - 4% of the systems that had week-end index between 0 and .25 .




                                                                                                                  6
Ch.2. CDF of Week-end index (WEI)

              1


             0.8


             0.6
         P




             0.4


             0.2


              0
                   0                   1                     2                      3                     4
                                                            WEI



                                Chart 7: Distribution of the Weekend Index (WEI).

To validate the hypothesis that the absence of week-end failures is the result of reporting week-end failures on
Monday, we calculate MonInd separately for 5d and 7d systems. If there is the same failure rate on week-ends as work
days with misreporting, we would have MonInd for 5d systems about 3 times higher than for 7d systems. We found
that in reality MonInd for 5d systems is only about 6% higher than for 7d systems, and so the misreporting does not
play an essential role.

To validate the hypothesis that 5d systems do not age at the same rate during week-ends we compare MCFs for 5d and
7d systems.
 The results are in Chart 8 and Chart 9.




                                                                                                                  7
Ch.1. Duane Plot, point=failure



                      2



                                                y = 1.20x - 2.39
                                                                              ln(7/5)
          ln( MCF )




                      0
                           0          1                 2                 3                 4                  5


                                                 y = 1.16x - 2.73

                      -2

                                                                           ln(MCF_Tot_5d)

                                                                           ln(MCF_Tot_7d)

                                                                           Linear (ln(MCF_Tot_7d)     )
                      -4
                                                                           Linear (ln(MCF_Tot_5d)     )



                                                              ln(AgeW)
                      -6



                               Chart 8: Duane plots for 5d and 7d systems, point=system.

We see that Duane plots for MCF have very close values of the slope β for 5d and 7d systems (trend lines are almost
parallel) and the ratio of the intercepts in Chart 8
α_7d / α_5d = exp(-1.81)/exp(-2.22) = 1.5 is quite close to 7/5 = 1.4 . Thus, it looks as if 5d systems really do not age
during week-ends.

The difference between MCF plots on Chart 8 and Chart 9 is the result of plotting 1 point = 1 failure on Chart 8 and 1
point = 1 week on Chart 9. The data consists of more systems with small ages and fewer systems with large ages.
Consequently, the weight of small ages is higher on Chart 8. We would expect that if we had the constant number of
systems for all ages, then the result should be closer to Chart 9. So we consider results 1 point = 1 week on
Chart 9 to be a more accurate representation.




                                                                                                                       8
Ch.3. Duane plot, point=w eek .
                                     3


                                     2
                                                                y = 1.04x - 1.81

                                     1
                     ln( MCF )




                                                                                         ln(7/5)
                                     0
                                         0          1               2                3                    4                   5
                                                                y = 1.03x - 2.22
                                 -1

                                                                                                   ln(MCF_Tot_5d)
                                 -2
                                                                                                   ln(MCF_Tot_7d)
                                                                                                   Linear (ln(MCF_Tot_7d) )
                                                                                                   Linear (ln(MCF_Tot_5d) )
                                 -3
                                                                         ln(iAgeW)

                                             Chart 9: Duane plots for 5d and 7d systems, point=week.



                                              Chart 10. Comparing RR across customers

                     0.04


                                                                                                          y = 1.53x + 0.002
                                                                                                              R2 = 0.47
                     0.03
Recurrence Rate 7d




                     0.02


                                                                                                              y
                                                                                                              95% Conf.Limits
                     0.01
                                                                                                              1.4 * x



                                 0
                                     0                            0.01                                    0.02
                                                                  Recurrence Rate 5d

                                             Chart 10: Comparing Recurrence Rates across customers


                                                                                                                                  9
Chart 10 shows a comparison of average RRs for 5d and 7d systems across several customers that have 10 or more
systems of each type. The average RR was obtained by dividing the total number of failures by the total age in days for
each type of system..
 One clearly sees that RR for 7d systems is usually significantly higher than RR for 5d systems, with a ratio that is
about 1.6, which is even higher than 7/5 = 1.4. The obvious inference is that the 7d systems fail at a higher rate over 7
days than the 5d systems fail over 5 days.


                                                    Conclusions

We see that weekday dependence is significant in failure behavior and must be taken into account using work days
instead of calendar days or analyzing statistics of failure for 5d and 7d systems separately.
Neglecting this fact and aggregating statistics of systems of both types can lead to extra noise and possible bias in
parameter estimation.



                                                     References

D.C. Trindade, Swami Nathan, “Simple Plots for Monitoring the Field Reliability of Repairable Systems”, Proceedings
     of the Annual Reliability and Maintainability Symposium (RAMS), Alexandria, Jan 2005.
W. Nelson, Recurrence Events Data Analysis for Product Repairs, Disease Recurrence and Other Applications, ASA-
     SIAM Series in Statistics and Applied Probability, 2003.
P.A. Tobias, D.C. Trindade, Applied Reliability, 2nd ed., Chapman and Hall/CRC, 1995.
W.Q. Meeker, L.A. Escobar, Statistical Methods for Reliability Data, Wiley Interscience, 1998.
H. Ascher, H. Feingold, Repairable Systems Reliability: Modeling, Inference, Misconceptons and their Causes, Marcel
     Dekker, 1984.
http://en.wikipedia.org/wiki/Seasonal _adjustment, http://en.wikipedia.org/wiki/Decomposing_of_time_series




                                                                                                                     10

Mais conteúdo relacionado

Semelhante a Weekday Dependence in Reliability Analysis of Repairable Systems

Ijsom19001398886200
Ijsom19001398886200Ijsom19001398886200
Ijsom19001398886200IJSOM
 
Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...
Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...
Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...IJECEIAES
 
APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...
APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...
APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...ijfls
 
Multiple Constraints Consideration in Power System State Estimation
Multiple Constraints Consideration in Power System State EstimationMultiple Constraints Consideration in Power System State Estimation
Multiple Constraints Consideration in Power System State EstimationIOSR Journals
 
Temporal disaggregation methods
Temporal disaggregation methodsTemporal disaggregation methods
Temporal disaggregation methodsStephen Bradley
 
Short-term load forecasting with using multiple linear regression
Short-term load forecasting with using multiple  linear regression Short-term load forecasting with using multiple  linear regression
Short-term load forecasting with using multiple linear regression IJECEIAES
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinMinchao Lin
 
2Building Distribution-Level Base Cases Through a State Estimator
2Building Distribution-Level Base Cases Through a State Estimator2Building Distribution-Level Base Cases Through a State Estimator
2Building Distribution-Level Base Cases Through a State Estimatordavidtrebolle
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVADerek Kane
 
Queue with Breakdowns and Interrupted Repairs
Queue with Breakdowns and Interrupted RepairsQueue with Breakdowns and Interrupted Repairs
Queue with Breakdowns and Interrupted Repairstheijes
 
Autocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and ConsequencesAutocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and ConsequencesShilpa Chaudhary
 
Credit Analogue (2003)
Credit Analogue (2003)Credit Analogue (2003)
Credit Analogue (2003)Texxi Global
 

Semelhante a Weekday Dependence in Reliability Analysis of Repairable Systems (17)

Ijsom19001398886200
Ijsom19001398886200Ijsom19001398886200
Ijsom19001398886200
 
Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...
Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...
Unknown input observer for Takagi-Sugeno implicit models with unmeasurable pr...
 
APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...
APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...
APPLICATION OF VARIABLE FUZZY SETS IN THE ANALYSIS OF SYNTHETIC DISASTER DEGR...
 
Multiple Constraints Consideration in Power System State Estimation
Multiple Constraints Consideration in Power System State EstimationMultiple Constraints Consideration in Power System State Estimation
Multiple Constraints Consideration in Power System State Estimation
 
Glm
GlmGlm
Glm
 
Test Reports
Test ReportsTest Reports
Test Reports
 
Temporal disaggregation methods
Temporal disaggregation methodsTemporal disaggregation methods
Temporal disaggregation methods
 
Short-term load forecasting with using multiple linear regression
Short-term load forecasting with using multiple  linear regression Short-term load forecasting with using multiple  linear regression
Short-term load forecasting with using multiple linear regression
 
IEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao LinIEOR 265 Final Paper_Minchao Lin
IEOR 265 Final Paper_Minchao Lin
 
2Building Distribution-Level Base Cases Through a State Estimator
2Building Distribution-Level Base Cases Through a State Estimator2Building Distribution-Level Base Cases Through a State Estimator
2Building Distribution-Level Base Cases Through a State Estimator
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
16928_5302_1.pdf
16928_5302_1.pdf16928_5302_1.pdf
16928_5302_1.pdf
 
Queue with Breakdowns and Interrupted Repairs
Queue with Breakdowns and Interrupted RepairsQueue with Breakdowns and Interrupted Repairs
Queue with Breakdowns and Interrupted Repairs
 
Autocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and ConsequencesAutocorrelation- Concept, Causes and Consequences
Autocorrelation- Concept, Causes and Consequences
 
Tutorial marzo2011 villen
Tutorial marzo2011 villenTutorial marzo2011 villen
Tutorial marzo2011 villen
 
TO_EDIT
TO_EDITTO_EDIT
TO_EDIT
 
Credit Analogue (2003)
Credit Analogue (2003)Credit Analogue (2003)
Credit Analogue (2003)
 

Último

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 

Último (20)

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 

Weekday Dependence in Reliability Analysis of Repairable Systems

  • 1. Weekday Dependence in Reliability Analysis of Repairable Systems Alexandre Zolotovitski, David Trindade Sun Microsystems Inc. {alex.zolot, david.trindade} @sun.com Abstract A useful and informative way of representing the reliability behavior of repairable systems is plotting the Mean Cumulative Function (MCF) and the Recurrence Rate (RR) versus the age or calendar date in days of the observed systems. Usually this approach implicitly assumes that systems operate uniformly 24 hours per day and seven days per week, the so-called 24/7. We analyzed weekday dependence statistics of failures and found that often frequency of failures on weekends was significantly lower than on work days. More detailed analysis showed that all systems could be separated into two groups: “5d systems” that never had failures on Saturdays and Sundays and “7d systems” that had failures any day of the week. Typically, the ratio of average 24/7 weekly RR's for these two groups is close to 5:7. The implication is that quot;5d systemsquot; do not age during week-ends. Neglecting this fact and aggregating statistics of systems of both types can lead to extra noise and bias in parameter estimation. Key Words: Reliability analyses, field data, mean cumulative function, repairable system Analysis of Aggregated Systems Plotting the average number of failures across systems, called the Mean Cumulative Function (MCF), and its derivative, called the Recurrence Rate (RR), versus the age or calendar date in days of observed systems is a useful and informative way of representing the reliability behavior of repairable systems. This approach is widely used in Sun Microsystems for analysis of statistics of failures both for hardware (datacenters, servers, storages, network routers, and so on) and software. This method is far more informative than using common single value metrics such as Mean Time between Failure (MTBF) and a Mean Time to Repair/Recovery (MTTR). These summary statistics typically assume a homogeneous Poisson process (HPP), that is, a renewal process where the times between events are derived from a single distribution, are independent, and exponentially distributed with a constant rate of occurrence. These presumptions are often not satisfied in practice, and so MCFs, a non-parametric approach, are increasingly being used to monitor the reliability of repairable systems in the field [2,3,4]. The reliability behaviour of a single machine can best be shown as a cumulative plot, which graphs the number of failures (outages) versus time, where time can be age from installation or a calendar date. The MCF, which shows the average number of failures across all systems versus time, represents the behavior of a group of machines. There is an implicit assumption that systems operate uniformly 24 hours per day and seven days per week, the so-called 24/7. However, this assumption could be suspect for many important practical situations. We might expect weekday dependence for several reasons: 1. Reduced work hours on weekends cause reduced loads on systems. This effect should not change the types of failure modes observed. 1
  • 2. 2. Maintenance is often done on week-ends and may include changing of software and hardware configurations, with initial application and test on Mondays of the new configurations. These procedures have the potential to introduce new failure modes different from those in normal operation. To investigate possible effects we analyzed the frequency distributions of numbers of failures vs. weekday from a large database of systems in the field. Results are shown on Chart 1. Sum of fails .total 2000 1500 1000 500 0 Sun Mon Tue Wed Thu Fri Sat Chart 1: Number of failures vs. weekday It is obvious from the Chart 1 that a) The number of failures is much lower on weekends: about 25% of the mean workday level. b) Considering just the workdays (Monday through Friday), the number of failures on Mondays is higher (106%) and on Fridays is lower (92%) than the five day workday average. A - test for homogeneity shows that a null-hypothesis of constant frequency during M-F workdays is rejectable with a p-value of 0.0013. It is interesting to compare the weekday dependence of the number of failures that is shown on Chart 1 with the weekday dependence of other characteristics such as the total number of active systems (Chart 2) and the number of systems installed on specific day (Chart 3). 2
  • 3. Sum of systems.active 80000 60000 40000 20000 0 on u e i n t ed Fr Sa Th Tu Su M W Chart 2: Total number of active systems vs. weekday We see from Chart 2 that slightly fewer systems are active on weekends compared to weekdays (within +4.0%, -6.3% of the overall average), but this small difference does not account for the observed behavior. Sum of NewSyst 400 300 200 100 0 e on u t i n ed Fr Sa Tu Th Su M W Chart 3: Number of new systems installed on specific weekday Chart 3 shows that numbers of new systems installed are lower on Sundays and Mondays and higher on Tuesdays and Saturdays. It is possible that different failure modes can show weekday dependence due to varying usage and maintenance patterns of systems. Chart 4 shows this comparison, which does not confirm this hypothesis. 3
  • 4. Chart 4: Proportions of sums of different failure modes We see from the Chart 4 that there are no essential differences in proportions for the sums of the different types of failures comparing weekdays to weekends. The - test for homogeneity has a p-value of 0.52. The failure distributions are basically the same. The weekday effect introduces extra noise (or adds a periodic component) to charts of RR and MCF. If we calculate RR from the MCF chart and we choose a time window that does not have a whole number multiple of weeks, then we may bias the results and add variation. To illustrate, let us assume we have a 5d system that has 1 failure every weekday and no failures on weekends. We calculate a moving average RR using 3-day windows. Starting from a Sunday (that is, S-M-T), we get the following seven point sequence for the RR: 2/3, 1, 1, 1, 2/3, 1/3, 1/3. If we use only weekdays, we get 1, 1, 1, 1, 1. In the first case we have a bias and a periodic component that are absent in the second approach. However, if we use weekly averaging over a 7 day window, we get the following sequence for RR (starting from a Sunday): 5/7, 5/7, 5/7, 5/7, 5/7, 5/7, 5/7. Chart 5 compares the recurrence rates estimated using 19 and 21 day windows. There are slight differences, but separating out the weekday effect from the smoothing effect of a larger window is not easily accomplished by looking at the RRs. 4
  • 5. Chart 5: RR calculated with 19 and 21 day windows. It is difficult to separate noise from a periodic component, related to the weekday dependence. To separate the week-day effect from age-effect we could use trend-period-noise decomposition in the same way as it is done for data with seasonality [6]: RR(t) = Trend(t) * F(Weekday(t)) * Noise(t). This method can be complicated. Alternatively, we can use two simpler approaches to exclude the week-day effect: 1. Aggregate data to whole weeks and plot weekly instead of daily data. Then with or without an age effect we could expect the same frequency of failure every week. 2. Associate the frequency of failure on different weekdays as result of different number of working hours each day and plot on x-axis the cumulative number of work hours instead of days. The first way is simpler and may be accurate enough, because after the aggregation we still can do trend - noise decomposition. As we see from Chart 1, quot;by weekquot;, we can obtain the trend clear of noise by choosing the spreadsheet functions TREND or AVERAGEA with the smoothing window at least 7 weeks. The trend line still contains the age effect but is clear of week-day effect and noise. 5
  • 6. 500 fails.total.Rel Chart 1. Fails/machine vs Date in weeks 7w Trend 12 w eeks trend 400 12 w eeks avg 300 . Fails 200 100 0 0 20 40 60 80 100 Weeks Chart 6: RR calculated with 7 and 12 weeks window. By-System Analysis It is interesting to analyze the weekday failure distribution by systems. For each system we calculate a week-end index (WEI) and Monday and Friday indices: WEI = (Number of Weekend failures*7/2) / Total # of failures For 5d systems, WEI should be zero. For 7d systems, WEI should be near 1. MonInd = (Number of Monday failures*5) / Total # of workday failures FriInd = (Number of Friday failures*5) / Total # of workday failures For no Monday or Friday effects, MonInd and FriInd should be near 1. The data consisted of 934 systems with known serial numbers and installation dates. We do not analyze early failures here, and so we account only for failures that happened after one week. We split the systems into three groups: quot;5 work day systemsquot; (quot;5dquot;) - 74% of the systems that never had a week-end failure quot;7 work day systemsquot; (quot;7dquot;) - 22% of the systems that had week-end index >= .25 quot;Intermediate systemsquot; (quot;iquot;) - 4% of the systems that had week-end index between 0 and .25 . 6
  • 7. Ch.2. CDF of Week-end index (WEI) 1 0.8 0.6 P 0.4 0.2 0 0 1 2 3 4 WEI Chart 7: Distribution of the Weekend Index (WEI). To validate the hypothesis that the absence of week-end failures is the result of reporting week-end failures on Monday, we calculate MonInd separately for 5d and 7d systems. If there is the same failure rate on week-ends as work days with misreporting, we would have MonInd for 5d systems about 3 times higher than for 7d systems. We found that in reality MonInd for 5d systems is only about 6% higher than for 7d systems, and so the misreporting does not play an essential role. To validate the hypothesis that 5d systems do not age at the same rate during week-ends we compare MCFs for 5d and 7d systems. The results are in Chart 8 and Chart 9. 7
  • 8. Ch.1. Duane Plot, point=failure 2 y = 1.20x - 2.39 ln(7/5) ln( MCF ) 0 0 1 2 3 4 5 y = 1.16x - 2.73 -2 ln(MCF_Tot_5d) ln(MCF_Tot_7d) Linear (ln(MCF_Tot_7d) ) -4 Linear (ln(MCF_Tot_5d) ) ln(AgeW) -6 Chart 8: Duane plots for 5d and 7d systems, point=system. We see that Duane plots for MCF have very close values of the slope β for 5d and 7d systems (trend lines are almost parallel) and the ratio of the intercepts in Chart 8 α_7d / α_5d = exp(-1.81)/exp(-2.22) = 1.5 is quite close to 7/5 = 1.4 . Thus, it looks as if 5d systems really do not age during week-ends. The difference between MCF plots on Chart 8 and Chart 9 is the result of plotting 1 point = 1 failure on Chart 8 and 1 point = 1 week on Chart 9. The data consists of more systems with small ages and fewer systems with large ages. Consequently, the weight of small ages is higher on Chart 8. We would expect that if we had the constant number of systems for all ages, then the result should be closer to Chart 9. So we consider results 1 point = 1 week on Chart 9 to be a more accurate representation. 8
  • 9. Ch.3. Duane plot, point=w eek . 3 2 y = 1.04x - 1.81 1 ln( MCF ) ln(7/5) 0 0 1 2 3 4 5 y = 1.03x - 2.22 -1 ln(MCF_Tot_5d) -2 ln(MCF_Tot_7d) Linear (ln(MCF_Tot_7d) ) Linear (ln(MCF_Tot_5d) ) -3 ln(iAgeW) Chart 9: Duane plots for 5d and 7d systems, point=week. Chart 10. Comparing RR across customers 0.04 y = 1.53x + 0.002 R2 = 0.47 0.03 Recurrence Rate 7d 0.02 y 95% Conf.Limits 0.01 1.4 * x 0 0 0.01 0.02 Recurrence Rate 5d Chart 10: Comparing Recurrence Rates across customers 9
  • 10. Chart 10 shows a comparison of average RRs for 5d and 7d systems across several customers that have 10 or more systems of each type. The average RR was obtained by dividing the total number of failures by the total age in days for each type of system.. One clearly sees that RR for 7d systems is usually significantly higher than RR for 5d systems, with a ratio that is about 1.6, which is even higher than 7/5 = 1.4. The obvious inference is that the 7d systems fail at a higher rate over 7 days than the 5d systems fail over 5 days. Conclusions We see that weekday dependence is significant in failure behavior and must be taken into account using work days instead of calendar days or analyzing statistics of failure for 5d and 7d systems separately. Neglecting this fact and aggregating statistics of systems of both types can lead to extra noise and possible bias in parameter estimation. References D.C. Trindade, Swami Nathan, “Simple Plots for Monitoring the Field Reliability of Repairable Systems”, Proceedings of the Annual Reliability and Maintainability Symposium (RAMS), Alexandria, Jan 2005. W. Nelson, Recurrence Events Data Analysis for Product Repairs, Disease Recurrence and Other Applications, ASA- SIAM Series in Statistics and Applied Probability, 2003. P.A. Tobias, D.C. Trindade, Applied Reliability, 2nd ed., Chapman and Hall/CRC, 1995. W.Q. Meeker, L.A. Escobar, Statistical Methods for Reliability Data, Wiley Interscience, 1998. H. Ascher, H. Feingold, Repairable Systems Reliability: Modeling, Inference, Misconceptons and their Causes, Marcel Dekker, 1984. http://en.wikipedia.org/wiki/Seasonal _adjustment, http://en.wikipedia.org/wiki/Decomposing_of_time_series 10