In this START Sheet, we discuss practical considerations regarding the derivation of reliability estimates in the important case when the underlying distribution is Exponential. For example, we examine the interpretation and implications of statements such as "a device is 99% reliable with 90% confidence" or "the failure rate of the device is, with probability 0.95, between 0.07 and 0.1." We begin by discussing what is understood by reliability estimation and by addressing the "confidence" that we place in these estimates.

Broadly speaking, reliability is the probability that a device will function according to its specifications for a pre-established period, sometimes called the "mission time". If it is a one-shot device, reliability means that it fulfills its prescribed function during the brief but crucial time that the device must work. The failure rate, for the Exponential case, is the constant rate at which the device fails, expressed per unit of time or operation (e.g., failures per million hours, or per thousand cycles of operation).

An estimation is a "snapshot" quantification of the present condition of some entity (e.g., device reliability, failure rate). We can obtain current estimates of these values either as "point estimates" (e.g., reliability is 0.99) or "interval estimates" (e.g., reliability of the device is between 0.97 and 0.99, with probability 0.95). The latter is called a confidence interval (CI), because there is a "confidence" (probability of occurrence) associated with the interval given (e.g., the true but unknown device reliability falls in the 0.97 to 0.99 interval 95% of the times that we implement this procedure). In expressing a "confidence", we hope that the specific result obtained is actually one of the (95%) favorable outcomes. But it may also come from one of the (5%) unfavorable cases.

The confidence level is a crucial performance measure of interval estimation (References 1, 2, 4, and 6) and can be illustrated with the following examples. One may estimate the reliability of a device to be at least 0.95, with probability 0.8 (i.e., 80% of the times we obtain this estimate, the true reliability is 0.95 or above). Another estimate may yield a reliability of at least 0.90 for the same device, but with probability 0.9 (i.e., 90% of the times). Similarly, an interval estimate of the failure rate of a device may be "between 0.07 and 0.1, with probability 0.7" (i.e., with 70% chance). A second CI may estimate the device failure rate to be "between 0.05 and 0.12, with probability 0.99" (i.e., the true failure rate of the device falls within these values 99% of the times such CIs are obtained).

Even though all of the preceding statements may be equally correct, the second statement in each example is less precise but carries a higher confidence. In the first example, the second estimate gives a lower, less informative bound for the reliability (0.90 < 0.95); this is the price we pay for its higher confidence level (90% versus 80% of the times). In the second example, the second CI is much wider than the first (0.05 to 0.12 versus 0.07 to 0.1), but it is also associated with a much higher confidence level (99% versus 70%). Such trade-offs between estimation precision and confidence are always present and constitute an important factor in confidence interval estimation.

Finally, one should not confuse an "estimation", which pertains to a current value, with a "prediction". The latter is the forecast of a future condition (e.g., the future reliability of a device, say under development) obtained at present for a pre-specified base value (some future time). General reliability predictions and forecasts will be the topic of a separate START Sheet.

In this START Sheet, we analyze the important but special case of reliability estimation when the life of the device is Exponential. This distribution has only one parameter, the mean θ (or, equivalently, the failure rate λ = 1/θ), and is important because its failure rate λ is constant. This occurs, for example, when a device is screen-tested before being placed in operation and the replacement policy prevents it from operating beyond its "useful life". Then, for all practical purposes, the device can be considered as having a constant failure rate λ.

The Exponential distribution is also a special case because it has only one parameter (the mean). Hence, knowledge about mean life (or failure rate) implies knowledge about the corresponding device reliability. Then, a CI for the mean life induces an equivalent CI for the reliability or an equivalent CI for the failure rate.

In the rest of this START Sheet, we discuss problems associated with deriving CIs for reliability estimates when the distribution of device lives is Exponential, and we develop several numerical and graphical examples to illustrate such derivations and their problems (References 3 and 5).

Reliability Data Analysis

Reliability data analysis includes three components: the raw data, the statistic used to synthesize it and the underlying statistical distribution of the variable. In this START Sheet, such distribution is assumed Exponential (how to test for this distribution has been discussed in Reference 6).

The raw data consists of the device lives, or of the test times with the number of failures that occurred. Data are then synthesized into two statistics: Total Test Time, denoted "T", and Total Number of Failures, denoted "n". Statistic T, however, can be implemented as, and interpreted in, different ways, according to how the original test times were collected.

For example, in reliability testing we can place "n" devices in operation and then observe them for some pre-specified time (T_{0}) or until "k" devices fail, where k ≤ n. The failure times (when known) are denoted T_{1}, T_{2}, ..., T_{k}, and the Total Test Time statistic is then T = Σ T_{i}. If k = n, then we have tested the entire sample, i.e., until all n devices have failed. This is the situation treated in this START Sheet.

On other occasions, we test a set of devices for a pre-specified time T_{0} and then count the number "n" of devices that have failed during this time (but we do not know the exact times when the failures occurred).

The importance of knowing which of the two life testing schemes has been implemented is that the choice determines the "degrees of freedom" (denoted DF) that the corresponding Chi-Square statistic will use. The validity of this statistic, in turn, depends on having an underlying Exponential distribution for the life data of the devices.

When the underlying distribution of the lives is indeed Exponential and the failures are independent, the statistic "twice the Total Test Time divided by the mean life", i.e., 2T/θ, is distributed as a Chi-Square with γ = 2n DF. The Exponential distribution then allows us (given a pre-specified probability α) to find the Chi-Square percentiles (via Chi-Square table values such as χ^{2}_{α/2,γ}) that define a relation between the statistic T and the Exponential mean θ (see Figure 1).

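Since 2T/θ is distributed Chi-Square with DF = γ = 2n, its α/2 and 1 - α/2 percentiles, χ^{2}_{α/2,γ} and χ^{2}_{1-α/2,γ}, allow us to obtain a probability bound for the unknown device mean life (or the reliability or the failure rate):

P{χ^{2}_{α/2,γ} ≤ 2T/θ ≤ χ^{2}_{1-α/2,γ}} = 1 - α

Inverting this statement, we can estimate or bound θ using Equation 1:

2T/χ^{2}_{1-α/2,γ} ≤ θ ≤ 2T/χ^{2}_{α/2,γ} (Equation 1)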

Figure 1. The Chi-Square Distribution of 2*Total Test Time/θ = 2*T/θ

The Chi-Square distribution is readily tabulated and its parameter is the corresponding DF. For a time terminated test, the DF used are γ = 2n + 2 (twice the number of failures plus two). In the case of failure terminated tests, the DF used are γ = 2n (twice the number of failures observed during this time).

For example, assume we place ten devices on test and want to construct a 95% CI for the unknown mean θ. Assume we know the exact failure times of all ten devices (n = 10). We need to look up the upper (1 - α/2) and lower (α/2) percentiles corresponding to DF = 2n, for a confidence 1 - α = 0.95 (95%). Since we seek a (two-sided) CI, we split the 5% error (0.95 = 1 - α = 1 - 0.05) into two halves (α/2 = 0.025, or 2.5%), one on each extreme of the Chi-Square distribution (see Figure 1). To obtain the Chi-Square table values, enter the Chi-Square table and find the columns for the 0.025 (lower) and 0.975 (upper) percentiles. Go down these two columns until reaching the desired DF: γ = 2n = 2 x 10 = 20. In the present example we obtain the following (lower/upper) results (see Figure 1):

χ^{2}_{0.025,20} = 9.591; χ^{2}_{0.975,20} = 34.170

If, instead of knowing the exact failure times, we had only the Total Test Time (T) and the Total Number of Failures (n), we should use DF = 2n + 2 instead. Therefore, in the same example above, now for DF = γ = 2n + 2 = 20 + 2 = 22, we obtain:

χ^{2}_{0.025,22} = 10.982; χ^{2}_{0.975,22} = 36.781
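The table lookups above can also be reproduced in code. The following is a minimal sketch using only the Python standard library, offered as an illustration rather than a replacement for the tables. It relies on the fact that, for an even number of DF (always the case here, since γ = 2n or 2n + 2), the Chi-Square CDF has a closed form as a Poisson sum, and it inverts that CDF by bisection. The function names (chi2_df, chi2_cdf_even, chi2_percentile_even) are hypothetical; in practice a library routine such as SciPy's scipy.stats.chi2.ppf would normally be used instead.

```python
import math


def chi2_df(n_failures: int, failure_terminated: bool) -> int:
    """DF rule used in this sheet: 2n if failure terminated, 2n + 2 if time terminated."""
    return 2 * n_failures if failure_terminated else 2 * n_failures + 2


def chi2_cdf_even(x: float, df: int) -> float:
    """P(X <= x) for a Chi-Square variable X with an even, positive number of DF.

    For df = 2k: P(X <= x) = 1 - exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive DF")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = math.exp(-half)  # j = 0 term of the Poisson sum
    total = term
    for j in range(1, df // 2):
        term *= half / j  # builds exp(-x/2) * (x/2)**j / j! incrementally
        total += term
    return 1.0 - total


def chi2_percentile_even(p: float, df: int, tol: float = 1e-9) -> float:
    """p-th percentile (0 < p < 1) of Chi-Square with even DF, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf_even(hi, df) < p:  # widen the bracket until it contains the percentile
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Ten devices, n = 10 failures, 95% two-sided CI (alpha/2 = 0.025 in each tail).
for failure_terminated in (True, False):
    df = chi2_df(10, failure_terminated)
    lower = chi2_percentile_even(0.025, df)
    upper = chi2_percentile_even(0.975, df)
    print(f"DF = {df}: lower = {lower:.3f}, upper = {upper:.3f}")
```

Run as is, it should print values agreeing with the standard table entries (about 9.591 and 34.170 for DF = 20, and 10.982 and 36.781 for DF = 22). Because the closed form only holds for even DF, the sketch raises an error for odd DF rather than returning a wrong value.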

All the other Exponential parameters of interest (reliability and failure rate) are obtained directly from the Exponential mean. We next illustrate how to obtain CIs for the failure rate and the device reliability, from the mean, using a large set of life data.

A Numerical Example for Failure-Terminated Tests

The life test data given in Table 1 come from an Exponential distribution. These failure times T_{1}, T_{2}, ..., T_{n} correspond to the n = 45 devices D1, ..., D45 placed on a reliability test, as represented in Figure 2.

Table 1. Device Life Data

 12.411   58.526   46.684   49.022   77.084    7.400   21.491   28.637   16.263
 53.533   93.241   43.911   33.771   78.954  399.071  102.947  118.077   61.894
 72.435  108.561   46.252   40.479   95.291   10.291   27.668  116.729  149.432
 59.067  199.458   45.771  272.005   60.266  233.254   87.592  137.149   50.668
 89.601  313.879  150.011  173.580  220.413  182.737    6.171  162.792   82.273

Figure 2. Representation of the Times to Failure of the "n" Devices on Test

Using the statistic of total test time, we obtain the following point estimator of the mean life.

Est. Mean Life = Total Test Time / Sample Size = Σ T_{i} / n = 4496.75/45 = 99.9

The reciprocal of the mean life yields the point estimate of the device failure rate:

Failure Rate = 1 / Mean Life = 1 / 99.9 = 0.01001
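As a check on the arithmetic, the sketch below transcribes the 45 lives of Table 1 (in table order) and applies the two point-estimate formulas above. It is plain Python with illustrative variable names.

```python
# Table 1 failure times (45 devices, failure-terminated test), in table order.
lives = [
    12.411, 58.526, 46.684, 49.022, 77.084, 7.400, 21.491, 28.637, 16.263,
    53.533, 93.241, 43.911, 33.771, 78.954, 399.071, 102.947, 118.077, 61.894,
    72.435, 108.561, 46.252, 40.479, 95.291, 10.291, 27.668, 116.729, 149.432,
    59.067, 199.458, 45.771, 272.005, 60.266, 233.254, 87.592, 137.149, 50.668,
    89.601, 313.879, 150.011, 173.580, 220.413, 182.737, 6.171, 162.792, 82.273,
]

n = len(lives)                    # number of failures (every device failed)
total_test_time = sum(lives)      # T = sum of the T_i
mean_life = total_test_time / n   # point estimate of theta
failure_rate = 1 / mean_life      # point estimate of lambda = 1/theta

print(f"n = {n}, T = {total_test_time:.3f}, mean = {mean_life:.2f}, rate = {failure_rate:.5f}")
```

It should report a total of about 4496.74 (the T = 4496.75 used in the CI calculations), a mean of about 99.93 and a failure rate of about 0.01001.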

We first verify whether the statistical distribution of the life of the device is Exponential and whether the data came from independent observations (Reference 7). We then use appropriate statistical reliability methods to calculate the CI. In the Exponential case, we use the Chi-Square (χ^{2}) distribution and the Total Test Time (T) statistic to obtain a CI for the (true but unknown) device mean life θ (or rate λ = 1/θ). The formula to obtain an Exponential CI for the true, but unknown, mean θ, with a confidence level of 100(1 - α)%, is given by:

(2T/χ^{2}_{1-α/2,γ}; 2T/χ^{2}_{α/2,γ}), with γ = 2n for a failure-terminated test

In this example, T = Σ T_{i} = 4496.75 is the Total Test Time and n = k = 45 is the (full) sample size. The test (all lives observed) is failure terminated. Hence, the Chi-Square table values (χ^{2}_{α/2,γ}) have DF = 2n = 90. The confidence coefficient (1 - α) is then selected according to whether our CI requires 80%, 90%, 95% confidence, etc. A 95% confidence (1 - α) yields α = 0.05, α/2 = 0.025 and 1 - α/2 = 0.975. Therefore, the Chi-Square table values for our example are:

χ^{2}_{0.025,90} = 65.65; χ^{2}_{0.975,90} = 118.14

The corresponding CI for the true mean life θ, with a confidence level of 95%, is:

(2 x 4496.75/118.14; 2 x 4496.75/65.65) = (76.13, 136.99)

Since the failure rate λ = 1/θ, an associated CI for the failure rate, with confidence level of 95%, can also be obtained by using the reciprocal values of the above CI for the mean:

(1/136.99, 1/76.13) = (0.0073, 0.0131)
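Equation 1 and its reciprocal for the failure rate amount to a few lines of arithmetic. A minimal sketch, assuming the percentiles are supplied from a Chi-Square table as in the text (the function name exponential_cis is hypothetical):

```python
def exponential_cis(total_time: float, chi2_lower: float, chi2_upper: float):
    """Two-sided CIs for the Exponential mean theta and failure rate lambda = 1/theta.

    chi2_lower and chi2_upper are the alpha/2 and 1 - alpha/2 Chi-Square
    percentiles (table values) for the appropriate DF.
    """
    theta_low = 2 * total_time / chi2_upper   # larger percentile -> smaller mean
    theta_high = 2 * total_time / chi2_lower
    # Taking reciprocals reverses the order of the limits.
    rate_low, rate_high = 1 / theta_high, 1 / theta_low
    return (theta_low, theta_high), (rate_low, rate_high)


# Failure-terminated example: T = 4496.75, DF = 2n = 90, 95% confidence.
mean_ci, rate_ci = exponential_cis(4496.75, chi2_lower=65.65, chi2_upper=118.14)
print(f"mean: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
print(f"rate: ({rate_ci[0]:.4f}, {rate_ci[1]:.4f})")
```

With the DF = 90 table values it should reproduce, up to rounding, the intervals (76.13, 136.99) for θ and (0.0073, 0.0131) for λ.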

Such a CI means that, 95% of the times that we derive it from test data, the true but unknown failure rate (λ) is between 0.0073 and 0.0131 (but 5% of the times it can be elsewhere).
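The "95% of the times" interpretation can be checked by simulation. The sketch below is an illustrative assumption, not part of the original analysis: it repeatedly draws complete samples of n = 10 Exponential lives with a hypothetical true mean of 100, builds the 95% CI with the DF = 20 table percentiles (9.591 and 34.170), and counts how often the interval covers the true mean.

```python
import random


def ci_coverage(true_mean: float = 100.0, reps: int = 20000, seed: int = 1) -> float:
    """Fraction of simulated 95% CIs for theta that contain the true mean.

    Each replication is a complete (failure-terminated) sample of n = 10 lives,
    so DF = 2n = 20 and the table percentiles below apply.
    """
    n = 10
    chi2_lower, chi2_upper = 9.591, 34.170  # 2.5% and 97.5% percentiles, DF = 20
    rng = random.Random(seed)               # seeded for a reproducible run
    covered = 0
    for _ in range(reps):
        # random.expovariate takes the rate (1/theta), not the mean.
        total_time = sum(rng.expovariate(1.0 / true_mean) for _ in range(n))
        low = 2 * total_time / chi2_upper
        high = 2 * total_time / chi2_lower
        if low <= true_mean <= high:
            covered += 1
    return covered / reps


print(f"observed coverage: {ci_coverage():.3f}")
```

The printed fraction should be close to 0.95: roughly one interval in twenty misses the true mean, which is exactly the "unfavorable 5%" described at the start of this sheet.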

Finally, because the Exponential is a one-parameter distribution, the device reliability at any given mission time T is also obtained from the mean as follows: R(T) = P{X ≥ T} = Exp{-T/θ} = Exp{-λT}.

Then, a 95% CI for the reliability at any mission time T can be obtained by using the upper/lower limits of the CI for the mean or for the failure rate. For our example and for T = 100, we use the limits of the CI for the failure rate (λ) and obtain:

Lower limit: Exp{-100 x 0.0131} = 0.27; Upper limit: Exp{-100 x 0.0073} = 0.48

Therefore, a 95% CI for the reliability for a mission time T = 100 units is: (0.27, 0.48).
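Since R(T) = Exp{-λT} decreases as λ grows, the upper limit for the failure rate produces the lower limit for reliability, and vice versa. A minimal sketch of that mapping (hypothetical function name), fed with the rounded rate limits from this example:

```python
import math


def reliability_ci(mission_time: float, rate_low: float, rate_high: float) -> tuple[float, float]:
    """CI for R(t) = exp(-lambda * t), given a CI (rate_low, rate_high) for lambda."""
    # The largest failure rate yields the smallest reliability, so the limits swap.
    return math.exp(-rate_high * mission_time), math.exp(-rate_low * mission_time)


low, high = reliability_ci(100, rate_low=0.0073, rate_high=0.0131)
print(f"R(100): ({low:.2f}, {high:.2f})")
```

It should give approximately (0.27, 0.48), matching the interval above.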

Such a CI means that, 95% of the times we derive it from test data, the true but unknown reliability for this mission time is between 0.27 and 0.48.

Often, we just need a lower or an upper bound on reliability. Assume we are interested in a 90% lower confidence bound on reliability for the above example and mission time T = 100. We re-estimate the failure rate bound for the error α = 0.1, now placed entirely on one side. This changes the Chi-Square table percentile: the new (1 - α) percentile (corresponding to DF = 90 and α = 0.1) is now 107.57, and we use it to obtain the 90% confidence bound for the Exponential mean (or its reciprocal, the failure rate).

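The 90% Lower Bound for the (unknown) mean induces a 90% Upper Bound for the failure rate:

θ ≥ 2T/χ^{2}_{0.90,90} = 2 x 4496.75/107.57 = 83.61, hence λ = 1/θ ≤ 1/83.61 = 0.01196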

This, in turn, allows us to calculate a 90% Lower Bound for the reliability "R" of the device, for T = 100, as:

R(100) = P{X ≥ 100} = Exp{-100 x λ} ≥ Exp{-100 x 0.01196} = 0.3024

Such a Lower Bound means that, 90% of the times we derive it from test data, the true but unknown device reliability is at least 0.3024, for mission time T = 100.
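The one-sided bound uses a single percentile, because the whole error α = 0.1 sits in one tail. The sketch below repeats the three steps with the table value 107.57 from the text; variable names are illustrative.

```python
import math

total_time = 4496.75   # T from Table 1 (failure-terminated, n = 45)
chi2_90 = 107.57       # 90th percentile of Chi-Square with DF = 2n = 90 (table value)
mission_time = 100

# All of alpha = 0.10 sits in one tail, so only one percentile is needed.
theta_lower = 2 * total_time / chi2_90                     # 90% lower bound on the mean life
rate_upper = 1 / theta_lower                               # 90% upper bound on the failure rate
reliability_lower = math.exp(-rate_upper * mission_time)   # 90% lower bound on R(100)

print(f"theta >= {theta_lower:.2f}, lambda <= {rate_upper:.5f}, R(100) >= {reliability_lower:.4f}")
```

It should print a mean bound near 83.61, a rate bound near 0.01196 and a reliability bound near 0.3024.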

Other Numerical Examples of Confidence Intervals and Bounds for the Exponential Case

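We now construct an 80% CI for the mean of the n = 45 lives given in Table 1. For an 80% two-sided CI, α = 0.2 and α/2 = 0.1, so the Chi-Square table values (DF = 90) are χ^{2}_{0.10,90} = 73.29 and χ^{2}_{0.90,90} = 107.57.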

Proceeding likewise, we obtain the CI limits for the mean, the failure rate, and the reliability for mission times of T = 100, 50, 30, and 20 hours, as shown in Table 2. For comparison, the exact reliability values for lives that are distributed Exponential with mean 100 are also given.

Table 2. CIs for Mean, Rate, and Reliability

Parameter              Upper Lim for 80% CI   Lower Lim for 80% CI   Exact Reliability
Mean                   122.711                83.6060                N/A
Rate                   0.0119609              0.0081492              N/A
Reliability, T = 100   0.442674               0.302375               0.367879
Reliability, T = 50    0.665337               0.549887               0.606531
Reliability, T = 30    0.783114               0.698496               0.740818
Reliability, T = 20    0.849604               0.787244               0.818731
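Table 2 can be regenerated from T = 4496.75 and the 10% and 90% Chi-Square percentiles for DF = 90 (about 73.29 and 107.57, standard table values). The sketch below assumes those rounded table values; with them it reproduces the entries of Table 2 to about four decimal places. The "exact" column uses the mean of 100 stated in the text.

```python
import math

total_time = 4496.75
chi2_10, chi2_90 = 73.29, 107.57   # 10% and 90% percentiles, DF = 90 (table values)

mean_upper = 2 * total_time / chi2_10
mean_lower = 2 * total_time / chi2_90
rate_upper, rate_lower = 1 / mean_lower, 1 / mean_upper   # reciprocals swap the limits

print(f"Mean: {mean_upper:.3f} {mean_lower:.4f}")
print(f"Rate: {rate_upper:.7f} {rate_lower:.7f}")

table = {}
for t in (100, 50, 30, 20):
    r_upper = math.exp(-rate_lower * t)   # smallest rate -> largest reliability
    r_lower = math.exp(-rate_upper * t)
    r_exact = math.exp(-t / 100)          # exact R(t) for Exponential lives with mean 100
    table[t] = (r_upper, r_lower, r_exact)
    print(f"R({t}): {r_upper:.6f} {r_lower:.6f} {r_exact:.6f}")
```

Note how each 80% interval brackets the exact reliability, as expected for most (but not all) such intervals.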

A Numerical Example for Time-Terminated Tests

Assume we now place only ten devices on test, which are replaced by similar devices as soon as they fail. Assume that we test these m = 10 devices for a time T_{0} = 20, and observe (as represented in Figure 3) n = 3 failures, but we do not know the exact failure times.

Even without knowing the exact times of these failures, we can still use the Chi-Square distribution, now with DF = 2n + 2 = 2 x 3 + 2 = 8, to obtain a "conservative" CI for the mean (or failure rate) of the underlying Exponential distribution. For this, we again use the Total Test Time statistic T (= mT_{0} = 10 x 20 = 200) and the Chi-Square percentiles (now with DF = 8), and obtain a CI for the device true mean life θ (or the reliability, or the rate λ = 1/θ) with confidence level 100(1 - α)% (say, 80%, 90%, or 95%). The CI for the mean is given by:

(2T/χ^{2}_{1-α/2,2n+2}; 2T/χ^{2}_{α/2,2n+2})

Figure 3. Representation of Type I Censoring; 10 Devices Continuously on Test

We calculate the corresponding upper and lower Chi-Square percentiles. As before, the confidence coefficient (1 - α) depends on the required confidence. A 95% confidence yields α = 0.05, α/2 = 0.025 and 1 - α/2 = 0.975. The Chi-Square table values are:

χ^{2}_{0.025,8} = 2.18; χ^{2}_{0.975,8} = 17.54

The corresponding CI for the true mean life θ, with a 95% confidence is ((2 x 200)/17.54; (2 x 200)/2.18) = (22.81; 183.49).

Since the rate λ = 1/θ, a CI for the true failure rate λ, with a confidence level of 95%, can be obtained by using the reciprocal values of the corresponding CI for the mean: (1/183.49, 1/22.81) = (0.00545; 0.0438). Because the Exponential is a one-parameter distribution, the reliability at any time T is given by: R(T) = P{X ≥ T} = Exp{-T/θ} = Exp{-λT}.

Then, a 95% CI for the reliability at any mission time T can be obtained by using either the mean or the failure rate CI upper and lower limits. For example, using the upper/lower limits of the CI for the rate λ, for a mission time T = 10, we obtain:

Lower limit: Exp{-10 x 0.0438} = 0.645; Upper limit: Exp{-10 x 0.00545} = 0.947

Hence, a 95% CI for the true reliability, when mission time T =
10, is: (0.65, 0.95).
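The time-terminated calculation chains the same three steps, changing only the DF rule and the way T is formed (T = mT_{0}). A sketch with the numbers of this example, assuming the DF = 8 table values 2.18 and 17.54 used above (variable names illustrative):

```python
import math

m, t0, n_failures = 10, 20, 3          # devices on test (replaced on failure), test time T_0, failures
total_time = m * t0                    # T = m * T_0 = 200
df = 2 * n_failures + 2                # time-terminated test: DF = 2n + 2 = 8
chi2_lower, chi2_upper = 2.18, 17.54   # 2.5% and 97.5% percentiles for DF = 8 (table values)
mission_time = 10

theta_ci = (2 * total_time / chi2_upper, 2 * total_time / chi2_lower)
rate_ci = (1 / theta_ci[1], 1 / theta_ci[0])        # reciprocals swap the limits
rel_ci = (math.exp(-rate_ci[1] * mission_time),     # highest rate -> lowest reliability
          math.exp(-rate_ci[0] * mission_time))

print(f"DF = {df}")
print(f"mean: ({theta_ci[0]:.2f}, {theta_ci[1]:.2f})")
print(f"rate: ({rate_ci[0]:.5f}, {rate_ci[1]:.5f})")
print(f"R(10): ({rel_ci[0]:.3f}, {rel_ci[1]:.3f})")
```

It should give, up to rounding, (22.81, 183.49) for θ, (0.00545, 0.0438) for λ and (0.645, 0.947) for R(10).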

Finally, reliability bounds in this time-terminated case are resolved in the same manner as shown earlier, the only change being that Chi-Square now has DF = 2n + 2 instead of DF = 2n. For example, a 97.5% lower bound for the reliability at T = 10, from the above data, is obtained by dropping the upper limit of the 95% CI, i.e., by applying only the upper limit of the corresponding CI for the failure rate (0.0438):

R(10) ≥ Exp{-10 x 0.0438} = 0.645

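Thus, 97.5% of the times we derive such a bound from test data, the true device reliability for T = 10 is greater than or equal to 0.645. (Likewise, it is less than or equal to 0.947 with 97.5% confidence; taken together, the two limits form the 95% CI.)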

The Case of Hypothesis Testing

It is well known that there is a close relationship between deriving a confidence interval and testing hypotheses. For example, let a 95% CI for an Exponential MTBF, derived from a data set, exclude the value 100. We can then state with certainty that the (two-sided) test of the hypothesis MTBF = 100, performed with the same data set and at the same level α = 0.05, will reject the MTBF value of 100. In the same manner, if a 95% confidence interval does include the value MTBF = 100, then we can state with certainty that the hypothesis test, performed with the same data set and at the same level α = 0.05, will not reject the value MTBF = 100.

Therefore, all the CIs derived in the previous sections can be converted into tests of hypothesis by stating the hypothesized value of the MTBF (Exponential mean life parameter θ), denoted as θ_{0}. Based on Figure 1, we can restate the problem.

Let the underlying distribution of the lives be Exponential and the failures be independent. Then, the statistic 2T/θ is distributed as a Chi-Square, and the corresponding DF = γ will depend on whether or not the test is terminated at the time of a failure.

Therefore, instead of estimating the mean life using Equation 1, we compare the value of the test statistic 2T/θ_{0} (computed under the hypothesized mean θ_{0}) with the corresponding Chi-Square table values (χ^{2}_{α/2,γ} and χ^{2}_{1-α/2,γ}) obtained using the appropriate DF = γ and α-percentiles. As before, DF will be γ = 2n if the test is failure terminated and γ = 2n + 2 if it is not. We now illustrate this using the previous two examples.

First, suppose we are testing the assumption that the true mean life θ is 140 hours (in statistical notation, H_{0}: θ_{0} = 140) using the data in Table 1. We can see, from the results of the failure-terminated example, that the 95% CI derived from this data set (76.13, 136.99) does not include the value 140. Performing the calculations for the hypothesis test (with T = 4496.75), we obtain:

χ^{2} = 2T/θ_{0} = 2 x 4496.75/140 = 64.24

Under the assumed hypothesis, the above-defined variable χ^{2} is distributed Chi-Square (as illustrated in Figure 1) with DF = 2n = 90 (since the test is failure terminated). Then, with 0.95 probability (95% chance), the value χ^{2} obtained above should fall between the two Chi-Square table values 65.65 and 118.14 (the test acceptance region):

65.65 ≤ χ^{2} ≤ 118.14

This value of χ^{2} is not included in the acceptance region (64.24 < 65.65); it falls in the "rejection region". Therefore, we reject H_{0}: θ_{0} = 140 at level α = 0.05.

Now, consider the time-terminated example. Here, using the data from that example, we test the assumption that the true mean life θ is 140 hours. We can see, from the results of the time-terminated example, that the 95% CI derived from that data set (22.81, 183.49) does include the value 140. Performing the calculations for this hypothesis test, using Total Test Time T = mT_{0} = 10 x 20 = 200, we obtain:

χ^{2} = 2T/θ_{0} = 2 x 200/140 = 2.857

Under the assumed hypothesis, the above-defined variable χ^{2} is also distributed as a Chi-Square (as indicated in Figure 1), now with DF = 2n + 2 = 8 (since the test is time terminated). Then, with 0.95 probability (error level α = 0.05), the value χ^{2} obtained above should fall between the two Chi-Square table values (i.e., in the acceptance region):

2.18 ≤ χ^{2} ≤ 17.54

Since this χ^{2} value is actually included (2.18 < 2.857 < 17.54), we cannot reject the hypothesis H_{0}: θ_{0} = 140. The test is performed at the significance level α = 0.05, i.e., the probability of rejecting a true H_{0} is at most 0.05.
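The CI/test duality described in this section can be sketched as a small function that returns the statistic 2T/θ_{0} and the reject/do-not-reject decision. The function name and signature are hypothetical; the percentiles are the table values used in the two examples.

```python
def chi2_test(total_time: float, theta0: float, chi2_lower: float, chi2_upper: float) -> tuple[float, bool]:
    """Two-sided test of H0: theta = theta0 for Exponential lives.

    The acceptance region is [chi2_lower, chi2_upper], the alpha/2 and 1 - alpha/2
    Chi-Square percentiles (table values) for the appropriate DF.
    Returns the statistic 2T/theta0 and whether H0 is rejected.
    """
    statistic = 2 * total_time / theta0
    reject = not (chi2_lower <= statistic <= chi2_upper)
    return statistic, reject


# Failure-terminated example: T = 4496.75, DF = 90, alpha = 0.05.
stat_ft, reject_ft = chi2_test(4496.75, 140, chi2_lower=65.65, chi2_upper=118.14)
# Time-terminated example: T = 200, DF = 8, alpha = 0.05.
stat_tt, reject_tt = chi2_test(200, 140, chi2_lower=2.18, chi2_upper=17.54)

print(f"failure terminated: chi2 = {stat_ft:.2f}, reject H0: {reject_ft}")
print(f"time terminated:    chi2 = {stat_tt:.3f}, reject H0: {reject_tt}")
```

The first test rejects H_{0}: θ_{0} = 140 (the statistic, about 64.24, lies below 65.65) and the second does not (about 2.857), mirroring the fact that 140 lies outside the first 95% CI and inside the second.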

Summary

In this START Sheet, we discussed some problems associated with confidence interval estimation for device reliability and failure rate when the distribution of times to failure is Exponential, for both failure-terminated (complete sample) and time-terminated tests. We provided numerical and graphical examples and discussed some related theoretical and practical issues. In For Further Study, we give our bibliography and references for additional information.

For Further Study

Practical Statistical Tools for Reliability Engineers, Coppola, A., RIAC, 1999.

A Practical Guide to Statistical Analysis of Material Property Data, Romeu, J.L. and C. Grethlein, AMPTIAC, 2000.

Mechanical Applications in Reliability Engineering, Sadlon, R.J., RIAC, 1993.

Methods for Statistical Analysis of Reliability and Life Data, Mann, N., R.E. Schafer, and N.D. Singpurwalla, John Wiley, NY, 1974.


Dr. Jorge Luis Romeu has over thirty years of statistical and operations research experience in consulting, research, and teaching. He was a consultant for the petrochemical, construction, and agricultural industries. Dr. Romeu has also worked in statistical and simulation modeling and in data analysis of software and hardware reliability, software engineering and ecological problems.

Dr. Romeu has taught undergraduate and graduate statistics, operations research, and computer science in several American and foreign universities. He teaches short, intensive professional training courses. He is currently an Adjunct Professor of Statistics and Operations Research for Syracuse University and a Practicing Faculty of that school's Institute for Manufacturing Enterprises. Dr. Romeu is a Chartered Statistician Fellow of the Royal Statistical Society, Full Member of the Operations Research Society of America, and Fellow of the Institute of Statisticians.

Romeu is a senior technical advisor for reliability and advanced information technology research with Alion Science and Technology. Since joining Alion in 1998, Romeu has provided consulting for several statistical and operations research projects. He has written a State of the Art Report on Statistical Analysis of Materials Data, designed and taught a three-day intensive statistics course for practicing engineers, and written a series of articles on statistics and data analysis for the AMPTIAC Newsletter and RIAC Journal.