Raytheon Assessment of PRISM As A Field Failure Prediction Tool

By: Christopher L. Smith and Jerry B. Womack, Jr.; Raytheon Company, McKinney

Summary & Conclusions

For any company interested in predicting field reliability performance, finding a prediction technique that provides a high degree of fidelity to observed field data is essential. With the discontinuance of military handbook Mil-Hdbk-217, Reliability Prediction of Electronic Equipment, and the limited environmental applications of Telcordia SR-332, Reliability Prediction for Electronic Equipment, this paper evaluates the Reliability Information Analysis Center's (RIAC) PRISM software tool as a potentially improved methodology for predicting the field reliability of military systems. This evaluation compares the PRISM predicted failure rate to the actual observed field failure rate for three military electronics units. While initial results showed the predicted failure rate to be approximately one-half of the observed field failure rate, the ratio of predicted to observed field failure rate was consistent across three independent systems. Furthermore, the PRISM methodology has features, such as process grade factors and field failure data incorporation through Bayesian analysis, which show promise in allowing a more accurate field reliability prediction to be generated. As a point of comparison, the initial failure rate prediction by Raytheon is opposite to an earlier assessment performed by TRW Automotive, in which field data was not factored in through PRISM's Bayesian analysis option (Reference 1). TRW Automotive found a predicted failure rate that was twice the observed field failure rate.

This paper discusses Raytheon's assessment of the PRISM software tool, including the reason for choosing PRISM, application of the PRISM prediction methodology to three military electronics units, and analysis of the prediction results. This paper also discusses future plans for refinements in the use of PRISM's features to produce a more accurate reliability prediction of field performance.

Introduction

While Mil-Hdbk-217 was never intended as a field reliability predictor, it remained a reliability prediction mainstay in the defense industry through the 1980s until its discontinuance in 1995. With the increased use of commercial electronics in military applications and the lack of periodic updates, Mil-Hdbk-217 has become notorious for generating overly pessimistic field reliability predictions. Over the past 10 years, the defense industry has faced a challenge in finding a field reliability prediction methodology that consistently provides a high degree of correlation with observed field data. Techniques such as physics-of-failure, while helpful in examining specific failure mechanisms, tend to be very cumbersome for complex systems, and Telcordia SR-332 has limitations given that it was developed for commercial systems. In the past few years, defense contractors have tended to either extrapolate Telcordia environmental factors to address military environments or extrapolate Mil-Hdbk-217 part complexity and quality factors to address advances in device technology and increases in commercial electronics quality. Furthermore, these methodologies typically address only part operational failures; other system failure contributors, such as inadequate design and manufacturing, are not considered in the reliability prediction.
While these practices have provided a stop-gap measure for producing field reliability predictions, a comprehensive reliability prediction tool that accurately predicts field performance is desired.

Raytheon has conducted a recent assessment of the PRISM software tool to determine its ability to accurately predict field reliability. This assessment consists of a comparison of the PRISM predicted failure rate to the actual observed field failure rate for three military electronics units used in an Air Force fighter aircraft, a Navy helicopter, and a Navy surveillance aircraft. The assessment includes understanding the details of the PRISM methodology, conducting a PRISM prediction for the electronics units, comparing the predicted failure rates with those observed in the field, analyzing how the various PRISM input parameters affect the predicted failure rates, and determining areas where further refinement could produce a higher fidelity field reliability prediction.

Background

The Reliability Information Analysis Center (RIAC) has developed a methodology and associated engineering software tool, PRISM, to assess the reliability of electronic systems. This methodology includes component-level reliability prediction models as well as a process for assessing the reliability of systems due to non-component variables. The PRISM system reliability assessment program consists of component-level failure rate calculations (taken from RIACRates models, RIAC data, or user-defined data) and a system-level model that applies process grading factors.

The building blocks of the PRISM prediction methodology are component-level failure rates. These failure rates are determined from RIACRates models, RIAC data, or user-defined data. RIACRates are component reliability prediction models that use a combination of additive and multiplicative factors to generate a separate failure rate for each generic class of failure mechanisms for a component. Each of these failure rate terms is then accelerated by the appropriate stress. RIACRates models have the following general form:

$$\lambda_p = \lambda_o \pi_o + \lambda_e \pi_e + \lambda_c \pi_c + \lambda_i + \lambda_{sj} \pi_{sj} \qquad \text{(Equation 1)}$$

where λp = predicted failure rate,
λo = failure rate from operational stresses,
πo = product of failure rate multipliers for operational stresses,
λe = failure rate from environmental stresses,
πe = product of failure rate multipliers for environmental stresses,
λc = failure rate from power or temperature cycling stresses,
πc = product of failure rate multipliers for cycling stresses,
λi = failure rate from induced stresses including electrical overstress,
λsj = failure rate from solder joints,
πsj = product of failure rate multipliers for solder joint stresses.
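To make Equation 1's additive-plus-multiplicative structure concrete, here is a minimal sketch in Python. Every numeric value below is a hypothetical placeholder rather than a PRISM output; in PRISM, these terms come from the component models and the entered stress conditions.

```python
# Sketch of the RIACRates general model (Equation 1).
# All numeric values are hypothetical placeholders, not PRISM outputs.

def riacrates_failure_rate(lam_o, pi_o, lam_e, pi_e, lam_c, pi_c,
                           lam_i, lam_sj, pi_sj):
    """Predicted component failure rate (failures per 10^6 calendar hours).

    Each additive term is a failure rate for one generic class of failure
    mechanisms (operational, environmental, cycling, induced, solder joint),
    scaled by the product of its stress-acceleration pi-factors.
    """
    return (lam_o * pi_o        # operational stresses
            + lam_e * pi_e      # environmental stresses
            + lam_c * pi_c      # power/temperature cycling stresses
            + lam_i             # induced stresses (e.g., electrical overstress)
            + lam_sj * pi_sj)   # solder joint stresses

# Hypothetical example for a single component:
lam_p = riacrates_failure_rate(
    lam_o=0.020, pi_o=1.5,   # operating stresses accelerated 1.5x
    lam_e=0.010, pi_e=2.0,   # harsh environment doubles the environmental term
    lam_c=0.005, pi_c=3.0,   # frequent power cycling
    lam_i=0.002,
    lam_sj=0.004, pi_sj=1.2,
)
print(f"Predicted failure rate: {lam_p:.4f} failures per 10^6 calendar hours")
```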
By modeling the failure rate in this manner, factors that account for the application and component-specific variables that affect reliability (π factors) can be applied to the appropriate additive failure rate term. RIACRates models are currently available only for capacitors, resistors, diodes, transistors, thyristors, integrated circuits, and software.

PRISM also contains data from the RIAC Electronic Parts Reliability Data (EPRD) and Nonelectronic Parts Reliability Data (NPRD) publications. This data has been refined and scaled to fit into the calendar hour structure of PRISM. RIAC data is available for a variety of components, including transformers, inductors, switches, relays, and connectors. This data is helpful when a RIACRates model or user-defined data does not exist for a particular component. In the event that empirical data is available, the PRISM software tool allows for the input of user-defined failure rate data when a RIACRates model or RIAC data does not exist.

The PRISM system failure rate model is defined as the sum of the component failure rates times a process grade factor. This system model is given by:

$$\lambda_P = \lambda_{IA} \left( \pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} \pi_E \pi_G + \pi_S \pi_G + \pi_I + \pi_N + \pi_W \right) + \lambda_{SW} \qquad \text{(Equation 2)}$$

where the parameters are defined in Table 1.
Table 1. PRISM System Failure Rate Model Parameters

Parameter   Definition
λP          Predicted system failure rate
λIA         Initial assessment failure rate (sum of the component failure rates)
πP          Parts process grade factor
πIM         Infant mortality factor
πE          Environment factor
πD          Design process grade factor
πG          Reliability growth factor
πM          Manufacturing process grade factor
πS          System management process grade factor
πI          Induced process grade factor
πN          No-defect process grade factor
πW          Wearout process grade factor
λSW         Software failure rate
Once a unit is designed, the failure rate value that is calculated by any model is an inherent or "seed" failure rate because it represents only the physical attributes of the components that comprise the unit, subject to the environmental conditions and operating profile characteristics associated with its application. The failure rate that the unit actually experiences in the field may be better or worse than the inherent failure rate. The difference between the observed field failure rate and the inherent predicted failure rate depends on the design, requirement definition, and testing activities undertaken by the manufacturer to ensure that:
  • Designs are reliable and robust
  • Manufacturing practices do not degrade reliability performance
  • Parts of acceptable quality are selected and controlled
  • Management processes encourage good requirements definition and design practices
  • The number of "cannot duplicate" (CND) incidents is minimized
  • Maintenance activities do not induce failures
  • Wearout and infant mortality issues are understood and addressed
  • Reliability growth is emphasized throughout the design and development phases
The effect of process-related variability around the inherent (or seed) failure rate is accounted for within PRISM by applying process grade factors. By answering a series of questions within a specific process grade type, a scoring profile is generated and translated into a quantitative pi-factor multiplier. This score then accounts for the process-related variability by adjusting the predicted failure rate up or down. The process grade types within PRISM, and the pi-factor multipliers associated with them, are listed below (a conceptual sketch of the score-to-multiplier mapping follows the list):
  • Design process grade (πD)
  • Manufacturing process grade (πM)
  • Parts process grade (πP)
  • System management process grade (πS)
  • No-defect process grade (πN)
  • Induced process grade (πI)
  • Wearout process grade (πW)
  • Reliability growth factor (πG)
  • Infant mortality grade (πIM)
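PRISM's actual mapping from survey answers to a pi-factor is internal to the tool and is not documented here. The sketch below only illustrates the stated behavior: an industry-average score yields a multiplier of 1.0, better-than-average processes push the multiplier below 1.0, and weaker processes push it above 1.0. The linear interpolation and the 0.5 to 2.0 range are assumptions made for the illustration.

```python
# Conceptual illustration only: PRISM's real score-to-pi mapping is internal
# to the tool. Here we assume a linear interpolation where the industry
# average score maps to a multiplier of 1.0.

def process_grade_multiplier(score, avg_score=50.0, best_pi=0.5, worst_pi=2.0):
    """Map a 0-100 process grade survey score to a pi-factor multiplier.

    score >= avg_score (better than average) -> multiplier in [best_pi, 1.0]
    score <  avg_score (worse than average)  -> multiplier in (1.0, worst_pi]
    All breakpoints are assumed values for illustration.
    """
    score = max(0.0, min(100.0, score))
    if score >= avg_score:
        # Interpolate from 1.0 at the average down to best_pi at a perfect score.
        return 1.0 - (score - avg_score) / (100.0 - avg_score) * (1.0 - best_pi)
    # Interpolate from worst_pi at a score of zero up to 1.0 at the average.
    return worst_pi - score / avg_score * (worst_pi - 1.0)

print(process_grade_multiplier(50))   # 1.0  (industry average)
print(process_grade_multiplier(80))   # 0.7  (better-than-average process)
print(process_grade_multiplier(20))   # 1.6  (weaker process raises the rate)
```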
Each of these processes is scored, and the process scores are combined into a module-level process grade set. For the "industry average", the process grade expression in the system-level model (i.e., πPπIMπE + πDπG + πMπIMπEπG + πSπG + πI + πN + πW) is equal to unity. The process grade factor will increase if "less than average" processes are in place, while the grade will decrease if "better than average" processes are in place.

Most failure rate prediction methods allow only an inherent reliability to be predicted, that is, the reliability of the components given correct manufacturing, requirement specifications, and handling. PRISM, however, allows for two failure rate prediction types: inherent and logistics.

Inherent: The inherent failure rate calculation does not take into account induced failures or "cannot duplicate" (CND) issues. The induced process grade (πI) and the no-defect process grade (πN) are not included in the system-level failure rate calculation:

$$\lambda_{P,\mathrm{inherent}} = \lambda_{IA} \left( \pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} \pi_E \pi_G + \pi_S \pi_G + \pi_W \right) + \lambda_{SW} \qquad \text{(Equation 3)}$$

Logistics: The logistics failure rate calculation takes into account induced failures and cannot duplicate (CND) issues. The induced process grade (πI) and the no-defect process grade (πN) are included in the system-level failure rate calculation:

$$\lambda_{P,\mathrm{logistics}} = \lambda_{IA} \left( \pi_P \pi_{IM} \pi_E + \pi_D \pi_G + \pi_M \pi_{IM} \pi_E \pi_G + \pi_S \pi_G + \pi_I + \pi_N + \pi_W \right) + \lambda_{SW} \qquad \text{(Equation 4)}$$

Evaluation Methodology

For the purpose of the PRISM evaluation, three airborne electronics units were chosen that had well-documented field failure data and sufficient cumulative field operating time. Their basic makeup involves multiple circuit card assemblies mounted in an enclosed chassis. All three units had at least 12 months of continuous field failure data that was detailed enough for categorizing by induced, could-not-duplicate (synonymous with no defect found), design, or part failure modes. This same field data was also used to baseline the observed performance of each electronics unit.

For comparing the field and predicted data, the reliability metrics Mean Time Between Failures (MTBF) and Mean Time Between Unscheduled Removals (MTBUR) were used. The MTBF metric includes only inherent-type failures, excluding cannot duplicate (CND) and induced fail returns. MTBUR includes both induced and CND returns along with inherent failures. These field MTBF and MTBUR values were directly compared with the PRISM inherent and logistics models, respectively. Using these relationships, common failure modes are kept consistent in both the field data and the PRISM methodology, thus ensuring an accurate correlation between the data sets.

The field data used in the PRISM evaluation is given in Table 2. The baseline used for comparison includes actual failure data taken over 12 months of continuous performance monitoring. The observed MTBF and MTBUR calculations were normalized over the steady 12-month period. Field returns were analyzed, sorted, and combined so that the observed metric represents random equipment failures. Returns that were repetitive, systematically induced, or non-performance related were removed from the total failure count. The number of software-related failures was insignificant, so software failures were left out of the evaluation altogether.

The PRISM methodology generates failure rate predictions in terms of failures per million calendar hours, instead of the more common failures per million operating hours. Therefore, a translation from operating hours to calendar hours was accomplished by dividing the cumulative operating hours by each unit's respective duty cycle.
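Before walking through the evaluation steps, it may help to see Equations 3 and 4 in executable form. A minimal sketch follows, using a hypothetical pi-factor set chosen so the logistics grade expression lands near the unity value associated above with industry-average processes:

```python
# Sketch of the PRISM system-level model (Equations 2 through 4).
# The pi-factor values are hypothetical placeholders; in PRISM they come from
# the process grade surveys. lam_ia is the "initial assessment" failure rate,
# i.e., the sum of the component failure rates.

def system_failure_rate(lam_ia, pi, lam_sw=0.0, logistics=True):
    """PRISM system failure rate in failures per 10^6 calendar hours.

    logistics=True  -> Equation 4 (adds the induced pi_I and no-defect pi_N terms)
    logistics=False -> Equation 3 (inherent; pi_I and pi_N are excluded)
    """
    grade = (pi["P"] * pi["IM"] * pi["E"]              # parts term
             + pi["D"] * pi["G"]                       # design term
             + pi["M"] * pi["IM"] * pi["E"] * pi["G"]  # manufacturing term
             + pi["S"] * pi["G"]                       # system management term
             + pi["W"])                                # wearout term
    if logistics:
        grade += pi["I"] + pi["N"]                     # induced and no-defect terms
    return lam_ia * grade + lam_sw

# Hypothetical pi-factor set; the logistics grade expression evaluates to
# about 0.99, i.e., roughly "industry average" per the text above:
pi = {"P": 0.5, "IM": 0.6, "E": 0.5, "D": 0.3, "G": 0.8,
      "M": 0.4, "S": 0.25, "I": 0.15, "N": 0.1, "W": 0.05}

lam_log = system_failure_rate(lam_ia=100.0, pi=pi, logistics=True)
lam_inh = system_failure_rate(lam_ia=100.0, pi=pi, logistics=False)
print(f"logistics: {lam_log:.1f}, inherent: {lam_inh:.1f} failures/10^6 cal-hours")
```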
A three-step process was used to evaluate the PRISM failure rate predictions against observed field data. The following steps were repeated for each of the three units:
  1. Inputting component/system data into the PRISM tool,
  2. Calculating predicted failure rates using the inherent and logistics RIAC models with both industry average and program-specific process grade factors, and
  3. Comparing the PRISM prediction results with observed field failure rates.
Existing system models (also known as component tree structures) in Raytheon's Advanced Specialty Engineering Networked Toolkit (ASENT) reliability analysis software tool were used from previous engineering efforts on each of the three units. All assembly models and component parameters, such as part type, electrical stress, and temperature, were exported from ASENT into PRISM. After importing the parts data into PRISM, component parameters were checked to verify that no data was misplaced or corrupted during the transfer. Components that did not have RIACRates models were assigned failure rates from the RIAC data library or assigned a user-defined failure rate. Subassemblies and components having user-defined failure rates were converted from failures per million operating hours into failures per million calendar hours using the respective duty cycle of each unit, as sketched below.

PRISM's default environment settings and operating profiles were not used in our evaluation. Instead, environment and profile information was obtained from actual field measurements and/or contract specifications for each program. The composition of component failure rate sources for each of the three units is shown in Figure 1. As can be seen, the failure rate sources varied greatly among the three units. Electronics Unit 1 had approximately equal failure rate contributions from RIACRates models and RIAC data. However, Electronics Unit 2 and Electronics Unit 3 had predominant contributions from RIAC data and user-defined data, respectively.

Because each of the three electronics units was designed, developed, and manufactured by a different program, each unit had its own process grade factor (PGF) set. PGF sets were created by surveying the program's engineering personnel responsible for the respective process. Once attained, these factors were applied only to assemblies that were designed and manufactured by the respective Raytheon program. Parts and assemblies that were outsourced to subcontractors were assigned PRISM's default PGFs for this evaluation.
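The unit conversions described above are simple but easy to get backwards. A minimal sketch follows, with hypothetical duty cycles and failure counts (none of these numbers comes from the paper's Table 2):

```python
# Hypothetical illustration of the operating-hour/calendar-hour conversions
# described in the text. All numbers are made up for the example.

def operating_to_calendar_hours(operating_hours, duty_cycle):
    """Cumulative calendar hours, given duty_cycle = fraction of calendar
    time the unit is operating (0 < duty_cycle <= 1)."""
    return operating_hours / duty_cycle

def op_rate_to_calendar_rate(rate_per_1e6_op_hours, duty_cycle):
    # A user-defined failure rate per 10^6 operating hours spreads the same
    # failures over more calendar hours, so the calendar-hour rate is smaller.
    return rate_per_1e6_op_hours * duty_cycle

# Hypothetical field data for one unit:
cum_op_hours = 250_000   # cumulative fleet operating hours over 12 months
duty_cycle = 0.25        # unit operates 25% of calendar time
inherent_failures = 40   # verified hardware failures
cnd_induced = 25         # cannot-duplicate plus induced returns

cal_hours = operating_to_calendar_hours(cum_op_hours, duty_cycle)   # 1,000,000
mtbf = cum_op_hours / inherent_failures                   # compares to inherent model
mtbur = cum_op_hours / (inherent_failures + cnd_induced)  # compares to logistics model
print(f"{cal_hours:,.0f} calendar hours; MTBF = {mtbf:.0f} op-hr; "
      f"MTBUR = {mtbur:.0f} op-hr")
```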
!!!FIGURE
Figure 1. Comparison of Component Failure Rate Sources
Results

The ratios of the PRISM predicted failure rates to the observed field failure rates were compared for each unit, and the relative differences were evaluated. Percentage differences were also calculated to quantify the accuracy of the individual predictions as well as the evaluation average.

Inherent Reliability Comparison. First, the predicted and observed inherent failure rates of each unit were compared using both the PRISM default and program-specific process grade factor sets. Figure 2 illustrates the percent differences between the predicted inherent failure rates and the observed inherent field failure rates.
!!!FIGURE
Figure 2. Comparison of PRISM Inherent Failure Rate Predictions to Observed Inherent Field Failure Rates
The primary observation is the accuracy of the predictions using the default PGFs compared to the program-specific PGFs. On average, predictions made using the default PGFs were 57% closer to the observed values than those made with the program-specific PGFs; the program-specific process grade factor sets adjusted the overall failure rate to generate an optimistic prediction. The standard deviations for both categories were approximately equal, which suggests relative agreement among the overall effects of the program-specific grade factor sets.

The inherent default-PGF reliability predictions for Electronics Units 1 and 2 were very close to the observed failure rates, exhibiting no more than a 3% deviation. Electronics Unit 3, however, showed an 18% difference between the predicted and observed field failure rates. To understand the cause of the variance in prediction accuracies, the failure rate sources for each assembly of each unit were evaluated. Figure 1 illustrates the failure data percent makeup of each electronics unit. One distinct difference between Electronics Units 1 and 2 and Electronics Unit 3 is the percentage of user-defined values associated with the total predicted failure rate. The predicted failure rate of Electronics Unit 3 is 66% user-defined, whereas the user-defined contribution to Electronics Units 1 and 2 is less than 20%. This large user-defined contribution in Electronics Unit 3 may be a source of the prediction variance, and it will be assessed further in future PRISM evaluations.

Logistics Reliability Comparison. Next, the predicted and observed logistics failure rates of each unit were compared using both the PRISM default and program-specific process grade factor sets. For this comparison, CND and induced failure returns were incorporated into the field failure rate. Likewise, the PRISM model factored the CND and induced process grade factors into the overall failure rate predictions. Figure 3 shows the percent differences between the predicted logistics failure rates and the observed logistics field failure rates.

!!!FIGURE

Figure 3. Comparison of PRISM Logistics Failure Rate Predictions to Observed Logistics Field Failure Rates

Reviewing the average percent differences of the default-PGF logistics model predictions, a difference similar in value and error to that of the inherent model predictions was observed. It appears the effect of introducing CND and induced failures into the field data was consistent with the effect of the corresponding process grade factors on the predictions. This correlation lends validity to the CND and induced process grade factors. Again, the choice of PGFs greatly affects the overall accuracy of the predictions. In Figure 3, the average effective difference between using the default PGFs and the program-specific PGFs is 54%.
The outcome of a logistics model prediction using the program-specific PGFs is still optimistic. To better understand this, the variations between the default and program-specific process grade factors were analyzed. Table 3 illustrates the percent differences between the RIAC default and the Raytheon-surveyed process grade factors. This table raises the possibility that the optimistic predicted failure rates may stem from the results of the program-specific PGF surveys. In six of the nine total process gradings, the three programs averaged at least a 35% lower value than the RIAC default values. These differences signify lower process grade scores which, in the RIAC model, lead to lower predicted failure rates.

Table 3. Comparison of Process Grade Factor Scores

Process Grade Factor Type   Electr. Unit 1   Electr. Unit 2   Electr. Unit 3   Avg. Diff.
Part Quality                -32%             -47%             -28%             -36%
Infant Mortality            -48%             -42%             -55%             -48%
Design                      -56%             -62%             -41%             -53%
Growth                      +4%              -7%              -6%              -3%
Manufacturing               -44%             -50%             -24%             -39%
System Mgmt.                -62%             -68%             -36%             -55%
Induced                     -42%             -81%             -27%             -50%
No Defect                   -9%              -22%             +4%              -9%
Wear Out                    +4%              0%               -18%             -5%

Field Data Incorporation. Using the two prediction models already calculated, field data was incorporated into the PRISM software tool as "observed data" to evaluate the accuracy of the adjusted predictions. When applying the field data in PRISM, CND and induced field failures were removed from the PRISM model entry when adjusting the inherent model; likewise, these failures were included when adjusting the logistics model predictions. The results of using PRISM's Bayesian analysis are shown in Figures 4 and 5.

!!!FIGURE

Figure 4. Inherent Reliability Comparison Using Bayesian Analysis

!!!FIGURE

Figure 5. Logistics Reliability Comparison Using Bayesian Analysis

After applying the observed field data in PRISM, the failure rate predictions fell very close to the observed field values, landing within 2% of the observed field failure rates. This improvement in prediction accuracy is an example of how important historical field data can be for predicting the field reliability of a derivative electronics system.
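PRISM's exact Bayesian weighting scheme is internal to the tool, but the general mechanism of merging a predicted failure rate with observed failures and hours can be illustrated with a standard gamma-conjugate update. The prior-weight parameter below (prior_equiv_hours) is an assumed tuning knob for the illustration, not a PRISM input, and all numbers are hypothetical:

```python
# Conceptual gamma-conjugate Bayesian update: treat the prediction as a prior
# and merge it with observed field data. PRISM's internal weighting may differ.

def bayesian_update(predicted_rate, observed_failures, observed_hours,
                    prior_equiv_hours=1.0e5):
    """Posterior failure rate estimate (failures per hour).

    The prediction is encoded as a gamma prior whose mean equals
    predicted_rate and whose weight corresponds to prior_equiv_hours of
    pseudo-observation time. More field hours -> field data dominates.
    """
    a = predicted_rate * prior_equiv_hours  # prior "pseudo-failures"
    b = prior_equiv_hours                   # prior "pseudo-hours"
    return (a + observed_failures) / (b + observed_hours)

predicted = 50e-6            # prediction: 50 failures per 10^6 calendar hours
failures, hours = 95, 1.0e6  # observed field data (hypothetical)

posterior = bayesian_update(predicted, failures, hours)
print(f"prior {predicted*1e6:.0f}, field {failures/hours*1e6:.0f}, "
      f"posterior {posterior*1e6:.1f} failures per 10^6 hours")
```

Once the observed hours dwarf the prior's equivalent hours, the posterior tracks the field data, which is consistent with the adjusted predictions above falling within 2% of the observed values.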
Conclusions

This paper compares the predicted field reliability of electronics units using the PRISM methodology to the observed field failure rate. The initial results showed that:

  • The PRISM inherent and logistics failure rate predictions both agreed well with observed field failure rates when using the PRISM default process grade factors.
  • The PRISM failure rate predictions for both inherent and logistics reliability were optimistic by approximately 30-40% using program-specific process grade factors.

It is interesting to note that the differences in the predicted values versus actual field values are opposite to those found in an earlier TRW Automotive PRISM evaluation (Reference 1). While TRW Automotive's predicted failure rates were approximately twice the actual field values (where field data was not factored in through the use of PRISM's Bayesian analysis option), Raytheon's predicted failure rates were approximately one-half the actual field values.

The goal of this evaluation was to determine if PRISM would provide a methodology to accurately predict field failure rates. Based on these initial results, it can be concluded that PRISM does indeed have the potential to accurately predict field failure rates. It is encouraging that, given the variations in use environments, failure data, and failure rate sources (RIACRates models, RIAC data, and user-defined data), the predicted failure rates of the three electronics units track fairly well with each other for both the inherent and logistics reliability predictions.

Future Plans

Raytheon plans to continue its PRISM evaluation. While this initial evaluation was conducted independently with minimal consultation with the Reliability Information Analysis Center, future plans include working more closely with the RIAC group to develop a more refined PRISM use methodology to increase the accuracy of the failure rate predictions. The ultimate goal is to develop a PRISM prediction process that accurately predicts field performance using program-specific process grade factors, without the need for adding observed field data via PRISM's Bayesian analysis methodology. The main areas of future emphasis will include:
  • Focusing on the proper development and use of program- specific process grade factors.
  • Evaluating the PRISM predicted failure modes/categories versus the observed field failure modes/categories.
  • Evaluating the PRISM reliability assessment of more complex electrical and mechanical systems to determine if the observed data patterns remain consistent.
  • Conducting independent field failure rate analyses and PRISM failure rate predictions for comparison (i.e., using two independent personnel to conduct the field failure rate analysis and the PRISM prediction to eliminate any bias that would tend to converge the two analyses).

References

1. M.G. Priore, P.S. Goel, and R. Itabashi-Campbell, "TRW Automotive Assesses PRISM® Methodology for Internal Use", The Journal of the Reliability Analysis Center, First Quarter 2002, pp. 14-19.

Biographies

Christopher L. Smith is a Reliability Engineer with Raytheon Company's Space and Airborne Systems Division located in McKinney, Texas. Chris earned a BS degree in Physics from Southwest Texas State University. He has worked with Raytheon Systems Specialty Engineering for 2 years.

Christopher L. Smith
Raytheon Company
2501 W. University Drive, M/S 8052
McKinney, TX 75071

Jerry B. Womack, Jr. is a Senior Reliability Engineer in Raytheon Company's Space and Airborne Systems Division located in McKinney, Texas. Jerry has 16 years of experience in reliability and system safety engineering on radar and electro-optic programs. Jerry received his BS degree in Physics from the University of Mississippi in 1987 and his MS degree in Physics from the University of Texas at Dallas in 1992. Jerry has been an American Society for Quality (ASQ) Certified Reliability Engineer (CRE) since 1993.

Jerry B. Womack, Jr.
Raytheon Company
2501 W. University Drive, M/S 8094
McKinney, TX 75071