Perform a Weibull Analysis in Excel

Welcome to our comprehensive guide on how to perform a Weibull Analysis in Excel.

In this piece, I'll outline the necessary steps for a comprehensive analysis, highlighting key considerations and potential challenges to be aware of. By the conclusion of this process, you'll be able to derive Weibull parameters from the dataset, offering insights into the component’s failure traits, including its failure rate, nature of failure, MTBF, and the likelihood of failure at any given moment. Such insights are invaluable for Reliability Engineers, guiding them in refining maintenance approaches and timing, which is why this kind of analysis stands as a cornerstone in Reliability Engineering.

At the bottom of this article you can download the dataset, the fully completed Excel example, and a video walkthrough (for the visual learners).

The High-Level process for performing a Weibull Analysis

Section 1: Collecting Life Data

The foundation of any reliable Weibull Analysis lies in the quality of the life data being used. In fact, it's recommended that 90% of the effort should be dedicated to preparing life data, as it directly affects the credibility of the results. Poor data can render an entire Weibull Analysis futile.

There are four crucial steps to preparing life data:

  1. Identify the asset(s) to be analyzed

  2. Determine the component failure mode for the chosen asset(s)

  3. Gather as much relevant life data as possible

  4. Classify the life data

Step 1: Identify the asset(s) to be analyzed

Weibull Analysis can be performed on a single asset or multiple similar assets, as long as they share comparable design, function, failure modes, and failure rates.

Step 2: Determine the component failure mode for the chosen asset(s)

Different components have distinct failure modes and rates. To ensure accurate and representative results, treat each component failure mode separately.

Step 3: Gather as much relevant life data as possible

Life data, or "time-to-failure" (TTF) data, refers to measurements of a product's lifespan, typically calculated in hours, kilometers, cycles, or other relevant metrics. To make accurate predictions for the entire product population, gather ample life data. Accurate predictions usually result from a combination of quality data and an appropriate model.

Life data sources include:

  • In-house reliability tests: design testing, qualification testing, life tests, quantitative accelerated life tests, reliability growth tests

  • Field data: call center records, warranty claims, returned item inspections; this data can be biased but is more likely to reflect real-world use and abuse

Step 4: Classify the life data

Not all data sets are complete. Life data can be classified into two types: complete data (all information available) or censored data (some information missing).

Censored data can be further divided into three sub-types: right censored (suspended), interval censored, and left censored data. Different data types require different analysis methods, which will be discussed in our next blog post.

  • Complete Data: Exact TTF is known (e.g., failure at 300 hours); usually from structured lab testing or fully accessible field data with high failure rates

  • Right Censored Data (Suspended): Unit operated successfully for a known period and then continued (or could have continued) for an unknown period (e.g., still operating at 300 hours)

  • Interval Censored Data: Exact TTF is unknown, but failure occurred within a specific interval (e.g., between 300 and 400 hours)

    • Rule of Thumb: Treat data as interval data if the granularity is coarser than the desired results (e.g., desired results in days, but data points in months)

  • Left Censored Data: Exact TTF is unknown; the failure is known only to have occurred before a certain time (e.g., failure between 0 and 300 hours)

Types of Censored Data
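As a rough illustration of the four data types, the sketch below tags a record from what is known about its failure time. The `LifeObservation` structure, its field names and the sample values are hypothetical and are not part of the article's dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LifeObservation:
    """One life-data record (hypothetical structure, for illustration only)."""
    last_ok: Optional[float]       # latest time the unit was known to be working
    first_failed: Optional[float]  # earliest time the unit was known to have failed


def classify(obs: LifeObservation) -> str:
    """Return the data type implied by what is known about the failure time."""
    if obs.first_failed is None:
        return "right censored"      # still running at last_ok
    if obs.last_ok == obs.first_failed:
        return "complete"            # exact time-to-failure is known
    if not obs.last_ok:
        return "left censored"       # failed some time between 0 and first_failed
    return "interval censored"       # failed between last_ok and first_failed


print(classify(LifeObservation(300, 300)))   # exact failure at 300 hours
print(classify(LifeObservation(300, None)))  # still operating at 300 hours
print(classify(LifeObservation(300, 400)))   # failed between 300 and 400 hours
print(classify(LifeObservation(None, 300)))  # failed some time before 300 hours
```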

Section 2: Selecting the Right Lifetime Distribution:

Lifetime distributions are mathematical models designed to represent specific behaviors of life data. The key to obtaining accurate predictions is to choose the correct distribution that fits the life data set and models the component's life accurately. Lifetime distributions are typically characterized by their failure rates, which can be increasing, decreasing, or constant.

Methods for Choosing the Right Distribution:

  1. Theoretical method: If you have sufficient knowledge of the failure mechanism(s), extensive experience in Weibull Analysis, and ample data, use your engineering judgment to determine the right lifetime distribution. Consider the following factors:

    • The variable (data) in question

    • Historical data and analysis

    • Literature from your industry

    • Descriptions and underlying assumptions of probability distributions

  2. Goodness-of-fit tests: If you're unsure which distribution to use, perform goodness-of-fit (GOF) tests to determine the most appropriate model. Weibull Analysis software often includes GOF features to help with this process.

Estimating Distribution Parameters: After selecting the best-fit lifetime distribution, you'll need to estimate its parameters to fit the statistical model to your life data set. Lifetime distributions typically have three types of parameters: shape, scale, and location. The number and values of these parameters directly affect the distribution characteristics and the visual representation of the probability density function (PDF).

Methods for Parameter Estimation:

  1. Probability plotting

  2. Least squares (rank regression) estimation

  3. Maximum likelihood estimation (MLE)

  4. Bayesian Estimation Method

Different data types (complete, right censored, interval censored, and left censored) require different analysis methods to estimate the parameters. As a rule of thumb, use Rank Regression for complete data and small sample sizes, and MLE for heavy and/or mixed censoring and larger sample sizes (30+ failures).
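The rest of this article uses rank regression, but for comparison here is a minimal sketch of MLE for a 2-parameter Weibull with failures and right-censored suspensions. It solves the standard likelihood equation for β by bisection and then computes η in closed form. The function name, the search bracket and the sample hours are assumptions made for illustration; dedicated software will usually be more robust.

```python
import math


def weibull_mle(failures, suspensions=()):
    """Sketch of 2-parameter Weibull MLE for failures plus right-censored units."""
    all_times = list(failures) + list(suspensions)
    r = len(failures)
    if r < 2:
        raise ValueError("need at least two failures")
    t_max = max(all_times)
    mean_log_fail = sum(math.log(t) for t in failures) / r

    def score(beta):
        # Likelihood equation in beta only; its root is the MLE of beta.
        # Times are scaled by t_max so the powers cannot overflow.
        w = [(t / t_max) ** beta for t in all_times]
        weighted = sum(wi * math.log(t) for wi, t in zip(w, all_times)) / sum(w)
        return weighted - 1.0 / beta - mean_log_fail

    lo, hi = 0.01, 100.0              # assumed search bracket for beta
    if score(lo) >= 0 or score(hi) <= 0:
        raise ValueError("beta not bracketed; data may be degenerate")
    for _ in range(200):              # plain bisection
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum((t / t_max) ** beta for t in all_times) / r) ** (1.0 / beta)
    return beta, eta


# Hypothetical hours, not the article's dataset.
beta, eta = weibull_mle(failures=[4600, 5200, 7400, 8800], suspensions=[3100, 6100])
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```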

The Weibull distribution can be characterized by two or three parameters:

  • 2-parameter Weibull distribution: shape parameter (β) and scale parameter (η)

  • 3-parameter Weibull distribution: shape parameter (β), scale parameter (η), and location parameter (γ)

The 3-parameter model is used when there's a significant "time-to-failure" offset, while the 2-parameter model is suitable for most other cases.
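To make the role of the location parameter concrete, the sketch below evaluates the cumulative failure probability with and without an offset. The 3-parameter form simply shifts time by γ, so no failures are predicted before γ. The numbers are illustrative only.

```python
import math


def weibull_cdf(t: float, beta: float, eta: float, gamma: float = 0.0) -> float:
    """Cumulative failure probability; gamma=0 gives the 2-parameter model."""
    if t <= gamma:
        return 0.0                       # no failures before the offset
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))


# Same shape and scale, with and without a 500-hour failure-free period.
print(weibull_cdf(2000, beta=2.0, eta=5000))
print(weibull_cdf(2000, beta=2.0, eta=5000, gamma=500))
```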

In summary, selecting the right lifetime distribution and accurately estimating its parameters are crucial for conducting a successful Weibull Analysis. By following these guidelines, you'll be better equipped to make informed decisions and predictions based on your life data.

Section 3: Calculate the Weibull Parameters:

In this example, we have a data set of hydraulic cylinders from an excavator fleet. Each record includes the cylinder's current hours and whether it has failed (“F”) or is suspended (“S”, i.e. right censored).

The suspended data affects the ranking in our Weibull Analysis (explained a little further on). To start, we sort the data from low to high hours and assign a Rank and a Reverse Rank to each data point.

Sort Data from Low-High Hours, and Assign Ranks, Reverse Ranks and add a column for Adjusted Ranks
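For readers who prefer to check the spreadsheet step in code, the sketch below sorts the records and assigns ranks and reverse ranks. The six records are hypothetical stand-ins, not the article's dataset.

```python
# Hypothetical (hours, status) records; "F" = failed, "S" = suspended.
records = [(5200, "F"), (3100, "S"), (7400, "F"), (4600, "F"), (6100, "S"), (8800, "F")]


def rank_records(records):
    """Sort by hours (low to high) and attach rank and reverse rank."""
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    return [
        {"hours": hours, "status": status, "rank": i, "reverse_rank": n - i + 1}
        for i, (hours, status) in enumerate(ordered, start=1)
    ]


for row in rank_records(records):
    print(row)
```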

In a Weibull Analysis, we only plot the failed data, but because the suspended data has a real effect on how the failed data is ranked, it must be considered in the analysis. This is done with “Adjusted Ranks”, also known as “Kaplan-Meier Adjusted Ranks”.

The Kaplan-Meier estimator provides a non-parametric estimate of the empirical cumulative distribution function (CDF) while accounting for censored observations. The adjusted ranks represent the estimated failure probabilities for each observed failure event. It is important to note that these adjusted ranks are only applied to “Failed” data. So after assigning Reverse Ranks to both Failed and Suspended data, we must filter by Suspended data, and delete all these lines.

The Reverse rank number should be preserved for the failed data, as this is what we’ll use to calculate the adjusted ranks using the Kaplan-Meier Formula:

 

Kaplan-Meier Formula for calculating adjusted Ranks

 

This implementation into Excel is given in the previous two images.
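The formula image is not reproduced in this text, so the sketch below assumes the widely used adjusted-rank recurrence for suspended data: adjusted rank = (reverse rank × previous adjusted rank + (N + 1)) / (reverse rank + 1), where N is the total number of units and the previous adjusted rank starts at 0. Treat it as an illustration of the idea rather than a copy of the spreadsheet. The records are hypothetical.

```python
def adjusted_ranks(records):
    """records: iterable of (hours, status), status "F" (failed) or "S" (suspended).

    Returns [(hours, adjusted_rank)] for the failures only, assuming the
    common recurrence described in the lead-in above.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    result = []
    previous = 0.0
    for position, (hours, status) in enumerate(ordered, start=1):
        if status != "F":
            continue                      # suspensions are skipped, but they
        reverse_rank = n - position + 1   # still shape the reverse ranks
        previous = (reverse_rank * previous + (n + 1)) / (reverse_rank + 1)
        result.append((hours, previous))
    return result


# Hypothetical records, not the article's dataset.
sample = [(3100, "S"), (4600, "F"), (5200, "F"), (6100, "S"), (7400, "F"), (8800, "F")]
for hours, rank in adjusted_ranks(sample):
    print(hours, round(rank, 4))
```

With no suspensions the recurrence collapses to the ordinary ranks 1, 2, 3, and so on.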

Now we need to calculate the Median Rank.

The median rank is a non-parametric plotting position used in Weibull Analysis to estimate the cumulative failure probability of a component. It's a statistical method to assign a probability value to each failure in a data set, which is then used in probability plotting for fitting a lifetime distribution model, such as the Weibull distribution, to the life data.

The median rank is calculated as follows using what’s called Bernard’s Approximation:

Median Rank = (i - 0.3) / (n + 0.4)

Where i is the rank of the failure in ascending order of time-to-failure, and n is the sample size. When suspensions are present, i is the Adjusted Rank we calculated earlier and n counts every unit in the data set, failed or suspended.

The resulting median rank values are used as the vertical axis in a Weibull probability plot, while the corresponding time-to-failure data is used as the horizontal axis. By plotting the data points and fitting a straight line through these points, you can estimate the parameters of the Weibull distribution and analyze the reliability characteristics of the component under study.

As you have probably noticed, if we only had failure data, the adjusted ranks step would not be needed, and “i” would simply be the original rank.

Bernard's Approximation- Weibull
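Bernard's Approximation is a one-line calculation. In the sketch below, n is taken as the total number of units including suspensions, which is the usual convention when adjusted ranks are used. The adjusted ranks and unit count are hypothetical values.

```python
def median_rank(i: float, n: int) -> float:
    """Bernard's Approximation: (i - 0.3) / (n + 0.4)."""
    return (i - 0.3) / (n + 0.4)


# Hypothetical adjusted ranks for 4 failures out of 6 units.
adjusted = [1.1667, 2.3333, 3.8889, 5.4444]
n_units = 6
median_ranks = [median_rank(i, n_units) for i in adjusted]
print([round(f, 4) for f in median_ranks])
```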

So the next step is to calculate the final X and Y axis values for your Weibull plot:

  • X axis: ln(t), i.e. ln(hours)

  • Y axis: ln(ln(1/(1-F(t)))), i.e. ln(ln(1/(1 - Median Rank)))

Plot these two against each other and click on the scatter data to assign a line of best fit and R-squared value as shown below.
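The trendline Excel draws is an ordinary least-squares fit of Y on X. The sketch below reproduces that calculation in plain Python so the slope, intercept and R-squared can be cross-checked. The hours and median ranks are hypothetical, and the helper names are my own.

```python
import math


def weibull_plot_points(hours, median_ranks):
    """Transform life data to the linearised Weibull plot axes."""
    xs = [math.log(t) for t in hours]
    ys = [math.log(math.log(1.0 / (1.0 - f))) for f in median_ranks]
    return xs, ys


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical failures and their median ranks.
hours = [4600, 5200, 7400, 8800]
ranks = [0.1354, 0.3177, 0.5608, 0.8038]
slope, intercept, r_squared = linear_fit(*weibull_plot_points(hours, ranks))
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```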

As you can see from the plot above, the data fits the trendline reasonably well, with an R-squared value of 0.97; that is, the straight line explains about 97% of the variation in the plotted points.

If your data doesn’t fit the trendline well, you may have multiple failure modes present, which show up as an S-bend or dog-leg, as seen below:

Weibull Goodness of fit

So let’s extract the valuable parameters here:

  • β is the slope of the straight line: 2.03 (which is greater than 1, indicating wearout failure). For wearout failures, the best maintenance strategy is Scheduled Component Changeout.

  • Characteristic life, Eta: η = e^(-b/β) = exp(-(-18.208)/2.0303) = 7848 Hours, where b = -18.208 is the intercept of the trendline
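The η formula follows from the linearised CDF: ln(ln(1/(1-F(t)))) = β·ln(t) - β·ln(η), so the trendline's slope is β and its intercept is b = -β·ln(η). A small sketch using the slope and intercept quoted above (the function name is my own):

```python
import math


def weibull_parameters(slope: float, intercept: float):
    """Convert trendline slope and intercept into (beta, eta)."""
    beta = slope
    eta = math.exp(-intercept / beta)
    return beta, eta


beta, eta = weibull_parameters(slope=2.0303, intercept=-18.208)
# Expect eta close to the 7848 hours quoted above; the last digit depends
# on how many decimals of the slope and intercept are carried.
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```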

Section 4: Weibull Function Plotting

Reliability Function R(t)= e^(-(t/η)^β)

The reliability function, also known as the survival function, represents the probability that a system or component will perform its intended function without failure up to a specified time (t).

Failure Function F(t)= 1-R(t)

The Failure Function, or Cumulative Distribution Function (CDF), represents the probability of failure up to a given time (t).

Reliability Function

Failure Function
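As a quick sketch, the two functions can be evaluated directly with the parameters estimated in Section 3. The evaluation times are arbitrary examples.

```python
import math

BETA, ETA = 2.0303, 7848.0   # shape and scale estimated in Section 3


def reliability(t: float, beta: float = BETA, eta: float = ETA) -> float:
    """R(t): probability of surviving to time t."""
    return math.exp(-((t / eta) ** beta))


def failure_probability(t: float, beta: float = BETA, eta: float = ETA) -> float:
    """F(t) = 1 - R(t): probability of failing by time t."""
    return 1.0 - reliability(t, beta, eta)


for t in (2000, 4000, 8000):   # arbitrary example hours
    print(t, round(reliability(t), 3), round(failure_probability(t), 3))
```

At t = η the reliability is always e^(-1), about 36.8%, whatever the value of β. That is why η is called the characteristic life.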


Failure Rate λ(t)= (β/η)∗ (t/η)^(β-1)

The failure rate, also known as the hazard rate, represents the rate at which failures occur at a given time (t).

Probability Density Function f(t)= (β/η)∗ (t/η)^(β-1)∗ exp(-(t/η)^β)

The PDF represents the probability density of failures at a given time (t).

Hazard Function

Probability Density Function
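The failure rate and PDF can be sketched the same way. Note that the PDF is simply the failure rate multiplied by the reliability, f(t) = λ(t)·R(t). The default parameters are the ones estimated in Section 3, and the evaluation times are arbitrary.

```python
import math


def hazard_rate(t: float, beta: float = 2.0303, eta: float = 7848.0) -> float:
    """Failure rate: (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def weibull_pdf(t: float, beta: float = 2.0303, eta: float = 7848.0) -> float:
    """PDF: failure rate multiplied by the reliability R(t)."""
    return hazard_rate(t, beta, eta) * math.exp(-((t / eta) ** beta))


for t in (2000, 4000, 8000):   # arbitrary example hours
    print(t, f"{hazard_rate(t):.6f}", f"{weibull_pdf(t):.6f}")
```

With β greater than 1, as here, the failure rate keeps rising with age, which is the wear-out signature.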

Section 5: Reliability Metrics

Mean Time To Failure (MTTF):

MTTF represents the expected value or the average time to failure for non-repairable systems or components.

The MTTF for a Weibull distribution can be calculated using the following formula:

MTTF = η ∗ Γ(1 +1/β)

Where Γ is the Gamma function

***The gamma function is used in calculating the Mean Time To Failure (MTTF) for a Weibull distribution because it helps to compute the expected value or the average of the distribution.***
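Python's standard library exposes the Gamma function as `math.gamma`, so the MTTF formula can be checked in a couple of lines. The parameters are the ones estimated in Section 3.

```python
import math


def weibull_mttf(beta: float, eta: float) -> float:
    """MTTF = eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


mttf = weibull_mttf(beta=2.0303, eta=7848.0)
# Expect a value within a few hours of the 6,955 hours used in this article;
# the exact figure depends on parameter rounding.
print(f"MTTF = {mttf:.0f} hours")
```

When β = 1 the Gamma term equals 1, so the MTTF equals η, as expected for a constant failure rate.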

Mean Time Between Failure (MTBF):

MTTF and MTBF are different in terms of the systems or components they apply to and the information they convey. MTTF is used for non-repairable systems or components and represents the average time to failure, while MTBF is used for repairable systems or components and represents the average time between consecutive failures, including repair time.

MTBF = MTTF + MTTR

where MTTR is the Mean Time To Repair.

For example, if it takes one 12-hour shift to change out a cylinder, and the MTTF from the formula above is 6,955 hours, our MTBF would be 6,955 + 12 = 6,967 hours.

Common Pitfalls:

  • Weibull analysis should be used at the component level, not the system level. Don’t perform a Weibull analysis on an entire haul truck.

  • That is because it can only predict reliability accurately for single failure modes.

  • If multiple failure modes are present, the data will not fit a linear equation well and will show S-bends. Sometimes a little bend is OK, as long as it fits within your confidence intervals.

  • Minitab and Python offer more advanced features, such as a multiple failure mode Weibull model. This is a useful tool, and it's even able to separate the failure mode data for you. Unfortunately, Excel does not have this capability, even if you write VBA.


Section 6: Discussion:

Now that you've gained insights into the life characteristics of all products in the population, the next step is to use these results to improve the reliability and cost performance of the products.

Choose Optimal Strategies Based on the Beta Value, β

The best maintenance strategies can be determined by examining the failure patterns of the component, represented by the beta value (β). The beta value is a measure of the slope of the probability plot.

The "Reliability Bathtub Curve" in the Failure Rate vs Time plot (see image below) graphically represents the three failure patterns: infant mortality failures (β < 1) with a decreasing failure rate, random failures (β = 1) with a low, relatively constant failure rate, and wear-out failures (β > 1) with an increasing failure rate.
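The mapping from β to failure pattern can be written as a small helper. Because a fitted β is never exactly 1, the sketch uses a tolerance band around 1. The 0.05 default is an arbitrary assumption, not a figure from this article; in practice the confidence interval on β is a better guide.

```python
def failure_pattern(beta: float, tolerance: float = 0.05) -> str:
    """Map a Weibull shape parameter to a bathtub-curve region.

    tolerance is an assumed band around beta = 1 treated as "random".
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if abs(beta - 1.0) <= tolerance:
        return "random failures (constant failure rate)"
    if beta < 1.0:
        return "infant mortality failures (decreasing failure rate)"
    return "wear-out failures (increasing failure rate)"


print(failure_pattern(2.03))   # the beta estimated in Section 3
```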

 
 

Infant Mortality Failures:

What may cause the failures?

  • Inadequate quality assurance and control in design 

  • Inadequate quality assurance and control in manufacturing 

  • Lack of burn-in or stress testing 

What to Do About It?

  • Choose the best design approaches, such as appropriate specifications, adequate design tolerance and sufficient component derating.

  • Start stress testing, such as HALT (Highly Accelerated Life Test) or HAST (Highly Accelerated Stress Test), at the earliest development phases to evaluate design weaknesses and detect specific assembly and materials problems.

  • Apply stress testing in early production phases to precipitate failures and identify defects, analyse the resulting failures, and take corrective action through redesign to eliminate the root causes of these defects.

Random Failures:

What may cause the failures? 

Stress exceeding strength such as human error during maintenance, induced failures, accidents and natural disasters. 

What to Do About It? 

  • Conduct condition monitoring.

  • If the failure is considered unacceptable, redesign and replace the component or the system before it fails.

  • If the cost of replacement outweighs the benefit gained from making changes, and the failure is not significant, leave it in operation and tackle the failure when it occurs.

Wearout Failures:

What may cause the failures?

  • Fatigue or depletion of materials 

  • Corrosion or erosion 

  • Inherent failures of materials 

  • Accumulated damage 

Recommended Actions:

  1. For significant and rapid wear-out failures (i.e., β > 4), overhauls may be the most cost-effective solution.

  2. For early wear-out failures (i.e., 1 < β < 4), preventive maintenance optimization strategies may be the most cost-effective. Schedule optimal replacement or remediation maintenance strategies at a given time interval (determined by CDF) to prevent failure before it occurs.

  3. Run simulations using the Weibull results in your RBD blocks and execute a full system simulation over time. This approach will help you accurately define the component and system's failure profile, as well as forecast the best strategies to meet your reliability and cost needs.
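One simple way to turn the CDF into a replacement interval, as item 2 suggests, is to invert it: the time at which the cumulative failure probability reaches a target p is t = η·(-ln(1-p))^(1/β). The sketch below uses the parameters from Section 3. The 10% target is a hypothetical choice, and a cost-optimal interval would also need failure and replacement costs, which this sketch ignores.

```python
import math


def time_at_failure_probability(p: float, beta: float, eta: float) -> float:
    """Inverse Weibull CDF: the time by which a fraction p has failed."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be between 0 and 1")
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


# Hypothetical target: change out cylinders before 10% are expected to fail.
interval = time_at_failure_probability(0.10, beta=2.0303, eta=7848.0)
print(f"Replace at about {interval:.0f} hours")
```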

In summary, this step-by-step guide has walked you through the process of performing a Weibull Analysis, from collecting life data to determining the distribution type, estimating parameters, generating results, reviewing the analysis, and finally identifying appropriate strategies to enhance reliability and cost performance.

If you want to learn how to not only do a proper Weibull Analysis, but also calculate optimal maintenance intervals, Capital Asset Replacement times and strategies, calculate optimal spares levels for a certain reliability level, and much more, then refer to our online course HERE.

Weibull Analysis Video