How to calculate optimal maintenance intervals
In a previous article we covered how to perform a detailed Weibull Analysis in Excel. The outputs from a Weibull analysis are important because we can use them for a variety of Reliability calculations such as when to most economically maintain assets.
A common mistake we see is Reliability Engineers setting the optimal maintenance interval equal to the MTBF. This is incorrect because it means that approximately 50-60% of the components will already have failed before maintenance is performed on them. (Sounds ridiculous when you say it that way, right!?)
The reality is that optimal maintenance intervals are a cost optimisation problem, which is dependent on the increasing failure probability with age, the cost of downtime and the cost of planned maintenance at each selected interval.
Since we’ve already covered the Weibull analysis in a previous article, let’s start by assuming we already have the Beta and Eta values of the component we’re analysing. It is important to note that the optimal maintenance interval calculation is done at the component level, not the system level, just like the Weibull analysis.
During the course of this article we will work through an example of a hydraulic cylinder on a mining excavator, for which we need to calculate the optimal maintenance interval.
We can assume that if one cylinder out of the four cylinders fails, it renders the machine unusable.
The Beta value is: 2.0303 and the Eta value is 7848.9
Calculated MTBF= 6967 Hrs (This will be a discussion point later as to why we don’t use this metric).
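As a quick cross-check outside Excel, the MTBF of a two-parameter Weibull distribution can be recovered from Beta and Eta as Eta x Gamma(1 + 1/Beta). The snippet below is a minimal sketch of that calculation; the variable names are illustrative and not taken from the spreadsheet.

```python
import math

beta = 2.0303   # Weibull shape parameter from the article
eta = 7848.9    # Weibull scale parameter (hours) from the article

# Mean life of a two-parameter Weibull: eta * Gamma(1 + 1/beta)
mtbf = eta * math.gamma(1.0 + 1.0 / beta)

print(f"MTBF = {mtbf:.0f} hours")
```

The result should land within a fraction of a percent of the 6967 Hrs quoted above; small differences can come from rounding of the parameters or from how the spreadsheet derives the figure.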
Determine the cost model:
The cost model for a maintenance interval optimisation is relatively straightforward. We have two types of cost:
Cost of maintenance: The cost of performing the actual maintenance (parts + labour)
Cost of downtime: Not only the cost of performing the required maintenance, but also the extras such as lost production, expedited shipping and callout costs
How these two integrate in the model is shown below:
In the above image, we have laid out the cost of normal maintenance as well as that of corrective maintenance. The key difference here is that the cost of corrective maintenance includes the cost of downtime per hour. It is important to calculate this value on the principle of the Theory of Constraints. This means that the cost of downtime for a mining excavator in a coal mine isn’t simply the tonnes/hr dig rate multiplied by the net profit per tonne of coal. The calculated number must take into account the main constraint in the value chain.
Let’s say for our example our calculated cost of downtime is $5,000/hr. So why is it that during planned maintenance we don’t add the cost of downtime?
Production loss on planned maintenance here is considered to be $0. This is because when production sets the budget for machine availability, this includes the maintenance of the machine. Thus, theoretically, while the machine is scheduled for Preventative Maintenance, it wasn’t required to produce any direct or indirect income. However, if the machine was planned to operate and breaks down, it incurs a production loss, as the plan may not have included any backups, and the budget is missed.
In the real world, the cost of labour and parts in a breakdown situation is usually higher than in a planned sense. How could that be so when technically the same parts and labour are required to complete the repair in each instance?
While it is true that both planned and unplanned maintenance may involve the same repair work, there are several factors that contribute to the difference in costs between the two:
Labour costs: In planned maintenance, the labour force is scheduled in advance, ensuring that the required technicians and specialists are available at the right time. Unplanned maintenance often requires immediate attention, which may lead to higher labour costs due to overtime, weekend work, or emergency call-outs. Sometimes labour might not even be available, exacerbating the downtime cost.
Spare parts and inventory: With planned maintenance, spare parts and inventory can be ordered and managed more efficiently. Organisations can take advantage of lead times, bulk discounts, and reduced shipping costs. In contrast, unplanned maintenance often requires expedited shipping or emergency purchases of spare parts, which can be more expensive.
Preventive measures: Planned maintenance usually includes preventive measures, such as lubrication, adjustments, and inspections, which help extend the life of equipment and reduce the frequency of failures. Unplanned maintenance only addresses failures after they have occurred, which can result in more frequent and severe failures over time.
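The two cost types described in this section can be sketched in a few lines of Python. Only the $5,000/hr downtime rate comes from the article; the parts, labour and downtime-hours figures below are hypothetical placeholders, and the function names are illustrative rather than anything from the spreadsheet.

```python
DOWNTIME_COST_PER_HOUR = 5_000.0  # from the article's example


def planned_cost(parts, labour):
    """Planned maintenance: parts and labour only, no production loss."""
    return parts + labour


def corrective_cost(parts, labour, downtime_hours,
                    downtime_rate=DOWNTIME_COST_PER_HOUR):
    """Breakdown: (usually higher) parts and labour plus lost production."""
    return parts + labour + downtime_hours * downtime_rate


# Hypothetical inputs, chosen only to illustrate the structure of the model.
cost_planned = planned_cost(parts=15_000.0, labour=5_000.0)
cost_corrective = corrective_cost(parts=18_000.0, labour=7_000.0,
                                  downtime_hours=24.0)

print(f"Planned:    ${cost_planned:,.0f}")
print(f"Corrective: ${cost_corrective:,.0f}")
```

Keeping downtime as a separate term makes it easy to see how strongly the corrective cost depends on the downtime rate, which is why that rate needs to reflect the real constraint in the value chain.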
Calculate the cost per unit time for each potential maintenance interval:
As mentioned earlier, optimal maintenance intervals are really an optimisation problem, where we want the minimum total cost per operating hour. This is a balance between the cost of performing planned maintenance too often at shorter intervals and the higher cost of unplanned maintenance at longer intervals.
The most important aspect of this calculation is the Failure rate function from the Weibull analysis. Here we predict the probability of failure at each stage and assign the Corrective maintenance cost.
The cost per unit time is given by the below formula:
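For reference, one standard way to write such a cost rate is the classical age-replacement model shown here. It is offered as a textbook form under stated assumptions, and the spreadsheet's own formula may be a simpler variant of it. The symbols C_p (cost of one planned maintenance), C_f (cost of one corrective maintenance) and t (the candidate interval) are notation introduced for this illustration.

```latex
R(t) = \exp\!\left[-\left(\frac{t}{\eta}\right)^{\beta}\right],
\qquad
F(t) = 1 - R(t)

C(t) = \frac{C_p \, R(t) + C_f \, F(t)}{\int_0^{t} R(s)\, ds}
```

The numerator is the expected cost of one maintenance cycle (planned if the part survives to t, corrective if it fails first), and the denominator is the expected length of that cycle in operating hours.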
In a practical sense, how we do this in Excel is by creating three columns beside our costing calculations and Weibull parameters:
Maintenance Interval (ranging from 0 to a realistic upper value, such as 1.5 x the expected life of the component)
Failure Rate
Cost per Unit Time
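The same three columns can be sketched outside Excel. The Python below is a minimal sketch, not the spreadsheet itself: it assumes the classical age-replacement cost rate (expected cycle cost divided by expected cycle length), treats the "Failure Rate" column as the Weibull cumulative probability of failure, and uses hypothetical planned and corrective costs of $20,000 and $145,000.

```python
import math

BETA = 2.0303   # Weibull shape parameter from the article
ETA = 7848.9    # Weibull scale parameter (hours) from the article

# Hypothetical costs, chosen only to make the sketch runnable.
PLANNED_COST = 20_000.0      # assumed cost of one planned replacement
CORRECTIVE_COST = 145_000.0  # assumed cost of one breakdown, downtime included


def reliability(t):
    """Weibull probability of surviving to age t (hours)."""
    return math.exp(-((t / ETA) ** BETA))


def expected_cycle_length(t, steps=500):
    """Trapezoidal estimate of the integral of R(s) from 0 to t."""
    h = t / steps
    total = 0.5 * (reliability(0.0) + reliability(t))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h


def cost_per_hour(t):
    """Assumed age-replacement cost rate at maintenance interval t."""
    failed = 1.0 - reliability(t)
    cycle_cost = PLANNED_COST * (1.0 - failed) + CORRECTIVE_COST * failed
    return cycle_cost / expected_cycle_length(t)


# Column 1: candidate intervals, starting above 0 to avoid dividing by zero
# and ending near 1.5 x the expected life.
intervals = list(range(500, 10_501, 500))
# Columns 2 and 3: failure probability and cost per hour at each interval.
table = [(t, 1.0 - reliability(t), cost_per_hour(t)) for t in intervals]

for t, prob, cost in table:
    print(f"{t:>6} h   F(t) = {prob:6.3f}   cost = ${cost:8.2f}/h")
```

Printing the table makes the shape of the curve visible: the cost per hour falls at first, bottoms out, then climbs as failures start to dominate.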
Let’s take a step back and really understand what this is calculating…
Remember that for each maintenance interval I select, I will of course incur the cost of performing the planned task. The failure rate factors in that even though the part may be in its infancy stage, there is still a possibility (albeit small) for a failure, which will then incur the unplanned maintenance cost. The total cost of unplanned maintenance is fractionalized by this failure rate.
As the maintenance interval increases, the failure rate increases as well. Although the planned maintenance cost per unit time is decreasing, the cost of unplanned maintenance per unit time is increasing. When the two are combined, there is a minimum, which in practice is the best balance of Risk vs Economic Reward: the optimal maintenance interval!
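Finding that minimum can be sketched as a simple grid search. As before, this is an illustration under assumptions rather than the spreadsheet's method: it uses the classical age-replacement cost rate and hypothetical costs of $20,000 (planned) and $145,000 (corrective), so the interval it reports depends on those inputs and need not match the figure in this article.

```python
import math

BETA = 2.0303   # Weibull shape parameter from the article
ETA = 7848.9    # Weibull scale parameter (hours) from the article
PLANNED_COST = 20_000.0      # hypothetical
CORRECTIVE_COST = 145_000.0  # hypothetical


def reliability(t):
    """Weibull probability of surviving to age t (hours)."""
    return math.exp(-((t / ETA) ** BETA))


def cost_per_hour(t, steps=500):
    """Assumed cost rate: expected cycle cost / expected cycle length."""
    h = t / steps
    # Trapezoidal integral of R(s) from 0 to t = expected hours per cycle.
    area = 0.5 * (reliability(0.0) + reliability(t))
    for i in range(1, steps):
        area += reliability(i * h)
    area *= h
    failed = 1.0 - reliability(t)
    return (PLANNED_COST * (1.0 - failed) + CORRECTIVE_COST * failed) / area


# Evaluate every candidate interval and keep the cheapest one.
candidates = range(100, 10_501, 50)
optimal_interval = min(candidates, key=cost_per_hour)

print(f"Optimal interval: {optimal_interval} hours")
print(f"Cost at optimum:  ${cost_per_hour(optimal_interval):.2f}/h")
```

A grid search is deliberately crude, but it mirrors what the spreadsheet columns do and makes it easy to rerun the calculation with a different downtime cost to see how far the optimum moves.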
So now we can see that the optimal maintenance interval of 3,500 hours (which in this case was a component replacement due to the high β value) is significantly lower than the MTBF we calculated of 6,967 Hours.
According to this specific component’s Survival Function, if we put MTBF as our maintenance interval we would have failed 55% of all our cylinders!!
So if MTBF is 6967 Hrs, shouldn’t 50% of cylinders have failed instead of 55%?
When it comes to the Weibull distribution, MTBF doesn't always provide an accurate representation of the expected life of a component. This discrepancy arises from the fact that the Weibull distribution can be a highly skewed distribution, depending on its shape parameter (β).
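For shape parameters below roughly 3.4, which includes our β of about 2, the distribution has a longer tail on the right: a minority of long-lived components pulls the mean above the median. The MTBF is the mean, not the median, so it sits later than the time by which 50% of components have failed, and more than half of the population has already failed by the time the MTBF is reached. In the special case β = 1 (the exponential distribution), about 63% of components have failed by the MTBF.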
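The gap between the mean and the median can be checked directly from the Weibull parameters. The snippet below is a minimal sketch using the Beta, Eta and MTBF values quoted in this article; the function and variable names are illustrative.

```python
import math

BETA = 2.0303   # Weibull shape parameter from the article
ETA = 7848.9    # Weibull scale parameter (hours) from the article
MTBF = 6967.0   # hours, as quoted in the article


def fraction_failed(t):
    """Weibull cumulative probability of failure by age t (hours)."""
    return 1.0 - math.exp(-((t / ETA) ** BETA))


# Share of cylinders expected to have failed by the MTBF.
failed_at_mtbf = fraction_failed(MTBF)

# Median life: the age by which exactly half are expected to have failed.
median_life = ETA * math.log(2.0) ** (1.0 / BETA)

print(f"Failed by the MTBF: {failed_at_mtbf:.1%}")
print(f"Median life:        {median_life:.0f} hours")
```

With these parameters the failed fraction at the MTBF comes out at roughly 54 to 55%, and the median life is a few hundred hours shorter than the MTBF, which is the 50% point the question above was expecting.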
Remember, the goal is to make the most economical decision in terms of maintenance strategy. This will involve replacing parts that still have life in them in order to reduce the chance that an unplanned event will impair the asset and prevent it from generating revenue. It is important to note that for this to be valid you must accurately model the true cost of downtime for the asset. These data-driven decisions are important in FMECAs.
For those wishing to download the worked version of the Excel sheet, you can download it after filling in the form below: