HUMS Threshold Setting: Knowing the difference between ‘good’ and ‘bad’ components (without manufacturer data)

August 18 2020

Dr. Eric Bechhoefer, Chief Engineer/CEO GPMS

Last week, a very bright engineering leader at a major helicopter OEM asked, “How can you determine the vibration tolerance of a component without the manufacturer data?” Put another way, how do we know ‘good’ from ‘bad’ and therefore set our thresholds? Because this topic has come up before, I wanted to post my reply. Be warned: it’s technical!


[XXXX], Your question is really four questions because, like many things, it depends on circumstance. First, when the OEM provides a limitation, such as on the main rotor, tail rotor, or short shaft, we use their guidance.

But the OEM does not have limits for most components in the aircraft. (Instead, the OEM establishes maintenance practices and TBO based on the spectrum of usage that they design the aircraft around. This is the process used for safe life design.) Therefore, we effectively need to determine those thresholds for ourselves.

In the case of the shaft, for which we have physical features such as shaft order one (SO1) in inches per second (IPS), we typically set limits based on user experience that are tighter than the OEM limits. For example, the threshold for SO1 on the Bell 407 main rotor is 1 IPS. As a maintainer, you know this is a HUGE amount of vibration. One of our customers requested that we set the warning threshold at 0.25 IPS and the alarm threshold at 0.38 IPS. Because we can always provide an adjustment after a flight, and the operator trusts our adjustment, this customer will typically make an adjustment at the end of the day when the vibration gets close to 0.25 IPS. They schedule no test flight. As a result, their fleet’s average SO1 is about 0.11 IPS, whereas the average for the Bell 407s we monitor as a whole is 0.4 IPS.

We treat internal shafts (within the gearbox) differently from external shafts (gearbox input shaft, tail rotor drive shaft). For internal shafts, we look at SO1 and SO2 (the second shaft harmonic, which is sensitive to a bent shaft) and use the within- and between-aircraft variance to establish a threshold based on a 1 in a million probability of false alarm. Typically, this process results in SO1 thresholds on the order of 0.02 IPS or so (which is pretty smooth). For external shafts, we use SO1, SO2, and SO3 (the third harmonic, which is sensitive to coupling failures). We typically set an IEC limit for SO1 of 0.25 IPS, and set SO2 and SO3 statistically, again based on the within- and between-aircraft variance.
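As a rough illustration of the statistical step, the sketch below pools within- and between-aircraft variance and converts a probability of false alarm into a threshold. It assumes the condition indicator is approximately Gaussian, which is a simplification made here for the example and not something stated above. The function name `variance_threshold` and the sample readings are hypothetical.

```python
from statistics import NormalDist, mean, pvariance

def variance_threshold(fleet, pfa=1e-6):
    """Return a CI threshold for a given probability of false alarm.

    fleet: dict of aircraft id -> list of CI readings (e.g. SO2 in IPS).
    ASSUMPTION: readings are roughly Gaussian; the real distribution may differ.
    """
    per_aircraft_mean = [mean(r) for r in fleet.values()]
    # Within-aircraft variance: average scatter of each aircraft about its own mean.
    within = mean([pvariance(r) for r in fleet.values()])
    # Between-aircraft variance: scatter of the aircraft means about the fleet mean.
    between = pvariance(per_aircraft_mean)
    sigma = (within + between) ** 0.5
    z = NormalDist().inv_cdf(1.0 - pfa)  # about 4.75 for 1 in a million
    return mean(per_aircraft_mean) + z * sigma

# Hypothetical readings, for illustration only.
fleet = {
    "A": [0.010, 0.012, 0.011, 0.009],
    "B": [0.014, 0.013, 0.015, 0.012],
    "C": [0.008, 0.009, 0.010, 0.009],
}
print(round(variance_threshold(fleet), 4))
```

A tighter fleet (smaller within or between variance) pulls the threshold down, which is why a well-maintained fleet ends up with a smoother limit.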

The process is mathematically robust and defines a hypothesis test. In contrast, most HUMS manufacturers set thresholds by asking the question “When is it bad?”, which is very hard to determine. We ask the subtly different question “When is it no longer good?” This construction of the question allows us to define a hypothesis test, and I have developed a theory to support it. Our system uses Condition Indicators (CIs) to establish evidence that the component is no longer good. The false alarm rate when we recommend a warning is small, about 1 in 100 million. You can be well assured that if we say the component is no longer good, it needs maintenance.

The process for bearings is somewhat different, because there is no OEM guidance. Because the vibration signals of a faulted bearing are small compared to shaft order and gear mesh, detecting a fault at the bearing rate frequencies using Fourier analysis is difficult. Fault detection of the baseband frequencies of the bearing rate is “stage 1” fault detection. Bearing faults detected using these types of analyses are late-stage; that is, the bearing can be close to catastrophic failure. At the very least, a bearing in this state is generating metal, which can cause damage to other components within the gearbox.

Ultrasonic emission can detect bearing inner and outer race roughness (a “stage 3” fault). Still, the remaining useful life of a bearing at this stage is relatively long compared to the overall life of the bearing. Bearing envelope analysis (BEA) can typically detect bearing faults hundreds of hours before it is appropriate to do maintenance.

BEA is based on demodulation of the high-frequency resonance associated with bearing element impacts. For rolling element bearings, an impact is produced when the rolling elements strike a local fault on the inner or outer race, or when a fault on a rolling element strikes the inner or outer race. These impacts modulate a signal at the associated bearing pass frequencies, such as Cage Pass Frequency (CPF), Ball Pass Frequency Outer Race (BPFO), Ball Pass Frequency Inner Race (BPFI), and Ball Fault Frequency (BFF). Figure 1 shows an outer race fault, where the BPFO is approximately 80 Hz. Note that the modulation period, T1, is approximately 0.0125 seconds (i.e., 1/80 Hz). The time T2, the period of the resonance, is approximately 1.12e-4 seconds, which corresponds to about 9000 Hz. Note that the time domain representation is the superposition of many resonances of the bearing itself.
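For readers who want to see where rates such as the BPFO come from, here is a minimal sketch of the standard kinematic formulas. It assumes a stationary outer race and no ball slip. The bearing geometry and shaft speed are hypothetical, and the ball term computed is the ball spin frequency (conventions for "ball fault frequency" vary, and some references take it as twice the spin frequency).

```python
import math

def bearing_rates(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Kinematic bearing rates in Hz (ASSUMES fixed outer race, no slip)."""
    r = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    cage = 0.5 * shaft_hz * (1.0 - r)
    return {
        "CPF": cage,                                    # cage pass frequency
        "BPFO": n_balls * cage,                         # outer race
        "BPFI": 0.5 * n_balls * shaft_hz * (1.0 + r),   # inner race
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1.0 - r * r),  # ball spin
    }

# Hypothetical geometry: 9 balls, 7.94 mm ball, 39.04 mm pitch diameter, 25 Hz shaft.
rates = bearing_rates(25.0, 9, 7.94, 39.04)
for name, hz in rates.items():
    print(f"{name}: {hz:.2f} Hz, period {1.0 / hz:.5f} s")
```

The period printed for each rate plays the role of T1 in Figure 1: it is the spacing between successive impacts for that fault type.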


Figure 1 Example Outer Race Fault

Mathematically, the modulation is described as:

$$\cos(a)\,\cos(b) = \tfrac{1}{2}\left[\cos(a+b) + \cos(a-b)\right] \qquad \text{(Eq 1)}$$

This is amplitude modulation of the bearing rate (a) with the high-frequency carrier signal (the resonant frequency (b)). It causes sidebands in the spectrum surrounding the resonant frequency. It is sometimes difficult to determine the exact frequency of the resonance: it is usually not known a priori and cannot be determined easily without a faulted component. However, demodulation techniques typically do not need to know the exact frequency. One method for BEA involves multiplying the vibration signal by a sinusoid at the resonant frequency (for example, 9 kHz). The result is then low-pass filtered to remove the high-frequency image and decimated, and the power spectral density is estimated. (Eq 2)
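The heterodyne, low-pass, decimate, and spectrum steps can be sketched in a few lines of NumPy. This is an illustrative sketch on a synthetic signal, not the production algorithm: the brick-wall FFT filter, the 2 kHz bandwidth, the decimation factor, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def envelope_spectrum(x, fs, f_center, bandwidth, decim):
    """Return (freqs_hz, power) of the envelope around f_center."""
    n = len(x)
    t = np.arange(n) / fs
    # 1) Heterodyne: shift the band around f_center down to 0 Hz.
    baseband = x * np.exp(-2j * np.pi * f_center * t)
    # 2) Low-pass (brick-wall in the FFT domain) to remove the high-frequency image.
    spec = np.fft.fft(baseband)
    f = np.fft.fftfreq(n, d=1.0 / fs)
    spec[np.abs(f) > bandwidth / 2.0] = 0.0
    # 3) Decimate the band-limited signal and take its magnitude (the envelope).
    env = np.abs(np.fft.ifft(spec)[::decim])
    env = env - env.mean()
    # 4) Estimate the envelope spectrum with one Hann-windowed periodogram.
    power = np.abs(np.fft.rfft(env * np.hanning(len(env)))) ** 2
    return np.fft.rfftfreq(len(env), d=decim / fs), power

def synthetic_outer_race_fault(fs=48_000, seconds=1.0, bpfo=80.0, f_res=9_000.0):
    """Impacts at the BPFO rate, each ringing a decaying resonance, plus noise."""
    n = int(fs * seconds)
    impacts = np.zeros(n)
    impacts[:: int(round(fs / bpfo))] = 1.0
    tk = np.arange(int(0.005 * fs)) / fs
    ring = np.exp(-tk / 0.001) * np.sin(2 * np.pi * f_res * tk)
    rng = np.random.default_rng(0)
    return np.convolve(impacts, ring)[:n] + 0.05 * rng.standard_normal(n)

fs = 48_000
x = synthetic_outer_race_fault(fs=fs)
freqs, power = envelope_spectrum(x, fs, f_center=9_000.0, bandwidth=2_000.0, decim=8)
band = (freqs > 10.0) & (freqs < 500.0)
peak_hz = freqs[band][np.argmax(power[band])]
print(f"strongest envelope line: {peak_hz:.1f} Hz")
```

Because only the band around the resonance is kept, the exact center frequency matters little as long as the window overlaps the resonance, which is the point made above about not needing to know it precisely.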

The bearing components have many vibration modes, which generate resonances at various frequencies throughout the spectrum. The selection of the frequency range used to demodulate the bearing rate signal (i.e., the window center frequency) should take two issues into account. First, the gearbox spectrum contains several high-energy frequencies from shaft and gear harmonics, which would mask analysis at lower bearing frequencies. Second, several accelerometers have a natural resonance at frequencies similar to the bearing modes. Using a higher frequency window close to the accelerometer resonance can amplify the bearing fault signal, increasing the probability of fault detection.

BEA should be performed at frequencies higher than the shaft and gear mesh frequencies. This ensures that the demodulated bearing frequencies are not masked by other rotating sources, such as shaft and gear mesh, which are present at the CPF, BPFO, BPFI, and BFF frequencies. Shaft order amplitudes of 0.1 G’s and gear mesh amplitudes of tens of G’s are typical, whereas damaged bearing amplitudes are about 0.003 G’s.

Note that because we perform the envelope analysis at a frequency higher than those associated with the gearbox shafts and gears, the signal associated with healthy components is Gaussian. It can be proved that the magnitude spectrum of a Gaussian signal is Rayleigh distributed. The square root of the sum of four squared Rayleigh CIs (representing the cage, ball, inner and outer race spectral energy) can then be shown to be Nakagami distributed. Given that we can calculate the within- and between-aircraft variance, we can calculate the inverse cumulative distribution, and hence the threshold based on the probability of false alarm (again, set at 1 in a million, or “6 9s” reliability).
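To make the inverse-cumulative-distribution step concrete, here is a minimal sketch using only the Python standard library. It assumes the four CIs are independent Rayleigh variables with a common scale `sigma`, which is a simplification for illustration (real CIs may be correlated and scaled differently). Under that assumption the sum of squares has an Erlang tail that can be inverted by bisection. The function names are hypothetical.

```python
import math

def sum_sq_rayleigh_sf(u, k):
    """P(sum of k squared Rayleigh(sigma) > u * 2 * sigma**2), an Erlang tail."""
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= u / i          # builds u**i / i!
        total += term
    return math.exp(-u) * total

def hi_threshold(sigma, k=4, pfa=1e-6):
    """Threshold t with P(HI > t) = pfa, where HI = sqrt(sum of k squared CIs).

    ASSUMPTION: the k CIs are independent Rayleigh with a common sigma.
    """
    lo, hi = 0.0, 1.0
    while sum_sq_rayleigh_sf(hi, k) > pfa:   # bracket the root
        hi *= 2.0
    for _ in range(200):                     # bisection on the normalized sum
        mid = 0.5 * (lo + hi)
        if sum_sq_rayleigh_sf(mid, k) > pfa:
            lo = mid
        else:
            hi = mid
    return sigma * math.sqrt(2.0 * hi)

print(f"threshold for sigma = 1, four CIs: {hi_threshold(1.0):.3f}")
```

With a single CI (`k=1`) this reduces to the Rayleigh inverse CDF, `sigma * sqrt(-2 * ln(pfa))`, which is a convenient sanity check.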

Gears are a slightly different process, as the phenomenology is different. Gears are complex and have several different failure modes (at least six). All analysis for gears is based on the time-synchronous average (TSA). For a given shaft, we use a tachometer as a key phasor to resample the data so that it is synchronous with the angular position of that shaft. This filters out vibration associated with other shafts, gears, and bearings within the gearbox. Then we operate on the TSA to extract features that are sensitive to gear faults.
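A bare-bones version of the TSA can be sketched as follows. It assumes the tachometer pulse times have already been extracted and that shaft speed is roughly constant within each revolution; it uses simple linear interpolation, whereas a production system would likely use something more careful. The signal and all names are hypothetical.

```python
import numpy as np

def time_synchronous_average(x, fs, tach_times, points_per_rev=256):
    """Resample each revolution to a fixed number of points, then average.

    tach_times: times (s) of the once-per-rev tach pulse (ASSUMED already extracted).
    """
    t = np.arange(len(x)) / fs
    frac = np.arange(points_per_rev) / points_per_rev
    revs = [
        np.interp(t0 + (t1 - t0) * frac, t, x)  # evenly spaced within the revolution
        for t0, t1 in zip(tach_times[:-1], tach_times[1:])
    ]
    return np.mean(revs, axis=0)

# Hypothetical example: a 20-tooth gear on a 29.7 Hz shaft plus a 137 Hz
# tone from some other, non-synchronous source.
fs, shaft_hz = 20_000, 29.7
t = np.arange(3 * fs) / fs
x = np.cos(2 * np.pi * 20 * shaft_hz * t) + np.cos(2 * np.pi * 137.0 * t)
tach = np.arange(0.0, 3.0, 1.0 / shaft_hz)
tsa = time_synchronous_average(x, fs, tach)
orders = 2.0 * np.abs(np.fft.rfft(tsa)) / len(tsa)  # amplitude per shaft order
print(f"order 20: {orders[20]:.2f}, order 5: {orders[5]:.3f}")
```

The gear mesh (order 20) survives the averaging almost untouched, while the non-synchronous tone is strongly attenuated because its phase differs from one revolution to the next.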

HUMS threshold setting and gear analysis

For example, if we remove the vibration associated with the gear mesh and with SO1, SO2, and SO3 of the shaft, what is left would be random noise. The kurtosis would be close to 3 (Gaussian). If there is a fault (see above, due to a chipped tooth), the signal is no longer Gaussian, the statistics change, and we can set a threshold on this. For gear analysis, we generate 18 condition indicators based on the residual, narrowband analysis, the energy operator, and amplitude modulation and frequency modulation analysis. No one condition indicator works for every gear fault. We again fuse a number of condition indicators into a health indicator.
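The kurtosis argument is easy to check numerically. The sketch below uses Gaussian noise as a stand-in for a healthy residual and adds a short, hypothetical impact to mimic one damaged tooth; the amplitudes are made up for illustration.

```python
import numpy as np

def kurtosis(x):
    """Fourth standardized moment; about 3 for Gaussian data."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(1)
residual = rng.standard_normal(4096)   # stand-in for a healthy gear residual
faulted = residual.copy()
faulted[1000:1010] += 8.0              # hypothetical impact from one damaged tooth
print(f"healthy: {kurtosis(residual):.2f}, faulted: {kurtosis(faulted):.2f}")
```

Only ten of the 4096 points were disturbed, yet the kurtosis moves well away from 3, which is why it is a useful indicator for localized tooth damage.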

The process is similar to bearing thresholding, but a transform is applied to the gear CIs to make them more “Rayleigh” like. We then again use the sum of squared Rayleigh CIs and take advantage of the inverse cumulative Nakagami distribution to find a threshold.

We have run many gearboxes to failure in testing; we may have the largest set of fault data in the world. What we find is that the threshold-setting process is conservative. While we recommend maintenance at a Health Indicator (HI) of 1 (physical damage is visible), the bearing will seize around an HI of 50 to 100. While it may take 150 to 300 hours to go from an HI of 0.5 to 1, it may take only 50 hours to go to an HI of 10. We have never run a gear to failure, typically stopping testing at an HI of 3 to 5.

As you can see, there is a lot that goes into developing reliable thresholds on a component-by-component basis. The most gratifying indicator is the feedback we get from maintainers when they remove a part we’ve identified as needing maintenance. They report their own inspection validated the tool’s detection capabilities and appreciate Foresight’s role in helping them avoid unplanned downtime and/or collateral damage to their gearbox. And ultimately, we hope the information Foresight provides helps to improve safety and efficiency of operations.

I’ve got a large body of published work you can find here: