Troubleshooting Intermittent Failures at Power Plants

Intermittent failures are among the most challenging issues to troubleshoot in a power plant or any other complex system. These faults can occur randomly and under different operating conditions, and they can cause significant downtime, maintenance costs, and safety concerns. Troubleshooters often face frustration when trying to diagnose the problem. However, there are systematic steps that can help recreate the fault and isolate its cause. In this blog post, we will look at these steps and discuss some of the common types of intermittent failures that power plant personnel may encounter. We will also explore how proper training can help troubleshooters develop the skills and knowledge needed to address these issues effectively.

Steps for Troubleshooting Intermittent Failures[1]

An intermittent failure frustrates the troubleshooter and can create havoc within a process or system operation. Diagnosing the fault, difficult as it is, can be accomplished using these general guidelines.

  • Attempt to recreate the problem.
  • Isolate the fault once the problem recurs.
  • Monitor the operation if the problem does not recur.

Attempt to Recreate the Problem

If the problem is no longer apparent and operator error has been ruled out, the system or equipment must be examined to find the fault.  One of the first things a troubleshooter should try to do is recreate the problem.  Using information obtained from the operator and from any equipment history or logs, make an attempt to establish operating conditions that are similar to those that existed at the time of failure.  This may require placing the equipment in a state that is contrary to other equipment operation.  For this reason, troubleshooting of an intermittent failure is performed off-line, usually in a maintenance shop.
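
As a rough illustration of using equipment logs to establish the conditions that existed at the time of failure, the Python sketch below pulls the log entries recorded in a window just before a reported failure. The log layout, the field names (load_mw, ambient_c), the timestamps, and the 45-minute window are all hypothetical and stand in for whatever the plant's own historian or logbook provides.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

def conditions_before_failure(
    log: List[Dict[str, Any]], failure_time: datetime, window: timedelta
) -> List[Dict[str, Any]]:
    """Return log entries recorded within `window` before the failure."""
    start = failure_time - window
    return [entry for entry in log if start <= entry["time"] <= failure_time]

# Hypothetical log entries; real data would come from the equipment history.
t0 = datetime(2024, 1, 1, 8, 0)
log = [
    {"time": t0, "load_mw": 40.0, "ambient_c": 18.0},
    {"time": t0 + timedelta(minutes=30), "load_mw": 75.0, "ambient_c": 21.0},
    {"time": t0 + timedelta(minutes=50), "load_mw": 92.0, "ambient_c": 24.0},
    {"time": t0 + timedelta(minutes=70), "load_mw": 60.0, "ambient_c": 25.0},
]

failure_time = t0 + timedelta(minutes=60)
recent = conditions_before_failure(log, failure_time, timedelta(minutes=45))
for entry in recent:
    print(entry["time"], entry["load_mw"], entry["ambient_c"])
```

The entries returned give a starting point for the operating conditions to reproduce in the shop; the operator's account is still needed to fill in anything the log does not capture.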

Most intermittent problems fall into one of three basic categories:

  • Thermally induced failure
  • Mechanically induced failure
  • Erratic failure

Although other classifications could be used, an intermittent problem usually occurs only under certain circumstances.  Contrary to common belief, most equipment does not have a mind of its own.  Two of the most likely things to change in a system during operation are temperature and mechanical functions.  For this reason, the first two categories exist.  The third category, erratic failure, includes other intermittent problems.  It is also the most difficult problem to troubleshoot.

Thermally Induced Failure

The thermally induced failure is a problem that only becomes apparent when equipment is warmed up.  This problem may only occur on very hot days or when air conditioning is not operating.  It may also occur each time the equipment has operated for an extended period of time at normal operating temperature.

To isolate the thermally induced failure, the equipment must be cooled down first.  After the equipment is cool, it can be re-energized and allowed to warm up to normal operating temperature.  Once the equipment has operated for some time, the thermally induced failure should reappear.  Once the problem reappears, it can be verified by cycling the equipment through a cool-to-warm state several times.
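
One way to support the cool-to-warm verification is to record temperature alongside each fault observation and compare how often the fault appears below and above a chosen temperature. The Python sketch below assumes a hypothetical list of (temperature in °C, fault observed) samples; the 45 °C threshold and the sample values are illustrative only and are not taken from any real equipment.

```python
from typing import Iterable, Tuple

def fault_rate_by_temperature(
    samples: Iterable[Tuple[float, bool]], threshold_c: float
) -> Tuple[float, float]:
    """Return (fault rate below threshold, fault rate at or above threshold)."""
    cool_total = cool_faults = warm_total = warm_faults = 0
    for temp_c, faulted in samples:
        if temp_c >= threshold_c:
            warm_total += 1
            warm_faults += int(faulted)
        else:
            cool_total += 1
            cool_faults += int(faulted)
    cool_rate = cool_faults / cool_total if cool_total else 0.0
    warm_rate = warm_faults / warm_total if warm_total else 0.0
    return cool_rate, warm_rate

# Hypothetical observations gathered over several cool-to-warm cycles.
observations = [
    (22.0, False), (30.0, False), (41.0, False), (48.0, True),
    (52.0, True), (25.0, False), (47.0, False), (55.0, True),
]

cool_rate, warm_rate = fault_rate_by_temperature(observations, threshold_c=45.0)
print(f"fault rate when cool: {cool_rate:.2f}, when warm: {warm_rate:.2f}")
```

A fault rate that is clearly higher on the warm side supports the thermal explanation. Similar rates on both sides suggest looking at the other categories instead.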

To help isolate the thermally induced failure, the equipment may need to be cooled down in sections as it operates.  This can be done using a directional forced air source or a special product developed for this specific purpose.

Mechanically Induced Failure

A mechanically induced failure is relatively easy to recognize. This type of failure occurs when the equipment or circuit experiences vibration, mechanical shock, or motion. Repeatedly tapping on the troubled area should make the fault condition appear and reappear. The faulty component can then be isolated by tapping or applying pressure to different areas of the equipment.

Erratic Failure

The most difficult trouble to diagnose is the erratic failure. An erratic failure is virtually impossible to predict. It occurs randomly and under different operating conditions. Many times these failures are related to voltage transients or irregularities. Static discharge voltages and the damage associated with static discharge can also lead to erratic failures. Digital equipment, such as computers and peripherals, is a good example of equipment that is subject to these problems.

Finding a solution to an erratic failure is not easy. It usually requires substitution of components on a subsystem basis. For example, assume that a computer system has erratic failures resulting in the system “locking up” at various times. The system locks up in various modes, programs, and operating conditions. There is no apparent trend to the failure. A computer-based diagnostic program may even pass the system. In a case such as this, each system component can be replaced, one at a time, with a known good component. The system can then be run for an extended period of time after each substitution is made. If the fault does not reappear, the component that was replaced can be assumed to be bad. Although this is not very practical, it may be the only way to isolate the fault.
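
The substitution procedure described above can be sketched as a simple loop. This is a minimal Python illustration, not a prescribed method: fault_reappears is a hypothetical hook that stands for installing a known good replacement, running the system for an extended period, and reporting whether the fault was seen again. The component names are also made up for the example.

```python
from typing import Callable, List, Optional

def isolate_by_substitution(
    components: List[str],
    fault_reappears: Callable[[str], bool],
) -> Optional[str]:
    """Swap one component at a time and return the first suspect found.

    fault_reappears(name) is a hypothetical hook: it represents replacing
    `name` with a known good part, running the system for an extended
    period, and returning True if the fault showed up again.
    """
    for name in components:
        if not fault_reappears(name):
            # The fault stayed away with this part replaced: it is the suspect.
            return name
        # The fault came back, so the original part goes back in
        # before the next substitution is tried.
    return None

# Simulated run in which the (hypothetical) power supply is the bad part.
bad_part = "power supply"
suspect = isolate_by_substitution(
    ["memory module", "power supply", "disk controller"],
    fault_reappears=lambda swapped: swapped != bad_part,
)
print(suspect)
```

Because an erratic fault may simply fail to show up during a run, the length of each run matters: a short run can make a good component look like the culprit.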

Isolate the Fault Once the Problem Recurs

For each type of intermittent failure, it is important to recreate the fault condition so that the fault recurs while the equipment is under observation. Once this has been accomplished, normal troubleshooting techniques can be used to find the cause and repair it.

Monitor the Operation if the Problem Does Not Recur

By their very nature, intermittent failures do not always occur on demand. In the case of erratic failures especially, it may be virtually impossible to see the fault recur. If this is the case, alternative monitoring methods can be used to track the equipment operation over an extended period of time. Some of these methods include the following:

  • Oscilloscope[2]
  • Noise monitor

This is just a partial list of devices available for long-term monitoring of the suspect system. Other means can be used to help diagnose the erratic failure.

Once the monitoring has been performed, the results must be analyzed.  Each aspect of the factors that may contribute to the failure must be assessed to determine the real cause of the problem.  While using a monitoring device provides useful information concerning the symptoms of the problem, it does not identify the cause of the problem.
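
As a small example of analyzing monitoring results, the Python sketch below scans a recorded voltage trace and flags samples that deviate from nominal by more than a set tolerance. The trace values, the 120 V nominal level, and the 10 % tolerance are assumptions chosen for illustration; real limits depend on the equipment being monitored.

```python
from typing import List, Tuple

def find_transients(
    samples: List[float], nominal: float, tolerance: float
) -> List[Tuple[int, float]]:
    """Return (sample index, value) for samples outside nominal +/- tolerance.

    `tolerance` is a fraction of nominal, for example 0.10 for 10 %.
    """
    limit = nominal * tolerance
    return [(i, v) for i, v in enumerate(samples) if abs(v - nominal) > limit]

# Hypothetical recorded trace, one reading per sampling interval.
trace = [120.1, 119.8, 120.3, 138.5, 120.0, 119.9, 101.2, 120.2]

events = find_transients(trace, nominal=120.0, tolerance=0.10)
for index, value in events:
    print(f"sample {index}: {value} V")
```

A list like this shows when the supply misbehaved, not why. The flagged times still have to be compared against fault reports and other plant events to work toward the cause.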

Conclusion

Intermittent failures can be one of the most difficult problems to troubleshoot in a power plant or any other type of complex system. These types of faults are often sporadic and can occur under different operating conditions, making them very challenging to diagnose and isolate. However, by following some systematic steps, it is possible to recreate the fault and identify its root cause.  It is also important to note that proper training can help personnel better understand these issues and develop the skills needed to troubleshoot them.

FCS offers training programs and resources to help power plant personnel better understand intermittent failures and how to troubleshoot them. Our courses cover a range of topics, including techniques for recreating the problem, isolating the fault, and monitoring the operation. We also provide hands-on training using real-world scenarios and equipment, allowing personnel to gain practical experience in diagnosing and resolving intermittent failures. Our experienced trainers have worked in various industries and can offer valuable insights and best practices to help personnel develop their troubleshooting skills.

Investing in proper training can help power plant personnel reduce the risk of intermittent failures and ensure optimal system performance. Contact FCS today to learn more about our training programs and resources.


[1] https://www.electronicdesign.com/technologies/test-measurement/article/21800826/troubleshooting-techniques-for-intermittent-failures

[2] https://www.tek.com/en/documents/application-note/troubleshooting-esd-failures-using-an-oscilloscope

[3] https://www.plantengineering.com/articles/webcast-finding-and-troubleshooting-intermittent-signal-faults-dec-7-1-p-m-est/
