Improvement of Overhaul Maintenance Management System Based on Operation Failure Analysis Using the FMEA Method

Abstract: The Failure Mode and Effect Analysis (FMEA) methodology is a systematic approach to analyzing and assessing identified waste (potential failure modes) in products, services, or processes and to reducing how often that waste occurs. This study applies FMEA to analyze the root causes of failure and uses Pareto diagrams to show the most critical identified failures for immediate corrective action. Fault Tree Analysis (FTA), in turn, analyzes system failures as combinations of lower-level subsystem and component failures. The study produces failure priorities ranked by the largest Risk Priority Number (RPN), followed by recommended actions to address them. The results show that several wastes occur in the plant overhaul maintenance process, and each identified waste (failure mode) was analyzed using Pareto diagrams. The identified waste was assessed through questionnaires completed by respondents from management and by experts in their fields. The highest RPN values were: HP TBV (HRSG) failing to operate, 210; HP BFP pump off, 196; and high vibration of the BCP pump, 192. The root causes of these potential failure modes were identified with FTA. The improvements recommended to the company include training operators to know and recognize the system, involving operators in repairs and maintenance, and increasing the availability of spare parts in the field.


I. INTRODUCTION
THE GROWTH of electricity generation in Indonesia continues to increase. This growth is needed to support the national electrification program and to keep pace with customer growth. Over the last five years (2012-2016), PLN electricity sales increased by an average of 6.7% per year.
The power generation unit of PT. PJB is a business unit that provides operation and maintenance services for managing power plants. In managing power plants, it must meet the key performance indicator targets included in its service level agreement with the customer. Gresik PLTGU is one of the power plants in Java that it is responsible for operating well and safely. Because the plant has installations capable of a maximum output of 1500 MW, it requires a maintenance management system for its generating systems. Maintenance management offers a regular, systematic approach to planning, organizing, controlling, and evaluating all maintenance activities. A well-programmed and well-implemented maintenance management system is therefore expected to reduce the risk of failures during plant operation.
A failure in plant operation occurs when one or more functions of the generating system lose reliability or fail to fulfill their function. Unreliability in the operating and safety systems creates a risk that plant operation will fail. If the failure occurs in a safety system that is installed in layers, the risk extends to the plant workers and to the environment, although the likelihood of this is very small. The consequences include loss of electricity supply and the release of chemicals that can pollute the environment. It must therefore be ensured that all generating systems operate according to their functions.
This study analyzes failures in the operation of the generating system based on existing historical operational failure data. The results are expected to help the institution identify the operation failures that occur in the power plant system and improve the maintenance management of that system. Maintenance is as important to the institution as other functions such as production. Better maintenance is also expected to keep the operating system working according to its function, extend the system's lifetime, and keep production running without interruption.

II. METHOD
A. FMEA
FMEA is a systematic process for identifying potential failures to fulfill an intended function, identifying their possible causes so that those causes can be eliminated, and identifying the effects of the failures so that those effects can be reduced. The FMEA process has three main focuses: severity, occurrence, and detection. Severity is a measure of the loss or damage that failures cause to various targets; the severity ranking is applied only to consequences that actually arise. Occurrence is a measure of how frequently failures occur. Detection is the ability to detect or find failures before they affect the target. S, O, and D are each rated on a scale from 1 (very low) to 10 (very high). The occurrence (likelihood) ratings are:
10-9 Very likely: an event may occur under almost any conditions.
8-7 Likely to occur: an event may occur under some conditions.
6-5 Equal opportunity: an event may or may not occur under certain conditions.
4-3 Not likely to occur: an event may occur under certain conditions but is less likely to occur.
2-1 Very unlikely: an event is not possible under some conditions.
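The occurrence scale can be encoded as a simple lookup table. The following is a minimal Python sketch, assuming bands of 10-9, 8-7, 6-5, 4-3 and 2-1 (the upper limits of the "very likely" and "not likely" bands are inferred from the 1-to-10 range rather than stated in the source); the names `OCCURRENCE_BANDS` and `occurrence_label` are hypothetical.

```python
# Occurrence bands as (lowest rating, highest rating, label).
# Assumption: band limits 10-9 and 4-3 are inferred from the 1-10 scale.
OCCURRENCE_BANDS = [
    (9, 10, "Very likely"),
    (7, 8, "Likely to occur"),
    (5, 6, "Equal opportunity"),
    (3, 4, "Not likely to occur"),
    (1, 2, "Very unlikely"),
]


def occurrence_label(rating: int) -> str:
    """Return the likelihood label for an occurrence rating from 1 to 10."""
    if not 1 <= rating <= 10:
        raise ValueError(f"occurrence rating must be 1-10, got {rating}")
    for low, high, label in OCCURRENCE_BANDS:
        if low <= rating <= high:
            return label
    raise AssertionError("unreachable: the bands cover 1-10")


print(occurrence_label(8))  # Likely to occur
```

A similar table could hold the severity and detection scales once their band descriptions are fixed by the assessment team.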
In this study, the data obtained were processed using the FMEA method. The severity rating reflects judgments about how serious the effects of each failure are. The occurrence assessment determines how often failures are likely to occur in plant operation after overhaul maintenance. The detection assessment determines how likely it is that each failure can be detected.
After the severity, occurrence, and detection values for the overhaul process were determined, the Risk Priority Number (RPN = S × O × D) and the Probability Impact Matrix values were calculated. The risks are then managed with response strategies such as prevention, reduction, and documentation of the actions taken. Risk responses are determined for the risks with the highest RPN values, using a fishbone diagram.
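The RPN calculation and the Pareto ranking used to prioritize failure modes can be sketched in a few lines of Python. This is an illustration only: the class `FailureMode`, the function `pareto_table`, and the S, O, D ratings below are hypothetical and are not the study's questionnaire data.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # S, rated 1-10
    occurrence: int  # O, rated 1-10
    detection: int   # D, rated 1-10

    @property
    def rpn(self) -> int:
        # Risk Priority Number: RPN = S x O x D
        return self.severity * self.occurrence * self.detection


def pareto_table(modes):
    """Sort failure modes by RPN (highest first) with the cumulative share of total RPN in %."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    total = sum(m.rpn for m in ranked)
    rows, running = [], 0
    for m in ranked:
        running += m.rpn
        rows.append((m.name, m.rpn, round(100 * running / total, 1)))
    return rows


# Hypothetical ratings for illustration only; not taken from the study.
modes = [
    FailureMode("mode A", 7, 5, 4),
    FailureMode("mode B", 6, 3, 2),
    FailureMode("mode C", 8, 6, 5),
]
for name, rpn, cumulative in pareto_table(modes):
    print(f"{name}: RPN={rpn}, cumulative={cumulative}%")
```

The cumulative column is what a Pareto diagram plots as its line; the modes that account for most of the total RPN are the candidates for immediate corrective action.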

B. Fault Tree Analysis
Fault tree analysis (FTA) is a graphical tool for exploring the causes of system-level failures. It uses Boolean logic to combine a series of lower-level events and is essentially a top-down approach to identifying the component-level failures (basic events) that cause a system-level failure (the top event) to occur. A fault tree consists of two elements, events and logic gates; the gates link the events to identify the causes of the undesired top event. Fault tree analysis can be simpler than Failure Mode and Effects Analysis (FMEA) because it focuses on the possible causes of a single undesired top event, whereas FMEA analyzes all possible system failure modes regardless of their severity.
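The Boolean logic of a fault tree can be sketched as a small recursive evaluator. In this minimal Python sketch, the `Gate` class, the `occurs` and `probability` functions, and the events E1 to E3 are hypothetical; the probability calculation assumes independent basic events that each appear only once in the tree.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    kind: str              # "AND" or "OR"
    inputs: List["Node"]   # child gates or basic-event names


# A node is either a basic event (identified by its name) or a gate.
Node = Union[str, Gate]


def occurs(node: Node, events: Dict[str, bool]) -> bool:
    """Return True if the node's event occurs, given which basic events occurred."""
    if isinstance(node, str):
        return events.get(node, False)
    results = [occurs(child, events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate type: {node.kind}")


def probability(node: Node, p: Dict[str, float]) -> float:
    """Node probability, assuming independent basic events that each appear once."""
    if isinstance(node, str):
        return p[node]
    probs = [probability(child, p) for child in node.inputs]
    if node.kind == "AND":
        return math.prod(probs)  # every input must occur
    if node.kind == "OR":
        return 1 - math.prod(1 - q for q in probs)  # at least one input occurs
    raise ValueError(f"unknown gate type: {node.kind}")


# Hypothetical tree: top event = E1 OR (E2 AND E3).
top = Gate("OR", ["E1", Gate("AND", ["E2", "E3"])])
print(occurs(top, {"E2": True, "E3": False}))  # False
print(occurs(top, {"E2": True, "E3": True}))   # True
print(round(probability(top, {"E1": 0.1, "E2": 0.2, "E3": 0.5}), 4))  # 0.19
```

If the same basic event feeds several branches, the independence formula above no longer holds and the top-event probability should be computed from minimal cut sets instead.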
Stages of Fault Tree Analysis:
1. Determine the main failure to be analyzed, that is, identify the undesired top event.
2. Identify the first-level contributors directly below the top event using the available technical information.
3. Link these contributors to the top event using logic gates (AND, OR); examining their relationships helps identify the appropriate gate.
4. Identify the second-level contributors and link them upward using logic gates.

Table 6 is the risk assessment level table, which has five levels: very low, low, medium, high, and very high. The impact matrix assessment in Figure 1 shows three risk factors classified as critical: HP TBV HRSG failed to operate (7), HP BFP pump off (10), and high vibration of the BCP pump (16). To reduce the level of risk or the failure rate in the project, these three risk factors must be reduced.
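A five-level classification on a probability-impact matrix can be illustrated as follows. This Python sketch assumes a 5x5 matrix with score = probability × impact; the score bands in `LEVELS`, the choice of "high" and "very high" as critical, and the risk register are all hypothetical, since the actual limits of Table 6 and the ratings behind Figure 1 are not reproduced here.

```python
# Hypothetical score bands: (highest score in band, level).
LEVELS = [
    (3, "very low"),
    (6, "low"),
    (12, "medium"),
    (16, "high"),
    (25, "very high"),
]


def risk_level(probability: int, impact: int) -> str:
    """Classify a risk on a 5x5 probability-impact matrix (both axes rated 1-5)."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must each be rated 1-5")
    score = probability * impact
    for upper, level in LEVELS:
        if score <= upper:
            return level
    raise AssertionError("unreachable: the score is at most 25")


def critical_risks(risks, critical=("high", "very high")):
    """Return the IDs of risks whose level is treated as critical (an assumption)."""
    return [rid for rid, (p, i) in risks.items() if risk_level(p, i) in critical]


# Hypothetical risk register: ID -> (probability, impact). Not the study's data.
register = {"R1": (4, 5), "R2": (2, 3), "R3": (4, 4), "R4": (5, 4)}
print(critical_risks(register))  # ['R1', 'R3', 'R4']
```

Keeping the band limits in one table makes it easy to replace them with the project's own risk assessment levels.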
Before mitigation, a risk analysis is carried out by identifying the causes and consequences of all project risks classified as critical. The causes are identified using a fishbone diagram.
A. Risk Mitigation
1. Set parameters and run function tests before operation.
2. Check and test the function of the pump and motor.
3. Perform functional tests and periodic checks.

IV. CONCLUSION
Based on the identification and analysis carried out, the following conclusions are obtained:
1. The risk factors in the Gresik unit overhaul maintenance project are the HP TBV HRSG failing to operate, the HP BFP pump being off, and high vibration of the BCP pump.
2. The risk responses used to mitigate the risks of power plant overhaul maintenance in Gresik are: setting parameters and running function tests before operation, checking and testing pump and motor functions, and performing function tests and periodic checks.
3. The researchers suggest that the company pursue continuous improvement more frequently and assess the overhaul process in the field, so as to increase productivity and reduce field risks, especially failures in the operation of generating equipment, and that it implement risk management across the whole project.