An Opinion Anomaly Detection Using K-Nearest Neighbours on Public Sector Financial Reports

The Main Inspectorate (Itama), as the internal auditor of BPK RI (the Audit Board of the Republic of Indonesia), is obliged to protect the credibility and honor of its institution. The audit opinion on financial statements is one of BPK RI's products, and it has drawn public attention because of frequent bribery cases associated with it. Typically, bribes are given to change the opinion on an audited entity's financial statements. Anomaly detection is one alternative method for filtering out reports with "problem" opinions so that Itama can examine them more deeply. KNN, SVM with an RBF kernel, and J48 were used to classify 150 local government financial statements. Validation used a 60% hold-out scheme (60% of the data for testing and the remaining 40% for training). The KNN classifier (AUC = 61.11%) outperformed the other classifiers, but its performance still falls in the "poor classification" category.
Keywords: Financial statement, KNN, Opinion anomaly detection, Public sector.

I. INTRODUCTION
The Main Inspectorate (Itama), as the internal auditor of BPK RI, is obliged to protect the credibility and honor of its institution. The audit opinion on financial statements is one of BPK RI's products, and it has drawn public attention because of frequent bribery cases associated with it. Typically, bribes are given to change the opinion on an audited entity's financial statements. Anomaly detection is one alternative method for filtering out reports with "problem" opinions so that Itama can examine them more deeply.
The public sector differs in character from the private sector. The private sector is profit-oriented and made up of revenue entities, whereas the public sector is service-oriented and made up of cost entities. Accordingly, most of the nine features used in this paper relate to government spending: (1) capital expenditure divided by the change in fixed assets, (2) operational expenditure divided by the change in inventories, (3) salaries and allowances expenditure divided by total expenditure, (4) capital expenditure divided by total expenditure, (5) grant expenditure divided by total expenditure, (6) social assistance expenditure divided by total expenditure, (7) local own-source revenue (PAD) divided by transfer revenue, (8) zone territory (west or east), and (9) administrative region type (city, district, or province).

1 Ahmad Dwi Arianto, Achmad Affandi, and Supeno Mardi Susiki Nugroho are with the Department of Electrical Engineering, Faculty of Electrical Technology, Institut Teknologi Sepuluh Nopember (ITS), Kampus ITS Sukolilo, Surabaya 60111, Indonesia. E-mail: ahmad15@mhs.ee.its.ac.id; affandi@ee.its.ac.id; mardi@its.ac.id

This paper uses a classification method to detect anomalous opinions on financial statements. The K-Nearest Neighbors (KNN) algorithm is one of the oldest, simplest, and most widely used classification methods [1]. Despite its simplicity, KNN can perform as well as more complex algorithms [2]. This paper therefore proposes KNN for detecting anomalous opinions on public sector financial statements. Any anomalies found can inform Itama's selection of audit samples.
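The nine-feature set described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `FinancialStatement` record and all of its field names are hypothetical, since the paper does not describe its data schema, and the ordinal codes for zone and region type are an assumed encoding, since the paper does not say how the two categorical features were represented.

```python
from dataclasses import dataclass


# Hypothetical record; field names are illustrative, not from the paper.
@dataclass
class FinancialStatement:
    capital_expenditure: float
    operational_expenditure: float
    salaries_allowances_expenditure: float
    grant_expenditure: float
    social_assistance_expenditure: float
    total_expenditure: float
    change_in_fixed_assets: float
    change_in_inventories: float
    own_source_revenue: float  # PAD
    transfer_revenue: float
    zone: str         # "west" or "east"
    region_type: str  # "city", "district", or "province"


# Assumed ordinal encoding of the two categorical features.
ZONE_CODES = {"west": 0, "east": 1}
REGION_CODES = {"city": 0, "district": 1, "province": 2}


def extract_features(s: FinancialStatement) -> list[float]:
    """Return the nine features in the order listed in the text.

    Raises ZeroDivisionError if a denominator is zero; the paper does
    not say how such cases were handled.
    """
    return [
        s.capital_expenditure / s.change_in_fixed_assets,
        s.operational_expenditure / s.change_in_inventories,
        s.salaries_allowances_expenditure / s.total_expenditure,
        s.capital_expenditure / s.total_expenditure,
        s.grant_expenditure / s.total_expenditure,
        s.social_assistance_expenditure / s.total_expenditure,
        s.own_source_revenue / s.transfer_revenue,
        float(ZONE_CODES[s.zone]),
        float(REGION_CODES[s.region_type]),
    ]
```

Ordinal codes keep the feature count at nine, matching the paper; one-hot encoding would be an equally plausible choice but would increase the number of input columns.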

A. Data Acquisition
The data were taken from 150 Indonesian local government financial statements, from which the research features were extracted. They comprise 75 reports with an unqualified opinion (WTP), 25 with a qualified opinion (WDP), 25 with an adverse opinion (TW), and 25 with a disclaimer of opinion (TMP).

B. Data Normalization
Normalization was applied to bring all features to a uniform range. The data were scaled to the range [-1, 1] to simplify subsequent processing. The normalization formula is given in Equation 1.
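$$x'_{ik} = \frac{2\left(x_{ik} - \min(x_k)\right)}{\max(x_k) - \min(x_k)} - 1 \qquad (1)$$
Where: $x'_{ik}$ = new data at row i and column k; $x_{ik}$ = old data at row i and column k; $\min(x_k)$ = minimum value of column k; $\max(x_k)$ = maximum value of column k.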
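A minimal sketch of column-wise scaling to [-1, 1] in plain Python follows. The function names are hypothetical, and mapping a constant column (max equal to min) to 0 is an assumption, because Equation 1 is undefined in that case. Fitting and applying are kept separate because the paper does not say whether the minimum and maximum were computed on the whole dataset or on the training split only.

```python
def fit_min_max(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Compute per-column minimum and maximum values."""
    n_cols = len(rows[0])
    mins = [min(r[k] for r in rows) for k in range(n_cols)]
    maxs = [max(r[k] for r in rows) for k in range(n_cols)]
    return mins, maxs


def apply_min_max(rows: list[list[float]], mins: list[float],
                  maxs: list[float]) -> list[list[float]]:
    """Scale each value with x' = 2 * (x - min) / (max - min) - 1."""
    scaled = []
    for r in rows:
        new_row = []
        for k, x in enumerate(r):
            span = maxs[k] - mins[k]
            # Assumption: a constant column is mapped to 0.
            new_row.append(0.0 if span == 0 else 2 * (x - mins[k]) / span - 1)
        scaled.append(new_row)
    return scaled
```

If the statistics are fitted on training data only, test values outside the training range will fall outside [-1, 1]; that is expected behavior rather than a bug.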

C. Data Classification
Three algorithms were used for data classification: KNN, a Support Vector Machine with a Radial Basis Function (RBF) kernel, and J48 (the Weka implementation of the C4.5 decision tree). The best model was selected by comparing the Area under the ROC Curve (AUC).
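To show how KNN assigns a class, a minimal plain-Python sketch follows: each query report receives the majority label among its k nearest training reports under Euclidean distance. The function names and the default `k=3` are hypothetical placeholders; the paper does not report k or the distance metric. `knn_class_scores` returns the fraction of neighbours per class, one common way to obtain the per-class scores an ROC curve needs. SVM-RBF and J48 are not reimplemented here.

```python
import math
from collections import Counter


def k_nearest_labels(train_X, train_y, query, k):
    """Labels of the k training points closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    return [label for _, label in ranked[:k]]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest neighbours."""
    votes = Counter(k_nearest_labels(train_X, train_y, query, k))
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the label whose nearest member is closest to the query.
    return votes.most_common(1)[0][0]


def knn_class_scores(train_X, train_y, query, k=3):
    """Fraction of the k nearest neighbours belonging to each class."""
    neighbours = k_nearest_labels(train_X, train_y, query, k)
    votes = Counter(neighbours)
    return {label: count / len(neighbours) for label, count in votes.items()}
```

Because KNN relies on distances, the [-1, 1] normalization step matters: without it, features with large numeric ranges would dominate the neighbour search.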

D. The Validation of Classification
Models were validated with a 60% hold-out scheme: 60% of the data were used for testing and the remaining 40% for training. Model performance was inferred from the AUC on the test set. The proposed method is summarized in Figure 1.
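The 60% hold-out scheme can be sketched as a seeded random split. The function name and seed are hypothetical, and the paper does not say whether the split was stratified by opinion class; this sketch uses a plain (unstratified) shuffle.

```python
import random


def holdout_split(X, y, test_fraction=0.6, seed=0):
    """Shuffle sample indices and reserve `test_fraction` of them for testing.

    Returns (train_X, train_y, test_X, test_y).
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(test_fraction * len(X))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])
```

With 150 reports this yields 90 test and 60 training samples. A stratified variant, splitting each opinion class separately, would guarantee that all four classes appear in both subsets.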

III. RESULT AND ANALYSIS
Table 1 lists the test-phase AUC values of KNN, SVM-RBF, and J48 in descending order. Table 2 gives Gorunescu's interpretation of classification performance by AUC value. KNN outperforms both SVM-RBF and J48. However, with an AUC of 61.11%, KNN's performance is still classified as "poor classification" [3].
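The AUC used for model comparison can be computed without plotting an ROC curve: for one class it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below assumes a one-vs-rest AUC per opinion class, averaged with weights equal to class frequency; the paper does not state how its four-class AUC was aggregated. `interpret_auc` encodes the commonly cited grading bands (0.90 to 1.00 excellent, 0.80 to 0.90 good, 0.70 to 0.80 fair, 0.60 to 0.70 poor, below 0.60 failure), assumed here to match Table 2, which is not reproduced in this section. All function names are hypothetical.

```python
def binary_auc(scores, positives):
    """P(random positive outranks random negative); ties count 1/2.

    Requires at least one positive and one negative example.
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_ovr_auc(score_dicts, y_true, classes):
    """One-vs-rest AUC per class, weighted by class frequency in y_true."""
    total = 0.0
    for c in classes:
        scores = [d.get(c, 0.0) for d in score_dicts]
        positives = [label == c for label in y_true]
        total += binary_auc(scores, positives) * sum(positives)
    return total / len(y_true)


def interpret_auc(auc):
    """Map an AUC value to an assumed qualitative grade."""
    if auc >= 0.90:
        return "excellent classification"
    if auc >= 0.80:
        return "good classification"
    if auc >= 0.70:
        return "fair classification"
    if auc >= 0.60:
        return "poor classification"
    return "failure"
```

Under these bands, `interpret_auc(0.6111)` returns `"poor classification"`, consistent with the KNN result reported above.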
Table 3 presents the F-measure of each class for the KNN model. This per-class measure gives a more detailed view of model performance than the overall AUC. Table 3 shows that the model predicts the WTP class better than the other classes. However, this advantage stems from class imbalance: WTP has 75 reports, three times as many as each of the other classes, and classifiers trained on imbalanced data tend to be biased toward the majority class. Overall, classification performance is poor in every class.
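The per-class F-measure, and how a majority class can inflate it, can be illustrated with a short sketch. The function name and the toy labels are hypothetical, and returning 0.0 when precision or recall is undefined is an assumed convention.

```python
def per_class_f_measure(y_true, y_pred, classes):
    """F1 per class, treating that class as positive and all others as negative."""
    result = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # Assumed convention: F1 is 0.0 when precision + recall is 0.
        result[c] = 2 * precision * recall / denom if denom else 0.0
    return result


# Toy example: a classifier that always predicts the majority class.
y_true = ["WTP"] * 3 + ["WDP", "TW", "TMP"]
y_pred = ["WTP"] * 6
scores = per_class_f_measure(y_true, y_pred, ["WTP", "WDP", "TW", "TMP"])
```

In this toy case WTP scores an F-measure of about 0.67 while every minority class scores 0, even though the classifier has learned nothing beyond the class frequencies.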

IV. CONCLUSION
Based on the test results in this paper, KNN classification performance is still very poor and cannot yet be applied directly to Itama's work. Further research is needed to improve classification performance for opinion anomaly detection, for example by increasing the amount of data and by using additional features of public sector financial reports. Because this paper uses only 150 financial statements and 9 features, classification performance is expected to improve with more data and more features.