Weighted k Nearest Neighbor Using Grey Relational Analysis To Solve Missing Value

Desepta Isna Ulumi, Daniel Siahaan

Abstract


Software defect prediction models play an important role in detecting the software components most vulnerable to errors. Several studies have sought to improve the accuracy of software defect prediction in order to manage human resources, cost, and time. However, previous research used specific datasets, and there is not yet a generic approach to dataset handling for software defect prediction models. This research proposes improvements to software defect prediction results on a merged dataset, called the generic dataset, whose source datasets have different numbers of features. To balance the number of features, the features absent from each dataset are treated as missing values and filled in. The missing values were filled using the Weighted k Nearest Neighbor (WkNN) method. Because a relevant set of features was needed, a feature selection method was then applied, and Naïve Bayes was used to classify the data using the selected features. The results showed that, on seven NASA public MDP datasets, Naïve Bayes with Information Gain (IG) or Symmetric Uncertainty (SU) feature selection gave the best balance value.
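
The imputation step can be illustrated with a minimal sketch. The abstract does not state the exact procedure, so the following are assumptions: neighbors are ranked by grey relational grade (as the title suggests), features are min-max scaled, the distinguishing coefficient `rho` is set to the commonly used 0.5, and the function name `gra_wknn_impute` is hypothetical rather than taken from the paper.

```python
import numpy as np

def gra_wknn_impute(X, k=3, rho=0.5):
    """Fill NaNs using k donor rows ranked by grey relational grade (sketch)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Donor rows: instances with no missing values (assumed to exist).
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("need at least one complete row")
    # Min-max scale with donor statistics so features are comparable.
    lo, hi = complete.min(axis=0), complete.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    donors = (complete - lo) / span
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        miss = np.isnan(X[i])
        obs = ~miss
        if not obs.any():
            # Nothing observed to compare on: fall back to donor column means.
            filled[i] = complete.mean(axis=0)
            continue
        target = (X[i, obs] - lo[obs]) / span[obs]
        delta = np.abs(donors[:, obs] - target)
        dmin, dmax = delta.min(), delta.max()
        if dmax == 0:
            grc = np.ones_like(delta)
        else:
            # Grey relational coefficient per donor and observed feature.
            grc = (dmin + rho * dmax) / (delta + rho * dmax)
        grg = grc.mean(axis=1)              # grey relational grade per donor
        top = np.argsort(grg)[::-1][:k]     # k most related donors
        w = grg[top] / grg[top].sum()       # weights proportional to grade
        filled[i, miss] = w @ complete[top][:, miss]
    return filled
```

Calling `gra_wknn_impute(X, k=2)` on an array with `np.nan` entries returns a copy in which each missing entry is a grade-weighted average of the corresponding values in the two most related complete rows; observed values are left unchanged.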


Keywords


Software Defect; NASA Public MDP; Weighted KNN; Naive Bayes

DOI: http://dx.doi.org/10.12962/j20882033.v29i3.5011

Creative Commons License

IPTEK Journal of Science and Technology by Lembaga Penelitian dan Pengabdian kepada Masyarakat, ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://iptek.its.ac.id/index.php/jts.