Software Fault Prediction Using Filtering Feature Selection in Cluster-Based Classification

Fachrul Pralienka Bani Muhamad, Daniel Oranova Siahaan, Chastine Fatichah

Abstract


Highly accurate software fault prediction can reduce testing effort and improve software quality. Previous researchers have proposed combining Entropy-Based Discretization (EBD) with Cluster-Based Classification (CBC). However, irrelevant and redundant features in software fault datasets tend to lower prediction accuracy. This study proposes improving CBC outcomes by integrating filtering feature selection methods, namely Information Gain (IG), Gain Ratio (GR), and One-R (OR). Experiments on two public NASA MDP datasets (PC2 and PC3) show that combining CBC with IG yields better average results than combining it with GR or OR, with an average probability of detection (pd) of 67.52% and an average probability of false alarm (pf) of 37.42%. By comparison, CBC without feature selection yields an average pd of 65.38% and an average pf of 49.95%. IG therefore improves CBC outcomes, increasing average pd by 2.14 percentage points and reducing average pf by 12.53 percentage points.
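The filter methods named in the abstract score each feature independently of the classifier, and pd and pf are read off a confusion matrix. The Python sketch below is illustrative only and is not the authors' implementation. It assumes the features have already been discretized (for example by EBD) into categorical bins and that the fault label is binary (1 = faulty). All function names and the toy data are hypothetical, and the One-R score is simplified to the training-set accuracy of a one-rule classifier rather than a cross-validated estimate.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def _groups(feature, labels):
    """Map each feature value to the list of class labels observed with it."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return groups


def info_gain(feature, labels):
    """Information Gain: IG = H(class) - H(class | feature)."""
    n = len(labels)
    conditional = sum(len(ys) / n * entropy(ys) for ys in _groups(feature, labels).values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Gain Ratio: IG divided by the split information (entropy of the feature itself)."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0


def one_r_accuracy(feature, labels):
    """Simplified One-R score: training accuracy when predicting the majority class per value."""
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in _groups(feature, labels).values())
    return correct / len(labels)


def rank_features(columns, labels, score=info_gain, k=None):
    """Filter-style selection: rank features by score (highest first), optionally keep the top k."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)
    return ranked if k is None else ranked[:k]


def pd_pf(y_true, y_pred, positive=1):
    """pd = TP / (TP + FN), pf = FP / (FP + TN); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    # Hypothetical toy data: two discretized metrics and a binary fault label.
    faulty = [0, 0, 1, 1, 1, 0]
    columns = {
        "loc_bin": ["low", "low", "high", "high", "high", "low"],
        "noise_bin": ["a", "b", "a", "b", "a", "b"],
    }
    for name, col in columns.items():
        print(name, round(info_gain(col, faulty), 3), round(gain_ratio(col, faulty), 3),
              round(one_r_accuracy(col, faulty), 3))
    print(rank_features(columns, faulty, score=gain_ratio, k=1))
    print(pd_pf([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this sketch the number of retained features (k) is a free parameter; only the selected columns would then be passed on to the classifier, which is the general pattern of filter-based feature selection.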

Keywords


Cluster-Based Classification; Entropy-Based Discretization; Filtering Feature Selection; Software Fault Prediction

DOI: http://dx.doi.org/10.12962/j23546026.y2018i1.3508

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.