Algorithms Comparison for Non-Requirements Classification using the Semantic Feature of Software Requirement Statements

Achmad An'im Fahmi, Daniel Siahaan


Noise in a Software Requirements Specification (SRS) is either an irrelevant requirements statement or a non-requirements statement. Such noise can confuse the reader and can have negative repercussions in later stages of software development. This study proposes a classification model to detect the second type of noise, the non-requirements statement. The model is built on the semantic features of non-requirements statements. The study also compares the five best supervised machine learning methods to date: support vector machine (SVM), naïve Bayes (NB), random forest (RF), k-nearest neighbor (kNN), and decision tree. The comparison aims to determine which method produces the best non-requirements classification model. The results show that the SVM method produces the best model, with an average accuracy of 0.96. The most significant features in this non-requirements classification model are the requirements statement or non-requirements, id statement, normalized mean value, standard deviation value, similarity variant value, standard deviation normalization value, maximum normalized value, similarity variant normalization value, value Bad NN, mean value, number of sentences, bad VB score, and project id.
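Several of the features named above (mean, standard deviation, variance, and maximum of a similarity value) are summary statistics over similarity scores. The abstract does not say how the similarity itself is computed, so the following is only a minimal sketch under a stated assumption: each statement is compared with every other statement using a plain bag-of-words cosine similarity. The function names `cosine` and `similarity_features` and the sample statements are hypothetical and are not taken from the paper.

```python
import math
import statistics
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_features(statements: list[str]) -> list[dict[str, float]]:
    """Summarize each statement's similarity to all other statements.

    Assumption: bag-of-words cosine similarity stands in for whatever
    semantic similarity measure the study actually uses.
    """
    bags = [Counter(s.lower().split()) for s in statements]
    rows = []
    for i, bag in enumerate(bags):
        # Similarity of statement i to every other statement.
        sims = [cosine(bag, other) for j, other in enumerate(bags) if j != i]
        rows.append({
            "mean": statistics.fmean(sims),
            "std": statistics.pstdev(sims),
            "variance": statistics.pvariance(sims),
            "max": max(sims),
        })
    return rows


# Hypothetical sample: three requirement-like statements and one that is not.
statements = [
    "The system shall allow the user to log in",
    "The system shall allow the user to log out",
    "The system shall allow the admin to reset a user password",
    "This chapter was written during the second project meeting",
]
features = similarity_features(statements)
for text, row in zip(statements, features):
    print(f"mean={row['mean']:.2f} max={row['max']:.2f}  {text}")
```

In this toy sample the last statement shares almost no vocabulary with the others, so its mean similarity is the lowest of the four. Feature vectors of this kind could then be passed to any of the five classifiers compared in the study.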


Noise, SVM Classification, Irrelevant Requirements Statement, Non-Requirements Statement, Requirements Specifications.








IPTEK Journal of Science and Technology by Lembaga Penelitian dan Pengabdian kepada Masyarakat, ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.