Predicting Popularity of Movie Using Support Vector Machines

Dwi Rantini, Rosyida Inas, Santi Wulan Purnami


Many movies are released, with ratings ranging from low to high, and any of them may or may not be popular. A movie watched by many people can be considered popular, whereas a movie watched by few people can be considered not popular. Movie popularity can be determined by several factors, such as likes, ratings, and comments. To classify movies as popular or not popular based on these features, this research uses two classification methods: logistic regression and Support Vector Machine (SVM). The data are the Conventional and Social Media Movies (CSM) Dataset 2014 and 2015. To obtain the best model without ignoring the principle of parsimony, feature selection is performed. The selected features are genre, sentiment, likes, and comments, and these features are used to classify movie popularity. Logistic regression achieves an accuracy of 77.29%, while SVM achieves 83.78%. Based on these accuracies, SVM gives the higher accuracy on the CSM dataset. The highest accuracy is obtained from SVM with a non-stratified holdout training-testing strategy.
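The holdout training-testing strategy and the accuracy measure mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the 70/30 split fraction, the function names `holdout_split` and `accuracy`, and the toy labels are all assumptions made for illustration, and the classifiers themselves (logistic regression, SVM) are left out. It only shows how a non-stratified split differs from a stratified one and how accuracy is computed.

```python
import random
from collections import defaultdict


def holdout_split(labels, test_fraction=0.3, stratified=False, seed=0):
    """Return (train_indices, test_indices) for a single holdout split."""
    rng = random.Random(seed)
    if not stratified:
        # Non-stratified: shuffle all indices together and cut once,
        # so the class mix of the test set is left to chance.
        indices = list(range(len(labels)))
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        return sorted(indices[n_test:]), sorted(indices[:n_test])

    # Stratified: split each class separately so that class
    # proportions are preserved in both parts.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy labels: 1 = popular, 0 = not popular (not the CSM data).
labels = [1] * 40 + [0] * 60
train_idx, test_idx = holdout_split(labels, test_fraction=0.3, stratified=False)
strat_train_idx, strat_test_idx = holdout_split(labels, test_fraction=0.3, stratified=True)
```

In practice, a classifier would be fitted on the rows selected by `train_idx` and its predictions on the rows in `test_idx` would be passed to `accuracy`; the percentages reported above are accuracies of this kind.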



Logistic Regression; Movie; Predicting Popularity; Support Vector Machines








Inferensi by Department of Statistics ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN:  0216-308X

e-ISSN: 2721-3862
