Tax Complaints Classification on Twitter Using Text Mining

Prita Dellia, Aris Tjahyanto

Abstract


The growth and widespread use of Twitter have produced a vast amount of textual information and made it easy for people to voice complaints. This has led the Directorate General of Taxation to use Twitter to handle tax complaints from the community. However, messages on Twitter can contain any kind of information, whether or not it is a tax complaint, which makes complaint handling difficult. Identifying tax complaints automatically is therefore important so that they can be handled effectively and efficiently. Given these problems, tax complaints on Twitter need to be classified with the support of text mining. Several classification methods are available, such as Naïve Bayes classifiers, Support Vector Machine (SVM), and Decision Tree. This research aims to classify tax complaints on Twitter automatically using text mining. The experimental results show that the F-measure values of SVM, Naïve Bayes, and Decision Tree are 89.3%, 85.6%, and 76.9%, respectively.
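
For readers unfamiliar with the metric reported above, the F-measure is commonly computed as the harmonic mean of precision and recall. The abstract does not state which variant was used, so the sketch below assumes the balanced F1 form (beta = 1). The function name `f_measure` and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) from confusion-matrix counts.

    tp: complaints correctly flagged as complaints
    fp: non-complaints wrongly flagged as complaints
    fn: complaints that were missed
    beta: weight of recall relative to precision (1.0 gives F1; an assumption here)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing was classified correctly.
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts for illustration only (not results from the paper).
precision, recall, f1 = f_measure(tp=80, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because the F-measure combines both error types, a classifier cannot score well by flagging every tweet as a complaint (high recall, low precision) or by flagging almost none (high precision, low recall).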

Keywords


Classification, Twitter, Tax Complaints, Text Mining


DOI: http://dx.doi.org/10.12962/j23378530.v2i1.a2254



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.