Film Review Sentiment Analysis: Comparison of Logistic Regression and Support Vector Classification Performance Based on TF-IDF
DOI: https://doi.org/10.33633/jais.v8i3.9090
Abstract
Film sentiment analysis evaluates the sentiment expressed in film reviews so that positive and negative responses to a film can be identified. This study performs sentiment analysis on film reviews from IMDb to determine which reviews from film critics are positive and which are negative. The reviews are preprocessed and weighted with TF-IDF, and the processed reviews are then classified as positive or negative using logistic regression and support vector classification. The dataset consists of 2000 IMDb film reviews, of which 1000 are positive and 1000 are negative. The data are preprocessed first and then split into 70% training data and 30% testing data. Logistic regression obtains a test accuracy of 80.61%, while support vector classification obtains a test accuracy of 82.42%.
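The pipeline described in the abstract (70/30 split, TF-IDF weighting, then two linear classifiers) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the six toy reviews, all function names, the unsmoothed IDF variant, and the SGD training loop are hypothetical choices made for the example. The logistic loss stands in for logistic regression and the hinge loss for a linear support vector classifier; the kernel and hyperparameters used in the study are not given in the abstract.

```python
import math
import random
from collections import Counter


def split_70_30(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_idf(train_docs):
    """Unsmoothed IDF (an assumed variant): idf(t) = ln(N / df(t)) + 1."""
    n = len(train_docs)
    df = Counter(t for doc in train_docs for t in set(doc.lower().split()))
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector as a dict; terms not seen by fit_idf are dropped."""
    tokens = doc.lower().split()
    counts = Counter(t for t in tokens if t in idf)
    return {t: (c / len(tokens)) * idf[t] for t, c in counts.items()}


def score(w, b, x):
    """Linear decision value w.x + b for a sparse vector x."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_linear(rows, labels, loss, epochs=300, lr=0.5):
    """Plain SGD on a linear model; labels are +1 or -1.

    loss="logistic" minimises the logistic loss (logistic regression);
    loss="hinge" minimises the hinge loss (linear SVM-style classifier).
    No regularisation is applied in this sketch.
    """
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * score(w, b, x)
            if loss == "logistic":
                # Negative gradient of log(1 + exp(-margin)) w.r.t. the score.
                step = y / (1.0 + math.exp(min(margin, 700.0)))
            else:
                # Hinge loss only updates when the margin is below 1.
                step = y if margin < 1.0 else 0.0
            for t, v in x.items():
                w[t] = w.get(t, 0.0) + lr * step * v
            b += lr * step
    return w, b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0.0 else -1


def accuracy(y_true, y_pred):
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


# A 70/30 split of 2000 review indices, matching the sizes in the abstract.
train_idx, test_idx = split_70_30(list(range(2000)))
print(len(train_idx), len(test_idx))

# Hypothetical toy reviews (not from the study's data); +1 positive, -1 negative.
reviews = [
    "great acting and a wonderful story",
    "wonderful film with great music",
    "a great and moving film",
    "boring plot and awful acting",
    "awful film with a boring story",
    "a boring and awful script",
]
labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(reviews)
rows = [tfidf(r, idf) for r in reviews]
models = {name: train_linear(rows, labels, name) for name in ("logistic", "hinge")}
for name, (w, b) in models.items():
    preds = [predict(w, b, x) for x in rows]
    print(name, "training accuracy:", accuracy(labels, preds))
```

The toy corpus is too small to hold out a meaningful test set, so the last loop reports training accuracy only. In a real run the IDF table would be fitted on the training split alone and accuracy measured on the held-out 30%.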
License
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).