Sentiment Analysis for Political Debates on YouTube Comments using BERT Labeling, Random Oversampling, and Multinomial Naïve Bayes
DOI:
https://doi.org/10.62411/jcta.11668
Keywords:
BERT, Candidate Debates, Indonesian Presidential Election, Naïve Bayes, Sentiment Analysis, Random Oversampling, YouTube Comments
Abstract
The 2024 Indonesian Presidential Election marked the fifth general election in the country, aimed at electing a new President and Vice President for the 2024–2029 term. Candidates competed to succeed the outgoing president, who had served two constitutional terms. A key aspect of this election was the candidate debates, where each candidate presented their vision, allowing the public to assess their policies. These debates were broadcast on platforms like YouTube, giving the public a space to comment. However, analyzing YouTube comments is challenging because of the volume of data, the diversity of languages, and the informal expressions used. Sentiment analysis, which is crucial for understanding public opinion, can use algorithms such as Naïve Bayes, which is based on Bayes' Theorem and assumes that features are independent. Naïve Bayes is widely used in text analysis for its speed and simplicity. When applied to YouTube comments from the 2024 debates, the algorithm proved effective, especially when the dataset was balanced through random oversampling. On an 80:20 data split, it achieved 85.155% accuracy, high precision and recall, and an AUC of 96.8%. Its fast classification time (0.000998 seconds) makes it suitable for real-time sentiment analysis and supports its use for political events. Future applications may incorporate advanced techniques like BERT for more sophisticated analysis.
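The two core steps named in the abstract, random oversampling followed by Multinomial Naïve Bayes, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the toy comments, the labels, the whitespace tokenizer, the Laplace smoothing value `alpha=1.0`, and the names `random_oversample` and `MultinomialNB` are all assumptions made for illustration, and the BERT labeling step and the 80:20 split are omitted.

```python
import math
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


class MultinomialNB:
    """Word-count Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant, an assumed default

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior plus the sum of per-word log likelihoods
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up, imbalanced toy data: three positive comments, one negative.
texts = [
    "great vision and clear policy",
    "strong clear answer",
    "good debate great answer",
    "weak answer bad policy",
]
labels = ["positive", "positive", "positive", "negative"]

balanced_texts, balanced_labels = random_oversample(texts, labels)
model = MultinomialNB().fit(balanced_texts, balanced_labels)
prediction = model.predict("bad weak policy")
print(Counter(balanced_labels), prediction)
```

In a fuller pipeline one would typically split the data first and oversample only the training portion, so that duplicated comments do not leak into the test set; whether the study did so is not stated in the abstract.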
Copyright (c) 2025 Apriandy Angdresey, Lanny Sitanayah, Ignatius Lucky Henokh Tangka

This work is licensed under a Creative Commons Attribution 4.0 International License.