Sentence-Level Sentiment Analysis of Indonesian App Reviews Using IndoBERTweet

Inge Najwa Aqiilah; Ristu Saptono; Akhmad Syaifuddin

doi:10.62411/jcta.16240

Authors

Inge Najwa Aqiilah, Sebelas Maret University
Ristu Saptono, Sebelas Maret University
Akhmad Syaifuddin, Sebelas Maret University

Keywords:

Sentence-level Sentiment Analysis, IndoBERTweet, IndoBERT, Indonesian App Reviews, Natural Language Processing, Transformer-based Classification, User-Generated Content Analytics, Sustainable Digital Services

Abstract

Document-level sentiment analysis assigns a single polarity label to an entire review, often obscuring opinion diversity within multi-sentence submissions. This limitation is particularly evident in reviews of multi-service platforms, where users frequently express heterogeneous opinions toward different aspects of the platform in the same review. To address this challenge, this study proposes a sentence-level sentiment analysis framework for Indonesian Gojek app reviews collected from the Google Play Store. The proposed framework introduces a two-stage segmentation strategy that combines punctuation-aware rules with conjunction-aware splitting based on coordinating and adversative conjunctions (e.g., tapi [but], padahal [even though]) to identify opinion boundaries and decompose mixed-sentiment reviews into independently classifiable sentence units. A total of 14,730 raw reviews collected between May and July 2025 were subjected to data cleaning and quality filtering, resulting in 7,187 valid reviews that were further segmented into 14,187 sentence-level instances. Each instance was manually annotated by three annotators using a four-class labeling scheme consisting of app-positive, app-negative, app-neutral, and service categories. Sentiment-level inter-annotator agreement, computed on the subset of instances unanimously categorized as app-related by all three annotators (n = 4,384), achieved substantial agreement (Fleiss' κ = 0.636). Hyperparameter optimization was conducted using Optuna with the Tree-structured Parzen Estimator (TPE) sampler across four experimental scenarios. The best performance was achieved by IndoBERTweet under Stratified K-Fold evaluation, attaining an accuracy of 0.751 and a macro F1-score of 0.729, outperforming all IndoBERT configurations.
The results demonstrate the effectiveness of domain-adaptive pre-training on informal Indonesian text and highlight the value of conjunction-aware segmentation for preserving fine-grained opinion structures in mixed-sentiment reviews. These findings suggest that domain-aligned language representations provide a practical and effective solution for sentence-level sentiment analysis of Indonesian app reviews.
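
As an illustration of the two-stage segmentation idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that stage one splits on sentence-final punctuation and stage two splits before a conjunction. The conjunction list contains only the two examples named in the abstract (tapi, padahal); the function name `segment_review` and the regular expressions are hypothetical.

```python
import re

# Assumption: only the two conjunctions named in the abstract are listed here;
# the study's full conjunction inventory is not given on this page.
CONJUNCTIONS = ("tapi", "padahal")

# Stage 1 (punctuation-aware): break after '.', '!' or '?' followed by whitespace.
_PUNCT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# Stage 2 (conjunction-aware): break before a listed conjunction that appears
# as a whole word in the middle of a sentence, consuming an optional comma.
_CONJ_BOUNDARY = re.compile(
    r",?\s+(?=(?:" + "|".join(CONJUNCTIONS) + r")\b)",
    flags=re.IGNORECASE,
)


def segment_review(review: str) -> list[str]:
    """Split one review into sentence-level units (hypothetical helper)."""
    units = []
    for sentence in _PUNCT_BOUNDARY.split(review.strip()):
        for unit in _CONJ_BOUNDARY.split(sentence):
            unit = unit.strip(" ,")
            if unit:  # drop empty fragments
                units.append(unit)
    return units


if __name__ == "__main__":
    example = "Aplikasinya bagus, tapi drivernya lama. Promo banyak!"
    for unit in segment_review(example):
        print(unit)
```

In this sketch a conjunction at the very start of a sentence does not trigger a split, since there is no preceding clause to separate; whether the study handles that case the same way is not stated in the abstract.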

Author Biographies

Inge Najwa Aqiilah, Sebelas Maret University

Faculty of Information Technology and Data Science, Sebelas Maret University, Surakarta 57126, Indonesia

Ristu Saptono, Sebelas Maret University

Faculty of Information Technology and Data Science, Sebelas Maret University, Surakarta 57126, Indonesia

Akhmad Syaifuddin, Sebelas Maret University

Faculty of Information Technology and Data Science, Sebelas Maret University, Surakarta 57126, Indonesia

This journal is licensed under a Creative Commons Attribution 4.0 International License.