Part-of-Speech Tagging of Javanese Using a Pre-Trained Bidirectional Encoder Representations from Transformers Model

Ahmad Izzuddin; Nuzul Hikmah; Muhammad Alvin Ajry

doi:10.33633/joins.v11i1.14923

Authors

Ahmad Izzuddin (Universitas Panca Marga)
Nuzul Hikmah (Universitas Panca Marga)
Muhammad Alvin Ajry (Universitas Panca Marga)

Keywords:

BERT, deep learning, Javanese language, part-of-speech tagging

Abstract

Part-of-speech (POS) tagging, the process of assigning a word class to each word in a text, is an important task in natural language processing. For Javanese, POS tagging remains challenging because of limited linguistic resources and the complexity of the language. Building on advances in deep learning, this study fine-tunes BERT (Bidirectional Encoder Representations from Transformers) to classify word classes in Javanese, a low-resource language. The javanese-bert-small model was fine-tuned on the UD_Javanese-CSUI dataset and evaluated using precision, recall, F1-score, and accuracy. The model achieved an accuracy of 88.87% and remained stable during training, with no significant overfitting. These findings indicate that the BERT-based approach handles word class ambiguity in Javanese effectively and can serve as a foundation for further development of NLP systems for regional languages.
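The abstract does not describe how word-level tags are matched to BERT's subword tokens or how the metrics are averaged, so the following is only a minimal Python sketch under stated assumptions. It assumes the common token-classification recipe in which only the first subword of each word carries that word's POS tag, while later subwords and special tokens get -100 (the default ignore_index of PyTorch's CrossEntropyLoss), and it assumes macro-averaged precision, recall, and F1. The helper names `align_labels` and `pos_metrics` are hypothetical; the `word_ids` list mirrors what a Hugging Face fast tokenizer returns from `word_ids()`, and the example tokenization is invented for illustration rather than taken from javanese-bert-small.

```python
from collections import Counter

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def align_labels(word_ids, word_tags, tag2id):
    """Project word-level POS tags onto subword tokens.

    Assumption: only the first subword of each word is labeled; later
    subwords and special tokens (word id None) get IGNORE_INDEX.
    """
    labels = []
    previous = None
    for wid in word_ids:
        if wid is None or wid == previous:
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tag2id[word_tags[wid]])
        previous = wid
    return labels


def pos_metrics(gold, pred):
    """Token-level accuracy and macro-averaged precision, recall and F1.

    Assumption: macro averaging over every tag seen in gold or pred.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # tag p was predicted where it was wrong
            fn[g] += 1  # gold tag g was missed
    tags = set(gold) | set(pred)
    prec_sum = rec_sum = f1_sum = 0.0
    for t in tags:
        prec = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        rec = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        prec_sum, rec_sum, f1_sum = prec_sum + prec, rec_sum + rec, f1_sum + f1
    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": prec_sum / n,
        "recall": rec_sum / n,
        "f1": f1_sum / n,
    }


if __name__ == "__main__":
    # Hypothetical tokenization of "Aku mangan sega" ("I eat rice"):
    # [CLS] aku mang ##an sega [SEP]
    word_ids = [None, 0, 1, 1, 2, None]
    tag2id = {"NOUN": 0, "PRON": 1, "VERB": 2}
    print(align_labels(word_ids, ["PRON", "VERB", "NOUN"], tag2id))
    # → [-100, 1, 2, -100, 0, -100]
    print(pos_metrics(["PRON", "VERB", "NOUN", "ADP"],
                      ["PRON", "VERB", "VERB", "ADP"]))
```

Macro averaging weights every tag equally, so errors on rare tags lower the score more than they would under micro averaging. For single-label token tagging, micro-averaged precision, recall, and F1 all equal accuracy, which is why a macro average is the more informative complement to the accuracy figure.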
