Multi-label Classification of Indonesian Al-Quran Translation based CNN, BiLSTM, and FastText

Authors

  • Ahmad Rofiqul Muslikh University of Merdeka Malang
  • Ismail Akbar University of Merdeka Malang
  • De Rosal Ignatius Moses Setiadi Universitas Dian Nuswantoro http://orcid.org/0000-0001-6615-4457
  • Hussain Md Mehedul Islam

DOI:

https://doi.org/10.62411/tc.v23i1.9925

Keywords:

Bi-LSTM, CNN, FastText, Multi-label text classification, Quran translation

Abstract

Studying the Qur'an is a pivotal act of worship in Islam, and it requires a structured understanding of the verses to support learning and referencing. Each Quranic verse carries several thematic elements and can therefore belong to more than one category. This study examines whether integrating FastText improves a multi-label classification model. Using a CNN+Bi-LSTM architecture, it classifies Indonesian Quran translations into categories such as Tauhid, Ibadah, Akhlak, and Sejarah. Evaluation by F1-Score shows a clear difference between the two models. The CNN+Bi-LSTM model without FastText reaches at most 68.70%, in the 80:20 testing configuration. The CNN+Bi-LSTM+FastText model, tuned over embedding size and number of epochs, reaches 73.30% with an embedding size of 200, 100 epochs, and the 90:10 testing configuration. These findings indicate that FastText improves the model, raising the F1-Score by 4.6 percentage points over the base model.
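To make the evaluation arithmetic concrete, the following is a minimal sketch in plain Python, not the authors' code. It assumes a micro-averaged F1 over binary label indicators (the abstract does not say which averaging was used), and the toy verses, `LABELS`, and `micro_f1` are hypothetical names introduced only for illustration. The two reported scores are taken from the abstract.

```python
# Illustrative sketch only: toy data, not the paper's dataset or code.
LABELS = ["Tauhid", "Ibadah", "Akhlak", "Sejarah"]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over binary indicator rows (one row per verse).

    Assumption: micro averaging; the abstract does not state the variant.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly assigned
            elif t == 0 and p == 1:
                fp += 1  # label assigned but not present
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical verses: each row marks which of the four labels apply.
y_true = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
toy_f1 = micro_f1(y_true, y_pred)

# F1-Scores reported in the abstract, in percent.
baseline_f1 = 68.70   # CNN+Bi-LSTM without FastText
fasttext_f1 = 73.30   # CNN+Bi-LSTM+FastText

# Difference in percentage points, and the same gain relative to the baseline.
absolute_gain = round(fasttext_f1 - baseline_f1, 2)
relative_gain = round(100 * absolute_gain / baseline_f1, 2)

print(f"toy micro-F1: {toy_f1:.2f}")
print(f"gain: {absolute_gain} percentage points ({relative_gain}% relative)")
```

The distinction in the last two lines matters when comparing with other studies: the reported margin is a difference between two percentages, which is not the same quantity as a relative improvement over the baseline.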

Author Biography

Ahmad Rofiqul Muslikh, University of Merdeka Malang

Scopus ID: 57282930500

Published

2024-06-18