Ranking Arabic-Language Documents by Query Using the Naïve Bayes and K-Nearest Neighbor Classification Methods
DOI:
https://doi.org/10.33633/tc.v19i4.3939
Keywords:
document ranking, naïve bayes, k-nearest neighbor
Abstract
Research on document ranking in information retrieval is now easy to find, reflecting how quickly the field of information mining is advancing. However, studies that use Arabic as their object remain limited. Because Arabic documents are rarely used in information mining research, the author tries a simple approach: implementing the naïve Bayes and k-Nearest Neighbor (k-NN) classification methods. The aim of this study is to determine whether classification methods, naïve Bayes and k-NN in particular, can be used for ranking, and to compare the accuracy of the two methods. The results show that ranking with classification methods is feasible, with naïve Bayes achieving better accuracy than k-NN: an average F1 measure of 72%, an average precision of 75%, and an average recall of 80%. The k-NN method, by comparison, obtained an average F1 measure of 70%, an average precision of 76%, and an average recall of 79%. The study is still limited in its technique, because the stemming step was omitted. The author therefore recommends that future work include stemming and use more recent ranking methods.
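The averaged precision, recall, and F1 figures in the abstract can be read as per-query scores averaged over all queries. The abstract does not state the averaging scheme, so the sketch below assumes macro-averaging (compute each metric per query, then take the mean). The function names and the toy document IDs are hypothetical and are not taken from the study. Under this assumption, the average F1 can fall below both the average precision and the average recall.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall (0 when there are no hits).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def macro_average(per_query):
    """Average each of the three metrics over all queries."""
    n = len(per_query)
    return tuple(sum(m[i] for m in per_query) / n for i in range(3))


# Hypothetical toy data: (retrieved IDs, relevant IDs) for two queries.
# These are NOT results from the study.
runs = [
    (["d1", "d2"], ["d1", "d2", "d3", "d4"]),  # precise but incomplete
    (["d5", "d6", "d7", "d8"], ["d5", "d6"]),  # complete but noisy
]

per_query = [precision_recall_f1(ret, rel) for ret, rel in runs]
avg_p, avg_r, avg_f1 = macro_average(per_query)
print(f"precision={avg_p:.3f} recall={avg_r:.3f} f1={avg_f1:.3f}")
```

With this toy data the average F1 is lower than both averages it is derived from, because F1 is computed per query before averaging rather than from the averaged precision and recall.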
License
Copyright (c) 2020 Usfita Kiftiyani

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
License Terms
All articles published in Techno.COM Journal are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). This means:
1. Attribution
Readers and users are free to:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.
As long as proper credit is given to the original work by citing the author(s) and the journal.
2. Non-Commercial Use
- The material cannot be used for commercial purposes.
- Commercial use includes selling the content, using it in commercial advertising, or integrating it into products/services for profit.
3. Rights of Authors
- Authors retain copyright and grant Techno.COM Journal the right to publish the article.
- Authors can distribute their work (e.g., in institutional repositories or personal websites) with proper acknowledgment of the journal.
4. No Additional Restrictions
- The journal cannot apply legal terms or technological measures that restrict others from using the material in ways allowed by the license.
5. Disclaimer
- The journal is not responsible for how the published content is used by third parties.
- The opinions expressed in the articles are solely those of the authors.
For more details, visit the Creative Commons License Page:
https://creativecommons.org/licenses/by-nc/4.0/