Resampling Techniques for Handling Class Imbalance in Diabetes Classification Using C4.5, Random Forest, and SVM
DOI:
https://doi.org/10.33633/tc.v20i3.4762
Keywords:
Resampling, Class Imbalance, Classification, Area Under Curve (AUC)
Abstract
The number of people with diabetes worldwide continues to rise: the disease caused 4.6 million deaths in 2011, and the number of people living with diabetes is projected to reach 552 million globally by 2030. Diabetes may be prevented effectively by detecting it early. Data mining and machine learning continue to be developed as reliable tools for building computational models that identify diabetes at an early stage. However, a problem frequently encountered when analyzing diabetes data is class imbalance. Imbalanced classes make prediction difficult because the learning model is dominated by majority-class instances and therefore neglects the minority class. In this study we analyze the class imbalance problem and attempt to address it with a data-level approach, namely data resampling. The experiments use the R language with the ROSE library (version 0.0-4). The Pima Indians dataset was chosen because it is one of the datasets that exhibits class imbalance. The classification models are built with the C4.5 decision tree, RF (Random Forest), and SVM (Support Vector Machines) algorithms. In the experiments, the SVM classifier with the resampling technique that combines over- and under-sampling performed best, with an AUC (Area Under Curve) of 0.80.
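As a rough illustration of the two ideas the abstract relies on, combined over- and under-sampling and the AUC metric, here is a minimal Python sketch. It is not the ROSE implementation and not the authors' R code: the function names `resample_both` and `auc`, the parameters `p`, `n`, and `seed`, and the toy data are all assumptions made for the example. The resampler draws the minority share of the output from a binomial distribution, repeats minority rows with replacement, and keeps only a subset of majority rows without replacement.

```python
import random


def resample_both(rows, labels, minority, p=0.5, n=None, seed=0):
    """Hypothetical sketch of combined over- and under-sampling.

    Returns n rows in which the minority class makes up roughly a share p.
    """
    rng = random.Random(seed)
    n = len(rows) if n is None else n
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")
    # Size of the minority part of the output: one Binomial(n, p) draw.
    n_min = sum(rng.random() < p for _ in range(n))
    n_maj = n - n_min
    if n_maj > len(majority_idx):
        raise ValueError("n is too large to under-sample the majority class")
    # Over-sampling: minority rows are drawn with replacement (repeats allowed).
    picked = [rng.choice(minority_idx) for _ in range(n_min)]
    # Under-sampling: majority rows are drawn without replacement (a subset).
    picked += rng.sample(majority_idx, n_maj)
    rng.shuffle(picked)
    return [rows[i] for i in picked], [labels[i] for i in picked]


def auc(scores, labels, positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: 90 majority rows and 10 minority rows.
    rows = list(range(100))
    labels = ["neg"] * 90 + ["pos"] * 10
    new_rows, new_labels = resample_both(rows, labels, minority="pos", seed=42)
    print(new_labels.count("pos"), new_labels.count("neg"))
    # Three of the four positive/negative pairs are ranked correctly.
    print(auc([0.9, 0.8, 0.3, 0.2], ["pos", "neg", "pos", "neg"], "pos"))  # 0.75
```

In a real experiment the resampling would be applied to the training split only, so that the AUC is measured on data with the original class distribution.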
License
Copyright (c) 2021 Wahyu Nugraha, Raja Sabaruddin

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
License Terms
All articles published in Techno.COM Journal are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). This means:
1. Attribution
Readers and users are free to:
- Share – Copy and redistribute the material in any medium or format.
- Adapt – Remix, transform, and build upon the material.
As long as proper credit is given to the original work by citing the author(s) and the journal.
2. Non-Commercial Use
- The material cannot be used for commercial purposes.
- Commercial use includes selling the content, using it in commercial advertising, or integrating it into products/services for profit.
3. Rights of Authors
- Authors retain copyright and grant Techno.COM Journal the right to publish the article.
- Authors can distribute their work (e.g., in institutional repositories or personal websites) with proper acknowledgment of the journal.
4. No Additional Restrictions
- The journal cannot apply legal terms or technological measures that restrict others from using the material in ways allowed by the license.
5. Disclaimer
- The journal is not responsible for how the published content is used by third parties.
- The opinions expressed in the articles are solely those of the authors.
For more details, visit the Creative Commons License Page:
https://creativecommons.org/licenses/by-nc/4.0/