Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction

Authors

  • De Rosal Ignatius Moses Setiadi Dian Nuswantoro University https://orcid.org/0000-0001-6615-4457
  • Ahmad Rofiqul Muslikh Universitas Merdeka Malang
  • Syahroni Wahyu Iriananda Universitas Widya Gama Malang
  • Warto Warto UIN Profesor Kiai Haji Saifuddin Zuhri
  • Jutono Gondohanindijo AKI University
  • Arnold Adimabua Ojugo Federal University of Petroleum Resources Effurun

DOI:

https://doi.org/10.62411/jcta.11638

Keywords:

Credit Approval Prediction, Data Preprocessing, Ensemble Learning, Gaussian Mixture Model, Imbalanced Data, Outlier Clustering, Outlier Detection, XGBoost

Abstract

Credit approval prediction is a critical challenge in the financial industry, where the accuracy and efficiency of credit decisions can significantly affect business risk. This study proposes an outlier detection method that uses the Gaussian Mixture Model (GMM) in combination with Extreme Gradient Boosting (XGBoost) to improve prediction accuracy. GMM detects outliers probabilistically, which allows finer-grained anomaly identification than distance- or density-based methods. The data cleaned by GMM is then classified with XGBoost, a decision tree-based boosting algorithm that handles complex datasets efficiently. The study compares the performance of XGBoost with various outlier detection methods, such as Local Outlier Factor (LOF), Cluster-Based Local Outlier Factor (CBLOF), DBSCAN, Isolation Forest (IF), and K-Means, as well as with various other machine learning and deep learning classification algorithms. Experimental results show that the combination of GMM and XGBoost performs best, with an accuracy of 95.493%, a recall of 91.650%, and an AUC of 95.145%, outperforming the other models for credit approval prediction on an imbalanced dataset. The proposed method was shown to reduce prediction errors and improve the model's reliability in detecting eligible credit applications.
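
The probabilistic outlier detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's GaussianMixture, synthetic two-dimensional data in place of the credit approval dataset, and a hypothetical rule that flags the rows whose log-likelihood falls below a chosen quantile. The function name gmm_outlier_mask and the parameters n_components and contamination are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_outlier_mask(X, n_components=2, contamination=0.05, seed=0):
    """Flag the lowest-likelihood rows of X as outliers (True = outlier)."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(X)
    # Per-sample log-likelihood under the fitted mixture.
    log_likelihood = gmm.score_samples(X)
    # Assumed rule: rows below the `contamination` quantile count as outliers.
    threshold = np.quantile(log_likelihood, contamination)
    return log_likelihood < threshold


# Synthetic stand-in data (not the credit approval dataset).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
group_b = rng.normal(loc=6.0, scale=1.0, size=(100, 2))
# Hand-placed points far from both groups, intended as anomalies.
far_points = np.array([[15.0, -10.0], [-12.0, 14.0], [20.0, 20.0],
                       [-15.0, -15.0], [3.0, 25.0]])
X = np.vstack([group_a, group_b, far_points])

mask = gmm_outlier_mask(X)
X_clean = X[~mask]  # rows kept for the downstream classifier
print(f"flagged {mask.sum()} of {len(X)} rows as outliers")
```

In a pipeline like the one the abstract describes, the labels would be filtered with the same mask, and the cleaned rows would then be passed to a classifier such as XGBoost. The quantile threshold here is only one possible choice; the abstract does not state how the study sets it.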

Author Biographies

De Rosal Ignatius Moses Setiadi, Dian Nuswantoro University

Scopus ID: 57200208474 Google Scholar: tFeuHLcAAAAJ Sinta ID: 6007744

Ahmad Rofiqul Muslikh, Universitas Merdeka Malang

Faculty of Information Technology, University of Merdeka, Malang, East Java 65146, Indonesia

Syahroni Wahyu Iriananda, Universitas Widya Gama Malang

Department of Informatics Engineering, Universitas Widya Gama Malang, Indonesia

Warto Warto, UIN Profesor Kiai Haji Saifuddin Zuhri

Informatics Department, Faculty of Dakwah, UIN Profesor Kiai Haji Saifuddin Zuhri, Purwokerto

Jutono Gondohanindijo, AKI University

Faculty of Technics and Informatics, AKI University, Semarang, Central Java 50136, Indonesia

Arnold Adimabua Ojugo, Federal University of Petroleum Resources Effurun

Department of Computer Science, Federal University of Petroleum Resources Effurun, Nigeria

Published

2024-11-01

How to Cite

Setiadi, D. R. I. M., Muslikh, A. R., Iriananda, S. W., Warto, W., Gondohanindijo, J., & Ojugo, A. A. (2024). Outlier Detection Using Gaussian Mixture Model Clustering to Optimize XGBoost for Credit Approval Prediction. Journal of Computing Theories and Applications, 2(2), 244–255. https://doi.org/10.62411/jcta.11638