Enhancing Default Prediction in P2P Lending using Random Forest and Grey Wolf Optimization-based Feature Selection
DOI: https://doi.org/10.33633/jais.v8i3.9234

Abstract
Online lending services such as peer-to-peer (P2P) loans let lenders transact directly with borrowers, without banks acting as intermediaries. Because lenders bear the default risk, identifying loan applicants who are likely to default is a crucial step in preventing financial losses. However, predicting default risk is challenging because P2P lending datasets contain many complex features. Some of these features are redundant, while others contribute little to an effective solution. Feature selection, the process of choosing a relevant subset of features from the input data, is therefore an important step. Traditional feature selection methods often fail to provide optimal results; a better approach is to use heuristic search algorithms, which can find near-optimal feature subsets. We employ Grey Wolf Optimization (GWO), a technique inspired by the leadership hierarchy and hunting mechanism of grey wolves, and pair it with Random Forest (RF), a classifier that has limitations on very high-dimensional data. Our GWO+RF combination improves classification performance over previous research, achieving an accuracy of 97.31%, compared with only 67.72% for RBM+RF, 64% for Binary PSO+ERT, and 92% for GA+RF reported in previous studies.

References
W. Yin, B. Kirkulak-Uludag, D. Zhu, and Z. Zhou, “Stacking ensemble method for personal credit risk assessment in Peer-to-Peer lending,” Appl. Soft Comput., vol. 142, 2023, doi: 10.1016/j.asoc.2023.110302.
Y. Rong, S. Liu, S. Yan, W. W. Huang, and Y. Chen, “Proposing a new loan recommendation framework for loan allocation strategies in online P2P lending,” Ind. Manag. Data Syst., vol. 123, no. 3, pp. 910–930, 2023, doi: 10.1108/IMDS-07-2022-0399.
P. C. Ko, P. C. Lin, H. T. Do, and Y. F. Huang, “P2P Lending Default Prediction Based on AI and Statistical Models,” Entropy, vol. 24, no. 6, 2022, doi: 10.3390/e24060801.
Y. Tan and G. Zhao, “Multi-view representation learning with Kolmogorov-Smirnov to predict default based on imbalanced and complex dataset,” Inf. Sci., vol. 596, pp. 380–394, 2022, doi: 10.1016/j.ins.2022.03.022.
V. Moscato, A. Picariello, and G. Sperlí, “A benchmark of machine learning approaches for credit score prediction,” Expert Syst. Appl., vol. 165, 2021, doi: 10.1016/j.eswa.2020.113986.
Y. R. Chen, J. S. Leu, S. A. Huang, J. T. Wang, and J. I. Takada, “Predicting Default Risk on Peer-to-Peer Lending Imbalanced Datasets,” IEEE Access, vol. 9, pp. 73103–73109, 2021, doi: 10.1109/ACCESS.2021.3079701.
K. Niu, Z. Zhang, Y. Liu, and R. Li, “Resampling ensemble model based on data distribution for imbalanced credit risk evaluation in P2P lending,” Inf. Sci., vol. 536, pp. 120–134, 2020, doi: 10.1016/j.ins.2020.05.040.
Y. Song, Y. Wang, X. Ye, D. Wang, Y. Yin, and Y. Wang, “Multi-view ensemble learning based on distance-to-model and adaptive clustering for imbalanced credit risk assessment in P2P lending,” Inf. Sci., vol. 525, pp. 182–204, 2020, doi: 10.1016/j.ins.2020.03.027.
M. J. Christ, R. N. P. Tri, W. Chandra, and T. Mauritsius, “Lending club default prediction using Naïve Bayes and decision tree,” Int. J. Adv. Trends Comput. Sci. Eng., vol. 8, no. 5, pp. 2528–2534, 2019, doi: 10.30534/ijatcse/2019/99852019.
A. Semiu and A. A. R. Gilal, “A boosted decision tree model for predicting loan default in P2P lending communities,” Int. J. Eng. Adv. Technol., vol. 9, no. 1, pp. 1257–1261, 2019, doi: 10.35940/ijeat.A9626.109119.
S. F. Chen, G. Chakraborty, L. H. Li, and C. T. Lin, “Credit risk assessment using regression model on P2P lending,” Int. J. Appl. Sci. Eng., vol. 16, no. 2, pp. 149–157, 2019.
W. Li, S. Ding, H. Wang, Y. Chen, and S. Yang, “Heterogeneous ensemble learning with feature engineering for default prediction in peer-to-peer lending in China,” World Wide Web, vol. 23, no. 1, pp. 23–45, 2020, doi: 10.1007/s11280-019-00676-y.
A. Byanjankar, M. Heikkila, and J. Mezei, “Predicting credit risk in peer-to-peer lending: A neural network approach,” Proc. - 2015 IEEE Symp. Ser. Comput. Intell. SSCI 2015, pp. 719–725, 2015, doi: 10.1109/SSCI.2015.109.
M. Malekipirbazari and V. Aksakalli, “Risk assessment in social lending via random forests,” Expert Syst. Appl., vol. 42, no. 10, pp. 4621–4631, 2015, doi: 10.1016/j.eswa.2015.02.001.
H. Li, Y. Zhang, N. Zhang, and H. Jia, “Detecting the Abnormal Lenders from P2P Lending Data,” Procedia Comput. Sci., vol. 91, pp. 357–361, 2016, doi: 10.1016/j.procs.2016.07.095.
C. Serrano-Cinca and B. Gutiérrez-Nieto, “The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending,” Decis. Support Syst., vol. 89, pp. 113–122, 2016, doi: 10.1016/j.dss.2016.06.014.
R. C. Chen, C. Dewi, S. W. Huang, and R. E. Caraka, “Selecting critical features for data classification based on machine learning methods,” J. Big Data, vol. 7, no. 1, 2020, doi: 10.1186/s40537-020-00327-4.
H. D. Wang, “Research on the features of car insurance data based on machine learning,” Procedia Comput. Sci., vol. 166, pp. 582–587, 2020, doi: 10.1016/j.procs.2020.02.016.
M. Papoušková and P. Hájek, “Two-stage consumer credit risk modelling using heterogeneous ensemble learning,” Decis. Support Syst., vol. 118, pp. 33–45, 2019.
S. Gu, R. Cheng, and Y. Jin, “Feature selection for high-dimensional classification using a competitive swarm optimizer,” Soft Comput., vol. 22, no. 3, pp. 811–822, 2018, doi: 10.1007/s00500-016-2385-6.
M. A. Muslim et al., “New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning,” Intell. Syst. with Appl., vol. 18, 2023, doi: 10.1016/j.iswa.2023.200204.
A. K. Sharma, L. H. Li, and R. Ahmad, “Default Risk Prediction Using Random Forest and XGBoosting Classifier,” Smart Innov. Syst. Technol., vol. 314, pp. 91–101, 2023, doi: 10.1007/978-3-031-05491-4_10.
J. Xu, D. Chen, and M. Chau, “Identifying features for detecting fraudulent loan requests on P2P platforms,” IEEE Int. Conf. Intell. Secur. Informatics Cybersecurity Big Data, ISI 2016, pp. 79–84, 2016, doi: 10.1109/ISI.2016.7745447.
S. F. Chen, G. Chakraborty, and L. H. Li, “Feature Selection on Credit Risk Prediction for Peer-to-Peer Lending,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 11717 LNAI, pp. 5–18, 2019, doi: 10.1007/978-3-030-31605-1_1.
N. Setiawan, Suharjito, and Diana, “A comparison of prediction methods for credit default on peer to peer lending using machine learning,” Procedia Comput. Sci., vol. 157, pp. 38–45, 2019, doi: 10.1016/j.procs.2019.08.139.
L. Cui, Y. Jiao, L. Bai, L. Rossi, and E. R. Hancock, “Adaptive feature selection based on the most informative graph-based features,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10310 LNCS, pp. 276–287, 2017, doi: 10.1007/978-3-319-58961-9_25.
L. He, H. Xu, and G. Y. Ke, “A hybrid predictive framework for evaluating P2P credit risks,” Grey Syst., vol. 12, no. 3, pp. 551–573, 2022, doi: 10.1108/GS-03-2021-0041.
C. Shen and K. Zhang, “Two-stage improved Grey Wolf optimization algorithm for feature selection on high-dimensional classification,” Complex Intell. Syst., vol. 8, no. 4, pp. 2769–2789, 2022, doi: 10.1007/s40747-021-00452-4.
S. Mirjalili, S. M. Mirjalili, and A. Lewis, “Grey Wolf Optimizer,” Adv. Eng. Softw., vol. 69, pp. 46–61, 2014, doi: 10.1016/j.advengsoft.2013.12.007.
T. Nguyen Truong, S. Khuat Thanh, T. Ngo Thi Thu, N. Nguyen Ha, and D. Tran Manh, “Improve Risk Prediction in Online Lending (P2P) Using Feature Selection and Deep Learning,” Int. J. Comput. Sci. Netw. Secur., vol. 19, no. 11, pp. 216–222, 2019.
L. Victor and M. Raheem, “Loan Default Prediction Using Genetic Algorithm: A Study within Peer-To-Peer Lending Communities,” Int. J. Innov. Sci. Res. Technol., vol. 6, no. 3, pp. 1195–1205, 2021.
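The GWO+RF feature-selection idea summarized in the abstract can be illustrated with a minimal wrapper sketch. It assumes a sigmoid-transfer binary variant of GWO built on the standard alpha/beta/delta position update of Mirjalili et al., and a fitness function that trades cross-validated RF error against the fraction of features kept. The function names `subset_fitness` and `binary_gwo_select`, the weight `alpha=0.99`, the transfer-function slope of 10, the population size, iteration count, and tree count are all hypothetical choices for illustration, not the configuration reported in the article. The synthetic data from `make_classification` stands in for a P2P lending dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def subset_fitness(mask, X, y, alpha=0.99, seed=0):
    """Score a boolean feature mask; lower is better.

    Assumed form: alpha * (cross-validated RF error) + (1 - alpha) * (fraction of features kept).
    """
    if not mask.any():
        return 1.0  # an empty subset cannot train a classifier: worst possible score
    rf = RandomForestClassifier(n_estimators=30, random_state=seed)
    acc = cross_val_score(rf, X[:, mask], y, cv=3).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size


def binary_gwo_select(X, y, n_wolves=5, n_iter=5, seed=0):
    """Hypothetical binary GWO wrapper around RF; returns (best_mask, best_fitness)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.random((n_wolves, dim))  # continuous wolf positions in [0, 1]
    masks = pos > 0.5                  # initial binary feature subsets
    scores = np.array([subset_fitness(m, X, y) for m in masks])
    best_i = int(np.argmin(scores))
    best_mask, best_score = masks[best_i].copy(), scores[best_i]

    for t in range(n_iter):
        # The three fittest wolves act as the alpha, beta and delta leaders.
        leaders = pos[np.argsort(scores)[:3]].copy()
        a = 2.0 - 2.0 * t / n_iter     # decreases linearly from 2 towards 0
        for i in range(n_wolves):
            steps = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - pos[i])
                steps.append(leader - A * D)
            # New position: average of the moves guided by the three leaders.
            pos[i] = np.clip(np.mean(steps, axis=0), 0.0, 1.0)
            # Sigmoid transfer function turns positions into selection probabilities.
            prob = 1.0 / (1.0 + np.exp(-10.0 * (pos[i] - 0.5)))
            masks[i] = rng.random(dim) < prob
            scores[i] = subset_fitness(masks[i], X, y)
            if scores[i] < best_score:  # keep the best subset seen so far
                best_mask, best_score = masks[i].copy(), scores[i]
    return best_mask, best_score


if __name__ == "__main__":
    # Synthetic stand-in data, NOT a P2P lending dataset.
    X, y = make_classification(n_samples=200, n_features=12, n_informative=4, random_state=0)
    mask, score = binary_gwo_select(X, y)
    print("selected feature indices:", np.flatnonzero(mask))
    print("fitness:", round(float(score), 4))
```

In this sketch the RF classifier is trained only on the columns selected by each wolf, so the search directly targets the dimensionality issue mentioned in the abstract; a real study would tune the population size, iteration budget, and fitness weighting on the actual lending data.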
License
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).