Predicting First-Year Student Performance with SMOTE-Enhanced Stacking Ensemble and Association Rule Mining for University Success Profiling

Authors

  • Philippe Boribo Kikunda Université Catholique de Bukavu
  • Issa Tasho Kasongo Institut Supérieur Pédagogique de Bukavu
  • Thierry Nsabimana University of Burundi
  • Jérémie Ndikumagenge University of Burundi
  • Longin Ndayisaba University of Burundi
  • Elie Zihindula Mushengezi Institut Supérieur Pédagogique de Bukavu
  • Jules Raymond Kala Université Catholique de Bukavu

DOI:

https://doi.org/10.62411/jcta.14043

Keywords:

Apriori algorithm, Data mining, Educational data mining, Higher education, Performance prediction, SMOTE, Stacking ensemble, Student success

Abstract

This study examines the application of Educational Data Mining (EDM) to predict the academic performance of first-year students at the Catholic University of Bukavu and the Higher Institute of Education (ISP) in the Democratic Republic of Congo. The primary objective is to develop a model that identifies at-risk students early, giving the university a tool to strengthen student support and academic guidance. To address class imbalance in the data (successful cases outnumber failures), the study adopts a hybrid methodological approach. First, the SMOTE algorithm was applied to balance the dataset. Then, a stacking classification model was developed to combine the predictive power of multiple algorithms. The predictors are the National Exam score (PEx), the secondary school track (Humanities), the type of prior institution (public, private, or religious-affiliated), age, and sex. The results show that this approach is highly effective: the model can predict not only success or failure but also students' performance levels (e.g., honors or distinctions). Moreover, the Apriori association rule mining algorithm made it possible to identify faculty-specific success profiles, turning prediction into an interpretable decision-support tool. This research makes several contributions. Practically, it provides the University of Bukavu with a tool for student orientation and early risk detection. Methodologically, it illustrates the effectiveness of a combined EDM approach in an African context. The study also acknowledges limitations, including the non-public nature of the data and the geographical specificity of the sample, and proposes avenues for future research, such as integrating Explainable AI (XAI) techniques for a more refined and transparent analysis of the results.
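The abstract does not detail how SMOTE balances the classes. As an illustration only, the sketch below shows the standard SMOTE interpolation step in plain Python: each synthetic minority (failure) record is placed at a random point on the segment between a real minority record and one of its k nearest minority neighbours. It assumes purely numeric features; the function name `smote`, the toy (exam score, age) records and the value of k are hypothetical and are not the authors' data or code. Categorical predictors such as school type would call for a variant such as SMOTE-NC.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: create n_synthetic new minority points.

    Each synthetic point lies at a random position on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy records: (national exam score in %, age in years).
passed = [(68, 19), (72, 20), (75, 18), (64, 21), (80, 19), (70, 20), (66, 22), (78, 18)]
failed = [(52, 23), (55, 21), (50, 24)]  # minority class

new_failed = smote(failed, n_synthetic=len(passed) - len(failed), k=2)
balanced_failed = failed + new_failed
print(len(passed), len(balanced_failed))  # 8 8
```

In practice, features on different scales would be standardised before computing distances, and oversampling would be applied only to the training folds so that synthetic records never leak into the evaluation data. The imbalanced-learn library provides maintained `SMOTE` and `SMOTENC` classes for this purpose.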
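To make the "success profile" idea concrete, the following is a minimal, self-contained Apriori sketch, assuming each student is encoded as a set of `attribute=value` items that includes the outcome. It finds frequent itemsets level by level (join, then prune by the Apriori property) and keeps rules of the form X → `outcome=pass` whose confidence, supp(X ∪ {pass}) / supp(X), meets a threshold. The records, the `PEx>=65` cut-off, the `track=Other` value, the thresholds and the helper names `apriori` and `rules_for` are all hypothetical; this is not the authors' implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    rows = [frozenset(t) for t in transactions]
    n = len(rows)

    def support(itemset):
        return sum(1 for r in rows if itemset <= r) / n

    # Level 1: frequent single items.
    current = {frozenset([i]) for r in rows for i in r}
    current = {c for c in current if support(c) >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update({c: support(c) for c in current})
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent (Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules_for(frequent, consequent, min_confidence):
    """Rules X -> consequent, with confidence = supp(X + consequent) / supp(X)."""
    found = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            conf = supp / frequent[antecedent]
            if conf >= min_confidence:
                found.append((sorted(antecedent), consequent, supp, conf))
    # Strongest rules first: by confidence, then support, then antecedent.
    return sorted(found, key=lambda r: (-r[3], -r[2], r[0]))


# Hypothetical student records encoded as attribute=value items.
students = [
    {"track=Other", "school=private", "PEx>=65", "outcome=pass"},
    {"track=Other", "school=religious", "PEx>=65", "outcome=pass"},
    {"track=Humanities", "school=public", "PEx<65", "outcome=fail"},
    {"track=Other", "school=public", "PEx>=65", "outcome=pass"},
    {"track=Humanities", "school=religious", "PEx>=65", "outcome=pass"},
    {"track=Humanities", "school=public", "PEx<65", "outcome=fail"},
]

frequent = apriori(students, min_support=0.5)
success_rules = rules_for(frequent, "outcome=pass", min_confidence=0.9)
for antecedent, consequent, supp, conf in success_rules:
    print(f"{' AND '.join(antecedent)} -> {consequent} "
          f"(support={supp:.2f}, confidence={conf:.2f})")
```

Faculty-specific profiles, as described in the abstract, could be obtained by running the same procedure separately on each faculty's records. The mlxtend library offers maintained `apriori` and `association_rules` functions for larger datasets.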

Author Biographies

Philippe Boribo Kikunda, Université Catholique de Bukavu

Computer Science Department, Faculty of Sciences, Université Catholique de Bukavu (UCB), PO Box 285, Bukavu, Democratic Republic of Congo; Management Computer Department, Institut Supérieur Pédagogique de Bukavu (ISP/Bukavu), PO Box 854, Bukavu, Democratic Republic of Congo; Doctoral school of the University of Burundi, Center for Research in Infrastructure, Environment and Technology (CRIET), Bujumbura, Burundi

Issa Tasho Kasongo, Institut Supérieur Pédagogique de Bukavu

Management Computer Department, Institut Supérieur Pédagogique de Bukavu (ISP/Bukavu), PO Box 854, Bukavu, Democratic Republic of Congo

Thierry Nsabimana, University of Burundi

Doctoral school of the University of Burundi, Center for Research in Infrastructure, Environment and Technology (CRIET), Bujumbura, Burundi

Jérémie Ndikumagenge, University of Burundi

Doctoral school of the University of Burundi, Center for Research in Infrastructure, Environment and Technology (CRIET), Bujumbura, Burundi

Longin Ndayisaba, University of Burundi

Doctoral school of the University of Burundi, Center for Research in Infrastructure, Environment and Technology (CRIET), Bujumbura, Burundi

Elie Zihindula Mushengezi, Institut Supérieur Pédagogique de Bukavu

Management Computer Department, Institut Supérieur Pédagogique de Bukavu (ISP/Bukavu), PO Box 854, Bukavu, Democratic Republic of Congo

Jules Raymond Kala, Université Catholique de Bukavu

Computer Science Department, Faculty of Sciences, Université Catholique de Bukavu (UCB), PO Box 285, Bukavu, Democratic Republic of Congo

Published

2025-09-30

How to Cite

Kikunda, P. B., Kasongo, I. T., Nsabimana, T., Ndikumagenge, J., Ndayisaba, L., Mushengezi, E. Z., & Kala, J. R. (2025). Predicting First-Year Student Performance with SMOTE-Enhanced Stacking Ensemble and Association Rule Mining for University Success Profiling. Journal of Computing Theories and Applications, 3(2), 132–144. https://doi.org/10.62411/jcta.14043