Android Malware Detection Using Machine Learning with SMOTE-Tomek Data Balancing

Maryam Sufiyanu Masari; Maiauduga Abdullahi Danladi; Ilori Loretta Onyinye; Loreta Katok Tohomdet

doi:10.62411/jcta.15084

Authors

Maryam Sufiyanu Masari, Air Force Institute of Technology
Maiauduga Abdullahi Danladi, Air Force Institute of Technology
Ilori Loretta Onyinye, Air Force Institute of Technology
Loreta Katok Tohomdet, Air Force Institute of Technology

DOI:

https://doi.org/10.62411/jcta.15084

Keywords:

Android malware detection, Cybersecurity, Imbalanced dataset, Intrusion detection, Machine learning, Malicious detection, Malware classification, Random Forest

Abstract

This study presents a comparative analysis of four traditional machine learning algorithms (Decision Tree, Random Forest, K-Nearest Neighbors, and Support Vector Machine) for Android malware detection, using the preprocessed TUANDROMD dataset of 4,465 instances and 241 features that represent both static and dynamic application characteristics. Motivated by the limitations of conventional signature-based and hybrid detection methods, especially in handling imbalanced datasets and detecting emerging malware variants, the study employed SMOTE to balance the training data and support fair model evaluation. The dataset was divided into 80% training and 20% testing subsets, and the models were assessed using accuracy, precision, recall, F1-score, and ROC AUC. The proposed Random Forest model outperformed the other classifiers, achieving an accuracy of 0.993, precision of 0.992, recall of 1.000, F1-score of 0.996, and a near-perfect ROC AUC of 0.9998, surpassing state-of-the-art approaches. These results affirm the predictive capability, consistency, and robustness of the Random Forest algorithm in Android malware detection. The study concludes that base models, when combined with class-balancing techniques, provide reliable and efficient malware detection on imbalanced datasets. For future research, the study recommends exploring hybrid or ensemble frameworks that integrate Random Forest with deep learning architectures or meta-heuristic optimization techniques to further improve detection accuracy, adaptability, and resilience against rapidly evolving Android malware threats.
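The balancing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The function names (`smote_samples`, `tomek_links`, `classification_metrics`), the neighbour count `k`, and the seed are assumptions made for illustration, and in practice a library implementation such as imbalanced-learn's `SMOTETomek` would normally be used. The sketch shows the core SMOTE idea (interpolating between a minority sample and a nearby minority neighbour), the Tomek-link definition referred to in the title, the metric formulas, and an arithmetic check on the figures reported in the abstract.

```python
import math
import random


def smote_samples(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, of opposite-class samples that are
    each other's nearest neighbour (a Tomek link)."""
    def nearest(i):
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Arithmetic implied by the abstract: an 80/20 split of 4,465 instances.
n_total = 4465
n_test = n_total * 20 // 100   # 893
n_train = n_total - n_test     # 3572

# Consistency check: F1 recomputed from the reported precision and recall.
f1_check = round(2 * 0.992 * 1.000 / (0.992 + 1.000), 3)  # 0.996
```

The recomputed F1 of 0.996 matches the value reported for Random Forest, so the precision, recall, and F1 figures in the abstract are mutually consistent. Synthetic samples should be generated from the training subset only, so that the test subset keeps its original class distribution.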

Author Biographies

Maryam Sufiyanu Masari, Air Force Institute of Technology

Department of Cybersecurity, Faculty of Computing, Air Force Institute of Technology, Kaduna 800001, Kaduna State, Nigeria

Maiauduga Abdullahi Danladi, Air Force Institute of Technology

Department of Cybersecurity, Faculty of Computing, Air Force Institute of Technology, Kaduna 800001, Kaduna State, Nigeria

Ilori Loretta Onyinye, Air Force Institute of Technology

Department of Information and Communication Technology, Faculty of Ground and Communication Engineering, Air Force Institute of Technology, Kaduna 800001, Kaduna State, Nigeria

Loreta Katok Tohomdet, Air Force Institute of Technology

Department of Information and Communication Technology, Faculty of Ground and Communication Engineering, Air Force Institute of Technology, Kaduna 800001, Kaduna State, Nigeria


This journal is licensed under a Creative Commons Attribution 4.0 International License.