Web Phishing Classification using Combined Machine Learning Methods

Bambang Mahardhika Poerbo Waseso; Noor Ageng Setiyanto

doi:10.33633/jcta.v1i1.8898

Authors

Bambang Mahardhika Poerbo Waseso, Universitas Dian Nuswantoro
Noor Ageng Setiyanto, Universitas Dian Nuswantoro


Keywords:

Phishing detection, Phishing classification, Naïve Bayes, K-Nearest Neighbor, Combined Classifier

Abstract

Phishing is a crime that uses social engineering techniques, through both deceptive statements and technical means, to steal consumers' personal identification data and financial account credentials. With machine learning approaches, phishing websites can be recognized in real time. K-Nearest Neighbor (KNN) and Naïve Bayes (NB) are popular machine learning approaches, each with its own strengths and weaknesses; combining the two can cover the deficiencies of each. This study therefore proposes combining K-Nearest Neighbor with Naïve Bayes to classify phishing websites. In accuracy tests, the combination of KNN with k=8 and Naïve Bayes achieved a maximum accuracy of 93.44%. This result is 6.25% higher than that obtained using only one classifier.
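The idea of combining KNN and Naïve Bayes can be illustrated with a minimal sketch. The abstract does not state how the two classifiers are combined, so the soft-voting rule used here (averaging the class probabilities of both models) is an assumption, as are the Gaussian form of Naïve Bayes, the two toy features (URL length and number of dots), and all function names. Only k=8 is taken from the abstract.

```python
import math
from collections import Counter


def knn_proba(train_X, train_y, x, k, classes):
    """Share of the k nearest training points (Euclidean distance) per class."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return {c: votes[c] / k for c in classes}


def gaussian_nb_proba(train_X, train_y, x, classes, eps=1e-9):
    """Gaussian Naive Bayes posterior, computed in log space for stability."""
    log_post = {}
    for c in classes:
        rows = [r for r, label in zip(train_X, train_y) if label == c]
        log_p = math.log(len(rows) / len(train_X))  # class prior
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + eps
            log_p += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        log_post[c] = log_p
    top = max(log_post.values())
    unnorm = {c: math.exp(lp - top) for c, lp in log_post.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


def combined_predict(train_X, train_y, x, k=8):
    """Soft voting (assumed rule): average the two models' class probabilities."""
    classes = sorted(set(train_y))
    p_knn = knn_proba(train_X, train_y, x, k, classes)
    p_nb = gaussian_nb_proba(train_X, train_y, x, classes)
    avg = {c: (p_knn[c] + p_nb[c]) / 2 for c in classes}
    return max(avg, key=avg.get), avg


# Hypothetical toy data: [URL length, number of dots]; 0 = legitimate, 1 = phishing.
train_X = [[20, 1], [25, 2], [22, 1], [30, 2], [18, 1],
           [75, 5], [80, 6], [68, 4], [90, 7], [72, 5]]
train_y = [0] * 5 + [1] * 5

label, avg = combined_predict(train_X, train_y, [78, 5], k=8)
print(label, avg)
```

With k=8 on ten training points, the KNN vote alone is fairly weak because the neighborhood spans both classes; averaging it with the Naïve Bayes posterior gives a more decisive combined score. This is one way the two methods might compensate for each other, not a reproduction of the study's procedure or results.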
