Enhancing Lung Cancer Classification Effectiveness Through Hyperparameter-Tuned Support Vector Machine
DOI:
https://doi.org/10.62411/jcta.10106
Keywords:
Hyperparameter Tuning, Lung Cancer Classification, Radial Basis Function Kernel, Random Grid Search, Support Vector Machine
Abstract
This research aims to improve lung cancer classification performance using a Support Vector Machine (SVM) with hyperparameter tuning. A Radial Basis Function (RBF) kernel lets the SVM handle non-linear problems, and Random Grid Search is used to find the best combination of hyperparameters. The best settings found are C = 10, Gamma = 10, and Probability = True. Test results show that the tuned SVM significantly improves accuracy, precision, specificity, and F1 score, while recall decreases slightly, by 0.02. Recall is one of the most important metrics in disease classification, especially on imbalanced datasets, but specificity also plays a vital role in avoiding the misclassification of negative cases. Without hyperparameter tuning, specificity is so poor that both metrics must be considered together. Overall, the best performance obtained by the proposed method is 0.99 for accuracy, 1.00 for precision, 0.98 for recall, 0.99 for F1 score, and 1.00 for specificity. This research confirms the potential of tuned SVMs in addressing complex data classification challenges and offers important insights for medical diagnostic applications.
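The five metrics named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The `ConfusionCounts` class, the `classification_metrics` function, and the counts passed to it are hypothetical and chosen only for illustration; they are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall, specificity, and F1 score."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)        # also called sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)   # true negative rate
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Illustrative counts only (assumption): 50 positive and 50 negative cases.
counts = ConfusionCounts(tp=45, fp=2, tn=48, fn=5)
metrics = classification_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Recall depends only on the positive cases (`tp`, `fn`) and specificity only on the negative cases (`tn`, `fp`), which is why a model can gain markedly in one while losing slightly in the other, as the abstract describes.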
License
Copyright (c) 2024 Fita Sheila Gomiasti, De Rosal Ignatius Moses Setiadi
This work is licensed under a Creative Commons Attribution 4.0 International License.