Comparative Analysis of CatBoost and LightGBM for Tree Seedling Survival Prediction to Support Smart Forestry

Authors

  • Angga Bayu Santoso, Universitas Teknokrat Indonesia, Lampung, Indonesia
  • Okma Arnilia, UIN Siber Syekh Nurjati Cirebon, Indonesia
  • Sahrial Ihsani Ishak, Universitas Dian Nusantara, Jawa Barat, Indonesia
  • I Gusti Nyoman Agung Bisma Tatwa, Institut Pertanian Bogor, Jawa Barat, Indonesia

DOI:

https://doi.org/10.62411/tc.v25i2.15989

Abstract

Tree seedling survival is a critical factor in forest regeneration and sustainable ecosystem management. However, predicting seedling survival remains challenging because of complex interactions among environmental conditions, soil biotic factors, and plant functional traits. This study compares the performance of the CatBoost and Light Gradient Boosting Machine (LightGBM) algorithms in predicting tree seedling survival. The dataset, obtained from the Tree Survival Prediction dataset on Kaggle, includes environmental variables, soil interaction factors, and functional traits. The target variable is binary, indicating whether or not a seedling survives. Data preprocessing involved handling missing values, encoding categorical variables, and normalization; the models were then validated using 10-fold cross-validation. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). The results show that LightGBM outperforms CatBoost, achieving an accuracy of 0.8456, precision of 0.8718, recall of 0.8553, F1-score of 0.8635, and ROC-AUC of 0.9282. In comparison, CatBoost achieves an accuracy of 0.8223 and ROC-AUC of 0.9132. Feature importance analysis indicates that arbuscular mycorrhizal fungi, phenolics, and lignin are the most influential factors affecting seedling survival. These findings indicate that LightGBM is a reliable and efficient model for smart forestry applications, supporting data-driven decision-making and improving reforestation strategies. The model enables simulation of planting scenarios, which can help improve resource efficiency and restoration success rates.

Keywords: CatBoost, LightGBM, Machine Learning, Seedling Survival, Smart Forestry
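
The evaluation metrics named in the abstract can be illustrated with a short Python sketch. It is not the authors' code: the `ConfusionCounts` class, the function names, and the toy counts are hypothetical and chosen only for illustration. The only values taken from the abstract are the reported LightGBM precision, recall, and F1-score, which the sketch uses to check that the reported F1-score is consistent with the harmonic mean of the other two.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # seedlings predicted to survive that did survive
    fp: int  # seedlings predicted to survive that died
    fn: int  # seedlings predicted to die that survived
    tn: int  # seedlings predicted to die that did die


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def f1_from(p: float, r: float) -> float:
    """F1-score as the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Toy counts, invented for illustration only (not data from the study).
toy = ConfusionCounts(tp=80, fp=10, fn=20, tn=90)
toy_accuracy = accuracy(toy)
toy_f1 = f1_from(precision(toy), recall(toy))

# LightGBM values as reported in the abstract.
REPORTED_PRECISION = 0.8718
REPORTED_RECALL = 0.8553
REPORTED_F1 = 0.8635

# Recompute F1 from the reported precision and recall, rounded to 4 decimals.
f1_check = round(f1_from(REPORTED_PRECISION, REPORTED_RECALL), 4)
print(f1_check == REPORTED_F1)
```

If the study averaged each metric across the 10 folds, the mean F1-score would not have to equal the harmonic mean of the mean precision and mean recall exactly, so this check is a plausibility test, not a reproduction of the reported results.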

Published

2026-05-28

Section

Articles