Dataset Analysis and Feature Characteristics to Predict Rice Production based on eXtreme Gradient Boosting
DOI: https://doi.org/10.62411/jcta.10057

Keywords: Harvest prediction, Paddy production forecasting, Regression analysis, Rice yield prediction, Rice production prediction, XGBoost prediction

Abstract
Rice plays a vital role as the main food source for almost half of the global population, contributing more than 21% of the total calories humans need. Production forecasts are therefore important inputs for import-export policy. This research proposes the XGBoost method to predict rice harvests globally using FAO and World Bank datasets. Feature analysis, removal of duplicate data, and parameter tuning were carried out to support the performance of the XGBoost method. The results showed excellent performance, with a coefficient of determination (R²) reaching 0.99. Evaluation of model performance using metrics such as MSE and MAE, measured with k-fold cross-validation, shows that XGBoost predicts crop yields more accurately than other regression methods such as Random Forest (RF), Gradient Boosting (GB), Bagging Regressor (BR), and K-Nearest Neighbors (KNN). In addition, an ablation study was carried out comparing each model's performance across different feature sets and against state-of-the-art results. The consistent results and better performance demonstrate the superiority of the proposed XGBoost method; the model can effectively support agricultural sustainability, especially rice production.
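The evaluation protocol the abstract describes (k-fold cross-validation scored with R², MSE, and MAE, comparing a boosted-tree regressor against simpler baselines) can be sketched as follows. This is an illustrative sketch only: the data here is synthetic, and scikit-learn's GradientBoostingRegressor stands in for XGBoost so the snippet has no dependency on the xgboost package; the paper's actual features, data cleaning, and tuned hyperparameters are not reproduced.

```python
# Sketch of k-fold evaluation with R^2, MSE, and MAE on synthetic data.
# GradientBoostingRegressor is used as a stand-in for XGBoost.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic regression problem standing in for the FAO/World Bank features.
X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
# MSE and MAE are exposed as negated scores in scikit-learn's scoring API.
scoring = {"r2": "r2",
           "mse": "neg_mean_squared_error",
           "mae": "neg_mean_absolute_error"}

results = {}
for name, model in [("GB", GradientBoostingRegressor(random_state=0)),
                    ("KNN", KNeighborsRegressor(n_neighbors=5))]:
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 5 folds (mse/mae remain negated here).
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, results[name])
```

On data like this, the boosted model typically attains a noticeably higher mean R² than KNN, mirroring the comparison reported in the paper, though the exact numbers depend entirely on the dataset and tuning.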
License
Copyright (c) 2024 Ella Budi Wijayanti, De Rosal Ignatius Moses Setiadi
This work is licensed under a Creative Commons Attribution 4.0 International License.