Ensemble Learning Model in Predicting Corrosion Inhibition Capability of Pyridazine Compounds

ABSTRACT


INTRODUCTION
Inhibitor technology is a simple, useful, and affordable method of controlling corrosion [1], [2]. Using inhibitors is a well-known and effective way to stop corrosion damage [3], [4]. By preventing charge and mass transfer, corrosion inhibitor compounds cover metal surfaces with a protective layer that shields the metal from corrosive environmental impacts [5], [6]. Corrosion inhibitors usually work by forming such a shield, which stops the oxidation reactions that cause corrosion on the metal surface [7], [8], [9].
Among organic inhibitors, pyridazine compounds have garnered considerable attention due to their ability to inhibit corrosion in a variety of settings. The greater efficacy of quinoxaline-based corrosion inhibitors has been associated with the presence of functional groups, conjugated double bonds, and aromatic rings in their molecular structure [10], [11]. In general, researchers have employed theoretical techniques such as quantum chemical analyses and atomistic simulations to ascertain the electronic and structural properties relevant to inhibition efficacy. Moreover, several studies have used the results of theoretical calculations, such as density functional theory (DFT) and molecular simulations, to clarify the inhibition mechanism [12], [13].
Machine learning (ML) can be used to assess a compound's effectiveness in preventing corrosion because there is a measurable correlation between a compound's structure and its molecular characteristics and activity [14], [15]. Several algorithms have been used and combined to develop ML models that evaluate inhibitor performance, including ensemble methods, Bayesian approaches, decision trees, gradient boosting machines, deep learning neural networks, and clustering algorithms [16], [17], [18], [19], [20], [21].
The primary issue in ML research is creating models that make accurate predictions, so that the results provide pertinent information and accurately characterize the properties of the material being tested. Therefore, in this study we assessed ensemble-based models to validate the ability of ML to predict the corrosion inhibition efficiency (CIE) of pyridazine derivative inhibitors.

ML Model
The first step in building an ML model is preprocessing, in which the data are normalized using the MinMax scaling technique to reduce the sensitivity of the model to the scale of individual features. The next preprocessing step is to divide the data into training and testing sets using a k-fold cross-validation strategy. This approach was chosen to reduce data bias and variance by repeatedly training and evaluating the model on different partitions of the data and retaining the configuration with the lowest statistical error [26], [27]. With k = 10, the data are divided into ten folds; in each iteration, one fold serves as the test set and the remaining nine folds form the training set. Generally, k = 5 or k = 10 is used, although the exact number of folds depends on the characteristics of the data [28], [29].
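The two preprocessing steps described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in this study: the helper names `min_max_scale` and `k_fold_indices`, the example feature values, and the sample count of 20 are all hypothetical.

```python
import random


def min_max_scale(column):
    """Rescale one feature column to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical feature values, used only to show the scaling.
feature = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_scale(feature)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]

# With k = 10 and 20 samples, each test fold holds 2 samples
# and the remaining 18 samples form the training set.
splits = list(k_fold_indices(n_samples=20, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 18 2
```

In practice, the minimum and maximum used for scaling are usually computed on the training folds only and then applied to the test fold, so that no information from the test data leaks into training.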
In the modeling stage, we evaluate and compare the predictive performance of three ensemble-based models: random forest (RF), gradient boosting (GB), and AdaBoost (ADA) regressors. The efficacy of the prediction models is evaluated using regression metrics, namely the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE). The ideal model has low RMSE and MAE values and an R2 value close to 1 [30].
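A comparison of this kind could be set up as in the following sketch, which assumes scikit-learn is available. The data are synthetic (generated with `make_regression`) and stand in for the molecular descriptors and CIE values, which are not reproduced here; the hyperparameters are library defaults or arbitrary choices, not the settings used in this study.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 100 samples, 6 features.
X, y = make_regression(n_samples=100, n_features=6, noise=5.0, random_state=0)

models = {
    "RF": RandomForestRegressor(n_estimators=50, random_state=0),
    "GB": GradientBoostingRegressor(random_state=0),
    "ADA": AdaBoostRegressor(random_state=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scoring = {
    "r2": "r2",
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
}

results = {}
for name, model in models.items():
    # The scaler sits inside the pipeline so it is refit on each training split.
    pipe = make_pipeline(MinMaxScaler(), model)
    scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "R2": float(np.mean(scores["test_r2"])),
        # scikit-learn reports error metrics negated; flip the sign back.
        "RMSE": float(-np.mean(scores["test_rmse"])),
        "MAE": float(-np.mean(scores["test_mae"])),
    }

for name, m in results.items():
    print(f"{name}: R2={m['R2']:.3f}  RMSE={m['RMSE']:.3f}  MAE={m['MAE']:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the relative performance of the models on the pyridazine dataset.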

RESULTS AND DISCUSSION
Regression model performance is typically assessed using the R2, RMSE, and MAE metrics. R2 quantifies the proportion of variance in the dependent variable that is explained by the independent variables, with 1 denoting a perfect fit; higher R2 values indicate better predictive performance. RMSE represents the typical error magnitude, with lower values indicating greater prediction accuracy. MAE measures the average absolute difference between predicted and observed values, with lower values likewise indicating better prediction accuracy. Table 1 displays the R2, RMSE, and MAE values for the ADA, GB, and RF models, offering a quantitative comparison of their performance. Compared with the other models, the GB model's data points lie closer to its prediction (fitting) line, suggesting a better fit and closer alignment with the real data. On all evaluation criteria (R2, RMSE, and MAE), GB consistently performs better than the ADA and RF models, suggesting improved predictive capability. This demonstrates that GB is effective for this prediction task.
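For reference, the three metrics can be computed directly from their standard definitions, as in the short sketch below. The function name `regression_metrics` and the observed and predicted values are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical values for illustration only.
observed = [80.0, 85.0, 90.0, 95.0]
predicted = [82.0, 84.0, 91.0, 93.0]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.2f}  RMSE={rmse:.3f}  MAE={mae:.2f}")
# R2=0.92  RMSE=1.581  MAE=1.50
```

Here the squared residuals sum to 10 and the total sum of squares is 125, so R2 = 1 - 10/125 = 0.92. RMSE exceeds MAE because squaring gives more weight to the larger residuals.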

CONCLUSION
The ability of ML to predict the CIE value of pyridazine compounds has been examined by comparing three ensemble-based models. The GB model was found to be more accurate than the ADA and RF models based on the R2, MAE, and RMSE measurements. GB is the better model, with a higher R2 value showing better variance capture, a lower RMSE value reflecting smaller prediction errors, and a lower MAE value indicating increased accuracy. Visual examination of the data distribution in comparison with the model predictions confirms this finding and highlights how much better GB fits the real data. This research provides useful insights for developing realistic and effective material exploration strategies to aid industry in producing corrosion-inhibiting materials.

Figure 1. Scatter plot of (a) GB, (b) ADA, and (c) RF models.

Furthermore, Figure 1 provides visual confirmation of these results by showing the distribution of data points relative to the models' prediction lines.

Table 1. Model prediction performances