Investigation of Corrosion Inhibition Efficiency of Pyridine-Quinoline Compounds through Machine Learning



INTRODUCTION
Corrosion of materials is a major concern for both industry and academia because it causes enormous losses in economic, environmental, social, industrial, security, and safety terms, among others [1], [2], [3]. One of the simplest, most effective, and most economical methods of corrosion control is the use of inhibitors [4], [5], [6]. The effectiveness of an inhibitor compound depends on its ability to form an adsorbed protective layer on the metal surface. This layer blocks charge and mass transfer and thereby protects the metal from the corrosive environment [7], [8], [9]. However, evaluating the many potential inhibitor candidates experimentally is costly in money, time, and resources [10], [11], [12].
Electronic properties and chemical reactivity can be quantified from the chemical structure of a compound. For this reason, quantitative structure-property relationship (QSPR) models based on machine learning (ML) can be used to screen candidate inhibitor compounds [13], [14], [15]. Quantum chemical descriptors (QCD) calculated by density functional theory (DFT) are key features for developing reliable and precise QSPR models. Feature selection is generally carried out to identify the relevant quantum chemical descriptors for the QSPR model [16], [17], [18]. QSPR modeling by linear and non-linear regression on various quantum chemical descriptors has been well reported. In addition, ML methods can guide inhibitor synthesis before experimental analysis, making inhibitor development more effective and efficient.
Several methods have been widely used, alone and in combination, to develop QSPR models of inhibitor performance. These include the genetic algorithm (GA), multiple linear regression (MLR), partial least squares (PLS), ordinary least squares (OLS) regression, artificial neural networks (ANN), the adaptive neuro-fuzzy inference system (ANFIS), and autoregressive models with exogenous inputs (ARX).

An ANN model was used to predict the corrosion inhibition potential of 11 thiophene derivatives from 7 quantum chemical descriptors, yielding a coefficient of determination (R²) of 0.96 [19]. Another QSPR study predicted the inhibition performance of pyridine- and quinoline-derived compounds from 20 QCD, combining the linear GA-PLS and non-linear GA-ANN techniques. The GA-PLS model gave a root mean squared error (RMSE) of 14.9, while the GA-ANN model gave RMSE values of 16.7 and 8.8 [20].

Quadri et al. [21] used linear MLR and non-linear ANN models to evaluate 20 pyridazine derivatives with 5 QCD; the ANN model gave better results, with an RMSE of 10.6. In a separate study, Quadri et al. [11] developed linear OLS and non-linear ANN models to predict 40 quinoxaline-derived compounds with 5 selected QCD. Again, the non-linear ANN model gave better predictions, with an RMSE of 5.4.

Anadebe et al. [9] compared ANN and ANFIS models for 15 expired salbutamol drug molecules used as inhibitors. The ANN model gave an R² of 0.91 and an RMSE of 4.4, while the ANFIS model gave an R² of 0.99 and an RMSE of 1.4, indicating that ANFIS performed better. More recently, an ARX model developed for 250 commercial drugs used as corrosion inhibitors obtained an RMSE of 7.0 [22].
In this work, we develop a QSPR-based ML model and compare several algorithms to evaluate the corrosion inhibition performance of pyridine-quinoline organic compounds, using datasets from the literature [20], [23], [24], [25], [26]. Various DFT-calculated quantum chemical descriptors in the dataset were used to build a statistically validated QSPR model that can guide the design of corrosion inhibitors.

Dataset
The dataset of 41 pyridine-quinoline compounds evaluated in this study comes from the literature [20], [25], [26]. Various quantum chemical descriptors of these inhibitor compounds are used to construct a QSPR model that guides corrosion inhibitor design. Corrosion inhibition depends strongly on the chemical reactivity of the inhibitor molecule, which is captured by various quantum chemical descriptors [27], [28]. Twenty quantum chemical descriptors were considered (see Table 1).
Some quantum chemical descriptors are generally obtained directly from DFT calculations. These include the HOMO and LUMO energies, NBO charges, energy gap, total molecular energy (Emolecule), log P, van der Waals volume, van der Waals surface area, and solvent accessible surface area. The remaining descriptors are derived from the frontier orbital energies using Koopmans' theorem.
The HOMO (highest occupied molecular orbital) energy describes the ability of an inhibitor molecule to act as an electron donor. The LUMO (lowest unoccupied molecular orbital) energy describes its ability to act as an electron acceptor. Electron transfer can therefore be studied through the HOMO and LUMO energies, because the inhibitor molecule both donates electrons to the metal surface and accepts electrons from it. The energy gap (ΔE) is the difference between the LUMO and HOMO energies and indicates how readily the inhibitor molecule binds to the metal surface. The ionization potential (I) and electron affinity (A) also describe the reactivity of the inhibitor molecule. Electronegativity (χ) relates to the ability of the inhibitor molecule to attract electrons until electronic equilibrium is reached. Global hardness (η) indicates the resistance of a molecule to charge transfer, while global softness (σ) indicates its capacity to accept charge. The dipole moment (µ) describes the ability of the molecular (bond) dipoles to interact with the dipoles of the metal surface. A larger dipole moment relates to a larger contact area between the inhibitor molecule and the metal surface, and hence to better corrosion inhibition. The polarizability (δ) reflects the distribution of electron density around the molecule, and the electrophilicity (ω) describes the ability of a molecule to accept electrons.
When the inhibitor molecule interacts with the metal surface, electrons flow from the inhibitor molecule to the metal surface atoms; the fraction of electrons transferred is denoted ΔN [29], [30], [31]. This transfer is driven by the difference in electronegativity between the inhibitor molecule and the metal surface. Electrons move from the inhibitor (lower electronegativity) to the metal (higher electronegativity) until their chemical potentials are equal. The total energy is related to the ability of the inhibitor molecule to adsorb on the metal surface. The electron-donating capacity (ω−) describes the tendency of a molecule to donate charge, while the electron-accepting capacity (ω+) describes its tendency to accept charge.
Interacting charges can be analyzed using Natural Bond Orbital (NBO) population analysis, which identifies atoms carrying negative charge as well as atoms carrying positive charge. The positively charged atoms act as centers that accept electrons from the metal surface. In general, the mechanism of corrosion inhibition is governed by the interaction between the inhibitor molecule and the metal surface. Corrosion inhibitors can adsorb on metal surfaces through chemisorption or physisorption, so the adsorption energy (ΔEads) is an important molecular descriptor. Hydrophobicity (log P) relates to the ability of molecules to form adsorbed layers through hydrophobic interactions that inhibit corrosion. The van der Waals surface area (VSA), van der Waals volume (VV), and solvent accessible surface area (SASA) are considered as measures of how well a molecule blocks corrosive agents from reaching the metal surface [32], [33], [34], [35], [36].
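The Koopmans'-theorem relations for the derived descriptors are not written out in the text. Below is a minimal Python sketch of the forms commonly used in the corrosion-inhibitor literature. It assumes I = −E_HOMO and A = −E_LUMO, χ = (I + A)/2, η = (I − A)/2, and σ = 1/η (some authors use 1/(2η)). It also assumes ω = χ²/(2η) and the Gázquez forms for ω− and ω+. For ΔN, it assumes the theoretical electronegativity of iron is 7 eV and its hardness is 0 eV, a frequent convention that is not stated in this paper. The function and class names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactivityDescriptors:
    I: float            # ionization potential, eV
    A: float            # electron affinity, eV
    gap: float          # energy gap E_LUMO - E_HOMO, eV
    chi: float          # electronegativity, eV
    eta: float          # global hardness, eV
    sigma: float        # global softness, 1/eV
    omega: float        # electrophilicity index, eV
    omega_minus: float  # electron-donating capacity, eV
    omega_plus: float   # electron-accepting capacity, eV
    delta_n: float      # fraction of electrons transferred to the metal


def koopmans_descriptors(e_homo, e_lumo, chi_metal=7.0, eta_metal=0.0):
    """Global reactivity descriptors from frontier orbital energies (eV).

    chi_metal / eta_metal default to the theoretical iron values often used
    in the inhibitor literature (an assumption, not taken from this paper).
    """
    if e_lumo <= e_homo:
        raise ValueError("expected E_LUMO > E_HOMO")
    I = -e_homo                      # Koopmans: I = -E_HOMO
    A = -e_lumo                      # Koopmans: A = -E_LUMO
    chi = (I + A) / 2                # Mulliken electronegativity
    eta = (I - A) / 2                # global hardness
    omega_minus = (3 * I + A) ** 2 / (16 * (I - A))
    omega_plus = (I + 3 * A) ** 2 / (16 * (I - A))
    return ReactivityDescriptors(
        I=I,
        A=A,
        gap=e_lumo - e_homo,
        chi=chi,
        eta=eta,
        sigma=1 / eta,               # some authors use 1 / (2 * eta)
        omega=chi**2 / (2 * eta),
        omega_minus=omega_minus,
        omega_plus=omega_plus,
        # Positive when electrons flow from inhibitor to metal (chi_metal > chi).
        delta_n=(chi_metal - chi) / (2 * (eta_metal + eta)),
    )


d = koopmans_descriptors(e_homo=-6.0, e_lumo=-1.0)
print(d.gap, d.chi, d.eta, d.delta_n)  # 5.0 3.5 2.5 0.7
```

With these definitions, ω− − ω+ always equals χ. This gives a quick consistency check when descriptor values are copied from different sources.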

ML Model
This study uses three regression models to predict corrosion inhibition efficiency (CIE): support vector regression (SVR), random forest (RF), and k-nearest neighbors (KNN), all implemented in Python. The three models were evaluated to explain the potential relationship between the features (QCD) and the target (CIE). Each model was built on the dataset of 41 molecules, split into training and test sets in a 70:30 ratio. In the preprocessing stage, the data were normalized so that features with large numeric ranges do not dominate the predictions. The models were validated using k-fold cross-validation with k equal to the number of training samples (leave-one-out). In each fold, one sample is used as validation data and the rest are used for training [37], [38]. Model performance was measured using the coefficient of determination (R²) and the root mean squared error (RMSE) [39]. All hyperparameters and other settings were left at their defaults in scikit-learn release 0.23.2.
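The workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes "normalization" means min-max scaling (`MinMaxScaler`) and that leave-one-out validation is applied to the training split. Because a single held-out sample has no R², the leave-one-out R² is computed on the pooled out-of-fold predictions. The function name `evaluate_models` and the synthetic data are hypothetical. Default hyperparameters also differ between scikit-learn versions, so numbers will not match release 0.23.2 exactly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def evaluate_models(X, y, random_state=0):
    """Compare SVR, RF and KNN (default hyperparameters) on one 70:30 split.

    Returns, per model, the leave-one-out R^2 on the training set and the
    R^2 / RMSE on the held-out test set.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    models = {
        "SVR": SVR(),
        "RF": RandomForestRegressor(random_state=random_state),
        "KNN": KNeighborsRegressor(),
    }
    results = {}
    for name, model in models.items():
        # Min-max normalization inside the pipeline, so the scaler is refit
        # on each training fold and never sees the held-out sample.
        pipe = make_pipeline(MinMaxScaler(), model)
        # Leave-one-out: each training sample is predicted once by a model
        # fitted on all the others; R^2 is computed on the pooled predictions.
        y_loo = cross_val_predict(pipe, X_train, y_train, cv=LeaveOneOut())
        pipe.fit(X_train, y_train)
        y_pred = pipe.predict(X_test)
        results[name] = {
            "loo_r2": float(r2_score(y_train, y_loo)),
            "test_r2": float(r2_score(y_test, y_pred)),
            "test_rmse": float(np.sqrt(mean_squared_error(y_test, y_pred))),
        }
    return results


if __name__ == "__main__":
    # Synthetic placeholder with the paper's shape (41 molecules x 20
    # descriptors); this is NOT the literature dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(41, 20))
    y = 70 + 8 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=41)
    for name, m in evaluate_models(X, y).items():
        print(name, {k: round(v, 3) for k, v in m.items()})
```

The scaler sits inside the pipeline rather than being fitted once on all 41 molecules. This keeps test and validation samples from leaking into the normalization statistics, which matters with a dataset this small.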

RESULT AND DISCUSSION
The model prediction performance metrics are presented in Table 2. The distribution of data points is illustrated in Figure 1, and the predicted values are compared with the actual values in Figure 2. The feature importance analysis is shown in Figure 3. As Table 2 shows, the RF model has the highest R² and the lowest RMSE of the three models; the best model is the one with R² closest to 1 and the lowest RMSE. This result is supported by Figure 1, in which the data points predicted by RF lie closer to the fitted line than those of the SVR and KNN models. In Figure 2, the values predicted by the RF model (in red) also follow the experimental (actual) pattern most closely. Overall, the RF model outperforms the SVR and KNN models and gives the best predictive performance, because its predicted values are closest to the actual values.
The relationship between the target and the features can be interpreted through the feature importance values in Figure 3, which were calculated with the RF model. The energy gap is the variable most responsible for the model's performance in predicting inhibition efficiency. This is consistent with the long-established theory of molecular inhibition efficiency. The energy gap indicates the ability of the inhibitor molecule to bind to the metal surface: the smaller the energy gap, the less energy is required to excite an electron from the HOMO to the LUMO. A low energy gap therefore indicates a highly reactive inhibitor molecule and hence higher inhibition efficiency. The parallel adsorption energy also appears as an important feature. The adsorption energy indicates how strongly the molecule is adsorbed: a more negative adsorption energy indicates stronger adsorption (bonding) of the molecule on the metal surface and thus higher inhibition efficiency.
Global hardness and global softness are also important features. Global hardness is related to the resistance of a molecule to charge transfer, while global softness reflects its capacity to accept charge. A lower hardness or a higher softness indicates a more reactive molecule, which interacts and bonds more easily with the metal surface, so corrosion inhibition efficiency is higher. The descriptors of electron-accepting ability (LUMO energy, electron affinity, and electron-accepting capacity) were also found to be important. A lower LUMO energy and a higher electron affinity and electron-accepting capacity indicate that the inhibitor molecule accepts electrons from the metal surface more readily, resulting in higher inhibition efficiency. Lastly, the NBO charge on N is also an important feature in determining inhibition efficiency. It is related to adsorption because, besides parallel adsorption, pyridine and quinoline molecules can be adsorbed perpendicularly to the metal surface through the heterocyclic nitrogen atom, which affects the adsorption strength.
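The text states that the importance values were computed with the RF model but does not say which importance measure was used. The sketch below assumes scikit-learn's impurity-based `feature_importances_`, which are normalized to sum to 1. The function name `rank_features`, the descriptor names, and the data are hypothetical and for illustration only. Impurity-based importances can be biased, and `sklearn.inspection.permutation_importance` is a common alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, feature_names, random_state=0):
    """Rank descriptors by impurity-based random-forest importance.

    scikit-learn normalizes the importances to sum to 1; a higher value means
    the feature accounted for more of the squared-error reduction in the trees.
    """
    rf = RandomForestRegressor(random_state=random_state).fit(X, y)
    importances = rf.feature_importances_
    order = np.argsort(importances)[::-1]  # descending importance
    return [(feature_names[i], float(importances[i])) for i in order]


if __name__ == "__main__":
    # Hypothetical descriptor names and synthetic data, for illustration only.
    names = ["gap", "E_ads_parallel", "eta", "sigma", "E_LUMO"]
    rng = np.random.default_rng(0)
    X = rng.normal(size=(41, len(names)))
    y = 80 - 6 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=1.0, size=41)
    for name, score in rank_features(X, y, names):
        print(f"{name:>15s}  {score:.3f}")
```

One caveat when reading such rankings: several descriptors here are deterministic functions of one another. If A is computed as −E_LUMO and σ as 1/η, a forest can split importance arbitrarily between each pair. Their individual scores should therefore be interpreted jointly.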

CONCLUSION
We developed a QSPR-based ML approach to evaluate the predictive performance of SVR, RF, and KNN models for the corrosion inhibition of iron by pyridine-quinoline derivatives. The RF model shows the best predictive ability based on R² and RMSE values, and the energy gap emerges as the feature most responsible for the model's predictions. Overall, our study provides new insights into the use of ML models for predicting corrosion inhibition on iron surfaces. In future research, the model could be further developed to improve prediction accuracy, for example by adding polynomial functions and/or virtual samples.

Figure 1. Scatter plot of data points predicted by each model
Figure 2. Predicted versus actual values

Figure 3. Importance values of the features

Table 2. Model prediction performance