Log-Transformed Regime-Based Prediction of Cloud Job Length Using Machine Learning
DOI:
https://doi.org/10.62411/jcta.15866

Keywords:
Cloud computing, Cloud job-length prediction, Energy-efficient computing, Heavy-tailed data, Log transformation, Machine learning, Regime-based learning, Workload prediction

Abstract
Cloud job-length prediction remains challenging when the target distribution is highly skewed and contains rare extreme values. This study proposes a log-transformed, regime-based machine learning framework for robust prediction of cloud job length, measured in million instructions (MI). The approach integrates sequential feature engineering, logarithmic target transformation, weighted learning, and regime-aware modeling to distinguish between normal and extreme job-length behavior. The study uses an ordered GoCJ-derived sequence of 1,000 cloud job lengths, which exhibits a heavy-tailed distribution with a mean of 129,662 MI, a median of 93,000 MI, a 95th percentile of 525,000 MI, a 99th percentile of 900,000 MI, and a skewness of 3.695. The proposed model is evaluated against sequential baselines and stronger machine learning baselines, including Naive_Last, RollingMean_5, Global_Log_ExtraTrees, RandomForest, GradientBoosting, and MLP_Log. On the main test split, the proposed Regime_Log_ExtraTrees achieved the best RMSE of 206,255.66 and the highest R² of −0.01062 (the least negative among all models), while Global_Log_ExtraTrees remained competitive in MAE, MedAE, and RMSLE. Additional walk-forward validation confirms that the regime-aware model consistently achieves the best mean RMSE and mean R² across temporal folds. Ablation results further show that regime-aware learning is the primary contributor to robustness, although accurate prediction of extreme jobs remains challenging. These findings indicate that log-transformed, regime-based learning provides a practical and more robust strategy for cloud job-length prediction under heavy-tailed workload conditions.
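The abstract does not include code, but the core idea — train separate models for normal and extreme jobs on a log-transformed target — can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: synthetic lognormal data stands in for the GoCJ sequence, five lag features provide the sequential inputs, a 95th-percentile threshold separates the regimes, and scikit-learn's ExtraTreesRegressor serves as the base learner.

```python
# Minimal sketch of log-transformed, regime-based job-length prediction.
# Assumptions (not from the paper): lag features over the ordered sequence,
# a 95th-percentile regime threshold, and ExtraTrees as the base learner.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
# Synthetic heavy-tailed job lengths in MI (stand-in for the GoCJ sequence).
jobs = rng.lognormal(mean=11.4, sigma=0.9, size=1000)

def lag_features(y, n_lags=5):
    """Row t holds the n_lags previous values; the target is y[t + n_lags]."""
    X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
    return X, y[n_lags:]

X, y = lag_features(jobs)
split = int(0.8 * len(y))
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

# Regime threshold learned on the training split only.
thr = np.percentile(y_tr, 95)
masks = {"normal": y_tr <= thr, "extreme": y_tr > thr}

models = {}
for name, mask in masks.items():
    m = ExtraTreesRegressor(n_estimators=200, random_state=0)
    m.fit(X_tr[mask], np.log1p(y_tr[mask]))  # fit on log-transformed target
    models[name] = m

# Simple regime router: a large recent mean routes to the extreme model.
route_extreme = X_te.mean(axis=1) > thr
pred = np.where(route_extreme,
                np.expm1(models["extreme"].predict(X_te)),
                np.expm1(models["normal"].predict(X_te)))
rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
print(f"RMSE: {rmse:,.0f} MI")
```

The router here is deliberately crude; the paper's regime-assignment rule and weighting scheme are not specified in the abstract, so any production use would need the authors' actual feature and routing definitions.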
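The walk-forward validation mentioned in the abstract can likewise be sketched. The fold layout below (an expanding training window with four fixed-size test folds) and the single global log-transformed ExtraTrees model are assumptions consistent with the abstract, not the paper's exact protocol.

```python
# Sketch of walk-forward (expanding-window) validation over temporal folds,
# reporting mean RMSE and mean R^2 as in the abstract. Fold sizes and the
# model configuration are illustrative assumptions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
y_seq = rng.lognormal(mean=11.4, sigma=0.9, size=1000)

n_lags = 5
X = np.column_stack([y_seq[i : len(y_seq) - n_lags + i] for i in range(n_lags)])
y = y_seq[n_lags:]

n_folds, fold_size = 4, 100
start = len(y) - n_folds * fold_size
rmses, r2s = [], []
for k in range(n_folds):
    cut = start + k * fold_size
    tr = slice(0, cut)                 # expanding training window
    te = slice(cut, cut + fold_size)   # next temporal block as test fold
    model = ExtraTreesRegressor(n_estimators=200, random_state=0)
    model.fit(X[tr], np.log1p(y[tr]))
    pred = np.expm1(model.predict(X[te]))
    rmses.append(float(np.sqrt(mean_squared_error(y[te], pred))))
    r2s.append(float(r2_score(y[te], pred)))

print(f"mean RMSE: {np.mean(rmses):,.0f} MI, mean R^2: {np.mean(r2s):.3f}")
```

Because each fold trains only on data preceding it, this evaluation respects the temporal ordering of the job sequence, which is what distinguishes it from a shuffled train/test split.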
Copyright (c) 2026 Ardi Pujiyanta, Bambang Robiin, Faisal Fajri Rahani

This work is licensed under a Creative Commons Attribution 4.0 International License.