Time Series Forecasting of Top 3 Ranking Cryptocurrencies

Cryptocurrency has become a worldwide phenomenon. Although not all countries have legalized it, it is widely regarded as a promising investment asset. The three top-ranking cryptocurrencies by market capitalization are currently Bitcoin, Ethereum, and Tether. This research compares the performance of five forecasting algorithms, namely Autoregressive Integrated Moving Average (ARIMA), Neural Network, Support Vector Machine, Linear Regression, and Generalized Linear Model, on Bitcoin, Ethereum, and Tether price datasets. The research methodology is Knowledge Discovery in Databases (KDD). Performance is assessed with the Root Mean Square Error (RMSE), and the results are compared to identify the best-performing model. The findings indicate that for Bitcoin, the Neural Network algorithm produced the best result, with an RMSE of 9180.534. For Ethereum, the Neural Network algorithm also performed best, with an RMSE of 537.528. For Tether, the ARIMA algorithm performed best, with an RMSE of 0.003.


INTRODUCTION
Cryptocurrency is a technology that uses cryptography to secure transactions and to govern them through decentralized systems rather than a central authority. The primary objectives of such a system are to manage the creation of new units, record transactions, and provide security against replication or forgery [1]. Cryptocurrency serves as a medium of exchange for purchasing goods and services, and it can also be held as a long-term investment in the expectation of future profits [2]. One advantage of the cryptocurrency market is that it operates 24 hours a day, which allows users to monitor market movements continuously. As a result of these advantages, cryptocurrency is gaining recognition in several countries worldwide [3]. Cryptocurrency price movements can be monitored on the Yahoo Finance website, whose interface is presented in Figure 1.
Figure 1. Cryptocurrency Market Cap [4]
Figure 1 displays cryptocurrencies ranked by market capitalization from highest to lowest. The top three are Bitcoin, Ethereum, and Tether. Bitcoin, ranked first, was introduced by Satoshi Nakamoto as the first digital currency. Its distinctive characteristic is that it is governed by open-source software, which gives anyone the ability to influence or modify the system [5]. Ethereum, ranked second, is a digital currency that builds on the innovations of Bitcoin but differs from it significantly. Ethereum serves not only as a means of payment but also as a platform for financial services, games, and applications that protect user data from security threats [6]. Tether, ranked third, is a stablecoin whose value is pegged to the U.S. dollar. Tether is frequently used as an intermediary when traders switch between different cryptocurrencies [6].
Several previous studies have addressed cryptocurrency forecasting. The first study forecast Bitcoin prices using the Random Forest algorithm [7] and reported an RMSE of 0.010 and a Mean Absolute Error (MAE) of 0.008. The second study forecast Ethereum prices using a Backpropagation Neural Network [8]; with a learning rate of 0.001 and 1000 epochs, it obtained Mean Absolute Percentage Error (MAPE) values of 1.4694, 1.4839, and 1.4727. The third study forecast cryptocurrency prices using the Long Short-Term Memory (LSTM) method [9] on a DOGE dataset and obtained an RMSE of 0.0630. The fourth study compared Linear Regression, Neural Network, Deep Learning, and K-Nearest Neighbor algorithms for forecasting Bitcoin prices [10]; the best models were Linear Regression and Neural Network, with RMSE values of 296.227 +/- 60.125 (micro average: 301.655 +/- 0.000) and 338.988 +/- 47.837 (micro average: 342.000 +/- 0.000), respectively. Lastly, the fifth study forecast Bitcoin prices using the Random Forest algorithm [5] and reported a MAPE of 1.50%, corresponding to an accuracy of 98.50%. A summarized comparison of these studies is presented in Table 1. This study addresses a gap in previous research by forecasting three cryptocurrencies, namely Bitcoin, Ethereum, and Tether, within a single study, which is its main contribution. In addition, five algorithms are benchmarked against one another to identify the best model, and the evaluation stage uses the Root Mean Square Error to measure the performance of each model.

RESEARCH METHOD
The research methodology employed in this study is Knowledge Discovery in Databases (KDD), a technique used to uncover and analyze patterns in data and to interpret and predict future events [11]. The proposed research framework is shown as a flowchart in Figure 2. The first stage is the Dataset stage. The dataset covers the entire population of daily data, from the start of trading for each cryptocurrency up to the date of this research, 13 June 2023. The raw datasets gathered from the website are presented as tables: Table 2 for Bitcoin, Table 3 for Ethereum, and Table 4 for Tether. Each dataset consists of the following attributes:
1. Date: the time index of the series [12];
2. Open: the opening price of the cryptocurrency [13];
3. High: the highest price of the cryptocurrency during one day [14];
4. Low: the lowest price of the cryptocurrency during one day [15];
5. Close: the closing price of the cryptocurrency for one day [14];
6. Volume: the trading volume of the cryptocurrency in USD [16];
7. Adj Close: the closing price adjusted for corporate actions such as rights issues, stock splits, or reverse stock splits [13].
Not all seven attributes in the raw dataset are used. Only the Date and Close attributes are used in this research; the rest are removed from the dataset. The second stage is Pre-Processing, a crucial step before modeling and before running the algorithms [17], [18]. Proper pre-processing can improve accuracy and speed up the subsequent processes [19]. The techniques employed in this stage are:
1. Data Cleansing: removing empty, inconsistent, or noisy values, such as missing values and noisy data [20];
2. Data Integration: merging the data into a single archive [21];
3. Data Reduction: eliminating unnecessary attributes [22].
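The three pre-processing steps can be sketched in plain Python as follows. This is a minimal illustration, not the RapidMiner process used in the study: it assumes the raw files are CSV exports with the column names listed above, ISO dates (YYYY-MM-DD), and missing values written as "null" or left empty. The function name `preprocess` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import date

# Assumption: how empty cells appear in the raw export.
MISSING = {"", "null", "nan", "none"}


def preprocess(*csv_texts):
    """Integrate raw CSV exports into a sorted list of (Date, Close) pairs."""
    series = {}
    for text in csv_texts:                                   # data integration
        for record in csv.DictReader(io.StringIO(text)):
            raw_close = (record.get("Close") or "").strip()
            if raw_close.lower() in MISSING:
                continue                                     # cleansing: missing value
            try:
                close = float(raw_close)
            except ValueError:
                continue                                     # cleansing: noisy, non-numeric value
            # Reduction: only Date and Close are kept; a repeated date keeps the last value seen.
            series[date.fromisoformat(record["Date"].strip())] = close
    return sorted(series.items())


# Hypothetical rows, for illustration only.
raw = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2023-06-12,1.0,2.0,0.5,1.5,1.5,100\n"
    "2023-06-11,1.0,2.0,0.5,null,null,null\n"
    "2023-06-10,1.0,2.0,0.5,1.2,1.2,90\n"
)
print(preprocess(raw))
# [(datetime.date(2023, 6, 10), 1.2), (datetime.date(2023, 6, 12), 1.5)]
```

Keying the dictionary by date both merges several files and removes duplicate days in one pass; sorting afterwards restores chronological order, which the forecasting models rely on.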
The third stage is Validation, in which the accuracy of each developed model is assessed. The algorithms evaluated are ARIMA, Neural Network, Support Vector Machine, Linear Regression, and Generalized Linear Model, using the K-Fold Cross-Validation (KCV) technique. Cross-validation is a method for evaluating how well the results of a statistical analysis generalize to new, unseen data [23]. KCV divides the dataset into k parts and performs k iterations. In each iteration, one part is used as testing data and the remaining k - 1 parts are used for training. After k iterations, the error values obtained in the individual iterations are averaged [12]. The formulas for each algorithm are presented as follows:
1. ARIMA Algorithm [24]

$$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t \tag{1}$$

In this formula, $\phi_i$ are the coefficients of the autoregressive (AR) process, $\theta_j$ are the coefficients of the moving average (MA) process, $c$ is a constant, and $\varepsilon_t$ is the error term at time $t$. As is conventional in the literature, the model is denoted ARIMA(p, d, q). The order p of the AR process is the number of lagged values of the variable included in the model. The parameter d is the degree of differencing needed to make the series stationary; equation (1) is applied to the series after it has been differenced d times. Finally, q is the order of the MA process, that is, the number of lagged error terms included in the model.
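To make equation (1) concrete, the sketch below computes a one-step-ahead ARIMA(p, d, q) forecast from already-estimated coefficients: it differences the series d times, applies the AR and MA terms, and then reverses the differencing. It assumes the coefficients $c$, $\phi$, $\theta$ and the recent errors are given; estimating them (for example by maximum likelihood) is not shown. The function name and the numbers in the example are hypothetical.

```python
def arima_one_step(y, d, c, phi, theta, past_errors):
    """One-step-ahead forecast of y using equation (1) on the d-times differenced series.

    phi, theta and c are assumed to be already estimated; past_errors holds the
    most recent one-step forecast errors (oldest first), at least len(theta) of them.
    """
    levels = [list(y)]
    for _ in range(d):                                   # difference d times
        prev = levels[-1]
        levels.append([b - a for a, b in zip(prev, prev[1:])])
    w = levels[-1]                                       # differenced (stationary) series
    ar = sum(phi[i] * w[-1 - i] for i in range(len(phi)))                # phi_1 w_{t-1} + ...
    ma = sum(theta[j] * past_errors[-1 - j] for j in range(len(theta)))  # theta_1 eps_{t-1} + ...
    forecast = c + ar + ma                               # E[eps_t] = 0, so eps_t drops out
    for level in reversed(levels[:-1]):                  # undo differencing level by level
        forecast = level[-1] + forecast
    return forecast


# Hypothetical ARIMA(1, 1, 1) coefficients and toy prices, for illustration only.
prices = [10.0, 12.0, 15.0, 19.0]
print(arima_one_step(prices, d=1, c=0.5, phi=[0.8], theta=[0.2], past_errors=[0.1]))  # about 22.72
```

With d = 1 the differenced series is [2, 3, 4], so the forecast change is 0.5 + 0.8 * 4 + 0.2 * 0.1 = 3.72, which is added back to the last price of 19.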

2. Neural Network Algorithm
Neural networks, also known as artificial neural networks (ANNs), are a branch of machine learning. Inspired by the workings of the human brain, they imitate the way biological neurons send signals to one another [10]. These networks are built from interconnected layers: an input layer, one or more hidden layers, and an output layer. The structure of a neural network is illustrated in Figure 6.
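The layered structure described above can be illustrated with a minimal NumPy sketch: lagged closing prices form the input layer, one tanh hidden layer follows, and a single linear output neuron predicts the next value; the weights are trained by backpropagation with full-batch gradient descent on the mean squared error. The layer size, activation, learning rate, number of epochs, and the sine-wave toy series are all assumptions for illustration; the network settings used in this study are not given here.

```python
import numpy as np


def make_windows(series, lag):
    """Turn a 1-D series into (X, y): `lag` past values -> next value."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - lag + i] for i in range(lag)])
    return X, s[lag:]


def init_params(n_in, n_hidden, rng):
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),   # input -> hidden weights
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, 1)),      # hidden -> output weights
        "b2": np.zeros(1),
    }


def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])                  # hidden layer (tanh activation)
    y_hat = (h @ p["W2"] + p["b2"]).ravel()             # linear output layer
    return h, y_hat


def mse(p, X, y):
    return float(np.mean((forward(p, X)[1] - y) ** 2))


def train(X, y, n_hidden=8, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent on the mean squared error (backpropagation)."""
    p = init_params(X.shape[1], n_hidden, np.random.default_rng(seed))
    n = len(y)
    for _ in range(epochs):
        h, y_hat = forward(p, X)
        d_out = (2.0 / n) * (y_hat - y)[:, None]        # dMSE/d(output), shape (n, 1)
        grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0)}
        d_h = (d_out @ p["W2"].T) * (1.0 - h ** 2)      # back-propagate through tanh
        grads["W1"] = X.T @ d_h
        grads["b1"] = d_h.sum(axis=0)
        for name, g in grads.items():
            p[name] -= lr * g
    return p


# Toy series already in [-1, 1]; real prices would need scaling first (assumption).
series = np.sin(np.linspace(0, 12, 200))
X, y = make_windows(series, lag=3)
untrained = init_params(3, 8, np.random.default_rng(0))   # same start as train(seed=0)
params = train(X, y)
print(mse(untrained, X, y), mse(params, X, y))
```

Comparing the error of the untrained and trained networks on the same windows shows that backpropagation is reducing the loss; in practice the error would be measured on held-out folds instead.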

4. Linear Regression Algorithm
Regression modeling is an analytical method used to estimate the value of a dependent variable y from one or more independent variables X. Multiple linear regression extends this idea by predicting the response variable from several explanatory variables [10], as expressed in equation (3), where $\beta_0, \dots, \beta_m$ are the regression coefficients and $\varepsilon$ is the error term.
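The coefficients of a multiple linear regression model of this form can be estimated by ordinary least squares. The sketch below is a minimal NumPy version under two assumptions: the explanatory variables for a price series are its lagged Close values (a common setup, not confirmed for this study), and the toy prices are hypothetical. `fit_ols`, `predict`, and `make_windows` are illustrative names.

```python
import numpy as np


def make_windows(series, lag):
    """Lagged values X_1..X_lag as regressors, next value as the response."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - lag + i] for i in range(lag)])
    return X, s[lag:]


def fit_ols(X, y):
    """Estimate beta_0..beta_m of y = beta_0 + beta_1 X_1 + ... + beta_m X_m + eps."""
    A = np.column_stack([np.ones(len(y)), X])   # intercept column for beta_0
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


# Sanity check on noise-free synthetic data: the known coefficients are recovered.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
beta = fit_ols(X_demo, y_demo)
print(np.round(beta, 6))   # expected: close to [1, 2, -3]

# Next-day forecast from the last two closes of a hypothetical price series.
toy_close = [10.0, 11.0, 10.5, 12.0, 12.5, 13.0, 12.8]
X_lag, y_next = make_windows(toy_close, lag=2)
last_window = np.asarray(toy_close[-2:])[None, :]
next_pred = predict(fit_ols(X_lag, y_next), last_window)
```

`np.linalg.lstsq` is used instead of inverting $X^\top X$ directly because it remains numerically stable when lagged prices are strongly correlated.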

$$y = \beta_0 + \beta_1 X_1 + \dots + \beta_m X_m + \varepsilon \tag{3}$$

5. Generalized Linear Model
The generalized linear model (GLM) is an extension of the linear regression model. It assumes that the predictors act linearly on a transformation of the expected response, given by a link function, and, unlike linear regression, it does not require the response variable to be normally distributed. Instead, the response may follow any distribution from the exponential family, which includes the normal, Poisson, binomial, gamma, and inverse Gaussian distributions [26].
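A GLM is typically fitted by iteratively reweighted least squares (IRLS). The sketch below fits a Poisson GLM with a log link, chosen purely to show how the link function and exponential-family variance enter the fit; the family and link used in this study are not stated here. With a normal family and identity link, the same procedure reduces to the ordinary least squares fit of equation (3). It assumes non-negative responses with a positive mean; the function name and synthetic data are hypothetical.

```python
import numpy as np


def fit_poisson_glm(X, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = beta_0 + X @ beta by iteratively reweighted least squares."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])     # intercept column
    mu = (y + y.mean()) / 2.0                     # positive starting values for the mean
    eta = np.log(mu)                              # linear predictor on the link (log) scale
    beta = np.zeros(A.shape[1])
    for _ in range(max_iter):
        mu = np.exp(eta)                          # inverse link
        z = eta + (y - mu) / mu                   # working response
        w = mu                                    # working weights: Poisson family, log link
        Aw = A * w[:, None]
        beta_new = np.linalg.solve(A.T @ Aw, Aw.T @ z)   # weighted least squares step
        eta = A @ beta_new
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic data generated exactly from log(E[y]) = 0.5 + 0.3 x.
x = np.linspace(0.0, 2.0, 20)
y_glm = np.exp(0.5 + 0.3 * x)
print(fit_poisson_glm(x[:, None], y_glm))   # expected: close to [0.5, 0.3]
```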
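The final stage is Evaluation. This phase assesses the results obtained from applying the models to determine whether the research objectives have been met, and on this basis a decision is made about how to use the modeling outcomes [27]. The evaluation metric is the Root Mean Square Error (RMSE), which measures the magnitude of the prediction error. A smaller RMSE indicates a more accurate prediction [28], [29]. The formula for RMSE is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $n$ is the number of predictions, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. Because RMSE is expressed in the same unit as the target (USD), it is comparable across algorithms for the same cryptocurrency but not across cryptocurrencies with very different price levels.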
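The RMSE and the K-Fold Cross-Validation averaging described in the Validation stage can be combined in a short sketch. It assumes contiguous, unshuffled folds and takes any model as a pair of hypothetical `fit`/`predict` callables; the mean-price baseline and the toy prices are illustrative only and are not the RapidMiner operators used in the study.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_rmse(X, y, k, fit, predict):
    """Train on k - 1 folds, test on the remaining one, and average the k RMSE values."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(rmse([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)          # baseline "model": the training mean


def predict_mean(model, X_test):
    return [model] * len(X_test)


closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 111.0, 115.0, 114.0]  # hypothetical
X = [[c] for c in closes[:-1]]                  # previous close as the only feature
y = closes[1:]                                  # next close as the target
print(cross_validated_rmse(X, y, k=3, fit=fit_mean, predict=predict_mean))
```

For time series, a forward-chaining split (training only on folds that precede the test fold) avoids training on future observations; the plain k-fold loop here follows the procedure as described in the text.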

RESULTS AND DISCUSSION
The results of the Dataset and Pre-Processing stages are presented in Table 5, which shows the selected attributes, Date and Close. The Close attribute is designated as the target (label) variable. Each dataset is then ready for the next stages, Validation and Evaluation, in which the process model is built in RapidMiner. RapidMiner offers several benefits: it is cross-platform because it is written in Java, it supports rapid error correction, and it performs well in data transformation, modeling, and visualization [30]. RapidMiner is therefore well suited to this research, as it also provides comprehensive graph-based data visualization. Validation is performed with the Validation operator in RapidMiner for each of the five algorithms: ARIMA, Neural Network, Support Vector Machine, Linear Regression, and Generalized Linear Model. The output of the validation process is the RMSE value. In the first process model, the Validation operator is connected to the ARIMA algorithm, as presented in Figure 7. The validation results of the process models shown in Figures 7 and 8 are summarized in Table 6. For Bitcoin, the lowest RMSE is achieved by the Neural Network algorithm, at 918.534. For Ethereum, the lowest RMSE is also achieved by the Neural Network algorithm, at 537.528. For Tether, the lowest RMSE is achieved by the ARIMA algorithm, at 0.003. To assess how well the forecasts match the actual data, they are plotted against the actual prices in Figure 9 for Bitcoin, Figure 10 for Ethereum, and Figure 11 for Tether. The charts show how closely the forecast values follow the actual data. The highest actual price occurred on October 2, 2019, at 43,327 USD, whereas the forecast values appear to stabilize around 1,000 USD.

CONCLUSION
The research results indicate that each cryptocurrency has distinct characteristics in terms of the dataset.For Bitcoin, the most optimal result was achieved using the Neural

Figure 9. Chart of Bitcoin Forecast Using Neural Network
The chart shows how closely the forecast values follow the actual data. The highest actual price occurred on October 1, 2021, at 61,318 USD, whereas the highest forecast value occurred on January 1, 2023, at 35,193 USD.

Table 1. Research Gap

Table 6. Evaluation Results