Can machine learning predict cryptocurrency prices? An analysis using piecewise interpolation method as an example
This article was published in Crypto Valley Live, author: Michel Kana, translation: Jeremy.
This practical guide provides the foundational knowledge you need to predict the rapid rise in cryptocurrency prices.
Fifteen years ago, I began exploring the world of digital currencies and prototyped a peer-to-peer mobile currency platform that only used SMS.
Recently, a collaborator asked me whether artificial intelligence can predict cryptocurrency prices. She was curious about the hype surrounding blockchain.
After some research, I found that predicting cryptocurrency prices is a solvable problem, but the solution definitely does not hold under all market conditions.
Typical predictive models for crypto assets would utilize time series forecasting (such as ARIMA, Facebook Prophet), machine learning (such as random forest algorithms, linear regression), or deep learning methods (such as LSTM).
In this article, I investigate how piecewise interpolation performs when predicting the average price of Litecoin on a given date.
Data
We will focus on the historical prices of Litecoin from April 2013 to February 2021. This data is sourced from CoinMarketCap and is freely available. I divided the data into a training dataset (80%) and a testing dataset (20%). The latter is used to evaluate the accuracy of our predicted closing prices.
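A minimal sketch of such an 80/20 split is shown here. It assumes the rows are shuffled at random before splitting, which the article does not state; a chronological split would be an equally plausible reading. The helper `split_train_test` and the toy rows are hypothetical and stand in for the real price table.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices, then split them into a train and a test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Toy stand-in for (day, closing price) pairs; not real Litecoin data.
rows = [(day, 50.0 + 0.01 * day) for day in range(100)]
train, test = split_train_test(rows)
print(len(train), len(test))  # prints: 80 20
```

Fixing the seed keeps the split reproducible, so that different models are scored on the same held-out rows.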
Historical price of cryptocurrency Litecoin (Source: Kaggle)
A brief exploratory data analysis shows that the average closing price is highest at the beginning and end of the year, with the lowest in October.
Polynomial Regression
You may have heard of polynomial regression, which can be considered the simplest example of creating a basis of degree d to approximate a nonlinear function (in our case, the fluctuations in cryptocurrency prices).
I performed a simple polynomial regression on the historical prices of Litecoin, using degrees of 5, 25, and 80. In each case, the R2 value provides some insight into how well the model fits the testing dataset.
From the fit of the blue line to the training data below, we can observe that as the polynomial degree increases, the curve becomes wigglier. This is due to the increased complexity of the model, as high-degree polynomials attempt to chase every single data point in the training set.
Day 0 represents April 30, 2013, and Day 2800 represents February 28, 2021.
Especially in areas with outliers (the middle part of the graph), high-degree polynomials tend to extend towards these outliers. Therefore, the model with an 80-degree polynomial has the highest variance.
Its bias on the training data is also the lowest, as reflected in the highest R2 value, whereas lower-degree polynomials have lower R2 values, indicating higher bias but lower variance. Lower-degree polynomials are less sensitive to the training data.
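The bias and variance trade-off described above can be reproduced on toy data. The sketch below is not the code behind the article's figures: it uses a synthetic price series, a random 80/20 split (an assumption), and smaller degrees (1, 5, 15) than the 5, 25, and 80 used in the article so that it runs quickly. `r2_score` is a hand-written helper defined here.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for daily closing prices (not real Litecoin data).
rng = np.random.default_rng(0)
days = np.arange(400, dtype=float)
prices = 50 + 20 * np.sin(days / 40) + rng.normal(0, 3, days.size)

# Random 80/20 split (an assumption about how the split was made).
order = rng.permutation(days.size)
test_idx, train_idx = order[:80], order[80:]

train_r2, test_r2 = {}, {}
for degree in (1, 5, 15):
    # Polynomial.fit rescales the days internally, which keeps the
    # least-squares problem well conditioned at higher degrees.
    model = Polynomial.fit(days[train_idx], prices[train_idx], degree)
    train_r2[degree] = r2_score(prices[train_idx], model(days[train_idx]))
    test_r2[degree] = r2_score(prices[test_idx], model(days[test_idx]))
    print(f"degree {degree:2d}: train R2 = {train_r2[degree]:.3f}, "
          f"test R2 = {test_r2[degree]:.3f}")
```

On the training data, R2 can only go up as the degree grows, because each lower-degree model is a special case of the higher-degree one. The R2 on the held-out points is the number that shows whether the extra flexibility helps.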
Piecewise Interpolation
I found a more flexible approach is to use piecewise polynomials to predict cryptocurrency prices.
Piecewise interpolation fits a large number of data points using low-degree polynomials. By only using low-degree polynomials, we eliminate excessive oscillation and non-convergence.
Given a set of data points, piecewise interpolation works by using different polynomials in each segment of the data.
Specifically, we use connected piecewise polynomials, also known as splines.
An example of a spline is the truncated linear function below. It is flat (equal to zero) to the left of x = 4 and rises linearly to the right of it. The point x = 4 is known as the knot of the function.
Given several knots, we can combine multiple linear basis functions and fit them to nonlinear data.
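As a rough illustration, the truncated linear function with a knot at 4 can be written as max(x - 4, 0). The sketch below combines two such functions with an intercept and a slope and fits them by least squares. The knots (4 and 7), the target curve, and the helper names are illustrative assumptions, not taken from the article's notebook.

```python
import numpy as np


def truncated_linear(x, knot):
    """Hinge function (x - knot)_+: zero left of the knot, linear right of it."""
    return np.maximum(x - knot, 0.0)


def linear_spline_design(x, knots):
    """Design matrix with columns 1, x and one hinge function per knot."""
    columns = [np.ones_like(x), x]
    columns += [truncated_linear(x, k) for k in knots]
    return np.column_stack(columns)


x = np.linspace(0, 10, 101)
# Nonlinear toy target with kinks at 4 and 7.
y = np.abs(x - 4) + 0.5 * np.maximum(x - 7, 0.0)

knots = [4.0, 7.0]
X = linear_spline_design(x, knots)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

print(truncated_linear(np.array([2.0, 4.0, 6.0]), 4.0))
print("largest fitting error:", np.max(np.abs(fitted - y)))
```

Because the kinks of the toy target sit exactly on the knots, the fit is exact up to rounding. With real prices the knots only approximate where the trend changes direction.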
To capture the highly curved relationships present in cryptocurrency prices, I used truncated cubic functions as the basis, which gives cubic splines.
Using cubic splines, we divide the data into segments and fit a cubic polynomial to each segment. Each polynomial connects to the next one at the knots.
Cubic splines are a very good choice for the fluctuations in cryptocurrency prices because the connections are smooth: at each knot, the values of neighboring pieces as well as their first and second derivatives all match. The pieces are third-degree polynomials, a degree that is still low enough to avoid excessive variance.
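The smoothness at the knots can be checked numerically. The following sketch fits a cubic spline in the truncated power basis (1, x, x^2, x^3 plus one term max(x - knot, 0)^3 per knot) to synthetic data and compares the two cubic pieces that meet at the first knot. The data, the knot positions, and the helper names are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial


def cubic_spline_design(x, knots):
    """Truncated power basis: 1, x, x^2, x^3 and (x - knot)^3_+ per knot."""
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(x), x, x**2, x**3]
    columns += [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(columns)


def derivative_at(poly, order, point):
    """Evaluate the derivative of the given order of a Polynomial at a point."""
    for _ in range(order):
        poly = poly.deriv()
    return poly(point)


rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, x.size)  # synthetic data
knots = [2.5, 5.0, 7.5]

coef, *_ = np.linalg.lstsq(cubic_spline_design(x, knots), y, rcond=None)

# The cubic pieces immediately left and right of the first knot.
k = knots[0]
left = Polynomial(coef[:4])
right = left + coef[4] * Polynomial([-k, 1.0]) ** 3

gaps = [abs(derivative_at(right, m, k) - derivative_at(left, m, k))
        for m in range(4)]
for m, gap in enumerate(gaps):
    print(f"jump in derivative of order {m} at the knot: {gap:.2e}")
```

The value, slope, and second derivative agree at the knot up to rounding error. Only the third derivative is allowed to jump, and that jump is what lets each segment bend differently.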
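Cubic B-splines are a variant of cubic splines that is easier to compute efficiently, because each basis function is non-zero only over a few adjacent segments, so only a handful of basis functions (at most four for a cubic B-spline) contribute to the fit at any given point. Below we can see the performance of cubic B-splines on Litecoin prices after placing knots at the quartiles.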
By selecting the knots manually, that is, placing them where the data points cluster, we achieved a better R2 on the testing dataset than with knots placed at the quartiles.
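To make the role of the knots concrete, here is a minimal sketch of a cubic B-spline fit with interior knots at the quartiles of the predictor. The basis is built with the standard Cox-de Boor recursion in plain NumPy. The synthetic data and the function `bspline_basis` are assumptions for illustration, not the notebook's code.

```python
import numpy as np


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.array([(t[i] <= x) & (x < t[i + 1])
                  for i in range(len(t) - 1)], dtype=float).T
    # Assign the right end point to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B


rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 300))
y = np.sin(x) + rng.normal(0, 0.1, x.size)  # synthetic data

# Interior knots at the quartiles; boundary knots repeated degree + 1 times.
interior = np.quantile(x, [0.25, 0.5, 0.75])
knots = np.concatenate([[x[0]] * 4, interior, [x[-1]] * 4])

B = bspline_basis(x, knots)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fitted = B @ coef

active = (B > 0).sum(axis=1)  # basis functions contributing at each point
print("basis functions:", B.shape[1])
print("most basis functions active at one point:", active.max())
```

Moving the entries of `interior` to hand-picked positions is all it takes to try manually selected knots. The local support (few active basis functions per point) is what keeps the least-squares problem sparse and cheap.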
Cubic splines may behave erratically near the boundaries, as you can notice in the red graph above. The so-called natural cubic spline forces the function to be linear beyond the boundary knots, replacing the cubic polynomial with a linear one at each end.
Natural cubic splines require choosing the degrees of freedom. For the price of Litecoin, I found the optimal value through cross-validation: 174 knots placed at appropriate quantiles of the predictor (the date). The result shows less variance at the edges compared to cubic B-splines, but the R2 on the testing dataset is slightly worse.
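The linearity beyond the boundary knots can be verified with a small experiment. The sketch below builds the textbook natural cubic spline basis, in which K knots give K basis functions (K degrees of freedom), fits it to synthetic data, and checks that predictions past the last knot lie on a straight line. The six quantile knots and the helper `natural_cubic_design` are illustrative assumptions; cross-validation over the number of knots is left out for brevity.

```python
import numpy as np


def natural_cubic_design(x, knots):
    """Natural cubic spline basis: K knots give K columns."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    last = knots[-1]

    def cube_plus(u):
        return np.maximum(u, 0.0) ** 3

    def d(knot):
        return (cube_plus(x - knot) - cube_plus(x - last)) / (last - knot)

    columns = [np.ones_like(x), x]
    columns += [d(k) - d(knots[-2]) for k in knots[:-2]]
    return np.column_stack(columns)


rng = np.random.default_rng(3)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, x.size)  # synthetic data

# Six knots at evenly spaced quantiles, including both boundaries.
knots = np.quantile(x, np.linspace(0, 1, 6))
X = natural_cubic_design(x, knots)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predictions beyond the last knot: second differences vanish on a line.
x_out = np.linspace(10, 12, 21)
pred_out = natural_cubic_design(x_out, knots) @ coef
curvature_out = np.max(np.abs(np.diff(pred_out, n=2)))

# The same check inside the data range, where the spline is free to bend.
x_in = np.linspace(2, 4, 21)
pred_in = natural_cubic_design(x_in, knots) @ coef
curvature_in = np.max(np.abs(np.diff(pred_in, n=2)))

print("columns (degrees of freedom):", X.shape[1])
print("max second difference beyond the last knot:", curvature_out)
print("max second difference inside the range:", curvature_in)
```

Choosing the degrees of freedom then amounts to repeating the fit for several knot counts and keeping the one with the best cross-validated score.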
Finally, I implemented smoothing splines, which minimize the mean squared error while penalizing roughness, that is, excessive curvature in the fitted price curve.
Smoothing splines seem to be the most suitable piecewise interpolation for Litecoin prices. This model achieved the best R2 value obtained so far on the testing dataset.
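The idea of trading fit against roughness can be sketched with a discrete analogue of the smoothing spline, often called a Whittaker smoother. For equally spaced daily values it minimizes the squared error plus `lam` times the sum of squared second differences. This is a simplified stand-in under those assumptions, not the smoothing spline implementation used for the article's results, and the data below is synthetic.

```python
import numpy as np


def whittaker_smooth(y, lam):
    """Minimize ||y - f||^2 + lam * ||second differences of f||^2."""
    n = y.size
    # (n - 2) x n matrix whose rows are the pattern [1, -2, 1].
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


def roughness(f):
    """Sum of squared second differences, a discrete curvature measure."""
    return np.sum(np.diff(f, n=2) ** 2)


# Synthetic stand-in for daily closing prices (not real Litecoin data).
rng = np.random.default_rng(4)
days = np.arange(300, dtype=float)
prices = 50 + 20 * np.sin(days / 40) + rng.normal(0, 3, days.size)

fits = {lam: whittaker_smooth(prices, lam) for lam in (1.0, 1000.0)}
for lam, fit in fits.items():
    mse = np.mean((prices - fit) ** 2)
    print(f"lam = {lam:7.1f}: MSE = {mse:.3f}, roughness = {roughness(fit):.4f}")
```

A larger `lam` buys a smoother curve at the cost of a higher training error. In practice the penalty weight would be chosen on held-out data, in the same way as the degrees of freedom of the natural spline.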
The exciting part of the cubic spline model is how it extrapolates beyond the range of data used to train the model.
According to the renowned statistician Rob J. Hyndman, known for his work on forecasting and time series, cubic smoothing spline models are equivalent to an ARIMA model with a restricted parameter space. Hyndman notes that spline models provide a smooth historical trend along with a linear forecast function.
I invite you to experiment further with this idea. My code can be viewed online in Jupyter Python/R Notebook format.
Google Colab Notebook used in this article
Digital currencies and cryptocurrencies, such as Litecoin, are among the most controversial and complex technological innovations in the modern global economy. This article aims to use a less popular approach, cubic splines, to predict the changes in Litecoin prices.