ABSTRACT
In this paper, we are interested in predicting bubbles in the S&P 500 stock market with a two-step machine learning approach that employs a real-time bubble detection test and a support vector machine (SVM). SVM, a nonparametric binary classification technique, is already a widely used method in financial time series forecasting. In the literature, a bubble is often defined as a situation where the asset price exceeds its fundamental value. As one of the early warning signals, the prediction of bubbles is vital for policymakers and regulators who are responsible for taking preemptive measures against future crises. Therefore, many attempts have been made to understand the main factors in bubble formation and to predict bubbles in their earlier phases. Our analysis consists of two steps. The first step is to identify the bubbles in the S&P 500 index using a widely recognized right-tailed unit root test. Then, SVM is employed to predict the bubbles from macroeconomic indicators. We also compare SVM with different supervised learning algorithms using k-fold cross-validation. The experimental results show that the proposed approach, with its high predictive power, could be a favourable alternative in bubble prediction.
Keywords: Bubbles, early warning, machine learning, support vector machines, macroeconomic indicators
1. Introduction
Financial crisis prediction is one of the most challenging problems faced by both emerging and advanced economies. Although globalization has led to more efficient financial markets through increasing interaction between their actors, economies have at the same time become more fragile. In particular, the bursting of the United States housing bubble in 2007 turned into a global credit crisis in a very short time. After seeing the destructive effects of this crisis on economies, policymakers have tried to introduce regulations and take precautionary measures against possible future crises. The S&P 500, as a proxy for the U.S. economy, has always had a large impact on global financial markets because of its size and interaction with other markets [29]. Hence, a large number of studies have been conducted to understand the causes of the housing bubble, while others have searched for empirical evidence of bubbles in the S&P 500 market.
Detecting anomalies in stock market data has become crucial because financial crises generally follow a boom-and-bust cycle in asset prices, namely, asset price bubbles [30]. A bubble is often defined as a situation where an asset's price exceeds its fundamental value. Gilles and LeRoy [9] defined bubbles as the abnormal difference between market prices and real values. From a theoretical point of view, the real or fundamental value of any asset is equal to the sum of the present values of all future cash flows discounted at the required rate of return. History shows that predicting bubbles has always been vital for policymakers and financial regulators who are responsible for the wellbeing of the financial system. Consequently, a growing number of studies in the literature have tried to develop early warning systems to detect the abnormal stock market behavior leading to crises.
A popular real-time bubble detection test was developed by Phillips, Shi and Yu (PSY hereafter) in 2015 [30,31]; since then it has been widely used as an early warning device for crises (see the references in [29]). The test is based on the Augmented Dickey-Fuller (ADF) model specification and employs a recursive evolving algorithm which has achieved an outstanding performance in detecting multiple bubbles and locating their origination and termination dates [30]. However, a recent study [1] showed that if there are only a few bubble observations at the end of the sample, the presence of heteroskedasticity can affect the performance of this recursive test; the authors therefore developed a heteroskedasticity-robust wild bootstrap procedure. Following this, in 2019, Phillips and Shi [29] pointed out that an additional multiplicity issue arises from the sequential nature of recursive hypothesis testing and introduced a new bootstrap procedure that addresses both the heteroskedasticity and the multiplicity issues in testing. Our analysis employs the PSY test along with this new bootstrap procedure.
The main goal of this study is to predict U.S. stock market bubbles from macroeconomic indicators by applying the PSY procedure and the SVM algorithm sequentially. Bubble prediction is a challenging task, and the choice of prediction model is very important. Recent studies have shown that machine learning algorithms deal better with the nonstationarities and nonlinearities in financial time series and consequently have better prediction performance than stochastic models. Being a popular supervised learning method, SVM has many financial applications, especially in forecasting asset prices, price movements, or the bankruptcy events of a firm. As far as we know, no previous research has employed SVM to predict asset price bubbles. Here, we try to fill this gap in the literature by answering the question of whether financial bubbles can be predicted with macroeconomic indicators, that is, whether the exuberant behavior in stock market data is machine learnable.
In 1963, Vapnik and his colleagues [42] invented the original SVM algorithm. In their seminal paper of 1992 [3], they introduced a way to create nonlinear classifiers by applying the kernel trick to maximum margin hyperplanes. In 1995, the algorithm was extended to the regression case [41]. In 1997, Mukherjee and his co-workers [27] demonstrated the applicability of SVM to time series forecasting, and in 2001, Tay and Cao [38] showed the predictability of financial time series by employing SVM as a regression tool.
As a flexible supervised learning algorithm without any assumption on the underlying function, support vector regression has been widely used for forecasting continuous values, e.g. stock prices; for further references see [34]. SVM has also been extensively studied for many financial classification problems such as bankruptcy prediction, stock price direction prediction, credit scoring and trading analysis [2,7,13,15,16,20,23–26,36,37,43–45]. Here, we will only mention a few studies that use SVM to predict the class label in financial time series; comprehensive reviews of this field can be found in [22,34].
The first paper to employ SVM as a classification tool and apply it to financial time series is that of Kim [16], published in 2003. In his study, the direction of the daily change of the Korean stock price index was predicted with 12 technical indicators. The direction of the daily change was categorized as '0' if the next day's index is lower than today's index and '1' if it is higher (binary classification); the SVM algorithm then predicts the class label '1' or '0'. The study concludes that SVM as a classification methodology is also applicable to financial time series. The work of [16] is followed by [5,6,12,13,21,28,32,39].
A key performance measure of an SVM algorithm is the accuracy rate, which can be defined as the percentage of correct predictions. Recently, hybrid machine learning methods have become popular due to their high prediction power. The authors in [45], for example, proposed a hybrid approach named the status box method, which integrates the AdaBoost algorithm, probabilistic SVM and a genetic algorithm, and reduces the prediction of stock price trends and turning points to a status box classification problem; their approach aims to help investors choose the optimal trading strategy. Meanwhile, [37] emphasized the importance of the relationship between input window length and forecast horizon for system performance when predicting stock price direction. As in Kim's work [16], they used technical indicators as input features to forecast the future behaviour of asset prices by employing the following machine learning algorithms: SVM, artificial neural networks and k-nearest neighbors.
In another study, Wang and Choi [43] proposed an integrated machine learning approach, first using principal component analysis for feature selection and then employing SVM to predict the stock price and stock market direction. An interesting aspect of this study is that they used both lagged daily prices and economic factors as input features in the SVM algorithm. Previous studies frequently employed data inherent to the stock market, such as technical indicators or trading volume, to forecast stock price direction, while plenty of studies have also been conducted to predict stock prices with macroeconomic indicators. Based on the literature review, we observe that financial applications of SVM have used features describing past market performance, economic conditions, or both, but there is no clear evidence that one approach is superior to the others.
To the best of our knowledge, this is the first study that employs SVM to predict asset price bubbles. Our analysis uses macroeconomic variables as the features of the SVM algorithm to predict the targeted bubble periods. The results show that SVM obtains high prediction accuracy on this dataset, i.e. classification accuracies of more than 99%. The rest of the paper is organized as follows. Section 2 introduces the bubble detection test, the SVM algorithm and the other machine learning algorithms used for comparison purposes. Section 3 describes the dataset and experimental design and presents the test and comparison results. Section 4 concludes the work.
2. Methodology
2.1. The recursive evolving algorithm
The PSY procedure uses recursive regression techniques to test the unit root null hypothesis against the right-tailed alternative of explosive behavior. The specified null hypothesis is a random walk process with an asymptotically negligible drift:
$$ y_t = d\,T^{-\eta} + y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \overset{\mathrm{iid}}{\sim} (0, \sigma^2) \qquad (1) $$
where d is a constant, η is a localizing coefficient with η > 1/2 that controls the magnitude of the drift, and T denotes the sample size. The proposed regression model is
$$ \Delta y_t = \hat{\alpha}_{r_1,r_2} + \hat{\beta}_{r_1,r_2}\, y_{t-1} + \sum_{i=1}^{k} \hat{\psi}^{\,i}_{r_1,r_2}\, \Delta y_{t-i} + \hat{\varepsilon}_t \qquad (2) $$
In the PSY method, $r_0$ denotes the (fractional) minimum window size, $r_1$ the fractional starting point and $r_2$ the fractional ending point of each sample. The end point $r_2$ is fixed and $r_1$ changes from 0 to $r_2 - r_0$. The PSY test applies the above regression model recursively to calculate the right-tailed ADF test statistics on a backward expanding sample sequence and makes inferences about the explosiveness of the observations based on the supremum value of these test statistics, comparing it with the corresponding right-tailed critical values of its limit distribution. The backward sup ADF test statistic is then given by
$$ BSADF_{r_2}(r_0) = \sup_{r_1 \in [0,\, r_2 - r_0]} ADF_{r_1}^{r_2} \qquad (3) $$
The PSY test assumes that the error term $\varepsilon_t$ in the regression has a constant variance. However, as shown in [1], heteroskedasticity is frequently present in financial time series. In addition, multiplicity is a common problem in recursive testing procedures [29]. The PSY procedure employed in this paper calculates bootstrap critical values by using a composite algorithm that deals with both the heteroskedasticity and the multiplicity issues. The technical details of the procedure can be found in [29–31].
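To make the recursive evolving algorithm concrete, the following base-R sketch computes the backward sup ADF statistic for a single end point. It is a minimal sketch only: the lag order, the minimum window value and the critical value in the example are placeholder assumptions, and the bootstrap procedure of [29] that handles heteroskedasticity and multiplicity is not reproduced here.

```r
# Minimal sketch of the backward sup ADF (BSADF) statistic for one end point r2.
# Assumptions: lag order k = 1, no bootstrap critical values.
adf_stat <- function(y) {
  dy    <- diff(y)
  ylag  <- y[-length(y)]                 # y_{t-1}
  dylag <- c(NA, dy[-length(dy)])        # one lagged difference (k = 1)
  fit   <- lm(dy ~ ylag + dylag)
  summary(fit)$coefficients["ylag", "t value"]   # right-tailed ADF t-statistic
}

bsadf_at <- function(y, r2, r0) {
  # sup of ADF statistics over backward-expanding samples [r1 + 1, r2], r1 = 0..r2 - r0
  stats <- sapply(0:(r2 - r0), function(r1) adf_stat(y[(r1 + 1):r2]))
  max(stats, na.rm = TRUE)
}

# Illustrative use: flag the last observation as explosive if BSADF exceeds a
# critical value cv (placeholders; in practice use bootstrap critical values).
# y  <- price_dividend_ratio            # hypothetical input series
# cv <- 1.49                            # placeholder critical value
# bubble_flag <- bsadf_at(y, r2 = length(y), r0 = 36) > cv
```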
2.2. Support vector machines
A support vector machine is a supervised machine learning algorithm which produces a decision boundary for the classification of data. The basic principle of SVM is to create a hyperplane that separates the data into classes. If there are two classes and the data are linearly separable, one only needs to find a line that separates the data into the two classes, but there can be many lines that do this job. SVM looks for the points that are closest to both classes; these points are called support vectors. The next step is to compute the distance between these points and the dividing line, known as the margin. The aim of the SVM algorithm is to maximize this margin in order to obtain the optimal hyperplane. If the data are not linearly separable, SVM produces nonlinear boundaries by constructing a linear boundary in a high-dimensional, transformed version of the feature space.
In the literature, the SVM classifier appears in different versions, mainly in primal and dual forms. The dual formulation uses the dot product of features, which serves our purpose when we use kernels, particularly nonlinear ones [18]. Hence, we will follow the dual formulation of the SVM classifier after briefly introducing the classification problem. The details can be found in the following fundamental books on statistical learning [11,41].
Suppose we are given a training set of N observations $(x_1, y_1), \ldots, (x_N, y_N)$ with $x_i \in \mathbb{R}^p$, belonging to two different classes $y_i \in \{-1, +1\}$. The classification problem is to estimate a function $f$ from these data such that
$$ f(x_i) \geq 0 \quad \text{if } y_i = +1 \qquad (4) $$
$$ f(x_i) < 0 \quad \text{if } y_i = -1 \qquad (5) $$
For a correct classification, $y_i f(x_i) > 0$ for all i. A linear classifier constructs a hyperplane with the decision rule
$$ f(x) = \operatorname{sign}\left( w^{\top} x + b \right) \qquad (6) $$
where w is the normal to the hyperplane (the weight vector) and b is the bias. For a linear classifier, the training data are used to learn w and b in order to classify new data. The maximum margin solution gives the best w. The separating constraints can be written as
$$ y_i \left( w^{\top} x_i + b \right) \geq 1, \qquad i = 1, \ldots, N \qquad (7) $$
Then the margin is given by
$$ \text{margin} = \frac{2}{\lVert w \rVert} \qquad (8) $$
The aim is to maximize the margin, i.e. to minimize the norm of w. The hyperplane maximizing the margin is the solution to the following optimization problem:
$$ \max_{w,\,b} \ \frac{2}{\lVert w \rVert} \quad \text{subject to } y_i \left( w^{\top} x_i + b \right) \geq 1, \ i = 1, \ldots, N \qquad (9) $$
or equivalently
$$ \min_{w,\,b} \ \frac{1}{2} \lVert w \rVert^{2} \quad \text{subject to } y_i \left( w^{\top} x_i + b \right) \geq 1, \ i = 1, \ldots, N \qquad (10) $$
This is a linearly constrained optimization problem with a quadratic objective function, and it has a unique minimum. Using Lagrange's method, we obtain the following Lagrangian, which can be optimized without explicit constraints:
$$ L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^{2} - \sum_{i=1}^{N} \alpha_i \left[ y_i \left( w^{\top} x_i + b \right) - 1 \right] \qquad (11) $$
where
$$ w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (12) $$
that is, w is equal to a linear combination of the vectors in the sample set, and the $\alpha_i$ are the Lagrange coefficients. Now, the decision rule can be written in dual form as follows:
$$ f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i \, x_i^{\top} x + b \right) \qquad (13) $$
which is typically determined by only a small subset of the training set, the support vectors, which have nonzero $\alpha_i$; the terms with $\alpha_i = 0$ can be neglected.
In order to measure the error between the output of the algorithm and the given target value, a loss function is used. In SVM, the hinge loss is employed. For a binary classification problem, the hinge loss for a prediction y is
$$ \ell(y) = \max\left( 0,\, 1 - t \cdot y \right) \qquad (14) $$
where t represents the target value.
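As a small illustration, the hinge loss can be computed elementwise in R; here t denotes the true label in {−1, +1} and y the real-valued classifier output, with illustrative values only.

```r
# Hinge loss for binary classification: zero when t*y >= 1, linear otherwise
hinge_loss <- function(y, t) pmax(0, 1 - t * y)

hinge_loss(y = c(2.0, 0.3, -0.5), t = c(1, 1, 1))   # 0.0 0.7 1.5
```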
Linear SVMs are generally used for binary classification. If the dataset is not linearly separable, the data are mapped to a higher-dimensional space in which they become linearly separable. A kernel function provides this transformation implicitly through the inner product of two vectors in the feature space. If there is a kernel function K such that
$$ K(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j) \qquad (15) $$
where $\phi$ is a feature map, the SVM algorithm is able to construct a linear boundary in a large transformed feature space.
In the non-linearly separable case, introducing slack variables $\xi_i$, the constraints can be written as
$$ y_i \left( w^{\top} x_i + b \right) \geq 1 - \xi_i, \qquad i = 1, \ldots, N \qquad (16) $$
$$ \xi_i \geq 0, \qquad i = 1, \ldots, N \qquad (17) $$
Here, the optimization problem must also determine the new variables $\xi_i$, whose magnitudes may change from case to case. For that reason, a penalty term with weight C on the contribution of the $\xi_i$ is added to the objective function. The objective function for the non-linearly separable case is of the form:
$$ \min_{w,\,b,\,\xi} \ \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \qquad (18) $$
Again, Lagrange's method is applied to the constrained quadratic optimization problem, and the optimal hyperplane is then found in the new feature space, which corresponds to a nonlinear decision rule in the original space. The constrained optimization is carried out by optimizing the Lagrangian dual
$$ L_D(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \qquad (19) $$
$$ \text{subject to } \ 0 \leq \alpha_i \leq C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (20) $$
where the $\alpha_i$, the Lagrange multipliers, are introduced to handle the constraints involving the slack variables. Now, the decision rule for non-separable data can be represented by the following function:
$$ f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i \, K(x_i, x) + b \right), \qquad 0 \leq \alpha_i \leq C \qquad (21) $$
where C is the cost parameter for misclassification and K represents the kernel function. As stated before, there is a lower bound of 0 on the Lagrange multipliers for separable data. For non-separable data, in addition to the lower bound 0, each multiplier is bounded above by C, which serves as a regularization parameter controlling the outliers: the lower the parameter C, the more outliers are allowed, and vice versa; allowing fewer outliers means a high misclassification cost C [11].
The kernel functions most frequently used in financial applications of SVM are the polynomial kernel $K(x_i, x_j) = (x_i^{\top} x_j + 1)^{d}$ and the Gaussian (radial basis function) kernel $K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^{2} \right)$ [13,16].
Parameter selection is critical for the Gaussian kernel since it affects model performance: a large value of γ results in overfitting, while a small value of γ leads to underfitting.
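For illustration, the two kernels used later in the experiments can be written as plain R functions. The exact parameterization (degree d, width γ) shown here is an assumption of this sketch following common convention, not a specific library default.

```r
# Polynomial and Gaussian (RBF) kernels as plain R functions; parameter values
# below are illustrative only.
poly_kernel <- function(x, z, d = 3) (sum(x * z) + 1)^d
rbf_kernel  <- function(x, z, gamma = 0.1) exp(-gamma * sum((x - z)^2))

x <- c(0.2, -1.0, 0.5)
z <- c(0.1, -0.8, 0.4)
poly_kernel(x, z)   # polynomial kernel value
rbf_kernel(x, z)    # Gaussian (RBF) kernel value
```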
2.3. Logistic regression
Logistic regression is the regression model used when the dependent variable Y is binary or categorical. It describes the relation between one binary dependent variable and a set of independent variables. In logistic regression, instead of modeling the dependent variable directly as a function of the independent variables, one models the probability that the dependent variable equals class 1. Now, let $X = (X_1, \ldots, X_p)$ denote the independent variables with coefficients $\beta = (\beta_1, \ldots, \beta_p)$ and intercept $\beta_0$; then the logistic model is given as
$$ P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta^{\top} X}}{1 + e^{\beta_0 + \beta^{\top} X}} \qquad (22) $$
The function on the right-hand side is the sigmoid function of $\beta_0 + \beta^{\top} X$. Taking the natural logarithm of both sides yields
$$ \log \frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)} = \beta_0 + \beta^{\top} X \qquad (23) $$
The function on the left-hand side is called the log odds, the logarithm of the ratio of the probability of an event occurring to the probability of it not occurring. Given a sample $(x_i, y_i)$, $i = 1, \ldots, n$, with $y_i \in \{0, 1\}$, and writing $p(x_i) = P(Y = 1 \mid X = x_i)$, we have
$$ P(Y = y_i \mid X = x_i) = p(x_i)^{\,y_i} \left( 1 - p(x_i) \right)^{1 - y_i}, \qquad i = 1, \ldots, n \qquad (24) $$
In order to estimate the coefficients $(\beta_0, \beta)$, the maximum likelihood approach is used. The log-likelihood function of the coefficients is
$$ \ell(\beta_0, \beta) = \sum_{i=1}^{n} \left[ \, y_i \log p(x_i) + (1 - y_i) \log\left( 1 - p(x_i) \right) \right] \qquad (25) $$
This function does not have a closed-form maximizer, so iterative optimization algorithms are used to maximize it [14]. The decision boundary for the classes of the dependent variable is determined according to the following rule
$$ P(Y = 1 \mid X) > 0.5 \ \Longrightarrow \ \hat{Y} = 1 \qquad (26) $$
or equivalently
$$ \beta_0 + \beta^{\top} X > 0 \ \Longrightarrow \ \hat{Y} = 1 \qquad (27) $$
According to this decision rule, if the calculated probability is greater than 0.5, we conclude that Y belongs to class 1; otherwise it belongs to class 0.
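As a brief illustration of this model and decision rule, the following R sketch fits a logistic regression by maximum likelihood with glm() and applies the 0.5 threshold; the simulated data are purely illustrative, not the paper's macroeconomic features.

```r
set.seed(1)
# Simulated data for illustration only
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- rbinom(200, 1, plogis(1.5 * train$x1 - train$x2))

fit  <- glm(y ~ x1 + x2, data = train, family = binomial)  # ML estimation of the coefficients
prob <- predict(fit, type = "response")                    # estimated P(Y = 1 | X)
pred <- as.integer(prob > 0.5)                             # decision rule: class 1 if p > 0.5
mean(pred == train$y)                                      # training accuracy
```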
2.4. Decision trees
Decision trees as classifiers use the features of the data to predict the class label. A decision tree consists of internal nodes, branches and leaf nodes. Internal nodes are labeled with an input feature on which a test is performed, and their branches are labeled with the possible outcomes of the test. The root is the starting node of the tree and a leaf is a terminal node labeled with a class label prediction. Observations are classified by starting from the root and navigating down to a leaf according to the outcomes of the tests [4].
The decision tree algorithm has a growing and a pruning phase. Decision trees are built by splitting the data based on features; the algorithm decides the splitting variables and split points. Splitting rules are determined by searching for the best split that separates observations based on the class label. A node is then split into two subsets, and the process is applied to each subset node in a recursive manner. Splitting is completed when a subset contains only one value of the class label or when no further split adds value to the prediction. The pruning phase reduces the tree size by eliminating the parts that contribute little to the classification accuracy [33].
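The growing and pruning phases can be illustrated with the rpart package in R; the data below are simulated and the pruning value is an arbitrary example, not the complexity parameter tuned later in Section 3.3.

```r
library(rpart)
set.seed(1)
# Illustrative data; 'cp' is the complexity parameter that governs pruning.
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- factor(ifelse(d$x1 + d$x2 + rnorm(300, sd = 0.5) > 0, "bubble", "no_bubble"))

tree <- rpart(y ~ x1 + x2, data = d, method = "class")   # growing phase
printcp(tree)                                             # cross-validated error per cp value
pruned <- prune(tree, cp = 0.04)                          # pruning phase (illustrative cp)
```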
2.5. k-Nearest neighbors
k-Nearest neighbors (k-NN) is a classification algorithm which makes decisions according to the closest training examples in the feature space. k-NN is a lazy algorithm: generalization of the training data is postponed until a query is made, and the algorithm does not build an explicit model or perform any learning in the training phase [40]. The next paragraph explains the key concepts of the algorithm.
Suppose that we have a sample $(x_i, y_i)$, $i = 1, \ldots, n$, where $y_i$ is the class label and $x_i$ represents the features. The algorithm assigns a label to a new query point $x_q$ using the known labels of the nearest training examples. The most commonly used distance is the Euclidean distance, that is,
$$ d(x_q, x_i) = \sqrt{ \sum_{j=1}^{p} \left( x_{qj} - x_{ij} \right)^{2} } \qquad (28) $$
k-NN searches the training set for the example $x^{*}$ with minimum distance to the query and retrieves its target value $y^{*}$,
$$ x^{*} = \arg\min_{x_i,\ i = 1, \ldots, n} d(x_q, x_i) \qquad (29) $$
then the label $y^{*}$ of the found example is assigned as the label of $x_q$ (for k > 1, the majority label among the k nearest neighbors is used).
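A minimal R illustration with the class package is given below; the training data are simulated and k = 3 is an arbitrary choice rather than the value tuned in Section 3.3.

```r
library(class)
set.seed(1)
# knn() assigns each query point the (majority) label of its k closest
# training points under Euclidean distance; data here are illustrative.
train_x <- matrix(rnorm(200 * 2), ncol = 2)
train_y <- factor(ifelse(train_x[, 1] + train_x[, 2] > 0, 1, 0))
test_x  <- matrix(rnorm(50 * 2), ncol = 2)

pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)
head(pred)
```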
2.6. Artificial neural networks
An artificial neural network (ANN) is a parametric machine learning algorithm in which the pattern between the inputs and outputs is learned. The fundamental element and processing unit is the neuron, a single cell of the system, which takes inputs, processes them and returns an output y. A neural network is a set of connected input-output units where each connection has a weight attached to it. The network learns by adjusting the weights so as to predict the correct class of the inputs [46].
Let $x_1, \ldots, x_n$ represent the inputs of the neuron and $w_1, \ldots, w_n$ the weights of the respective inputs, and let b be the bias, which is summed with the weighted inputs to form the sum of products. The bias and the weights are parameters adjusted by learning rules in the training phase. To map the output into a given range, we specify an activation function f which provides the mapping between the input and the output of the neuron. For classification problems, the logistic function is a popular choice of activation function, which leads to a binary classification model. The output is expressed as follows:
$$ y = f\!\left( \sum_{i=1}^{n} w_i x_i + b \right) \qquad (30) $$
A large number of inputs are fed to the neural network, and the output for each input is used to train the network. In this way, when faced with a new input, the neural network recognizes similarities and predicts the output.
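The following R sketch, using the nnet package, fits a single-hidden-layer feed-forward network of this type on simulated data; the size and decay values are illustrative only, not the tuned values reported in Section 3.3.

```r
library(nnet)
set.seed(1)
# Single-hidden-layer feed-forward network; 'size' is the number of hidden
# units and 'decay' the weight-decay regularization. Data are illustrative.
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- factor(ifelse(plogis(2 * d$x1 - d$x2) > runif(300), 1, 0))

net  <- nnet(y ~ x1 + x2, data = d, size = 5, decay = 0.1, maxit = 200, trace = FALSE)
pred <- predict(net, d, type = "class")
mean(pred == d$y)   # training accuracy
```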
3. Experimental application
3.1. Data
We now evaluate the capability of our two-step analysis to predict bubbles in financial time series. The identification of U.S. stock market bubbles is based on the S&P 500 price-dividend ratio series provided by Shiller on his homepage [35]. We first conduct the PSY test for bubble detection and then use the binary outputs as the target labels for the SVM algorithm. We employ the generalized version of the real-time bubble detection test developed by Phillips and Shi [29], which is capable of identifying multiple bubble periods within a dataset while dealing with the multiplicity and heteroskedasticity issues.
The PSY procedure is applied to the monthly price-dividend ratio series covering the period 1973 to 2018, comprising 501 observations. The bubble indicator starts in November 1976 because of the minimum window size, which serves as the training period for the PSY test; the minimum window size is determined by the rule in [30]. Note that we follow the same procedure as in [29] for the same sample period and, not surprisingly, obtain the same results. The identified bubble periods coincide with famous bubble and crisis episodes: the Black Monday crash, the dot-com bubble and the subprime mortgage crisis. The details of the procedure and the detected bubble periods can be found in [29].
The SVM analysis relies on the identified bubbles and the selected macroeconomic variables: gross domestic product (GDP) growth, unemployment rate (UNEMP), short-term interest rates (IRS), long-term interest rates (IRL), inflation (INF) and balance of payments (BLNC). The variables are chosen based on their relevance to the purpose of this study; they are the main indicators of the U.S. economy according to the fiscal and monetary authorities. The following paragraphs describe the selected variables and explain their importance.
GDP measures the value of goods and services produced within the borders of a country in a specific time period. GDP growth is a well-known indicator of the economic health of a country, so it directly affects the perceptions of investors and therefore stock markets. A healthy balance of payments also contributes to economic growth. The balance of payments records all transactions of a country's individuals, firms and governments with individuals, firms and governments outside the country. It provides detailed information about the supply of and demand for the country's currency and reveals the trade deficit or surplus, which directly reflects how well the economy is working.
Short-term interest rates are the rates applied to short-term borrowing. They are usually averages of daily rates and are computed based on three-month money market instruments. Long-term interest rates are measured by government bonds maturing in 10 years. Like the other indicators, interest rates are primary factors that represent the strength of the economy. They are also closely followed by decision makers seeking to understand the price movements of a market.
Inflation measures the change in prices of goods and services in an economy over a period of time. It is also an important factor that directly affects the stock price of a company by decreasing or increasing the present value of its future cash flows. Lastly, unemployment, like GDP, is a leading indicator of the strength of the economy. The unemployment rate is defined as the percentage of unemployed persons in the labour force. It is closely followed by economists and investors to support their investment decisions.
3.2. Experimental design and results
The SVM algorithm needs a large dataset to obtain meaningful results and to make prediction possible by dividing the data into training and test parts. Therefore, the monthly bubble indicator is converted into weekly frequency, yielding 256 bubble and 1922 no-bubble data points. In addition, the GDP growth, BLNC, INF and UNEMP series are available at quarterly frequency, while the IRS and IRL series are at monthly frequency. All data series are converted into weekly frequency by cubic spline interpolation, which is a very smooth interpolation method compared with the alternatives. The time span of the features is from November 1976 to July 2018, coinciding with the bubble test results. In order to prevent features with larger scales from dominating the model, each feature series of SVM is normalized.
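The following R sketch illustrates this preprocessing for one series; the dates and the quarterly values are hypothetical placeholders, not the actual data used in the paper.

```r
# Convert a lower-frequency series to weekly by cubic spline interpolation,
# then normalize it; values and dates below are illustrative only.
quarterly_dates <- seq(as.Date("1976-11-01"), as.Date("2018-07-01"), by = "quarter")
gdp_growth      <- rnorm(length(quarterly_dates))          # hypothetical quarterly series
weekly_dates    <- seq(min(quarterly_dates), max(quarterly_dates), by = "week")

gdp_weekly <- spline(x = as.numeric(quarterly_dates), y = gdp_growth,
                     xout = as.numeric(weekly_dates), method = "fmm")$y
gdp_scaled <- as.numeric(scale(gdp_weekly))                # zero mean, unit variance
```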
The lengths of the training and test datasets are also important for meaningful inference from the observations. The data are divided into two parts, training and test: the training data are randomly selected from the full dataset, and the remaining observations form the test data used to assess how well SVM works. SVM is implemented using the e1071 package [8] in R. Polynomial and Gaussian radial basis functions are used as kernel functions.
One practical advantage of SVM is that there are few parameters to tune: the cost parameter C, whose appropriate range is between 1 and 100 according to [38], and, for the nonlinear SVM with Gaussian kernel, the additional kernel parameter σ. A good combination of the kernel parameter σ and the misclassification cost parameter C is searched for better performance. The performance under different parameter settings for training and test data is shown in Table 1 for the Gaussian kernel and in Table 2 for the polynomial kernel. We report both prediction accuracy and error rate for the training and test data, where accuracy is defined as the number of correct predictions divided by the total sample size.
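A minimal sketch of such a grid evaluation with the e1071 package is given below; the simulated features, the bubble labels, the fixed gamma value and the 80/20 split are placeholders rather than the paper's actual data, parameters and split.

```r
library(e1071)
set.seed(1)
# Grid evaluation of the cost parameter C for a Gaussian (radial) kernel SVM,
# in the spirit of Tables 1-2; data and split are illustrative placeholders.
n        <- 500
features <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
bubble   <- factor(ifelse(features$x1 + features$x2 > 1.2, 1, 0))
idx      <- sample(n, 0.8 * n)                    # illustrative random train/test split

acc <- function(pred, truth) mean(pred == truth)
for (C in c(1, 10, 33, 55, 78, 100)) {
  fit <- svm(x = features[idx, ], y = bubble[idx],
             kernel = "radial", cost = C, gamma = 0.5)
  cat("C =", C,
      "train acc =", acc(predict(fit, features[idx, ]), bubble[idx]),
      "test acc =",  acc(predict(fit, features[-idx, ]), bubble[-idx]), "\n")
}
```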
Table 1. SVM prediction performance under Gaussian Kernel with various parameters.
| Training data | Test data | |||
|---|---|---|---|---|
| C | Accuracy | Training error | Accuracy | Test error |
| (a) | ||||
| 1 | 0.9793 | 0.0207 | 0.9770 | 0.0230 |
| 10 | 0.9868 | 0.0132 | 0.9839 | 0.0161 |
| 33 | 0.9868 | 0.0132 | 0.9862 | 0.0138 |
| 55 | 0.9879 | 0.0121 | 0.9862 | 0.0138 |
| 78 | 0.9905 | 0.0095 | 0.9773 | 0.0227 |
| 100 | 0.9905 | 0.0095 | 0.9773 | 0.0227 |
| (b) | ||||
| 1 | 0.9917 | 0.0083 | 0.9690 | 0.0310 |
| 10 | 0.9970 | 0.0030 | 0.9855 | 0.0145 |
| 33 | 0.9988 | 0.0012 | 0.9855 | 0.0145 |
| 55 | 0.9988 | 0.0012 | 0.9855 | 0.0145 |
| 78 | 0.9988 | 0.0012 | 0.9855 | 0.0145 |
| 100 | 0.9988 | 0.0012 | 0.9855 | 0.0145 |
| (c) | ||||
| 1 | 0.9947 | 0.0053 | 0.9731 | 0.0269 |
| 10 | 0.9994 | 0.0006 | 0.9814 | 0.0186 |
| 33 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 55 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 78 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 100 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| (d) | ||||
| 1 | 0.9964 | 0.0036 | 0.9731 | 0.0269 |
| 10 | 1.0000 | 0.0000 | 0.9814 | 0.0186 |
| 33 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 55 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 78 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 100 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| (e) | ||||
| 1 | 0.9964 | 0.0036 | 0.9731 | 0.0269 |
| 10 | 1.0000 | 0.0000 | 0.9814 | 0.0186 |
| 33 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 55 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 78 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
| 100 | 1.0000 | 0.0000 | 0.9834 | 0.0166 |
Table 2. SVM prediction performance under polynomial kernel with various parameters.
| Training data | Test data | |||
|---|---|---|---|---|
| C | Accuracy | Training error | Accuracy | Test error |
| 1 | 0.9823 | 0.0177 | 0.9711 | 0.0289 |
| 10 | 0.9847 | 0.0153 | 0.9690 | 0.0310 |
| 33 | 0.9870 | 0.0130 | 0.9711 | 0.0289 |
| 55 | 0.9870 | 0.0130 | 0.9648 | 0.0352 |
| 78 | 0.9881 | 0.0119 | 0.9648 | 0.0352 |
| 100 | 0.9881 | 0.0119 | 0.9628 | 0.0372 |
The performance of the classifier is measured by both the training error rate and the test error rate. Let the training set be $\{(x_i, y_i)\}_{i=1}^{n}$. The training error of the estimate is the proportion of mistakes made when applying the estimated model to the training data, that is,
$$ \text{training error} = \frac{1}{n} \sum_{i=1}^{n} I\left( y_i \neq \hat{y}_i \right) \qquad (31) $$
where $\hat{y}_i$ is the label predicted by SVM and $I(\cdot)$ is the indicator function. To measure the test error rate, we apply the same formula to the test data. If the test dataset is of the form $\{(x_j^{0}, y_j^{0})\}_{j=1}^{m}$, then the test error is measured by
$$ \text{test error} = \frac{1}{m} \sum_{j=1}^{m} I\left( y_j^{0} \neq \hat{y}_j^{0} \right) \qquad (32) $$
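In R, both error rates reduce to the proportion of mismatched labels; a tiny illustration with hypothetical label vectors:

```r
# Error rate as the proportion of mismatches between true and predicted labels
err_rate <- function(truth, pred) mean(truth != pred)

train_y    <- c(0, 0, 1, 1, 0)   # hypothetical true labels
train_pred <- c(0, 0, 1, 0, 0)   # hypothetical predictions
err_rate(train_y, train_pred)    # 0.2
```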
Here, we summarize our approach: (1) Apply the PSY test to the price-dividend ratio series to identify the bubbles in the S&P 500 index, which yields the binary target data. (2) Normalize the independent variables (features). (3) Divide the dataset randomly into a training set and a test set. (4) Train SVM for different values of the parameters C and σ. (5) Use the trained model on the test set to predict bubbles. (6) Calculate both the training and the test accuracy of the models.
For the Gaussian kernel, the best performance on the training data is reached when C = 33, with an accuracy of 100%. Training accuracy ranges between 97.93% and 100%, and the highest training error rate is 2.07%. The best performance on the test data is also achieved when C = 33, with an accuracy of 98.62%, while the test error rate ranges between 1.38% and 3.10%. For the polynomial kernel, the best performance on the training data is obtained when C = 78, while the best test performance is recorded when C = 1; test accuracy ranges between 96.28% and 97.11%, and the highest test error rate is 3.72%. Consequently, the classification performance of SVM is not significantly affected by the choice of the cost parameter C or the kernel parameter σ. The performance of the SVM algorithm is almost stable, and the results show relatively high accuracy, more than 96%.
3.3. Comparison with other machine learning algorithms
To compare the prediction power of SVM, we employ four other supervised learning algorithms: logistic regression, decision trees, k-nearest neighbors, and artificial neural networks. We have two goals in this subsection: to tune the parameters of the algorithms and to evaluate the performance of the models. In order to choose the best model, we first train each model; then we evaluate the performance of the trained learning model based on the prediction accuracy on the test data and compare the machine learning algorithms to see how well they perform.
We randomly divide the data into two parts, a training set and a test set, while preserving the percentage of bubble and no-bubble data points as in the original data. The training set is used to tune the model parameters; the test set is used to assess the performance of the chosen model.
Our dataset contains few bubble observations, i.e. roughly a 12–88 split between bubble and no-bubble points. Therefore, for comparison purposes, we report not only the classification accuracy but also Cohen's kappa statistic. Accuracy, i.e. the percentage of correctly classified observations, is the most widely used performance metric for parameter tuning and evaluation in machine learning. Kappa is a statistic ranging between 0 and 1 which is defined by the following formula:
$$ \kappa = \frac{p_o - p_e}{1 - p_e} \qquad (33) $$
where $p_o$ is the observed accuracy and $p_e$ is the expected accuracy under chance agreement. Kappa measures inter-rater reliability; higher reliability is indicated by a higher kappa statistic. According to Landis and Koch [19], a kappa value between 0.00–0.20 indicates slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement.
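For concreteness, kappa can be computed directly from a confusion matrix in R; the counts below are hypothetical.

```r
# Cohen's kappa from a 2x2 confusion matrix: (observed - expected accuracy)
# divided by (1 - expected accuracy); the counts are hypothetical.
kappa_stat <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                        # observed accuracy
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2     # expected accuracy under chance agreement
  (po - pe) / (1 - pe)
}

cm <- matrix(c(380, 5, 8, 48), nrow = 2,
             dimnames = list(pred = c(0, 1), true = c(0, 1)))
kappa_stat(cm)
```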
We use repeated k-fold cross-validation to tune and evaluate the performance of the compared machine learning algorithms. Cross-validation is implemented in R using the caret package [17]. The validation and evaluation procedure has the following steps: (1) Split the data into a training and a test set. (2) Randomly split the training data into k partitions. (3) Use each set of k−1 folds to train the learning algorithm and the retained k-th fold to validate the model. (4) Repeat the previous steps several times. (5) Calculate the average accuracy across the hold-out sets of predictions. (6) Determine the model parameters or model complexity based on the highest cross-validated accuracy. (7) Evaluate the performance of the fitted model on the test set.
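A minimal sketch of this procedure with the caret package is shown below; the simulated data, the 80/20 stratified partition and the choice of method = "svmRadial" are illustrative assumptions rather than the paper's exact settings.

```r
library(caret)
set.seed(1)
# Repeated k-fold cross-validation following steps (1)-(7); data are simulated.
d <- data.frame(x1 = rnorm(400), x2 = rnorm(400))
d$y <- factor(ifelse(d$x1 - d$x2 > 1, "bubble", "nobubble"))
in_train <- createDataPartition(d$y, p = 0.8, list = FALSE)  # stratified split (assumed 80/20)

ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)  # 5 x 10-fold CV
fit  <- train(y ~ ., data = d[in_train, ], method = "svmRadial",
              trControl = ctrl, metric = "Accuracy")
confusionMatrix(predict(fit, d[-in_train, ]), d[-in_train, "y"])       # test-set evaluation
```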
We use 5 repeats of 10-fold cross-validation which creates 50 total resamples. Our comparison starts with a naive approach, logistic regression. We estimate the following logistic model
$$ P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta^{\top} X}}{1 + e^{\beta_0 + \beta^{\top} X}} \qquad (34) $$
where Y represents the bubble indicator (equal to 1 in bubble periods and 0 otherwise) and X represents the features. The model achieves a training kappa of 0.6257 and a test kappa of 0.6563.
The second machine learning algorithm we test is decision trees. The algorithm conducts as many splits as possible and then uses 10-fold cross-validation to prune the tree. Tuning is done over the complexity parameter, and the optimal model is chosen based on the highest accuracy at the corresponding complexity. The accuracy and kappa statistics for different complexity parameters are given in Table 3. According to the table, the highest training accuracy of 95.72%, with a substantial kappa of 0.7573, is achieved when the complexity parameter is 0.0390. The test accuracy is 96.55% with a kappa of 0.8091.
Table 3. Resampling results across tuning parameter of decision tree algorithm.
| Complexity | Accuracy | Kappa |
|---|---|---|
| 0.0390 | 0.9572 | 0.7573 |
| 0.1000 | 0.9484 | 0.6925 |
| 0.2317 | 0.9174 | 0.3912 |
The third learning algorithm used for classification is k-nearest neighbors. Table 4 shows the number of nearest neighbors k used in the algorithm together with accuracy and kappa statistics. According to the table, the highest cross-validated accuracy for the training set, 98.93% with kappa 0.9490, is achieved when 3 neighbors are used. When the number of nearest neighbors decreases, the model complexity increases; as can be seen, the more complex model gives the highest accuracy. On the test set, the model achieves an almost perfect kappa of 0.9782.
Table 4. Resampling results across tuning parameter of k-Nearest neighbors.
| k | Accuracy | Kappa |
|---|---|---|
| 3 | 0.9893 | 0.9490 |
| 5 | 0.9873 | 0.9400 |
| 7 | 0.9812 | 0.9103 |
| 9 | 0.9788 | 0.8983 |
Lastly, the artificial neural network algorithm is used for comparison purposes. The macroeconomic variables are employed as inputs to a feed-forward neural network for the prediction of S&P 500 bubbles. The results of the algorithm are given in Table 5, which shows accuracy and kappa statistics across the size and decay parameters. Size denotes the number of units in the hidden layer, and decay is the regularization parameter used to avoid overfitting [17]. The best performance for the training set is achieved when size equals 5 and decay equals 0, with accuracy 97.39% and kappa 0.8715. The test accuracy is 97.01% with kappa 0.8407.
Table 5. Resampling results across tuning parameters of artificial neural networks.
| Size | Decay | Accuracy | Kappa |
|---|---|---|---|
| 1 | 0e + 00 | 0.9159 | 0.3514 |
| 1 | 1e−04 | 0.9095 | 0.2797 |
| 1 | 1e−01 | 0.9589 | 0.7736 |
| 3 | 0e + 00 | 0.9656 | 0.8239 |
| 3 | 1e−04 | 0.9644 | 0.8178 |
| 3 | 1e−01 | 0.9639 | 0.8112 |
| 5 | 0e + 00 | 0.9739 | 0.8715 |
| 5 | 1e−04 | 0.9733 | 0.8696 |
| 5 | 1e−01 | 0.9706 | 0.8530 |
In the previous subsection (Section 3.2), we used the methodology of [16] to tune the parameters of SVM. Here, in order to have a meaningful comparison, repeated k-fold cross-validation is used. We start with the polynomial kernel. Table 6 shows the results of repeated k-fold cross-validation of the SVM with polynomial kernel. According to the table, the best training set accuracy of 97.92%, with kappa 0.8996, is achieved when the cost parameter C equals 33. On the test set, the model achieves an almost perfect kappa of 0.9782.
Table 6. Resampling results across tuning parameters of SVM with polynomial Kernel.
| C | Accuracy | Kappa |
|---|---|---|
| 1 | 0.9737 | 0.8685 |
| 10 | 0.9774 | 0.8907 |
| 33 | 0.9792 | 0.8996 |
| 55 | 0.9787 | 0.8964 |
| 78 | 0.9783 | 0.8949 |
| 100 | 0.9790 | 0.8980 |
Table 7 gives the accuracy and kappa statistics of SVM with Gaussian kernel over the two tuning parameters. According to the table, the highest training set accuracy, with kappa 0.9036, is obtained when σ equals 1 and C equals 10. On the test set, the model achieves an almost perfect kappa of 0.9782.
Table 7. Resampling results across tuning parameters of SVM with Gaussian Kernel.
| C | σ | Accuracy | Kappa | C | σ | Accuracy | Kappa |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.9704 | 0.8462 | 55 | 1 | 0.8870 | |
| 1 | 25 | 0.7701 | 55 | 25 | 0.8414 | ||
| 1 | 50 | 0.7486 | 55 | 50 | 0.9664 | 0.8239 | |
| 1 | 75 | 0.4236 | 55 | 75 | 0.8232 | ||
| 1 | 100 | 0.0031 | 55 | 100 | 0.7910 | ||
| 10 | 1 | 0.9036 | 78 | 1 | 0.9760 | 0.8845 | |
| 10 | 25 | 0.8259 | 78 | 25 | 0.8424 | ||
| 10 | 50 | 0.9595 | 0.7730 | 78 | 50 | 0.8289 | |
| 10 | 75 | 0.7730 | 78 | 75 | 0.8214 | ||
| 10 | 100 | 0.7694 | 78 | 100 | 0.8153 | ||
| 33 | 1 | 0.8956 | 100 | 1 | 0.8903 | ||
| 33 | 25 | 0.8409 | 100 | 25 | 0.9696 | 0.8413 | |
| 33 | 50 | 0.8213 | 100 | 50 | 0.8314 | ||
| 33 | 75 | 0.9634 | 0.7976 | 100 | 75 | 0.9660 | 0.8205 |
| 33 | 100 | 0.7708 | 100 | 100 | 0.8209 |
In order to compare the performance of the employed machine learning algorithms, we report descriptive statistics of accuracy and kappa based on the 50 resamples; these values are given in Table 8. We use both accuracy and kappa to compare the performance of the models. In terms of training accuracy, the best performance is achieved by k-NN, followed by SVM with Gaussian kernel, while logistic regression has the lowest statistics. According to their kappa statistics, SVM, k-NN and ANN have almost perfect kappa values, while logistic regression and decision trees show substantial performance.
Table 8. Comparison of resampling results of employed algorithms.
| Algorithm | Minimum | 1st Quartile | Median | Mean | 3rd Quartile | Maximum |
|---|---|---|---|---|---|---|
| (a)Accuracy | ||||||
| Logistic Regression | 0.8850 | |||||
| Decision Trees | 0.9252 | 0.9482 | 0.9570 | 0.9572 | 0.9656 | |
| Artificial Neural Networks | 0.9655 | 0.9741 | 0.9739 | 0.9813 | ||
| k-Nearest Neighbors | 0.9842 | 0.9942 | 0.9893 | |||
| SVM (Polynomial Kernel) | 0.9712 | 0.9799 | 0.9792 | |||
| SVM (Gaussian Kernel) | 0.9826 | 0.9796 | ||||
| (b) Kappa | ||||||
| Logistic Regression | 0.3691 | 0.5668 | 0.6315 | 0.6257 | 0.8143 | |
| Decision Trees | 0.5197 | 0.7653 | 0.7573 | |||
| Artificial Neural Networks | 0.8242 | 0.8717 | 0.8715 | 0.9070 | ||
| k-Nearest Neighbors | 0.8440 | 0.9255 | 0.9711 | 0.9490 | 0.9734 | |
| SVM (Polynomial Kernel) | 0.8617 | 0.9047 | 0.8996 | |||
| SVM (Gaussian Kernel) | 0.7998 | 0.9134 | 0.9036 |
Our ultimate aim is to generate the most accurate prediction of bubbles, so we also use performance metrics derived from the confusion matrix: the precision and specificity of the bubble class. Table 9 reports the performance metrics of the employed algorithms on the test set. The specificity of the bubble class is defined here as the percentage of bubble predictions that are classified correctly, and the precision of the bubble class as the ratio of correctly predicted bubbles to the total number of bubbles. Specificity shows the models' capability of bubble detection, while precision expresses how trustworthy the model is when it labels a data point as a bubble [11].
Table 9. Performance metrics of the employed machine learning algorithms.
| Algorithm | Accuracy | Kappa | Precision | Specificity |
|---|---|---|---|---|
| Logistic Regression | 0.6563 | 0.5882 | ||
| Decision Trees | 0.9655 | 0.8091 | 0.7059 | |
| Artificial Neural Networks | 0.9701 | 0.8407 | 0.9750 | 0.7647 |
| k-Nearest Neighbors | 0.9782 | 0.9623 | 1.0000 | |
| SVM (Polynomial Kernel) | 0.9782 | 0.9623 | 1.0000 | |
| SVM (Gaussian Kernel) | 0.9782 | 1.0000 |
According to the results, SVM and k-NN have higher classification accuracy, with almost perfect kappas, than the other algorithms. Their higher specificity and precision values also imply that they are better at detecting the bubble class. Here, SVM and k-NN perform similarly when the optimal model is used. However, the prediction power of k-NN depends on the model complexity, which may lead to overfitting. In addition, k-NN calculates, for each new test observation, the distance between the existing points and the newly introduced observation, so for large datasets the algorithm consumes considerable computation time and processing resources. Since the label of a new observation depends only on the distances to the old data points, k-NN is also sensitive to outliers and missing data [11]. Logistic regression has the lowest accuracy, with a substantial kappa, and only moderate ability at bubble detection. Although ANN and decision trees achieve better accuracy than logistic regression and show accuracy close to the others, their bubble detection power is not satisfactory.
4. Conclusion
In this paper, we proposed a method for bubble prediction that combines a recursive unit root test and a supervised learning algorithm, SVM. According to the test results, the proposed two-step model performs favorably on this dataset: when the methodology of [16] is used, the worst case has a test accuracy of 96.28% while the best case reaches 98.62%. Thus, we believe that SVM, with its high predictive power, could be a favourable alternative for predicting asset price bubbles from macroeconomic indicators.
We also compared the prediction power of SVM with that of four other machine learning algorithms using repeated k-fold cross-validation. The results showed that logistic regression has the lowest accuracy, while the other learning algorithms achieve considerably higher accuracies; the highest accuracy is achieved by SVM with the two different kernels and by k-NN.
Moreover, the experimental results showed that the selected indicators GDP growth, inflation, short-term interest rates, long-term interest rates, unemployment rate and balance of payments are capable of predicting S&P 500 bubbles. This information is important for decision makers, policymakers and researchers seeking to understand the main drivers of bubble formation and the behaviour of bubbles, and it can be used to take precautions against future crises.
In the longer term, it is worth examining whether the two-step approach considered in this paper can be applied to other financial markets. The proposed method could be further improved with feature selection techniques or by combining other supervised machine learning algorithms, such as k-NN and ANN, in an embedded fashion. It would also be interesting to include the macroeconomic variables of countries that have intensive foreign trade with the U.S. in order to see the effect of these countries on bubble formation.
Acknowledgments
The authors thank the Associate Editor and the anonymous Reviewers for their valuable suggestions and corrections.
Correction Statement
This article has been corrected with minor changes. These changes do not impact the academic content of the article.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Astill S., Harvey D.I., Leybourne S.J., and Taylor A.R., Tests for an end-of-sample bubble in financial time series, Econom. Rev. 36 (2017), pp. 651–666. doi: 10.1080/07474938.2017.1307490
- 2. Zbikowski K., Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy, Expert. Syst. Appl. 42 (2015), pp. 1797–1805. doi: 10.1016/j.eswa.2014.10.001
- 3. Boser B.E., Guyon I.M., and Vapnik V.N., A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, 1992, pp. 144–152.
- 4. Breiman L., Friedman J., Stone C.J., and Olshen R.A., Classification and Regression Trees, Wadsworth, Belmont, CA, 1984.
- 5. Chen Y. and Hao Y., A feature weighted support vector machine and k-Nearest neighbor algorithm for stock market indices prediction, Expert. Syst. Appl. 80 (2017), pp. 340–355. doi: 10.1016/j.eswa.2017.02.044
- 6. Chong E., Han C., and Park F.C., Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies, Expert. Syst. Appl. 83 (2017), pp. 187–205. doi: 10.1016/j.eswa.2017.04.030
- 7. Dellepiane U., Di Marcantonio M., Laghi E., and Renzi S., Bankruptcy prediction using support vector machines and feature selection during the recent financial crisis, Int. J. Econ. Financ. 7 (2015), pp. 182–195. doi: 10.5539/ijef.v7n8p182
- 8. Dimitriadou E., Hornik K., Leisch F., Meyer D., Weingessel A., and Leisch M.F., Package "e1071". R software package, 2009. Available at http://cran.rproject.org/web/packages/e1071/index.html.
- 9. Gilles C. and LeRoy S.F., Asset Price Bubbles, The New Palgrave Dictionary of Money and Finance, Macmillan Reference, London, 1992, pp. 573–577.
- 10. Harvey D.I., Leybourne S.J., Sollis R., and Taylor A.M.R., Tests for explosive financial bubbles in the presence of non-stationary volatility, J. Empirical Financ. 38 (2016), pp. 548–574. doi: 10.1016/j.jempfin.2015.09.002
- 11. Hastie T., Tibshirani R., and Friedman J., The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer Science & Business Media, New York, 2009.
- 12. Henrique B.M., Sobreiro V.A., and Kimura H., Stock price prediction using support vector regression on daily and up to the minute prices, J. Financ. Data Sci. 4 (2018), pp. 183–201. doi: 10.1016/j.jfds.2018.04.003
- 13. Huang W., Nakamori Y., and Wang S.Y., Forecasting stock market movement direction with support vector machine, Comput. Oper. Res. 32 (2005), pp. 2513–2522. doi: 10.1016/j.cor.2004.03.016
- 14. Hosmer Jr D.W., Lemeshow S., and Sturdivant R.X., Applied Logistic Regression, Vol. 398, John Wiley & Sons, New York, 2013.
- 15. Kara Y., Boyacioglu M.A., and Baykan K., Predicting direction of stock price index movement using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange, Expert. Syst. Appl. 38 (2011), pp. 5311–5319. doi: 10.1016/j.eswa.2010.10.027
- 16. Kim K.J., Financial time series forecasting using support vector machines, Neurocomputing 55 (2003), pp. 307–319. doi: 10.1016/S0925-2312(03)00372-2
- 17. Kuhn M., Building predictive models in R using the caret package, J. Stat. Softw. 28 (2008), pp. 1–26. Available at https://cran.r-project.org/package=caret. doi: 10.18637/jss.v028.i05
- 18. Kumar M.P., Zisserman A., and Torr P.H., Efficient discriminative learning of parts-based models, in IEEE 12th International Conference on Computer Vision, 2009, pp. 552–559.
- 19. Landis J.R. and Koch G.G., The measurement of observer agreement for categorical data, Biometrics 33 (1977), pp. 159–174. doi: 10.2307/2529310
- 20. Lee M.C., Using support vector machine with a hybrid feature selection method to the stock trend prediction, Expert. Syst. Appl. 36 (2009), pp. 10896–10904. doi: 10.1016/j.eswa.2009.02.038
- 21. Lee T.K., Cho J.H., Kwon D.S., and Sohn S.Y., Global stock market investment strategies based on financial network indicators using machine learning techniques, Expert. Syst. Appl. 117 (2019), pp. 228–242. doi: 10.1016/j.eswa.2018.09.005
- 22. Lin W.Y., Hu Y.H., and Tsai C.F., Machine learning in financial crisis prediction: A survey, IEEE Trans. Syst. Man Cybernetics, Part C (Appl. Rev.) 42 (2012), pp. 421–436. doi: 10.1109/TSMCC.2011.2170420
- 23. Luo L. and Chen X., Integrating piecewise linear representation and weighted support vector machine for stock trading signal prediction, Appl. Soft. Comput. 13 (2013), pp. 806–816. doi: 10.1016/j.asoc.2012.10.026
- 24. Marković I., Stojanović M., Stanković J., and Stanković M., Stock market trend prediction using AHP and weighted kernel LS-SVM, Soft. Comput. 21 (2017), pp. 5387–5398. doi: 10.1007/s00500-016-2123-0
- 25. Min J.H. and Lee Y.C., Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters, Expert Syst. Appl. 28 (2005), pp. 603–614. doi: 10.1016/j.eswa.2004.12.008
- 26. Min S.H., Lee J., and Han I., Hybrid genetic algorithms and support vector machines for bankruptcy prediction, Expert Syst. Appl. 31 (2006), pp. 652–660. doi: 10.1016/j.eswa.2005.09.070
- 27. Mukherjee S., Osuna E., and Girosi F., Non-linear prediction of chaotic time series using support vector machines, in Neural Networks for Signal Processing VII, Proceedings of the 1997 IEEE Signal Processing Society Workshop, IEEE, 1997, pp. 511–520.
- 28. Okasha M.K., Using support vector machines in financial time series forecasting, Int. J. Stat. Appl. 4 (2004), pp. 28–39.
- 29. Phillips P.C.B. and Shi S., Real time monitoring of asset markets: bubbles and crises, in Handbook of Statistics, Elsevier, 2019. Available at https://doi.org/10.1016/bs.host.2018.12.002.
- 30. Phillips P.C.B., Shi S., and Yu J., Testing for multiple bubbles: historical episodes of exuberance and collapse in the S&P 500, Int. Econ. Rev. (Philadelphia) 56 (2015a), pp. 1043–1078. doi: 10.1111/iere.12132
- 31. Phillips P.C.B., Shi S., and Yu J., Testing for multiple bubbles: limit theory of real time detectors, Int. Econ. Rev. (Philadelphia) 56 (2015b), pp. 1079–1134. doi: 10.1111/iere.12131
- 32. Qiu M. and Song Y., Predicting the direction of stock market index movement using an optimized artificial neural network model, PLoS One 11 (2016), pp. e0155133. doi: 10.1371/journal.pone.0155133
- 33. Rokach L. and Maimon O.Z., Data Mining with Decision Trees: Theory and Applications, Vol. 69, World Scientific, Singapore, 2008.
- 34. Ryll L. and Seidens S., Evaluating the performance of machine learning algorithms in financial markets forecasting: a comprehensive survey. Available at arXiv:1906.07786v2.
- 35. Shiller R.J., Homepage of R. J. Shiller. Available at http://www.econ.yale.edu/shiller/data.htm.
- 36. Shin K.S., Lee T.S., and Lee H.J., An application of support vector machines in bankruptcy prediction model, Expert Syst. Appl. 28 (2005), pp. 127–135. doi: 10.1016/j.eswa.2004.08.009
- 37. Shynkevich Y., McGinnity T.M., Coleman S.A., Belatreche A., and Li Y., Forecasting price movements using technical indicators: investigating the impact of varying input window length, Neurocomputing 264 (2017), pp. 71–88. doi: 10.1016/j.neucom.2016.11.095
- 38. Tay F.E. and Cao L., Application of support vector machines in financial time series forecasting, Omega 29 (2001), pp. 309–317. doi: 10.1016/S0305-0483(01)00026-3
- 39. Tang H.H., Zhang L.S., and Wang H., Predicting the direction of stock markets using optimized neural networks with Google Trends, Neurocomputing 285 (2018), pp. 188–195. doi: 10.1016/j.neucom.2018.01.038
- 40. Thirumuruganathan S., A detailed introduction to K-Nearest Neighbor (KNN) algorithm (2010). Available at https://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knnalgorithm/.
- 41. Vapnik V.N., The Nature of Statistical Learning Theory, Springer Science & Business Media, New York, 1995.
- 42. Vapnik V.N. and Lerner A.J., Generalized portrait method for pattern recognition, Autom. Remote Control 24 (1963), pp. 774–780.
- 43. Wang Y. and Choi I.C., Market index and stock price direction prediction using machine learning techniques: an empirical study on the KOSPI and HSI, preprint (2013). Available at arXiv:1309.7119.
- 44. Yolcu U., Aladag C.H., Egrioglu E., and Uslu V.R., Time-series forecasting with a novel fuzzy time-series approach: an example for Istanbul stock market, J. Stat. Comput. Simul. 83 (2013), pp. 599–612. doi: 10.1080/00949655.2011.630000
- 45. Zhang X.-d., Li A., and Pan R., Stock trend prediction based on a new status box method and AdaBoost probabilistic support vector machine, Appl. Soft. Comput. 49 (2016), pp. 385–398. doi: 10.1016/j.asoc.2016.08.026
- 46. Zurada J.M., Introduction to Artificial Neural Systems, Vol. 8, West Publishing Company, St. Paul, 1992.
