PLOS One. 2019 Nov 4;14(11):e0223529. doi: 10.1371/journal.pone.0223529

Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping

Jeongwoo Kim 1,2,*
Editor: Peng Li3
PMCID: PMC6827892  PMID: 31682618

Abstract

Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff arises. Aside from searching for an unbiased estimator, a convenient approach to the bias-variance tradeoff may be to use a clustering method. Within a cluster whose size is smaller than the whole sample, we would expect a simple estimator for prediction to avoid the overfitting problem. In this paper, we propose a new method to find the optimal cluster for prediction. Based on the previous literature, this cluster is considered to exist somewhere between the whole dataset and the typical cluster determined by partitioning the data. To obtain a reliable cluster size, we use the bootstrap method. Additionally, through experiments with simulated and real-world data, we show that the prediction error can be reduced by applying the new method. We believe that our proposed method will be useful in many applications that use a clustering algorithm for stable prediction performance.

Introduction

When analyzing relationships among variables with an equation of y = f(x)+e, where e is a random vector from a certain distribution with zero mean and preset covariance, two main problems are generally encountered: one related to f(x) and the other to e. The latter mainly includes heteroscedasticity and serial correlation, which reduce the efficiency of the estimator and thereby hinder rigorous hypothesis testing. The former, related to the misspecification problem that leads to a biased estimator, frequently presents a critical problem. In numerous applications, unless an underlying theoretical foundation for an approximation model has been rigorously established or is widely acknowledged in the related area, one is prone to obtaining a biased estimator.

If the purpose of a study is to verify an exact relationship among the variables of a certain dataset, the first priority is to specify the approximation model with the best fit, i.e., an unbiased model specific to the given dataset. For prediction purposes, however, one need not struggle to obtain an unbiased estimator that yields the smallest error metric on the given dataset alone. In fact, focusing only on the given dataset may result in the overfitting problem, which hinders accurate prediction; therefore, a biased approximation model can sometimes be useful for prediction.

With a biased approximation model, the more instances we use, the greater the predictor bias we obtain. Therefore, a subsampling approach could yield a more accurate performance up to the point where the gain from the reduced bias exceeds the loss from the increased variance [1]. Conversely, the subsampling approach can have serious drawbacks when the increased variance considerably exceeds the reduced bias, because the convergence rate of the variance is generally O(N^−1), where N is the sample size. Moreover, searching for every neighbor of a target point, as in the k-nearest-neighbor and moving-average methods, is computationally expensive [2]. Considering these factors, a more efficient approach is to partition the data into clusters. Clustering methods are easy to implement and are extensively used among unsupervised learning methods. In general, however, the number of clusters is not obtained analytically but searched numerically over a set of candidate values, and it is therefore prone to the overfitting problem [3, 4]. Under overfitting, a poorly chosen cluster causes high variance [5], because small subsamples usually display high variance [6, 7]. Thus, adequately adjusting the size of the cluster is a key step in overcoming the high variance and the overfitting problem. In particular, to adjust the cluster size for prediction, we may consider the cluster closest to a prediction target, unlike the usual clustering methods that fit a whole dataset. In the current study, the term last cluster denotes the cluster located closest to a prediction target after partitioning the data with a method such as k-means clustering, because considering only the last cluster can be an efficient way to achieve accurate prediction. For example, Oh and Kim [8] obtained a good prediction performance on stock market data by referring to the last homogeneous clusters obtained from the data.

Meanwhile, the high-variance problem can still arise when using the last cluster for prediction because it is difficult to determine the size of the last cluster that ensures an accurate prediction. Hence, balancing the bias and the variance is useful for the prediction problem using the last cluster. As mentioned above, adjusting the size of the last cluster reduces the variance of a cluster obtained by partitioning data, which helps balance the bias and the variance so that the prediction accuracy improves. Thus, the current study focuses on adjusting the size of the last cluster for accurate prediction.

Consider a function f(x) and an arbitrary estimator f^(x). Then, the mean squared error (MSE) of f^(x) is decomposed into E[(f(x) − E[f^(x)])^2] + E[(E[f^(x)] − f^(x))^2], which is a squared bias plus a variance. Hence, we can reduce the MSE by controlling the squared bias and the variance. However, unless f(x) is known, it is not easy to directly control the bias by specifying an estimator. Therefore, the method of reducing the variance can be more efficient. In addition, model regularization using a shrinkage method, such as the least absolute shrinkage and selection operator (LASSO), is also popular for balancing bias and variance. In model estimation, these methods avoid the overfitting problem by adopting a new hyperparameter, termed a tuning parameter, that increases bias but decreases variance through a smoothing effect. In general, this tuning parameter is obtained by cross-validation [9] and helps alleviate the overfitting problem.
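The decomposition above can be checked numerically. The sketch below is illustrative only: it assumes a quadratic true function, a misspecified linear estimator, and a fixed evaluation point x0, none of which come from the paper; over many realized datasets, the Monte-Carlo MSE of the predictor at x0 equals the squared bias plus the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # true function (illustrative choice, not from the paper)
    return -0.5 * x + x**2

x0 = 1.5                       # evaluation point for the predictor
preds = []
for _ in range(2000):          # many independently realized datasets
    x = rng.uniform(-2, 2, 50)
    y = f(x) + rng.normal(0, 1, 50)
    a1, a0 = np.polyfit(x, y, 1)       # misspecified linear fit
    preds.append(a0 + a1 * x0)
preds = np.asarray(preds)

mse = np.mean((f(x0) - preds) ** 2)          # E[(f - f^)^2]
bias_sq = (f(x0) - preds.mean()) ** 2        # (f - E[f^])^2
variance = preds.var()                       # E[(E[f^] - f^)^2]
```

Because the estimator is misspecified, the squared bias stays bounded away from zero no matter how many datasets are averaged, while the variance term shrinks with the per-dataset sample size.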

Aside from the model regularization methods, other unique approaches to the bias-variance balance in the econometrics community exist that address the trade-off between bias and variance with structural breaks in a dataset because structural breaks can be useful in determining where to partition a given dataset. Elliott [10] attempted to improve prediction accuracy by using weighted-averaging on the regression results for multiple structural breaks. Pesaran and Timmermann [11] presented some approaches based on structural breaks to determine a window (or cluster) size to achieve accurate prediction. Furthermore, also considering the structural breaks, Clark and McCracken [12] proposed a novel approach to improve the prediction accuracy by combining two estimated regression coefficients using a new hyperparameter, similar to LASSO.

The bootstrap method is well known for reducing the bias [13] and variance [14] of an estimator. It allows us to approximate the underlying functional form of a given dataset by averaging out the noise of different bootstrapped samples, thereby reducing bias. Furthermore, it is useful for obtaining an estimator with low variance by training a model multiple times. In particular, the relative advantage of variance reduction is often achieved at the expense of a slight increase in bias [15], which is one of the reasons why the bagging algorithm can improve model accuracy. Bauer and Kohavi [16] reported numerical results showing relatively greater variance reduction than bias reduction with the bagging algorithm in a series of experiments. Fumera et al. [17] also showed that bagging N bootstrap replicates induces the same bias as a single bootstrap replicate but decreases the variance by a factor of N.

Therefore, combining the idea from a new model parametrization (such as the shrinkage method or the combination of two estimated regression coefficients) with the bootstrap method would be useful in balancing bias and variance, thus improving model prediction accuracy. Additionally, because a data-fitting-oriented model often leads to overfitting in many applications, it would be more efficient to adjust the size of the last cluster rather than to fit the entire dataset. In this study, the size-adjusted last cluster is termed as an adjusted last cluster (aCL), which has a size that falls between the size of the entire dataset and that of a last cluster by partitioning data. We expect this approach would improve the prediction performance of models based on the balance between bias and variance.

Materials and methods

Notations and assumptions

Here, we follow some conventional notations. Hat (^) notations such as a^ indicate an estimate, and an overbar (−) such as a¯ denotes an average value.

Consider a given dataset ZN = {Zn : Zn = (xn, yn) ∈ R^(m+1), n = 1, …, N}, where xn = (x1n, x2n, …, xmn)^T ∈ R^m and yn ∈ R, in which the Zn's are arranged by the norm of xn, ‖xn‖, for prediction purposes. We want to determine the aCL to accurately predict yN+h at xN+h beyond xN, where h is the prediction horizon. To accomplish this, assume that xn and yn follow a data generating process (DGP) as follows:

yn=f(xn)+en, (1)

where en is an independent and identically distributed random variable with zero mean and variance σ^2, independent of xn.

Considering an estimator f^(xn) based on ZN for f(xn), our goal would then be to estimate f^(xn) to accurately predict yN+h at xN+h. The function f^(xn) is usually obtained by minimizing the risk function as follows:

R(f^(xn)) = E[(yn − f^(xn))^2], (2)

where E[∙] is the expected value.

Using f^(xn), the prediction error (PE) is given by

PE(f^(xN+h)) = E[(yN+h − f^(xN+h))^2]. (3)

To verify the method demonstrated in this paper, the Results and Discussion section compares the prediction performances of the presented methods using the PE and tests the statistical significance of the differences.

Bias and variance of a predictor: Whole dataset and subsample

In practice, the risk function is estimated as follows:

R(f^(xn)) = (1/N) ∑_{n=1}^{N} (yn − f^(xn))^2. (4)

Considering a linear estimator f^(xn) = ∑_{k=1}^{m} bk·xkn = β^T xn for β ∈ R^m, the estimate is obtained by minimizing the risk function with respect to β:

β^ = argmin_{β ∈ R^m} R(f^(xn)) (5)
  = (XN^T XN)^(−1) XN^T yN,

where XN = (x1,x2,…,xN)T and yN = (y1,y2,…,yN)T. Then, the predictor is f^(xN+h)=β^TxN+h.
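The closed-form predictor in (5) can be sketched directly with the normal equations. The toy data, coefficients, and evaluation point below are invented for illustration; on a noiseless linear DGP the fit recovers the true coefficients exactly.

```python
import numpy as np

def ols_predict(X, y, x_new):
    """beta^ = (X_N^T X_N)^(-1) X_N^T y_N; the predictor is beta^T x_new."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return x_new @ beta_hat

# toy check on an exactly linear, noiseless DGP: the fit recovers beta
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
beta = np.array([2.0, -1.0])
y = X @ beta
pred = ols_predict(X, y, np.array([1.0, 1.0]))   # ≈ beta[0] + beta[1] = 1.0
```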

However, this predictor is based on the assumption that a linear estimator is applicable to the whole dataset; hence, its prediction performance is vulnerable to the shape of the true function. For example, consider a DGP, yn = −0.5x1n + x1n^2 + en, en ~ N(0,1), and a linear estimator f^(x1n) = a0 + a1·x1n over the whole dataset. The value predicted by this estimator over the whole dataset is then quite distant from the prediction target, as shown in Fig 1, whereas the value predicted from a subsample of the given dataset is more accurate.

Fig 1. Prediction when using the whole dataset (whole, solid line) and a subsample (subsample, dashed line).


Furthermore, as more samples are used, the bias of the predictor increases but its variance decreases, as shown in Fig 2. In general, an estimator from a subsample suffers from high variance, whereas the estimator from a whole dataset suffers from high bias [18]. Therefore, a size-adjusted subsample optimal for prediction should exist based on the balance between bias and variance [19]. For example, the size of such a subsample would be approximately 26, as seen in Fig 2.

Fig 2. Change of MSE, squared bias (Bias^2) and variance (Var) of the predictor according to the number (n) of samples used in estimation.


Here, the DGP and the estimator are the same as those in Fig 1.
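The pattern in Fig 2 can be sketched by Monte Carlo: for each candidate subsample size, fit a linear model to only the last points of the ordered sample and sum the squared bias and variance of the resulting predictor. The DGP follows the Fig 1 example, but the x grid, prediction point, replicate count, and candidate sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: -0.5 * x + x**2      # DGP of Fig 1
x = np.linspace(0, 4, 100)         # ordered sample points (assumed design)
x_new, reps = 4.2, 1000            # prediction point just beyond the sample

def predictor_mse(n_last):
    """Monte-Carlo MSE of a linear fit that uses only the last n_last points."""
    xs = x[-n_last:]
    preds = []
    for _ in range(reps):
        ys = f(xs) + rng.normal(0, 1, n_last)
        a1, a0 = np.polyfit(xs, ys, 1)
        preds.append(a0 + a1 * x_new)
    preds = np.asarray(preds)
    return (f(x_new) - preds.mean())**2 + preds.var()   # bias^2 + variance

# tiny subsample: high variance; whole sample: high bias; middle: balanced
mses = {n: predictor_mse(n) for n in (5, 25, 100)}
```

With this setup, the smallest subsample suffers from variance, the whole sample from bias, and an intermediate size attains the lowest MSE, in line with the balance described above.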

To obtain the optimal subsample for prediction, first it is necessary to partition a given dataset using a method such as k-means clustering. However, determining the size of the last cluster for prediction may not be easy. To reliably determine the size of the last cluster, knowing the change points, structural breaks and local extrema of the given DGP can be useful. In practice, such values are not known a priori; one must guess the size of the last cluster by simply observing a graph of the DGP in an exogenous manner. However, partitioning the dataset in such a manner might be unreliable because the dataset observed is only a single realized sample among all the possible samples generated by the DGP. Thus, to obtain the size of the last cluster reliably, we would need to consider as many realized samples as possible, which would be unfeasible in practice. As an alternative, the bootstrap method used in [20] can be useful for virtually mimicking these samples to obtain the reliable last cluster.

Adjusted last cluster for prediction

When simply partitioning one realized dataset, the size of the last cluster may be estimated to be smaller than the last cluster of the true function because the partitioning process of one realized dataset tends to be affected by noise, en. To avoid this problem, adequately expanding the size of the last cluster results in reducing a predictor’s variance and would therefore be an alternative for approximating the last cluster of the true function. Hence, the aCL between the last cluster achieved by partitioning one realized dataset and the whole dataset would yield a more accurate prediction performance as long as the decreased variance by expanding the size of the last cluster exceeds the increased bias, similar to a regularization method such as LASSO [21].

In the following example, Fig 3A depicts the prediction by one realized sample (dotted line). A true function (solid line) is f(x), and a linear estimator is used. The size of the last cluster (cluster, dashed line) from which a predictive value is obtained appears to be relatively small and is much influenced by noise, hence rendering a nonnegligible gap between the predictive value (circle) and the prediction target (asterisk). In contrast, Fig 3B shows that the new sample (BTSmean, dashed line with square) averaged over bootstrapped samples (BTS, dotted lines) well approximates the true function f(x). The last cluster (aCL, dashed line) based on the new sample is larger than the last cluster of the sample in Fig 3A. Thus, the predictive value of this last cluster more accurately indicates the prediction target.

Fig 3.


Prediction based on the last cluster of one realized sample (A) and the last cluster of the new sample from bootstrapped samples (B). The predictive value (circle) in 3B indicates the prediction target (asterisk) more accurately than does the predictive value (circle) in 3A. DGP: yn = −2x1n+x1n2+en,en~N(0,1).

Analogous to the idea of aCL, Clark and McCracken [12] proposed a novel approach to obtain a new regression coefficient for more accurate prediction by linearly combining two estimated regression coefficients under structural breaks: one estimated from a last rolling window (or a last cluster) and the other estimated from a recursive window (or a whole dataset).

Consider a sample ZN realized from (1) and the linear estimator in (4)–(5), and denote the estimated regression coefficient obtained from the recursive window of size N as βR and the estimated regression coefficient obtained from the last rolling window of size L as βL. Then, the linear combination of these two estimated regression coefficients βC is

βC=αβL+(1α)βR,0α1. (6)

Hence, βC is a compromise solution between βL and βR. Clark and McCracken [12] showed that βC allows more accurate prediction by inducing lower bias than βR and lower variance than βL.
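Equation (6) amounts to a one-line combination of two least-squares fits. The helper below is a minimal sketch; the window size, data, and coefficient values are placeholders, not from [12].

```python
import numpy as np

def combined_coef(X, y, L, alpha):
    """beta_C = alpha * beta_L + (1 - alpha) * beta_R, as in Eq (6)."""
    solve = lambda A, b: np.linalg.solve(A.T @ A, A.T @ b)
    beta_R = solve(X, y)             # recursive window: all N observations
    beta_L = solve(X[-L:], y[-L:])   # last rolling window of size L
    return alpha * beta_L + (1 - alpha) * beta_R
```

At alpha = 0 the combination reduces to the recursive-window estimate, and at alpha = 1 to the rolling-window estimate; intermediate values trade the lower bias of beta_L against the lower variance of beta_R.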

However, in [12], only one realized dataset was considered, and the regression coefficients already estimated from the last rolling window and the recursive window were used for βC rather than being based on a last cluster optimal for prediction.

Therefore, if a sufficient number of realized datasets from a certain DGP were given and βC were estimated using these datasets, we could obtain a more generalized estimate of βC, which leads to more accurate prediction.

However, in practice, we cannot acquire many realized datasets from the DGP. Hence, in this paper, we adopt the bootstrap method to reproduce such datasets and achieve more accurate prediction as shown in Fig 3. We summarize the aCL method based on the bootstrap method as below.

Algorithm: Prediction (one-step ahead) using the aCL

1: Given a dataset ZN = {Zn : Zn = (xn, yn) ∈ R^(m+1), n = 1, …, N}
2: for i ∈ {1, …, m} do
3:   Reproduce the i-th bootstrap dataset ZN^i = {Zn^i : Zn^i = (x_in, y_in) ∈ R^(m+1), n = 1, …, N}
4:   Obtain a last cluster by partitioning the dataset (e.g., by k-means clustering), and denote the size of the last cluster by Li
5:   Let the size of the aCL be Ci = αi·Li + (1 − αi)·N, 0 ≤ αi ≤ 1
6:   Estimate C^i by minimizing the value of the risk function: C^i = argmin_{Li ≤ Ci ≤ N} (1/Ci) ∑_{n=N−Ci+1}^{N} (y_in − f^(x_in))^2
7: end for
8: Calculate the average of the C^i's: C¯ = (1/m) ∑_{i=1}^{m} C^i
9: Estimate the regression coefficient using C¯: β^C = argmin_{βC ∈ R^m} (1/C¯) ∑_{n=N−C¯+1}^{N} (yn − f^(xn))^2
10: Using β^C, construct the predictor f^(xN+1) for yN+1

Hence, the aCL acquired via the bootstrapped datasets is used for the predictor f^(xN+1) rather than the last cluster obtained by simply partitioning the dataset, and it is expected to achieve a more accurate prediction performance based on the balance between bias and variance. The prediction performance of the presented method will be compared with other relevant methods in the Results and Discussion section.
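The algorithm above can be sketched compactly under several simplifying assumptions that are mine, not the paper's: a one-dimensional covariate, a plain 1-D k-means with three clusters for the partitioning step, a brute-force search over window sizes for the risk-minimization step, and 30 bootstrap replicates.

```python
import numpy as np

rng = np.random.default_rng(3)

def last_cluster_size(x, k=3, iters=50):
    """1-D k-means on ordered x; return the size of the cluster nearest the end."""
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        lab = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(lab == j):
                centers[j] = x[lab == j].mean()
    lab = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return int(np.sum(lab == np.argmax(centers)))

def window_risk(x, y, C):
    """In-window MSE of a linear fit on the last C points (risk of step 6)."""
    xs, ys = x[-C:], y[-C:]
    a1, a0 = np.polyfit(xs, ys, 1)
    return np.mean((ys - (a0 + a1 * xs)) ** 2)

def acl_predict(x, y, x_new, n_boot=30):
    """Sketch of the aCL algorithm: bootstrap, size the last cluster,
    average the sizes, then fit and predict on the averaged window."""
    N, sizes = len(x), []
    for _ in range(n_boot):
        idx = np.sort(rng.integers(0, N, N))    # resample pairs, keep x-order
        xb, yb = x[idx], y[idx]
        Li = max(last_cluster_size(xb), 3)
        Ci = min(range(Li, N + 1), key=lambda C: window_risk(xb, yb, C))
        sizes.append(Ci)
    C_bar = int(round(np.mean(sizes)))          # step 8: average the C^i's
    a1, a0 = np.polyfit(x[-C_bar:], y[-C_bar:], 1)
    return a0 + a1 * x_new
```

On a quadratic DGP such as the Fig 1 example, this sketch typically lands closer to the target than a linear fit of the whole dataset, in line with the bias-variance argument above.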

Methods for evaluating prediction error

In this section, we present several methods for comparing the prediction performance of the aCL method. First, we briefly introduce three existing optimization methods to check whether the aCL method improves the prediction performance of the optimization methods. Next, as competitors to the aCL method, we introduce three methods relevant to the aCL method. The numerical results of the prediction errors comparison are then presented in the Results and Discussion section.

Optimization methods

Because the aCL method is a way of searching for a last cluster optimal for prediction, we can apply it to general optimization methods. Therefore, we need to assess whether combining the aCL method with an optimization method reduces the prediction error of the optimization methods.

First, we combine the aCL method with ordinary least square (OLS), which is the most widely used optimization method due to its ease of use and easy interpretation. Under certain conditions, OLS is known to yield the estimator with the lowest variance among linear unbiased estimators [22]; however, in practice, it is not easy to create a linear unbiased estimator covering an entire dataset. However, a linear estimator can fit the data well when considering only a subsample of the entire dataset.

Least squares adjusted with pseudo data. The aCL method was next applied to a new optimization method for prediction—least squares adjusted with pseudo data (LSPD) [23]. The main principle of LSPD is to convert test error (or prediction error) into training error (or residual) by adopting a pseudo data point for a prediction target.

In general, to calculate the prediction target, the parameters of a predictive model are estimated by minimizing the training error using a given dataset, which can often lead to overfitting. Focusing only on a given dataset during optimization is one of the biggest reasons for the occurrence of the overfitting problem because it results in estimated parameters that fail to accurately predict the prediction target beyond the given dataset. Some methods using pseudo data beyond the given dataset have been suggested in the nonparametric community. For example, Schuster [24] suggested the reflection data method. When estimating kernel density with a dataset {X1,X2,…,XN}, this method claimed that adding a pseudo dataset {−X1,−X2,…,−XN} to the former could yield a consistent kernel density estimator based on the symmetry property of the kernel density function. Similarly, Cowling and Hall [25] proposed a general version of the reflection data method that can be used when discontinuities in the density derivatives exist.

However, unlike in kernel density function estimation, we are unable to use the symmetry property of a kernel density function in the prediction problem. There is little research on using pseudo data in the prediction problem, but Breiman [26] proposed a pseudo data method that reproduces the pseudo data from a certain convex combination of a given dataset to achieve more accurate prediction. Similarly, LSPD supposes that we can attain some pseudo data points close to the prediction target and add those to the given dataset to generate a new dataset. The estimated parameters based on the new dataset may then be useful for a more accurate prediction result by avoiding focusing only on the given dataset and thus alleviating the overfitting problem.

Place a pseudo data point yN+1^p close to yN+1 and denote the augmented set by ZN+1^p = {(x1, y1), (x2, y2), …, (xN, yN), (xN+1, yN+1^p)}. Then, the following holds [23]:

E[(yN+1 − f^(xN+1))^2 | ZN+1^p] ≤ E[(yN+1 − f^(xN+1))^2 | ZN], (7)

where E[∙|A] is the conditional expected value given A.

In addition, among the various types of yN+1p, Kim et al. [23] reported that a sample mean of a given dataset usually results in the lowest prediction error. We use the sample mean for yN+1p in this study.
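With the sample mean as the pseudo target, LSPD reduces to refitting on the augmented set. The sketch below assumes a univariate linear model and invented data; it is an illustration of the pseudo-data idea, not the paper's full implementation.

```python
import numpy as np

def lspd_fit(x, y, x_new):
    """LSPD sketch: augment the sample with the pseudo point (x_new, mean(y))
    and refit a linear model; the pseudo target is the sample mean of y."""
    xp = np.append(x, x_new)          # pseudo covariate at the target location
    yp = np.append(y, y.mean())       # pseudo response: sample mean
    a1, a0 = np.polyfit(xp, yp, 1)
    return a0 + a1 * x_new
```

A standard recursive-least-squares identity implies the refitted value at x_new lies between the plain OLS prediction and the pseudo target, i.e., the pseudo point pulls the prediction toward the sample mean.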

Least absolute shrinkage and selection operator. Finally, we apply the aCL method to LASSO to investigate whether it still holds in the regularization method. LASSO generally provides a biased estimator due to penalization of the estimated coefficients [27]. Taking the previous example of a linear estimator used in (4)–(5), the LASSO method minimizes

∑_{n=1}^{N} (yn − ∑_{k=1}^{m} bk·xkn)^2 + λ·∑_{k=1}^{m} |bk|, (8)

where λ is the penalty (or tuning) parameter; in this study, λ is estimated by 10-fold cross-validation. The first term of (8) is the OLS objective, which can cause overfitting on a whole dataset, and the second term is the penalization term, which allows a predictive model to avoid the overfitting problem but leads to a biased estimator. Meanwhile, the aCL method uses a subsample of the given data and hence does not cause serious overfitting. Based on this advantage, the penalization term does not shrink the estimated coefficients more than necessary, which alleviates the biased-estimator problem. Therefore, the aCL method is expected to reduce the prediction error when combined with LASSO.
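Minimizing (8) can be sketched with plain coordinate descent and soft-thresholding. This is a generic LASSO solver, not the cross-validated procedure used in the paper; the update rule is derived for the exact objective in (8) (no 1/2 factor, hence the λ/2 threshold).

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for sum((y - Xb)^2) + lam * sum|b_k|, as in Eq (8)."""
    n, m = X.shape
    b = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for k in range(m):
            r_k = y - X @ b + X[:, k] * b[k]            # partial residual
            b[k] = soft(X[:, k] @ r_k, lam / 2) / col_sq[k]
    return b
```

At lam = 0 the iteration converges to the OLS solution, and for very large lam every coefficient is shrunk exactly to zero, which is the bias-for-variance trade described above.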

Competing methods

K-means clustering. The k-means clustering algorithm is a well-known and widely used method of partitioning data. Hence, we evaluated whether the aCL method can reduce the prediction error compared to k-means clustering.

In general, k-means clustering is used for classification; however, because it is based on unsupervised learning, k-means clustering is also useful for making predictions in real-world data problems [27, 28]. Meanwhile, the problem of selecting an appropriate k value in an application persists. Suppose that ZN is partitioned into k subsets, S1, S2, …, Sk. The k value is estimated as follows:

k^ = argmin_{k ∈ Z+} ∑_{i=1}^{k} ∑_{Zn ∈ Si} ‖Zn − μi‖^2, (9)

where μi is the mean of Si.

Because the value of k is numerically determined in many cases [28], it is dependent on a type of data or on researcher knowledge. Moreover, the noise in a given dataset can easily lead to a small-sized cluster. Very small clusters result in high predictor variance, such that the prediction error increases beyond what is desirable [7]. Therefore, the aCL with a size between a last cluster and the whole dataset can result in lower prediction error than can k-means clustering.
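The within-cluster sum of squares in (9) only decreases as k grows, which is one reason k must be chosen numerically rather than by minimizing (9) alone. A small 1-D sketch (with an invented two-cluster dataset and a simple quantile initialization, neither from the paper) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(5)

def kmeans_1d(x, k, iters=100):
    """Plain 1-D k-means; returns labels and the within-cluster SS of Eq (9)."""
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        lab = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(lab == j):
                centers[j] = x[lab == j].mean()
    wcss = sum(((x[lab == j] - centers[j]) ** 2).sum() for j in range(k))
    return lab, wcss

# two well-separated groups; the WCSS of Eq (9) keeps falling as k grows
x = np.sort(np.concatenate([rng.normal(0, 1, 50), rng.normal(8, 1, 50)]))
wcss = {k: kmeans_1d(x, k)[1] for k in (1, 2, 3)}
```

In practice one therefore picks k by an elbow criterion or validation rather than by (9) directly, and a noisy realization can still leave the last cluster undesirably small.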

The new sample averaged over bootstrap samples. As shown in Fig 3, the true function can be well approximated by bootstrap samples. Namely, a new sample averaged over the bootstrap samples can achieve a more accurate prediction performance than when simply using only one realized sample. Therefore, we need to evaluate the prediction performance based on the new sample averaged over the bootstrap samples compared to the aCL method.

Linear combination of two coefficients. As explained in the previous section, Clark and McCracken [12] suggested that the linear combination of two estimated regression coefficients improves the prediction performance, similar to the idea of the aCL method. However, in [12], only one realized sample was used, and a small number of change points was assumed. The aCL method uses bootstrap samples and involves the last cluster among the multiple clusters in a given dataset; thus, it is expected to achieve a more accurate prediction result than the linear combination of two estimated regression coefficients.

Results and discussion

Simulation study

Simulation models

To examine whether the aCL method applies well to various functional forms, we conducted Monte Carlo simulations over four types of DGPs (100 Monte Carlo simulations for each DGP). The aCL method helps to find the change point of f(x), at which a whole sample is adequately partitioned so that a linear estimator fits well to the last cluster. Therefore, the DGPs having the change point would be good examples. DGP 1 is a univariate quadratic model, in which a change point exists at the local maximum. DGP 2 consists of two planes where the change points occur at the midpoint of a sample; thus the optimal location for partitioning data would be around the midpoint. DGP 3 is a hyperbolic paraboloid that resembles a saddle, in which a change point can be made from two directions. Thus, finding where to partition a given dataset is challenging. Finally, as a general example of DGP, we set DGP 4 to a multivariate polynomial function form. All the DGPs are estimated by a linear estimator as in (4)–(5):

DGP 1: yn = 3(θ1+θ2)/(2θ1θ2)·x1n − (2θ2/θ1)·x1n^2 + en
DGP 2: yn = 1/θ1 − (θ1/θ2)·x1n + (θ1/θ2)·x2n + en for n ≤ N/2; yn = 1/θ2 − (θ2/θ1)·x1n + (θ2/θ1)·x2n + en for n > N/2
DGP 3: yn = (1/θ1)·x1n^2 − (1/θ2)·x2n^2 + en
DGP 4: yn = (θ1/θ2)·x1n + (θ2/θ1)·x2n + θ1·x1n·x2n + θ2·x2n^2 + en.

To consider the dynamic structure in each DGP, we let xmn follow the AR(1) structure for m = 1 and 2; x1n = 0.5x1n−1+e1n,e1n~N(0,1) and x2n = 0.7x2n−1+e2n,e2n~N(0,1), respectively. Moreover, to observe the variations in the numerical results according to the changes in coefficients, we generated the coefficients according to [29],

θk = (p/(1−p))·γk, (10)
γk = {(k/3)^A·((3−k)/3)^B} / [∑_{k=1}^{2} {(k/3)^A·((3−k)/3)^B}^2]^(1/2),

where p takes the values {0.1,0.2,…,0.9}, and both A and B take the values {0.1,0.2,…,1}. The noise term en in each DGP follows an independent and identical normal distribution of zero mean and variances (denoted by Sig) of 1, 2, and 3. Additionally, we consider sample sizes (denoted by N) of 100, 200, and 300 and prediction horizons (denoted by Ho) of 1, 2, and 3. In brief, to test the prediction performance of the aCL method in a variety of situations, all DGPs were generated by Monte Carlo simulation with the independent variable of the AR(1) process, varying coefficients of DGP and variances of the noise term. In addition, we varied the sample size of DGP and prediction horizon to test the prediction performance of the proposed method.

Despite the various simulation settings, these four DGP experiments may be insufficient to assert that the proposed method is always more accurate. For example, in a DGP consisting only of noise, a simple predictor such as the sample mean would be a better choice; in contrast, in a DGP with no noise, the 1-nearest-neighbor algorithm would achieve a good prediction result. However, for easily visible DGPs such as these extreme cases, one can directly specify an estimator for the given dataset.

As a statistical test on the prediction results among the presented methods, we applied the Diebold-Mariano test [30], which tests the null hypothesis that two methods have the same prediction accuracy. This test is useful in many applications because it can be used for non-Gaussian or serially correlated data.
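A bare-bones version of the Diebold-Mariano statistic under squared-error loss can be sketched as follows; the HAC variance truncates at lag h−1, and the normal p-value is an asymptotic approximation, so this is a sketch rather than a full implementation.

```python
import numpy as np
from math import erf, sqrt

def dm_test(e1, e2, h=1):
    """Diebold-Mariano test with squared-error loss; returns (stat, p-value).
    d_t = e1_t^2 - e2_t^2; the long-run variance uses autocovariances
    of d up to lag h-1 (the usual choice for h-step-ahead forecasts)."""
    d = e1**2 - e2**2
    T, dbar = len(d), d.mean()
    gamma = [((d[k:] - dbar) * (d[:len(d) - k] - dbar)).mean() for k in range(h)]
    var = (gamma[0] + 2 * sum(gamma[1:])) / T
    stat = dbar / np.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + erf(abs(stat) / sqrt(2))))   # two-sided normal p
    return stat, p
```

A large positive statistic indicates that the first forecast's losses are significantly larger than the second's; it relies only on the loss differential, which is why non-Gaussian errors are not a problem.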

Results on the simulated data

First, it is helpful to confirm the existence of a last cluster that yields more accurate prediction for each DGP. From Fig 4, we observe that prediction based on some last clusters results in lower prediction error than prediction using the entire dataset. Nevertheless, other last clusters yield higher prediction error than using the whole dataset. Therefore, selecting a last cluster of adequate size is important, and the aCL method can be useful for obtaining such a cluster.

Fig 4. Change of the prediction error based on the size of the last cluster.


Prediction based on the last cluster (cluster, dashed line) can yield lower prediction error than does linear regression over the entire dataset (LR, solid line). Because the full dataset is used for LR, the PE does not change in LR. For each DGP, N = 100.

Furthermore, Fig 5 depicts how the aCL is chosen and the predictions are made based on the aCL. For all the DGPs, the estimator based on the aCL more accurately indicates the prediction target than does the estimator based on the entire dataset.

Fig 5. Selection of aCL and its prediction.


DGP 1 shows how aCL (square) is selected among the data (white circle) and that the fitted line of aCL (aCL-fit, dashed line) more accurately indicates the prediction target (target, black circle) than does the linear regression fitted line of a whole dataset (LR-fit, solid line). DGP 2–4 also show that aCL is well selected in three-dimensional space, and the fitted plane of aCL (aCL-fit, dark green plane) more accurately indicates the prediction target than does the linear regression fitted plane of the whole dataset (LR-fit, light gray plane). For each DGP, N = 100, Ho = 1, and Sig = 1.

Tables 1–4 show the prediction errors of the methods used in this study. The first nine rows in each table present the results when the aCL method is applied to OLS, LSPD, and LASSO. The last six rows show the prediction errors of the competing methods (k-means clustering, the new sample obtained from bootstrap samples, and the linear combination of two estimated regression coefficients). Additionally, the tables present the changes in prediction error according to the different sample sizes (N), horizons (Ho), and variances (Sig). The abbreviations in the tables are as follows: OLS = ordinary least squares, OLSaCL = OLS combined with aCL, LSPD = least squares adjusted with pseudo data, LSPDaCL = LSPD combined with aCL, LASSO = least absolute shrinkage and selection operator, LASaCL = LASSO combined with aCL, Kmeans = k-means clustering, Bootstr = the new sample obtained from bootstrap samples, and Comb = the linear combination of two estimated regression coefficients.

Table 1. Comparison of prediction error in DGP 1.
DGP = 1 N = 100 N = 200 N = 300
Ho 1 2 3 1 2 3 1 2 3
Sig 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
OLS 245.019 348.978 342.525 387.463 392.13 375.065 274.54 421.606 257.788 320.217 444.204 345.762 467.147 343.895 514.531 340.587 503.16 362.892 460.07 482.527 419.929 479.324 494.506 508.297 374.599 517.43 444.044
OLSaCL 11.608 14.307 17.168 20.125 20.842 27.083 20.693 31.435 27.26 13.484 26.915 25.485 37.781 19.109 63.072 23.303 44.693 29.793 27.662 32.758 30.162 52.051 44.51 63.703 28.637 45.64 38.795
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
LSPD 246.171 346.157 340.285 381.899 385.778 370.38 271.444 412.312 255.431 322.563 446.452 347.975 468.145 345.49 515.003 342.05 502.971 364.028 461.974 484.368 420.827 480.497 495.739 509.549 374.776 517.988 444.158
LSPDaCL 20.039 27.033 28.1 32.241 30.671 36.312 30.084 42.633 34.552 16.839 41.457 30.424 58.335 21.882 80.442 26.411 60.528 33.179 42.287 47.008 37.065 70.264 61.544 88.701 35.353 65.583 44.794
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
LASSO 243.205 341.971 339.174 384.461 389.413 372.398 272.355 418.626 253.913 318.275 441.989 344.245 464.411 342.197 511.615 338.708 500.569 361.098 457.355 479.373 407.125 476.788 490.855 488.41 372.313 515.099 441.486
LASaCL 12.787 19.905 18.788 20.27 21.737 28.358 21.306 33.839 26.819 14.698 29.041 27.258 42.372 19.072 71.371 22.328 39.689 27.366 26.81 32.928 32.569 53.885 46.549 59.386 28.678 47.937 38.953
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Kmeans 31.493 57.695 56.587 57.589 73.917 65.173 42.737 75.498 51.922 49.992 61.279 61.051 82.013 65.229 88.867 60.757 84.908 83.619 60.22 84.505 74.83 73.919 84.414 77.206 64.273 89.589 81.851
(p-value) (0.002) (<0.001) (<0.001) (0.001) (<0.001) (0.002) (0.008) (0.003) (0.014) (0.002) (0.008) (0.001) (0.033) (<0.001) (0.084) (0.008) (0.002) (0.002) (0.005) (0.004) (<0.001) (0.094) (0.006) (0.207) (0.007) (0.001) (0.002)
Bootstr 242.343 346.982 341.665 388.084 390.902 374.487 276.42 421.122 259.228 321.503 445.978 348.669 470.8 347.044 517.961 340.979 506.693 363.502 463.408 484.174 422.316 480.895 496.311 509.644 375.228 520.118 444.081
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Comb 176.012 274.632 282.323 324.494 304.418 292.052 205.806 363.356 216.887 200.121 323.458 241.346 334.116 242.172 412.308 246.003 409.515 260.036 341.701 331.195 325.951 347.476 369.015 379.948 297.122 403.248 377.845
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)

NOTE: The three optimization methods (OLS, LSPD, and LASSO) are compared with their aCL methods. Other competing methods (Kmeans, Bootstr, and Comb) are compared with OLSaCL. The p-value of the Diebold-Mariano test is shown in parentheses.
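The Diebold-Mariano test reported throughout these tables compares two models' forecast-error series. The sketch below is a minimal version under squared-error loss with the large-sample normal approximation; the authors' exact variant (e.g., any long-run variance correction for multi-step horizons) is not specified in this section, so this is illustrative only, and the toy error series are placeholders.

```python
import math
import numpy as np

def diebold_mariano(e1, e2):
    """Basic Diebold-Mariano test under squared-error loss.

    e1, e2: arrays of forecast errors from two competing models.
    Returns the DM statistic and a two-sided p-value from the
    large-sample normal approximation.
    """
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2    # loss differential
    dm = d.mean() / math.sqrt(d.var(ddof=1) / d.size)
    p = math.erfc(abs(dm) / math.sqrt(2))            # = 2 * (1 - Phi(|dm|))
    return dm, p

# Toy example: the second model's errors are systematically smaller,
# so the statistic is large and positive and the p-value is tiny.
rng = np.random.default_rng(0)
e_big = rng.normal(0, 3, 200)    # errors of the weaker model
e_small = rng.normal(0, 1, 200)  # errors of the stronger model
stat, pval = diebold_mariano(e_big, e_small)
```

A positive statistic indicates that the first model's loss exceeds the second's; swapping the arguments flips its sign.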

Table 4. Comparison of prediction error in DGP 4.
DGP = 4 N = 100 N = 200 N = 300
Ho 1 2 3 1 2 3 1 2 3
Sig 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
OLS 7.727 7.565 7.016 8.059 7.505 6.873 9.74 12.159 8.514 17.703 21.741 27.373 17.958 25.376 29.965 18.701 24.961 31.936 25.405 34.045 47.398 25.205 35.533 52.104 23.552 35.046 44.619
OLSaCL 0.752 1.609 3.186 1.346 2.568 4.513 1.524 3.971 5.051 1.856 4.441 7.406 2.39 4.689 10.849 2.818 6.558 14.001 1.945 6.338 11.552 2.097 4.5 13.422 3.5 5.978 14.943
(p-value) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.033) (<0.001) (<0.001) (0.053) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
LSPD 7.645 7.669 7.386 8.23 7.559 7.11 9.793 12.046 8.912 17.89 21.194 27.522 17.928 25.051 30.201 18.731 24.976 32.115 25.697 33.569 47.688 25.52 35.904 51.636 23.865 34.523 45.203
LSPDaCL 1.492 2.258 3.612 2.436 2.929 4.545 2.744 4.795 5.728 2.473 5.033 7.928 2.912 5.367 11.446 3.164 7.062 14.802 2.673 6.921 12.695 2.764 4.703 14.555 4.197 6.684 15.923
(p-value) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.05) (<0.001) (0.001) (0.086) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
LASSO 7.798 8.409 7.049 8.028 7.566 7.109 9.745 12.414 6.997 17.788 21.601 27.361 18.011 25.394 30.045 18.797 24.946 33.342 25.353 34.16 47.734 25.223 35.593 52.959 23.629 35.68 45.011
LASaCL 0.888 1.239 2.863 1.387 2.459 5.283 1.716 3.931 5.214 1.88 4.459 6.735 2.416 5.624 12.326 3.382 6.999 14.528 1.909 5.764 11.203 2.321 4.76 13.978 3.581 6.929 15.242
(p-value) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.058) (<0.001) (<0.001) (0.085) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Kmeans 1.822 4.847 13.545 2.355 8.571 16.997 2.598 6.659 18.116 4.42 7.092 21.05 4.729 8.856 25.98 6.308 14.677 26.239 4.737 10.346 20.862 4.838 11.242 22.267 5.026 10.693 23.504
(p-value) (0.001) (<0.001) (<0.001) (0.009) (<0.001) (<0.001) (0.008) (0.006) (<0.001) (<0.001) (0.002) (<0.001) (0.006) (0.002) (<0.001) (<0.001) (<0.001) (0.002) (<0.001) (0.036) (0.004) (0.001) (<0.001) (0.005) (0.072) (0.001) (0.01)
Bootstr 7.654 7.491 7.192 8.057 7.487 7.006 9.63 11.909 8.531 17.907 21.882 27.061 18.274 25.895 29.923 18.877 25.004 32.182 26.497 35.249 49.32 26.247 37.054 53.803 24.284 36.392 46.379
(p-value) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.025) (<0.001) (<0.001) (0.05) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Comb 5.273 5.866 6.986 6.253 6.683 8.126 7.583 7.632 8.257 15.247 16.04 24.833 15.289 19.234 28.538 15.627 20.16 29.114 18.257 31.635 40.538 18.814 28.54 50.077 16.133 32.559 39.39
(p-value) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.009) (<0.001) (0.001) (0.037) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Table 2. Comparison of prediction error in DGP 2.
DGP = 2 N = 100 N = 200 N = 300
Ho 1 2 3 1 2 3 1 2 3
Sig 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
OLS 1.312 1.262 3.423 2.445 2.055 5.018 1.567 2.245 6.831 4.172 7.628 11.998 4.761 8.751 21.282 5.275 10.754 16.625 6.184 10.507 20.276 6.852 14.483 21.081 6.955 12.511 28.861
OLSaCL 0.313 1.162 3.69 0.633 2.153 3.01 0.675 2.324 4.422 1.074 1.965 5.088 1.418 2.741 7.993 1.889 6.768 10.178 0.855 4.609 5.64 2.327 5.87 17.602 1.547 5.401 3.474
(p-value) (<0.001) (0.4) (0.599) (0.016) (0.548) (0.084) (0.002) (0.55) (0.122) (0.002) (0.004) (0.014) (<0.001) (<0.001) (0.004) (0.004) (0.029) (0.001) (<0.001) (0.01) (<0.001) (<0.001) (<0.001) (0.249) (<0.001) (0.016) (<0.001)
LSPD 2.421 2.595 4.819 3.518 3.578 7.047 2.751 3.752 8.743 4.336 7.79 12.044 4.956 8.903 21.75 5.579 10.369 16.046 6.382 10.179 19.561 7.04 14.293 20.642 7.056 12.584 28.466
LSPDaCL 0.571 1.661 4.098 1.028 2.754 3.522 0.933 3.275 5.662 0.784 2.409 5.454 1.385 2.423 8.857 1.846 5.911 10.426 0.967 4.567 6.162 1.763 5.648 17.361 1.352 4.836 3.358
(p-value) (<0.001) (0.039) (0.302) (0.004) (0.2) (0.021) (<0.001) (0.262) (0.114) (<0.001) (0.008) (0.015) (0.001) (0.001) (0.006) (0.003) (0.007) (0.003) (<0.001) (0.011) (<0.001) (<0.001) (<0.001) (0.27) (0.001) (0.008) (<0.001)
LASSO 1.287 1.175 3.27 2.073 2.277 5.035 1.477 1.958 6.175 3.96 6.821 10.83 4.518 8.244 20.203 5.162 10.429 15.718 6.116 10.57 19.652 6.753 13.408 20.884 6.907 11.668 28.137
LASaCL 0.424 1.354 4.409 0.803 2.369 2.036 0.531 3.048 3.2 0.989 1.714 4.673 1.526 2.721 8.551 1.922 7.845 13.449 0.854 4.364 5.92 2.097 6.621 17.856 1.408 5.57 3.317
(p-value) (<0.001) (0.406) (0.563) (0.034) (0.356) (0.018) (0.001) (0.462) (0.08) (0.001) (0.003) (0.016) (<0.001) (<0.001) (0.006) (0.005) (0.043) (0.001) (<0.001) (0.009) (<0.001) (<0.001) (<0.001) (0.175) (<0.001) (0.024) (<0.001)
Kmeans 1.108 2.673 7.525 1.201 16.642 11.836 1.265 8.266 19.091 1.682 4.908 17.438 1.936 5.385 25.424 2.862 11.506 19.876 1.493 6.012 18.83 1.296 6.046 12.417 1.546 6.631 17.995
(p-value) (0.001) (0.051) (0.014) (0.105) (0.078) (<0.001) (0.004) (0.009) (<0.001) (0.096) (0.012) (0.017) (0.171) (0.047) (0.002) (0.157) (0.116) (0.039) (0.005) (0.208) (<0.001) (1) (0.459) (0.872) (0.5) (0.323) (<0.001)
Bootstr 1.308 1.314 3.334 2.562 1.9 4.72 1.673 1.971 6.622 4.16 7.817 11.992 4.927 8.852 21.006 5.52 10.931 16.743 6.218 10.727 20.766 6.876 14.384 20.493 7.189 12.953 29.458
(p-value) (<0.001) (0.349) (0.638) (0.016) (0.625) (0.11) (0.001) (0.699) (0.157) (0.001) (0.005) (0.014) (<0.001) (0.001) (0.005) (0.003) (0.024) (0.001) (<0.001) (0.009) (<0.001) (<0.001) (<0.001) (0.286) (<0.001) (0.013) (<0.001)
Comb 0.895 1.201 3.818 1.104 2.565 5.613 0.948 2.906 9.575 2.702 6.005 13.179 3.284 6.805 20.8 4.098 8.631 16.924 3.543 8.851 19.813 4.274 13.465 19.503 5.483 9.336 26.72
(p-value) (0.002) (0.461) (0.457) (0.128) (0.335) (0.023) (0.272) (0.164) (0.013) (0.009) (0.005) (0.022) (0.019) (0.003) (0.006) (0.017) (0.177) (0.015) (<0.001) (0.034) (<0.001) (0.019) (0.001) (0.346) (0.002) (0.081) (<0.001)
Table 3. Comparison of prediction error in DGP 3.
DGP = 3 N = 100 N = 200 N = 300
Ho 1 2 3 1 2 3 1 2 3
Sig 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
OLS 11.451 14.406 20.64 12.451 17.555 24.14 11.901 17.448 22.579 11.881 11.586 19.696 12.124 16.382 16.467 11.457 14.385 27.168 13.35 13.871 17.449 15.314 14.568 15.742 17.187 16.616 19.512
OLSaCL 1.264 2.792 6.841 2.442 4.825 6.806 2.26 5.097 9.574 2.224 4.368 6.184 2.595 7.312 9.563 3.111 7.586 13.042 1.745 4.638 7.756 2.644 5.523 10.183 2.574 4.759 8.504
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.001) (0.006) (<0.001) (0.004) (0.003) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.036) (<0.001) (<0.001) (0.001)
LSPD 11.537 14.523 20.668 12.408 17.518 23.999 11.818 17.327 22.339 11.916 11.621 19.651 12.141 16.457 16.467 11.464 14.357 27.106 13.382 13.879 17.447 15.341 14.556 15.74 17.185 16.593 19.503
LSPDaCL 1.933 3.41 7.432 3.106 5.221 7.171 2.75 5.396 9.798 2.611 4.604 6.442 2.95 7.395 9.91 3.504 7.884 13.389 2.101 4.917 8.047 2.9 5.643 10.239 2.948 4.85 8.584
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.001) (0.007) (<0.001) (0.005) (0.003) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.039) (<0.001) (<0.001) (0.001)
LASSO 10.674 13.425 20.546 11.735 19.472 25.308 10.179 15.338 20.469 11.643 12.053 19.365 12.089 16.688 15.42 11.317 14.575 25.719 13.541 11.842 17.867 15.561 14.699 15.864 16.836 15.103 17.955
LASaCL 1.463 2.781 5.935 2.982 4.056 8.66 2.494 5.447 7.367 2.248 4.227 5.356 2.811 8.88 7.885 3.32 7.487 13.62 2.023 4.381 8.616 3.223 5.523 10.892 3.268 4.87 9.38
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.003) (0.006) (<0.001) (<0.001) (0.004) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.034) (<0.001) (<0.001) (0.001)
Kmeans 2.563 7.893 10.698 3.343 8.103 16.749 4.24 20.004 18.19 4.522 8.317 18.469 4.543 15.137 14.435 4.616 9.554 26.893 3.158 7.748 11.102 4.257 5.532 9.018 5.339 8.582 18.172
(p-value) (0.004) (<0.001) (0.018) (0.019) (0.006) (<0.001) (<0.001) (0.002) (0.024) (0.001) (0.011) (<0.001) (0.001) (0.004) (0.053) (0.078) (0.136) (0.002) (0.008) (0.013) (0.029) (0.011) (0.497) (0.692) (<0.001) (0.03) (<0.001)
Bootstr 11.041 14.159 20.696 12.14 16.87 22.999 11.782 17.205 22.1 11.971 11.623 19.506 12.064 16.029 16.159 11.421 14.514 27.127 13.919 14.261 18.269 15.648 14.829 15.88 17.714 17.178 19.675
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.001) (0.006) (<0.001) (0.003) (0.002) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.036) (<0.001) (<0.001) (<0.001)
Comb 10.029 12.552 15.673 10.116 11.488 18.54 10.705 16.353 20.803 10.821 11.194 18.972 11.045 15.652 15.623 10.55 13.2 26.992 10.366 12.985 16.602 12.654 13.66 14.458 14.695 15.628 18.763
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.002) (0.012) (<0.001) (0.003) (0.003) (<0.001) (<0.001) (0.001) (<0.001) (<0.001) (0.073) (<0.001) (<0.001) (0.001)

The results in Tables 1–4 indicate that the three aCL methods outperform their preceding optimization methods under various conditions. Regarding the type of DGP, prediction errors are relatively lower for the curved functional forms (DGP 1, 3, and 4) than for DGP 2, which comprises two planes. When Sig is large, the aCL method in some cases shows relatively low prediction performance. The aCL method also mainly yields lower prediction errors than the three competing methods; it quite clearly outperforms Bootstr and Comb. Meanwhile, k-means clustering yields good prediction results when N = 300 in DGP 2, where the effect of noise is relatively low and k-means clustering can therefore work well [31].

In addition, because the size of the aCL lies between the whole dataset and the last cluster, the variance of the prediction error under the aCL method is lower than under k-means clustering, as presented in Table 5; the aCL method therefore helps to obtain a stable prediction performance.

Table 5. Variance of prediction errors by the aCL method and k-means clustering.
DGP 1 DGP 2 DGP 3 DGP 4
OLSaCL 13.884 3.680 3.047 5.964
Kmeans 14.875 7.365 6.344 7.438

NOTE: Prediction errors from Tables 1–4 are used.
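The size adjustment underlying these results can be illustrated loosely as follows: bootstrap the error of a simple fit restricted to the last m observations and keep the m with the smallest average bootstrapped prediction error, so that a too-large window (high bias) and a too-small one (high variance) are both penalized. This is a toy sketch with assumed linear fits and a single held-out point, not the authors' exact aCL algorithm.

```python
import numpy as np

def select_last_cluster(x, y, sizes, B=200, seed=0):
    """Choose the last-cluster size m with the smallest average
    bootstrapped squared error when predicting the final point.
    Toy sketch of the bias-variance balancing idea only."""
    rng = np.random.default_rng(seed)
    errs = {}
    for m in sizes:
        xs, ys = x[-m - 1:-1], y[-m - 1:-1]   # candidate last cluster
        se = 0.0
        for _ in range(B):
            idx = rng.integers(0, m, m)        # bootstrap resample
            b, a = np.polyfit(xs[idx], ys[idx], 1)  # slope, intercept
            se += (a + b * x[-1] - y[-1]) ** 2      # error at the held-out point
        errs[m] = se / B
    return min(errs, key=errs.get)

# Curved DGP: a short local window fits the end of the curve better
# than the whole sample, while a tiny window would be noisy.
rng = np.random.default_rng(4)
x = np.linspace(0, 3, 120)
y = np.exp(x) + rng.normal(0, 0.05, 120)
m_star = select_last_cluster(x, y, sizes=[10, 30, 60, 100])
```

On this convex toy curve the procedure prefers a small local window over the full sample, mirroring the intuition that a linear fit on the whole dataset is badly biased at the prediction point.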

To demonstrate the robustness of the aCL method under various conditions, boxplots of the prediction errors are presented according to N, Ho, Sig, and coefficients. First, changes of the sample size N have little influence on the performance of the aCL method compared to its preceding optimization method and competing methods, as seen in Fig 6. As N increases, the aCL method consistently shows lower prediction errors than the other methods. K-means clustering achieves the second-lowest prediction errors in many cases but sometimes causes very high prediction errors.

Fig 6. Prediction error comparison by sample size (Ho = 2,Sig = 2).


As in the previous example, the aCL method outperforms the other methods for all prediction horizons, as shown in Fig 7. A longer prediction horizon degrades a model's predictive ability more than a shorter one, so prediction errors increase as Ho increases. In particular, k-means clustering shows high prediction errors when Ho is large.

Fig 7. Prediction error comparison by horizon (N = 200,Sig = 2).


A high variance of noise generally degrades prediction ability. Because a cluster is smaller than the full dataset, methods that use clusters are strongly influenced by the noise variance. In Fig 8, the prediction performance of the aCL method decreases as the variance increases. In particular, k-means clustering performs very poorly as the variance grows.

Fig 8. Prediction error comparison by variance (N = 200,Ho = 2).


In Fig 9, we present the prediction errors resulting from changes of coefficients. Similar to previous experiments, the aCL methods achieve lower prediction errors than do the other methods. The prediction errors of k-means clustering also seem low in some cases; however, its prediction performance appears to fluctuate depending on the change of coefficients.

Fig 9. Prediction error comparison by changes of coefficients.


Estimation of change point. This section examines whether the aCL method approximates a change point of a DGP better than k-means clustering does. Because the change point in DGP 2 is distinctly set at half the sample size, we use data generated by DGP 2. As a robustness check, the change point is estimated across various values of N, Ho, and Sig; a well-estimated change point should be approximately half the sample size. As shown in Table 6, the change points estimated by the aCL method are close to the true change points, whereas those estimated by k-means clustering are downward-biased because noise leads k-means clustering to produce small clusters.

Table 6. Change points estimated by OLSaCL and Kmeans (DGP 2).
Factor Value OLSaCL Kmeans
N (Ho = Sig = 1) 100 44.5 18.8
 200 100.5 25.2
 300 147.2 48.7
Ho (N = 100, Sig = 1) 1 41.3 18.2
 2 47.1 6.1
 3 50.4 1.5
Sig (N = 100, Ho = 1) 1 43.1 6.4
 2 40.5 24.0
 3 45.0 18.7

NOTE: The change point is the average over Monte Carlo iterations.
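For comparison, a generic least-squares change-point search (fit separate regression lines on each side of every candidate split and keep the split minimizing the total squared error) also recovers a midpoint break in two-regime data of the kind generated by DGP 2. This sketch is a standard illustration, not the aCL or k-means procedure evaluated in Table 6, and the two-regime toy data are placeholders.

```python
import numpy as np

def change_point_sse(x, y, min_seg=10):
    """Return the split index minimizing the summed SSE of two
    separate simple linear regressions (one per segment)."""
    def sse(xs, ys):
        X = np.column_stack([np.ones_like(xs), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        r = ys - X @ beta
        return r @ r
    n = len(x)
    best_k, best = None, np.inf
    for k in range(min_seg, n - min_seg):
        total = sse(x[:k], y[:k]) + sse(x[k:], y[k:])
        if total < best:
            best_k, best = k, total
    return best_k

# Two-regime example with the break set at index 50, i.e. half the sample
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = np.where(np.arange(100) < 50, 2 * x, 20 - 1.5 * (x - 5))
y = y + rng.normal(0, 0.3, 100)
k_hat = change_point_sse(x, y)  # close to the true break at 50
```

A well-behaved estimator should concentrate near half the sample size, which is exactly the benchmark used in Table 6.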

Fig 10 depicts histograms of the change points estimated by OLSaCL and Kmeans over 100 Monte Carlo iterations. The histogram for OLSaCL shows a central tendency similar to a normal distribution, unlike that for Kmeans.

Fig 10. Histograms of change points estimated by OLSaCL and Kmeans.


Real data study

Stock market index

As the first application using real data, the S&P 500 index is predicted from the USD/EUR exchange rate and the interest rate (the effective federal funds rate in the US). Financial data have been the subject of numerous studies. In particular, the relation between the stock market index and the exchange rate has long been studied [32, 33]. Additionally, the interest rate is known to affect the stock market because it is usually considered a substitute for stock market returns [34].

More recently, machine learning methods such as the nearest neighbor algorithm have been applied to predict the stock market index [35], the exchange rate [36], and the interest rate [37] by focusing on the nonlinear characteristics of financial data. Meanwhile, most financial data exhibit stylized facts that make prediction difficult [38], such as fat-tailed distributions and volatility clustering. These characteristics frequently produce unpredictable spikes or clusters in time-series data that degrade the predictive power of a model using the whole dataset. The nearest neighbor algorithm, in particular, is strongly influenced by such outliers, which complicates building a predictive model and can cause prediction performance to vary with the range of the dataset [37]. Partitioning the dataset can therefore be effective for these types of data. Some works have reported improved prediction performance from the k-means clustering method [39, 40, 41]. However, most related works estimate the number of clusters numerically and use the obtained clusters directly, so one may end up testing many numbers of clusters until a desirable result is obtained [41], which makes the cluster size somewhat unreliable. Because the aCL method applies the bootstrap method, it is expected to yield more reliable results.

We adopted a dataset comprising a total of 250 weeks of data from Sep. 30, 2011 to Sep. 23, 2016. To control for the effect of data size, the dataset is divided into six equal-sized subsets. Each subset includes 100 instances, of which the last 30 are predicted. As is usual in financial data analysis, first-order differenced data are used [42].
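The preprocessing just described can be sketched as follows. One reading consistent with the stated sizes is overlapping windows of 100 differenced observations stepping by 30 (100 + 5 × 30 = 250); the windowing scheme and the toy series are assumptions, since the text does not spell them out.

```python
import numpy as np

def make_subsets(series, window=100, step=30):
    """First-difference a series, then cut overlapping windows.

    With 251 raw observations the differenced series has 250 points,
    and windows of 100 stepping by 30 yield six subsets
    (100 + 5*30 = 250). The overlap scheme itself is an assumption.
    """
    diff = np.diff(series)                          # first-order differences
    starts = range(0, len(diff) - window + 1, step)
    return [diff[s:s + window] for s in starts]

raw = np.cumsum(np.random.default_rng(2).normal(size=251))  # toy price series
subsets = make_subsets(raw)
# Six subsets of 100 points; the last 30 of each are the prediction targets.
train_eval_split = [(s[:70], s[70:]) for s in subsets]
```

Each pair then supplies 70 training points and 30 held-out points per subset, matching the "last 30 of 100 predicted" setup in the text.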

Given a dataset of size N, we predict the values at N+1, N+2, and N+3. As seen in Table 7, the aCL methods largely outperform their preceding optimization methods and the competing methods. K-means clustering yields very poor prediction performance in several cases, for example in the third subset; the prediction performance of the aCL method is also relatively low in that subset. Note that the standard deviations of the aCL methods' prediction errors are lower than those of the other methods over all prediction horizons (Fig 11), which implies a low variance of the prediction error for the aCL methods.

Table 7. Comparison of prediction error (S&P 500 index).
Subset 1 2 3 4 5 6
Ho 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
OLS 1,623.641 1,456.661 1,583.565 1,006.735 1,044.876 795.876 5,150.994 5,351.771 5,600.641 4,570.235 4,343.653 4,392.129 5,056.139 4,986.251 4,904.958 5,006.36 4,930.956 4,933.17
OLSaCL 529.096 424.337 243.216 553.64 333.66 357.637 610.101 196.099 266.285 2,213.057 1,671.478 1,543.156 950.783 895.095 548.783 658.434 827.888 1,213.569
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
LSPD 1,542.055 1,392.232 1,494.012 1,110.979 1,149.076 935.288 4,816.755 4,994.519 5,216.836 3,933.531 3,770.43 3,783.216 4,315.94 4,268.341 4,188.265 4,197.11 4,142.642 4,132.846
LSPDaCL 522.096 448.278 292.519 633.418 419.701 459.249 296.197 85.188 125.423 1,542.501 1,196.477 1,049.984 664.488 644.935 338.042 450.153 599.778 866.493
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
LASSO 1,389.534 1,257.328 1,354.365 777.932 803.751 596.907 3,980.313 4,116.747 4,200.481 4,428.887 4,716.819 4,279.51 4,713.972 4,792.391 4,648.729 6,022.143 5,743.751 5,356.878
LASaCL 640.095 324.586 191.067 479.821 291.952 273.487 528.755 171.586 203.63 1,982.531 3,512.937 1,270.665 2,552.292 912.82 690.8 2,249.648 2,826.237 1,796.834
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.167) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.099)
Kmeans 2,106.306 1,528.799 1,642.423 896.124 1,432.912 196.805 687.919 935.397 2,028.618 8,666.065 8,603.418 7,978.947 5,706.817 5,039.112 5,268.619 4,690.76 4,681.7 4,832.931
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (1) (0.327) (0.03) (0.004) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Bootstr 1,627.618 1,459.733 1,534.486 964.687 959.824 711.141 5,115.023 5,308.184 5,607.416 5,142.519 4,557.184 4,470.688 5,229.428 5,405.18 5,419.757 5,263.469 4,759.555 5,259.559
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001)
Comb 1,626.753 1,457.796 1,584.437 1,380.856 1,475.762 1,351.905 4,892.685 5,209.421 5,214.018 1,915.456 1,845.557 1,836.637 2,128.457 2,083.357 2,070.776 1,731.392 1,700.678 1,706.512
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.91) (0.01) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.001) (0.148)
Fig 11. Comparison of standard deviations of prediction errors over the prediction horizons.


Home price index

The home price index (S&P/Case-Shiller U.S. National Home Price Index) is predicted by using the mortgage rate (30-year fixed rate mortgage average in the US) and the expected inflation (survey by University of Michigan).

A total of 250 monthly data points from Feb. 1990 to Dec. 2015 are included, and the data are divided into six equal-sized subsets as in the previous example. The last 30 of the 100 instances in each subset are predicted, and the prediction horizon ranges from 1 to 3.

There exists a vast literature on predicting home prices, much of it based on time-series analysis tools [43, 44]. Regarding the factors influencing home prices, expected inflation can have a positive effect because the inflation rate usually reflects home prices [45]. However, a rise in inflation leads to higher interest rates and a corresponding reduction in home prices [46]. Hence, expected inflation influences home prices differently depending on the economic environment. In general, the mortgage rate is known to have a negative influence on home prices [47, 48]. Although time-series models can achieve good prediction performance, the selection of the lag parameter remains difficult, and previous studies [43, 44] combined different models to obtain stable prediction performance, which raises the further questions of how to combine models and which models to choose. A clustering algorithm can therefore be an alternative for the prediction problem when it is difficult to specify a comprehensive time-series model for the whole dataset.

Similar to the case of the stock market index, we can also check the low standard deviations of prediction errors of the aCL methods in Fig 12.

Fig 12. Comparison of standard deviations of prediction errors over the prediction horizons.


Table 8 shows that the aCL methods mainly achieve good prediction performance; however, in the fourth and sixth subsets the aCL method fails to do so, and k-means clustering also yields high prediction errors there. A clue to this behavior can be found in the heat maps in Fig 13, which show the dissimilarity matrices of the instances in the fourth through sixth subsets. In the fifth subset, where the aCL method attains a low prediction error, the instances from approximately the 50th to the 100th appear well clustered by dissimilarity level; thus, the prediction targets (the 70th to 100th instances) can be predicted well by the aCL methods. In contrast, in the fourth and sixth subsets, the instances around the prediction targets are not well clustered, so methods based on a clustering algorithm do not work well on these subsets.

Table 8. Comparison of prediction error (Home price index).
Subset 1 2 3 4 5 6
Ho 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
OLS 7.828 10.981 13.477 104.179 122.637 159.995 535.21 675.131 860.787 27.34 22.612 27.207 405.931 433.311 477.964 72.431 59.753 68.189
OLSaCL 1.415 3.605 13.186 14.503 59.207 88.479 16.915 89.869 156.927 22.389 37.38 59.258 35.495 36.581 27.162 63.628 139.413 99.749
(p-value) (0.008) (0.125) (0.489) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.258) (0.934) (0.999) (<0.001) (<0.001) (<0.001) (0.216) (0.964) (0.724)
LSPD 12.156 16.989 19.974 77.154 88.39 113.302 402.313 491.3 607.367 9.664 14.131 17.752 330.297 348.942 380.806 75.899 63.338 71.756
LSPDaCL 3.056 2.571 0.64 20.338 52.732 65.159 26.75 85.028 125.856 3.74 4.397 10.576 26.787 26.469 18.86 74.36 139.601 106.155
(p-value) (0.001) (0.009) (0.027) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.014) (0.006) (0.134) (<0.001) (<0.001) (<0.001) (0.442) (0.971) (0.762)
LASSO 6.664 9.908 12.219 106.23 123.888 159.673 585.304 728.003 961.346 23.275 20.403 24.667 413.923 437.733 477.003 79.21 64.433 76.155
LASaCL 1.274 2.98 9.705 33.559 78.001 87.183 24.095 129.964 233.149 20.15 30.896 43.616 82.133 48.193 26.764 90.636 201.612 148.198
(p-value) (0.011) (0.066) (0.398) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.258) (0.934) (0.999) (<0.001) (<0.001) (<0.001) (0.832) (1) (0.907)
Kmeans 17.135 28.166 65.534 25.309 38.026 89.703 52.252 120.376 302.769 97.486 121.879 184.943 4.441 4.316 1.604 218.579 157.704 296.049
(p-value) (<0.001) (<0.001) (<0.001) (<0.001) (1) (0.432) (0.025) (0.235) (0.001) (0.005) (0.01) (0.053) (1) (1) (1) (<0.001) (0.329) (0.001)
Bootstr 9.311 14.371 13.942 118.511 136.676 171.779 582.779 758.247 993.253 29.41 24.612 29.126 396.858 421.812 466.379 72.878 61.841 67.481
(p-value) (0.006) (0.093) (0.473) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.191) (0.896) (0.998) (<0.001) (<0.001) (<0.001) (0.203) (0.96) (0.733)
Comb 7.795 10.854 13.35 99.291 117.107 155.867 516.993 657.46 858.033 27.171 22.242 27.019 403.307 429.392 475.835 74.532 60.194 68.216
(p-value) (0.008) (0.127) (0.494) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (<0.001) (0.26) (0.942) (0.999) (<0.001) (<0.001) (<0.001) (0.165) (0.965) (0.726)
Fig 13. Heat maps of the dissimilarity matrix of instances from the fourth to the sixth subset.


The dissimilarity metric is the Euclidean distance between the instances.
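A dissimilarity matrix of this kind is straightforward to compute. The sketch below builds the pairwise Euclidean distance matrix for a set of instances (the random data are placeholders); a heat map as in Fig 13 could then be drawn with, e.g., matplotlib's `imshow`.

```python
import numpy as np

def dissimilarity_matrix(X):
    """Pairwise Euclidean distances between the rows of X (n x p)."""
    sq = np.sum(X ** 2, axis=1)                    # squared row norms
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared distances
    return np.sqrt(np.maximum(d2, 0.0))            # clip tiny negatives from rounding

X = np.random.default_rng(3).normal(size=(100, 2))  # 100 instances, 2 features
D = dissimilarity_matrix(X)
# A heat map as in Fig 13 would be, e.g.:
#   import matplotlib.pyplot as plt
#   plt.imshow(D); plt.colorbar(); plt.show()
```

Well-clustered instances appear as low-dissimilarity blocks along the diagonal of such a heat map, which is the visual cue discussed for the fifth subset.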

Conclusions

Partitioning data produces more accurate predictions than using the whole dataset when the true function is unknown or when specifying an estimator for the whole dataset is difficult. However, the small clusters produced by simply partitioning data suffer from high variance, which leads to high prediction errors. In this study, we showed that adjusting the size of the last cluster avoids this high prediction error. To place the size of the last cluster between the whole dataset and the unadjusted last cluster, we applied the bootstrap method: a given model is trained multiple times on different bootstrapped samples generated from the DGP, so it can reliably estimate the location of the last cluster that is optimal for prediction. We therefore believe that this paper contributes to the understanding that the adjusted last cluster can be optimal for prediction by balancing bias and variance via the bootstrap method.

In this paper, the aCL method was shown to reduce prediction errors in numerical experiments on both simulated and real data. Note that the aCL method is not intended to establish a complete model fitting the whole dataset but rather to select an optimal subsample for prediction. An estimator of simple functional form, such as a linear function, can therefore easily be combined with the aCL method, an advantage of practical value in research and applications. In addition, the aCL method obtains the size-adjusted last cluster by the bootstrap method, which can easily be adopted in studies using a clustering algorithm and thus produces a more reliable subsample for prediction.

Nevertheless, this study has some limitations. The datasets were partitioned numerically, which incurs the computational cost typical of clustering algorithms. Moreover, as described in the second real-data example, for subsamples not suited to a clustering algorithm, k-means clustering yielded very high prediction errors, and the aCL method likewise could not achieve good prediction performance. Solving this problem will require a more elaborate technique; in the meantime, inspecting the degree of clustering in the data before applying a clustering algorithm can help avoid it. As future work, a theoretical study of the aCL method should be undertaken to rigorously establish the reliability of the proposed method.

In summary, we proposed a new method that adjusts the size of the last cluster to achieve accurate prediction. By applying the bootstrap method to find the most reliable aCL, the method yields low prediction errors. The aCL method can therefore be useful in many applications that use clustering algorithms and require stable prediction performance.

Acknowledgments

I would like to express deep gratitude to Professor Jaehwan Kim for his constructive suggestions and useful critique of this study.

Data Availability

The minimal data set necessary to replicate the study is available from the following DOI: 10.17605/OSF.IO/TQUYX. (The simulation data sets were randomly generated for each simulation experiment and are not included.)

Funding Statement

The authors received no specific funding for this work.



