Abstract
Time-varying kernel density estimation relies on two free parameters: the bandwidth and the discount factor. We propose to select these parameters so as to minimize a criterion consistent with the traditional requirements of the validation of a probability density forecast. These requirements are both the uniformity and the independence of the so-called probability integral transforms, which are the forecast time-varying cumulative distributions applied to the observations. We thus build a new numerical criterion incorporating both the uniformity and independence properties by means of an adapted Kolmogorov–Smirnov statistic. We apply this method to financial markets during the onset of the COVID-19 crisis. We determine the time-varying density of daily price returns of several stock indices and, using various divergence statistics, we are able to describe the chronology of the crisis as well as regional disparities. For instance, we observe a more limited impact of COVID-19 on financial markets in China, a strong impact in the US, and a slow recovery in Europe.
Keywords: Bandwidth selection, divergence statistics, financial crisis, kernel density, probability integral transform
1. Introduction
Knowledge of the distribution of price returns is of primary importance in finance. Indeed, forecasts and risk measures, such as the Value-at-Risk (VaR), the expected shortfall, or even the volatility, can be seen as scalars calculated from a probability density function (pdf). Practitioners appreciate these scalars for their simplicity, but the pdf contains more comprehensive and equally relevant information. An accurate description of the pdf is thus worthwhile from a financial perspective.
For this reason, pdfs in finance should not be limited to the popular Gaussian distribution. More realistic parametric distributions have thus been put forward [32], like the NIG distribution [13] or the alpha-stable one [43], among many others. Besides, the non-parametric alternative makes it possible to depict the real pdf more accurately, but it may be subject to overfitting if it does not include any regularization. For this purpose, Beran has proposed the minimum Hellinger distance approach, in which a non-parametric pdf is estimated first, before being approximated by a parametric distribution [4]. This approach finds some applications in finance [29]. Other semi-parametric approaches include the distortion of a parametric density in order to take into account higher-order empirical moments, using for example an Edgeworth expansion [16,26], which also has some applications in finance [35]. Finally, non-parametric approaches, like the kernel density, include a smoothing parameter, called the bandwidth, which is supposed to balance accuracy and statistical robustness [44,47]. Selecting an appropriate bandwidth is a hard task often left aside by practitioners in finance, but some statistical methods propose criteria that the bandwidth should minimize, like the asymptotic mean integrated square error (AMISE) [22].
In addition, as illustrated by financial crises, bad economic news generally does not result in a single extreme daily price variation but can initiate a longer turmoil period. This alternation of market regimes motivates the introduction of time-varying densities. Once again, several approaches are possible, depending on whether we consider a parametric pdf [1] or a non-parametric one [19]. We are particularly interested in the non-parametric approach, which offers the possibility of reaching higher accuracy. We stress that applications of time-varying kernels in finance are not limited to estimating a pdf of price returns. They indeed include interest rate models [51] or the study of the dynamic pdf of correlation coefficients, with a particular shape during tumble periods [27].
Time-varying kernel densities rely on the choice of two important free parameters: the bandwidth smooths the pdf at a given date, exactly as in the static approach, whereas the discount factor smooths the variations in time. Harvey and Oryshchenko propose a maximum likelihood approach to select these free parameters [19]. However, the literature about density validation requires stronger properties, namely the fact that the cumulative distribution function (cdf) of price returns must form a set of iid uniform variables, known as the probability integral transforms (PITs) [11]. In the present paper, we thus propose a new selection rule for the bandwidth and the discount factor, so that it is consistent with the validation rule of the pdf. The main challenge is then to build the criterion, that is, to define a function of the bandwidth and of the discount factor that we intend to minimize. Indeed, the traditional approach for validating an estimated pdf consists in a series of statistical tests and graphical analyses, not in a single numerical criterion. The criterion we propose relies on a Kolmogorov–Smirnov statistic, which can be replaced by any statistic of distribution divergence. We also adapt this statistic so as to minimize the discrepancy of the series of the PITs, for which we need independence. Our work is not the first one to stress the limitations of the maximum likelihood approach in the selection of the two free parameters. We can indeed cite an article following a least-square approach [40] and another one maximizing a uniformity criterion for the PITs by means of an artificial neural network [48]. Our method differs from the latter as it also takes into account the independence of the PITs and requires only standard statistical tools rather than artificial neural networks.
We apply our new method to several stock indices before and during the financial crisis induced by the onset of COVID-19 in the US, in Europe, and in Asia. The question of the impact of the pandemic on stock markets is a hot topic. Several papers deal with this subject and stress the exceptional amplitude of the crisis [2,3,37]. We propose here a new outlook on this financial crisis, using the time-varying kernel densities to describe its chronology. We also study the significance of the daily kernel density with respect to the pdf in a steady market. This makes it possible to determine the interval of dates for which the distribution of price returns significantly indicates a financial crisis. In particular, we observe that the speed at which markets recover varies a lot among the regions considered.
The paper is organized as follows. In Section 2, we introduce the method for estimating a dynamic kernel density along with the selection rule for the bandwidth and the discount factor. In Section 3, we compare this selection method to another one based on a likelihood criterion, with the help of simulations. In Section 4, we apply the introduced method to stock markets during the COVID-19 crisis. Section 5 concludes.
2. Statistical methods
In this section, we introduce a method to estimate a time-varying density. For this purpose, we recall how we can estimate a static non-parametric density as well as its dynamic adaptation. This method relies on the choice of two free parameters. The main innovation of this paper consists in basing this choice on a quantitative version of criteria usually devoted to the evaluation of forecast densities. The last subsection is about some divergence statistics between two densities. We will use these divergences in the empirical part of this paper to quantify the amplitude of the variations of the densities through time and to determine the significance of these variations.
2.1. Kernel density
A widespread non-parametric method to estimate a pdf uses kernels. The kernel density, estimated on observations $X_1,\dots,X_t$, is defined, for $x\in\mathbb{R}$, by:

$$\hat{f}_t(x) = \frac{1}{th}\sum_{i=1}^{t} K\left(\frac{x-X_i}{h}\right), \tag{1}$$

where $h>0$ is the bandwidth and $K$ a function following the same rules as a pdf, namely it is positive, integrable, and its integral is one [44,47]. With these two properties, $\hat{f}_t$ also has the features of a density. In particular, when integrating $\hat{f}_t$, the substitution $u=(x-X_i)/h$ in each of the $t$ integrals in the sum clearly shows that we need to normalize the sum by $th$ in order to have $\int_{\mathbb{R}}\hat{f}_t(x)\,dx=1$. The symmetry and the continuity of the kernel are also often desirable.
The rationale of the kernel density is to make a continuous generalization of a histogram. Indeed, in the histogram, we count the number of occurrences in given intervals. The thinner the intervals, the more accurate the density estimation. But very thin intervals lead to overfitting, with a very erratic estimated density. To avoid this, we prefer to smooth the histogram. A simple manner to do this consists in replacing the number of occurrences of observations in each thin interval by a criterion of proximity of each observation to the middle of this interval. This is how the kernel density works. The proximity function $K$ must thus reach its maximum at zero and decrease progressively when its argument gets away from zero. Thus, the impact of an observation $X_i$ on the estimated pdf at $x$ is maximal for $x=X_i$ and it decreases progressively when $|x-X_i|$ becomes larger, until reaching zero, at least asymptotically. It means that the observation $X_i$ will have no impact on the density at $x$ if $X_i$ is by far greater or lower than $x$.
There exists a large literature on the choice of the kernel $K$ [6,39]. Epanechnikov and Gaussian kernels are widespread, due to their simplicity. 1 But it seems, according to the related literature, that the choice of the kernel is often less crucial than the choice of the bandwidth $h$. Indeed, this parameter plays the role of a regularization parameter. In practice, we tune $h$ in order to balance accuracy and robustness. The larger $h$, the wider each kernel and the larger the interval on which each observation has an impact. We review in Section 2.4 some methods to select this bandwidth.
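To make the estimator concrete, here is a minimal sketch of Equation (1) in Python. The function names, the simulated returns, and the grid are purely illustrative assumptions; the Epanechnikov kernel is the one used in the empirical part of the paper (see the Notes).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: positive, integrates to one, maximal at zero."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def kernel_density(x, observations, h, kernel=epanechnikov):
    """Static kernel density of Equation (1), evaluated on a grid x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - observations[None, :]) / h   # one column per observation
    return kernel(u).mean(axis=1) / h              # normalization by t*h

# Illustrative usage on simulated daily returns
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)
grid = np.linspace(-0.05, 0.05, 201)
density = kernel_density(grid, returns, h=0.005)
```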
2.2. Dynamic kernel density
We can change the formulation of the kernel density in order to take into account its progressive evolution through time. We get this dynamic version of the kernel density by means of weights $w_{i,t}$:

$$\hat{f}_t(x) = \frac{1}{h}\sum_{i=1}^{t} w_{i,t}\, K\left(\frac{x-X_i}{h}\right), \tag{2}$$

such that $\sum_{i=1}^{t} w_{i,t}=1$ [19]. For a fixed $t$, if the weights increase with $i$, more recent observations will be overweighted and the update of the kernel density is consistent with the economic intuition. The exponential weighting is widespread in the statistical literature, as it reduces the computation of a density update to a simple recursive formula instead of the linear cost induced by re-estimating the whole density from scratch. We then express the weights by

$$w_{i,t} = \begin{cases} \dfrac{\omega^{t-t_0}}{t_0} & \text{if } i\le t_0,\\[1ex] (1-\omega)\,\omega^{t-i} & \text{if } t_0<i\le t, \end{cases} \tag{3}$$

where $\omega\in(0,1)$ is the discount factor and $t_0$ is the time at which we start estimating the dynamic density ($t\ge t_0$ in Equation (3)). With this setting for the weights, we denote by $\hat{f}_t$ the density introduced in Equation (2). When the duration of the initial estimation sample is large enough with respect to the speed of decay of the weights, that is when $t_0\gg 1/(1-\omega)$, then $w_{i,t}$ tends to $(1-\omega)\omega^{t-i}$. Otherwise, the inequality $\omega\le t_0/(1+t_0)$ must hold to have increasing weights with respect to $i$ at a given estimation time $t$. 2 In the empirical study in Section 4.1, the highest value for $\omega$ estimated with the proposed method across all the series is 0.974 and we have considered a $t_0$ corresponding approximately to the 1120th date of the dataset, so that we are very close to the asymptotic framework, with $1/(1-\omega)\approx 38\ll t_0$. More generally, whatever the value of $t_0$ and using the weights defined in Equation (3), the recursive formula of the dynamic kernel density is, for any $t\ge t_0$:

$$\hat{f}_{t+1}(x) = \omega\,\hat{f}_t(x) + \frac{1-\omega}{h}\,K\left(\frac{x-X_{t+1}}{h}\right). \tag{4}$$
The two free parameters of this dynamic non-parametric density are the bandwidth $h$ and the discount factor $\omega$. In practice, we start at the given time $t_0$ from a density estimated thanks to Equation (2) and we obtain the density at subsequent times iteratively by applying Equation (4).
Along with the time-varying density, we can build the corresponding cdf. Integrating Equation (2), the first estimated cdf, at time $t_0$, is:

$$\hat{F}_{t_0}(x) = \frac{1}{t_0}\sum_{i=1}^{t_0} \mathcal{K}\left(\frac{x-X_i}{h}\right),$$

where $\mathcal{K}$ is the primitive of $K$ such that $\lim_{y\to-\infty}\mathcal{K}(y)=0$. Subsequently, we get the cdf at a time $t+1$ by means of the following iteration:

$$\hat{F}_{t+1}(x) = \omega\,\hat{F}_t(x) + (1-\omega)\,\mathcal{K}\left(\frac{x-X_{t+1}}{h}\right),$$

which is obtained by integrating Equation (4).
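The recursions above translate directly into code. The following sketch maintains the pdf and cdf of Equations (2)-(4) on a fixed grid; the closed-form primitive of the Epanechnikov kernel and all parameter values are illustrative assumptions.

```python
import numpy as np

def epa_pdf(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def epa_cdf(u):
    """Primitive of the Epanechnikov kernel, tending to 0 at -inf and 1 at +inf."""
    v = np.clip(u, -1.0, 1.0)
    return 0.25 * (2.0 + 3.0 * v - v**3)

def initial_density(grid, first_obs, h):
    """Density and cdf at time t0: Equation (2) with equal weights 1/t0."""
    u = (grid[:, None] - first_obs[None, :]) / h
    return epa_pdf(u).mean(axis=1) / h, epa_cdf(u).mean(axis=1)

def update(pdf, cdf, grid, new_obs, h, omega):
    """One-step recursion: Equation (4) for the pdf and its integrated version."""
    u = (grid - new_obs) / h
    return (omega * pdf + (1.0 - omega) * epa_pdf(u) / h,
            omega * cdf + (1.0 - omega) * epa_cdf(u))

# Illustrative usage: initialize on the first 1000 returns, then update daily
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, size=1200)
grid = np.linspace(-0.1, 0.1, 401)
pdf, cdf = initial_density(grid, x[:1000], h=0.005)
for obs in x[1000:]:
    pdf, cdf = update(pdf, cdf, grid, obs, h=0.005, omega=0.96)
```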
Other approaches are possible for estimating a time-varying density. For example, we could have estimated static densities on successive intervals and then smoothed the transition between the resulting densities. For parametric densities, this amounts to smoothing time-varying parameters, which is a well-known subject in statistics [14]. However, this approach is not very natural for non-parametric densities, as we need a large amount of data to estimate one static density.
2.3. Evaluation of the quality of the dynamic density
The purpose of density forecast may vary a lot. In a financial perspective, one may need it to build risk measures, or to forecast an average price return or a most likely price return. In practice, an investment decision is to be made relying on this density. One must then evaluate the quality of the forecast with respect to a loss function corresponding to this decision. Unfortunately, there cannot exist any absolute ranking of density forecasts valid for all the possible loss functions [11].
We therefore have to make a choice, which is necessarily subject to discussion. In the econometric literature, we can find an evaluation of the density forecast by means of the likelihood of the observations [19]. We think that this choice, which focuses on the body of the distribution, neglects the behavior of the density in its tails. A more general perspective would motivate choosing a density forecast consistent with the real density, including its tails. Such a forecast would be more relevant in finance for calculating a VaR or an expected shortfall. However, the real density is never observed. If we had a static density forecast, the evaluation of this forecast could simply consist in comparing it with the empirical density of all the observed price returns. But the forecast density is supposed to change at each time, so that we have to base our analysis on another invariant density. This is the purpose of the analysis of the PITs, introduced by Diebold, Gunther, and Tay [11] and widespread in the literature on the evaluation of density forecasts [17,19,24]. We now set out this method, which we will adapt in the next subsection from the evaluation of density forecasts to the selection of the bandwidth and of the discount factor.
We observe $T$ successive price returns $X_1,\dots,X_T$. We use the first $t_0$ of them to build a density estimation $\hat{f}_{t_0}$, using Equation (2). This density includes a discount in order to depict more closely recent observations. We thus conceive $\hat{f}_{t_0}$ as a forecast of the true density of $X_{t_0+1}$, just as we conceive $\hat{f}_t$, for any $t\in\{t_0,\dots,T-1\}$, as a forecast of the true density of $X_{t+1}$. Of course, this true density varies with $t$, and we only observe one random variable drawn from it, namely $X_{t+1}$. However, we are able to build a density which does not depend on $t$ and which will therefore be very useful for evaluating the quality of the density forecast. This invariant distribution is the one of the PIT variables, which are defined by:

$$U_{t+1} = \hat{F}_t(X_{t+1}), \qquad t\in\{t_0,\dots,T-1\}.$$

Indeed, if our forecast is good, that is if $\hat{F}_t=F_t$, where $F_t$ is the true cdf of $X_{t+1}$, then, whatever $t$, assuming that the true density is continuous and nonzero, $U_{t+1}$ follows a uniform distribution on the interval $[0,1]$ [11]. In fact, this idea is quite old [38] and is even something with which any person simulating random variables following a given cdf is familiar. In addition to being uniform, under the same condition on the true density for every $t$, the variables $U_{t_0+1},\dots,U_T$ must also be independent [11].
Thanks to the PITs, we have $T-t_0$ observations from the same uniform distribution. This makes it possible to evaluate the density forecast: we have to check that $U_{t_0+1},\dots,U_T$ are indeed iid and uniform in $[0,1]$. We explain in the next subsection how our framework, which deals with the selection of optimal parameters, differs in this respect from the evaluation of density forecasts.
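As an illustration, the following sketch computes the PIT series from the recursive cdf of Section 2.2. The grid-based interpolation and the parameter values are simplifying assumptions of this example, not the paper's implementation.

```python
import numpy as np

def epa_cdf(u):
    v = np.clip(u, -1.0, 1.0)
    return 0.25 * (2.0 + 3.0 * v - v**3)

def pit_series(returns, t0, h, omega, grid):
    """PITs U_{t+1} = F_t(X_{t+1}): evaluate the current forecast cdf at the
    next observation, then update the cdf with that observation."""
    cdf = epa_cdf((grid[:, None] - returns[:t0][None, :]) / h).mean(axis=1)
    pits = []
    for x in returns[t0:]:
        pits.append(np.interp(x, grid, cdf))                    # U_{t+1}
        cdf = omega * cdf + (1.0 - omega) * epa_cdf((grid - x) / h)
    return np.array(pits)

# Illustrative usage
rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.01, size=1200)
grid = np.linspace(-0.1, 0.1, 401)
u = pit_series(x, t0=1000, h=0.005, omega=0.96, grid=grid)      # 200 PITs in [0, 1]
```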
2.4. Selection of the bandwidth and of the discount factor
The literature about bandwidth selection is very rich [22,44,47]. Beyond the rule of thumb which often guides the choice of the bandwidth $h$ among practitioners, we can cite more relevant methods of selection, such as the minimization of the AMISE [42], evaluated for instance with cross-validation or a plug-in technique. As explained in the previous subsection, we will try to select $h$ in order to make the distribution of the PITs close to a uniform distribution, and since the uniform case is trivial in the AMISE approach, this method is ineffective here. We can also cite the possibility of estimating a time-varying bandwidth, as in the literature about online estimation of kernel densities [25,49]. Our approach is different from the online framework: our time-varying aspect is not about $h$ but about the density.
Our problem, in addition, is not only about selecting h. We have to select it jointly with the discount factor ω. As already mentioned, we can base this selection on the maximization of a likelihood [19]. But we want to have an accurate description of the true density, not to make the best point forecast. This thus motivates us to use PITs and to adapt the method of evaluation of density forecasts. We have two objectives regarding the PITs: the uniformity and the independence.
We first focus on the uniformity. According to Diebold, Gunther, and Tay, methods based on statistical tests, such as the Kolmogorov–Smirnov test, are not relevant because they are nonconstructive, insofar as they do not indicate why PITs are not uniform [11]. They thus prefer a qualitative analysis using graphical tools such as a simple correlogram. Besides this mainstream approach, some papers propose a statistical test assessing the uniformity of the PITs [5]. Our framework is in fact different, as we do not want to determine whether our density forecast is good or not. Instead, given a density model, we only want to select its best parameters, here $h$ and $\omega$. Our forecast may still be poor, even though the non-parametric approach makes this case unlikely, but we will have done the best with respect to the model used. We thus do not want to test the consistency of our PITs with a uniform distribution, but we select the parameters $h$ and $\omega$ minimizing some test statistic. We choose to minimize the Kolmogorov–Smirnov statistic, $k$, because it is widespread and easy to understand. This statistic is simply the maximum gap between the empirical cdf of the PITs and the theoretical one, which, in our case, is the cdf of a uniform distribution:

$$k = \sup_{u\in[0,1]} \left| \frac{1}{T-t_0}\sum_{t=t_0+1}^{T} \mathbb{1}_{\{U_t\le u\}} - u \right|. \tag{5}$$
But the Kolmogorov–Smirnov statistic says nothing about the independence of the variables and the fact that the sampling is random [12]. This property is however crucial and its absence could lead to nonsensical estimations [9,21]. In the standard approach regarding the evaluation of density forecasts, independence is assessed by graphical tools, such as a correlogram [11]. We would again like a more systematic approach. We thus use an additional criterion coming from the literature on the simulation of quasi-random variables. We indeed want our series of PITs to be a low-discrepancy sequence [18,33,34,45]. This is all the more important as we want to estimate a time-varying density of price returns in a regime-switching market. The rationale behind the discrepancy is that the uniformity must be a feature not only of isolated PITs but also of sequences of PITs: the sequence must be equidistributed [11]. This method will avoid almost static densities in which price returns are globally well distributed, but with mainly high PITs in a bullish regime and then mainly low PITs during a crisis period. The discrepancy criterion we propose to minimize is then the multivariate uniformity Kolmogorov–Smirnov statistic for a given size of sequence of PITs, taking into account the targeted independence of the PITs. This independence, along with the uniformity, implies that the theoretical joint cdf of a pair of PITs is $(u,v)\mapsto uv$ in the targeted case, as soon as the time lag between the two PITs is nonzero. We focus on sequences of dimension 2 because the discrepancy statistics are difficult to compute for larger dimensions [23,28]. In particular, for a pair of observations with a given time lag $\delta>0$, we define:

$$k^{(2)}_\delta = \sup_{(u,v)\in[0,1]^2} \left| \frac{1}{T-t_0-\delta}\sum_{t=t_0+1}^{T-\delta} \mathbb{1}_{\{U_t\le u\}}\mathbb{1}_{\{U_{t+\delta}\le v\}} - uv \right|.$$
In the definition of $k^{(2)}_\delta$, the case $\delta=0$ is not allowed. Indeed, in this case, the two PITs $U_t$ and $U_{t+\delta}$ cannot be independent from each other because they are equal. However, we can define another statistic $K_\delta$, for $\delta\ge 0$, which includes both the independence of the PITs for $\delta>0$, thanks to $k^{(2)}_\delta$, and their univariate uniformity, thanks to $k$:

$$K_\delta = \begin{cases} k & \text{if } \delta=0,\\ k^{(2)}_\delta & \text{if } \delta>0. \end{cases}$$
Besides, the multivariate Kolmogorov–Smirnov statistic depends on the size of the sample. We thus consider a size-adapted aggregation of the $K_\delta$. Indeed, for a sample of $n$ observations, $\sqrt{n}$ times the Kolmogorov–Smirnov statistic has a limit distribution which does not depend on $n$ [10]. We also choose a maximal lag $\nu$ above which we will not consider dependence effects. Indeed, if the time lag is too big, the number of pairs of observations is very limited and the asymptotic Kolmogorov distribution may not apply. 3 We thus propose the following size-adapted statistic: 4

$$C(h,\omega) = \max_{0\le\delta\le\nu} \sqrt{T-t_0-\delta}\; K_\delta.$$
Finally, the optimal bandwidth and discount factor are defined as the parameters minimizing this uniformity and discrepancy statistic:

$$(h^\star,\omega^\star) = \underset{h>0,\ \omega\in(0,1)}{\arg\min}\ C(h,\omega). \tag{6}$$
By doing so, we do not exactly target the independence of any sequence of observations. Instead, our simplified statistic aims at obtaining the independence of pairs of observations contained in a time interval of duration ν, like with the correlogram approach suggested by Diebold, Gunther, and Tay [11].
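A possible implementation of the selection criterion is sketched below: the univariate statistic of Equation (5), a grid approximation of the bivariate statistic, and their size-adapted aggregation over lags up to ν. The grid resolution is an assumption, the supremum of the bivariate statistic is only approximated, and the helper `pit_series` mentioned in the final comment refers to the earlier illustrative sketch.

```python
import numpy as np

def ks_uniform(u):
    """Univariate Kolmogorov-Smirnov statistic of Equation (5) against the uniform cdf."""
    u = np.sort(u)
    n = len(u)
    above = np.arange(1, n + 1) / n - u    # gap just after each jump of the empirical cdf
    below = u - np.arange(0, n) / n        # gap just before each jump
    return max(above.max(), below.max())

def ks_uniform_pairs(u, v, n_grid=50):
    """Bivariate statistic for the pairs (U_t, U_{t+lag}) against the product of
    uniform marginals; the supremum is approximated on a regular grid."""
    a = np.linspace(0.0, 1.0, n_grid + 1)
    A, B = np.meshgrid(a, a, indexing="ij")
    ecdf = np.mean((u[None, None, :] <= A[:, :, None]) &
                   (v[None, None, :] <= B[:, :, None]), axis=2)
    return np.abs(ecdf - A * B).max()

def criterion(pits, nu=22):
    """Size-adapted aggregation over lags 0..nu (lag 0 is the uniformity statistic)."""
    values = [np.sqrt(len(pits)) * ks_uniform(pits)]
    for lag in range(1, nu + 1):
        u, v = pits[:-lag], pits[lag:]
        values.append(np.sqrt(len(u)) * ks_uniform_pairs(u, v))
    return max(values)

# The selection of Equation (6) can then be a plain grid search over (h, omega),
# e.g. criterion(pit_series(returns, t0, h, omega, grid)) for each candidate pair;
# the constrained version of Equation (7) simply restricts omega >= 1 - 1/nu.
```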
We also propose a constrained version of this optimization problem. Indeed, the above unconstrained problem may lead to a dynamic of densities far from the economic intuition, for example with very rough densities. In practice, the time-varying densities we have built with this method seem empirically robust, at first sight. But we are interested in defining some reasonable bounds for $h$ or $\omega$. In order to have a robust time-varying density, we want an isolated observation not to change the density too much between two consecutive dates. To state things quantitatively, we want to limit the Kolmogorov–Smirnov statistic between two densities at consecutive dates. We propose $1/\nu$ as an upper bound. With a daily bound of $1/\nu$, the maximal change of the Kolmogorov–Smirnov statistic, which is 1, cannot be reached before the horizon $\nu$. The link between the bound and the parameters is straightforward, as the update of the cdf leads to an increase of the cdf at one point of at most $1-\omega$. Therefore, we introduce a new bound for $\omega$, namely $\omega\ge 1-1/\nu$, and the constrained problem is as follows:

$$(h^c,\omega^c) = \underset{h>0,\ \omega\in[1-1/\nu,\,1)}{\arg\min}\ C(h,\omega). \tag{7}$$
We could also want to set bounds to h in order to secure the robustness of the density at one date instead of the robustness across time. Nevertheless, we consider that the robustness across time is enough. Indeed, as the density will not change very rapidly, each density will be a fairly good forecast for observations close in time.
2.5. Amplitude of the variations of the series of densities
In the method exposed above to select h and ω, we minimize the divergence between an empirical distribution and a uniform one. In particular, we use the Kolmogorov–Smirnov statistic to depict this divergence because of both its simplicity and its asymptotic behavior. But other divergence metrics could replace the Kolmogorov–Smirnov statistic in this method.
We can also use various divergence statistics to quantify to what extent the estimated pdf differs from what it was at a reference date and thus track the evolution of the pdf through time. This is the use we make of these statistics in the empirical part of this paper. In addition, thanks to simulations, we will determine confidence intervals for each of these divergences at each date, so that we will be able to assess whether the evolution of the pdf through time is significant or not. We now review three of these divergence statistics in addition to the Kolmogorov–Smirnov statistic. Some are based on a comparison of densities, and others on a comparison of cdfs or even of quantiles.
First, we recall the definition of the Kolmogorov–Smirnov statistic between two cdfs $F$ and $G$:

$$KS(F,G) = \sup_{x\in\mathbb{R}} \left| F(x) - G(x) \right|.$$

Whereas the Kolmogorov–Smirnov statistic considers the maximal difference between two cdfs, the Hellinger distance is a cumulated difference between the corresponding densities $f$ and $g$:

$$H(f,g) = \left( \frac{1}{2}\int_{\mathbb{R}} \left( \sqrt{f(x)} - \sqrt{g(x)} \right)^2 dx \right)^{1/2}.$$

The $p$-Wasserstein distance, given $p\ge 1$, is related to the optimal transportation theory [46]. It is the minimal cost to reconfigure one pdf into another one. The Kolmogorov–Smirnov statistic is the $L^\infty$ distance between the cdfs, whereas the Wasserstein metric is the $L^p$ distance between their quantile functions. The $p$-Wasserstein distance is indeed defined by [36]:

$$W_p(F,G) = \left( \int_0^1 \left| F^{-1}(u) - G^{-1}(u) \right|^p du \right)^{1/p}.$$

In this paper, we focus on the case $p=1$, for which the Wasserstein distance is also equal to the $L^1$ distance between the cdfs [36]. It thus clearly generalizes the Kolmogorov–Smirnov statistic. We will see in the empirical part of this paper that this generalization may be more appropriate than the Kolmogorov–Smirnov statistic to assess the significance of the variations of the distribution. Indeed, the occurrence of several extreme observations may not impact significantly the Kolmogorov–Smirnov statistic, which mainly focuses on the body of the distribution, whereas the Wasserstein distance takes into account the whole distribution. On the other hand, since a uniform distribution is not subject to a dichotomy between body and tails, the Kolmogorov–Smirnov statistic seems appropriate for assessing the uniformity of the PITs.

As opposed to the other divergences, the Kullback-Leibler divergence is not, strictly speaking, a distance function, as it is not symmetric in the two densities. It is related to Shannon's entropy. It is defined by:

$$KL(f\|g) = \int_{\mathbb{R}} f(x)\,\log\!\left( \frac{f(x)}{g(x)} \right) dx.$$
All these divergences can easily be generalized if we work with a discrete grid instead of $\mathbb{R}$. They are always non-negative and equal to zero if the two distributions are equal.
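On a discrete grid, the four divergence statistics can be computed as in the following sketch. The grid-based quadrature and the $1/\sqrt{2}$ normalization of the Hellinger distance are conventions assumed for this illustration.

```python
import numpy as np

def divergences(f, g, x):
    """Kolmogorov-Smirnov, Hellinger, 1-Wasserstein and Kullback-Leibler
    statistics between two densities f and g sampled on a regular grid x."""
    dx = x[1] - x[0]
    F, G = np.cumsum(f) * dx, np.cumsum(g) * dx
    ks = np.abs(F - G).max()                                  # sup distance of the cdfs
    hell = np.sqrt(0.5 * np.sum((np.sqrt(f) - np.sqrt(g))**2) * dx)
    w1 = np.sum(np.abs(F - G)) * dx                           # W1 = L1 distance of the cdfs
    mask = (f > 0) & (g > 0)
    kl = np.sum(f[mask] * np.log(f[mask] / g[mask])) * dx     # KL(f || g)
    return ks, hell, w1, kl

# Illustrative usage: two Gaussian densities on a common grid
x = np.linspace(-10.0, 10.0, 2001)
f = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
g = np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2 * np.pi)
print(divergences(f, g, x))
```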
3. A simulation study
The purpose of this simulation study is to confirm with simulated data the relevance of the selection method for h and ω, introduced in Equation (6). We thus introduce an alternative selection criterion and compare the performance of the two methods in forecasting a simple simulated time-varying density.
The alternative selection method we use, in this time-varying framework, is the one proposed by Harvey and Oryshchenko. They use a maximum likelihood approach to select the free parameters $h$ and $\omega$ [19]:

$$(h_{ML},\omega_{ML}) = \underset{h>0,\ \omega\in(0,1)}{\arg\max} \sum_{t=t_0}^{T-1} \log \hat{f}_t(X_{t+1}). \tag{8}$$
We now compare the two methods defined by Equations (6) and (8). It appeared to us that our approach performs well when the simulated distribution has fat tails. Therefore, we generate 2,000 independent variables of time-varying density $f_t$, which is a Cauchy density with scale parameter equal to 1 and time-varying location parameter equal to $t/100$:

$$f_t(x) = \frac{1}{\pi\left(1+(x-t/100)^2\right)}. \tag{9}$$

Incidentally, the $t$-th variable equivalently follows a Student's $t$-distribution with one degree of freedom, translated by a quantity $t/100$.
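A simulated sample following Equation (9) can be generated as in the short sketch below; the seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2000
t = np.arange(1, T + 1)
# Cauchy variables with scale 1 and location drifting as t/100, Equation (9)
x = t / 100.0 + rng.standard_cauchy(T)

def true_pdf(y, t):
    """True time-varying Cauchy density of Equation (9)."""
    return 1.0 / (np.pi * (1.0 + (y - t / 100.0)**2))
```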
The particularity of this kind of distribution is the frequent occurrence of high values, possibly very far from all the past observations. Likelihood-based selection methods for the free parameters of the kernel density thus require a substantial smoothing, that is a high $h$ and a high $\omega$, in order to avoid very low likelihoods for these large observations. On the contrary, a method based on a cdf criterion, such as the one we put forward, does not have a similar constraint. As a consequence, using the first $t$ simulated data to estimate the density at time $t+1$, for $t$ evolving between 1,000 and 1,999, we find smaller free parameters $(h^\star,\omega^\star)$ in the PIT-based approach than $(h_{ML},\omega_{ML})$ in the likelihood method.
Since we know the true density, we can compare the estimated densities at time $t$ with $f_t$. We see for example in Figure 1 that the likelihood method leads to a smoother density, whose bulk is also more shifted, because of the time-varying location, than with the PIT-based approach. The divergence statistics introduced in Section 2.5 reveal that our method, which explicitly uses a Kolmogorov–Smirnov criterion, leads to a better estimate of the time-varying density than the likelihood method, but only for the Kolmogorov–Smirnov statistic, as one can see in Figure 2. This suggests that our PIT-based approach could benefit from an adaptation to other metrics than the sole Kolmogorov–Smirnov criterion, depending on the preferred divergence statistic. This could be the subject of further research.
Figure 1.
For the time-varying Cauchy distribution, true pdf (dotted line) and estimates at time t + 1 = 2, 000. The three solid lines represent the estimates with the three competing vectors of parameters: (black curve), (light grey), and (dark grey).
Figure 2.
Divergence statistics with respect to , as a function of the instant t, when is the time-varying Cauchy density defined in Equation (9): the Kolmogorov–Smirnov statistic (top left), the Hellinger distance (top right), the Wasserstein distance (bottom left), and the Kullback-Leibler divergence (bottom right). The three curves represent the three competing vectors of parameters: (black curve), (light grey), and (dark grey). The more erratic aspect of the black curves is to be explained by the lower selected free parameters.
Although the Kolmogorov–Smirnov statistic shows a good performance of the PIT-based approach, compared to the likelihood method, Figure 1 is not totally satisfactory. It undoubtedly shows a better fit of the bulk and thus of the time-varying mode of the distribution, but one also observes bumps in the tails at some dates. We understand that this undesirable effect is a consequence of the low discount parameter which is necessary to catch the time-varying trend of the distribution. A trade-off can be found between the good time-varying representation of the PIT-based approach and the better fit of the tails of the likelihood approach. We propose a censored PIT method, in which the Kolmogorov–Smirnov criterion introduced in Equation (5) is modified to take into account only the PITs which are extreme in a uniform distribution, that is by replacing the interval over which the maximum is calculated by the subset $[0,p]\cup[1-p,1]$, where $p=0.05$ in the numerical application. 5 We obtain the corresponding optimal parameters. Figure 1 shows a better representation of the tails in the censored approach than in the original PIT method. Figure 2 also indicates lower divergence statistics than those of the likelihood approach. The PIT-based approach thus offers a flexibility which could be deepened in further research to match specific desirable properties of the estimated time-varying density.
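A sketch of this censored variant follows: the maximum of Equation (5) is restricted to the tail region [0, p] ∪ [1 − p, 1]. Evaluating the gap only at the observed PITs lying in the tails is an approximation of the supremum, assumed here for simplicity.

```python
import numpy as np

def censored_ks_uniform(pits, p=0.05):
    """Kolmogorov-Smirnov uniformity statistic restricted to the tails of [0, 1]."""
    u = np.sort(pits)
    n = len(u)
    above = np.arange(1, n + 1) / n - u
    below = u - np.arange(0, n) / n
    gaps = np.maximum(above, below)
    in_tails = (u <= p) | (u >= 1.0 - p)
    return gaps[in_tails].max() if in_tails.any() else 0.0
```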
One could wonder if the relative superiority, at least regarding the Kolmogorov–Smirnov statistic, of the standard PIT-based approach over the likelihood method also holds when the true density is static. We thus simulate 2,000 iid Cauchy variables with a static density $f$. We estimate a static kernel density $\hat{f}$ and cdf $\hat{F}$, as in Equation (1), using only the first 1,000 observations. This estimate is based on a bandwidth $h$ that we select following either a PIT validation criterion,

$$h^\star = \underset{h>0}{\arg\min}\ \sup_{u\in[0,1]} \left| \frac{1}{1000}\sum_{t=1001}^{2000} \mathbb{1}_{\{\hat{F}(X_t)\le u\}} - u \right|,$$

or a likelihood criterion,

$$h_{ML} = \underset{h>0}{\arg\max}\ \sum_{t=1001}^{2000} \log \hat{f}(X_t),$$

evaluated on the last 1,000 observations of our simulated dataset. In other words, we use a straightforward adaptation of Equations (6) and (8) in the static framework.
As in the dynamic framework, the PIT-based method selects a smaller bandwidth and thus leads to a less smooth density than the likelihood method. All the divergence statistics with respect to the true density underline the superiority, in this case, of the PIT-based approach: 0.027 (0.093 for the likelihood method) for the Kolmogorov–Smirnov statistic, 0.113 (0.147) for the Hellinger distance, 0.513 (0.890) for the Wasserstein distance, and 0.032 (0.076) for the Kullback-Leibler divergence.
4. Financial application
In this section, we first present an empirical study of stock indices during the COVID-19 crisis, using dynamic kernel densities. Then, we discuss the practical implications of this method in finance.
4.1. Empirical results
We now apply the PIT-based method to the estimation of time-varying densities of several stock indices. We consider American indices (NASDAQ Composite, S&P 500, S&P 100), European indices (EURO STOXX 50, Euronext 100, DAX, CAC 40), and Asian indices (Nikkei 225, KOSPI, SSE 50), with a particular focus on S&P 500, EURO STOXX 50, and the South-Korean KOSPI indices. We have used data from Yahoo finance in the time interval from 04/17/2015 to 05/28/2020. The study period includes the economic crisis related to the COVID-19. In particular, we study the impact of the COVID-19 on three stock markets corresponding to economic areas with different crisis management regarding the pandemic. The questions we want to answer are about the significance of this impact and the characterization of a recovery after the peak of the crisis.
We have estimated daily a pdf of daily price returns from the date corresponding to November 1st, 2019. These densities include observations from 2015, exponentially weighted with an optimal discount factor depending on the index. We provide these optimal discount factors in Table 1, along with the optimal bandwidth, determined by Equation (6), as well as the constrained version defined by Equation (7) and the likelihood-based version proposed by Harvey and Oryshchenko as in Equation (8).
Table 1.
Optimal bandwidth and discount factor minimizing the criterion $C$, for several stock indices, for densities between November 2019 and May 2020.

| Index | $h^\star$ | $\omega^\star$ | $h^c$ | $\omega^c$ | $h_{ML}$ | $\omega_{ML}$ |
|---|---|---|---|---|---|---|
| NASDAQ Composite | | 0.875 | | 0.962 | | 0.873 |
| S&P 500 | | 0.864 | | 0.955 | | 0.838 |
| S&P 100 | | 0.889 | | 0.963 | | 0.847 |
| EURO STOXX 50 | | 0.883 | | 0.964 | | 0.989 |
| Euronext 100 | | 0.872 | | 0.956 | | 0.834 |
| DAX | | 0.856 | | 0.959 | | 0.973 |
| CAC 40 | | 0.780 | | 0.957 | | 0.847 |
| Nikkei 225 | | 0.911 | | 0.965 | | 0.996 |
| KOSPI | | 0.884 | | 0.957 | | 0.869 |
| SSE 50 | | 0.914 | | 0.974 | | 0.996 |

Note: $(h^\star,\omega^\star)$ is the unconstrained optimum of Equation (6), $(h^c,\omega^c)$ is the constrained version of Equation (7), and $(h_{ML},\omega_{ML})$ is obtained from the method of Harvey and Oryshchenko, following Equation (8).
For the rest of the empirical study, we consider a common pair of parameters for all the indices, so as to make fair comparisons. Focusing on the constrained case reported in Table 1, we choose the highest estimated bandwidth, to ensure robustness of the densities, and the lowest discount parameter, $\omega=0.955$, so as to have the highest responsiveness of the dynamics of densities. We remark that these values are close to the median values obtained for the alternative method described in Appendix A.
For the S&P 500, EURO STOXX 50, and KOSPI indices, we display in Figure 3 the estimated dynamic pdf of price returns at four dates which illustrate the chronology of the impact of the pandemic on financial markets:
before the crisis: on the 16th December 2019, in a period where the markets were steady,
at the first turmoil in the markets, the 7th February 2020,
at the peak of the pandemic, which occurs at a different date for each market,
at the end of our sample, the 28th May 2020.
Figure 3.
Estimated dynamic pdf of daily price returns for S&P 500 (top left), EURO STOXX 50 (top right), and KOSPI (bottom) indices.
We determine the date of the peak of the pandemic as the date $t$ maximizing the Hellinger distance of the density $\hat{f}_t$ with respect to the estimated pdf at a reference date $t_{\mathrm{ref}}$ before the crisis, that is $\arg\max_t H(\hat{f}_t,\hat{f}_{t_{\mathrm{ref}}})$. This peak does not follow an epidemiological definition, since we only observe financial data. It corresponds to a maximal divergence from the steady state of the market.
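In code, the peak date is simply the argmax of the Hellinger trajectory. The sketch below assumes that the daily densities have already been stored on a common grid (for instance with the recursive update of Section 2.2); all names are illustrative.

```python
import numpy as np

def hellinger(f, g, dx):
    return np.sqrt(0.5 * np.sum((np.sqrt(f) - np.sqrt(g))**2) * dx)

def peak_date(dates, densities, ref_index, dx):
    """Date maximizing the Hellinger divergence from the density estimated
    at the reference (steady-market) date."""
    f_ref = densities[ref_index]
    trajectory = np.array([hellinger(f, f_ref, dx) for f in densities])
    return dates[int(np.argmax(trajectory))], trajectory
```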
For the KOSPI and the EURO STOXX 50, the pdf before the crisis looks like a Gaussian distribution, with thin tails. Then the pdf slightly widens on the losses side. At the peak of the crisis, the pdf flattens, with very fat tails. After the peak, it tends to an asymmetric distribution with a negative skewness and slowly decreasing tails. The chronology is similar for the S&P 500, except that the pdf on the 7th February is similar to the one before the crisis. It may indicate a low responsiveness of the US market to the outbreak. Or it may denote a temporary lag in the impact of COVID-19 on the US market, reflecting the lag in the spread of the outbreak in the region.
Displaying pdfs at several dates as in Figure 3 makes it possible to depict the chronology of the crisis. But it is limited, since displaying this density for every day of our sample would make the figure unreadable. Therefore, instead of displaying each pdf, we display one statistic per day. This statistic must reflect the divergence of the pdf with respect to a steady state of markets. We thus determine, each day, the Kolmogorov–Smirnov statistic, the Hellinger distance, the 1-Wasserstein distance, as well as the Kullback-Leibler divergence of the pdf with respect to the pdf at the reference date $t_{\mathrm{ref}}$. Results are displayed in Figure 4. Whatever the divergence statistic, we observe first a slight increase from 0 toward a low positive value until the beginning of the crisis, at which point the divergence sharply increases until the peak, after which it begins to slowly decrease. This last phase corresponds to the slow recovery of the markets after the crisis.
Figure 4.
Daily evolution through time of four divergence statistics: the Kolmogorov–Smirnov statistic (top left), the Hellinger distance (top right), the Wasserstein distance (bottom left), and the Kullback-Leibler divergence (bottom right). The curves correspond to S&P 500 (black), EURO STOXX 50 (dark grey), and KOSPI (light grey) indices. The dotted lines are simulated confidence intervals, with three confidence levels increasing from the bottom to the top.
In addition to the evolution through time of the four divergence statistics, Figure 4 shows confidence intervals for each statistic. These confidence intervals come from the simulation of 10,000 Brownian motions, on which we apply our method of density estimation and compute the divergence statistics. The null hypothesis is thus that all the price returns are iid Gaussian variables. For each statistic and each date, we represent the quantiles estimated on the simulations and corresponding to three confidence levels. At a given date $t$, for a given stock index, if the divergence of the current pdf with respect to the pdf at the reference date is above a particular curve of the confidence interval, we reject the null hypothesis with the corresponding confidence level $p$. In other words, we consider the pdf at $t$ to be significantly different from the pdf at the reference date, with a confidence level $p$.
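The confidence bands can be reproduced in spirit with the following Monte Carlo sketch under the iid Gaussian null. The number of simulations, bandwidth, discount factor, and grid are illustrative placeholders (the paper uses 10,000 simulated paths and several confidence levels), and only the Hellinger divergence is shown here.

```python
import numpy as np

def epa_pdf(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def hellinger(f, g, dx):
    return np.sqrt(0.5 * np.sum((np.sqrt(f) - np.sqrt(g))**2) * dx)

def simulated_band(n_sims, n_dates, t0, h, omega, level, seed=0):
    """Date-by-date quantile of the Hellinger divergence under the null of
    iid Gaussian (standardized) returns."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-6.0, 6.0, 241)
    dx = grid[1] - grid[0]
    divs = np.empty((n_sims, n_dates))
    for s in range(n_sims):
        x = rng.normal(size=t0 + n_dates)
        u = (grid[:, None] - x[:t0][None, :]) / h
        pdf = epa_pdf(u).mean(axis=1) / h        # density at the reference date
        ref = pdf.copy()
        for d in range(n_dates):
            pdf = omega * pdf + (1 - omega) * epa_pdf((grid - x[t0 + d]) / h) / h
            divs[s, d] = hellinger(pdf, ref, dx)
    return np.quantile(divs, level, axis=0)

# Illustrative band at the 95% level, with far fewer simulations than the paper
band_95 = simulated_band(n_sims=200, n_dates=140, t0=1000, h=0.4, omega=0.96, level=0.95)
```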
Depending on the divergence considered, we are able to determine the peak of the impact as the date maximizing the statistic. We display in Table 2 the date of the peak as well as the value of the Hellinger divergence at the peak, before the crisis, and in late May. According to this table, the strongest impact is in the US, but the recovery seems faster there than in Europe. The smallest impact and the fastest recovery are by far in China. The peak occurs between the 25th March (China and South Korea) and the 6th April (US and Japan), whatever the index, except for the DAX, whose peak is in May. We use the other divergences as a robustness check of these results. The conclusions are in fact similar: small impact and almost total recovery in late May for the Chinese SSE 50 index, strongest impact on the US market, slowest recovery on the European market. We also observe some variations in the estimation of the peak date. The most surprising one is provided by the Kolmogorov–Smirnov statistic, according to which the peak is reached first in Europe on the 18th March, before continental Asia and the US on the 23rd March, and finally Japan on the 2nd April.
Table 2.
Hellinger distance $H$ with respect to the density at the reference date $t_{\mathrm{ref}}$. The peak corresponds to the date when the maximal Hellinger distance is reached. Dates are in 2020.

| Index | H on 7th Feb. | H at the peak | Date of the peak | H on 28th May |
|---|---|---|---|---|
| NASDAQ Composite | 0.111 | 0.531 | 2020-04-01 | 0.295 |
| S&P 500 | 0.084 | 0.562 | 2020-04-06 | 0.363 |
| S&P 100 | 0.089 | 0.562 | 2020-04-06 | 0.324 |
| EURO STOXX 50 | 0.051 | 0.466 | 2020-03-27 | 0.398 |
| Euronext 100 | 0.050 | 0.484 | 2020-03-27 | 0.385 |
| DAX | 0.061 | 0.458 | 2020-05-05 | 0.377 |
| CAC 40 | 0.052 | 0.479 | 2020-03-27 | 0.389 |
| Nikkei 225 | 0.095 | 0.477 | 2020-04-06 | 0.350 |
| KOSPI | 0.083 | 0.518 | 2020-03-25 | 0.294 |
| SSE 50 | 0.070 | 0.381 | 2020-03-25 | 0.122 |
We stress the fact that the alternative chronology of the peaks is not the only particularity of the Kolmogorov–Smirnov statistic with respect to the three other divergence statistics we have implemented. For instance, the significance of the financial crisis in some regions is questionable according to this divergence statistic, as one can see in Figure 4. We can nevertheless explain this striking, and certainly dubious, conclusion. Indeed, when simulating two sets of iid random variables, we get two kernel densities but the Kolmogorov–Smirnov statistic focuses on only one quantile, generally corresponding to where the cdf is the steepest, that is in the body of the distribution. If we disrupt one of these two densities with a limited number of outliers, the Kolmogorov–Smirnov statistic may not change a lot as this modification mainly impacts the tails and not the body of the distribution. On the contrary, the three other divergence statistics are less robust to outliers as they are defined by integrals over all the distribution. Their responsiveness to a crisis is thus higher. For this reason, we prefer them to the Kolmogorov–Smirnov statistic for assessing the significance of the variations of a dynamic pdf.
4.2. Discussion on practical implications
The accurate and dynamic description of the kernel density of price returns has various practical implications in finance. It may indeed be useful for helping investors to make an investment decision, but also for market regulators.
The most widespread method in portfolio selection is the mean-variance approach of Markowitz [30]. The first two moments of the distribution are enough in this framework. They can be derived from a kernel density, but the complexity of this tool seems excessive compared to the simplicity of a direct estimation of the moments. Moreover, while dynamic kernel densities may look appealing for calculating time-varying means and variances, a direct dynamic estimation of these moments is also possible. In fact, the advantage of knowing the time-varying density of price returns is that many statistics can be derived from this density, such as higher-order moments, quantiles, entropy, and so on. These various statistics can be used in portfolio selection, as generalizations of Markowitz's work. One can for instance cite works about portfolio selection using higher moments [20] or risk measures defined by a quantile, also called value-at-risk [8], or by more complex statistics, such as the conditional value-at-risk [50]. The estimation of these risk measures with a kernel density is very appropriate and popular. Using a time-varying kernel density should improve both the estimation of these risk measures and the portfolio selection.
Regarding market regulators, one of their tasks consists in detecting market manipulations as well as reductions in market efficiency. The weak-form market efficiency quantifies the propensity of past observations of a time series of price returns to predict its future values. When financial markets experience turmoil, such as during the COVID-19 crisis, debates often arise to determine whether markets are still efficient and whether they should remain open or be closed. We think that dynamic kernel densities applied to price returns can help answer these questions. Indeed, several indicators of market efficiency can be derived from a kernel density, such as the probability of positive price returns, or the market information based on Shannon entropy [7,41]. The impact of the bandwidth selection and of the kernel density on these indicators of market efficiency has even already been studied [15].
5. Conclusion
In this paper, we have introduced a new method to select the two free parameters of a dynamic kernel density estimation, namely the discount factor and the bandwidth. This method relies on maximizing the accuracy of the daily pdf. This accuracy is to be understood in the sense of the literature about density forecast evaluation: the PIT of each new observation, expressed using the time-varying distribution, forms a set of variables which must be iid uniform variables. We use the Kolmogorov–Smirnov statistic and a discrepancy statistic to build a quantitative criterion of accuracy of the pdf. It is this criterion that we optimize when selecting the bandwidth and the discount factor of our time-varying pdf. Future research could focus on extensions of this method to other divergence statistics.
We have applied this method to financial data. In particular, we represent the evolution of the pdf of daily price returns for several stock indices during the COVID-19 pandemic. We are thus able to expose an accurate chronology of the financial crisis. Whereas the impact of the pandemic on the Chinese market seems limited, we observe that the strongest impact occurred in the US. The slowest recovery is in Europe, for which the pdf of daily returns is still significantly different from that of a steady market in late May 2020. On the contrary, the recovery of the Chinese and South-Korean markets is very rapid. According to several divergence statistics, in late May 2020 they are not even significantly different from what they were before the crisis.
Acknowledgments
The authors would like to thank Brieuc-Marie Le Brigand for his valuable help in the implementation of some of the methods described in this paper.
Appendix A. An alternative criterion
The selection of the free parameters $h$ and $\omega$ relies on the minimization of a criterion $C$ exposed in Section 2.4. As an alternative to this criterion, we propose another criterion $C'$ to be minimized. The difference between $C$ and $C'$ consists in a different interpretation of the independence of the PITs. In $C'$, one simply considers a necessary condition of independence, which makes this approach less rigorous than $C$. Given the statistic $k$ defined in Equation (5), the criterion $C'$ follows the definition:

$$C'(h,\omega) = \max_{\substack{t_0<a\le b\le T\\ b-a+1\ge\nu}} \sqrt{b-a+1}\; k_{a,b},$$

where $k_{a,b}$ is the statistic of Equation (5) computed on the sole PITs $U_a,\dots,U_b$.
The rationale behind this alternative criterion is that the uniformity must be a feature not only of the PITs in the whole interval $\{t_0+1,\dots,T\}$ but also of the PITs in any of its subintervals: the sequence must be equidistributed. As for $C$, this method will avoid almost static densities with price returns globally well distributed, but with high PITs in a bullish regime compensated by low PITs during a crisis period. The criterion to be minimized is then the worst uniformity statistic over all the subintervals. Once again, this criterion adapts the Kolmogorov–Smirnov statistic to the size of the subsample [31]. We also choose a minimal size $\nu$ above which we consider that the asymptotic Kolmogorov distribution may be applied. We may consider, for example, $\nu=22$, so that we verify the uniformity for every one-month interval of daily price returns. Thanks to this size-adapted statistic, the criterion in fact focuses on the subinterval of size higher than $\nu$ with the least uniform PITs.
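A direct, unoptimized sketch of this alternative criterion: enumerate all subintervals of consecutive PITs of length at least ν and keep the worst size-adapted uniformity statistic. The brute-force enumeration below is only meant as an illustration; it becomes slow on long samples.

```python
import numpy as np

def ks_uniform(u):
    u = np.sort(u)
    n = len(u)
    above = np.arange(1, n + 1) / n - u
    below = u - np.arange(0, n) / n
    return max(above.max(), below.max())

def alt_criterion(pits, nu=22):
    """Worst size-adapted uniformity statistic over all subintervals of
    consecutive PITs of length at least nu (Appendix A)."""
    n = len(pits)
    worst = 0.0
    for start in range(n - nu + 1):
        for stop in range(start + nu, n + 1):      # subinterval pits[start:stop]
            sub = pits[start:stop]
            worst = max(worst, np.sqrt(len(sub)) * ks_uniform(sub))
    return worst
```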
The optimal parameters h and ω obtained by this method for the stock indices studied in the empirical part of this paper are gathered in Table A1.
Table A1.
Optimal bandwidth and discount factor minimizing the criterion $C'$ for several stock indices, for densities between November 2019 and May 2020. The constrained version corresponds to the constraint $\omega\ge 1-1/\nu$ of Equation (7).

| Index | Unconstrained $h$ | Unconstrained $\omega$ | Constrained $h$ | Constrained $\omega$ |
|---|---|---|---|---|
| NASDAQ Composite | 0.0110 | 0.827 | 0.0101 | 0.955 |
| S&P 500 | 0.0122 | 0.840 | 0.0121 | 0.955 |
| S&P 100 | 0.0124 | 0.877 | 0.0124 | 0.955 |
| EURO STOXX 50 | 0.0124 | 0.831 | 0.0124 | 0.960 |
| Euronext 100 | 0.0110 | 0.883 | 0.0123 | 0.963 |
| DAX | 0.0038 | 0.864 | 0.0014 | 0.955 |
| CAC 40 | 0.0117 | 0.864 | 0.0119 | 0.960 |
| Nikkei 225 | 0.0124 | 0.790 | 0.0165 | 0.955 |
| KOSPI | 0.0124 | 0.813 | 0.0286 | 0.955 |
| SSE 50 | 0.0107 | 0.778 | 0.0002 | 0.974 |
| Mean value | 0.0110 | 0.838 | 0.0112 | 0.959 |
| Median value | 0.0122 | 0.840 | 0.0123 | 0.955 |
For each stock index, we observe differences in the optimal parameters between Tables 1 and A1. These differences stem from the disparity between the two criteria used, namely $C$ and $C'$. Indeed, though both are based on Kolmogorov–Smirnov statistics, these divergence metrics focus on distinct samples: the whole dataset for $C$, subsamples of consecutive observations of size greater than $\nu$ for $C'$. Moreover, $C$ incorporates an independence criterion for bivariate observations separated by less than $\nu$ time steps, whereas $C'$ is limited to univariate observations. More intuitively, when one selects the optimal bandwidth and discount factor with the help of $C$, one tends to avoid the short-range dependence of PITs, that is for observations separated by less than $\nu$. Conversely, when using $C'$, the focus is on the long-range independence of the PITs, that is over time ranges longer than $\nu$.
Notes
In the empirical application, we use the Epanechnikov kernel.
A decrease of the weights may only appear between $i=t_0$ and $i=t_0+1$. The inequality directly comes from the monotony condition $w_{t_0,t}\le w_{t_0+1,t}$.
We may consider, for example, $\nu=22$, so that we verify the independence for every one-month interval of every pair of daily price returns. This arbitrary choice is applied in the empirical part of this paper.
An alternative criterion is proposed in Appendix A.
We consider at the same time $\nu=0$, so that we do not take into consideration the independence feature of the PITs and we focus on the accuracy in the tails only.
Data availability statement
The data that support the findings of this study are openly available in Yahoo finance.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Ammy-Driss A. and Garcin M., Efficiency of the financial markets during the COVID-19 crisis: Time-varying parameters of fractional stable dynamics, Phys. A Stat. Mech. Appl. 609 (2023), pp. 128335. [Google Scholar]
- 2.Arias-Calluari K., Alonso-Marroquin F., Najafi M.N., and Harré M., Forecasting the effect of COVID-19 on the S&P500, Working Paper, 2020. [Google Scholar]
- 3.Baker S.R., Bloom N., Davis S.J., Kost K.J., Sammon M.C., and Viratyosin T., The Unprecedented Stock Market Impact of COVID-19, Working paper, 2020. [Google Scholar]
- 4.Beran R., Minimum Hellinger distance estimates for parametric models, Ann. Stat. 5 (1977), pp. 445–463. [Google Scholar]
- 5.Berkowitz J., Testing density forecasts, with applications to risk management, J. Bus. Econ. Stat. 19 (2001), pp. 465–474. [Google Scholar]
- 6.Bouezmarni T. and Rombouts J.V., Nonparametric density estimation for multivariate bounded data, J. Stat. Plan. Inference 140 (2010), pp. 139–152. [Google Scholar]
- 7.Brouty X. and Garcin M., A statistical test of market efficiency based on information theory, Quant. Finance 23 (2023), pp. 1003–1018. [Google Scholar]
- 8.Campbell R., Huisman R., and Koedijk K., Optimal portfolio selection in a Value-at-Risk framework, J. Bank. Finance 25 (2001), pp. 1789–1804. [Google Scholar]
- 9.Davis M.H., Verification of internal risk measure estimates, Stat. Risk Modeling 33 (2016), pp. 67–93. [Google Scholar]
- 10.Deheuvels P., An asymptotic decomposition for multivariate distribution-free tests of independence, J. Multivar. Anal. 11 (1981), pp. 102–113. [Google Scholar]
- 11.Diebold F.X., Gunther T.A., and Tay A.S., Evaluating density forecasts, with applications to financial risk management, Int. Econ. Rev. (Philadelphia) 39 (1998), pp. 863–883. [Google Scholar]
- 12.Diebold F.X., Tay A.S., and Wallis K.F., Evaluating density forecasts of inflation: The survey of professional forecasters. In R.F. Engle and H. White, editors, Cointegration, causality, and forecasting: A Festschrift in honour of Clive W.J. Granger, pages 76–90. Oxford university press, 1999.
- 13.Forsberg L. and Bollerslev T., Bridging the gap between the distribution of realized (ECU) volatility and ARCH modelling (of the euro): The GARCH-NIG model, J. Appl. Econ. 17 (2002), pp. 535–548. [Google Scholar]
- 14.Garcin M., Estimation of time-dependent Hurst exponents with variational smoothing and application to forecasting foreign exchange rates, Phys. A. Stat. Mech. Appl. 483 (2017), pp. 462–479. [Google Scholar]
- 15.Garcin M., Complexity measure, kernel density estimation, bandwidth selection, and the efficient market hypothesis, Working paper, 2023. [Google Scholar]
- 16.Garcin M. and Guégan D., Probability density of the empirical wavelet coefficients of a noisy chaos, Phys. D: Nonlinear Phenom. 276 (2014), pp. 28–47. [Google Scholar]
- 17.Gneiting T., Balabdaoui F., and Raftery A.E., Probabilistic forecasts, calibration and sharpness, J. R. Stat. Soc Ser. B (Stat. Methodol.) 69 (2007), pp. 243–268. [Google Scholar]
- 18.Grabner P.J., Strauch O., and Tichy R.F., -discrepancy and statistical independence of sequences, Czechoslov. Math. J. 49 (1999), pp. 97–110. [Google Scholar]
- 19.Harvey A. and Oryshchenko V., Kernel density estimation for time series data, Int. J. Forecast. 28 (2012), pp. 3–14. [Google Scholar]
- 20.Harvey C.R., Liechty J.C., Liechty M.W., and Müller P., Portfolio selection with higher moments, Quant. Finance 10 (2010), pp. 469–485. [Google Scholar]
- 21.Holzmann H. and Eulert M., The role of the information set for forecasting – with applications to risk management, Ann. Appl. Stat. 8 (2014), pp. 595–621. [Google Scholar]
- 22.Jones M.C., Marron J.S., and Sheather S.J., A brief survey of bandwidth selection for density estimation, J. Am. Stat. Assoc. 91 (1996), pp. 401–407. [Google Scholar]
- 23.Justel A., Peña D., and Zamar R., A multivariate Kolmogorov-Smirnov test of goodness of fit, Stat. Probab. Lett. 35 (1997), pp. 251–259. [Google Scholar]
- 24.Ko S.I. and Park S.Y., Multivariate density forecast evaluation: A modified approach, Int. J. Forecast. 29 (2013), pp. 431–441. [Google Scholar]
- 25.Kristan M., Leonardis A., and Skočaj D., Multivariate online kernel density estimation with Gaussian kernels, Pattern. Recognit. 44 (2011), pp. 2630–2642. [Google Scholar]
- 26.Lacoume J.-L., Amblard P.-O., and Comon P., Statistiques d'ordre supérieur pour le traitement du signal, Masson, Paris, 1997. [Google Scholar]
- 27.Li Z., Liu S., and Tian M., Collective behavior of equity returns and market volatility, J. Data. Sci. 12 (2014), pp. 545–562. [Google Scholar]
- 28.Liang J.J., Fang K.T., Hickernell F., and Li R., Testing multivariate uniformity and its applications, Math. Comput. 70 (2001), pp. 337–355. [Google Scholar]
- 29.Luong A. and Bilodeau C., Simulated minimum Hellinger distance estimation for some continuous financial and actuarial models, Open. J. Stat. 7 (2017), pp. 743–759. [Google Scholar]
- 30.Markowitz H., Portfolio selection, J. Finance 7 (1952), pp. 77–91. [Google Scholar]
- 31.Marsaglia G., Tsang W.W., and Wang J., Evaluating Kolmogorov's distribution, J. Stat. Softw. 8 (2003), pp. 1–4. [Google Scholar]
- 32.Morimura T., Sugiyama M., Kashima H., Hachiya H., and Tanaka T., Parametric return density estimation for reinforcement learning, Working paper, 2012. [Google Scholar]
- 33.Niederreiter H., Low-discrepancy and low-dispersion sequences, J. Number. Theory. 30 (1988), pp. 51–70. [Google Scholar]
- 34.Niederreiter H., Recent constructions of low-discrepancy sequences, Math. Comput. Simul. 135 (2017), pp. 18–27. [Google Scholar]
- 35.Níguez T.M. and Perote J., Moments expansion densities for quantifying financial risk, North Am J. Econ Finance 42 (2017), pp. 53–69. [Google Scholar]
- 36.Panaretos V.M. and Zemel Y., Statistical aspects of Wasserstein distances, Annu. Rev. Stat. Appl. 6 (2019), pp. 405–431. [Google Scholar]
- 37.Pavlyshenko B.M., Regression approach for modeling COVID-19 spread and its impact on stock market, Working paper, 2020. [Google Scholar]
- 38.Rosenblatt M., Remarks on a multivariate transformation, Ann. Math. Stat. 23 (1952), pp. 470–472. [Google Scholar]
- 39.Scaillet O., Density estimation using inverse and reciprocal inverse Gaussian kernels, Nonparametr. Stat. 16 (2004), pp. 217–226. [Google Scholar]
- 40.Semeyutin A. and O'Neill R., A brief survey on the choice of parameters for: Kernel density estimation for time series data, North Am. J. Econ. Finance 50 (2019), pp. 101038. [Google Scholar]
- 41.Shternshis A., Mazzarisi P., and Marmi S., Measuring market efficiency: The Shannon entropy of high-frequency financial time series, Chaos, Solitons & Fractals 162 (2022), pp. 112403. [Google Scholar]
- 42.Silverman B.W., Density estimation for statistics and data analysis, CRC press, 1986. [Google Scholar]
- 43.Tokat Y., Rachev S.T., and Schwartz E.S., The stable non-Gaussian asset allocation: A comparison with the classical Gaussian approach, J. Econ. Dyn. Control 27 (2003), pp. 937–969. [Google Scholar]
- 44.Tsybakov A.B., Introduction to nonparametric estimation, Springer science & business media, 2008. [Google Scholar]
- 45.Tuffin B., On the use of low discrepancy sequences in Monte Carlo methods, Monte Carlo Methods Appl. 2 (1996), pp. 295–320. [Google Scholar]
- 46.Villani C., Topics in optimal transportation, American mathematical society, 2003. [Google Scholar]
- 47.Wand M.P. and Jones M.C., Kernel smoothing, CRC press, 1994. [Google Scholar]
- 48.Wang X., Tsokos C.P., and Saghafi A., Improved parameter estimation of time dependent kernel density by using artificial neural networks, J. Finance Data Sci. 4 (2018), pp. 172–182. [Google Scholar]
- 49.Wegman E.J. and Davies H.I., Remarks on some recursive estimators of a probability density, Ann. Stat. 7 (1979), pp. 316–327. [Google Scholar]
- 50.Yao H., Li Z., and Lai Y., Mean-CVaR portfolio selection: A nonparametric estimation framework, Comput. Oper. Res. 40 (2013), pp. 1014–1022. [Google Scholar]
- 51.Zhang T. and Wu W.B., Time-varying nonlinear regression models: Nonparametric estimation and model selection, Ann. Stat. 43 (2015), pp. 741–768. [Google Scholar]