Vast Volatility Matrix Estimation using High Frequency Data for Portfolio Selection

Jianqing Fan; Yingying Li; Ke Yu

doi:10.1080/01621459.2012.656041

Author manuscript; available in PMC: 2012 Dec 19.

Published in final edited form as: J Am Stat Assoc. 2012 Jun 11;107(497):412–428. doi: 10.1080/01621459.2012.656041

PMCID: PMC3526073 NIHMSID: NIHMS424720 PMID: 23264708

Abstract

Portfolio allocation with a gross-exposure constraint is an effective method to increase the efficiency and stability of portfolio selection among a vast pool of assets, as demonstrated in Fan et al. (2011). The required high-dimensional volatility matrix can be estimated by using high frequency financial data. This enables us to better adapt to the local volatilities and local correlations among a vast number of assets and to increase significantly the sample size for estimating the volatility matrix. This paper studies the volatility matrix estimation using high-dimensional high-frequency data from the perspective of portfolio selection. Specifically, we propose the use of “pairwise-refresh time” and “all-refresh time” methods based on the concept of “refresh time” proposed by Barndorff-Nielsen et al. (2008) for estimation of a vast covariance matrix and compare their merits in portfolio selection. We establish the concentration inequalities of the estimates, which guarantee desirable properties of the estimated volatility matrix in vast asset allocation with gross exposure constraints. Extensive numerical studies are made via carefully designed simulations. Compared with the methods based on low frequency daily data, our methods can capture the most recent trend of the time varying volatility and correlation, and hence provide more accurate guidance for the portfolio allocation in the next time period. The advantage of using high-frequency data is significant in our simulation and empirical studies, which consist of 50 simulated assets and 30 constituent stocks of the Dow Jones Industrial Average index.

Keywords: Volatility matrix estimation, high frequency data, concentration inequalities, portfolio allocation, risk assessment, refresh time

1 Introduction

The mean-variance efficient portfolio theory of Markowitz (1952, 1959) has had a profound impact on modern finance. Yet, its applications to practical portfolio selection face a number of challenges. It is well known that the selected portfolios depend too sensitively on the expected future returns and volatility matrix. This leads to the puzzle postulated by Jagannathan and Ma (2003) of why the no-short-sale portfolio outperforms the efficient Markowitz portfolio. This sensitivity can be effectively addressed by introducing a constraint on the gross exposure of portfolios (Fan et al., 2011). Their results not only give a theoretical answer to the puzzle postulated by Jagannathan and Ma (2003) but also pave the way for optimal portfolio selection in practice.

The second challenge in implementing Markowitz’s portfolio selection theory is the intrinsic difficulty of estimating the large volatility matrix. This is well documented in the statistics and econometrics literature even for the static large covariance matrix (Bickel and Levina, 2008; Fan, et al., 2008; Lam and Fan, 2009; Rothman et al., 2009). The additional challenge comes from the time-varying nature of a large volatility matrix. For a short or medium holding period (one day or one week, say), the expected volatility matrix in the near future can be very different from the average of the expected volatility matrix over a long time horizon (the past one year, say). As a result, even if we know exactly the realized volatility matrix in the past, the bias can still be large. This calls for a stable and robust portfolio selection. The portfolio allocation under the gross exposure constraint provides a needed solution. To reduce the bias of the forecasted expected volatility matrix, we need to shorten the learning period to better capture the dynamics of the time-varying volatility matrix, adapting better to the local volatility and correlation. But this is at the expense of a reduced sample size. The wide availability of high-frequency data provides a sufficient amount of data for reliable estimation of the volatility matrix.

Recent years have seen dramatic developments in the study of high frequency data in estimating integrated volatility. Statisticians and econometricians have been focusing on the interesting and challenging problem of volatility estimation in the presence of market microstructure noise and asynchronous trading, which are stylized features of high-frequency financial data. The progress has been impressive, with a large literature. In particular, in the one dimensional case when the focus is on estimation of integrated volatility, Aït-Sahalia, et al. (2005) discussed a subsampling scheme; Zhang, et al. (2005) proposed a two-scale estimate which was extended and improved by Zhang (2006) to multiple scales; Fan and Wang (2007) separated jumps from diffusions in the presence of market microstructure noise using a wavelet method; the robustness issues are addressed by Li and Mykland (2007); the realized kernel methods are proposed and thoroughly studied in Barndorff-Nielsen et al. (2009, 2011); Jacod, et al. (2009) proposed a pre-averaging approach to reduce the market microstructure noise; Xiu (2010) demonstrated that a simple quasi-likelihood method achieves the optimal rate of convergence for estimating integrated volatility. For estimation of integrated covariation, the non-synchronized trading issue was first addressed by Hayashi and Yoshida (2005) in the absence of microstructure noise; the kernel method with the refresh time idea was first proposed by Barndorff-Nielsen et al. (2008); Zhang (2011) extends the two-scale method to study the integrated covariation using a previous tick method; Aït-Sahalia, et al. (2010) extend the quasi-maximum likelihood method; Kinnebrock et al. (2010) extend the pre-averaging technique. For the high-dimensional case, Wang and Zou (2010) estimate the volatility matrix when a sparsity condition is satisfied; Hautsch et al. (2009) study a blocking and regularization approach; Tao et al. (2011) aggregate daily integrated volatility matrices via a factor model; Zheng and Li (2011) study the empirical spectral distribution of the volatility matrix.

The aim of this paper is to study the volatility matrix estimation using high-dimensional high-frequency data from the perspective of financial engineering. Specifically, our main topics are how to extract the covariation information from high-frequency data for asset allocation and how effective the resulting estimates are. Two particular strategies are proposed for handling the nonsynchronized trading: “pairwise-refresh” and “all-refresh” schemes. The former retains many more data points and estimates the covariance matrix element by element, which often yields an estimate that is not positive semi-definite, whereas the latter retains far fewer data points and the resulting covariance matrix is more often positive semi-definite. As a result, the former has a better elementwise estimation error and is better in controlling the risk approximation mentioned in the first paragraph of the introduction. However, the comparison between the two methods is not that simple. In implementation, we need to project the estimate of covariance matrices onto the space of positive semi-definite matrices. The projections distort the accuracy of the elementwise estimation. As a result, the pairwise-refresh scheme does not have much advantage over the all-refresh method, though the former is very easy to implement. However, both methods significantly outperform the methods based on low frequency data, since they adapt better to the time-varying volatilities and correlations. The comparative advantage is more dramatic when there are rapid changes of the volatility matrix over time. This will be demonstrated in both simulation and empirical studies.

As mentioned above and demonstrated in Section 2, the accuracy of portfolio risk relative to the theoretically optimal portfolio is governed by the maximum elementwise estimation error. How does this error grow with the number of assets? Thanks to the concentration inequalities derived in this paper, it grows only at the logarithmic order of the number of assets. This gives a theoretical endorsement of why the portfolio selection problem is feasible for vast portfolios.

The paper is organized as follows. Section 2 gives an overview of portfolio allocation using high-frequency data. Section 3 studies the volatility matrix estimation using high-frequency data from the perspective of asset allocation, where the analytical results are also presented. How well our idea works in simulation and empirical studies can be found in Sections 4 and 5, respectively. Conclusions are given in Section 6. Technical conditions and proofs are relegated to the appendix.

2 Constrained Portfolio Optimization with High Frequency Data

2.1 Problem Setup

Consider a pool of p assets, with log-price processes $X_{t}^{(1)}, X_{t}^{(2)}, \dots, X_{t}^{(p)}$. Denote by $X_{s} = {(X_{s}^{(1)}, \dots, X_{s}^{(p)})}^{T}$ the vector of the log-price processes at time s. Suppose they follow a diffusion process, namely,

d X_{t} = μ_{t} d t + S_{t}^{1 ∕ 2} d W_{t}

(1)

where W_t is a p-dimensional standard Brownian motion. The drift vector μ_t and the instantaneous variance S_t can be stochastic processes and are assumed to be continuous.

A given portfolio with the allocation vector w at time t and a holding period τ has the log-return $w^{T} \int_{t}^{t + τ} d X_{s}$ with variance (risk)

R_{t, τ} (w) = w^{T} Σ_{t, τ} w,

(2)

where w^T 1 = 1 and

Σ_{t, τ} = \int_{t}^{t + τ} E_{t} S_{u} d u

(3)

with E_t denoting the conditional expectation given the history up to time t. Let w⁺ be the proportion of long positions and w⁻ be the proportion of the short positions. Then, ∥w∥₁ = w⁺ + w⁻ is the gross exposure of the portfolio. To simplify the problem, following Jagannathan and Ma (2003) and Fan et al. (2011), we consider only the risk optimization problem. In practice, the expected return constraint can be replaced by the constraints of sectors or industries, to avoid unreliable estimates of the expected return vector. For a short-time horizon, the expected return is usually negligible. Following Fan et al. (2011), we consider the following risk optimization under gross exposure constraints:

\min_{w} w^{T} Σ_{t, τ} w, \quad s.t. {∥ w ∥}_{1} \leq c and w^{T} 1 = 1,

(4)

where c is the total exposure allowed. Note that, using w⁺ − w⁻ = 1, problem (4) equivalently puts the constraint on the proportion of the short positions: w⁻ ≤ (c − 1)/2. As noted in Jagannathan and Ma (2003), the constrained optimization problem (4) is equivalent to unconstrained risk optimization with a regularized covariance matrix. Other methods of regularization are also possible to handle the noise accumulation problem (e.g., the shrinkage method of Ledoit and Wolf (2004)).
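The equivalence between the gross-exposure constraint and the bound on short positions can be checked numerically. The following is a minimal sketch; the helper name `exposure` and the example weights are hypothetical and not taken from the paper.

```python
def exposure(w):
    """Return (long proportion, short proportion, gross exposure) of weights w."""
    w_plus = sum(x for x in w if x > 0)       # total long position, w+
    w_minus = sum(-x for x in w if x < 0)     # total short position, w-
    return w_plus, w_minus, w_plus + w_minus  # gross exposure = ||w||_1


# Hypothetical allocation that sums to 1 (fully invested).
w = [0.75, 0.5, -0.25]
w_plus, w_minus, gross = exposure(w)
print(w_plus, w_minus, gross)  # 1.25 0.25 1.5

# With w+ - w- = 1, the constraint ||w||_1 <= c is the same as w- <= (c - 1) / 2.
c = 1.5
print(gross <= c, w_minus <= (c - 1) / 2)  # True True
```

Here c = 1.5 allows at most 25% of short positions, and c = 1 corresponds to the no-short-sale portfolio.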

Problem (4) involves the conditional expected volatility matrix (3) in the future. Unless we know exactly the dynamics of the volatility process, this matrix is usually unknown, even if we observe the entire continuous paths up to the current time t. As a result, we must rely on an approximation, even with ideal data for which the processes are observed continuously without error. The typical approximation is

τ^{- 1} Σ_{t, τ} \approx h^{- 1} \int_{t - h}^{t} S_{u} d u

(5)

for an appropriate window width h and we estimate $\int_{t - h}^{t} S_{u} d u$ based on the historical data at the time interval [t − h, t].

The approximation (5) holds reasonably well when τ and h are both small. This relies on the continuity assumption: local time-varying volatility matrices are continuous in time. The approximation is also reasonable when both τ and h are large. This relies on the ergodicity assumption so that both quantities will be approximately ES_u, when the stochastic volatility matrix S_u is ergodic. The approximation is not good when τ is small whereas h is large, as long as S_u is time varying, whether or not the stochastic volatility S_u is stationary. In other words, when the holding time horizon τ is short, as long as S_u is time varying, we can only use a short time window [t − h, t] to estimate Σ_t,τ. The recent availability of high-frequency data makes this problem feasible.

The approximation error in (5) usually cannot be evaluated unless we have a specific parametric model for the stochastic volatility matrix S_u. However, this comes at the risk of model misspecification, and a nonparametric approach is usually preferred for high-frequency data. With p² elements approximated, which can be on the order of hundreds of thousands or millions, a natural question to ask is whether these errors accumulate and whether the result (risk) is stable. The gross-exposure constraint gives a stable solution to the problem as shown in Fan et al. (2011).

We would like to close this section by noting that the formulation (4) is a one-period, not a multi-period portfolio optimization problem.

2.2 Risk approximations with gross exposure constraints

The utility of gross-exposure constraint can easily be seen through the following inequality. Let ${\hat{Σ}}_{t, τ}$ be an estimated covariance matrix and

{\hat{R}}_{t, τ} (w) = w^{T} {\hat{Σ}}_{t, τ} w

(6)

be the estimated risk of the portfolio. Then, for any portfolio with gross-exposure ∥w∥₁ ≤ c, we have

∣ {\hat{R}}_{t, τ} (w) - R_{t, τ} (w) ∣ \leq \sum_{i = 1}^{p} \sum_{j = 1}^{p} ∣ {\hat{σ}}_{i, j} - σ_{i, j} ∣ ∣ w_{i} ∣ ∣ w_{j} ∣ \leq {∣ Σ_{t, τ} - {\hat{Σ}}_{t, τ} ∣}_{\infty} {∥ w ∥}_{1}^{2} \leq {∣ Σ_{t, τ} - {\hat{Σ}}_{t, τ} ∣}_{\infty} c^{2},

(7)

where ${\hat{σ}}_{i, j}$ and σ_i,j are respectively the (i, j) elements of ${\hat{Σ}}_{t, τ}$ and Σ_t,τ, and

{∣ Σ_{t, τ} - {\hat{Σ}}_{t, τ} ∣}_{\infty} = \max_{i, j} ∣ {\hat{σ}}_{i, j} - σ_{i, j} ∣

is the maximum elementwise estimation error. The risk approximation (7) reveals that there is no large error accumulation effect when gross exposure c is moderate.
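Inequality (7) can be verified on a toy example. The sketch below uses a hypothetical "true" matrix (the identity), a small symmetric perturbation as its estimate, and a hypothetical weight vector; none of these numbers come from the paper.

```python
import random


def quad_form(w, S):
    """Portfolio variance w' S w for a weight list w and matrix S (list of rows)."""
    p = len(w)
    return sum(w[i] * S[i][j] * w[j] for i in range(p) for j in range(p))


def max_abs_diff(A, B):
    """Maximum elementwise absolute difference between matrices A and B."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


random.seed(0)
p = 5
# Hypothetical "true" matrix (identity) and a symmetric perturbation of it.
Sigma = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
Sigma_hat = [row[:] for row in Sigma]
for i in range(p):
    for j in range(i, p):
        Sigma_hat[i][j] += random.uniform(-0.05, 0.05)
        Sigma_hat[j][i] = Sigma_hat[i][j]

# Hypothetical weights with w'1 = 1 and gross exposure 1.6.
w = [0.5, 0.4, 0.4, -0.2, -0.1]
gross = sum(abs(x) for x in w)
lhs = abs(quad_form(w, Sigma_hat) - quad_form(w, Sigma))
rhs = max_abs_diff(Sigma, Sigma_hat) * gross ** 2
print(lhs <= rhs)  # True: the bound holds for any w and any pair of matrices
```

The right-hand side depends on the dimension only through the maximum elementwise error, which is why the bound does not deteriorate as more assets are added at a fixed gross exposure.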

From now on, we drop the dependence on t and τ whenever there is no confusion. This simplifies the notation.

Fan et al. (2011) showed further that the risks of optimal portfolios are indeed close. Let

w_{o p t} = {argmin}_{w^{T} 1 = 1, {∥ w ∥}_{1} \leq c} R (w), \quad {\hat{w}}_{o p t} = {argmin}_{w^{T} 1 = 1, {∥ w ∥}_{1} \leq c} \hat{R} (w)

(8)

be respectively the theoretical (oracle) optimal allocation vector we want and the estimated optimal allocation vector we get. Then, R(w_opt) is the theoretical minimum risk and $R ({\hat{w}}_{o p t})$ is the actual risk of our selected portfolio, whereas $\hat{R} ({\hat{w}}_{o p t})$ is our perceived risk, which is the quantity known to financial econometricians. They showed that

∣ R ({\hat{w}}_{o p t}) - R (w_{o p t}) ∣ \leq 2 a_{p} c^{2},

(9)

∣ R ({\hat{w}}_{o p t}) - \hat{R} ({\hat{w}}_{o p t}) ∣ \leq a_{p} c^{2},

(10)

∣ R (w_{o p t}) - \hat{R} ({\hat{w}}_{o p t}) ∣ \leq a_{p} c^{2},

(11)

with $a_{p} = {∣ \hat{Σ} - Σ ∣}_{\infty}$ , which usually grows slowly with the number of assets p. These reveal that the three relevant risks are in fact close as long as the gross-exposure parameter c is moderate and the maximum elementwise estimation error a_p is small.

The above risk approximations hold for any estimate of the covariance matrix. They do not even require $\hat{Σ}$ to be a positive semi-definite matrix. This significantly facilitates covariance matrix estimation. In particular, elementwise estimation methods are allowed. In fact, since the approximation errors in (9), (10) and (11) are all controlled by the maximum elementwise estimation error, it can be advantageous to use elementwise estimation methods. This is particularly the case for high-frequency data, where trading is non-synchronized. The synchronization can be done pairwise or for all assets. The former retains much more data than the latter, as shown in the next section.

3 Covariance Matrix Estimation Using High Frequency Data

3.1 All-refresh method and pairwise-refresh method

Estimating a high-dimensional volatility matrix using high-frequency data is a challenging task. One of the challenges is the non-synchronicity of trading. Several synchronization schemes have been studied. The refresh time method was proposed in Barndorff-Nielsen et al. (2008) and the previous tick method is used in Zhang (2011). The former uses the available data more efficiently and will be used in this paper.

The idea of refresh time is to wait until all assets are traded at least once at time v₁ (say) and then use the last price traded before or at v₁ of each asset as its price at time v₁. This yields one synchronized price vector at time v₁. The clock now starts again. Wait until all assets are traded at least once at time v₂ (say) and again use the previous tick price of each asset as its price at time v₂. This yields the second synchronized price vector at time v₂. Repeat the process until all available trading data are synchronized. Clearly, the process discards a large portion of the available trades. We will refer to this synchronization scheme as the “all-refresh time” scheme (the all-refresh method, for short). Barndorff-Nielsen et al. (2008) advocate the kernel method to estimate the integrated volatility matrix after synchronization; this can also be done by using other methods.
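The all-refresh procedure described above can be sketched as follows. The function name `all_refresh_times` and the example trade times are hypothetical; each asset is assumed to come with a sorted list of its trade timestamps.

```python
def all_refresh_times(trade_times):
    """Refresh times for several assets.

    trade_times: one sorted list of trade timestamps per asset.
    Returns the times at which every asset has traded at least once
    since the previous refresh time.
    """
    pointers = [0] * len(trade_times)   # index of the next unused trade per asset
    refresh = []
    while all(k < len(t) for k, t in zip(pointers, trade_times)):
        # The clock stops when the slowest asset records its next trade.
        v = max(t[k] for k, t in zip(pointers, trade_times))
        refresh.append(v)
        # Restart the clock: skip every trade at or before v.
        for a, t in enumerate(trade_times):
            k = pointers[a]
            while k < len(t) and t[k] <= v:
                k += 1
            pointers[a] = k
    return refresh


# Three hypothetical assets with 5, 3 and 3 trades.
times = [[1, 2, 3, 7, 9], [2, 5, 8], [4, 6, 10]]
print(all_refresh_times(times))  # [4, 7, 10]
```

In this toy example, 11 trades are reduced to 3 synchronized observations, which illustrates how much data the scheme can discard when one asset trades slowly.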

A more efficient way to use the available sample is the pairwise refresh time scheme, which synchronizes the trading for each pair of assets separately (the pairwise-refresh method, for short). The pairwise-refresh scheme makes far more efficient use of the rich information in high-frequency data, and enables us to estimate each element in the volatility matrix more precisely, which helps improve the efficiency of the selected portfolio. We will study the merits of these two methods. A third synchronization scheme is the blocking scheme of Hautsch et al. (2009), which groups stocks with similar liquidities. The pairwise refresh approach corresponds to the case with one stock per group.

The pairwise estimation method allows us to use a wealth of univariate integrated volatility estimators such as those mentioned in the introduction. For any given two assets with log-price processes $X_{t}^{(i)}$ and $X_{t}^{(j)}$, with pairwise-refresh times, the synchronized prices of $X_{t}^{(i)} + X_{t}^{(j)}$ and $X_{t}^{(i)} - X_{t}^{(j)}$ can be computed. With the univariate estimates of the integrated volatilities $〈 \hat{X^{(i)} + X^{(j)}} 〉$ and $〈 \hat{X^{(i)} - X^{(j)}} 〉$, the integrated covariation can be estimated as

{\hat{σ}}_{i, j} = 〈 \hat{X^{(i)}, X^{(j)}} 〉 = (〈 \hat{X^{(i)} + X^{(j)}} 〉 - 〈 \hat{X^{(i)} - X^{(j)}} 〉) ∕ 4 .

(12)

In particular, the diagonal elements are estimated by the univariate method itself. When the two-scale realized volatility (TSRV) (Zhang, et al., 2005) is used, this results in the two-scale realized covariance (TSCV) estimate (Zhang, 2011).
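The polarization identity behind (12) can be illustrated with any univariate estimator. The sketch below plugs in the plain realized variance purely for illustration (an assumption; the paper uses TSRV to handle microstructure noise), and the price series are hypothetical.

```python
def realized_variance(x):
    """Sum of squared increments of the series x."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def realized_covariance(x, y):
    """Sum of cross products of increments of x and y."""
    return sum((b - a) * (d - c) for a, b, c, d in zip(x, x[1:], y, y[1:]))


def covariance_by_polarization(x, y):
    """Covariation built from two univariate estimates, as in (12)."""
    s = [a + b for a, b in zip(x, y)]   # synchronized X + Y
    d = [a - b for a, b in zip(x, y)]   # synchronized X - Y
    return (realized_variance(s) - realized_variance(d)) / 4


# Hypothetical synchronized log-prices of two assets.
x = [0.0, 1.0, 3.0, 2.0]
y = [0.0, 2.0, 1.0, 4.0]
print(covariance_by_polarization(x, y), realized_covariance(x, y))  # -3.0 -3.0
```

With this choice of univariate estimator the two numbers agree exactly; with a noise-robust estimator in its place, the same identity turns it into a covariation estimator.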

3.2 Pairwise refresh method and TSCV

We now focus on the pairwise estimation method. To facilitate the notation, we reintroduce it.

We consider two log price processes X and Y that satisfy

d X_{t} = μ_{t}^{(X)} d t + σ_{t}^{(X)} d B_{t}^{(X)} and d Y_{t} = μ_{t}^{(Y)} d t + σ_{t}^{(Y)} d B_{t}^{(Y)},

(13)

where $cor (B_{t}^{(X)}, B_{t}^{(Y)}) = ρ_{t}^{(X, Y)}$. For the two processes X and Y, consider the problem of estimating 〈X, Y〉_T with T = 1. Denote by $T_{n}$ the observation times of X and by $S_{m}$ the observation times of Y. Denote the elements in $T_{n}$ and $S_{m}$ by ${\{ τ_{n, i} \}}_{i = 0}^{n}$ and ${\{ θ_{m, i} \}}_{i = 0}^{m}$, respectively, in ascending order (τ_n,0 and θ_m,0 are set to be 0). The actual log-prices are not directly observable, but are observed with microstructure noise:

X_{τ_{n, i}}^{o} = X_{τ_{n, i}} + ∊_{i}^{X}, Y_{θ_{m, i}}^{o} = Y_{θ_{m, i}} + ∊_{i}^{Y}

(14)

where X^o and Y^o are the observed transaction prices in the logarithmic scale, and X and Y are the latent log prices governed by the stochastic dynamics (13). We assume that the microstructure noise processes $∊_{i}^{X}$ and $∊_{i}^{Y}$ are independent of the X and Y processes and that

∊_{i}^{X} \sim_{i . i . d .} N (0, η_{X}^{2}) and ∊_{i}^{Y} \sim_{i . i . d .} N (0, η_{Y}^{2}) .

(15)

Note that this assumption is mainly for simplicity of presentation; as can be seen from the proof, one can, for example, easily replace the identical Gaussian assumption with a sub-Gaussian assumption in which the noises are not necessarily identically distributed (but have the same variance), without affecting our results.

The pairwise refresh times $V = \{ v_{0}, v_{1}, \dots, v_{\tilde{n}} \}$ can be obtained by setting v₀ = 0, and

v_{i} = \max \{ \min \{ τ \in T_{n} : τ > v_{i - 1} \}, \min \{ θ \in S_{m} : θ > v_{i - 1} \} \},

where $\tilde{n}$ is the total number of refresh times in the interval (0, 1]. The actual sample times for the two individual processes X and Y that correspond to the refresh times are

t_{i} = \max \{ τ \in T_{n} : τ \leq v_{i} \} and s_{i} = \max \{ θ \in S_{m} : θ \leq v_{i} \},

which are indeed the previous-tick measurement.
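The definitions of v_i, t_i and s_i translate directly into code. The following is a minimal sketch; the function name `pairwise_refresh` and the example trade times are hypothetical, and both inputs are assumed to be sorted lists of positive trade times.

```python
from bisect import bisect_right


def pairwise_refresh(tau, theta):
    """Pairwise refresh times of two sorted lists of trade times.

    Returns (v, t, s): refresh times v_i and the previous-tick
    sampling times t_i (from tau) and s_i (from theta).
    """
    v, t, s = [], [], []
    last = 0.0                              # v_0 = 0
    while True:
        i = bisect_right(tau, last)         # index of the first tau > last
        j = bisect_right(theta, last)       # index of the first theta > last
        if i == len(tau) or j == len(theta):
            break
        last = max(tau[i], theta[j])        # next refresh time v_i
        v.append(last)
        # Previous tick: the latest trade at or before the refresh time.
        t.append(tau[bisect_right(tau, last) - 1])
        s.append(theta[bisect_right(theta, last) - 1])
    return v, t, s


v, t, s = pairwise_refresh([1, 2, 3, 7, 9], [2, 5, 8])
print(v, t, s)  # [2, 5, 8] [2, 3, 7] [2, 5, 8]
```

At each refresh time, one of t_i and s_i coincides with v_i, while the other is the most recent earlier trade of the faster asset.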

We study the property of the TSCV based on the asynchronous data:

{〈 \hat{X, Y} 〉}_{1} = {[X^{o}, Y^{o}]}_{1}^{(K)} - \frac{{\bar{n}}_{K}}{{\bar{n}}_{J}} {[X^{o}, Y^{o}]}_{1}^{(J)},

(16)

where

{[X^{o}, Y^{o}]}_{1}^{(K)} = \frac{1}{K} \sum_{i = K}^{\tilde{n}} (X_{t_{i}}^{o} - X_{t_{i - K}}^{o}) (Y_{s_{i}}^{o} - Y_{s_{i - K}}^{o})

and ${\bar{n}}_{K} = (\tilde{n} - K + 1) ∕ K$. As discussed in Zhang (2011), the optimal choice of K has order $K = O ({\tilde{n}}^{2 ∕ 3})$, and J can be taken to be a constant such as 1.
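The estimator (16) can be written compactly once the observed log-prices have been sampled at the previous ticks. The sketch below assumes that `x[i]` and `y[i]` hold the observed log-prices at t_i and s_i for i = 0, …, ñ; the function names are hypothetical, and the choice of K is left to the caller.

```python
def averaged_cov(x, y, K):
    """[X, Y]^(K): average lag-K realized covariance of synchronized series.

    x[i] and y[i] are the observed log-prices at the i-th refresh time,
    for i = 0, ..., n_tilde.
    """
    n_tilde = len(x) - 1
    total = sum((x[i] - x[i - K]) * (y[i] - y[i - K])
                for i in range(K, n_tilde + 1))
    return total / K


def tscv(x, y, K, J=1):
    """Two-scale realized covariance (16) with slow scale K and fast scale J."""
    n_tilde = len(x) - 1
    n_bar_K = (n_tilde - K + 1) / K
    n_bar_J = (n_tilde - J + 1) / J
    return averaged_cov(x, y, K) - (n_bar_K / n_bar_J) * averaged_cov(x, y, J)


# Hypothetical noise-free example: a linear ramp observed at 11 refresh times.
x = [float(i) for i in range(11)]
print(tscv(x, x, K=2))
```

Calling `tscv(x, x, K)` with the same series in both arguments gives the TSRV of that series. In practice K would be taken of order ñ^{2/3}, for example `round(n_tilde ** (2 / 3))`, where the constant 1 in front is an arbitrary illustrative choice.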

When either the microstructure error or the asynchronicity exists, the realized covariance is seriously biased. An asymptotic normality result in Zhang (2011) reveals that TSCV can simultaneously remove the bias due to the microstructure error and the bias due to the asynchronicity. However, this result is not adequate for our application to the vast volatility matrix estimation. To understand its impact on a_p, we need to establish the concentration inequality. In particular, for a sufficiently large |x| = O((log p)^α), if

\max_{i, j} P \{ \sqrt{n} ∣ σ_{i j} - {\hat{σ}}_{i j} ∣ > x \} < C_{1} \exp (- C_{2} x^{1 ∕ α}),

(17)

for some positive constants C₁, C₂ and α, then

a_{p} = {∣ Σ - \hat{Σ} ∣}_{\infty} = O_{P} (\frac{{(\log p)}^{α}}{\sqrt{n}}) .

(18)

We will show in the next section that the result indeed holds for some α which depends on the tail of the volatility process, with n replaced by the minimum of the subsample sizes $({\bar{n}}_{K} \sim {\tilde{n}}^{1 ∕ 3})$. Hence the impact of the number of assets is limited, only of the logarithmic order.
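The step from (17) to (18) follows from a union bound over the p² entries of the matrix. The following is a sketch of that standard argument; the constant M is introduced here for illustration, and it is assumed that x = M (log p)^α lies in the range where (17) holds.

```latex
P\Big\{ \sqrt{n}\, a_p > M (\log p)^{\alpha} \Big\}
  \le \sum_{i,j=1}^{p} P\Big\{ \sqrt{n}\, |\sigma_{ij} - \hat{\sigma}_{ij}| > M (\log p)^{\alpha} \Big\}
  \le C_1 p^{2} \exp\big( - C_2 M^{1/\alpha} \log p \big)
  = C_1 p^{\,2 - C_2 M^{1/\alpha}} .
```

For p ≥ 2, the right-hand side can be made arbitrarily small by taking M large, which is the statement (18).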

3.3 Concentration Inequalities

Inequality (17) requires the conditions on both diagonal elements and off-diagonal elements. Technically, they are treated differently. For the diagonal cases, the problem corresponds to the estimation of integrated volatility and there is no issue of asynchronicity. TSCV (16) reduces to TSRV (Zhang, et al., 2005), which is explicitly given by

{〈 \hat{X, X} 〉}_{1} = {[X^{o}, X^{o}]}_{1}^{(K)} - \frac{{\bar{n}}_{K}}{{\bar{n}}_{J}} {[X^{o}, X^{o}]}_{1}^{(J)},

(19)

where ${[X^{o}, X^{o}]}_{1}^{(K)} = \frac{1}{K} \sum_{i = K}^{n} {(X_{t_{i}}^{o} - X_{t_{i - K}}^{o})}^{2}$ and ${\bar{n}}_{K} = (n - K + 1) ∕ K$. As shown in Zhang, et al. (2005), the optimal choice of K has order $K = O (n^{2 ∕ 3})$ and J can be taken to be a constant such as 1.

To facilitate the reading, we relegate the technical conditions and proofs to the appendix. The following two results establish the concentration inequalities for the integrated volatility and integrated covariation.

Theorem 1. Let the X process be as in (13), and n be the total number of observations for the X process during the time interval (0,1]. Under Conditions 1-4 in Appendix A.1,

if $σ_{t}^{(X)} \leq C_{σ} < \infty$ for all t ∈ [0, 1], then for all large n, for $x \in [0, c n^{1 ∕ 6}]$,
$P \{ n^{1 ∕ 6} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} {(σ_{t}^{(X)})}^{2} d t ∣ > x \} \leq 4 \exp \{ - C x^{2} \}$
for positive constants c and C. A set of candidate values for c and C is given in (50).
If the tail behavior of $σ_{t}^{(X)}$ can be described as
$P \{ \sup_{0 \leq t \leq 1} σ_{t}^{(X)} \geq C_{σ} \} \leq k_{σ} \exp \{ - a C_{σ}^{b} \}$ for any $C_{σ} > 0$,
with positive constants k_σ, a and b, then for all large n, for $x \in [0, c n^{\frac{4 + b}{6 b}}]$,
$P \{ n^{1 ∕ 6} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} {(σ_{t}^{(X)})}^{2} d t ∣ > x \} \leq (4 + k_{σ}) \exp \{ - C \cdot x^{\frac{2 b}{4 + b}} \} .$
A set of candidate values for c and C is given in (51).

Theorem 2. Let X and Y be as in (13), and $\tilde{n}$ be the total number of refresh times for the processes X and Y during the time interval (0,1]. Under Conditions 1-5 in Appendix A.1,

if $σ_{t}^{(i)} \leq C_{σ} < \infty$ for all t ∈ [0, 1] and i = X and Y, then for all large $\tilde{n}$, for $x \in [0, c {\tilde{n}}^{1 ∕ 6}]$,
$P \{ {\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{X, Y} 〉}_{1} - \int_{0}^{1} σ_{t}^{(X)} σ_{t}^{(Y)} ρ_{t}^{(X, Y)} d t ∣ > x \} \leq 8 \exp \{ - C x^{2} \}$
for positive constants c and C. A set of candidate values for c and C is given in (54).
If the tail behavior of $σ_{t}^{(i)}$ for i = X or Y satisfies
$P \{ \sup_{0 \leq t \leq 1} σ_{t}^{(i)} \geq C_{σ} \} \leq k_{σ} \exp \{ - a C_{σ}^{b} \}$ for any $C_{σ} > 0$,
with positive constants k_σ, a and b, then for all large $\tilde{n}$, for $x \in [0, c {\tilde{n}}^{\frac{4 + b}{6 b}}]$,
$P \{ {\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{X, Y} 〉}_{1} - \int_{0}^{1} σ_{t}^{(X)} σ_{t}^{(Y)} ρ_{t}^{(X, Y)} d t ∣ > x \} \leq (8 + 2 k_{σ}) \exp \{ - C x^{\frac{2 b}{4 + b}} \} .$
A set of candidate values for c and C is given in (55).

3.4 Error rates on risk approximations

With the above concentration inequalities, we can now readily give an upper bound of the risk approximations. Consider the p log-price processes as in Section 2.1. Suppose the processes are observed with the market microstructure noises. Let ${\tilde{n}}^{(i, j)}$ be the observation frequency obtained by the pairwise-refresh method for two processes $X^{(i)}$ and $X^{(j)}$, and ${\tilde{n}}_{*}$ be the observation frequency obtained by the all-refresh method. Clearly, ${\tilde{n}}^{(i, j)}$ is typically much larger than ${\tilde{n}}_{*}$. Hence, most elements are estimated more accurately using the pairwise-refresh method than using the all-refresh method. On the other hand, for less liquidly traded pairs, the observation frequency of the pairwise-refresh time cannot be an order of magnitude larger than ${\tilde{n}}_{*}$.

Using (18), an application of Theorems 1 and 2 to each element in the estimated integrated covariance matrix yields

a_{p}^{pairwise-refresh} = {∣ {\hat{Σ}}^{pairwise} - Σ ∣}_{\infty} = O_{P} (\frac{{(\log p)}^{α}}{{\tilde{n}}_{\min}^{1 ∕ 6}}),

(20)

where α is $\frac{1}{2}$ when the volatility processes are bounded and is a constant depending on the tail behavior of the volatility processes when they are unbounded; and ${\tilde{n}}_{\min} = \min_{i, j} {\tilde{n}}^{(i, j)}$ is the minimum number of observations of the pairwise-refresh time.

Note that our proofs do not rely on any particular properties of pairwise-refresh times, so the results of Theorem 1 and Theorem 2 are applicable to the all-refresh method as well, with the observation frequency of the pairwise-refresh times replaced by that of the all-refresh times. Hence, using the all-refresh time scheme, we have

a_{p}^{all-refresh} = {∣ {\hat{Σ}}^{all-refresh} - Σ ∣}_{\infty} = O_{P} (\frac{{(\log p)}^{α}}{{\tilde{n}}_{*}^{1 ∕ 6}}),

(21)

with the same α as above. Clearly, ${\tilde{n}}_{\min}$ is larger than ${\tilde{n}}_{*}$ . Hence, the pairwise refresh method gives a somewhat more accurate estimate in terms of the maximum elementwise estimation error.

3.5 Projections of estimated volatility matrices

The risk approximations (9)-(11) hold for any solutions to (8) whether the matrix $\hat{Σ}$ is positive semi-definite or not. However, convex optimization algorithms typically require the positive semi-definiteness of the matrix $\hat{Σ}$. Yet, the estimates based on elementwise estimation do not always satisfy this, and even the ones from the all-refresh method can have the same problem when TSCV is applied. This leads to the issue of how to project a symmetric matrix onto the space of positive semi-definite matrices.

There are two intuitive methods for projecting a p × p symmetric matrix A onto the space of positive semi-definite matrices. Consider the eigenvalue decomposition: A = Γ^T diag(λ₁, …, λ_p) Γ, where Γ is an orthogonal matrix, consisting of p eigenvectors. The two intuitively appealing projection methods are

A_{1}^{+} = Γ^{T} diag (λ_{1}^{+}, \dots, λ_{p}^{+}) Γ,

(22)

where $λ_{j}^{+}$ is the positive part of λ_j and

A_{2}^{+} = (A + λ_{\min}^{-} I_{p}) ∕ (1 + λ_{\min}^{-}),

(23)

where $λ_{\min}^{-}$ is the negative part of the minimum eigenvalue. For both projection methods, the eigenvectors remain the same as those of A. When A is a positive semi-definite matrix, we obviously have $A_{1}^{+} = A_{2}^{+} = A$.

In applications, we apply the above transformations to the estimated correlation matrix A rather than directly to the volatility matrix estimate $\hat{Σ}$ . The correlation matrix A has diagonal elements of 1. The resulting matrix under the projection method (23) apparently still satisfies this property, whereas the one under the projection method (22) does not. As a result, the projection method (23) keeps the integrated volatility of each asset intact.

In our initial simulation and empirical studies, we applied both projections. It turns out that there is no significant difference between the two projection methods in terms of results. We decided to apply only the projection (23) in all numerical studies.
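The two projections (22) and (23) can be sketched with NumPy as follows. The function names and the 3 × 3 example matrix are hypothetical; the example is a symmetric matrix with unit diagonal and one negative eigenvalue, standing in for an estimated correlation matrix.

```python
import numpy as np


def project_clip(A):
    """Projection (22): set negative eigenvalues to zero."""
    lam, V = np.linalg.eigh(A)            # A = V diag(lam) V^T
    return (V * np.maximum(lam, 0.0)) @ V.T


def project_shift(A):
    """Projection (23): shift by the negative part of the smallest eigenvalue."""
    lam_min = np.linalg.eigvalsh(A)[0]    # eigenvalues come in ascending order
    neg = max(-lam_min, 0.0)              # negative part of lam_min
    return (A + neg * np.eye(A.shape[0])) / (1.0 + neg)


# Hypothetical "correlation" matrix with eigenvalues -0.8, 1.9, 1.9.
A = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
A1, A2 = project_clip(A), project_shift(A)
print(np.diag(A1))   # no longer all ones
print(np.diag(A2))   # unit diagonal is preserved
```

Both outputs are positive semi-definite, but only the shift in (23) leaves the unit diagonal of a correlation matrix unchanged, which is the property used to keep each asset's integrated volatility intact.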

3.6 Comparisons between pairwise- and all-refresh methods

The pairwise-refresh method keeps far richer information in the high-frequency data than the all-refresh method. Thus, it is expected to estimate each element more precisely. Yet, the estimated correlation matrix is typically not positive semi-definite. As a result, projection (23) can distort the accuracy of elementwise estimation. On the other hand, the all-refresh method more often gives positive semi-definite estimates. Therefore, the projection (23) has less impact on the all-refresh method than on the pairwise-refresh method.

Risk approximations (9)–(11) are only upper bounds. The upper bounds are controlled by a_p, which has rates of convergence governed by (20) and (21). While the average number of observations of pairwise-refresh time is far larger than the number of observations ${\tilde{n}}_{*}$ of the all-refresh time, the minimum number of observations of pairwise-refresh time ${\tilde{n}}_{\min}$ is not much larger than ${\tilde{n}}_{*}$. Therefore, the upper bounds (20) and (21) are approximately of the same level. This, together with the distortion due to projection, does not leave much advantage for the pairwise-refresh method.

4 Simulation Studies

In this section, we simulate the market trading data using a reasonable stochastic model. As the latent prices and dynamics of simulations are known, our study on the risk profile is facilitated. In particular, we would like to verify our theoretical results and to quantify the finite sample behaviors.

In this section and the next, the risk refers to the standard deviation of portfolio’s returns. To avoid ambiguity, we call $\sqrt{R (w_{o p t})}$ the theoretical optimal risk or oracle risk, $\sqrt{\hat{R} ({\hat{w}}_{o p t})}$ the perceived optimal risk, and $\sqrt{R ({\hat{w}}_{o p t})}$ the actual risk of the perceived optimal allocation.

4.1 Design of Simulations

A slightly modified version of the simulation model in Barndorff-Nielsen et al. (2008) is used to generate the latent price processes of p traded assets. It is a multivariate factor model with stochastic volatilities. Specifically, the latent log-prices $X_{t}^{(i)}$ follow

d X_{t}^{(i)} = μ^{(i)} d t + ρ^{(i)} σ_{t}^{(i)} d B_{t}^{(i)} + \sqrt{1 - {(ρ^{(i)})}^{2}} σ_{t}^{(i)} d W_{t} + ν^{(i)} d Z_{t}, i = 1, \dots, p,

(24)

where the elements of B, W and Z are independent standard Brownian motions. The spot volatility obeys the independent Ornstein-Uhlenbeck processes:

d ϱ_{t}^{(i)} = α^{(i)} (β_{0}^{(i)} - ϱ_{t}^{(i)}) d t + β_{1}^{(i)} d U_{t}^{(i)},

(25)

where $ϱ_{t}^{(i)} = \log σ_{t}^{(i)}$ and $U_{t}^{(i)}$ is an independent Brownian motion.

The number of assets p is taken to be 50. Slightly modified from Barndorff-Nielsen et al. (2008), the parameters are set to be $(μ^{(i)}, β_{0}^{(i)}, β_{1}^{(i)}, α^{(i)}, ρ^{(i)}) = (0.03 x_{1}^{(i)}, - x_{2}^{(i)}, 0.75 x_{3}^{(i)}, - 1 ∕ 40 x_{4}^{(i)}, - 0.7)$ and $ν^{(i)} = \exp (β_{0}^{(i)})$ , where $x_{j}^{(i)}$ is an independent realization from the uniform distribution on [0.7, 1.3]. The parameters are kept fixed in the simulations.

The model (24) is used to generate the latent log-price values with initial values $X_{0}^{(i)} = 1$ (log-price) and $ϱ_{0}^{(i)}$ from its stationary distribution. The Euler scheme is used to generate the latent prices at the frequency of once per second. To account for the market microstructure noise, the Gaussian noises $ε_{t}^{(i)} \overset{i.i.d.}{\sim} N (0, ω^{2})$ with ω = 0.0005 are added. Therefore, like (14), the observed log-prices are $X_{t}^{o (i)} = X_{t}^{(i)} + ε_{t}^{(i)}$ .

To model the non-synchronicity, p independent Poisson processes with intensity parameters Γ₁, Γ₂, …, Γ_p are used to simulate the trading times of the assets. Motivated by the US equity trading dataset (the total number of seconds in a common trading day of the US equity is 23400), we set the trading intensity parameters Γ_i’s to be 0.02i × 23400 for i = 1, 2, …, 50, meaning that the average numbers of trading times for the assets are spread out in an arithmetic sequence over the interval [468, 23400].
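As a minimal sketch of this sampling design, the trading times of asset i can be drawn from a homogeneous Poisson process by accumulating exponential inter-arrival times. The sketch assumes time is measured in seconds and that asset i trades at a rate of 0.02i per second, so that its expected number of trades per day is 0.02i × 23400; the function name and arguments are illustrative.

```python
import random

SECONDS_PER_DAY = 23400  # length of a US equity trading day in seconds

def simulate_trade_times(n_assets=50, rate_step=0.02,
                         horizon=SECONDS_PER_DAY, seed=0):
    """Return one sorted list of trade times (in seconds) per asset.

    Asset i (1-based) trades according to a Poisson process with rate
    rate_step * i per second (an assumed parametrization).
    """
    rng = random.Random(seed)
    all_times = []
    for i in range(1, n_assets + 1):
        rate = rate_step * i
        t, times = 0.0, []
        while True:
            t += rng.expovariate(rate)  # exponential inter-arrival time
            if t > horizon:
                break
            times.append(t)
        all_times.append(times)
    return all_times
```

Calling `simulate_trade_times()` with the defaults gives 50 lists whose lengths average roughly 468, 936, …, 23400.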

4.2 An oracle investment strategy and risk assessment

An oracle investment strategy is usually a decent benchmark against which to compare other portfolio strategies. There are several oracle strategies. The one we choose is to make portfolio allocation based on the covariance matrix estimated using latent prices at the finest grid (one per second). Latent prices are the noise-free prices of each asset at every time point (one per second), which are unobservable in practice and are available to us only in the simulation. Therefore, for each asset, there are 23400 latent prices in a normal trading day. We will refer to the investment strategy based on the latent prices as the oracle or latent strategy. This strategy is not available for the empirical studies.

The assessment of risk is based on the high-frequency data. For a given portfolio strategy, its risk is computed based on the latent prices at every 15 minutes for the simulation studies; whereas for the empirical studies, the observed prices at every 15 minutes are used to assess its risk. This mitigates the influence of the microstructure noises. For the empirical study, we do not hold positions overnight and are therefore immune to the overnight price jumps (we will discuss the details in Section 5).

4.3 Out-of-sample Optimal Allocation

One of the main purposes of this paper is to investigate the comparative advantage of the high frequency based methods against the low frequency based method, especially in the context of portfolio investment. Hence, it is essential for us to run the following out-of-sample investment strategy test which includes both the high frequency and low frequency based approaches. Moreover, since in the empirical studies, we do not know the latent asset prices, the out-of-sample test should be designed so that it can also be conducted in the empirical studies.

We simulate the prices of 50 traded assets as described in section 4.1 for the duration of 200 trading days (numbered as day 1, day 2, …, day 200) and record all the tick-by-tick trading times and trading prices of the assets.

We start investing 1 unit of capital into the pool of assets with low frequency and high frequency based strategies from day 101 (the portfolios are bought at the opening of day 101). For the low frequency strategy, we use the previous 100 trading days’ daily closing prices to compute the sample covariance matrix and make the portfolio allocation accordingly with the gross exposure constraints. For the high frequency strategies, we use the previous h = 10 trading days’ tick-by-tick trading data. For the all-refresh strategy, we use all-refresh time to synchronize the trades of the assets before applying TSCV to estimate the integrated volatility matrix and make the portfolio allocation; while for the pairwise-refresh high frequency strategy, we use pairwise-refresh times to synchronize each pair of assets and apply TSCV to estimate the integrated covariance for the corresponding pair. With the projection technique (23), the resulting TSCV integrated volatility matrix can always be transformed to a positive semi-definite matrix which facilitates the optimization.
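The synchronization step can be sketched as follows, assuming the usual definition of refresh time from Barndorff-Nielsen et al. (2008): the first refresh time is the first instant by which every asset has traded at least once, and each subsequent refresh time is the first instant by which every asset has traded again. Applying the function to all assets gives all-refresh times; applying it to two assets at a time gives pairwise-refresh times. The function names are illustrative.

```python
import bisect
from itertools import combinations

def refresh_times(trade_times):
    """Refresh times for a list of sorted trade-time lists (one per asset)."""
    result = []
    pos = [0] * len(trade_times)  # index of the next unused trade per asset
    while all(p < len(ts) for p, ts in zip(pos, trade_times)):
        # All assets have traded again once the slowest one has traded.
        tau = max(ts[p] for p, ts in zip(pos, trade_times))
        result.append(tau)
        # Move each asset to its first trade strictly after tau.
        pos = [bisect.bisect_right(ts, tau) for ts in trade_times]
    return result

def pairwise_refresh_times(trade_times):
    """Refresh times computed separately for every pair of assets."""
    return {(i, j): refresh_times([trade_times[i], trade_times[j]])
            for i, j in combinations(range(len(trade_times)), 2)}
```

For example, with trade times `[1, 2, 5, 7]` and `[3, 4, 6]` the refresh times are `[3, 5, 7]`. Adding a third, rarely traded asset can only thin out the all-refresh grid, which is why the pairwise grids retain more observations on average.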

We run two investment plans. In the first plan, the portfolio is held for τ = 1 trading day before we re-estimate the covariation structure and adjust the portfolio weights accordingly. The second plan is the same as the first one except for the fact that the portfolio is held for τ = 5 trading days before rebalancing.

In the investment horizon (which is from day 101 to day 200 in this case), we record the 15-minute portfolio returns based on the latent prices of the assets, the variation of the portfolio weights across 50 assets, and other relevant characteristics. While it appears that 100 trading days is short, calculating 15-minute returns increases the size of the relevant data for computing the risk by a factor of 26.
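The factor of 26 is simply the number of 15-minute intervals in a 23400-second trading day. A minimal sketch of how a risk figure could be computed from such returns is given below; the annualization by 252 trading days per year is an assumption made for illustration, since the convention used in the paper is not stated in this section.

```python
import math
import statistics

SECONDS_PER_DAY = 23400
INTERVAL_SECONDS = 15 * 60
PERIODS_PER_DAY = SECONDS_PER_DAY // INTERVAL_SECONDS  # 26 returns per day
TRADING_DAYS_PER_YEAR = 252  # assumed annualization convention

def annualized_risk(returns_15min):
    """Annualized standard deviation of a series of 15-minute returns."""
    per_period_sd = statistics.stdev(returns_15min)
    return per_period_sd * math.sqrt(PERIODS_PER_DAY * TRADING_DAYS_PER_YEAR)
```

With 100 trading days, the risk is therefore estimated from 2600 returns rather than 100.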

We study those portfolio features for a whole range of gross exposure constraint c from c = 1, which stands for the no-short-sale portfolio strategy, to c = 3. This is usually the relevant range of gross exposure for investment purpose.
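The meaning of c in terms of long and short positions follows from a short calculation, sketched here under the assumption that the weights sum to one and that the gross-exposure constraint is $\|w\|_{1} \leq c$. The symbols $w^{+}$ and $w^{-}$ for the total long and total short positions are introduced only for this illustration.

```latex
w^{+} = \sum_{i} \max(w_i, 0), \qquad w^{-} = \sum_{i} \max(-w_i, 0),
\qquad w^{+} - w^{-} = 1, \qquad w^{+} + w^{-} = \|w\|_{1} \leq c
\;\Longrightarrow\;
w^{-} \leq \frac{c - 1}{2}, \qquad w^{+} \leq \frac{c + 1}{2}.
```

Thus c = 1 forces $w^{-} = 0$ (no short sales), while c = 3 allows up to 100% short positions against 200% long positions.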

The standard deviations and other characteristics of the strategy for τ = 1 are presented in Table 1 (the case τ = 5 gives similar comparisons and hence is omitted). The standard deviations, which are calculated based on the 15-minute returns as mentioned above, represent the actual risks of the strategy. As we only optimize the risk profile, we should not place much weight on the returns of the optimal portfolios. They cannot even be estimated with good accuracy over such a short investment horizon. Figures 1 and 2 provide graphical details of these characteristics for both τ = 1 and τ = 5.

Table 1. The out-of-sample performance of daily-rebalanced optimal portfolios with gross-exposure constraint.

We simulate one trial of intra-day trading data for 50 assets, make portfolio allocations for 100 trading days and rebalance daily. The standard deviations and other characteristics of these portfolios are recorded. All the characteristics are annualized (Max Weight: Median of maximum weights; Min Weight: Median of minimum weights; No. of Long: Median of numbers of long positions whose weights exceed 0.001; No. of Short: Median of numbers of short positions whose absolute weights exceed 0.001)

Methods	Std Dev %	Max Weight	Min Weight	No. of Long	No. of Short
Low Frequency Sample Covariance Matrix Estimator
c = 1 (No short)	16.69	0.19	−0.00	13	0
c = 2	16.44	0.14	−0.05	28.5	20
c = 3	16.45	0.14	−0.05	28.5	20

High Frequency All-Refresh TSCV Covariance Matrix Estimator
c = 1 (No short)	16.08	0.20	−0.00	15	0
c = 2	14.44	0.14	−0.05	30	19
c = 3	14.44	0.14	−0.05	30	19

High Frequency Pairwise-Refresh TSCV Covariance Matrix Estimator
c = 1 (No short)	15.34	0.18	−0.00	15	0
c = 2	12.72	0.13	−0.03	31	18
c = 3	12.72	0.13	−0.03	31	18


Figure 1. Out-of-sample performance of daily-rebalanced optimal portfolios based on high-frequency and low-frequency estimation of the integrated covariance matrix. (a) Annualized risk of portfolios. (b) Maximum weight of allocations.

Figure 2. Out-of-sample performance of optimal portfolios based on high-frequency and low-frequency estimation of the integrated covariance matrix with holding period τ = 5.

From Table 1 and Figures 1 and 2, we see that for both holding lengths τ = 1 and τ = 5, the all-refresh TSCV and pairwise-refresh TSCV approaches significantly outperform the low frequency one in terms of risk profile for the whole range of the gross exposure constraints. This supports our theoretical results and intuitions. First, the shorter estimation window allows these two high frequency approaches to deliver consistently better results than the low frequency one. Second, the pairwise method outperforms the all-refresh method, as expected. Finally, both the low-frequency strategy and the high-frequency strategies significantly outperform the equal-weight portfolio (see Figure 1 and Figure 2).

All the risk curves attain their minimum around c = 1.2 (see Figure 1 and Figure 2), which meets our expectation again, since that must be the point where the marginal increase in estimation error outpaces the marginal decrease in specification error. This, coupled with the result we get in the empirical studies section, will give us some guidelines about what gross exposure constraint to use in investment practice.

In terms of portfolio weights, neither the low frequency nor the high frequency optimal no-short-sale portfolios are well diversified with all approaches assigning a concentrated weight of around 20% to one individual asset. Their portfolio risks can be improved by relaxing the gross-exposure constraint (Figure 1 and Figure 2).

5 Empirical Studies

The risk minimization problem (6) has important applications in asset allocation. We demonstrate its application to stock portfolio investment in the 30 Dow Jones Industrial Average (DJIA) constituent stocks (called the 30 DJIA stocks for short).

To make asset allocation, we use the high frequency data of the 30 DJIA stocks from Jan 1, 2008 to September 30, 2008. These stocks are highly liquid. The period covers the onset of the financial crisis in 2008.

At the end of each holding period of τ = 1 or τ = 5 trading days in the investment period (from May 27, 2008 to Sep 30, 2008), the covariance of the 30 stocks is estimated according to the different estimators. They are the sample covariance of the previous 100 trading days’ daily return data (low-frequency), the all-refresh TSCV estimator of the previous 10 trading days, and the pairwise-refresh TSCV estimator of the previous 10 trading days. These estimated covariance matrices are used to construct optimal portfolios with a range of exposure constraints. For τ = 5, we do not count the overnight risks of the portfolio. The reason is that the overnight price jumps are often due to the arrival of news and are irrelevant to the topic of our study. The standard deviations and other characteristics of these portfolio returns for τ = 1 are presented in Table 2 together with the standard deviation of an equally weighted portfolio of the 30 DJIA stocks rebalanced daily. The standard deviations are for the 15-minute returns, which represent the actual risks. Figure 3 and Figure 4 provide the graphical details of these characteristics for both τ = 1 and τ = 5.

Table 2. The out-of-sample performance of daily-rebalanced optimal portfolios of the 30 DJIA stocks.

Methods	Std Dev %	Max Weight	Min Weight	No. of Long	No. of Short
Low Frequency Sample Covariance Matrix Estimator
c = 1 (No short)	12.73	0.50	−0.00	8	0
c = 2	14.27	0.44	−0.12	16	10
c = 3	15.12	0.45	−0.18	18	12

High Frequency All-Refresh TSCV Covariance Matrix Estimator
c = 1 (No short)	12.55	0.40	−0.00	8	0
c = 2	12.36	0.36	−0.10	17	12
c = 3	12.50	0.36	−0.10	17	12

High Frequency Pairwise-Refresh TSCV Covariance Matrix Estimator
c = 1 (No short)	12.54	0.39	−0.00	9	0
c = 2	12.23	0.35	−0.08	17	12
c = 3	12.34	0.35	−0.08	17	12

Unmanaged Index
Dow Jones 30 equally weighted	22.12


Figure 3. Out-of-sample performance of daily-rebalanced optimal portfolios for Dow Jones 30 constituent stocks with investment period from May 27, 2008 to Sep 30, 2008 (89 trading days). (a) Annualized risk of portfolios. (b) Maximum weight of allocations.

Figure 4. Out-of-sample performance of 5-day-rebalanced optimal portfolios for Dow Jones 30 constituent stocks with investment period from May 27, 2008 to Sep 30, 2008 (89 trading days). (a) Annualized risk of portfolios. (b) Maximum weight of allocations.

Table 2, Figures 3 and 4 reveal that in terms of the portfolio’s actual risk, the all-refresh TSCV and pairwise-refresh TSCV strategies perform at least as well as the low frequency based strategy when the gross exposure is small and outperform the latter significantly when the gross exposure is large. Both facts support our theoretical results and intuitions. Given 10 times the length of covariance estimation window, the low frequency approach still cannot perform better than the high frequency TSCV approaches, which affirms our belief that the high frequency TSCV approaches can significantly shorten the necessary covariance estimation window and capture better the short-term time-varying covariation structure (or the “local” covariance). These results, together with the ones presented in the simulation section, lend strong support to the above statement.

As the gross exposure constraint increases, the portfolio risk of the low frequency approach increases drastically relative to the ones of the high frequency TSCV approaches. The reason could be a combination of the fact that the low frequency approach does not produce a well-conditioned estimated covariance due to the lack of data and the fact that the low frequency approach can only attain the long run covariation but cannot capture well the “local” covariance dynamics. The portfolio risk of the high frequency TSCV approaches increases only moderately as the gross exposure constraint increases. From a financial practitioner’s standpoint, that is also one of the comparative advantages of high frequency TSCV approaches, which means that investors do not need to be much concerned about the choice of the gross exposure constraint while using the high frequency TSCV approaches.

It can be seen that both the low frequency and high frequency optimal no-short-sale portfolios are not diversified enough. Their risk profiles can be improved by relaxing the gross-exposure constraint to around c = 1.2, i.e. 10% short positions and 110% long positions are allowed. The no-short-sale portfolios under all approaches have the maximum portfolio weight of 22% to 50%. As the gross exposure constraint relaxes, the pairwise-refresh TSCV approach has its maximum weight reaching the smallest value around 30% to 34% while the low frequency approach goes down to only around 40%. That is another comparative advantage of the high frequency approach in practice, as a portfolio with less weight concentration is typically considered preferable.

Another interesting fact is that the equally weighted daily-rebalanced portfolio of the 30 DJIA stocks carries an annualized return of only −10% while DJIA went down 13.5% during the same period (May 27, 2008 to Sep 30, 2008), giving an annualized return of −38.3%. The cause of the difference is that we intentionally avoided holding portfolios overnight, hence the portfolios are not affected by the overnight price jumps. In the turbulent financial market of May to September 2008, that means our portfolio strategies are not affected by the numerous sizeable downward jumps. Those jumps are mainly caused by the news of distressed economy and corporations. The moves could deviate far from what the previously held covariation structure dictates.

6 Conclusion

We advocate portfolio selection with the gross-exposure constraint (Fan et al., 2011). It is less sensitive to the error of covariance estimation and mitigates the noise accumulation. The out-of-sample portfolio performance depends on the expected volatility in the holding period. It is at best approximated, and the gross-exposure constraints help reduce the error accumulation in the approximations.

Two approaches are proposed for the use of high-frequency data to estimate the integrated covariance: “all-refresh” and “pairwise-refresh” methods. The latter retains far more data on average and hence estimates more precisely element by element. Yet, the pairwise-refresh estimates are often not positive semi-definite and projections are needed for the convex optimization algorithms. The projection distorts somewhat the performance of the pairwise-refresh strategies. New optimization algorithms need to be developed in order to take full advantage of pairwise-refresh. Further investigations on the relative merits of “pairwise-refresh”, “blocking approach”, and “all-refresh” are needed.

The use of high frequency financial data significantly increases the available sample size for volatility estimation, and hence shortens the time window for estimation and adapts better to local covariations. Our theoretical observations are supported by the empirical studies and simulations, in which we demonstrate convincingly that the high-frequency based strategies outperform the low-frequency based one in general.

With the gross-exposure constraint, the impact of the size of the candidate pool for portfolio allocation is limited. We derive the concentration inequalities to demonstrate this theoretically. Simulation and empirical studies also lend further support to it.

A APPENDIX

Conditions and Proofs

A.1 Conditions

We derive our theoretical results under the following conditions. For simplicity, we state the conditions for integrated covariation (Theorem 2). The conditions for integrated volatility (Theorem 1) are simply the ones with Y = X.

Condition 1. The drift processes are such that $μ_{t}^{(X)} = μ_{t}^{(Y)} = 0$ for all t ∈ [0, 1].

Condition 2. $σ_{t}^{(i)}$ , i = X, Y are continuous stochastic processes which are either bounded by a constant 0 < C_σ < ∞, or such that the tail behavior can be described by

P \{\sup_{0 \leq t \leq 1} σ_{t}^{(i)} \geq C_{σ}\} \leq k_{σ} \exp \{- a C_{σ}^{b}\}, \quad \text{for any } C_{σ} > 0,

with positive constants k_σ, a and b.

Condition 3. The observation times are independent of the X and Y processes. The synchronized observation times for the X and Y processes satisfy $\sup_{1 \leq j \leq \tilde{n}} \tilde{n} \cdot (v_{j} - v_{j - 1}) \leq C_{Δ} < \infty$ , where C_Δ is a non-random constant, $\tilde{n}$ is the observation frequency and $V = \{v_{0}, v_{1}, \dots, v_{\tilde{n}}\}$ is the set of refresh times of the processes X and Y .

Condition 4. For the TSCV parameters, we consider the case when J = 1 $({\overset{‒}{n}}_{J} = \tilde{n})$ and ${\overset{‒}{n}}_{K} = O ({\tilde{n}}^{1 ∕ 3})$ such that $\frac{1}{2} \cdot {\tilde{n}}^{1 ∕ 3} \leq {\overset{‒}{n}}_{K} \leq 2 \cdot {\tilde{n}}^{1 ∕ 3}$ .

Condition 5. The processes ε^X and ε^Y are independent.

Remark 1. Conditions 1 and 4 are imposed for simplicity. They can be removed at the expense of lengthier proofs. For a short horizon and high frequency, whether Condition 1 holds or not has little impact on the investment. For estimating integrated volatility, the synchronized time becomes the observation time $\{τ_{n, j}\}$ and Conditions 3 and 4 become

\sup_{1 \leq j \leq n} n \cdot (τ_{n, j} - τ_{n, j - 1}) \leq C_{Δ} < \infty

(26)

and $\frac{1}{2} \cdot n^{1 ∕ 3} \leq {\overset{‒}{n}}_{K} \leq 2 \cdot n^{1 ∕ 3}$ .

Remark 2. Note that, in the derivations of Theorem 1 and Theorem 2, $σ_{t}^{(X)}$ and $σ_{t}^{(Y)}$ are only required to be càdlàg. In other words, the continuity assumption is not needed for the concentration inequalities to hold. The continuity of the volatility processes is only needed in the approximation (5) in our study; it can be removed or relaxed for other applications of the concentration inequalities.

A.2 Lemmas

We need the following three lemmas for the proofs of Theorems 1 and 2. In particular, Lemma 2 is an exponential-type inequality for any dependent random variables that have a finite moment generating function. It is useful for many statistical learning problems. Lemma 3 is a concentration inequality for the realized volatility based on a discretely observed latent process.

Lemma 1. When Z ~ N(0, 1), for any $∣ θ ∣ \leq \frac{1}{4}$ , E exp{θ(Z² − 1)} ≤ exp(2θ²).

Proof. Using the moment generating function of $Z^{2} ~ χ_{1}^{2}$ , we have

E \exp {θ (Z^{2} - 1)} = \exp {- \frac{1}{2} \log (1 - 2 θ) - θ} .

Let g(x) = log(1 − x) + x + x² with |x| ≤ 1/2. Then, g’(x) = x(1 − 2x)/(1 − x) is nonnegative when x ∈ [0, 1/2] and negative when x ∈ [−1/2, 0). In other words, g(x) has a minimum at point 0, namely g(x) ≥ 0 for |x| ≤ 1/2. Consequently, for |θ| ≤ 1/4, log(1 − 2θ) ≥ −2θ − (2θ)². Hence, E exp{θ(Z² − 1)} ≤ exp(2θ²).
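The inequality of Lemma 1 can be checked numerically from the closed form of the moment generating function used in the proof. The sketch below evaluates both sides on a grid of θ values in [−1/4, 1/4] and also at one point outside that range; the grid and the small floating-point tolerance are arbitrary illustrative choices.

```python
import math

def log_mgf_centered_chi2(theta):
    """log E exp{theta (Z^2 - 1)} for Z ~ N(0, 1), valid for theta < 1/2."""
    return -0.5 * math.log(1.0 - 2.0 * theta) - theta

def lemma1_holds(theta, tol=1e-15):
    """Check log E exp{theta (Z^2 - 1)} <= 2 theta^2 up to rounding error."""
    return log_mgf_centered_chi2(theta) <= 2.0 * theta ** 2 + tol

# Grid over the range |theta| <= 1/4 covered by Lemma 1.
grid = [k / 1000.0 for k in range(-250, 251)]
inside_ok = all(lemma1_holds(t) for t in grid)

# The bound need not hold for larger positive theta, e.g. theta = 0.45.
outside_ok = lemma1_holds(0.45)
```

On this grid the bound holds throughout, while at θ = 0.45 it fails, which illustrates why the range of θ has to be restricted.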

Lemma 2. For a set of random variables X_i, i = 1, …, K, and an event A, if there exist two positive constants C₁ and C₂ such that for all |θ| ≤ C₁,

E (\exp (θ X_{i}) I_{A}) \leq \exp (C_{2} θ^{2}),

(27)

then for w_i’s being weights satisfying $\sum_{i = 1}^{K} ∣ w_{i} ∣ \leq w \in [1, \infty)$ , we have

P ({∣ \sum_{i = 1}^{K} w_{i} X_{i} ∣ > x} \cap A) \leq 2 \exp (- \frac{x^{2}}{4 C_{2} w^{2}}), when 0 \leq x \leq 2 C_{1} C_{2} .

Proof. By the Markov inequality, for 0 ≤ ξ ≤ C₁/w and $w^{*} = \sum_{i = 1}^{K} ∣ w_{i} ∣$ , we have

\begin{aligned} P (\{∣ \sum_{i = 1}^{K} w_{i} X_{i} ∣ > x\} \cap A) & \leq \exp (- ξ x) E (\exp (ξ ∣ \sum_{i = 1}^{K} w_{i} X_{i} ∣) I_{A}) \\ & \leq \exp (- ξ x) {(w^{*})}^{- 1} \sum_{i = 1}^{K} ∣ w_{i} ∣ E (\exp (ξ w ∣ X_{i} ∣) I_{A}) \\ & \leq 2 \exp (C_{2} ξ^{2} w^{2} - ξ x) . \end{aligned}

(28)

Taking ξ = x/(2C₂w²), we have

P ({∣ \sum_{i = 1}^{K} w_{i} X_{i} ∣ > x} \cap A) \leq 2 \exp (- \frac{x^{2}}{4 C_{2} w^{2}}), when 0 \leq x \leq 2 C_{1} C_{2} .

(29)
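The choice of the free parameter in the proof of Lemma 2 (written ξ in (28)) is the one that minimizes the exponent in the last line of (28). A short worked version of that step, including why the stated range of x suffices, is:

```latex
\frac{d}{d\xi}\left(C_2 \xi^2 w^2 - \xi x\right) = 2 C_2 w^2 \xi - x = 0
\;\Longrightarrow\; \xi = \frac{x}{2 C_2 w^2},
\qquad
C_2 \xi^2 w^2 - \xi x \,\Big|_{\xi = x/(2 C_2 w^2)} = -\frac{x^2}{4 C_2 w^2}.
```

The requirement ξ ≤ C₁/w is equivalent to x ≤ 2C₁C₂w, which is implied by x ≤ 2C₁C₂ because w ≥ 1.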

Lemma 3. (A Concentration Inequality for Realized Volatility) Let ${[\tilde{X, X}]}_{1} = \sum_{i = 1}^{n} {(X_{v_{i}} - X_{v_{i - 1}})}^{2}$ be the realized volatility based on the discretely observed X process from model (1) of the univariate case: $d X_{t} = μ_{t} d t + σ_{t} d W_{t}$ . Under Conditions 1-3,

if $σ_{t}^{(X)} \leq C_{σ} < \infty$ for all t ∈ [0, 1], then for all large n, for $x \in [0, c \sqrt{n}]$ ,
$P \{n^{1 ∕ 2} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \leq 2 \exp \{- C x^{2}\},$
where the constants c and C can be taken as in (32).
If the tail behavior of $σ_{t}^{(X)}$ satisfies
$P \{\sup_{0 \leq t \leq 1} σ_{t}^{(X)} \geq C_{σ}\} \leq k_{σ} \exp \{- a C_{σ}^{b}\}, \text{ for any } C_{σ} > 0$
with positive constants k_σ, a and b, then for all large n, for $x \in [0, c n^{\frac{4 + b}{2 b}}]$ ,
$P \{n^{1 ∕ 2} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \leq (2 + k_{σ}) \exp \{- C x^{\frac{2 b}{4 + b}}\} .$
A set of candidate values for c and C are given in (33).

Proof. For any constant C_σ > 0, define a stopping time $Γ_{C_{σ}} ≔ \inf {t : \sup_{0 \leq s \leq t} σ_{s} > C_{σ}} \land 1$ .

Let

{\tilde{σ}}_{s} = \begin{cases} σ_{s}, & \text{when } s \leq Γ_{C_{σ}}, \\ C_{σ}, & \text{when } s > Γ_{C_{σ}}, \end{cases}

and ${\tilde{X}}_{t} = \int_{0}^{t} {\tilde{σ}}_{s} d W_{s}$ . By time-change for martingales, (see, for example, Theorem 4.6 of chapter 3 of Karatzas and Shreve (2000)), if $τ_{t} = \inf {s : {[\tilde{X}]}_{s} \geq t}$ where ${[\tilde{X}]}_{s}$ is the quadratic variation process, then $B_{t} ≔ {\tilde{X}}_{τ_{t}}$ is a Brownian-motion w.r.t. ${F_{τ_{t}}}_{0 \leq t \leq \infty}$ . We then have that

E \exp (θ ({\tilde{X}}_{t}^{2} - \int_{0}^{t} {\tilde{σ}}_{s}^{2} d s)) = E \exp (θ (B_{{[\tilde{X}]}_{t}}^{2} - {[\tilde{X}]}_{t})) .

Note further that for any t, ${[\tilde{X}]}_{t}$ is a stopping time w.r.t. ${\{F_{τ_{s}}\}}_{0 \leq s \leq \infty}$ , and the process $\exp (θ (B_{s}^{2} - s))$ is a sub-martingale for any θ. By the optional sampling theorem, using ${[\tilde{X}]}_{u} \leq C_{σ}^{2} u$ (bounded stopping time), we have

E \exp (θ (B_{{[\tilde{X}]}_{u}}^{2} - {[\tilde{X}]}_{u})) \leq E \exp (θ (B_{C_{σ}^{2} u}^{2} - C_{σ}^{2} u)) .

Therefore, noting that $Δ X_{i} = Δ {\tilde{X}}_{i}$ on the set $\{Γ_{C_{σ}} = 1\}$ , we have that, under Condition 3,

E (\exp \{θ \sqrt{n} ({(Δ X_{i})}^{2} - \int_{v_{i - 1}}^{v_{i}} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}} ∣ F_{v_{i - 1}}) \leq E \exp \{θ \sqrt{n} (B_{\frac{C_{σ}^{2} C_{Δ}}{n}}^{2} - \frac{C_{σ}^{2} C_{Δ}}{n})\} = E \exp \{θ \frac{C_{σ}^{2} C_{Δ}}{\sqrt{n}} (Z^{2} - 1)\},

(30)

where Z ~ N(0, 1) and $Δ X_{i} = X_{v_{i}} - X_{v_{i - 1}}$ .

It follows from the law of iterated expectations and (30) that

\begin{aligned} & E (\exp \{θ \sqrt{n} ({[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}}) \\ & = E [\exp \{θ \sqrt{n} (\sum_{i = 1}^{n - 1} {(Δ X_{i})}^{2} - \int_{0}^{v_{n - 1}} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}} \cdot E (\exp \{θ \sqrt{n} ({(Δ X_{n})}^{2} - \int_{v_{n - 1}}^{v_{n}} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}} ∣ F_{v_{n - 1}})] \\ & \leq E (\exp \{θ \sqrt{n} (\sum_{i = 1}^{n - 1} {(Δ X_{i})}^{2} - \int_{0}^{v_{n - 1}} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}}) \cdot E \exp \{θ \frac{C_{σ}^{2} C_{Δ}}{\sqrt{n}} (Z^{2} - 1)\}, \end{aligned}

where Z ~ N(0, 1). Repeating the process above, we obtain

E (\exp \{θ \sqrt{n} ({[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}}) \leq {(E \exp \{θ \frac{C_{σ}^{2} C_{Δ}}{\sqrt{n}} (Z^{2} - 1)\})}^{n} .

By Lemma 1, we have for $∣ θ ∣ \leq \frac{\sqrt{n}}{4 C_{σ}^{2} C_{Δ}}$ ,

E (\exp \{θ \sqrt{n} ({[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t)\} I_{\{Γ_{C_{σ}} = 1\}}) \leq \exp \{2 θ^{2} C_{σ}^{4} C_{Δ}^{2}\} .

(31)

By Lemma 2, we have,

P (\{n^{1 ∕ 2} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \leq 2 \exp \{- \frac{x^{2}}{8 C_{σ}^{4} C_{Δ}^{2}}\},

(32)

when $0 \leq x \leq C_{σ}^{2} C_{Δ} \sqrt{n}$ . This proves the first half of the lemma. For the second half,

\begin{aligned} & P \{n^{1 ∕ 2} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \\ & \leq P \{n^{1 ∕ 2} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x, Γ_{C_{σ}} = 1\} + P \{Γ_{C_{σ}} < 1\} \\ & \leq \exp \{- θ x\} E (\exp \{θ \sqrt{n} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣\} I_{\{Γ_{C_{σ}} = 1\}}) + P \{Γ_{C_{σ}} < 1\} \quad \text{for nonnegative } θ \\ & \leq 2 \exp \{- \frac{x^{2}}{8 C_{σ}^{4} C_{Δ}^{2}}\} + k_{σ} \exp \{- a C_{σ}^{b}\}, \quad \text{when } 0 \leq x \leq C_{σ}^{2} C_{Δ} \sqrt{n} . \end{aligned}

Now, letting $C_{σ} = {(\frac{x^{2}}{8 a C_{Δ}^{2}})}^{\frac{1}{4 + b}}$ , so that the two exponents above coincide, we have that when $0 \leq x \leq C_{σ}^{2} C_{Δ} \sqrt{n}$ ,

P {n^{1 ∕ 2} ∣ {[\tilde{X, X}]}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x} \leq (2 + k_{σ}) \exp {- 8^{\frac{- b}{4 + b}} \cdot a^{\frac{4}{4 + b}} \cdot C_{Δ}^{\frac{- 2 b}{4 + b}} \cdot x^{\frac{2 b}{4 + b}}} .

(33)

A.3 Proof of Theorem 1

We first prove the results conditioning on the set of observation times $V$ . Recall the notation introduced in sections 3.2 and 3.3. Let n be the observation frequency. For simplicity of notation, without ambiguity, we will write $τ_{n, i}$ as $τ_{i}$ and $σ_{t}^{(X)}$ as $σ_{t}$ . Again let $Γ_{C_{σ}} ≔ \inf \{t : \sup_{0 \leq s \leq t} σ_{s} > C_{σ}\} \land 1$ . Denote the TSRV based on the latent process by

{〈 \tilde{X, X} 〉}_{1}^{(K)} = {[\tilde{X, X}]}_{1}^{(K)} - \frac{{\overset{‒}{n}}_{K}}{{\overset{‒}{n}}_{J}} {[\tilde{X, X}]}_{1}^{(J)},

(34)

where ${[\tilde{X, X}]}_{1}^{(K)} = K^{- 1} \sum_{i = K}^{n} {(X_{τ_{i}} - X_{τ_{i - K}})}^{2}$ . Then, from the definition, we have,

\begin{aligned} {〈 \hat{X, X} 〉}_{1} & = {[\tilde{X, X}]}_{1}^{(K)} + {[\tilde{∊^{X}, ∊^{X}}]}_{1}^{(K)} + 2 {[\tilde{X, ∊^{X}}]}_{1}^{(K)} - \frac{{\overset{‒}{n}}_{K}}{{\overset{‒}{n}}_{J}} ({[\tilde{X, X}]}_{1}^{(J)} + {[\tilde{∊^{X}, ∊^{X}}]}_{1}^{(J)} + 2 {[\tilde{X, ∊^{X}}]}_{1}^{(J)}) \\ & = \frac{1}{K} \sum_{ℓ = 0}^{K - 1} V_{K}^{(ℓ)} - \frac{{\overset{‒}{n}}_{K}}{{\overset{‒}{n}}_{J}} {[\tilde{X, X}]}_{1}^{(J)} + R_{1} + R_{2}, \end{aligned}

(35)

where $R_{1} = {[\tilde{∊^{X}, ∊^{X}}]}_{1}^{(K)} - \frac{{\overset{‒}{n}}_{K}}{{\overset{‒}{n}}_{J}} {[\tilde{∊^{X}, ∊^{X}}]}_{1}^{(1)}$ , $R_{2} = 2 {[\tilde{X, ∊^{X}}]}_{1}^{(K)} - 2 \frac{{\overset{‒}{n}}_{K}}{{\overset{‒}{n}}_{J}} {[\tilde{X, ∊^{X}}]}_{1}^{(1)}$ , and

V_{K}^{(ℓ)} = \sum_{j = 1}^{{\overset{‒}{n}}_{K}} {(X_{τ_{j K + ℓ}} - X_{τ_{(j - 1) K + ℓ}})}^{2}, \quad \text{for } ℓ = 0, 1, \dots, K - 1 .

Note that we have assumed that ${\overset{‒}{n}}_{K} = \frac{n - K + 1}{K}$ is an integer above, to simplify the presentation.
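For readers who prefer code to notation, the two-scale construction can be sketched as follows. The sketch assumes J = 1 (so that the fast-scale count equals n), takes a list of n + 1 observed log-prices on the synchronized grid, and uses the averaged subsampled realized volatility with lag K together with the ratio (n − K + 1)/(K n) as the bias-correction weight; function names are illustrative.

```python
def averaged_rv(log_prices, lag):
    """[Y, Y]^(lag): (1/lag) * sum over i >= lag of (Y_i - Y_{i-lag})^2."""
    total = sum((log_prices[i] - log_prices[i - lag]) ** 2
                for i in range(lag, len(log_prices)))
    return total / lag

def tsrv(log_prices, big_k):
    """Two-scale realized volatility with slow scale K and fast scale J = 1."""
    n = len(log_prices) - 1          # number of increments
    n_bar_k = (n - big_k + 1) / big_k
    n_bar_j = n                      # J = 1
    slow = averaged_rv(log_prices, big_k)
    fast = averaged_rv(log_prices, 1)
    return slow - (n_bar_k / n_bar_j) * fast
```

The fast-scale term is dominated by microstructure noise, so subtracting the rescaled fast-scale quantity removes the noise-induced bias of the slow-scale average. As a quick arithmetic check, for the sequence 0, 1, …, 10 and K = 2 the slow-scale term is 18, the fast-scale term is 10, and the estimator equals 18 − 0.45 × 10 = 13.5.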

Recall that we consider the case when J = 1, or ${\overset{‒}{n}}_{J} = n$ . Let

I_{1} = \frac{1}{K} \sum_{ℓ = 0}^{K - 1} \sqrt{{\overset{‒}{n}}_{K}} (V_{K}^{(ℓ)} - \int_{0}^{1} σ_{t}^{2} d t) - {(\frac{{\overset{‒}{n}}_{K}}{n})}^{\frac{3}{2}} \cdot \sqrt{n} ({[\tilde{X, X}]}_{1}^{(1)} - \int_{0}^{1} σ_{t}^{2} d t) + \sqrt{{\overset{‒}{n}}_{K}} R_{1} + \sqrt{{\overset{‒}{n}}_{K}} R_{2},

(36)

and $I_{2} = - \frac{{\overset{‒}{n}}_{K}^{3 ∕ 2}}{n} \int_{0}^{1} σ_{t}^{2} d t$ . We are interested in

\sqrt{{\overset{‒}{n}}_{K}} ({〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t) = I_{1} + I_{2} .

The key idea is to compute the moment generating functions for each term in I₁ and then to use Lemma 2 to conclude.

For the first term in I₁, since $V_{K}^{(ℓ)}$ is a realized volatility based on the discretely observed X process, with observation frequency satisfying $\sup_{1 \leq i \leq {\overset{‒}{n}}_{K}} {\overset{‒}{n}}_{K} \cdot (τ_{i K + ℓ} - τ_{(i - 1) K + ℓ}) \leq C_{Δ}$ , we have, by (31) in Lemma 3, for $∣ θ ∣ \leq \frac{\sqrt{{\overset{‒}{n}}_{K}}}{4 C_{σ}^{2} C_{Δ}}$ ,

E (\exp {θ \sqrt{{\overset{‒}{n}}_{K}} (V_{K}^{(ℓ)} - \int_{0}^{1} σ_{t}^{2} d t)} I_{{Γ_{C_{σ} = 1}}}) \leq \exp {2 θ^{2} C_{σ}^{4} C_{Δ}^{2}} .

(37)

For the second term in I₁, we have obtained in (31) that

E (\exp {θ \sqrt{n} ({[\tilde{X, X}]}_{1}^{(1)} - \int_{0}^{1} σ_{t}^{2} d t)} I_{{Γ_{C_{σ} = 1}}}) \leq \exp {2 θ^{2} C_{σ}^{4} C_{Δ}^{2}}, when ∣ θ ∣ \leq \frac{\sqrt{n}}{4 C_{σ}^{2} C_{Δ}} .

(38)

We introduce an auxiliary sequence a_n that grows with n at a moderate rate to facilitate our presentation in the following. In particular, we can set $a_{n} = n^{1 ∕ 12}$ . Let us now deal with R₁, the third term in I₁. Note that from the definition

\begin{aligned} \sqrt{{\overset{‒}{n}}_{K}} R_{1} & = \frac{\sqrt{{\overset{‒}{n}}_{K}}}{K} \{\sum_{i = K}^{n} {(∊_{i} - ∊_{i - K})}^{2} - \frac{n - K + 1}{n} \sum_{i = 1}^{n} {(∊_{i} - ∊_{i - 1})}^{2}\} \\ & = \frac{\sqrt{{\overset{‒}{n}}_{K}} (n - K + 1)}{K \sqrt{n}} \cdot \frac{2}{\sqrt{n}} \sum_{i = 1}^{n} ∊_{i} ∊_{i - 1} - \frac{\sqrt{{\overset{‒}{n}}_{K}} \sqrt{n - K + 1}}{K} \cdot \frac{2}{\sqrt{n - K + 1}} \sum_{i = K}^{n} ∊_{i} ∊_{i - K} \\ & \quad - \frac{\sqrt{{\overset{‒}{n}}_{K}} \sqrt{K - 1} a_{n}}{K} \cdot \frac{1}{a_{n} \sqrt{K - 1}} \sum_{i = 1}^{K - 1} (∊_{i}^{2} - η_{X}^{2}) - \frac{\sqrt{{\overset{‒}{n}}_{K}} \sqrt{K - 1} a_{n}}{K} \cdot \frac{1}{a_{n} \sqrt{K - 1}} \sum_{i = n - K + 1}^{n - 1} (∊_{i}^{2} - η_{X}^{2}) \\ & \quad + \frac{\sqrt{{\overset{‒}{n}}_{K}} (K - 1) a_{n}}{K \sqrt{n}} \cdot \frac{1}{a_{n} \sqrt{n}} \sum_{i = 1}^{n} (∊_{i}^{2} - η_{X}^{2}) + \frac{\sqrt{{\overset{‒}{n}}_{K}} (K - 1) a_{n}}{K \sqrt{n}} \cdot \frac{1}{a_{n} \sqrt{n}} \sum_{i = 0}^{n - 1} (∊_{i}^{2} - η_{X}^{2}) . \end{aligned}

(39)

The first two terms in (39) are not sums of independent variables. But they can be decomposed into sums of independent random variables and the moment generating functions can be computed. To simplify the argument without losing the essential ingredient, let us focus on the first term of (39). We have the following decomposition

\sum_{i = 1}^{n} ε_{i} ε_{i - 1} = \sum_{odd i} ε_{i} ε_{i - 1} + \sum_{even i} ε_{i} ε_{i - 1},

and the summands in each term of the right-hand side are now independent. Therefore, we need only to calculate the moment generating function of $ε_{i} ε_{i - 1}$ .

For two independent normally distributed random variables $X ~ N (0, σ_{X}^{2})$ and $Y ~ N (0, σ_{Y}^{2})$ , it can easily be computed that

E (\exp \{θ n^{- 1 ∕ 2} X Y\}) = {(\frac{1}{1 - σ_{X}^{2} σ_{Y}^{2} θ^{2} ∕ n})}^{1 ∕ 2} \leq \exp \{σ_{X}^{2} σ_{Y}^{2} θ^{2} ∕ n\} when ∣ θ ∣ \leq \frac{\sqrt{n}}{\sqrt{2} σ_{X} σ_{Y}},

where we have used the fact that log(1 − x) ≥ −2x when $0 \leq x \leq \frac{1}{2}$ .

Hence, by independence, it follows that (we assume n is even to simplify the presentation)

E \exp {2 θ n^{- 1 ∕ 2} \sum_{odd i} ε_{i} ε_{i - 1}} = {(\frac{1}{1 - 4 η_{X}^{4} θ^{2} ∕ n})}^{n ∕ 4} \leq \exp {2 η_{X}^{4} θ^{2}}, when ∣ θ ∣ \leq \frac{\sqrt{n}}{2 \sqrt{2} η_{X}^{2}} .

(40)
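The closed form for the moment generating function of a product of independent normals, and the bound in (40), can be checked numerically. The following is a minimal sketch using only the standard library; the function names, the integration grid and the values n = 1000, η_X = 0.5 are illustrative assumptions, not quantities from the paper.

```python
import math

def mgf_product_closed_form(t, sx, sy):
    """E exp(t*X*Y) for independent X ~ N(0, sx^2), Y ~ N(0, sy^2); needs |t|*sx*sy < 1."""
    return (1.0 - (t * sx * sy) ** 2) ** -0.5

def mgf_product_numeric(t, sx, sy, half_width=12.0, step=0.001):
    """Same expectation via conditioning on X = sx*Z and a trapezoid rule over Z.

    Given X, E exp(t*X*Y) = exp(t^2 * sy^2 * X^2 / 2).
    """
    c = 0.5 * (t * sx * sy) ** 2
    m = int(round(2 * half_width / step))
    total = 0.0
    for k in range(m + 1):
        z = -half_width + k * step
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * math.exp((c - 0.5) * z * z)
    return total * step / math.sqrt(2.0 * math.pi)

def lhs_40(theta, n, eta):
    """Exact value of the odd-index moment generating function in (40)."""
    return (1.0 - 4.0 * eta**4 * theta**2 / n) ** (-n / 4.0)

def rhs_40(theta, eta):
    """Upper bound in (40)."""
    return math.exp(2.0 * eta**4 * theta**2)

def theta_max_40(n, eta):
    """Largest |theta| for which (40) is claimed."""
    return math.sqrt(n) / (2.0 * math.sqrt(2.0) * eta**2)

closed = mgf_product_closed_form(0.5, 1.0, 1.0)
numeric = mgf_product_numeric(0.5, 1.0, 1.0)

n, eta = 1000, 0.5  # illustrative values
grid = [theta_max_40(n, eta) * k / 20 for k in range(21)]
bound_holds = all(lhs_40(t, n, eta) <= rhs_40(t, eta) for t in grid)
```

The bound is checked only on a finite grid of θ values up to the stated limit, so this is a sanity check rather than a proof.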

The second term in R₁ works similarly and has the same bound. For example, when ${\overset{‒}{n}}_{K}$ is even, one has the following decomposition

\sum_{i = K}^{n} ∊_{i} ∊_{i - K} = \sum_{j = 1}^{{\overset{‒}{n}}_{K} ∕ 2} \sum_{i = 2 j K - K}^{2 j K - 1} ∊_{i} ∊_{i - K} + \sum_{j = 1}^{{\overset{‒}{n}}_{K} ∕ 2} \sum_{i = 2 j K}^{2 j K + K - 1} ∊_{i} ∊_{i - K} .
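The block structure of this decomposition can be made concrete with a short sketch. It assumes that the average subsample size equals (n − K + 1)/K and is an even integer, and it uses hypothetical helper names; it verifies that the two groups of blocks together cover i = K, …, n and that within each group no noise index appears twice.

```python
def block_groups(n, K):
    """Index sets of the two block groups in the lag-K decomposition.

    Assumes n_bar = (n - K + 1) / K is an even integer (illustrative setup).
    """
    n_bar, rem = divmod(n - K + 1, K)
    if rem != 0 or n_bar % 2 != 0:
        raise ValueError("need (n - K + 1) / K to be an even integer")
    first = [i for j in range(1, n_bar // 2 + 1)
             for i in range(2 * j * K - K, 2 * j * K)]
    second = [i for j in range(1, n_bar // 2 + 1)
              for i in range(2 * j * K, 2 * j * K + K)]
    return first, second

def noise_indices(group, K):
    """All noise indices (i and i - K) appearing in the products of one group."""
    return [idx for i in group for idx in (i, i - K)]

n, K = 19, 4  # illustrative: (19 - 4 + 1) / 4 = 4, an even integer
first, second = block_groups(n, K)
first_idx = noise_indices(first, K)
second_idx = noise_indices(second, K)
```

Since every noise variable enters at most one product within a group, the summands of each group are independent, which is what the moment generating function calculation requires.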

The last four terms are sums of independent χ²-distributed random variables and their moment generating functions can easily be bounded by using Lemma 1. Taking the term $\frac{1}{a_{n} \sqrt{K - 1}} \sum_{i = 1}^{K - 1} (∊_{i}^{2} - η_{X}^{2})$ for example, we have

E (\exp \{\frac{θ}{a_{n} \sqrt{K - 1}} \sum_{i = 1}^{K - 1} (∊_{i}^{2} - η_{X}^{2})\}) \leq \exp \{2 η_{X}^{4} θ^{2} ∕ a_{n}^{2}\} when ∣ θ ∣ \leq \frac{a_{n} \sqrt{K - 1}}{4 η_{X}^{2}} .

For the term R₂, we have,

\sqrt{{\overset{‒}{n}}_{K}} R_{2} = \frac{2 a_{n} {\overset{‒}{n}}_{K}}{n} \frac{1}{a_{n}} (\sum_{i = 1}^{n} Δ X_{i} ∊_{i - 1} - \sum_{i = 1}^{n} Δ X_{i} ∊_{i}) + \frac{2}{a_{n}} \frac{a_{n} \sqrt{{\overset{‒}{n}}_{K}}}{K} (\sum_{i = K}^{n} Δ^{(K)} X_{i} ∊_{i} - \sum_{i = K}^{n} Δ^{(K)} X_{i} ∊_{i - K}),

(41)

where ΔX_i = X_{τ_i} − X_{τ_{i−1}}, and Δ^{(K)}X_i = X_{τ_i} − X_{τ_{i−K}}. The first term above satisfies

\begin{matrix} E (\exp \{\frac{θ}{a_{n}} \sum_{i = 1}^{n} Δ X_{i} ∊_{i}\} I_{\{Γ_{C_{σ}} = 1\}}) = & E (\exp \{\sum_{i = 1}^{n} {(\frac{θ}{a_{n}} Δ X_{i})}^{2} η_{X}^{2} ∕ 2\} I_{\{Γ_{C_{σ}} = 1\}}) \\ \leq & {(E (\exp \{θ^{2} η_{X}^{2} C_{σ}^{2} C_{Δ} Z^{2} ∕ (2 n a_{n}^{2})\}))}^{n} \\ = & {(\frac{1}{1 - η_{X}^{2} C_{σ}^{2} C_{Δ} θ^{2} ∕ (n a_{n}^{2})})}^{n ∕ 2} \\ \leq & \exp \{η_{X}^{2} C_{σ}^{2} C_{Δ} θ^{2} ∕ a_{n}^{2}\}, when ∣ θ ∣ \leq \frac{\sqrt{n} a_{n}}{\sqrt{2 C_{Δ}} C_{σ} η_{X}}, \end{matrix}

(42)

where in the second line we have again used the optional sampling theorem and law of iterated expectations as in the derivations of Lemma 3; Z denotes a standard normal random variable. The second term in R₂ works similarly and has the same bound. For the third term, by conditioning on the X-process first, we have

\begin{matrix} E [\exp {\frac{a_{n} θ \sqrt{{\overset{‒}{n}}_{K}}}{K} \sum_{i = K}^{n} Δ^{(K)} X_{i} ∊_{i}} I_{{Γ_{C_{σ} = 1}}}] \\ = & E [\exp {\frac{a_{n}^{2} θ^{2} {\overset{‒}{n}}_{K}}{2 K^{2}} \sum_{l = 0}^{K - 1} \sum_{j = 1}^{{\overset{‒}{n}}_{K}} {(Δ^{(K)} X_{j K + l})}^{2} η_{X}^{2}} I_{{Γ_{C_{σ} = 1}}}] \\ \leq & Π_{l = 0}^{K - 1} {E [\exp {\frac{a_{n}^{2} θ^{2} {\overset{‒}{n}}_{K} η_{X}^{2}}{2 K} \sum_{j = 1}^{{\overset{‒}{n}}_{K}} {(Δ^{(K)} X_{j K + l})}^{2}} I_{{Γ_{C_{σ} = 1}}}]}^{\frac{1}{K}} \\ \leq & Π_{l = 0}^{K - 1} {{(1 - \frac{a_{n}^{2} θ^{2} η_{X}^{2}}{K} C_{σ}^{2} C_{Δ})}^{- {\overset{‒}{n}}_{K} ∕ 2}}^{\frac{1}{K}} \\ \leq & \exp {\frac{a_{n}^{2} θ^{2} {\overset{‒}{n}}_{K} η_{X}^{2}}{K} C_{σ}^{2} C_{Δ}} when ∣ θ ∣ \leq \frac{\sqrt{K}}{\sqrt{2 C_{Δ}} a_{n} η C_{σ}}, \end{matrix}

(43)

where we have used Hölder’s inequality above. The fourth term works similarly and has the same bound.

Combining the results for all the terms (37) – (43) together, applying Lemma 2 to I₁, we have that the conditions for Lemma 2 are satisfied with A = {Γ_Cσ = 1}, $C_{1} = C_{1, x} \sqrt{{\overset{‒}{n}}_{K}}$ ,

\begin{matrix} C_{1, x} = & \min \{\frac{1}{4 C_{σ}^{2} C_{Δ}}, \frac{\sqrt{n ∕ {\overset{‒}{n}}_{K}}}{2 \sqrt{2} η_{X}^{2}}, \frac{a_{n} \sqrt{(K - 1) ∕ {\overset{‒}{n}}_{K}}}{4 η_{X}^{2}}, \frac{a_{n} \sqrt{n ∕ {\overset{‒}{n}}_{K}}}{\sqrt{2 C_{Δ}} η_{X} C_{σ}}, \frac{\sqrt{K ∕ {\overset{‒}{n}}_{K}}}{\sqrt{2 C_{Δ}} a_{n} η C_{σ}}\} \\ = & \frac{1}{4 C_{σ}^{2} C_{Δ}} for big enough n \end{matrix}

(44)

and

\begin{matrix} C_{2} = & \max {2 C_{σ}^{4} C_{Δ}^{2}, 2 η_{X}^{4}, 2 η_{X}^{4} ∕ a_{n}^{2}, η_{X}^{2} C_{σ}^{2} C_{Δ} ∕ a_{n}^{2}, \frac{a_{n}^{2} {\overset{‒}{n}}_{K} η_{X}^{2}}{K} C_{σ}^{2} C_{Δ}} \\ = & \max {2 C_{σ}^{4} C_{Δ}^{2}, 2 η_{X}^{4}} for big enough n \\ = & 2 C_{σ}^{4} C_{Δ}^{2} for the typical case when C_{Δ} \geq 1 and C_{σ} \geq η_{X} . \end{matrix}

(45)

Let w = 8, which is larger, when n is sufficiently large, than

Set C_I1 = (4C₂w²)⁻¹. By Lemma 2, when $0 \leq x \leq 2 C_{1, x} C_{2} \sqrt{{\overset{‒}{n}}_{K}}$ ,

P (\{∣ I_{1} ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \leq 2 \exp (- C_{I_{1}} x^{2}) .

Hence when $0 \leq x \leq 4 C_{1, x} C_{2} \sqrt{{\overset{‒}{n}}_{K}}$ ,

P ({∣ I_{1} ∣ > \frac{x}{2}} \cap {Γ_{C_{σ} = 1}}) \leq 2 \exp (- \frac{C_{I_{1}}}{4} x^{2}) .

(46)

For the term I₂, let $C_{I_{2}} = 2^{3 ∕ 2} C_{σ}^{2}$ , we have, by Condition 4,

I_{2} = \frac{{\overset{‒}{n}}_{K}^{3 ∕ 2}}{n} \int_{0}^{1} σ_{t}^{2} d t \leq C_{I_{2}} ∕ \sqrt{n}, on the set \{Γ_{C_{σ}} = 1\} .

(47)

Hence

P (\{∣ I_{2} ∣ > \frac{x}{2}\} \cap \{Γ_{C_{σ}} = 1\}) \leq \begin{cases} 0, & if x > 2 C_{I_{2}} ∕ \sqrt{n} \\ 1, & if x \leq 2 C_{I_{2}} ∕ \sqrt{n} . \end{cases}

Since, for all large n, $1 \leq 2 \exp (- \frac{C_{I_{1}} 4 C_{I_{2}}^{2}}{4 n})$ , which is no larger than $2 \exp (- \frac{C_{I_{1}} x^{2}}{4})$ when $x \leq \frac{2 C_{I_{2}}}{\sqrt{n}}$ , we have that for all large n,

P ({∣ I_{2} ∣ > \frac{x}{2}} \cap {Γ_{C_{σ} = 1}}) \leq 2 \exp (- \frac{C_{I_{1}}}{4} x^{2}) .

(48)

Combining (46) and (48), we have, when $0 \leq x \leq 4 C_{1, x} C_{2} \sqrt{{\overset{‒}{n}}_{K}}$ ,

\begin{matrix} P (\{\sqrt{{\overset{‒}{n}}_{K}} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \\ = & P (\{∣ I_{1} + I_{2} ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \\ \leq & P (\{∣ I_{1} ∣ > x ∕ 2\} \cap \{Γ_{C_{σ}} = 1\}) + P (\{∣ I_{2} ∣ > x ∕ 2\} \cap \{Γ_{C_{σ}} = 1\}) \leq 4 \exp (- \frac{C_{I_{1}}}{4} x^{2}) . \end{matrix}

By Condition 4 again, we have, when 0 ≤ x ≤ c n^{1/6},

\begin{matrix} P ({n^{1 ∕ 6} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x} \cap {Γ_{C_{σ} = 1}}) \\ \leq P ({\sqrt{{\overset{‒}{n}}_{K}} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x ∕ \sqrt{2}} \cap {Γ_{C_{σ} = 1}}) \leq 4 \exp (- C x^{2}), \end{matrix}

(49)

where $c = 4 \sqrt{2} C_{1, x} C_{2}$ , and $C = \frac{C_{I_{1}}}{8} = {(32 C_{2} w^{2})}^{- 1}$ . For big enough n and the typical case when C_Δ ≥ 1 and C_σ ≥ η_X, we have that

c = 2 \sqrt{2} C_{σ}^{2} C_{Δ} and C = \frac{1}{64 w^{2} C_{σ}^{4} C_{Δ}^{2}} .

(50)

Since this conditional result depends only on the observation frequency n, and not on the locations of the observation times as long as Condition 3 is satisfied, (49) holds unconditionally on the set of observation times. This proves the first half of Theorem 1, when Γ_{C_σ} ≡ 1.

For the second half of the theorem, we have,

\begin{matrix} P (n^{1 ∕ 6} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x) \\ \leq & P (\{n^{1 ∕ 6} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) + P (\{Γ_{C_{σ}} < 1\}) \\ \leq & 4 \exp (- \frac{x^{2}}{64 w^{2} C_{σ}^{4} C_{Δ}^{2}}) + k_{σ} \exp \{- a C_{σ}^{b}\}, when 0 \leq x \leq 2 \sqrt{2} C_{σ}^{2} C_{Δ} n^{1 ∕ 6} . \end{matrix}

Letting $C_{σ} = {(\frac{x^{2}}{64 w^{2} a C_{Δ}^{2}})}^{\frac{1}{4 + b}}$ , we have, when $0 \leq x \leq 2^{\frac{3 b - 12}{2 b}} C_{Δ} {(w^{2} a)}^{\frac{- 2}{b}} \cdot n^{\frac{4 + b}{6 b}}$ ,

P \{n^{1 ∕ 6} ∣ {〈 \hat{X, X} 〉}_{1} - \int_{0}^{1} σ_{t}^{2} d t ∣ > x\} \leq (4 + k_{σ}) \exp \{- {(64 w^{2})}^{\frac{- b}{4 + b}} \cdot a^{\frac{4}{4 + b}} \cdot C_{Δ}^{\frac{- 2 b}{4 + b}} \cdot x^{\frac{2 b}{4 + b}}\} .

(51)
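The choice of C_σ above balances the Gaussian-type exponent against the tail exponent from Condition 2. A minimal numerical sketch of that algebra follows; the parameter values (x, a, b, C_Δ, n) and the function names are illustrative assumptions, and w = 8 is taken from the text.

```python
import math

def balanced_c_sigma(x, w, a, b, c_delta):
    """C_sigma chosen so that x^2 / (64 w^2 C_sigma^4 C_Delta^2) = a * C_sigma^b."""
    return (x**2 / (64.0 * w**2 * a * c_delta**2)) ** (1.0 / (4.0 + b))

def gaussian_exponent(x, w, c_sigma, c_delta):
    return x**2 / (64.0 * w**2 * c_sigma**4 * c_delta**2)

def tail_exponent(a, b, c_sigma):
    return a * c_sigma**b

def final_exponent(x, w, a, b, c_delta):
    """Exponent appearing in (51)."""
    return ((64.0 * w**2) ** (-b / (4.0 + b)) * a ** (4.0 / (4.0 + b))
            * c_delta ** (-2.0 * b / (4.0 + b)) * x ** (2.0 * b / (4.0 + b)))

def x_upper(n, w, a, b, c_delta):
    """Upper end of the range of x stated before (51)."""
    return (2.0 ** ((3.0 * b - 12.0) / (2.0 * b)) * c_delta
            * (w**2 * a) ** (-2.0 / b) * n ** ((4.0 + b) / (6.0 * b)))

x, w, a, b, c_delta, n = 3.0, 8.0, 1.5, 2.0, 1.2, 10**6  # illustrative values
c_sigma = balanced_c_sigma(x, w, a, b, c_delta)
e_gauss = gaussian_exponent(x, w, c_sigma, c_delta)
e_tail = tail_exponent(a, b, c_sigma)
e_final = final_exponent(x, w, a, b, c_delta)

# At the upper end of the range, x equals 2*sqrt(2)*C_sigma^2*C_Delta*n^(1/6).
x_up = x_upper(n, w, a, b, c_delta)
c_sigma_up = balanced_c_sigma(x_up, w, a, b, c_delta)
x_limit = 2.0 * math.sqrt(2.0) * c_sigma_up**2 * c_delta * n ** (1.0 / 6.0)
```

With this choice the two exponential terms share the same exponent, which is why they combine into the single factor (4 + k_σ) in (51).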

Remark 3. In the above proof, we have demonstrated that, by using a sequence a_n that goes to ∞ at a moderate rate, one can eliminate the impact of the small-order terms on the choice of the constants, as long as the moment generating functions of these terms satisfy inequalities of the form (27). We will use this technique again in the next subsection.

A.4 Proof of Theorem 2

We again conduct the analysis assuming that the observation times are given. Our final result holds because the conditional result does not depend on the locations of the observation times as long as Condition 3 is satisfied.

Recall notation for the observation times as introduced in section 3.2. Define

Z^{+} = X + Y and Z^{-} = X - Y .

Z⁺ and Z⁻ are diffusion processes with volatilities satisfying Condition 2. To see this, let W⁺ and W⁻ be processes such that

d W_{t}^{+} = \frac{σ_{t}^{(X)} d B_{t}^{(X)} + σ_{t}^{(Y)} d B_{t}^{(Y)}}{\sqrt{{(σ_{t}^{(X)})}^{2} + {(σ_{t}^{(Y)})}^{2} + 2 ρ_{t} σ_{t}^{(X)} σ_{t}^{(Y)}}} and d W_{t}^{-} = \frac{σ_{t}^{(X)} d B_{t}^{(X)} - σ_{t}^{(Y)} d B_{t}^{(Y)}}{\sqrt{{(σ_{t}^{(X)})}^{2} + {(σ_{t}^{(Y)})}^{2} - 2 ρ_{t} σ_{t}^{(X)} σ_{t}^{(Y)}}} .

W⁺ and W⁻ are standard Brownian motions by Lévy’s characterization of Brownian motion (see, for example, Theorem 3.16, Chapter 3 of Karatzas and Shreve (2000)). Write

σ_{t}^{Z^{+}} = \sqrt{{(σ_{t}^{(X)})}^{2} + {(σ_{t}^{(Y)})}^{2} + 2 ρ_{t} σ_{t}^{(X)} σ_{t}^{(Y)}} and σ_{t}^{Z^{-}} = \sqrt{{(σ_{t}^{(X)})}^{2} + {(σ_{t}^{(Y)})}^{2} - 2 ρ_{t} σ_{t}^{(X)} σ_{t}^{(Y)}},

we have

d Z^{+} = σ_{t}^{Z^{+}} d W_{t}^{+} and d Z^{-} = σ_{t}^{Z^{-}} d W_{t}^{-}

with

0 \leq σ_{t}^{Z^{+}}, σ_{t}^{Z^{-}} \leq 2 C_{σ}

when $σ_{t}^{(X)}$ , $σ_{t}^{(Y)}$ are bounded by C_σ; or

P {\sup_{0 \leq t \leq 1} σ_{t}^{Z^{+}} \geq 2 C_{σ}} \leq P {\sup_{0 \leq t \leq 1} σ_{t}^{X} \geq C_{σ}} + P {\sup_{0 \leq t \leq 1} σ_{t}^{Y} \geq C_{σ}} \leq 2 k_{σ} \exp {- a C_{σ}^{b}},

when $σ_{t}^{X}$ , $σ_{t}^{Y}$ are such that their tail probabilities are bounded as in Condition 2.
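The bound 0 ≤ σ^{Z⁺}, σ^{Z⁻} ≤ 2C_σ follows from the triangle inequality, since each is at most σ^{(X)} + σ^{(Y)}. A small grid check, offered as an illustrative sketch with an assumed value of C_σ and hypothetical function names, is:

```python
import math

def sigma_z(sx, sy, rho, sign):
    """Volatility of Z+ (sign = +1) or Z- (sign = -1) for given sx, sy and correlation rho."""
    return math.sqrt(max(sx * sx + sy * sy + sign * 2.0 * rho * sx * sy, 0.0))

def largest_sigma_z(c_sigma, steps=20):
    """Largest volatility of Z+ or Z- over a grid with 0 <= sx, sy <= c_sigma, -1 <= rho <= 1."""
    vols = [c_sigma * k / steps for k in range(steps + 1)]
    rhos = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]
    worst = 0.0
    for sx in vols:
        for sy in vols:
            for rho in rhos:
                for sign in (1.0, -1.0):
                    worst = max(worst, sigma_z(sx, sy, rho, sign))
    return worst

c_sigma = 0.3  # illustrative bound on the individual volatilities
worst = largest_sigma_z(c_sigma)
```

The maximum over the grid is attained when both volatilities equal C_σ and the correlation is ±1, where the bound 2C_σ is met.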

In fact, let $Γ_{C_{σ}} ≔ \inf \{t : \sup_{0 \leq s \leq t} σ_{s}^{(X)} > C_{σ} or \sup_{0 \leq s \leq t} σ_{s}^{(Y)} > C_{σ}\} \land 1$ . We have $P \{Γ_{C_{σ}} < 1\} \leq P \{\sup_{0 \leq t \leq 1} σ_{t}^{X} \geq C_{σ}\} + P \{\sup_{0 \leq t \leq 1} σ_{t}^{Y} \geq C_{σ}\} \leq 2 k_{σ} \exp \{- a C_{σ}^{b}\}$ .

For the observed Z⁺ and Z⁻ processes, we have

Z_{v_{i}}^{+, o} = X_{t_{i}}^{o} + Y_{s_{i}}^{o} = Z_{v_{i}}^{+} + ∊_{i, +} and Z_{v_{i}}^{-, o} = X_{t_{i}}^{o} - Y_{s_{i}}^{o} = Z_{v_{i}}^{-} + ∊_{i, -},

where t_i and s_i are the last ticks at or before v_i and

\begin{matrix} ∊_{i, +} = & X_{t_{i}} - X_{v_{i}} + Y_{s_{i}} - Y_{v_{i}} + ∊_{i}^{X} + ∊_{i}^{Y}, \\ ∊_{i, -} = & X_{t_{i}} - X_{v_{i}} - Y_{s_{i}} + Y_{v_{i}} + ∊_{i}^{X} - ∊_{i}^{Y} . \end{matrix}

Note that ${〈 \hat{X, Y} 〉}_{1} = \frac{1}{4} ({〈 \hat{Z^{+}, Z^{+}} 〉}_{1} - {〈 \hat{Z^{-}, Z^{-}} 〉}_{1})$ . We can first prove results analogous to Theorem 1 for ${〈 \hat{Z^{+}, Z^{+}} 〉}_{1}$ and ${〈 \hat{Z^{-}, Z^{-}} 〉}_{1}$ , and then use them to obtain the final conclusion for the TSCV.
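The polarization identity behind this reduction can be illustrated on plain realized (co)variances of synchronized increments. This is a simplified sketch with simulated data and hypothetical function names; it does not implement the two-scale estimators themselves, only the algebra that turns a covariance into a difference of two variances.

```python
import random

def realized_cov(dx, dy):
    """Sum of products of synchronized increments."""
    return sum(u * v for u, v in zip(dx, dy))

def realized_var(dz):
    """Sum of squared increments."""
    return sum(u * u for u in dz)

random.seed(1)
m = 500  # illustrative number of synchronized increments
dx = [random.gauss(0.0, 0.01) for _ in range(m)]
dy = [0.6 * u + random.gauss(0.0, 0.008) for u in dx]

d_plus = [u + v for u, v in zip(dx, dy)]   # increments of Z+ = X + Y
d_minus = [u - v for u, v in zip(dx, dy)]  # increments of Z- = X - Y

direct = realized_cov(dx, dy)
polarized = 0.25 * (realized_var(d_plus) - realized_var(d_minus))
```

The two numbers agree up to rounding, so a concentration bound for each variance estimate translates into one for the covariance estimate.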

For ${〈 \hat{Z^{+}, Z^{+}} 〉}_{1}$ , the derivation differs from that of Theorem 1 only in the terms that involve the noise, namely $\sqrt{{\overset{‒}{n}}_{K}} R_{1}$ and $\sqrt{{\overset{‒}{n}}_{K}} R_{2}$ . Write $\tilde{Δ} X_{i} = X_{t_{i}} - X_{v_{i}}$ and $\tilde{Δ} Y_{i} = Y_{s_{i}} - Y_{v_{i}}$ . Then the first term in $\sqrt{{\overset{‒}{n}}_{K}} R_{1}$ becomes

\begin{matrix} \frac{\sqrt{{\overset{‒}{n}}_{K}} \sqrt{\tilde{n}}}{K} \cdot \frac{2}{\sqrt{\tilde{n}}} \sum_{i = 1}^{\tilde{n}} ∊_{i, +} ∊_{i - 1, +} = & \frac{\sqrt{{\overset{‒}{n}}_{K}} \sqrt{\tilde{n}}}{K} \cdot \frac{2}{\sqrt{\tilde{n}}} \sum_{i = 1}^{\tilde{n}} (\tilde{Δ} X_{i} \tilde{Δ} X_{i - 1} + \tilde{Δ} X_{i} \tilde{Δ} Y_{i - 1} + \tilde{Δ} X_{i} (∊_{i - 1}^{X} + ∊_{i - 1}^{Y}) \\ & + \tilde{Δ} Y_{i} \tilde{Δ} X_{i - 1} + \tilde{Δ} Y_{i} \tilde{Δ} Y_{i - 1} + \tilde{Δ} Y_{i} (∊_{i - 1}^{X} + ∊_{i - 1}^{Y}) + (∊_{i}^{X} + ∊_{i}^{Y}) \tilde{Δ} X_{i - 1} \\ & + (∊_{i}^{X} + ∊_{i}^{Y}) \tilde{Δ} Y_{i - 1} + (∊_{i}^{X} + ∊_{i}^{Y}) (∊_{i - 1}^{X} + ∊_{i - 1}^{Y})) . \end{matrix}

The only O_P (1) term is the last term, which involves only independent normals and can be dealt with in the same way as before (again assume $\tilde{n}$ is even for simplicity of presentation):

\begin{matrix} E \exp {2 θ {\tilde{n}}^{- 1 ∕ 2} \sum_{odd i} (∊_{i}^{X} + ∊_{i}^{Y}) (∊_{i - 1}^{X} + ∊_{i - 1}^{Y})} & = E \exp {2 θ {\tilde{n}}^{- 1 ∕ 2} \sum_{even i} (∊_{i}^{X} + ∊_{i}^{Y}) (∊_{i - 1}^{X} + ∊_{i - 1}^{Y})} \\ = & {(\frac{1}{1 - 4 {(η_{X}^{2} + η_{Y}^{2})}^{2} θ^{2} ∕ \tilde{n}})}^{\tilde{n} ∕ 4} \\ \leq \exp {2 {(η_{X}^{2} + η_{Y}^{2})}^{2} θ^{2}}, when ∣ θ ∣ \leq \frac{\sqrt{\tilde{n}}}{2 \sqrt{2} (η_{X}^{2} + η_{Y}^{2})} . \end{matrix}

The other terms are of a smaller order of magnitude. By applying an $a_{\tilde{n}}$ sequence which grows moderately with $\tilde{n}$ as in the proof of Theorem 1 (we can set $a_{\tilde{n}} = {\tilde{n}}^{1 ∕ 12}$ ), we can see easily that, as long as the moment generating functions of these terms can be suitably bounded as in (27), their exact bounds do not affect our choice of C₁, C₂ or w. To show the bounds for the moment generating functions, first note that, for any positive number a and real-valued b, by the optional sampling theorem (applied to the sub-martingales $\exp (a B_{s}^{2})$ and $\exp (b \tilde{Δ} y B_{s})$ with stopping time ${[X]}_{u} \leq C_{σ}^{2} u$ for a real number $\tilde{Δ} y$ ), we have

\begin{matrix} E (\exp {a {(\tilde{Δ} X_{i})}^{2}} I_{{Γ_{C_{σ} = 1}}} ∣ F_{i - 1}) \leq & (E (\exp {a C_{σ}^{2} C_{Δ} Z^{2} ∕ \tilde{n}})) for Z \sim N (0, 1) \\ = & {(\frac{1}{1 - 2 a C_{σ}^{2} C_{Δ} ∕ \tilde{n}})}^{1 ∕ 2}, \end{matrix}

(52)

where $F_{i}$ is the information collected up to time v_i. Inequality (52) holds when $\tilde{Δ} X_{i}$ is replaced by $\tilde{Δ} Y_{i}$ . Similarly,

\begin{matrix} E (\exp \{b \tilde{Δ} X_{i} \tilde{Δ} Y_{i - 1}\} I_{\{Γ_{C_{σ}} = 1\}} ∣ F_{i - 2}) = & E (E (\exp \{b \tilde{Δ} X_{i} \tilde{Δ} Y_{i - 1}\} I_{\{Γ_{C_{σ}} = 1\}} ∣ F_{i - 1}) ∣ F_{i - 2}) \\ \leq & E (\exp \{b^{2} C_{Δ} C_{σ}^{2} {(\tilde{Δ} Y_{i - 1})}^{2} ∕ (2 \tilde{n})\} I_{\{Γ_{C_{σ}} = 1\}} ∣ F_{i - 2}) \\ \leq & {(\frac{1}{1 - b^{2} C_{σ}^{4} C_{Δ}^{2} ∕ {\tilde{n}}^{2}})}^{1 ∕ 2} . \end{matrix}

(53)

The inequalities (52) and (53) can be used to obtain the bounds we need. For example, by (53) and the law of iterated expectations,

\begin{matrix} E (\exp \{θ \sum_{odd i} \tilde{Δ} X_{i} \tilde{Δ} Y_{i - 1}\} I_{\{Γ_{C_{σ}} = 1\}}) \leq & {(\frac{1}{1 - θ^{2} C_{σ}^{4} C_{Δ}^{2} ∕ {\tilde{n}}^{2}})}^{\tilde{n} ∕ 4} \\ \leq & \exp \{θ^{2} C_{σ}^{4} C_{Δ}^{2} ∕ (2 \tilde{n})\} when ∣ θ ∣ \leq \frac{\tilde{n}}{\sqrt{2} C_{σ}^{2} C_{Δ}}; \end{matrix}

and by independence, normality of the noise, the law of iterated expectations and (52), we have

\begin{matrix} E (\exp \{\frac{θ}{a_{\tilde{n}}} \sum_{i = 1}^{\tilde{n}} \tilde{Δ} X_{i} (∊_{i - 1}^{X} + ∊_{i - 1}^{Y})\} I_{\{Γ_{C_{σ}} = 1\}}) \\ = & E (\exp \{\sum_{i = 1}^{\tilde{n}} {(\frac{θ}{a_{\tilde{n}}} \tilde{Δ} X_{i})}^{2} (η_{X}^{2} + η_{Y}^{2}) ∕ 2\} I_{\{Γ_{C_{σ}} = 1\}}) \\ \leq & {(\frac{1}{1 - (η_{X}^{2} + η_{Y}^{2}) θ^{2} C_{σ}^{2} C_{Δ} ∕ (\tilde{n} a_{\tilde{n}}^{2})})}^{\tilde{n} ∕ 2} \\ \leq & \exp \{(η_{X}^{2} + η_{Y}^{2}) C_{σ}^{2} C_{Δ} θ^{2} ∕ a_{\tilde{n}}^{2}\}, when ∣ θ ∣ \leq \frac{\sqrt{\tilde{n}} a_{\tilde{n}}}{C_{σ} \sqrt{2 C_{Δ} (η_{X}^{2} + η_{Y}^{2})}} . \end{matrix}

Similar results can be found for the other terms above, with the same techniques.

The second term in $\sqrt{{\overset{‒}{n}}_{K}} R_{1}$ works similarly and has the same bound. The other terms in $\sqrt{{\overset{‒}{n}}_{K}} R_{1}$ (with $η_{X}^{2}$ replaced by $η_{X}^{2} + η_{Y}^{2}$ ) and the whole term $\sqrt{{\overset{‒}{n}}_{K}} R_{2}$ are of order o_P (1) and have good tail behavior. Again, by using a sequence $a_{\tilde{n}}$ , we can conclude that their exact bounds do not matter in our choice of the constants, and we only need to show that their moment generating functions are appropriately bounded as in (27). The arguments needed to prove the inequalities of the form (27) for each element in these terms are similar to those presented in the above proofs, and are omitted here.

Hence, by still letting w = 8 and redefining

C_{1, x} = \frac{1}{4 {(2 C_{σ})}^{2} C_{Δ}}

and

C_{2} = \max {2 {(2 C_{σ})}^{4} C_{Δ}^{2}, 2 {(η_{X}^{2} + η_{Y}^{2})}^{2}} = 32 C_{σ}^{4} C_{Δ}^{2} for the typical case when C_{σ} \geq η_{X}, η_{Y},

we have, when $0 \leq x \leq c^{'} {\tilde{n}}^{1 ∕ 6}$ ,

P (\{{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{Z^{+}, Z^{+}} 〉}_{1} - \int_{0}^{1} {(σ_{t}^{Z^{+}})}^{2} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \leq 4 \exp (- C^{'} x^{2}),

and

P (\{{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{Z^{-}, Z^{-}} 〉}_{1} - \int_{0}^{1} {(σ_{t}^{Z^{-}})}^{2} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \leq 4 \exp (- C^{'} x^{2}),

where $c^{'} = 4 \sqrt{2} C_{1, x} C_{2}$ and C’ = (32C₂w²)⁻¹.

Finally, for the TSCV, when $C^{'} = {(32 C_{2} w^{2})}^{- 1}$ ,

\begin{matrix} P (\{{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{X, Y} 〉}_{1} - \int_{0}^{1} σ_{t}^{(X)} σ_{t}^{(Y)} ρ_{t}^{(X, Y)} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) \\ \leq & P (\{{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{Z^{+}, Z^{+}} 〉}_{1} - \int_{0}^{1} {(σ_{t}^{Z^{+}})}^{2} d t ∣ > 2 x\} \cap \{Γ_{C_{σ}} = 1\}) \\ & + P (\{{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{Z^{-}, Z^{-}} 〉}_{1} - \int_{0}^{1} {(σ_{t}^{Z^{-}})}^{2} d t ∣ > 2 x\} \cap \{Γ_{C_{σ}} = 1\}) \\ \leq & 8 \exp (- C x^{2}), \end{matrix}

where $c = c^{'} ∕ 2 = 2 \sqrt{2} C_{1, x} C_{2}$ and C = 4C’ = (8C₂w²)⁻¹. For big enough n and the typical case when C_Δ ≥ 1 and C_σ ≥ η_X, η_Y, we have that

c = 4 \sqrt{2} C_{σ}^{2} C_{Δ} and C = {(256 w^{2} C_{σ}^{4} C_{Δ}^{2})}^{- 1} .

(54)

This completes the proof of the first half of the statement of Theorem 2, when Γ_Cσ ≡ 1.
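The constants in (54) can be recomputed from the redefined C_{1,x} and C₂ as a quick consistency check. The sketch below assumes the typical case in which the volatility term dominates the maximum defining C₂; the function name and the numerical values of C_σ and C_Δ are illustrative, and w = 8 follows the text.

```python
import math

def theorem2_constants(c_sigma, c_delta, w):
    """Recompute c and C of (54) from C_{1,x} and C_2, with volatility bound 2 * C_sigma.

    Assumes the typical case in which 2 * (2 C_sigma)^4 * C_Delta^2 attains the max in C_2.
    """
    vol_bound = 2.0 * c_sigma                      # bound on the volatilities of Z+ and Z-
    c1x = 1.0 / (4.0 * vol_bound**2 * c_delta)     # redefined C_{1,x}
    c2 = 2.0 * vol_bound**4 * c_delta**2           # redefined C_2 (typical case)
    c_prime = 4.0 * math.sqrt(2.0) * c1x * c2      # range constant for Z+ and Z-
    big_c_prime = 1.0 / (32.0 * c2 * w**2)         # exponent constant for Z+ and Z-
    return c_prime / 2.0, 4.0 * big_c_prime        # c and C for the TSCV

c_sigma, c_delta, w = 0.4, 1.5, 8.0  # illustrative values
c, big_c = theorem2_constants(c_sigma, c_delta, w)
c_expected = 4.0 * math.sqrt(2.0) * c_sigma**2 * c_delta
big_c_expected = 1.0 / (256.0 * w**2 * c_sigma**4 * c_delta**2)
```

The factor 1/2 in c and the factor 4 in C come from replacing x by 2x when passing from the two variance bounds to the covariance bound.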

For the second half of the theorem, we have,

\begin{matrix} P ({\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{X, Y} 〉}_{1} - \int_{0}^{1} σ_{t}^{(X)} σ_{t}^{(Y)} ρ_{t}^{(X, Y)} d t ∣ > x) \\ \leq & P (\{{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{X, Y} 〉}_{1} - \int_{0}^{1} σ_{t}^{(X)} σ_{t}^{(Y)} ρ_{t}^{(X, Y)} d t ∣ > x\} \cap \{Γ_{C_{σ}} = 1\}) + P (\{Γ_{C_{σ}} < 1\}) \\ \leq & 8 \exp (- {(256 w^{2} C_{σ}^{4} C_{Δ}^{2})}^{- 1} x^{2}) + 2 k_{σ} \exp \{- a C_{σ}^{b}\}, when 0 \leq x \leq 4 \sqrt{2} C_{σ}^{2} C_{Δ} {\tilde{n}}^{1 ∕ 6} . \end{matrix}

Letting $C_{σ} = {(\frac{x^{2}}{256 w^{2} a C_{Δ}^{2}})}^{\frac{1}{4 + b}}$ , the above inequality becomes

P {{\tilde{n}}^{1 ∕ 6} ∣ {〈 \hat{X, Y} 〉}_{1} - \int_{0}^{1} σ_{t}^{(X)} σ_{t}^{(Y)} ρ_{t}^{(X, Y)} d t ∣ > x} \leq (8 + 2 k_{σ}) \exp {- {(256 w^{2})}^{\frac{- b}{4 + b}} \cdot a^{\frac{4}{4 + b}} \cdot C_{Δ}^{\frac{- 2 b}{4 + b}} \cdot x^{\frac{2 b}{4 + b}}},

(55)

when $0 \leq x \leq 2^{\frac{5 b - 12}{2 b}} C_{Δ} {(w^{2} a)}^{\frac{- 2}{b}} \cdot {\tilde{n}}^{\frac{4 + b}{6 b}}$ .

Remark 4. Note that the argument is not restricted to the TSCV based on the pairwise refresh times; it works in the same way (only with $\tilde{n}$ replaced by ${\tilde{n}}_{*}$ , the observation frequency of the all-refresh method) when the synchronization scheme is chosen to be the all-refresh method, as long as the sampling conditions, Conditions 3 and 4, are satisfied.

Footnotes

Fan’s research was supported by NSF grant DMS-0704337, NIH grant R01-GM072611 and NIH grant R01GM100474. The main part of the work was carried out while Yingying Li was a postdoctoral fellow at the Department of Operations Research and Financial Engineering, Princeton University. Li’s research was further partially supported by GRF 606811 of Hong Kong SAR. The authors thank the editor, the associate editor, and two referees for their helpful comments.

Contributor Information

Jianqing Fan, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, (jqfan@princeton.edu).

Yingying Li, Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, (yyli@ust.hk).

Ke Yu, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, (kyu@princeton.edu).

REFERENCES

Aït-Sahalia Y, Fan J, Xiu D. High Frequency Covariance Estimates with Noisy and Asynchronous Financial Data. Journal of the American Statistical Association. 2010;105:1504–1517.
Aït-Sahalia Y, Mykland PA, Zhang L. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies. 2005;18:351–416.
Andersen TG, Bollerslev T, Diebold FX, Labys P. Great realizations. Risk. 2000;13:105–108.
Barndorff-Nielsen OE, Hansen PR, Lunde A, Shephard N. Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Manuscript. 2008.
Barndorff-Nielsen OE, Hansen PR, Lunde A, Shephard N. Realized kernels in practice: trades and quotes. Econometrics Journal. 2009;12:1–32.
Barndorff-Nielsen OE, Hansen PR, Lunde A, Shephard N. Subsampling realised kernel. Journal of Econometrics. 2011;160:204–219.
Best MJ, Grauer RR. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: Some analytical and computational results. Review of Financial Studies. 1991;2:315–342.
Bickel PJ, Levina E. Regularized estimation of large covariance matrices. Annals of Statistics. 2008;36:199–227.
Chopra VK, Ziemba WT. The effect of errors in means, variance and covariances on optimal portfolio choice. Journal of Portfolio Management. 1993 winter:6–11.
Delbaen F, Schachermayer W. A general version of the fundamental theorem of asset pricing. Mathematische Annalen. 1994;300:463–520.
De Roon FA, Nijman TE, Werker BJM. Testing for mean-variance spanning with short sales constraints and transaction costs: The case of emerging markets. Journal of Finance. 2001;54:721–741.
Epps TW. Comovements in stock prices in the very short run. Journal of the American Statistical Association. 1979;74:291–298.
Fan J, Wang Y. Multi-scale jump and volatility analysis for high-frequency financial data. Journal of the American Statistical Association. 2007;102:1349–1362.
Fan J, Fan Y, Lv J. Large dimensional covariance matrix estimation via a factor model. Journal of Econometrics. 2008;147:186–197.
Fan J, Zhang J, Yu K. Asset Allocation and Risk Assessment with Gross Exposure Constraints for Vast Portfolios. Journal of the American Statistical Association. 2011. doi: 10.1080/01621459.2012.682825. Under revision.
Hayashi T, Yoshida N. On covariance estimation of non-synchronously observed diffusion processes. Bernoulli. 2005;11:359–379.
Hautsch N, Kyj RM, Oomen LCA. A blocking and regularization approach to high dimensional realized covariance estimation. Journal of Applied Econometrics. 2009, forthcoming.
Jagannathan R, Ma T. Risk reduction in large portfolios: Why imposing the wrong constraints helps. Journal of Finance. 2003;58:1651–1683.
Jacod J, Li Y, Mykland PA, Podolskij M, Vetter M. Microstructure noise in the continuous case: The pre-averaging approach. Stochastic Processes and their Applications. 2009;119:2249–2276.
Jacod J, Shiryaev AN. Limit Theorems for Stochastic Processes. 2nd edition. Springer-Verlag; New York: 2003.
Johnstone IM. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics. 2001;29:295–327.
Karatzas I, Shreve SE. Brownian Motion and Stochastic Calculus. 2nd ed. Springer; New York: 2000.
Kinnebrock S, Podolskij M, Christensen K. Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics. 2010;159:116–133.
Klein RW, Bawa VS. The effect of estimation risk on optimal portfolio choice. Journal of Financial Economics. 1976;3:215–231.
Lam C, Fan J. Sparsistency and rates of convergence in large covariance matrices estimation. The Annals of Statistics. 2009;37:4254–4278. doi: 10.1214/09-AOS720.
Ledoit O, Wolf M. Honey, I Shrunk the Sample Covariance Matrix. Journal of Portfolio Management. 2004;30:110–119.
Li Y, Mykland P. Are volatility estimators robust with respect to modeling assumptions? Bernoulli. 2007;13:601–622.
Markowitz HM. Portfolio selection. Journal of Finance. 1952;7:77–91.
Markowitz H. Portfolio Selection: Efficient Diversification of Investments. John Wiley & Sons; New York: 1959.
Rothman AJ, Levina E, Zhu J. Generalized thresholding of large covariance matrices. Journal of the American Statistical Association. 2009;104:177–186.
Tao M, Wang Y, Yao Y, Zou J. Large Volatility Matrix Inference via Combining Low-Frequency and High-Frequency Approaches. Journal of the American Statistical Association. 2011;106:1025–1040.
Wang Y, Zou J. Vast volatility matrix estimation for high-frequency financial data. Annals of Statistics. 2010;38:943–978.
Xiu D. Quasi-maximum likelihood estimation of volatility with high frequency data. Journal of Econometrics. 2010;159:235–250.
Zhang L. Efficient estimation of stochastic volatility using noisy observations: a multiscale approach. Bernoulli. 2006;12:1019–1043.
Zhang L. Estimating covariation: Epps effect and microstructure noise. Journal of Econometrics. 2011;160:33–47.
Zhang L, Mykland PA, Aït-Sahalia Y. A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data. Journal of the American Statistical Association. 2005;100:1394–1411.
Zheng X, Li Y. On the Estimation of Integrated Covariance Matrices of High Dimensional Diffusion Processes. Annals of Statistics. 2011, forthcoming.

