Portfolio Selection Based on EMD Denoising with Correlation Coefficient Test Criterion

Kuangxi Su; Yinhong Yao; Chengli Zheng; Wenzhao Xie

doi:10.1007/s10614-022-10345-4

2022 Nov 27:1–31. Online ahead of print.

PMCID: PMC9702635 PMID: 36467874

Abstract

Noise is an important factor affecting portfolio performance, so constructing an effective denoising strategy is becoming increasingly important for investors. In this study, we theoretically explain the impact of noise on the portfolio and argue for the necessity of denoising. We then propose an empirical mode decomposition (EMD) denoising strategy based on the correlation coefficient test criterion to improve portfolio performance. Specifically, EMD is used to decompose the noisy price, and a series of correlation coefficient tests then determines which intrinsic mode functions (IMFs) are noise. In the empirical analysis, we apply the proposed method to denoise the constituents of the SSE 50 index and test the out-of-sample performance under the mean–variance framework. The empirical results show that the proposed denoising method outperforms four common EMD denoising methods, as well as Ensemble EMD (EEMD) and wavelet denoising methods, in return-risk ratio. Among the strategies compared, the proposed method performs best and can help investors improve portfolio performance.

Keywords: Portfolio selection, Empirical mode decomposition, Correlation coefficient test, Financial data denoising

Introduction

The portfolio selection problem has been one of the core issues of modern investment theory (Ao et al., 2019). How to construct an effective portfolio that improves out-of-sample performance is a focus of both academia and industry (Ma et al., 2019). In practice, an often ignored fact is that noise is an important factor affecting portfolio performance (Kondor et al., 2007; Dessaint et al., 2019; Peress and Schmidt, 2020). Some studies indicate that denoising can significantly improve investors’ returns (Aloui and Jammazi, 2015; Zhu et al., 2019, 2021). However, common denoising methods, especially empirical mode decomposition (EMD) denoising, have weaknesses in portfolio management, such as inadequate or excessive denoising (He et al., 2017; Helong et al., 2019). To address these weaknesses, we propose an EMD denoising strategy based on the correlation coefficient test criterion to improve portfolio performance.

Noise arises because individual investors have no access to inside information, do not follow buy-and-hold strategies, and tend to select stocks with strong past returns (Black, 1986; Odean, 1999). One result of this concentrated trading is that prices tend to deviate from their fundamental values (Odean, 1999). Black (1986) labels these deviations as "noise". Financial time series are easily contaminated by noise, which may mislead model fitting (Kondor et al., 2007). As a result, portfolio models may produce inaccurate results, and investors who make decisions based on biased results will inevitably suffer losses. To eliminate noise interference, some researchers have introduced data decomposition methods, such as the popular wavelet decomposition, into portfolio management. For example, Aloui & Jammazi (2015) and Zhu et al. (2019, 2021) propose different denoising methods to construct portfolio models based on the wavelet decomposition technique; their empirical results indicate that profitability, Sharpe ratio, and model accuracy improve after the noise is filtered from the original data. Overall, few theoretical and empirical studies investigate portfolio performance from a denoising perspective.

Besides wavelet decomposition, EMD has also received extensive attention (Huang et al., 1998). Compared with wavelet decomposition, EMD does not require any prior assumptions about signal modes or system orders, and can directly decompose the original data into a finite number of intrinsic mode functions (IMFs) and a trend item. To date, it has shown outstanding advantages in decomposing financial data (Zhu et al., 2017; Yang et al., 2019). In this study, we use EMD instead of wavelet decomposition to construct different denoising strategies.

The key to EMD denoising is how to select the decomposed IMFs. It is generally accepted that different IMFs represent different fluctuation levels (Huang et al., 1998). The high-frequency IMFs are disordered and display minimal regularity; they are mainly caused by factors with short-term effects, such as bad weather and strikes. Flandrin et al. (2004) consider these high-frequency components as noise and argue that the main information is concentrated in the low-frequency IMFs. Thus, there must be a key index such that the IMFs after IMF $_{index}$ are regarded as the dominant modes and the preceding ones are considered noise. Numerous studies follow this framework to denoise different types of data in engineering, medicine and other fields (Boudraa and Cexus, 2007; Nguyen and Kim, 2016). However, these denoising methods may not be suitable for financial data, since the optimal denoising strategy depends heavily on the data characteristics, i.e., different types of data have different optimal denoising strategies (Li et al., 2016; Nguyen and Kim, 2016; Zhu et al., 2019, 2021). In practice, the approach may suffer from inadequate or excessive denoising.

Therefore, a new EMD denoising strategy based on the correlation coefficient test criterion is proposed to improve portfolio performance. Specifically, we first prove theoretically that noise can cause the optimal portfolio weights and the effective frontier to deviate from their true positions, so it is necessary to eliminate noise. Next, we apply EMD to decompose the original noisy price and perform a series of correlation coefficient tests to identify which IMFs are noise. If a test does not reject the null hypothesis, the corresponding IMF is considered noise; otherwise, it is considered a non-noisy component. Finally, we sum the non-noisy components and the residual to construct the denoised price.

In the empirical analysis, the daily closing prices of 3180 trading days from October 8, 2007 to October 30, 2020 are collected to test portfolio performance. Four quantitative indicators, namely the Sharpe ratio, Sortino ratio, upside potential ratio and tracking error ratio, are used to summarize out-of-sample performance. The empirical results show that the proposed denoising method outperforms common EMD, Ensemble EMD (EEMD) and wavelet denoising methods under the mean–variance framework. Besides, the portfolio performance is examined in four different subsamples: bull and bear markets and two special periods, i.e., the 2007–2008 financial crisis and the coronavirus disease 2019 (COVID-19) pandemic in 2020. The results reconfirm the superiority of the proposed denoising method. A simulation study with different parameter settings validates the above conclusions. Overall, the proposed denoising method reduces noise interference and helps investors improve portfolio performance.

This paper contributes to portfolio management in the following two dimensions. First, we theoretically analyze the impact of noise on the portfolio, and prove that noise causes the optimal portfolio and effective frontier to deviate from their true positions. In this way, the theoretical basis of denoising is argued. Second, we point out the weaknesses of common denoising methods applied to portfolio management and construct an EMD denoising strategy based on the correlation coefficient test criterion, whose portfolio performance significantly outperforms other common denoising methods.

Figure 1 plots the framework of this paper. Section 2 theoretically analyzes the motivation of denoising. Section 3 introduces the proposed EMD denoising method based on the correlation coefficient test criterion. As a comparison, four common EMD denoising methods are also described. Section 4 compares the portfolio performance of different denoising methods under the mean–variance framework with different sample periods. Section 5 further evaluates the robustness of the proposed denoising method through simulated data. The last section concludes the paper.

Portfolio Theory Under Noisy Environment

In this section, we decompose the noisy price into non-noisy component and noise, and further construct the mean–variance model under the noisy environment. By comparing the portfolio under non-noisy environment, we explain the impact of noise on portfolio and argue the necessity of denoising.

The Noisy Portfolio Returns

Due to the asymmetry and incompleteness of information, stock prices are generally noisy (Black, 1986; Odean, 1999). Consider the price $x_{i} (t)$ of stock $i (i = 1, \dots, k)$ at time $t (t = 1, \dots, T)$ , which is composed of a non-noisy component $s_{i} (t)$ and noise $n_{i} (t)$ :

\begin{matrix} x_{i} (t) = s_{i} (t) + n_{i} (t), t = 1, \dots, T \end{matrix}

where the noise $n_{i} (t)$ and non-noisy component $s_{i} (t)$ are uncorrelated, i.e., $cov (s_{i} (t), n_{i} (t)) = 0$ . Then, the return $r_{i} (t)$ for stock i can be calculated as

\begin{matrix} \begin{matrix} r_{i} (t) & = \frac{x_{i} (t) - x_{i} (t - 1)}{x_{i} (t - 1)} = \frac{s_{i} (t) - s_{i} (t - 1) + n_{i} (t) - n_{i} (t - 1)}{x_{i} (t - 1)} \\ & = \frac{s_{i} (t) - s_{i} (t - 1)}{s_{i} (t - 1)} \frac{s_{i} (t - 1)}{x_{i} (t - 1)} + \frac{n_{i} (t) - n_{i} (t - 1)}{n_{i} (t - 1)} \frac{n_{i} (t - 1)}{x_{i} (t - 1)} \\ & = r_{i, s} (t) \frac{s_{i} (t - 1)}{x_{i} (t - 1)} + r_{i, n} (t) \frac{x_{i} (t - 1) - s_{i} (t - 1)}{x_{i} (t - 1)} \\ & = α_{i} (t - 1) r_{i, s} (t) + (1 - α_{i} (t - 1)) r_{i, n} (t) \end{matrix} \end{matrix}

where $r_{i, s} (t) = (s_{i} (t) - s_{i} (t - 1)) / s_{i} (t - 1)$ is the return of the non-noisy component. Similarly, $r_{i, n} (t) = (n_{i} (t) - n_{i} (t - 1)) / n_{i} (t - 1)$ is the return of the noise. $α_{i} (t - 1) = s_{i} (t - 1) / x_{i} (t - 1)$ denotes the share of the non-noisy component in $x_{i} (t - 1)$ . For reading convenience, the variables $r_{i} (t), r_{i, s} (t), r_{i, n} (t), x_{i} (t), s_{i} (t), n_{i} (t)$ and $α_{i} (t - 1)$ are denoted by $r_{i}, r_{i, s}, r_{i, n}, x_{i}, s_{i}, n_{i}$ and $α_{i}$ , respectively. Furthermore, the noisy returns $r = {(r_{1}, \dots, r_{k})}^{τ}$ can be expressed as

\begin{matrix} \begin{matrix} r & = α ⊙ r_{s} + (1 - α) ⊙ r_{n} \\ & = R_{s} + R_{n} \end{matrix} \end{matrix}

where ${(r_{1}, \dots, r_{k})}^{τ}$ denotes the transpose of $(r_{1}, \dots, r_{k})$ and $⊙$ denotes the Hadamard product (Johnson, 1990). $r_{s} = {(r_{1, s}, \dots, r_{k, s})}^{τ}$ and $r_{n} = {(r_{1, n}, \dots, r_{k, n})}^{τ}$ represent the non-noisy and noise returns, and their shares in the noisy returns are $α = {(α_{1}, \dots, α_{k})}^{τ}$ and $1 - α = {(1 - α_{1}, \dots, 1 - α_{k})}^{τ}$ , respectively. Besides, we let $R_{s} = {(R_{1, s}, \dots, R_{k, s})}^{τ}$ and $R_{n} = {(R_{1, n}, \dots, R_{k, n})}^{τ}$ denote $α ⊙ r_{s}$ and $(1 - α) ⊙ r_{n}$ , where $R_{i, s} = α_{i} r_{i, s} = r_{i, s} s_{i} / x_{i}$ and $R_{i, n} = (1 - α_{i}) r_{i, n} = r_{i, n} n_{i} / x_{i}$ .
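The decomposition of the noisy return into a weighted sum of the non-noisy return and the noise return can be checked numerically. The following is a minimal sketch for a single stock; the price and noise values are made-up illustrative numbers, not data from the paper, and the noise is kept non-zero so that its return is defined.

```python
# Check r = alpha * r_s + (1 - alpha) * r_n for one stock on toy numbers.
s = [100.0, 101.0, 103.0, 102.0]        # hypothetical non-noisy component s(t)
n = [0.5, -0.4, 0.3, 0.6]               # hypothetical noise n(t), never zero
x = [si + ni for si, ni in zip(s, n)]   # noisy price x(t) = s(t) + n(t)


def simple_returns(p):
    """Simple returns (p[t] - p[t-1]) / p[t-1] for t = 1, ..., T-1."""
    return [(p[t] - p[t - 1]) / p[t - 1] for t in range(1, len(p))]


r = simple_returns(x)      # noisy return
r_s = simple_returns(s)    # return of the non-noisy component
r_n = simple_returns(n)    # return of the noise

# alpha(t-1) = s(t-1) / x(t-1): share of the non-noisy component.
alpha = [s[t - 1] / x[t - 1] for t in range(1, len(x))]

recombined = [a * rs + (1 - a) * rn for a, rs, rn in zip(alpha, r_s, r_n)]
max_gap = max(abs(u - v) for u, v in zip(r, recombined))
print(max_gap)  # difference between both sides, up to floating-point error
```

The two sides agree up to rounding because the factors $s_{i}(t-1)$ and $n_{i}(t-1)$ cancel in each product, leaving both increments divided by $x_{i}(t-1)$.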

Since the price $x_{i}$ is generally bounded, we have $M_{1} \leq x_{i} \leq M_{2}$ , where $M_{1}$ and $M_{2}$ are constants. Besides, it is deduced that $cov (r_{i, s}, r_{i, n}) = cov (s_{i} r_{i, s}, n_{i} r_{i, n}) = 0$ based on $cov (s_{i}, n_{i}) = 0$ . Finally, treating $1 / x_{i}$ as a coefficient term, the covariance $cov (R_{i, s}, R_{i, n})$ satisfies the following inequality.

\begin{matrix} 0 = \frac{1}{M_{2}^{2}} cov (s_{i} r_{i, s}, n_{i} r_{i, n}) \leq cov (R_{i, s}, R_{i, n}) = cov (\frac{s_{i}}{x_{i}} r_{i, s}, \frac{n_{i}}{x_{i}} r_{i, n}) \leq \frac{1}{M_{1}^{2}} cov (s_{i} r_{i, s}, n_{i} r_{i, n}) = 0 \end{matrix}

Equation (4) shows that $cov (R_{i, s}, R_{i, n}) = 0$ , which means that the return $r_{i}$ is composed of two uncorrelated parts, the non-noisy component $R_{i, s}$ and the noise $R_{i, n}$ . Besides, we can deduce that $cov (R_{i, s}, R_{j, n}) = 0, i \neq j$ . In this way, the portfolio return $r_{p}$ is

\begin{matrix} r_{p} = w^{τ} r = w^{τ} (R_{s} + R_{n}) \end{matrix}

where $w = {(w_{1}, \dots, w_{k})}^{τ}$ are the portfolio weights, and $cov (R_{s}, R_{n}) = 0$ . Furthermore, the expectation and variance of the portfolio return $r_{p}$ are

\begin{matrix} \begin{matrix} E (r_{p}) & = w^{τ} (μ_{s} + μ_{n}) \\ var (r_{p}) & = w^{τ} Σ_{s} w + w^{τ} Σ_{n} w \end{matrix} \end{matrix}

where $μ_{s}$ and $μ_{n}$ denote the expectations of non-noisy component $R_{s}$ and noise $R_{n}$ . Similarly, $Σ_{s}$ and $Σ_{n}$ denote the covariance matrices of $R_{s}$ and $R_{n}$ , respectively.

Mean–Variance Model Under Noisy Environment

Following Markowitz’s portfolio optimization framework (Markowitz, 1952), the classical mean–variance portfolio model, which minimizes portfolio variance for a given expected return $E (r_{p}) = μ_{0}$ , can be expressed as

\begin{matrix} \begin{matrix} w (μ_{0}) = argmin & w^{τ} Σ_{s} w + w^{τ} Σ_{n} w \\ s.t. & w^{τ} (μ_{s} + μ_{n}) = μ_{0} \end{matrix} \end{matrix}

For calculation convenience, we assume that an investor’s wealth may be partially allocated to the risk-free security and that short sales are allowed, so the restriction $w^{τ} 1 = 1$ is not included in Eq. (7). Using the Lagrange multiplier method, the optimal solution can be obtained from the stationary point of the Lagrange function $L (w, λ)$ ,

\begin{matrix} L (w, λ) = w^{τ} Σ_{s} w + w^{τ} Σ_{n} w - λ [w^{τ} (μ_{s} + μ_{n}) - μ_{0}] \end{matrix}

where $w$ is the optimal solution of Eq. (7) when the Lagrange function $L (w, λ)$ satisfies

\begin{matrix} \left\{ \begin{matrix} \frac{\partial L}{\partial w} = 2 (Σ_{s} + Σ_{n}) w - λ (μ_{s} + μ_{n}) = 0 \\[3mm] \frac{\partial L}{\partial λ} = w^{τ} (μ_{s} + μ_{n}) - μ_{0} = 0 \end{matrix} \right. \end{matrix}

Then under the noisy environment, the optimal mean–variance portfolio weight vector $w_{noise}^{*}$ is computed as

\begin{matrix} w_{noise}^{*} = μ_{0} \frac{{(Σ_{s} + Σ_{n})}^{- 1} (μ_{s} + μ_{n})}{{(μ_{s} + μ_{n})}^{τ} {(Σ_{s} + Σ_{n})}^{- 1} (μ_{s} + μ_{n})} \end{matrix}

Similarly, the optimal portfolio weight vector $w_{nonnoise}^{*}$ under the noise-free environment is calculated as follows:

\begin{matrix} w_{nonnoise}^{*} = μ_{0} \frac{{(Σ_{s})}^{- 1} μ_{s}}{μ_{s}^{τ} {(Σ_{s})}^{- 1} μ_{s}} \end{matrix}

Equations (10) and (11) show that noise affects the portfolio weights not only through the covariance matrix but also through the expected return, which confirms that noise is an important factor affecting portfolio performance. In practice, what investors need is the portfolio weight $w_{nonnoise}^{*}$ under the non-noisy environment; however, due to the existence of noise, the portfolio weight they actually obtain is $w_{noise}^{*}$ . As a result, it is difficult for investors to achieve effective diversification, and it is necessary to use appropriate denoising strategies to suppress the noise interference.
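To make the gap between the two weight vectors concrete, the closed forms of Eqs. (10) and (11) can be evaluated on a small example. This is an illustrative sketch for two assets only; all means, covariances and the target return below are hypothetical values chosen for the example, and the helper names (`inv2`, `matvec`, `dot`, `mv_weights`) are introduced here, not taken from the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mv_weights(mu, sigma, mu0):
    """w = mu0 * Sigma^{-1} mu / (mu' Sigma^{-1} mu), the form of Eqs. (10)-(11)."""
    z = matvec(inv2(sigma), mu)
    scale = mu0 / dot(mu, z)
    return [scale * zi for zi in z]


# Hypothetical inputs for two assets (not estimated from any data).
mu_s = [0.010, 0.006]                         # mean of the non-noisy part
mu_n = [0.001, -0.002]                        # mean of the noise part
sigma_s = [[0.040, 0.010], [0.010, 0.020]]    # covariance of the non-noisy part
sigma_n = [[0.010, 0.000], [0.000, 0.030]]    # covariance of the noise part
mu0 = 0.008                                   # target expected return

mu_noisy = [a + b for a, b in zip(mu_s, mu_n)]
sigma_noisy = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(sigma_s, sigma_n)]

w_noise = mv_weights(mu_noisy, sigma_noisy, mu0)   # Eq. (10)
w_nonnoise = mv_weights(mu_s, sigma_s, mu0)        # Eq. (11)
print("noisy weights:    ", w_noise)
print("non-noisy weights:", w_nonnoise)
```

Both weight vectors meet the same target return under their own mean vector, yet they allocate differently across the two assets, which is the deviation the paragraph above describes.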

When focusing on noise, a common assumption in practice is that the mean of noise is 0, i.e., $μ_{n} = 0$ (Donoho and Johnstone, 1994). In this case, the optimal portfolio weight $w_{noise}^{†}$ under noisy environment is

\begin{matrix} w_{noise}^{†} = μ_{0} \frac{{(Σ_{s} + Σ_{n})}^{- 1} μ_{s}}{μ_{s}^{τ} {(Σ_{s} + Σ_{n})}^{- 1} μ_{s}} \end{matrix}

In this case, noise affects portfolio performance only through the covariance matrix, which supports previous studies that filter the covariance matrix (Daly et al., 2008; Tian and Zhao, 2020). However, when the assumption $μ_{n} = 0$ is not satisfied, filtering only the covariance matrix is not sufficient.

Mean–Variance Effective Frontier

When analyzing the effect of noise on portfolio variance, we consider a simple scenario in which the assumption $μ_{n} = 0$ is satisfied, since the mean of returns is close to 0 in practice. Substituting Eq. (12) into Eq. (6), the portfolio variance under the noisy environment is

\begin{matrix} \begin{matrix} σ_{noise}^{2} = {(w_{noise}^{†})}^{τ} (Σ_{s} + Σ_{n}) w_{noise}^{†} = \frac{μ_{0}^{2}}{μ_{s}^{τ} {(Σ_{s} + Σ_{n})}^{- 1} μ_{s}} \end{matrix} \end{matrix}

If the portfolio variance $σ_{noise}^{2}$ and the expected return $μ_{0}$ are taken as the axes, the mean–variance effective frontier is a parabola that opens to the right and passes through the origin. This shape results from the constraints imposed on the mean–variance model, such as $μ_{n} = 0$ . Similarly, the portfolio variance under the non-noisy environment is computed as

\begin{matrix} \begin{matrix} σ_{nonnoise}^{2} = {(w_{nonnoise}^{*})}^{τ} Σ_{s} w_{nonnoise}^{*} = \frac{μ_{0}^{2}}{μ_{s}^{τ} Σ_{s}^{- 1} μ_{s}} \end{matrix} \end{matrix}

Comparing Eqs. (13) and (14) shows that noise causes the portfolio variance to deviate from its true position, which is consistent with the results for the optimal portfolio weights. Besides, the relative magnitude of the portfolio variances under noisy and non-noisy environments can be obtained from the following equation.

\begin{matrix} \begin{matrix} \frac{μ_{0}^{2}}{σ_{noise}^{2}} - \frac{μ_{0}^{2}}{σ_{nonnoise}^{2}} & = μ_{s}^{τ} {(Σ_{s} + Σ_{n})}^{- 1} μ_{s} - μ_{s}^{τ} Σ_{s}^{- 1} μ_{s} \\[-1mm] & = | μ_{s}^{τ} {(Σ_{s} + Σ_{n})}^{- 1} μ_{s} | - | μ_{s}^{τ} Σ_{s}^{- 1} μ_{s} | \\ & = | {(Σ_{s} + Σ_{n})}^{- 1} | \cdot | μ_{s}^{τ} μ_{s} | - | Σ_{s}^{- 1} | \cdot | μ_{s}^{τ} μ_{s} | \\ & = [| {(Σ_{s} + Σ_{n})}^{- 1} | - | Σ_{s}^{- 1} |] \cdot | μ_{s}^{τ} μ_{s} | \end{matrix} \end{matrix}

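where $| μ_{s}^{τ} μ_{s} | \geq 0$ and the matrices $Σ_{s}$ , $Σ_{n}$ and $Σ_{s} + Σ_{n}$ are positive definite. By standard linear algebra, the inverse matrices $Σ_{s}^{- 1}$ , $Σ_{n}^{- 1}$ and ${(Σ_{s} + Σ_{n})}^{- 1}$ are also positive definite. Besides, it can be deduced that $| Σ_{s} + Σ_{n} | \geq | Σ_{s} |$ and $| {(Σ_{s} + Σ_{n})}^{- 1} | \leq | Σ_{s}^{- 1} |$ . The same conclusion follows directly from the quadratic forms: because $Σ_{n}$ is positive definite, $Σ_{s}^{- 1} - {(Σ_{s} + Σ_{n})}^{- 1}$ is positive semidefinite, so $μ_{s}^{τ} {(Σ_{s} + Σ_{n})}^{- 1} μ_{s} \leq μ_{s}^{τ} Σ_{s}^{- 1} μ_{s}$ . In this way, we can obtain the following inequality.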

\begin{matrix} \begin{matrix} \frac{μ_{0}^{2}}{σ_{noise}^{2}} \leq \frac{μ_{0}^{2}}{σ_{nonnoise}^{2}} ⟺ σ_{noise}^{2} \geq σ_{nonnoise}^{2} \end{matrix} \end{matrix}

Equation (16) implies that noise increases the portfolio variance and shifts the mean–variance effective frontier to the right. Denoising is therefore equivalent to moving from a noisy environment toward a non-noisy one. As a consequence, the effective frontier shifts to the left relative to that obtained from the original price, and the higher the degree of denoising, the farther it shifts. Figure 2 summarizes the mean–variance effective frontier for different scenarios.
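The rightward shift of the frontier can be illustrated by evaluating Eqs. (13) and (14) over a grid of target returns. The sketch below assumes $μ_{n} = 0$ as in the text and uses hypothetical two-asset inputs; the helper names (`inv2`, `quad_form`, `frontier_variance`) are introduced for this example only.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad_form(mu, sigma):
    """mu' Sigma^{-1} mu for a 2-vector mu and a 2x2 covariance matrix."""
    inv = inv2(sigma)
    z = [inv[0][0] * mu[0] + inv[0][1] * mu[1],
         inv[1][0] * mu[0] + inv[1][1] * mu[1]]
    return mu[0] * z[0] + mu[1] * z[1]


def frontier_variance(mu, sigma, mu0):
    """Minimum variance at target return mu0: mu0^2 / (mu' Sigma^{-1} mu)."""
    return mu0 ** 2 / quad_form(mu, sigma)


# Hypothetical inputs (not from the paper); noise has zero mean here.
mu_s = [0.010, 0.006]
sigma_s = [[0.040, 0.010], [0.010, 0.020]]
sigma_n = [[0.010, 0.000], [0.000, 0.030]]
sigma_noisy = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(sigma_s, sigma_n)]

targets = [0.002, 0.004, 0.006, 0.008, 0.010]
frontier = [
    (mu0,
     frontier_variance(mu_s, sigma_s, mu0),       # Eq. (14), non-noisy
     frontier_variance(mu_s, sigma_noisy, mu0))   # Eq. (13), noisy
    for mu0 in targets
]

for mu0, var_clean, var_noisy in frontier:
    print(f"mu0={mu0:.3f}  non-noisy var={var_clean:.6f}  noisy var={var_noisy:.6f}")
```

For every target return in the grid the noisy variance is at least as large as the non-noisy one, in line with Eq. (16).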

Fig. 2 — Mean–variance effective frontier

Measures of Portfolio Performance

In practice, investors are more concerned about the return they can achieve under a certain level of risk tolerance (Moura et al., 2020). Thus, four common quantitative indicators are used to evaluate portfolio performance: the Sharpe ratio, Sortino ratio, upside potential ratio, and tracking error ratio. The higher these indicators are, the better the portfolio performs.

The Sharpe ratio, abbreviated SR, is the most common indicator adopted by investors to measure risk-adjusted portfolio return.

\begin{matrix} SR = \frac{E (r_{p})}{\sqrt{var (r_{p})}} \end{matrix}

Due to potential drawbacks of Sharpe ratio in evaluating portfolio performance, we apply the Sortino ratio, abbreviated SoR, to take account of the asymmetric pattern of financial volatility which cannot be captured via Sharpe ratio (Sortino and Van Der Meer, 1991).

\begin{matrix} SoR = \frac{E (r_{p})}{\sqrt{E {(\min (r_{p}, 0))}^{2}}} \end{matrix}

Additionally, as described by Sortino et al. (1999), we take into account the upside potential return, and use the upside potential ratio, abbreviated UPR, to study the information in the higher moment.

\begin{matrix} UPR = \frac{E (\max (r_{p}, 0))}{\sqrt{E {(\min (r_{p}, 0))}^{2}}} \end{matrix}

Also, in order to quantify the differences between competing portfolio strategies, the tracking error ratio, abbreviated TR, is used to evaluate the error-tracking ability (Berger and Czudaj, 2020).

\begin{matrix} TR = \frac{E (r_{p} - r_{b})}{\sqrt{var (r_{p} - r_{b})}} \end{matrix}

where $r_{b}$ denotes the return of the benchmark portfolio, which is based on the original unfiltered returns. TR measures the mean difference between the evaluated portfolio return and the benchmark relative to the volatility of that difference (it is reported as TE in the tables). Thus, a higher TR indicates better performance relative to the benchmark.
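The four indicators defined above can be computed from a series of out-of-sample portfolio returns as follows. This is a minimal sketch: the function names and the return series are hypothetical, expectations are replaced by sample means, and the population variance is used because the text does not say which variance estimator is applied.

```python
import math
from statistics import mean, pvariance


def sharpe_ratio(r):
    """SR = E(r) / sqrt(var(r)), with no risk-free rate, as in the text."""
    return mean(r) / math.sqrt(pvariance(r))


def downside_deviation(r):
    """sqrt(E[min(r, 0)^2]); zero if no return is negative."""
    return math.sqrt(mean([min(v, 0.0) ** 2 for v in r]))


def sortino_ratio(r):
    """SoR = E(r) / sqrt(E[min(r, 0)^2])."""
    return mean(r) / downside_deviation(r)


def upside_potential_ratio(r):
    """UPR = E[max(r, 0)] / sqrt(E[min(r, 0)^2])."""
    return mean([max(v, 0.0) for v in r]) / downside_deviation(r)


def tracking_ratio(r, r_b):
    """TR = E(r - r_b) / sqrt(var(r - r_b)) against a benchmark series r_b."""
    diff = [a - b for a, b in zip(r, r_b)]
    return mean(diff) / math.sqrt(pvariance(diff))


# Hypothetical daily returns: evaluated portfolio and unfiltered benchmark.
r_p = [0.012, -0.020, 0.015, 0.005, -0.010]
r_b = [0.008, -0.025, 0.010, 0.000, -0.012]

print("SR :", sharpe_ratio(r_p))
print("SoR:", sortino_ratio(r_p))
print("UPR:", upside_potential_ratio(r_p))
print("TR :", tracking_ratio(r_p, r_b))
```

Note that SoR and UPR are undefined when the series contains no negative return, since the downside deviation is then zero; a production implementation would need to handle that case explicitly.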

EMD Denoising Methodology

Section 2 points out that noise is an important factor affecting portfolio performance. Taking a step forward, we construct a new EMD denoising method to improve portfolio performance. We prefer EMD because, compared with traditional denoising methods such as wavelet denoising, it is adaptive and does not require prior assumptions about the signal pattern or system order, such as the basis function or decomposition level, which are important factors affecting the denoising results and are difficult for investors to choose. Besides, EMD shows better properties in dealing with nonlinear and non-stationary data (Huang et al., 1998), and has been widely applied to decompose financial data (Zhu et al., 2017; Yang et al., 2019). To illustrate the superiority of the proposed denoising method, we compare it with several common denoising methods and test the portfolio performance under the mean–variance framework.

Empirical Mode Decomposition

The EMD proposed by Huang et al. (1998) decomposes the original noisy price x(t) into a series of IMFs, each of which must satisfy the following two conditions: (1) The number of extrema and the number of zero crossings must be equal or differ by at most one over the whole time series. (2) The mean value of the envelopes defined by the local maxima and minima is zero at any point. With this definition, the noisy price x(t) can be decomposed according to Table 1:

Table 1.

EMD algorithm

Step 1	Find the local extrema of $x (t)$ , including both maxima and minima
Step 2	Identify its upper and lower envelopes, $x_{up} (t)$ and $x_{low} (t)$ , with cubic spline interpolation
Step 3	Compute the point-by-point mean $\bar{x} (t)$ of the upper and lower envelopes: $\bar{x} (t) = (x_{up} (t) + x_{low} (t)) / 2$
Step 4	Subtract the mean from the time series to obtain an IMF candidate $y (t) = x (t) - \bar{x} (t)$
Step 5	Check the properties of y(t): if y(t) meets the above two conditions, it is extracted as an IMF and x(t) is replaced with the residue $r (t) = x (t) - y (t)$ ; if not, x(t) is replaced with y(t)
Step 6	Repeat steps 1–5 until the stop criterion is satisfied
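Steps 1 to 4 of Table 1 amount to one sifting pass, which can be sketched as below. To keep the example dependency-free, the envelopes are piecewise linear and held flat beyond the first and last extremum; this is a deliberate simplification, not the cubic spline interpolation the table specifies. The function names and the toy signal are illustrative assumptions.

```python
import math


def local_extrema(x):
    """Step 1: indices of strict interior local maxima and minima."""
    maxima = [t for t in range(1, len(x) - 1) if x[t - 1] < x[t] > x[t + 1]]
    minima = [t for t in range(1, len(x) - 1) if x[t - 1] > x[t] < x[t + 1]]
    return maxima, minima


def envelope(x, knots):
    """Step 2 (simplified): piecewise-linear curve through (t, x[t]), t in knots.

    Held constant before the first knot and after the last one.
    """
    env = []
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            j = next(i for i in range(1, len(knots)) if knots[i] >= t)
            t0, t1 = knots[j - 1], knots[j]
            w = (t - t0) / (t1 - t0)
            env.append((1 - w) * x[t0] + w * x[t1])
    return env


def sift_once(x):
    """Steps 1-4: return the IMF candidate y(t), or None if x has no oscillation."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None  # nothing left to extract; x is treated as the residue
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    mean_env = [(u + l) / 2 for u, l in zip(upper, lower)]   # Step 3
    return [xi - m for xi, m in zip(x, mean_env)]            # Step 4


# Toy signal: slow linear trend plus one oscillation.
x = [0.05 * t + math.sin(0.8 * t) for t in range(60)]
candidate = sift_once(x)
print(sum(x) / len(x), sum(candidate) / len(candidate))
```

A full decomposition would repeat this pass until the candidate satisfies the two IMF conditions (Step 5), subtract the accepted IMF, and continue on the residue until a stopping criterion is met (Step 6). In practice a cubic spline routine from a numerical library would replace `envelope`.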

	$H_{0} : ρ_{j} = 0$	$H_{1} : ρ_{j} \neq 0$
IMF $_{j}$	Noise	Non-noisy component
p value	$p_{j} > β$	$p_{j} \leq β$
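The decision rule in the table above can be sketched as follows. Several choices here are assumptions, because this section does not state them: $ρ_{j}$ is taken to be the Pearson correlation between IMF $_{j}$ and the noisy price, the p value comes from the Fisher z approximation rather than an exact test, and the significance level `beta=0.05` is a placeholder. The function names and the synthetic components are hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson(u, v):
    """Sample Pearson correlation coefficient between two equal-length series."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def p_value(rho, n):
    """Two-sided p value for H0: rho = 0 via the Fisher z approximation."""
    rho = max(min(rho, 0.999999), -0.999999)   # keep atanh finite
    z = math.atanh(rho) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def classify_imfs(imfs, price, beta=0.05):
    """IMF_j is labelled noise when p_j > beta, non-noisy otherwise."""
    noise, keep = [], []
    for j, imf in enumerate(imfs):
        p = p_value(pearson(imf, price), len(price))
        (noise if p > beta else keep).append(j)
    return noise, keep


def denoise(imfs, residue, price, beta=0.05):
    """Denoised price = residue + sum of the IMFs not labelled as noise."""
    _, keep = classify_imfs(imfs, price, beta)
    return [residue[t] + sum(imfs[j][t] for j in keep) for t in range(len(price))]


# Synthetic components standing in for an EMD output (not real IMFs).
T = 200
imf_fast = [0.1 * (-1) ** t for t in range(T)]                       # small, rapid
imf_slow = [2.0 * math.sin(2 * math.pi * t / 50) for t in range(T)]  # large, slow
residue = [0.05 * t for t in range(T)]                               # trend
imfs = [imf_fast, imf_slow]
price = [a + b + c for a, b, c in zip(imf_fast, imf_slow, residue)]

noise_idx, keep_idx = classify_imfs(imfs, price)
denoised = denoise(imfs, residue, price)
print("noise IMFs:", noise_idx, "kept IMFs:", keep_idx)
```

Because each IMF is tested separately, the rule can discard non-adjacent IMFs rather than cutting at a single index, which matches the mixed index sets such as 1–3,5–6 reported for the proposed method in the constituent table.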

	IMF $_{1}$	IMF $_{2}$	IMF $_{3}$	IMF $_{4}$	IMF $_{5}$	IMF $_{6}$	IMF $_{7}$	IMF $_{8}$	Res	Original
Var	0.0061	0.0088	0.0185	0.0252	0.1072	0.0970	0.5142	0.8462	7.1420	8.3707
Cov	0.0029	0.0057	0.0158	0.0220	0.1096	0.1813	0.4315	0.6735	6.9285	–
$ρ$	0.0126	0.0210	0.0401	0.0479	0.1157	0.2012	0.2080	0.2530	0.8961	–
H $_{0}$	0	0	0	0	1	1	1	1	1	–
p	0.4776	0.2372	0.0236	0.0069	0.0000	0.0000	0.0000	0.0000	0.0000	-

ID	600000	600016	600019	600028	600030	600031	600036	600048	600050	600111
${EMD}_{MSE}$	1	1	1	1	1	1	1	1	1–2	1
${EMD}_{CP}$	1–6	1–6	1–5	1–5	1–6	1–6	1–5	1–5	1–6	1–7
${EMD}_{KLD}$	1	1	1–2	1–2	1	1	1–2	1–2	1–3	1
${EMD}_{ED}$	1	1	1	1	1	1	1	1	1–2	1
EMD $ρ$	1–4	1–6	1–5	1–5	1–3	1–3,5–6	1–8	1–5,8	1,3–5	1–5
ID	600123	600256	600348	600362	600383	600489	600518	600519	600549	600585
${EMD}_{MSE}$	1	1	1	1	1	1	1	1	1	1
${EMD}_{CP}$	1–6	1–6	1–5	1–5	1–5	1–6	1–7	1–7	1–6	1–6
${EMD}_{KLD}$	1–3	1	1–2	1–2	1–2	1–9	1–2	1–2	1	1
${EMD}_{ED}$	1	1–2	1	1	1	1	1	1	1	1
EMD $ρ$	1–5	1–6	1–3,5	1–4,8	1–6,9	1–3	1–5,7	1–5,7,8	1–4	1–4,6
ID	600837	600887	601006	601088	601166	601169	601328	601398	601628	601699
${EMD}_{MSE}$	1	1	1	1–2	1	1	1	1	1	1
${EMD}_{CP}$	1–6	1–4	1–5	1–5	1–5	1–6	1–8	1–6	1–6	1–6
${EMD}_{KLD}$	1–3	1	1	1–2	1	1	1–3	1–2	1–2	1–2
${EMD}_{ED}$	1–2	1	1	1	1	1–2	1	1	1	1
EMD $ρ$	1–3,10	1–8	1–5	1–3	1–5,8	1–5	1–4	1–6,8	1–3	1–5

Step 1	Calculate the maximum Sharpe ratios $s_{1}, \dots, s_{m}$ for different effective frontiers, where m is the number of effective frontiers. Then, the minimum and maximum Sharpe ratios are $s_{\min} = \min (s_{1}, \dots, s_{m})$ and $s_{\max} = \max (s_{1}, \dots, s_{m})$ , respectively
Step 2	Locate the average returns corresponding to the Sharpe ratios $s_{\min}, s_{\max}$ as $r_{m 1}$ and $r_{m 2}$ . By combining the maximum average returns $r_{k 1}, \dots, r_{km}$ of different efficient frontiers, we can construct the return interval $[r_{\min}, r_{\max}]$ , where $r_{\min} = \min (r_{m 1}, r_{m 2})$ , $r_{\max} = \min (\max (r_{m 1}, r_{m 2}), r_{k 1}, \dots, r_{km})$
Step 3	Use the return interval $[r_{\min}, r_{\max}]$ as the benchmark to search for portfolio weights. The different methods then yield $n_{1}, \dots, n_{m}$ groups of portfolio weights within the interval, respectively
Step 4	Construct the average portfolio return using the selected portfolio weights and the in-sample unfiltered return. Check whether the portfolio return meets the investor’s expectation, $E (r_{p}) \geq μ_{0}^{a}$ . If it does, the portfolio weights are determined. If not, gradually narrow the interval $[r_{\min}, r_{\max}]$ and repeat steps 2–4 to obtain the final portfolio that meets the investor’s expectation
Step 5	Construct the portfolio return using the selected portfolio weights and the out-of-sample unfiltered return. Finally, calculate the average portfolio return to represent the optimal portfolio return

	Original	EMD $_{MSE}$	EMD $_{CP}$	EMD $_{KLD}$	EMD $_{ED}$	EMD $ρ$
SR	− 0.0240	− 0.0258	− 0.0407	− 0.0371	− 0.0400	0.0200
SoR	− 0.0321	− 0.0343	− 0.0531	− 0.0491	− 0.0517	0.0280
UPR	0.4455	0.4416	0.4111	0.4163	0.4133	0.5115
TE	–	− 0.0080	− 0.0543	− 0.0556	− 0.0544	0.0605

	Original	EEMD $_{MSE}$	EEMD $_{CP}$	EEMD $_{KLD}$	EEMD $_{ED}$	EEMD $ρ$	EMD $ρ$
SR	− 0.0240	− 0.0240	− 0.0472	− 0.0433	− 0.0241	− 0.0389	0.0200
SoR	− 0.0321	− 0.0320	− 0.0603	− 0.0560	− 0.0321	− 0.0502	0.0280
UPR	0.4455	0.4518	0.3997	0.4074	0.4515	0.4132	0.5115
TE	−	0.0053	− 0.0553	− 0.0504	0.0053	− 0.0342	0.0605

	Original	Sym8	Haar	Coif4	Equal	EMD $ρ$
SR	− 0.0240	− 0.0185	− 0.0275	− 0.0181	0.0075	0.0200
SoR	− 0.0321	− 0.0249	− 0.0366	− 0.0244	0.0100	0.0280
UPR	0.4455	0.4578	0.4380	0.4584	0.4453	0.5115
TE	–	0.0595	− 0.0670	0.0622	0.0354	0.0605

	In-sample	Obs	Out-of-sample	Obs
Bear market	2007/10/8–2011/12/23	1032	2011/12/24–2014/11/2	688
Bull market	2014/11/3–2018/6/5	876	2018/6/6–2020/10/30	584
Financial crisis	2007/10/8–2008/6/3	163	2008/6/4–2008/11/11	108
COVID-19 pandemic	2020/1/1–2020/7/2	119	2020/7/3–2020/10/30	80

	Original	EMD $_{MSE}$	EMD $_{CP}$	EMD $_{KLD}$	EMD $_{ED}$	Wavelet	EEMD $ρ$	EMD $ρ$
Panel A: Bear market
SR	0.0216	0.0189	− 0.0384	0.0038	0.0189	0.0213	0.0388	0.0430
SoR	0.0307	0.0266	− 0.0540	0.0053	0.0266	0.0301	0.0570	0.0634
UPR	0.5485	0.5475	0.4901	0.5393	0.5475	0.5477	0.5868	0.5896
TE	–	− 0.0180	− 0.0563	− 0.0345	− 0.0180	− 0.0143	0.0342	0.0295
Panel B: Bull market
SR	0.0168	0.0253	0.0117	0.0175	0.0249	0.0425	0.0427	0.0591
SoR	0.0239	0.0359	0.0165	0.0246	0.0353	0.0623	0.0623	0.0886
UPR	0.5342	0.5361	0.5386	0.5294	0.5366	0.5640	0.5726	0.5980
TE	–	0.0368	− 0.0082	0.0096	0.0360	0.1078	0.1042	0.1267
Panel C: Financial crisis
SR	− 0.1079	− 0.1435	− 0.0987	− 0.1130	− 0.1263	− 0.0732	− 0.1001	− 0.1076
SoR	− 0.1396	− 0.1848	− 0.1300	− 0.1473	− 0.1632	− 0.0951	− 0.1300	− 0.1399
UPR	0.4247	0.4065	0.4612	0.4350	0.4147	0.4559	0.4358	0.4280
TE	–	− 0.1115	− 0.0129	− 0.0500	− 0.0913	0.0913	0.0387	0.0299
Panel D: COVID-19 pandemic
SR	0.0109	0.0068	0.0215	− 0.0400	0.0212	0.0399	0.0589	0.0790
SoR	0.0153	0.0095	0.0304	− 0.0616	0.0304	0.0555	0.0809	0.1120
UPR	0.5002	0.4862	0.4875	0.5239	0.5138	0.5202	0.5291	0.5640
TE	−	− 0.0371	0.0248	− 0.0669	0.0624	0.1419	0.1539	0.2135

Setting 1	All parameters are artificially specified with $μ_{i} = 0.1$ and $σ_{i}$ = 0.5, 1, 1.5 $(i = 1, \dots, k)$ . The initial prices are set to 100, and the dimension $k = 30, 50, 100$ . We assume that the added noise is white noise, which is sampled from the standard normal N(0,1) distribution
Setting 2	Different from setting 1, all parameters are estimated from the real-world dataset. More precisely, we calculate the parameters based on the SSE 50 sample. Besides, the added noise is sampled from the standard normal N(0,1) distribution
Setting 3	The parameters keep the same as setting 2 except that the added noise follows a uniform U(0,1) distribution

	Original	EMD $_{MSE}$	EMD $_{CP}$	EMD $_{KLD}$	EMD $_{ED}$	Wavelet	EEMD $ρ$	EMD $ρ$
Panel A: Setting 1
SR	0.0648	− 0.0181	0.0151	− 0.0040	− 0.0120	0.0197	− 0.0209	0.1423
SoR	0.0958	− 0.0254	0.0219	− 0.0056	− 0.0169	0.0280	− 0.0296	0.2258
UPR	0.6429	0.5464	0.5925	0.5577	0.5522	0.5878	0.5464	0.7528
TE	–	− 0.0418	− 0.0366	− 0.0323	− 0.0347	− 0.0791	− 0.1132	0.1269
Panel B: Setting 2
SR	0.0122	0.0007	− 0.0070	0.0009	0.0006	0.0014	0.0081	0.0292
SoR	0.0176	0.0010	− 0.0102	0.0012	0.0008	0.0020	0.0118	0.0421
UPR	0.5860	0.5376	0.5103	0.5714	0.5388	0.5119	0.5793	0.5942
TE	–	− 0.0002	− 0.0097	− 0.0007	− 0.0003	− 0.0012	0.0007	0.0262
Panel C: Setting 3
SR	0.0100	− 0.0191	− 0.0179	− 0.0053	− 0.0192	− 0.0080	− 0.0126	0.0451
SoR	0.0144	− 0.0274	− 0.0259	− 0.0076	− 0.0276	− 0.0113	− 0.0177	0.0636
UPR	0.5962	0.5343	0.5728	0.5799	0.5324	0.5589	0.5619	0.5943
TE	–	− 0.0212	− 0.0274	− 0.0086	− 0.0213	− 0.0140	− 0.0193	0.0412

	Original	${EMD}_{MSE}$	${EMD}_{CP}$	${EMD}_{KLD}$	${EMD}_{ED}$	Wavelet	EEMD $ρ$	EMD $ρ$
Panel A: 500-day sample period
SR	0.0148	− 0.0103	− 0.0062	− 0.0047	− 0.0096	0.0035	0.0292	0.0400
SoR	0.0215	− 0.0147	− 0.0087	− 0.0065	− 0.0139	0.0051	0.0418	0.0580
UPR	0.5863	0.5535	0.5250	0.5496	0.5649	0.5942	0.5856	0.5980
TE	–	− 0.0129	− 0.0103	− 0.0088	− 0.0124	− 0.0051	0.0207	0.0309
Panel B: 3000-day sample period
SR	0.0343	0.0180	0.0341	0.0105	0.0190	0.0267	0.0283	0.0426
SoR	0.0489	0.0260	0.0491	0.0151	0.0274	0.0383	0.0413	0.0612
UPR	0.5835	0.5730	0.5790	0.5509	0.5745	0.5769	0.5760	0.5922
TE	–	− 0.0011	0.0056	− 0.0022	− 0.0011	− 0.0006	0.0043	0.0152

	Mean	SD	Min	Max	Skew	Kurt	Days
Original	0.0015	0.0252	− 0.1057	0.0958	− 0.0434	6.5429	–
${EMD}_{MSE}$	0.0014	0.0153	− 0.0851	0.0788	− 0.0621	6.3852	3.1607
${EMD}_{CP}$	0.0011	0.0047	− 0.0689	0.0776	0.2192	22.6717	3.9230
${EMD}_{KLD}$	0.0015	0.0114	− 0.0589	0.0537	− 0.0912	5.8432	3.4286
${EMD}_{ED}$	0.0015	0.0150	− 0.0827	0.0767	− 0.0737	6.3944	3.1838
EMD $ρ$	0.0002	0.0188	− 0.3530	0.3616	0.0371	66.3609	3.9151

	Original	${EMD}_{MSE}$	${EMD}_{CP}$	${EMD}_{KLD}$	${EMD}_{ED}$	Wavelet	EEMD $ρ$	EMD $ρ$
SR	− 0.0106	0.0127	− 0.0108	0.0012	0.0167	0.0173	0.0062	0.0330
SoR	− 0.0142	0.0173	− 0.0139	0.0018	0.0233	0.0243	0.0083	0.0467
UPR	0.4588	0.4684	0.4116	0.5062	0.5054	0.5091	0.4518	0.5215
TE	−	0.0289	0.0060	0.0099	0.0361	0.0425	0.0179	0.0410

	Original	${EMD}_{MSE}$	${EMD}_{CP}$	${EMD}_{KLD}$	${EMD}_{ED}$	Wavelet	EEMD $ρ$	EMD $ρ$
SR	− 0.0437	− 0.0497	− 0.0713	− 0.0561	− 0.0497	− 0.0512	0.0228	0.0260
SoR	− 0.0592	− 0.0670	− 0.0943	− 0.0752	− 0.0670	− 0.0689	0.0321	0.0369
UPR	0.4783	0.4726	0.4329	0.4594	0.4726	0.4678	0.5223	0.5319
TE	–	− 0.0856	− 0.1243	− 0.1242	− 0.0856	− 0.1277	0.0630	0.0680
