COVID-19 forecasts via stock market indicators

Yi Liang; James Unwin

doi:10.1038/s41598-022-15897-x

2022 Aug 1;12:13197.

PMCID: PMC9342844 PMID: 35915102

Abstract

We propose that technical analysis tools developed to give buy/sell signals in asset trading can be applied to analyze time series datasets in the natural sciences, and we show this explicitly for a study of WHO COVID-19 data. Notably, reliable short-term forecasting can provide potentially lifesaving insights into logistical planning, and in particular into the optimal allocation of resources such as hospital staff and equipment. By reinterpreting COVID-19 daily cases in terms of candlesticks, we are able to apply some of the most popular stock market technical indicators to obtain predictive power over the course of the pandemic. By providing a quantitative assessment of MACD, RSI, and candlestick analyses, we show their statistical significance in making predictions for both stock market data and WHO COVID-19 data. In particular, we show the utility of this novel approach by considering the identification of the beginnings of subsequent waves of the pandemic. Finally, our new methods are used to assess whether current health policies are impacting the growth in new COVID-19 cases.

Subject terms: Viral infection, Statistical methods, Public health

Introduction

Logistical planning can be the difference between life and death during a pandemic, such as the ongoing COVID-19 crisis. Here we identify new techniques which can be applied during pandemics to assist in the optimal allocation of resources, and to aid in the evaluation of current health policies. Specifically, we repurpose a number of tools developed as stock market strategies and demonstrate that these techniques can be applied to predict future trends in the number of daily new cases of COVID. Notably, these tools can be used to

Identify the peak of a pandemic wave;
Forecast the start of new waves.

While fluctuations in the number of new COVID cases and in stock prices may naively seem disconnected, both systems can be described as non-stationary random walks, i.e. time series which exhibit random fluctuations around a longer-term trend. In the context of stocks, the random walk hypothesis can be formulated with the daily rate of return drawn randomly from a Gaussian or Laplace distribution¹. Similarly, the daily increase in COVID-19 cases can be modeled as a random walk, owing to the complex nature of human interactions, superimposed on the overall trend of an infectious disease through its spreading and controlled phases. Drawing on this connection, we hypothesized that strategies developed for predicting stock price movements can be repurposed to forecast changes in the trend of the number of new COVID cases, and more generally of any system that is well described as a non-stationary random walk. We propose that these techniques, collectively known as “technical analysis” and developed to give buy/sell signals in the stock market (in particular, ‘momentum’ indicators), can be used to make predictions for other time series data, such as COVID-19 case counts. Specifically, we show that these technical indicators do indeed provide accurate predictions for the COVID-19 pandemic, in the sense that their predictions are statistically significant.

Traders look to identify continuations or reversals in stock market trends to profit from short-term trades, and there are several well-established sets of techniques for forecasting future stock movements, known as “technical indicators”. Popular technical indicators include candlestick patterns², the Moving Average Convergence Divergence (MACD) indicator³, and the Relative Strength Index (RSI) indicator⁴. The robustness of these techniques has been debated⁵: in particular, the efficient market hypothesis states that if the market is efficient, then any profitable information from technical indicators would already be incorporated into prices, and thus it should be impossible to gain abnormal profits⁶. Nevertheless, it has been shown that several of these technical indicators are indeed statistically significant. Specifically, statistical tests of technical indicators have been conducted to examine their effectiveness and profitability in the stock markets of Taiwan, Thailand, the USA, and Brazil^7–11. These studies have shown that some techniques do have predictive power, providing evidence for market inefficiency, while other indicators are not successful. For further recent studies of technical indicators applied to stock market returns, the reader may also wish to consult e.g.^12–15.

After introducing the notion of trends and candlestick representations for time series data, we provide mathematical definitions for the various technical indicators in “Technical indicators”. To emphasize the novelty of our techniques, we use the World Health Organization (WHO) COVID-19 data throughout the paper and highlight the application of these technical analysis techniques outside the domain of the financial markets. By performing statistical tests on technical indicators in “Statistical methods”, we show that a selection of these indicators correctly predict reversals in existing trends at rates which are statistically significant. Our analysis improves on aspects of earlier analyses of stock market data and, moreover, extends the study of these indicators beyond stock market forecasting, as illustrated for COVID-19 in “Predicting new cases of COVID-19”. Our results have the following important implications:

Technical indicators have predictive power beyond forecasting future asset prices.
These tools can be used to identify the beginnings of subsequent waves of a pandemic.
They provide new methods to assess whether current health policies are impacting the growth in new cases.

For completeness, “Application in stock markets” gives an analysis using stock market data. “Concluding remarks” gives some concluding remarks.

Technical indicators

Technical analysis aims to forecast future price movements in the stock market, and it provides the key tools that we use to analyze COVID-19 data. In this section, we first introduce the notion of a trend (“Trends”) and candlestick representations of data (“Candlestick representations”), and then outline some of the main technical indicators:

Candlestick Patterns (“Candlestick patterns”)
RSI (“Relative strength index”)
MACD (“Moving average convergence divergence”)

In subsequent sections, we will investigate the predictive power of these technical indicators on COVID-19 data.

Trends

While stock prices and daily COVID cases may fluctuate on shorter time frames, they have an observed tendency to evolve in the same direction for extended periods of time. Long-term trends in the stock market may be linked to macroeconomic factors such as monetary policy or, in the case of individual companies, to particular news or sentiment which results in continual increases or decreases of the stock price. For COVID-19, the growth of new cases is driven by the fact that the virus is very infectious and the population was (and remains) highly vulnerable, which led to a significant initial uptrend. As a result, there is typically a well-defined notion of “trends” in time series such as these. A good approach to identifying trends in a time series is a simple regression procedure (detailed below), and Fig. 1 presents an example for daily asset prices.

Daily prices for the S&P 500 from February 5, 2020 to March 19, 2020, with our simple polynomial regression line (shown in blue) indicating the trend. Observe that the price follows the general trend for several units of time, so a regression line can provide useful insight.

A period for which the gradient of the regression line has a fixed sign indicates a trend in the time series, which can be either an uptrend or a downtrend. Identifying such trends can provide insight into the likely subsequent behavior of the time series. Technical analysis was originally developed to provide signals for the start and end of trends in stock market data, and here we apply these techniques to alternative time series data sets.

We introduce an integer-valued time variable, focusing in particular on daily and weekly intervals counted from some particular start date. A natural question to ask is whether, for a given date D, the time series exhibits a preexisting trend over the last $δ$ days. If such a trend exists, we say that the data exhibit a $δ$-interval trend. Additionally:

A trend is called bullish if the gradient of the trend is strictly positive.
Conversely, a trend is said to be bearish if its gradient is strictly negative.

In what follows we shall often partition the time series data into daily and weekly intervals. The value of the time series varies over the course of an interval and, following the conventions of stock market analyses, we track a number of characteristic features of each time interval, in particular:

The initial value for a given time interval is called the opening value and is denoted $O_{t}$.
The final value for a given time interval is called the closing value and is denoted $C_{t}$.

The subscript t is the index of the time interval. Since the data is discretized, typically $C_{t} \neq O_{t + 1}$ . It is also useful to define an average value for a given time interval

$$M_t := \frac{O_t + C_t}{2}.$$

To identify potential trends we apply a linear regression fitted to $M_{t}$ over the range of dates $D - δ \leq t \leq D$ , with residuals $γ_{t}$ of the regression defined by

$$γ_t := \left| l(t) - M_t \right|,$$

where $l(t)$ is the value of the linear regression at time $t$. We then define a trend function $T(\cdot)$ which takes the set $\{M_t\}$ as input and returns $+1$ for an uptrend, $-1$ for a downtrend, and $0$ otherwise, as follows

$$T(\cdot) := \begin{cases} +1, & k \geq 0.005\,μ \text{ and } γ_t < 0.02\,μ \text{ for all } t, \\ -1, & k \leq -0.005\,μ \text{ and } γ_t < 0.02\,μ \text{ for all } t, \\ 0, & \text{otherwise}, \end{cases}$$

where $k$ denotes the slope of the regression and $μ = \mathrm{mean}(\{M_t\})$. The requirement on $k$ corresponds to an increase or decrease of at least half a percent of the mean per time interval. The restriction on $γ_t$ assesses the goodness of fit of the linear regression, requiring that each $M_t$ lie within $2\%$ of the mean from the trend line. The function returns zero when neither condition is met, indicating that there is no clear trend.
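A minimal Python sketch of the trend function $T(\cdot)$ follows, assuming equally spaced integer times within the window and a positive mean $μ$ (as for prices and case counts). The name `trend` and its keyword arguments are illustrative and not taken from the authors' R code; the fit is ordinary least squares.

```python
from statistics import mean
from typing import Sequence


def trend(m: Sequence[float], slope_frac: float = 0.005, resid_frac: float = 0.02) -> int:
    """Trend function T: +1 (uptrend), -1 (downtrend), or 0 (no clear trend).

    `m` holds the mid-values M_t = (O_t + C_t)/2 for D - delta <= t <= D.
    """
    n = len(m)
    if n < 2:
        return 0
    ts = range(n)
    t_bar, mu = mean(ts), mean(m)
    # Ordinary least-squares fit l(t) = a + k*t.
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - mu) for t, y in zip(ts, m))
    k = sxy / sxx
    a = mu - k * t_bar
    # Every residual gamma_t = |l(t) - M_t| must stay below 2% of the mean.
    good_fit = all(abs(a + k * t - y) < resid_frac * mu for t, y in zip(ts, m))
    if good_fit and k >= slope_frac * mu:
        return 1
    if good_fit and k <= -slope_frac * mu:
        return -1
    return 0
```

A steadily rising window such as 100, 101, ..., 104 returns +1, while a flat or very noisy window returns 0.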

Candlestick representations

While price movements in the stock market can be represented as a continuous curve, smoothed by time averaging over some period (be it seconds, minutes, hours, or days), candlesticks were proposed as a tool to better visualize the movements. Candlesticks summarize prices in a given period using four numbers: open, close, high, and low. In addition to the opening $O_{t}$ and closing $C_{t}$ values defined above, we now introduce:

The highest value $H_{t}$ in a given interval is the high;
The lowest value $L_{t}$ in a given interval is the low.

Typical lengths of the periods that candlesticks describe are one day, one hour, 30 minutes, and 5 minutes. Specifically, given a time series over a certain period, a candlestick $I_{t}$ for the interval $t$ is defined by the quadruple

$$I_t = (O_t, C_t, H_t, L_t).$$

Taking the period to be a single day, $C_{t} - O_{t}$ is the change in value over the day: $O_{t} > C_{t}$ indicates a decrease in the value of the time series during the day, while $C_{t} > O_{t}$ implies an increase. Similarly, $C_{t + 6} - O_{t}$ is the change in value over the seven days beginning with day $t$.

Figure 2 shows how a single candlestick is constructed from the data in the intervening period. Following common practice, we color candlesticks red if $O_{t} > C_{t}$ and green if $C_{t} > O_{t}$, so the color indicates whether the value decreased or increased over the period of the candlestick.

Illustrations of the construction of candlesticks. A red candlestick represents a decrease in value during the intervening period; observe that the opening price is higher than the closing price. A green candlestick, conversely, indicates an increase in value. The proportions of each candlestick are set by the open, high, low, and close values over the period.

Each candlestick consists of three parts: the real body and its lower and upper shadows. The real body $r_{t}$ at time $t$ is the absolute difference between the opening and closing values

$$r_t = \left| O_t - C_t \right|.$$

This is represented as the central solid rectangle in the visualization of Fig. 2. The lower shadow $l_{t}$ and upper shadow $u_{t}$ at time $t$ are defined by

$$l_t = \min(O_t, C_t) - L_t,$$

$$u_t = H_t - \max(O_t, C_t).$$

These are represented as the thin lines extending above and below the real body in the visualization of Fig. 2. Note that in some cases a shadow may have vanishing extent; for instance, the lower shadow vanishes when $C_{t} = L_{t}$ with $C_{t} < O_{t}$.
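The candlestick quantities above can be sketched in Python as follows. `Candle`, `make_candle`, and the body and shadow helpers are hypothetical names introduced for illustration; `make_candle` assumes the values observed within one interval are supplied in time order.

```python
from typing import NamedTuple, Sequence


class Candle(NamedTuple):
    open: float   # O_t
    close: float  # C_t
    high: float   # H_t
    low: float    # L_t


def make_candle(values: Sequence[float]) -> Candle:
    """I_t = (O_t, C_t, H_t, L_t) from the values observed within one interval."""
    return Candle(values[0], values[-1], max(values), min(values))


def real_body(c: Candle) -> float:
    return abs(c.open - c.close)           # r_t = |O_t - C_t|


def lower_shadow(c: Candle) -> float:
    return min(c.open, c.close) - c.low    # l_t


def upper_shadow(c: Candle) -> float:
    return c.high - max(c.open, c.close)   # u_t
```

For instance, the interval values 10, 12, 8, 9 give a red candle with $r_t = 1$, $l_t = 1$, and $u_t = 2$.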

In the context of the stock market, asset traders often use these discrete candlesticks to visualize the data, as this representation provides substantially more information than simple line graphs of stock prices. Traders have developed a number of visual cues based on this candlestick representation, known as candlestick patterns, which are thought to forecast future asset price moves, as we discuss next.

Candlestick patterns

Candlestick patterns typically involve the relative magnitudes of the high, low, open, and close values of one or two consecutive candlesticks. These patterns are widely used within the trading community, in the belief that specific configurations of candlesticks can be used to forecast future price movements²:

If a pattern predicts an uptrend will reverse to a downtrend, it is called a bearish reversal pattern.
Conversely, a bullish reversal pattern predicts a reversal of a downwards trend to an uptrend.

In this work we focus our analysis on three bearish reversal patterns (Bearish Engulfing, Hanging Man, and Dark Cloud Over) and two bullish reversal patterns (Bullish Engulfing and Hammer). For the mathematical definitions of the candlestick patterns we use those proposed in ref.⁷, which impose restrictions on the $O_{t}, C_{t}, L_{t}, H_{t}$ values and require a pre-existing trend. These patterns are shown graphically in Figs. 3 and 4 and defined mathematically in Fig. 5.

Visual definitions of five candlestick patterns.

Examples of accurate forecasting via candlestick patterns. The x-axis provides an index of time, with each candle representing one time period, while the y-axis indicates the value of some positive-valued measurable quantity (traditionally, share price). Axis values have been omitted as they are unimportant for these illustrations. The blue lines indicate the 4-day trend lines established via linear regression, confirming either an appropriate uptrend or downtrend. The light-colored candle indicates the start of each candlestick pattern; observe that in all cases shown the pattern corresponds to a trend reversal.

Mathematical definitions of five candlestick patterns.

The indices appearing in the definitions of Fig. 5 denote the time ordering, such that the first candle of each pattern occurs at time stamp $D = 0$ and the second candle (if any) at $D = 1$. We require a trend for the preceding $δ$ intervals, such that there is an appropriate trend over the period $D - δ \leq t \leq D$ as outlined in “Trends”.

Using R, we implemented code which takes time series data, constructs candlestick representations, and then scans them for specific patterns. Some example candlestick patterns identified by our code when applied to daily S&P 500 Index (GSPC) data are presented in Fig. 4. We show the signal event which indicates the candlestick pattern in a lighter shade. The regression line fitted to the centers $M_t$ of the four candlesticks preceding each candlestick pattern is shown to confirm the trend (note that we vary the required trend period in later sections).
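As an illustration of such a scan, the sketch below detects engulfing patterns in Python. It uses common textbook conditions (a red candle followed by a green candle whose real body engulfs it, or the mirror image), which are an assumption here and not the ref.⁷ definitions given in Fig. 5; the function names, the `(O, C, H, L)` tuple layout, and the default $δ = 4$ are likewise illustrative. The trend is checked over $D - δ \leq t \leq D$, with the first candle of the pattern at $D$, using a compact copy of $T(\cdot)$.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Candle = Tuple[float, float, float, float]  # (O_t, C_t, H_t, L_t)


def trend(m: Sequence[float]) -> int:
    """Compact trend function T (0.5% slope and 2% residual thresholds)."""
    ts = range(len(m))
    t_bar, mu = mean(ts), mean(m)
    k = (sum((t - t_bar) * (y - mu) for t, y in zip(ts, m))
         / sum((t - t_bar) ** 2 for t in ts))
    a = mu - k * t_bar
    if any(abs(a + k * t - y) >= 0.02 * mu for t, y in zip(ts, m)):
        return 0
    return 1 if k >= 0.005 * mu else -1 if k <= -0.005 * mu else 0


def find_engulfing(candles: Sequence[Candle], delta: int = 4) -> List[Tuple[int, str]]:
    """Return (index of first candle, pattern name) for each engulfing pattern."""
    mids = [(o + c) / 2 for o, c, _, _ in candles]   # M_t
    signals = []
    for i in range(delta, len(candles) - 1):
        o0, c0, _, _ = candles[i]        # first candle (D = 0)
        o1, c1, _, _ = candles[i + 1]    # second candle (D = 1)
        prior = trend(mids[i - delta:i + 1])
        if prior == -1 and c0 < o0 and c1 > o1 and o1 < c0 and c1 > o0:
            signals.append((i, "bullish engulfing"))
        elif prior == 1 and c0 > o0 and c1 < o1 and o1 > c0 and c1 < o0:
            signals.append((i, "bearish engulfing"))
    return signals
```

Any other pattern could be scanned for in the same loop by swapping in its conditions on $O_t, C_t, H_t, L_t$.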

Moving average convergence divergence

The Moving Average Convergence Divergence (MACD)³ provides an alternative set of bullish/bearish signals which can be repurposed for general time series data. The MACD is calculated using two exponential moving averages (EMAs) computed over periods of differing length. Specifically, for a dataset of length $n$, usually the closing values $\{C_1, C_2, \dots, C_n\}$, the EMA $V_n$ is calculated recursively via

$$V_i[C_i] := \begin{cases} C_1, & i = 1, \\ s\,C_i + (1 - s)\,V_{i-1}, & i > 1, \end{cases}$$

where $s = \frac{2}{n + 1}$ is the smoothing factor. Thus $V_{n}$ can be seen as the exponential average over $n$ intervals which, by repeated substitution in the recursive formula, can be expressed as

$$V_n = s\left[C_n + (1 - s)\,C_{n-1} + \dots + (1 - s)^{n-2}\,C_2\right] + (1 - s)^{n-1}\,C_1.$$

Observe that the coefficients decrease exponentially for earlier values in the time series, giving greater weight to more recent data, hence the name. Given the EMA, the MACD is defined as the difference between a shorter-period average over $n_{1}$ intervals and a longer-period average over $n_{2}$ intervals (so that $n_{1} < n_{2}$), as follows

$$\mathrm{MACD}(n_1, n_2) = V_{n_1} - V_{n_2}.$$

A common choice is $(n_{1}, n_{2}) = (12, 26)$, corresponding to the number of trading days in roughly two weeks and a month, respectively, which leads to the following interpretation (Fig. 6):

When the MACD has large positive values, it indicates that the values have risen more in the recent $n_{1}$ observations when compared with the last $n_{2}$ observations, signifying a strong uptrend.
Conversely, when MACD is negative, the price has fallen more in the last $n_{1}$ observations, signifying a recent downtrend.

MACD analysis provides signals based on the “momentum” of the time series. To identify buy and sell signals, the MACD is compared to the so-called signal line $S$, defined by

$$S = V_{n_3}[\mathrm{MACD}(n_1, n_2)].$$

Examples of successful MACD signals, showing the number of new COVID cases (y-axis) as a function of time in weeks from the first case. Panel (a) shows the MACD crossing the signal line from below, followed by a bullish reversal. Panel (b) shows the converse bearish MACD signal.

A common value for $n_{3}$ is 9, corresponding to roughly a week and a half of trading.

There are many ways to use the signal line. In this paper we will focus on crossovers between the MACD and S, illustrated in Fig. 6, which are described below:

When MACD crosses from below to above the signal line, it serves as a bullish signal because the crossing signifies a strong uptrend in MACD, meaning the short-term momentum has risen faster than the long-term momentum.
Conversely, when MACD crosses from above to below the signal line, it serves as a bearish signal forecasting a downturn in values.
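A minimal Python sketch of the MACD crossover signals follows, using the EMA recursion with $V_1 = C_1$ and $s = 2/(n+1)$ and the common choices $(n_1, n_2, n_3) = (12, 26, 9)$. The function names are hypothetical. A crossover is recorded when the sign of MACD minus $S$ changes strictly between consecutive intervals; treating an exact tie (MACD equal to $S$) as no crossing is an assumption.

```python
from typing import List, Sequence, Tuple


def ema(values: Sequence[float], n: int) -> List[float]:
    """EMA via V_1 = C_1 and V_i = s*C_i + (1 - s)*V_{i-1}, with s = 2/(n + 1)."""
    s = 2 / (n + 1)
    out = [float(values[0])]
    for c in values[1:]:
        out.append(s * c + (1 - s) * out[-1])
    return out


def macd_crossovers(closes: Sequence[float], n1: int = 12, n2: int = 26,
                    n3: int = 9) -> List[Tuple[int, str]]:
    """Return (index, 'bullish' or 'bearish') for each MACD/signal-line crossover."""
    macd = [f - s for f, s in zip(ema(closes, n1), ema(closes, n2))]  # V_{n1} - V_{n2}
    signal = ema(macd, n3)                                              # S = V_{n3}[MACD]
    diff = [m - s for m, s in zip(macd, signal)]
    events = []
    for t in range(1, len(diff)):
        if diff[t - 1] < 0 < diff[t]:
            events.append((t, "bullish"))   # MACD crosses S from below
        elif diff[t - 1] > 0 > diff[t]:
            events.append((t, "bearish"))   # MACD crosses S from above
    return events
```

On a V-shaped series (a steady decline followed by a steady rise), this yields a single bullish crossover a few intervals after the turning point, reflecting the lag of the moving averages.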

Relative strength index

The Relative Strength Index (RSI) quantifies the momentum in time series data through the average rates of increase and decrease in value (see Fig. 7). The indicator is constructed by dividing the intervals over some period into two sets, according to whether the closing value $C_{t}$ rose or fell:

The set $\{G_t\}$ of intervals in which the series increased:
$$G_t := \frac{C_t - C_{t-1}}{C_t}, \quad C_t > C_{t-1}. \tag{12}$$
The set $\{D_t\}$ of intervals in which the series decreased:
$$D_t := \frac{C_{t-1} - C_t}{C_t}, \quad C_t < C_{t-1}. \tag{13}$$

Examples of successful RSI signals. (a) The RSI is low, predicting a reversal to an uptrend, as observed. (b) The RSI is high, signaling the start of a downtrend.

From these sets one can compute the averages $\bar{G}_t$ and $\bar{D}_t$ using the EMA $V_{n}$ over $n$ periods with smoothing factor $s = \frac{1}{n}$, leading to

$$\bar{G}_t = V_n[G_t], \qquad \bar{D}_t = V_n[D_t].$$

Then the indicator $\mathrm{RSI}_t$ at time $t$ is defined as follows

$$\mathrm{RSI}_t := 100 - \frac{100}{1 + \bar{G}_t / \bar{D}_t}.$$

In the stock market the RSI is used to signal when an asset has become overbought (meaning it has appreciated more rapidly than is thought to be sustainable) or oversold. In particular, a high RSI is thought to indicate that one should anticipate a reversal from an uptrend to a downtrend in the near term. A low RSI is interpreted by market traders as indicating that an asset is oversold, and predicts a near-term increase in prices. We set the thresholds for high and low $\mathrm{RSI}_t$ to 75 and 25, respectively. When the RSI reaches 25 this serves as a bullish signal and, conversely, when the RSI reaches 75 this gives a bearish signal. Figure 7 gives two examples of accurate RSI signals.
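The RSI construction can be sketched in Python as below. It assumes strictly positive closing values (the definitions of $G_t$ and $D_t$ divide by $C_t$) and treats an interval with no increase (decrease) as contributing $G_t = 0$ ($D_t = 0$) to the average, as in Wilder's original convention. The default period $n = 14$, returning 100 when $\bar D_t = 0$, and returning 50 when both averages vanish are assumptions, not values stated in this paper; the thresholds 25 and 75 follow the text.

```python
from typing import List, Sequence


def rsi(closes: Sequence[float], n: int = 14) -> List[float]:
    """RSI_t = 100 - 100 / (1 + Gbar_t / Dbar_t) for t = 1, ..., len(closes) - 1."""
    s = 1 / n  # smoothing factor s = 1/n
    g_bar = d_bar = None
    out = []
    for prev, cur in zip(closes, closes[1:]):
        g = (cur - prev) / cur if cur > prev else 0.0   # G_t, zero on non-increases
        d = (prev - cur) / cur if cur < prev else 0.0   # D_t, zero on non-decreases
        if g_bar is None:                               # EMA starts at the first value
            g_bar, d_bar = g, d
        else:
            g_bar = s * g + (1 - s) * g_bar
            d_bar = s * d + (1 - s) * d_bar
        if d_bar == 0:
            out.append(50.0 if g_bar == 0 else 100.0)   # assumed limiting values
        else:
            out.append(100 - 100 / (1 + g_bar / d_bar))
    return out


def rsi_signal(value: float, low: float = 25, high: float = 75) -> int:
    """+1 (bullish) when RSI <= low, -1 (bearish) when RSI >= high, else 0."""
    return 1 if value <= low else -1 if value >= high else 0
```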

Statistical methods

Following a standard hypothesis testing protocol, a given procedure tests a null hypothesis $H_{0}$ against an alternative hypothesis $H_{1}$. The testing framework then either rejects, or fails to reject, the null hypothesis. Specifically, to test whether or not these technical indicators can correctly predict trends, we formulate a testing procedure using the Wilcoxon signed-rank test.

Wilcoxon signed-rank test

The Wilcoxon test¹⁶ is a nonparametric test of the median of a distribution, as we outline below. Previous work by Goo et al.⁷ used the t test to confirm the predictive power of candlestick patterns. However, the t test is a parametric test, meaning that it assumes the observations are normally distributed: it studies the sample mean, which makes it reliable only for normal samples. Notably, when ref.⁷ was published (2007), stock market returns were commonly believed to be normally distributed; however, more recent studies have suggested that Laplace distributions better fit the daily returns of stock markets¹⁷. Given that the $n$-day return cannot be assumed to be normally distributed, a non-parametric test of the median, i.e. the 50th percentile of the distribution, is preferable.

Specifically, here we employ a one-sample Wilcoxon test for a hypothesized median. Suppose that at time $t$ one observes a signal from one of the technical indicators outlined in the previous section, i.e. a candlestick pattern, a high or low $\mathrm{RSI}_t$ reading, or a MACD crossover. We record the closing value at that time, $C_t$, together with the closing values for the $n$ intervals following the signal, $C_{t+1}, C_{t+2}, \dots, C_{t+n}$. From these we calculate the rate of return $R_i$ for $i$ intervals after the signal event as follows

$$R_i = \frac{C_{t+i} - C_t}{C_t}.$$

Thus we can also define a vector of rates of return following the signal event: $R = \{R_1, R_2, \dots, R_n\}$.
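Computing the return vector is straightforward; a short Python sketch, with the hypothetical helper name `returns_after_signal` and `t` the index of the signal interval, is:

```python
from typing import List, Sequence


def returns_after_signal(closes: Sequence[float], t: int, n: int) -> List[float]:
    """R_i = (C_{t+i} - C_t) / C_t for i = 1, ..., n.

    Raises IndexError if fewer than n closes follow the signal at index t.
    """
    base = closes[t]
    return [(closes[t + i] - base) / base for i in range(1, n + 1)]
```

For closes 100, 110, 90, 120 and a signal at the first interval, the three returns are 0.1, -0.1, and 0.2.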

To proceed, we take the $n$-day rate of return vector $R = \{R_1, R_2, \dots, R_n\}$ and denote by $\tilde{R}$ the median of the distribution from which its entries are drawn. We then define $\{d_1, d_2, \dots, d_n\}$ as the differences of each $R_t$ from the hypothesized median $\tilde{R}$ (under $H_0$, $\tilde{R} = 0$), such that

$$d_t = R_t - \tilde{R}.$$

The null hypothesis $H_{0}$ and the alternative hypothesis $H_{1}$ for the pooled sample of rates of return $R$ can be formulated as follows. Hypothesis $H_{1}$ holds that a bullish (bearish) signal event should be followed by a positive (negative) rate of return in the near-term future, i.e. $\tilde{R} > 0$ (conversely, $\tilde{R} < 0$). The null hypothesis $H_{0}$ holds that the rate of return is uncorrelated with the signal events, implying that the indicator under examination fails to provide accurate forecasts.

To implement the one-sample Wilcoxon test we rank the $d_t$ by absolute value, assigning to each $d_t$ an integer rank $R(d_t) \in \mathbb{Z}$: $R = 1$ for the $d_t$ with the smallest absolute value, $R = 2$ for the next smallest, and so forth, such that the $d_t$ with the largest absolute value has rank $R = n$. We then define $W_{1}$ and $W_{2}$ as follows

$$W_1 = \sum_{d_t > 0} R(d_t), \qquad W_2 = \sum_{d_t < 0} R(d_t).$$

The Wilcoxon test statistic $W^{'}$ is then defined to be

$$W' = W_1 - W_2.$$

Note that the statistic $W^{'}$ is the difference between the summed ranks of the observations above the hypothesized median ($W_{1}$) and the summed ranks of the observations below it ($W_{2}$). It provides a robust test of the median since, if the true median equals the hypothesized median and the distribution is symmetric about it, about half of the differences should be positive and about half negative, so that their rank sums approximately cancel.

The distribution of outcomes expected if $H_{0}$ is true, the null distribution, is centered on $\tilde{R} = 0$. In contrast, suppose that $H_{1}$ is true; then for a bullish reversal $\tilde{R} > 0$, and there should be fewer observations below 0 (the median under the null hypothesis). In this case the true distribution is shifted toward the new median $\tilde{R} > 0$ and, as a result, the test statistic $W^{'}$ tends to be larger.

If the true median is sufficiently different from zero, the observed statistic falls far in the tail of the null distribution, and we reject the claim that the median is 0. Formally, let $W$ be a random variable following the null distribution; for bullish reversals the null hypothesis $H_{0}$ is rejected if

$$P(W > W') = \int_{W'}^{\infty} p(t)\,dt \leq α, \tag{20}$$

where $p(t)$ is the null (Wilcoxon) distribution of the test statistic and $α$ is a constant threshold, the significance level, commonly taken to be 0.05 or 0.10. Conversely, for bearish reversals $H_{0}$ is rejected if $P(W < W^{'}) \leq α$.
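The one-sample test can be sketched in Python using only the standard library. This is an illustrative implementation, not the authors' R code: it takes differences from a hypothesized median `m0` (0 under $H_0$), drops zero differences, assumes no ties among the $|d_t|$, and evaluates the exact null distribution by counting the $2^n$ equally likely sign assignments of the ranks. Because that distribution is discrete, the p value is computed as $P(W \geq W')$, or $P(W \leq W')$ for bearish signals. The function name and the `alternative` argument are hypothetical.

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank(r: Sequence[float], m0: float = 0.0,
                         alternative: str = "greater") -> Tuple[int, float]:
    """Return (W', p) for the one-sample signed-rank test of H0: median = m0."""
    d = [x - m0 for x in r if x != m0]                # d_t, zero differences dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank = [0] * n
    for k, i in enumerate(order, start=1):
        rank[i] = k                                    # rank 1 = smallest |d_t|
    w1 = sum(rank[i] for i in range(n) if d[i] > 0)
    w2 = sum(rank[i] for i in range(n) if d[i] < 0)
    total = n * (n + 1) // 2                           # W1 + W2 = 1 + 2 + ... + n
    # counts[w] = number of sign assignments under H0 with W1 = w (subset sums of 1..n).
    counts = [1] + [0] * total
    for k in range(1, n + 1):
        for w in range(total, k - 1, -1):
            counts[w] += counts[w - k]
    # W' = 2*W1 - total, so tail probabilities of W' are tail probabilities of W1.
    if alternative == "greater":                       # bullish: H1 median > m0
        p = sum(counts[w1:]) / 2 ** n
    else:                                              # bearish: H1 median < m0
        p = sum(counts[:w1 + 1]) / 2 ** n
    return w1 - w2, p
```

Five positive returns, for example, give $W' = 15$ and an exact one-sided p value of $1/32$.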

Calculating p values

Given the above, we can compute the p value from Eq. (20) to quantify the statistical significance of positive correlations. The p value is the probability of obtaining the observed statistic (or a more extreme one) if the null hypothesis is true. A low p value, below the significance level $α$, indicates that the observed outcome would be unlikely if the null hypothesis were true. Typical values for $α$ are 0.05 and 0.10; we adopt the latter going forward. For an observed p value $p \leq α = 0.1$, we reject the null hypothesis in favor of the alternative hypothesis. In our testing procedure, a rejection of the null hypothesis indicates that a given technical indicator has predictive power.

We wish to quantify whether a signal event identified by one of the technical indicators discussed above makes predictions which are statistically significant. Specifically, the signal events under consideration are

Occurrences of candlestick patterns;
MACD crossover events;
RSI values of 25 or 75.

Our R code scans the data for such signal events and records each occurrence along with the $O_{t}, C_{t}, H_{t}, L_{t}$ values for a range of days or weeks around it. We use the $δ$ time intervals immediately preceding the signal to establish the initial trend; the value of $δ$ is stated with each analysis. We then use the $Δ t$ time intervals after the signal event to assess whether the signal correctly predicted the future evolution of the time series.

As an initial approach to quantifying the statistical significance of the predictions, we took the set of signals and calculated the p value multiple times for different choices of $Δ t$. This analysis is intuitive and insightful (we present the results in the Supplementary Material); however, it has a number of critical issues:

It gives multiple p values for each indicator (which can be conflicting).
The p values for each $Δ t$ are not independent.
For 9 indicators and 10 choices of $Δ t$ one calculates 90 p values, so false positives are to be expected: at $α = 0.1$, about nine would arise by chance even if no indicator had predictive power.

Hence, below we outline a more sophisticated analysis. We start with the record of all occurrences of signals for a given indicator identified by our code. To calculate a global p value for that indicator, we randomly subdivide the occurrences into $n$ subsets. Given that each signal typically occurs $O(100)$ times in the data series (cf. Table 1), we use $n = 3$ subsets. We then calculate the p value for each subset, using a different $Δ t$ for each subset.

Table 1.

Number of observations of each technical analysis signal in 17,140 pooled weekly candlesticks of COVID-19 data for 237 countries, from January 3, 2020 to July 29, 2021 (data obtained from WHO¹⁹).

Signal name	Number of observations
Bullish engulfing	99
Bearish engulfing	123
Hammer	127
Hanging man	156
Dark cloud over	30
Bullish MACD	217
Bearish MACD	245
Bullish RSI	46
Bearish RSI	1057


Notably, these subsets will be independent of each other, allowing us to calculate independent p values for each subset. Since these p values are independent we can then use the standard Fisher method¹⁸ to combine the three p values $p_{i}$ (with $i = 1, 2, 3$ ) into the following statistic

$$X = -2\ln p_1 - 2\ln p_2 - 2\ln p_3.$$

Under the null hypothesis, the statistic $X$ has a chi-squared distribution with 6 degrees of freedom (more generally, $2n$ for $n$ subdivisions of the dataset). To obtain the global p value for each indicator, we then calculate the area under the chi-squared curve (with 6 degrees of freedom) to the right of the value of $X$. Following this procedure, we report our findings for new COVID infections in Table 2 and for the stock market in Table 4.
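Fisher's combination can be sketched in Python as follows. For an even number of degrees of freedom $2k$, the chi-squared upper tail has the closed form $P(X > x) = e^{-x/2}\sum_{j=0}^{k-1} (x/2)^j / j!$, so no statistics library is needed. The helper names, and the round-robin split of a shuffled list as one way of performing the random subdivision, are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Global p value from k independent p values (each > 0) via Fisher's method.

    X = -2 * sum(ln p_i) is chi-squared with 2k degrees of freedom under H0.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)        # x / 2
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j) for j in range(k))


def random_partition(items: Sequence, n_subsets: int = 3, seed: int = 0) -> List[list]:
    """Randomly split signal occurrences into disjoint subsets of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]
```

For example, three subset p values of 0.1 each combine to a global p value of about 0.032.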

Table 2.

Statistical significance of each technical indicator in terms of its global p value (combining the results for multiple values of $Δ t$).

Signal name	p value	Significance
Bullish engulfing	$0.0005$	3.2 $σ$
Bearish engulfing	0.63	–
Hammer	$0.014$	2.2 $σ$
Hanging Man	0.63	–
Bullish MACD	$0.0071$	2.5 $σ$
Bearish MACD	$6.8 \times 10^{- 9}$	5.7 $σ$
Bearish RSI	0.63	–


Statistically significant p values are those below 0.1; for these indicators we also give the significance in terms of $σ$ (equivalent to a one-sided Z-score).

Table 4.

Global statistical significance of each technical indicator for stock market data.

Signal name	p value (daily)	p value (weekly)
Bullish engulfing	0.092	0.41
Bearish engulfing	0.94	0.97
Hanging man	–	0.99
Dark cloud over	0.26	–
Bullish MACD	$1.4 \times 10^{- 9}$	0.058
Bearish MACD	0.0015	0.19
Bullish RSI	0.0015	–
Bearish RSI	0.1	0.997


We calculate the p values for both weekly and daily partitions of the data. Statistically significant p values are those below 0.1.

Predicting new cases of COVID-19

With the technical indicators defined in “Technical indicators” and the statistical analysis of “Statistical methods”, we are now prepared to examine whether technical analysis can provide statistically significant predictions when adapted to study the near-term changes in the number of new COVID-19 daily cases. Following this we will outline and evaluate two specific use cases for these indicators, namely, identifying the peak of a wave of infections, and the commencement of subsequent pandemic waves.

Statistical significance

To investigate whether these technical indicators could be of use during a pandemic, we analysed World Health Organisation (WHO) COVID-19 data. Specifically, we used data on the daily reported cases for 237 countries from 3 January 2020 to 29 July 2021¹⁹. For each country, the starting date of the pandemic was defined as the date on which the first case was identified.

We then grouped the observations of new cases into weekly candlesticks $\{P_{c}\}$ by setting the open value of each candle to the number of new cases on the first day of each 7-day period and the close value to the number of new cases on the last day of that period. The high and low of each candle (the ends of its shadows) were defined by the highest and lowest numbers of new daily cases during the corresponding 7-day period. The data were organized into 17,140 candlesticks, about 70 for each country.
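A minimal Python sketch of the weekly aggregation described above, assuming the daily series for one country starts on the day of its first reported case and that an incomplete final week is discarded (the paper does not say how a partial week is handled). The `Candle` type, the `weekly_candles` function and its `period` parameter are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candle:
    open: float   # new cases on the first day of the 7-day period
    high: float   # largest daily count in the period
    low: float    # smallest daily count in the period
    close: float  # new cases on the last day of the period


def weekly_candles(daily_cases: Sequence[float], period: int = 7) -> List[Candle]:
    """Group a daily series of new cases into consecutive weekly candles.

    Windows start at index 0 (the first reported case); a trailing partial
    week is dropped, which is an assumption not stated in the paper.
    """
    candles = []
    for start in range(0, len(daily_cases) - period + 1, period):
        week = daily_cases[start:start + period]
        candles.append(Candle(open=week[0], high=max(week), low=min(week), close=week[-1]))
    return candles


# Synthetic 15-day series (not real data): two full weeks plus one extra day
daily = [1, 3, 2, 5, 4, 6, 8, 9, 7, 12, 10, 11, 15, 14, 20]
for candle in weekly_candles(daily):
    print(candle)
```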

Our code identified occurrences of the signals for each technical indicator under consideration across all countries, and we present the number of observations for each signal in Table 1. To determine the pre-existing trend needed to identify candlestick patterns, we use the two preceding weekly candlesticks. Following “Statistical methods”, after identifying the occurrences of each indicator in the COVID dataset for each country, we pooled the occurrences together for the statistical analysis. We dropped any indicator with fewer than 50 occurrences from our analysis.
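The pattern search and occurrence filter can be sketched as follows for the Bullish Engulfing pattern. The body-engulfing test follows the usual textbook definition, but the trend rule (the later of the two candles preceding the pattern closing lower than the earlier one) is a hypothetical stand-in for the criterion defined in “Technical indicators”; the helper names are likewise illustrative.

```python
from typing import Dict, List, NamedTuple


class Candle(NamedTuple):
    open: float
    high: float
    low: float
    close: float


MIN_OCCURRENCES = 50  # indicators seen fewer times than this are dropped


def is_downtrend(earlier: Candle, later: Candle) -> bool:
    # Hypothetical trend rule: the later candle closes below the earlier one.
    return later.close < earlier.close


def bullish_engulfing_indices(candles: List[Candle]) -> List[int]:
    """Indices i where candles[i-1], candles[i] form a Bullish Engulfing
    pattern after a downtrend measured on candles[i-3], candles[i-2]."""
    hits = []
    for i in range(3, len(candles)):
        prev, curr = candles[i - 1], candles[i]
        engulfs = (
            prev.close < prev.open          # first candle is bearish
            and curr.close > curr.open      # second candle is bullish
            and curr.open <= prev.close     # its body covers the previous body
            and curr.close >= prev.open
        )
        if engulfs and is_downtrend(candles[i - 3], candles[i - 2]):
            hits.append(i)
    return hits


def keep_frequent(counts: Dict[str, int]) -> Dict[str, int]:
    """Drop indicators with fewer than MIN_OCCURRENCES pooled occurrences."""
    return {name: n for name, n in counts.items() if n >= MIN_OCCURRENCES}
```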

Since we only have daily (not intraday) data, a candlestick cannot be formed for a single day, so we construct weekly candlesticks. For all of the indicators we therefore calculated the p value on a weekly time scale, taking $Δ t$ to be an $O(1)$ number of weeks following a signal observation. We subdivided our data as described in “Statistical methods” and calculated the global p value for each indicator by averaging over three choices of $Δ t$, specifically $Δ t = 3, 5, 7$ weeks; the combination of the individual p values is described in “Statistical methods”. We present the resulting global p values in Table 2. As a preliminary analysis, we also calculated the p value while varying $Δ t$ to gauge its impact; this analysis is presented in the Supplementary Material (as noted in “Statistical methods”, this approach can be insightful but has some technical drawbacks).
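To illustrate how the samples for the different horizons might be assembled, the sketch below collects, for each signal, the relative change in the weekly close $Δ t = 3, 5, 7$ weeks later. Using the relative change of the close as the tested quantity is an assumption; the quantity actually tested is specified in “Statistical methods”. Each list would then be passed to a signed-rank test and the resulting p values combined. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

HORIZONS_WEEKS = (3, 5, 7)  # the Delta t values used for the COVID analysis


def forward_changes(closes: Sequence[float], signal_idx: Sequence[int],
                    dt: int) -> List[float]:
    """Relative change in the weekly close dt candles after each signal.

    Signals too close to the end of the series to have a value dt weeks
    later are skipped, as are weeks with zero cases (undefined ratio).
    """
    out = []
    for i in signal_idx:
        j = i + dt
        if j < len(closes) and closes[i] != 0:
            out.append((closes[j] - closes[i]) / closes[i])
    return out


def changes_by_horizon(closes: Sequence[float],
                       signal_idx: Sequence[int]) -> Dict[int, List[float]]:
    """One sample of post-signal changes per horizon in HORIZONS_WEEKS."""
    return {dt: forward_changes(closes, signal_idx, dt) for dt in HORIZONS_WEEKS}


# Synthetic weekly closes and signal positions (not real data)
print(changes_by_horizon([10, 20, 30, 40, 50, 60, 70, 80, 90], [0, 4]))
```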

From Table 2 it can be seen that some indicators are statistically significant predictors of future COVID cases, while others are not. Specifically, the Bullish Engulfing and (bullish) Hammer candlestick patterns, as well as both MACD indicators, are statistically significant. Notably, the Bearish MACD signal has a very small p value ($6.8 \times 10^{-9}$, or $5.7σ$), making it a highly significant predictor. The Dark Cloud Cover and Bullish RSI indicators had only $O(10)$ occurrences; since such a small sample could lead to erroneous conclusions, they were omitted from our analysis. We also highlight that these global p values are corroborated by the cruder, although perhaps more intuitive, local p value analysis in the Supplementary Material.

Having concluded that a subset of technical indicators can indeed provide insights into the near-term progression of the pandemic, we next explore how these indicators might be applied during an ongoing pandemic. We focus on two particularly important use cases:

Identifying the peak of a pandemic wave.
Forecasting the start of a new wave of the pandemic.

Peaks of pandemic waves

In the early stages of a pandemic (such as the current COVID-19 crisis), new daily cases grow steadily from week to week, perhaps with some small daily fluctuations. At this stage the number of COVID cases exhibits a clear uptrend, and governments and health officials put in place policies and funding to reduce the spread of infections. A major milestone in controlling the pandemic is identifying a peak in daily cases. While peaks are simple to identify in retrospect, at the height of a pandemic it is far from obvious whether a decline in cases is a fluctuation or a local top. Each of the signals we have considered implies a reversal in the trend; thus, if COVID cases are growing and one observes a bearish signal in the weekly data, this is a prediction that cases will begin falling over the next few weeks.

The peak in the number of cases therefore corresponds to a change in the trend of the pandemic, which is precisely what the indicators we have been studying are designed to detect. We now apply the Bearish MACD indicator in an effort to identify the peaks of the COVID-19 pandemic in a number of case studies; notably, the Bearish MACD was the only bearish reversal indicator that we found to be statistically significant in Table 2. In other words, we propose that the MACD line crossing below the signal line indicates a peak in infections. More specifically, the first such crossing indicates the end of the first wave, and one expects such crossing events to coincide with each of the peaks of the pandemic.
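For concreteness, the MACD line, its signal line and the crossing events can be computed as in the sketch below. It assumes the conventional spans of 12 and 26 periods for the fast and slow exponential moving averages and 9 for the signal line, applied here to weekly closes; the spans used in the paper are defined in “Technical indicators” and may differ. Seeding each moving average with the first value of the series is also an assumption, so a warm-up period (the hypothetical `start` parameter) is skipped before crossings are reported.

```python
from typing import List, Sequence, Tuple


def ema(values: Sequence[float], span: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (span + 1),
    seeded with the first value of the series."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: Sequence[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float]]:
    """Return (MACD line, signal line); 12/26/9 are assumed defaults."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return macd_line, ema(macd_line, signal)


def crossovers(macd_line: Sequence[float], signal_line: Sequence[float],
               start: int = 1) -> List[Tuple[int, str]]:
    """Indices where the MACD line crosses the signal line: 'bullish' when
    crossing from below, 'bearish' when crossing from above."""
    events = []
    for i in range(max(start, 1), len(macd_line)):
        prev = macd_line[i - 1] - signal_line[i - 1]
        curr = macd_line[i] - signal_line[i]
        if prev <= 0 < curr:
            events.append((i, "bullish"))
        elif prev >= 0 > curr:
            events.append((i, "bearish"))
    return events


# Synthetic weekly closes: a steady rise followed by a steady decline (not real data)
closes = list(range(1, 41)) + list(range(40, 0, -1))
line, sig = macd(closes)
print(crossovers(line, sig, start=26))
```

A bearish event (the MACD line crossing the signal line from above) is the proposed peak marker; a bullish event marks a possible onset of a new wave.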

In Fig. 8 we present COVID-19 daily cases for Japan, Korea, and the UK in the form of weekly candlestick charts, along with the corresponding plots of the MACD indicator. Observe that a change from uptrend to downtrend does indeed coincide with the MACD line crossing the signal line from above. Moreover, one can interpret the convergence of the MACD and signal lines as an indication that cases are nearing a peak, which suggests that current health policies are likely to be effective in reducing the rate of infections.

Figure 8.

Candlestick plots of new COVID cases in Japan (first case: 2020/01/14), Korea (first case: 2020/01/19) and the UK (first case: 2020/02/01). Below each plot is the associated MACD chart. A bullish MACD crossover occurs when the (red) MACD curve crosses the (green) signal curve from below; this predicts a rise in cases in the near term. These examples support the view that MACD analysis can provide useful information about the evolution of the pandemic.

Additional waves of the pandemic

As is evident from the COVID-19 crisis, pandemics can exhibit multiple waves of infections. A second wave refers to the case in which, after an initial peak in infections and a period of declining new cases, the trend reverses and daily cases grow once again.

Inspection of Fig. 8 shows that the transition from declining to increasing new cases can be discerned through the Bullish MACD signal. This signal is a statistically significant predictor of new cases (Table 2), and this is supported by the case studies of Fig. 8, in which the crossing events appear to coincide with the commencement of a second wave. The observation of a Bullish Engulfing or Hammer pattern in the candlesticks, both also significant in Table 2, would likewise be indicative of subsequent waves.
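A Hammer can be flagged with a simple shape test, sketched below. The thresholds (a lower shadow at least twice the real body and an upper shadow no longer than 10% of the candle's range) and the trend rule (a lower close on the second of the two preceding candles) are illustrative assumptions rather than the definitions used in the paper; the function names are hypothetical.

```python
from typing import List, NamedTuple


class Candle(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def is_hammer(c: Candle, shadow_ratio: float = 2.0, max_upper_frac: float = 0.1) -> bool:
    """Hammer shape: a small real body near the top of the candle with a
    long lower shadow. Thresholds are illustrative assumptions."""
    body = abs(c.close - c.open)
    full_range = c.high - c.low
    if body == 0 or full_range <= 0:
        return False
    lower_shadow = min(c.open, c.close) - c.low
    upper_shadow = c.high - max(c.open, c.close)
    return (lower_shadow >= shadow_ratio * body
            and upper_shadow <= max_upper_frac * full_range)


def bullish_hammer_at(candles: List[Candle], i: int) -> bool:
    """Hammer at index i following a decline across the two preceding
    candles (hypothetical trend rule)."""
    return (i >= 2
            and candles[i - 1].close < candles[i - 2].close
            and is_hammer(candles[i]))
```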

These tools have significant value for predicting the broad strokes of the future course of the pandemic and can be used to identify when a relaxation of health restrictions (such as ending social distancing or mask mandates) is leading to a new wave of infections. The MACD analysis for Japan (Fig. 8, left) is a particularly good example of how this indicator gives clear signals of subsequent peaks. The observation of a bullish signal should be taken as an indication that health restrictions may need to be re-established in order to regain a downtrend in new cases.

Since Fig. 8 includes data up to November 2021, it provides a COVID-19 forecast for Winter 2021. For instance, it suggests that a new wave of infections is not imminent in Japan, whereas, since the Korean MACD line is close to crossing its signal line, one anticipates possible growth in new cases in early 2022. In the UK plot the MACD and signal lines are close together, so the near-term outlook is ambiguous; this should nevertheless be taken as an indication that health restrictions should be strengthened to mitigate the risk of increasing infections.

Application in stock markets

Finally, we also apply our statistical methods to a pool of stock market data. While other such studies have been undertaken, we believe that our use of the Wilcoxon signed-rank test, together with carefully averaging over multiple time periods when calculating the p value, makes our analysis more robust. Following the methodology of “Statistical methods”, we applied our code to a pool of stock market data based on 28 stocks and indices, including companies such as Google and Amazon and indices such as the S&P 500 (see the Supplementary Material and the Data Availability statement for full details). All data were sampled as daily and weekly candlesticks obtained from Yahoo Finance²⁰. Table 3 summarizes the number of observations of the aforementioned signals in the compiled dataset. We dropped any indicator with fewer than 50 occurrences from our analysis.

Table 3.

Summary of the number of observations of technical analysis signals in the pooled sample of size 120,000 drawn from 28 stocks and indices over 10 years, obtained from Yahoo Finance²⁰.

Signal name	Occurrences (daily)	Occurrences (weekly)
Bullish engulfing	290	67
Bearish engulfing	451	104
Hammer	21	31
Hanging man	45	129
Dark cloud cover	82	0
Bullish MACD	4587	660
Bearish MACD	4584	660
Bullish RSI	1133	0
Bearish RSI	3588	251


Table 4 gives the p value of each signal using the one-sample Wilcoxon signed-rank test. While $Δ t$ was measured in weeks for the COVID study, we have much more data for stock prices, so we undertake our analysis on both weekly and daily time scales. We take p values less than 0.10 to be statistically significant.
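A one-sample Wilcoxon signed-rank test¹⁶ can be sketched with the standard library using the large-sample normal approximation. The sketch tests whether the median of the post-signal changes is positive, which suits bullish signals; for a bearish signal one would negate the changes first. The normal approximation, the handling of zeros and ties, and the function name are assumptions made for brevity; SciPy's `scipy.stats.wilcoxon` with `alternative='greater'` provides exact and tie-corrected variants.

```python
from statistics import NormalDist
from typing import Sequence


def wilcoxon_one_sided(diffs: Sequence[float]) -> float:
    """One-sample Wilcoxon signed-rank test of H0: median = 0 against
    H1: median > 0, using the large-sample normal approximation.

    Zero differences are discarded and tied |d| values receive average
    ranks (no tie correction to the variance, for brevity).
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("need at least one non-zero difference")
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute values
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4
    sd = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean) / sd
    return 1 - NormalDist().cdf(z)


# Synthetic post-signal changes (not real data)
print(wilcoxon_one_sided([0.3, 0.1, -0.05, 0.2, 0.4, 0.15, -0.1, 0.25]))
```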

Our results indicate that, for financial data on the daily timeframe, the MACD and RSI signals, as well as the Bullish Engulfing pattern, are statistically significant, whereas on the weekly timeframe only the Bullish MACD signal is significant. We note that our findings disagree with those of Goo et al.⁷, who used a different methodology for their analysis.

Concluding remarks

The world has struggled with the COVID-19 pandemic for the past two years, and increasing attention has been given to forecasting infectious diseases. This paper has shown that technical analysis used in asset trading can be repurposed to forecast changes in the number of new COVID-19 cases, and potentially in future pandemics. Here we analysed WHO data on daily new COVID-19 cases for all countries and identified a number of technical indicators that make statistically significant predictions.

Since financial data and COVID data arise from very different systems, it is notable that technical analysis can provide predictions in both settings. We conjecture that these indicators work across both systems because each can be modeled as a non-stationary random walk, and it is conceivable that the indicators identify underlying trends in such time series. Moreover, some groups have expressed doubts about whether technical analysis has any intrinsic predictive power; our results for the pandemic provide some evidence that technical indicators are genuinely predictive.

This work presents new tools for evaluating the effectiveness of policies and practices employed to reduce the impact of the current and future pandemics. This was demonstrated in “Peaks of pandemic waves” and “Additional waves of the pandemic”, where we saw that observations of the weekly MACD indicator can identify both the peak of each wave of the pandemic and the onset of subsequent waves of infections. Importantly, reliable short-term forecasting can provide potentially lifesaving insights into logistical planning, in particular into when and where to allocate additional resources such as hospital staff and equipment.

Acknowledgements

This work was completed as part of the MIT PRIMES program. We are grateful to Laura Schaposnik for her thoughtful insights and help, and to Kent Vashaw for their comments on a draft.

Author contributions

Y.L. and J.U. together conceived the project and completed the analysis and interpretation, contributing equally.

Data availability

The datasets analysed in this study are available in the Harvard Dataverse repository at doi:10.7910/DVN/DYGZBQ.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Malkiel B. A Random Walk Down Wall Street. W.W. Norton & Company; 2019.
2. Bulkowski T. Encyclopedia of Candlestick Charts. Wiley; 2013.
3. Appel G. Technical Analysis Power Tools for Active Investors. Financial Times. Prentice Hall; 2005. p. 166.
4. Wilder J. New Concepts in Technical Trading Systems. Hunter Pub; 1978.
5. Irwin S, Park C-H. What do we know about the profitability of technical analysis? J. Econ. Surv. 2007;20:20.
6. Fama E. Efficient capital markets: A review of theory and empirical work. J. Financ. 1970;25(2):383–417. doi: 10.2307/2325486.
7. Goo Y, Chen D, Chang Y. The application of Japanese trading strategies in Taiwan. Invest. Manage. Financ. Innov. 2007;4:4.
8. Tharavanij P, Siraprapasiri V, Rajchamaha K. Profitability of candlestick charting patterns in the stock exchange of Thailand. SAGE Open. 2017;7(4):215824401773679. doi: 10.1177/2158244017736799.
9. Prado H, Ferneda E, Morais L, Luiz A, Matsura E. On the effectiveness of candlestick chart analysis for the Brazilian Stock Market. Proced. Comput. Sci. 2013;22:1136–1145. doi: 10.1016/j.procs.2013.09.200.
10. Caginalp G, Laurent H. The predictive power of price patterns. Appl. Math. Financ. 1998;5:181. doi: 10.1080/135048698334637.
11. Anghel G. Stock market efficiency and the MACD. Evidence from countries around the world. Proced. Econ. Financ. 2015;32:1414–1431. doi: 10.1016/S2212-5671(15)01518-X.
12. Neely CJ, Rapach DE, Tu J, Zhou G. Forecasting the equity risk premium: The role of technical indicators. Manage. Sci. 2014;60(7):1772–1791. doi: 10.1287/mnsc.2013.1838.
13. Dai Z, Zhu H, Kang J. New technical indicators and stock returns predictability. Int. Rev. Econ. Financ. 2021;71:127–142. doi: 10.1016/j.iref.2020.09.006.
14. Dai ZF, Li T, Yang M. Forecasting stock return volatility: The role of shrinkage approaches in a data-rich environment. J. Forecast. 2022;20:1–17.
15. Dai ZF, Zhu HY. Time-varying spillover effects and investment strategies between WTI crude oil, Natural Gas and Chinese stock markets related to Belt and Road initiative. Energy Econ. 2022;107:105883. doi: 10.1016/j.eneco.2022.105883.
16. Wilcoxon F. Individual comparisons by ranking methods. Biometrics Bull. 1945;1(6):80. doi: 10.2307/3001968.
17. Toth D, Jones B. Against the Norm: Modeling Daily Stock Returns with the Laplace Distribution. arXiv:1906.10325 (preprint); 2021.
18. Fisher RA. Questions and answers #14. Am. Stat. 1948;2(5):30–31.
19. Data.humdata.org. Coronavirus (COVID-19) Cases and Deaths. https://data.humdata.org/dataset/coronavirus-covid-19-cases-and-deaths. Accessed 04 Sep 2021.
20. Yahoo Finance. https://finance.yahoo.com/.
