Abstract
We document significant persistence in the market timing performance of active individual investors, suggesting that some investors are skilled at timing. Using data on all trades by active Finnish individual investors over almost 15 years, we also show that the net purchases of skilled versus unskilled investors predict monthly market returns. Our results lend credibility to the view that market returns are predictable, without having to specify which variables active investors use to successfully time the market. (JEL G10, G11, G12, G14, G15).
Studies examining whether market returns are predictable report mixed results.1 Most of these studies use market variables, like dividend yield or volatility, to predict annual or longer-horizon index returns. We examine monthly return predictability using simple functions of the net order flow of active individual investors. Using the difference of net purchases of previously successful and previously unsuccessful market timers as a predictor, we find statistically and economically significant predictability in market returns.
Using the net purchases of individual investors to predict returns has several advantages over using market variables or the trades of institutional investors. First, skillful market timers may use complex combinations of many different variables to form their return forecasts. Observing their net purchases does not require the researcher to know the source of their information. Second, documenting that some individual investors are persistently good market timers provides evidence that the trading strategies required to successfully time the market are feasible given the information that individual investors have. Third, unlike institutional investors, individual investors are essentially unconstrained in their asset allocations, making the net order flows of skilled timers a potentially strong market predictor variable.2
Our approach is to measure timing performance in the first and second halves of our 174-month data set on Finnish individual investors. We then test whether timing performance has persistence, or whether performance in the first half of the sample predicts performance in the second half. Finally, we use the net purchases of successful timers in the first half of the sample minus those of unsuccessful timers to predict market returns in the second half of the sample. Our market timing measure is the correlation of two variables: the net cash flow into (and out of) stocks of individual i in month t, and the market index return in month t + 1. We consider active investors with flows that positively predict future market returns to be successful market timers. Active investors with flows that are unrelated or inversely related to future market returns are considered poor timers.3
We find both economically and statistically significant persistence in market timing performance. For example, successful timers in the first half of the sample period are over 40% more likely than unsuccessful timers to be in the top quintile of timers in the second half. The persistence we document is displayed by both successful timers and unsuccessful timers. Unsuccessful timers in the first half of the sample are quite likely to be unsuccessful in the second half. We design tests inspired by Fama and French (2010) to separate timing luck from skill and find the documented persistence is unlikely to be due to luck.
We report the results of monthly market return predictability regressions which test if information in investor flows can predict future market returns. We sort investors into quintiles based on their first period performance and examine return predictability in the second half of our sample. We find that the difference in group flows between the top and bottom 20% of investors can significantly predict future market returns. The success of the flow-based measure stands in stark contrast to the prediction power of other variables that are commonly used to forecast returns. Only past market returns can significantly predict future market returns during this time period, while two valuation ratios we explore are negatively and insignificantly related to future market returns.4 The magnitude and significance of the flow-based measure’s coefficient is unaffected by controlling for the dividend yield, earnings-to-price ratio, and past market returns. These results suggest that investor flows have more information about future market returns than commonly used economic variables.
We further assess the economic significance of the observed market timing persistence by sorting investors into quintiles based on their first-half timing measure and examining performance across quintiles in the second half of our sample period. First, we examine the average return of an investment strategy that invests the group flow measure in each month t and earns the excess market return over the month t + 1. The strategy based on the flow difference between the top 20% and bottom 20% earns an average annualized excess return of 2.19% compared to an average excess market return of 1.02% during this period. To adjust for total risk, we calculate the strategy’s Sharpe ratio by dividing its average excess return by its standard deviation. The flow-based strategy has an annualized Sharpe ratio of 0.88 compared to 0.04 for the market. In the Internet Appendix, we also examine the ability of investor flows to predict bear markets, which are defined as a return at least half of one standard deviation below the sample average. The unconditional probability of a bear market is 25.3% in the second half of our sample. The conditional probability of a bear market increases to 41.2% when the flow differential between good and bad timers (sorted based on first-half performance) is at least 0.5 standard deviation below the mean. These results are not necessarily surprising given the ability of group flows to forecast future returns. The persistent dispersion in market timing ability appears to be economically significant.
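The strategy return and Sharpe ratio described above can be sketched as follows. This is a minimal illustration rather than the code behind the reported numbers: the function names and toy data are hypothetical, and annualizing by the square root of 12 assumes independent monthly returns.

```python
import math
import statistics

def strategy_returns(flow_signal, excess_market_returns):
    """Invest flow_signal[t] in month t and earn the excess market
    return over month t + 1."""
    # Pair the signal in month t with the return in month t + 1.
    return [f * r for f, r in zip(flow_signal[:-1], excess_market_returns[1:])]

def annualized_sharpe(monthly_excess_returns):
    """Average excess return divided by its standard deviation,
    annualized by sqrt(12) (an assumption of this sketch)."""
    mean = statistics.mean(monthly_excess_returns)
    sd = statistics.stdev(monthly_excess_returns)
    return (mean / sd) * math.sqrt(12)

# Toy data (hypothetical): flow signal and excess returns for months 0..4.
signal = [1.0, -0.5, 0.5, 1.0, 0.0]
excess = [0.00, 0.02, -0.01, 0.03, 0.01]
rets = strategy_returns(signal, excess)
sharpe = annualized_sharpe(rets)
```

The last signal value is unused because its matching month t + 1 return lies outside the toy sample.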
Shiller (2000) and Campbell and Viceira (2002) argue that individuals should be able to time the market, though at much longer horizons than monthly. Dichev (2007) examines aggregate flows and finds that the dollar-weighted returns to investors are lower than buy-and-hold returns, indicating that investors on average are poor market timers. We find supporting evidence that the average individual investor cannot time the market. Our main finding, that the net order flow of previously successful timers minus that of previously unsuccessful timers predicts market returns, is not inconsistent with poor timing by the average investor. Most analysis of the cross-section of market timing ability has examined returns on professionally managed funds. The overwhelming majority of these studies find little evidence of market timing.5 However, Bollen and Busse (2001) and Mamaysky, Spiegel, and Zhang (2008) find some timing by professionals. Using holdings data, Elton, Gruber, and Blake (2011) find evidence that fund managers’ timing attempts usually result in low returns. Kacperczyk, Van Nieuwerburgh, and Veldkamp (2013), however, examine the holdings of fund managers and find that managers have some ability to time the market, especially in recessions. In contemporaneous work, Che, Norli, and Priestley (2012) use detailed Norwegian data covering all of the domestic asset holdings of their investors to show that more individuals successfully time the market than would arise by pure chance. Unlike our work, they do not look for persistence in timing ability, and they do not analyze return predictability. Grinblatt, Keloharju, and Linnainmaa (2012) provide some evidence of a positive relationship between IQ and market timing ability during the dot-com boom and bust period. While Grinblatt, Keloharju, and Linnainmaa focus on differences in performance and trading activity across individuals with different IQ levels, we focus specifically on the persistent cross-sectional dispersion in market timing ability and the implications of this dispersion for market return predictability.
1. Data
Our main data set combines data on individual investor transactions with data on market returns. The original transactions data contain all transactions in Finnish securities during the sample period and come from the Nordic Central Securities Depository (NCSD). We extend the data sets used in Seru, Shumway, and Stoffman (2010), Grinblatt and Keloharju (2000, 2001a, 2001b), and Kaustia and Knüpfer (2012) to cover 14.5 years of trading from January 1, 1995, to June 30, 2009. For each transaction, we are provided the number of shares transacted, the transaction price, a security identifier, an investor identifier code, and information about the investor.6
The transactions of individual investors are aggregated at the individual, not account, level. This level of aggregation eases the concern that investors are merely moving money between accounts rather than actively moving money into and out of the market. The data set only provides information on investors’ direct holdings. Investments through an intermediary are attributed to the intermediary’s account. Thus, mutual funds have their own accounts and are not included in our analysis. Grinblatt and Keloharju (2000) find that less than 1% of the Finnish population was invested in mutual funds at the beginning of 1997. Although this proportion is likely to have grown, there is no obvious reason that excluding flows to mutual funds would affect our results.
We limit our analysis to active individual investors and therefore drop transactions made by institutions. We limit our sample to individual investors for two reasons: (1) individuals are not regulated or restricted in their investment set and (2) individuals are likely investing for themselves, which eliminates any agency concerns. Further, individual investors are traditionally thought of as the least informed or skilled investor group, so finding any timing ability among this subgroup of investors is particularly surprising.7
Since our interest lies in investors’ ability to time the market, we aggregate investor flows at the investor-month level instead of analyzing trades in individual stocks. To calculate an investor’s monthly flow, we sum the signed euro value of all transactions by that individual within each month. If the investor places no trades during a month, then that investor’s monthly flow is equal to zero. If an investor purchases €x of stock and sells €y, the flow for that investor is €(x − y) in that month.
The trading records contain codes indicating the type of transaction for each record. About 60% of the records are normal transactions, or trades which individuals executed on the exchange. Other records are generated when firms merge, when they go through a stock split, or when other corporate events occur. After examining the data carefully, we choose to exclude some types of trades, but we keep most of the transactions. The transactions we exclude compose about 12% of the data and often have values of zero.8 We choose to retain records associated with mergers9 and other corporate events because they sometimes generate trades in individuals’ accounts which affect their market exposures.10 For robustness, we ran our analysis with only regular trades and our persistence results retain statistical significance but become economically smaller.
The data also contain short descriptions of all the securities traded on the exchange. We use these codes to identify put options. Since put option returns are usually of the opposite sign of their underlying assets, they almost always have negative betas. Thus, when we calculate investor flows, we give put option flows a negative sign. If investors sell puts, we count the associated flows as positive, and if they buy puts, we count the flows as negative.
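The flow construction described above, including the sign reversal for put options, can be sketched as follows. This is an illustration under stated assumptions: the trade record layout (investor, month, side, euro value, put flag) and the function name are hypothetical and do not reflect the actual NCSD record format.

```python
from collections import defaultdict

def monthly_flows(trades):
    """Aggregate signed euro trade values into investor-month flows.

    Each trade is (investor_id, month, side, euro_value, is_put), where
    side is "buy" or "sell". Purchases are inflows and sales are
    outflows; the sign is reversed for put options.
    """
    flows = defaultdict(float)
    for investor, month, side, value, is_put in trades:
        signed = value if side == "buy" else -value
        if is_put:
            signed = -signed  # buying a put lowers market exposure
        flows[(investor, month)] += signed
    return dict(flows)

# Toy records (hypothetical).
trades = [
    ("A", "1999-01", "buy", 1000.0, False),
    ("A", "1999-01", "sell", 400.0, False),
    ("A", "1999-02", "buy", 200.0, True),   # put purchase: negative flow
    ("B", "1999-01", "sell", 300.0, True),  # put sale: positive flow
]
flows = monthly_flows(trades)
```

Investor-months with no trades are absent from the result and should be read as a flow of zero, for example with `flows.get(key, 0.0)`.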
Almost all of our tests of timing persistence will examine active traders, traders that have a minimum number of months with nonzero aggregate flows (active flows) in the first half of our sample. We focus on active traders to ensure we have an accurate measure of timing. For an investor to be included in our main analysis, they must have active flows in at least 15 of the 87 months in the first half of the sample.11 We use only first-half activity to determine whether investors are included in the sample so there is no look-ahead bias in our results. This also means we are not conditioning on survival (i.e., trading activity in the second period). By not conditioning on survival, we ensure the second-half performance measures are unbiased. We find similar results if we condition on being an active trader in both periods (we discuss this result more in Section 2.5.3). Figure 1 provides a histogram of active (nonzero) monthly flows in the first and second halves of the sample period for the 1,386,540 investors that owned a security during the first half of our sample.12 The main takeaway is that there is very little activity by the typical investor. The median investor has 2 active flows (months with nonzero aggregate flows) and less than 10% of investors have greater than 10 active flows (out of 87 months) in each half of our sample.
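The active-trader screen can be written in a few lines. The sketch below assumes each investor's first-half flows are stored as a list of 87 monthly values; the function name and the toy investors are hypothetical.

```python
def is_active(first_half_flows, min_active_months=15):
    """True if the investor has nonzero flows in at least
    min_active_months months of the first half of the sample."""
    active_months = sum(1 for flow in first_half_flows if flow != 0)
    return active_months >= min_active_months

# Toy investors (hypothetical), each with 87 first-half months.
investor_a = [100.0, -50.0] * 8 + [0.0] * 71  # 16 active months
investor_b = [100.0] * 14 + [0.0] * 73        # 14 active months

active_a = is_active(investor_a)
active_b = is_active(investor_b)
```

Because only first-half flows enter the screen, second-half activity plays no role in sample selection.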
Figure 1.
Histogram of nonzero monthly flows
This figure displays a histogram of the number of months of nonzero monthly flows per investor in the first and second halves of the sample.
Summary statistics on trades and flows by active investors (i.e., investors with at least 15 nonzero flows in the first half of the sample period) are presented in Table 1. There are 68,937 investors that meet the threshold to be an active investor. We treat an individual’s first trade in our sample as their first trade in the market. Once an investor makes their first trade, they remain in our sample and receive a monthly flow equal to zero in all months in which they do not trade. As can be seen in column 3 of Table 1, the number of flows increases each year until 2001 as new investors enter the market. After 2001, the number of flows remains constant because we do not allow new investors into the “active” set in the second half of the sample, and investors with no trades have flows of zero.13 As a robustness check we also conduct our main tests assuming that all investors begin trading before the sample begins, and we find very similar results.
Table 1.
Summary statistics of investor monthly flows
| Year | # trades | # flows | Mean | SD | Outflows | Flow | Inflows |
|---|---|---|---|---|---|---|---|
| 1995 | 269,670 | 581,630 | 119.48 | 18,307.01 | 5.08% | 82.50% | 12.42% |
| 1996 | 362,778 | 622,318 | −722.73 | 28,609.67 | 10.37% | 81.33% | 8.30% |
| 1997 | 504,473 | 656,227 | 268.13 | 39,415.36 | 9.06% | 73.74% | 17.19% |
| 1998 | 746,684 | 725,097 | 322.13 | 36,271.59 | 11.01% | 68.37% | 20.62% |
| 1999 | 1,416,253 | 793,227 | 818.65 | 116,432.90 | 16.50% | 57.21% | 26.28% |
| 2000 | 2,142,002 | 823,453 | −2,306.98 | 210,042.00 | 19.91% | 52.52% | 27.57% |
| 2001 | 1,709,249 | 827,244 | 85.79 | 71,351.88 | 14.78% | 63.60% | 21.63% |
| 2002 | 1,277,208 | 827,244 | −431.95 | 124,862.10 | 17.05% | 67.99% | 14.96% |
| 2003 | 1,089,789 | 827,244 | 459.57 | 49,009.56 | 10.18% | 74.39% | 15.43% |
| 2004 | 1,290,402 | 827,244 | 1,381.05 | 88,048.21 | 10.31% | 71.31% | 18.38% |
| 2005 | 1,624,510 | 827,244 | −941.24 | 142,472.60 | 17.25% | 68.81% | 13.95% |
| 2006 | 1,808,329 | 827,244 | −1,687.51 | 290,959.70 | 13.29% | 74.34% | 12.37% |
| 2007 | 2,323,850 | 827,244 | −551.95 | 121,005.80 | 13.09% | 75.48% | 11.43% |
| 2008 | 2,390,962 | 827,244 | 974.97 | 59,321.56 | 6.67% | 79.54% | 13.80% |
| 2009 | 1,399,064 | 413,622 | 713.16 | 32,935.14 | 9.28% | 70.39% | 20.34% |
This table displays summary statistics of monthly investor flows into and out of securities on the Helsinki Stock Exchange. Our sample contains all transactions by individual investors from January 1995 to June 2009. Statistics are presented for the active traders in our sample. To be considered an active trader, investors must have monthly absolute flows greater than zero in a minimum of 15 months during the first half. # trades is the number of trades made during the year. # flows is the number of monthly flows aggregated at the investor-month level. Mean is the average investor-month flow size in euros. SD is the standard deviation of investor-month flows. Outflows is the percentage of flows that are outflows during the year. Flow is the percentage of flows equal to zero during the year. Inflows is the percentage of flows that are inflows during the year. We drop all trades with a value of zero and all canceled trades from our original transaction data. The 2009 values are for the first 6 months of the year.
In columns 6–8, we provide the percentage of monthly investor flows each year that are net outflows, net zero, and net inflows, respectively. An investor is assigned a monthly flow of zero if the investor was inactive or if the investor bought and sold exactly the same value of securities during the month. The overwhelming majority of monthly flows are equal to zero (because of inactivity). The year 2000 had the lowest percentage of flows equal to zero at 52.52%, and the year 1995 had the highest percentage at 82.50%. The percentage of flows that are outflows hits a low of 5.08% in 1995 and a high of 19.91% in 2000. The percentage of flows that are inflows ranges between 8.30% in 1996 and 27.57% in 2000. In 10 of the 15 years, a greater percentage of investor flows were inflows than outflows. This does not necessarily mean that the average flow size in euros was positive in these years. In column 4, we present the mean flow size, which is negative in 6 of 15 years.
To proxy for the relevant market return, we use returns on the HEX 25 Index (currently, the OMX Helsinki 25), which we obtain from Bloomberg. The HEX 25 is a value-weighted index of the 25 most-traded shares listed on the Helsinki Stock Exchange.14 The cumulative return of the HEX 25 index over our sample period appears in Figure 2, and monthly returns are presented in Figure 3. The figures clearly show that the sample period can be characterized by two episodes of a market run-up and a subsequent crash in prices. This remarkable pattern makes it possible for us to test whether timing ability around “market bubble” periods is persistent (reported in the Internet Appendix).
Figure 2.
HEX25 cumulative returns
This figure displays the growth of the OMX Helsinki 25 index (formerly the HEX25 index). The HEX25 is a stock index of the 25 most-traded shares on the NASDAQ OMX Helsinki exchange. The index is value weighted with a maximum weight on an individual security of 10%. We present the value of the OMX Helsinki 25 index for our sample period: January 1995 to June 2009. The shaded areas represent the 25 months surrounding the market peaks in each half of our sample.
Figure 3.
HEX25 monthly returns
This figure displays the monthly returns of the OMX Helsinki 25 index (formerly the HEX25 index). The HEX25 is a stock index of the 25 most-traded shares on the NASDAQ OMX Helsinki exchange. The index is value weighted with a maximum weight on an individual security of 10%. We present the monthly returns of the OMX Helsinki 25 index for our sample period: January 1995 to June 2009. The shaded areas represent the 25 months surrounding the market peaks in each half of our sample.
In some tests, we use returns on individual securities to calculate active changes in portfolio betas, stock picking ability, investors’ total investment performance, and to control for any effects due to the dominance of Nokia (the most-traded stock in our sample) during our sample period. For these analyses, we need the time series of individual securities prices, which we obtain from Bloomberg for the 1,000 most-traded securities in our sample.15
1.1 Market timing measure
In the appendix, we provide a model of market timing that motivates our main market timing measure. We show the correlation between investor flow and future market returns is increasing in the investor’s market timing ability, defined as the precision of the investor’s signal about future market returns. Motivated by the model, our timing measure is calculated for each investor as follows:
$$\text{Timing}_i = \operatorname{corr}\left(\text{Flow}_{it},\, R_{t+1}\right) \qquad (1)$$

where $\text{Flow}_{it}$ is the net monthly cash flow into or out of stocks for investor $i$ in month $t$ and $R_{t+1}$ is the cash return (or simple difference) of the HEX 25 in month $t+1$.16 Consistent with our model, we use cash returns so that both the flows and returns are in euros. We compare investors’ timing measures in two equal length subperiods: January 1995 to March 2002 and April 2002 to June 2009. Each subperiod comprises 87 months.
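The timing measure, the correlation between an investor's flow in month t and the index cash return in month t + 1, can be sketched as follows. The helper names and toy series are hypothetical, and returning `None` for a constant series is a choice made for this sketch only.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None when
    either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def timing_measure(flows, cash_returns, lead=1):
    """Correlation of flow in month t with the index cash return in
    month t + lead (lead=1 gives the one-month-ahead measure)."""
    return pearson(flows[:-lead], cash_returns[lead:])

# Toy investor (hypothetical) who buys before gains and sells before losses.
good_timer_flows = [1.0, -1.0, 1.0, -1.0, 1.0]
index_changes = [0.0, 2.0, -2.0, 2.0, -2.0]
timing = timing_measure(good_timer_flows, index_changes)
```

The `lead` argument is included so the same function can be reused with other horizons; an investor with too few nonzero flows yields no correlation.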
Using the correlation between market flows and future market returns makes particular sense for our data for multiple reasons. First, the correlation directly measures whether investors are able to time the market. Second, while common measures of timing utilize the fraction of each person’s wealth that they allocate to risky assets, we do not observe the total wealth of the individuals in our data. Our model (in the appendix) shows that, under some reasonable assumptions, observing wealth or the investor’s other investments is unnecessary to estimate market timing ability, easing concerns about this data limitation.17 Additionally, calculating the correlations of an individual’s flows with future market returns essentially adjusts each individual’s monthly flows by the standard deviation of their flows, which is a proxy for their total wealth.
Table 2 presents summary statistics for the timing measure for each half of the sample. There are 68,937 investors that meet the minimum number of active flows in the first half of the sample. In the second half, 2,097 of the previously active investors fail to have at least two nonzero monthly flows, so we are unable to calculate a correlation for them. The mean correlation is 3% in the first half and 0% in the second half of the sample. In untabulated results, we find that the mean monthly timing measure for all (generally inactive) investors is −1% in each half of the sample. Frequent traders are on average better monthly timers than the entire population, and these differences are statistically significant at the 1% level. A correlation of −1% for all investors is evidence that investors cannot time the market on average, consistent with Dichev (2007). There is significant variation in the timing measure, especially in the first half of the sample. In the first half, the standard deviation is 18%; 25% of investors have a correlation less than −7%, and 25% of investors have a correlation greater than 13%. In the second half, the standard deviation is only 10%, and the 25th and 75th percentiles are −6% and 7%, respectively. In our persistence tests, we will examine whether investors that were in the top (bottom) percentiles in the first period were more likely to be in the top (bottom) percentiles in the second period.
Table 2.
Summary statistics of market timing measures
| Time period | Mean | SD | 25th | Median | 75th | N |
|---|---|---|---|---|---|---|
| 1995–2002 | 0.03 | 0.18 | −0.07 | 0.01 | 0.13 | 68,937 |
| 2002–2009 | 0.00 | 0.10 | −0.06 | 0.00 | 0.07 | 66,840 |
This table gives the summary statistics (mean, standard deviation, 25th percentile, median, and 75th percentile) for the market timing measure. The monthly measure is calculated using Equation (1). We separate the sample into two equal length subperiods: January 1995 to March 2002 and April 2002 to June 2009. Statistics are presented for the active traders in our sample. To be considered an active trader, investors must have monthly absolute flows greater than zero in a minimum of 15 months. N is the number of investors that meet the active investor criteria in the first half of the sample and the subset of those that have enough flows to calculate a correlation in the second half.
2. Results
Table 3 presents results examining persistence in market timing ability. The table is a simple cross-tabulation of the first and second-half timing measures. The rows of the table are sorted into quintiles based on performance in the first half, while the columns are sorted into quintiles based on performance in the second half. We present row percentages, that is, percentages conditional on being in the relevant first-half quintile. Under the null hypothesis that there is no relation between timing performance in the first and second halves of the sample, we would expect to see about 20% of the observations in each cell. The indications of statistical significance are for tests of the null hypothesis.18
Table 3.
Two period cross-tab of the timing measure
| First period | Second period Q1 | Second period Q2 | Second period Q3 | Second period Q4 | Second period Q5 | Total | Average second period timing |
|---|---|---|---|---|---|---|---|
| Q1 | 24.43%** | 20.57% | 19.35% | 18.36%*** | 17.29%** | 100% | 1.56%*** |
| Q2 | 20.20% | 20.31% | 19.93% | 20.03% | 19.53% | 100% | 0.55% |
| Q3 | 19.42%* | 20.31% | 20.70%*** | 19.78% | 19.79% | 100% | 0.46% |
| Q4 | 18.74%* | 20.05% | 20.25% | 20.27% | 20.69% | 100% | 0.14% |
| Q5 | 17.16%** | 18.76%** | 19.78% | 21.58%** | 22.71%** | 100% | –0.63% |
| Total | 100% | 100% | 100% | 100% | 100% | | Q1-Q5 = 2.21%*** |
This table provides frequencies of investors sorted and grouped by their monthly timing measure in each of the two sample subperiods (January 1995 to March 2002 and April 2002 to June 2009). The monthly timing measure is calculated using Equation (1). The January 1995 to March 2002 percentile rank is along the vertical axis, and the April 2002 to June 2009 percentile rank is on the horizontal axis. The timing measures are grouped into quintiles. Q1 is the top performance quintile. We present row percentages. If the two periods were independent, we would expect row percentages of 20% in each cell. Levels of significance are based on placebo tests (see Section 2.1 for more details). The quintile cutoff values for the first period are 0.173, 0.051, -0.019, and -0.098. For the second period, the cutoff values are 0.088, 0.028, -0.022, and -0.079. Average second period timing provides the average second-half timing measure (April 2002 to June 2009) of investors sorted and grouped by their monthly timing measure in the first half of the sample (January 1995 to March 2002). The sample size is 66,840 investors. The pairwise correlation between the first and second period monthly timing measures is 0.0745 and is significant at the 5% level. The Spearman rank correlation coefficient is 0.0726 and is significant at the 5% level.
*** p < .01; ** p < .05; * p < .1. Null: Cell% = 20%.
The results of Table 3 clearly show some persistence in timing ability in our data. Focusing on the top row, which corresponds to the best performers in the first half, we see that the fraction of investors that appear in each performance quintile in the second half of the sample declines monotonically from 24.43% to 17.29%. These results indicate that the best first period timers are 41% more likely to be in the top 20% than in the bottom 20% in the second half (24.43/17.29 ≈ 1.41). Looking at the last row, the fraction in each cell increases monotonically. The extreme quintile pairs are statistically significantly different from the null value of 20%. Looking at the first column of the table, again the fractions in each cell decline monotonically. In the last quintile column (Q5), the fractions increase monotonically.
If investors were uniformly distributed across all the cells in the table, we would expect to see 2,674 investors in each cell (66,840/25). In the first row and column, there are actually 3,266 investors, 592 more than we would expect by chance. The last row and column actually have 3,036 investors, 362 more than we would expect by chance.19 We also examine the number of investors that outperform a certain threshold. We find the number of investors exhibiting a correlation above 0.1 in both periods is about double what would be expected by chance.20 The correlation between performance in the first and second halves is 7.45% (the Spearman rank correlation is 7.26%); both are significant at the 5% level and provide further evidence for persistent market timing ability among the investors in our sample. These results show that the best timers in the second half come disproportionately from the better quintiles in the first half, and the worst timers in the second half come disproportionately from the worst quintiles in the first half.
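The cross-tabulation and the expected cell count under independence can be sketched as follows. This is an illustrative sketch: the function names and toy scores are hypothetical, and ties are broken by input order, which may differ from the procedure behind Table 3.

```python
def quintile(values):
    """Assign quintile labels 1..5, with 1 for the highest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels

def row_percentages(first_half, second_half):
    """5x5 table of second-half quintile shares, in percent, conditional
    on the first-half quintile (rows)."""
    q_first, q_second = quintile(first_half), quintile(second_half)
    counts = [[0] * 5 for _ in range(5)]
    for a, b in zip(q_first, q_second):
        counts[a - 1][b - 1] += 1
    return [[100.0 * c / sum(row) for c in row] for row in counts]

# Toy example (hypothetical): perfectly persistent timing scores.
scores = [float(i) for i in range(100)]
table = row_percentages(scores, scores)

# Expected investors per cell with 66,840 investors spread evenly over 25 cells.
expected_per_cell = 66_840 / 25
```

With perfectly persistent scores all mass sits on the diagonal; under independence each cell would hold 20% of its row.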
In the last column of Table 3, we report the average second period timing measure for each first period quintile. For the highest quintile, the average timing measure in the second half is 1.56% and is significantly different from zero at the 1% level. For the lowest quintile, the average correlation is only -0.63%. The difference between the highest and lowest quintile is 2.21% and is statistically significant at the 1% level using the traditional test of significance. The results of the table imply that the positive correlation between the timing measures in the first and second halves is not driven by the tails of the distribution, and it is not primarily driven by either very unsuccessful or very successful timers. Rather, there is a considerable amount of persistence in good and bad timing abilities across the entire distribution. In subsection 2.3, we calculate the correlation between first-half group-level aggregate flows and future returns and find even more economically significant dispersion in performance in the second half.
In the Internet Appendix, we present similar results using a beta-adjusted timing measure in which the euro flows are multiplied by the transacted security’s beta. The results are statistically and economically very similar to those in Table 3. This indicates that the most important timing behavior in our data is driven by dynamic asset allocation. Because of the similarity in results between the beta-adjusted flow and unadjusted flow measures, we use the simple unadjusted flow measure throughout the paper. We discuss additional robustness tests examining the persistence in market timing ability in Section 2.5.
2.1 Calculation of significance levels
If investors make investment decisions that are correlated across individuals or over time because of factors like regional shocks to wealth, following the advice of a common financial advisor, trading on common signals like price-to-earnings, or dollar cost averaging (e.g., purchase €500 of stock each month), then traditional tests of statistical significance may be misspecified. In a different context, Fama and French (2010) deal with similar issues using a bootstrapping procedure. They motivate their procedure as a way to separate luck from skill. While the nature of our data makes it impractical for us to use their method, we separate luck from skill by using the distribution of a number of placebo timing measures to determine statistical significance.
The placebo tests are conducted by recalculating each individual’s timing measure each half using the market return in month t + m, where m = 10, 11, …, 60. This provides us with 51 placebo samples of timing measures in which any correlation between flow and return should be attributed to luck. We use the 51 samples to construct a distribution of cross-tabulation cell values and a distribution of correlations between first and second-half timing measures. We use the distribution of these values to calculate the likelihood of observing the point estimates in our main analysis. We choose t + 10 as the earliest month for the placebo tests since it is highly unlikely that 10-month-ahead returns are related to current flows. These placebo tests account for the commonality in trading activity across individuals since the distribution of flows is exactly the same in the placebo tests as in our main test. In general, we find that results using the placebo test method are less statistically significant than results using normal standard errors. Even so, the statistics of interest remain significantly different from zero. The placebo method is used to calculate all of the p-values and associated levels of statistical significance reported in Table 3.
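The placebo logic can be sketched as follows. Several details here are assumptions of the sketch rather than statements about the actual procedure: the leads are taken to be the consecutive months t + 10 through t + 60 (which yields 51 samples), the p-value is a simple two-sided empirical share, and `toy_statistic` is a hypothetical stand-in for any statistic recomputed at a given lead, such as a cross-tabulation cell value.

```python
PLACEBO_LEADS = range(10, 61)  # assumed: m = 10, ..., 60, i.e. 51 leads

def placebo_distribution(statistic, leads=PLACEBO_LEADS):
    """Recompute the statistic with the market return shifted to t + m."""
    return [statistic(m) for m in leads]

def placebo_p_value(actual, placebo_values):
    """Share of placebo statistics at least as far from the placebo mean
    as the actual statistic (an assumed, two-sided definition)."""
    center = sum(placebo_values) / len(placebo_values)
    extreme = sum(
        1 for v in placebo_values if abs(v - center) >= abs(actual - center)
    )
    return extreme / len(placebo_values)

def toy_statistic(m):
    # Hypothetical: a large value at the true lead, small noise elsewhere.
    return 0.2 if m == 1 else 0.01 * ((m % 5) - 2)

placebo_values = placebo_distribution(toy_statistic)
p_value = placebo_p_value(toy_statistic(1), placebo_values)
```

Because the flows are held fixed and only the return lead changes, any commonality in trading across investors is preserved in every placebo sample.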
We also calculate p-values using three additional methods. The first two methods are similar to the placebo test, except we simulate market returns 1,000 times and reestimate individuals’ timing measures and the distribution of timing measures for each simulated return path. In the first set of simulations, we assume returns are independent and normally distributed with a mean and variance equal to the realized mean and variance of market returns during the sample period. In the second set of simulations, we estimate an AR(1) process for market returns and simulate returns using the estimated parameters. Both sets of simulations indicate greater statistical significance of our results than the p-values from the placebo tests. The third method is a bootstrap procedure that randomly matches investors in the first and second halves based on their geographical location and trading frequency. The statistical significance remains similar to using normal standard errors (i.e., extremely statistically significant).
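A minimal sketch of the two return simulations follows. It assumes monthly returns in a plain list, fits the AR(1) by OLS, and starts each AR(1) path from the first realized return; these choices, the function names, and the generic `statistic` callback are assumptions for illustration rather than the procedure actually used.

```python
import random
from statistics import mean, pstdev

def fit_ar1(r):
    """OLS estimates (c, phi, sigma) for r_t = c + phi * r_{t-1} + e_t."""
    x, y = r[:-1], r[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    c = my - phi * mx
    sigma = pstdev([b - c - phi * a for a, b in zip(x, y)])
    return c, phi, sigma

def simulate_iid(r, rng):
    """Independent normal draws matching the realized mean and variance."""
    mu, sd = mean(r), pstdev(r)
    return [rng.gauss(mu, sd) for _ in r]

def simulate_ar1(r, rng):
    """AR(1) path with estimated parameters, started at the first realized return."""
    c, phi, sigma = fit_ar1(r)
    path = [r[0]]
    while len(path) < len(r):
        path.append(c + phi * path[-1] + rng.gauss(0.0, sigma))
    return path

def simulated_p_value(statistic, actual_returns, simulate, n_paths=1000, seed=0):
    """Share of simulated paths whose statistic is at least the realized one."""
    rng = random.Random(seed)
    actual = statistic(actual_returns)
    draws = [statistic(simulate(actual_returns, rng)) for _ in range(n_paths)]
    return sum(d >= actual for d in draws) / n_paths
```

In practice `statistic` would recompute every investor's timing measure against the supplied return path and return the cell value or correlation of interest.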
2.2 Monthly return predictability
While the cross-tabulation evidence presented above suggests that skilled individual investors trade on predictable market returns, in this section we present a more direct test of market predictability. The more direct test is useful because it (1) is relatively easy to interpret, (2) allows us to examine whether other common predictors explain market timing by individuals, and (3) is arguably econometrically cleaner than the cross-tabulations reported in Table 3. We compare the ability to predict market returns of group-level investor flow measures to three other predictors of market returns: earnings-to-price ratio, dividend yield, and the concurrent market return. To construct a group-level flow variable, we group individuals’ flows into groups based on first-half performance. Specifically, our group flow is defined as follows:
| (2) |
where the subscript g indexes quintile groups, and i indexes individuals. Individual flows are standardized using the mean and standard deviation of the individual’s past 60 monthly flows.21
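Additionally, we create another flow measure from the difference in $\mathit{GroupFlow}_{gt}$ for the top 20% of timers and the bottom 20% of timers based on first-half performance. The measure is calculated as follows:
$$\mathit{TopBottomFlow}_{t} = \mathit{GroupFlow}_{\text{Top 20\%},\,t} - \mathit{GroupFlow}_{\text{Bottom 20\%},\,t} \tag{3}$$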
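The construction of the two flow measures can be sketched as follows. The sketch assumes that the standardization window covers the 60 months before the current month and that individual standardized flows are averaged within a group; the aggregation rule and all names are assumptions for illustration.

```python
from statistics import mean, stdev

def standardize(past_flows, flow_now, window=60):
    """z-score of this month's flow against the investor's trailing flows."""
    hist = past_flows[-window:]
    sd = stdev(hist)
    return (flow_now - mean(hist)) / sd if sd > 0 else 0.0

def group_flow(members):
    """members: list of (past_flows, flow_now) pairs for one quintile group.
    Averaging across members is an assumption, not taken from the text."""
    return mean([standardize(past, now) for past, now in members])

def top_bottom_flow(top_members, bottom_members):
    """Top 20% group flow minus bottom 20% group flow for one month."""
    return group_flow(top_members) - group_flow(bottom_members)

# Tiny example with three months of history per investor
top = [([1, 2, 3], 4), ([1, 2, 3], 2)]
bottom = [([1, 2, 3], 1)]
spread = top_bottom_flow(top, bottom)
```

Standardizing each investor against their own history keeps large accounts from dominating the group measure.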
Table 4 presents correlations between the top 20% group flow, the bottom 20% group flow, the Top-Bottom strategy (Top 20%–Bottom 20% flow), and the three predictor variables (the earnings-to-price ratio, dividend yield, and the lagged excess return of the HEX25). Examining columns 2 and 3, we see that the group flows are contrarian. Both the top 20% and bottom 20% group flows are significantly correlated with each other and they are both positively correlated with the valuation ratios (earnings-to-price ratio and the dividend yield) and negatively correlated with lagged market returns. The Top-Bottom strategy (presented in column 1) is significantly positively correlated with the earnings-to-price ratio and past market returns, and is insignificantly correlated with the dividend yield. These correlations are consistent with the “timing” portion of flows capturing return continuation trading (correlation with past market returns), but also trading in a contrarian manner (correlation with the earnings-to-price ratio). The R2 from regressing the top 20% group flow, bottom 20% group flow, and top 20% minus bottom 20% group flow on the three predictor variables is 19%, 33%, and 18%, respectively. Based on these results, the persistence in timing that we capture cannot be fully explained by simple strategies using these three predictors.
Table 4.
Cross-correlation between market timing group flows and return predictors
| Variables | Top 20% – Bottom 20% flow | Top 20% flow | Bottom 20% flow | log(EP ratio) | log(div. yield) | SD |
|---|---|---|---|---|---|---|
| Top 20% – Bottom 20% flow | | | | | | .116 |
| Top 20% flow | .709 | | | | | .194 |
| | (.000) | | | | | |
| Bottom 20% flow | .156 | .807 | | | | .139 |
| | (.149) | (.000) | | | | |
| log(EP ratio) | .321 | .374 | .255 | | | .273 |
| | (.002) | (.000) | (.017) | | | |
| log(div. yield) | .001 | .341 | .477 | .369 | | .254 |
| | (.990) | (.001) | (.000) | (.000) | | |
| HEX25t | .187 | −.215 | −.458 | −.243 | −.339 | .065 |
| | (.083) | (.045) | (.000) | (.023) | (.001) | |
This table presents correlations between group flows and various return predictors over the time period April 2002 to June 2009. Top 20% flow (Bottom 20% flow) is the month t group flow for the top (bottom) quintile of timers sorted by first-half performance (see Equation (2)). Top 20% – Bottom 20% flow is the month t difference in group flows between the top and bottom quintile of timers sorted by first-half performance (see Equation (3)). log(EP ratio) is the logarithm of the earnings-to-price ratio of the HEX 25 at the end of month t less the monthly Euribor rate. log(div. yield) is the logarithm of the dividend yield of the HEX 25 at the end of month t. HEX25t is the return on the HEX 25 in month t (i.e., the contemporaneous return). SD is the standard deviation of each variable. p-values are reported in parentheses.
We begin our examination of return predictability by regressing future market returns on the group flow measures in the second half of the sample. We estimate regression models of the following form:
$$R_{t+1} - R^{f}_{t+1} = \alpha + \beta\,\mathit{GroupFlow}_{gt} + \gamma' X_{t} + \varepsilon_{t+1},$$
where $R_{t+1}$ is the return on the HEX 25 over month t + 1, $R^{f}_{t+1}$ is the 1-month Euribor rate, $X_{t}$ is the vector of excess returns on the HEX 25 in month t, the logarithm of dividend yield and the logarithm of earnings-to-price ratio of the HEX 25 for month t, and $\mathit{GroupFlow}_{gt}$ is calculated for month t according to Equation (2). We focus on the flows of the top 20% and bottom 20% groups and the Top-Bottom strategy.
Table 5 presents the results. In columns 1–3, we report results of univariate OLS regressions using the flow-based measures. The top 20% flow is positively related to future returns with a coefficient of 0.0309, which is insignificant. The bottom 20% flow has a negative coefficient of -0.0353 and is insignificant. The coefficients have the expected signs, but the coefficients and R2 are small in magnitude. In column 1, the independent variable is the Top-Bottom strategy, calculated according to equation (3). This measure can significantly predict future market returns at the 5% level, with a coefficient of 0.137. While the top 20% flow and bottom 20% flow are significantly correlated with each other, those times when past successful timers are buying or selling and past unsuccessful timers are doing the opposite have particular power to predict market returns. The coefficient implies that a one-standard deviation increase in the flow measure is associated with an increase in the market return of 1.59%. The R2 is 6%, which is much larger than any of the other univariate regressions. Thus, the Top-Bottom strategy has some ability to predict returns before controlling for the other economic variables.
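The univariate specification and the economic-magnitude calculation can be illustrated with a short sketch. The OLS helper and its names are hypothetical; the only figures taken from the text are the reported coefficient (0.137) and the standard deviation of the Top-Bottom flow from Table 4 (.116).

```python
from statistics import mean

def ols(x, y):
    """Univariate OLS of y on x; returns (alpha, beta, R-squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    alpha = my - beta * mx
    ss_res = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return alpha, beta, 1.0 - ss_res / ss_tot

def predictive_regression(flow, excess_ret):
    """Regress the month t+1 excess return on the month-t flow.
    Both inputs are same-length monthly series indexed by t."""
    return ols(flow[:-1], excess_ret[1:])

# Economic magnitude implied by the reported estimates:
# a one-standard-deviation flow increase times the slope coefficient.
beta_reported = 0.137      # Table 5, column 1
sd_flow_reported = 0.116   # Table 4, SD of the Top-Bottom flow
effect = beta_reported * sd_flow_reported  # about 0.0159, i.e., 1.59% per month
```

The product reproduces the 1.59% figure quoted in the text, which confirms that the magnitude is stated in monthly return units.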
Table 5.
Predictability regressions
| Variables | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Top 20% – Bottom 20% flow | 0.137** | | | | | | 0.168** | 0.168** | 0.176*** |
| | (0.059) | | | | | | (0.064) | (0.064) | (0.003) |
| Top 20% flow | | 0.031 | | | | | | | |
| | | (0.036) | | | | | | | |
| Bottom 20% flow | | | –0.035 | | | | | | |
| | | | (0.051) | | | | | | |
| log(EP ratio) | | | | –0.034 | | | –0.056* | –0.056* | |
| | | | | (0.026) | | | (0.029) | (0.029) | |
| log(div. yield) | | | | | –0.015 | | 0.017 | 0.015 | |
| | | | | | (0.028) | | (0.030) | (0.031) | |
| HEX25t | | | | | | 0.200* | 0.108 | 0.110 | |
| | | | | | | (0.107) | (0.116) | (0.117) | |
| HEX25t−1 | | | | | | | | –0.018 | |
| | | | | | | | | (0.112) | |
| Observations | 87 | 87 | 87 | 87 | 87 | 87 | 87 | 87 | 174 |
| R-squared | .060 | .009 | .006 | .020 | .003 | .040 | .125 | .125 | .090 |
This table presents predictive OLS regressions for the market excess return over the time period April 2002 to June 2009. The dependent variable is the monthly HEX25 return in month t + 1 less the monthly Euribor rate. Top 20% flow (Bottom 20% flow) is the month t group flow for the top (bottom) quintile of timers sorted by first-half performance (see Equation (2)). Top 20% – Bottom 20% flow is the month t difference in group flows between the top and bottom quintile of timers sorted by first-half performance (see Equation (3)). log(EP ratio) is the logarithm of the earnings-to-price ratio of the HEX 25 at the end of month t. log(div. yield) is the logarithm of the dividend yield of the HEX25 at the end of month t. HEX25t−1 (HEX25t) is the return on the HEX25 in month t − 1 (t) less the monthly Euribor rate. R-squared is the unadjusted R2. In column 9, we rank investors in the second half of the sample and use their first-half flows to calculate the Top 20% – Bottom 20% flow for the first half of the sample, so we can use the entire 14.5 years of data for the predictability regression. Standard errors appear in parentheses.
*** p < .01; ** p < .05; * p < .1.
In columns 4–6, we regress the market return in month t + 1 on the other predictor variables individually. The dividend yield and earnings-to-price ratio show little forecasting power. They both have negative and insignificant coefficients. Other researchers typically find a positive coefficient for these two valuation ratios. The coefficient for past market return is 0.200 and is significant at the 10% level. The R2 is 4%, which is of reasonable magnitude. This indicates there is some autocorrelation in the HEX25 returns during this period. In column 7, we include the Top-Bottom variable with the other economic variables. The R2 rises to 12.5%. The Top-Bottom variable remains significant at the 5% level and the coefficient actually increases to 0.168.22 In the Internet Appendix, we create a market timing measure that is orthogonal to a strategy based on the autocorrelation in monthly returns. We find similar persistence in this test as in our main test. Column 8 reports on a predictive regression that includes an additional month lag of the return on the index. This leaves the other coefficients in the regression almost unchanged. Thus, the difference in performance across good and bad timers is not captured by the three economic variables (past returns, dividend yield, and earnings-to-price ratio) and the results suggest that including information in individual investor flows improves the performance of the predictability regressions.23
In column 9, we report the results of a regression that uses the entire time series of HEX 25 index returns, from 1995 to 2009. To construct the first-half data for this regression, we “reverse” our sorting procedure by ranking investors that are active in the second half of the sample by their timing performance in the second half, and then we use those rankings to calculate the flows of successful minus unsuccessful timers in the first half of the sample. We follow exactly the same procedure that we follow to create the net flows in the rest of the paper, but we swap the second half of the sample for the first half of the sample. Combining data from the first half of the sample with data from the second half of the sample allows us to run a predictability regression with 174 months of data rather than just 87 months. While this procedure is clearly not implementable in real time, it does generate out-of-sample timing flows for the entire sample period rather than just for the second half of the sample. Using the entire period makes our coefficient slightly larger and much more statistically significant than using just the second half. In unreported tests, including other predictor variables does not significantly alter the coefficient of the Top-Bottom variable.
One common concern about market predictability regressions is the possibility of bias pointed out by Stambaugh (1999). This predictive regression bias arises when the independent variable in a time-series regression follows a process that is close to a unit root. The autocorrelation of our net flows measure is only .44, and a Dickey-Fuller test of the hypothesis that it follows a unit root process easily rejects that possibility. For comparison, the autocorrelation coefficient for the log of dividend yield is .92 in our data. Thus, the predictive regression bias that is common in market predictability regressions is not likely to be an issue for our results.
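To make the persistence check concrete, the sketch below computes a first-order autocorrelation and a basic Dickey-Fuller t-statistic (with a constant, no lagged differences) on a simulated series. The simulated AR(1) with coefficient 0.44 over 174 months is a stand-in for the flow measure, not the actual data, and the exact test specification used for the reported result is not stated, so this form is an assumption.

```python
import random
from statistics import mean

def autocorr1(x):
    """First-order sample autocorrelation."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[1:], x[:-1]))
    return num / sum((a - m) ** 2 for a in x)

def dickey_fuller_t(x):
    """t-statistic on g in: x_t - x_{t-1} = a + g * x_{t-1} + e_t.
    A strongly negative value rejects the unit-root null (g = 0)."""
    lag = x[:-1]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    ml, md = mean(lag), mean(dx)
    sxx = sum((a - ml) ** 2 for a in lag)
    g = sum((a - ml) * (d - md) for a, d in zip(lag, dx)) / sxx
    a0 = md - g * ml
    ssr = sum((d - a0 - g * a) ** 2 for a, d in zip(lag, dx))
    se = (ssr / (len(dx) - 2) / sxx) ** 0.5
    return g / se

# Simulated stand-in for the flow series (hypothetical data)
rng = random.Random(0)
flow = [0.0]
for _ in range(173):
    flow.append(0.44 * flow[-1] + rng.gauss(0.0, 1.0))

rho = autocorr1(flow)
df_stat = dickey_fuller_t(flow)
```

The statistic is compared with Dickey-Fuller critical values rather than Student-t values; with a constant and roughly 100 observations the 5% critical value is about -2.89.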
The observed market timing persistence and return predictability are especially surprising given the different economic drivers behind the run-up and crash in the 1995–2002 time period and the run-up and crash in the 2002–2009 time period. The relative success of investor flows in predicting future returns may be due to investors dynamically adjusting their models to different economic environments and synthesizing many public signals beyond the three economic variables. The above results and the different economic environments across the two time periods ease concerns that performance persistence can be explained by investors following a simple strategy.
2.3 Economic significance
There is clear persistence in investor market timing ability, and information in investor flows is correlated with future market returns. In this subsection, we further examine the economic significance of the observed timing persistence and return predictability. We calculate the performance of strategies that mimic successful and unsuccessful timers as well as the Top-Bottom strategy.
We calculate three performance metrics. First, we calculate the correlation between the quintile group flow in month t and the excess return on the HEX 25 (minus 1-month Euribor) in month t + 1. Second, we calculate the average return to a flow-weighted return strategy. The strategy weights each month’s return by the previous month’s group flow. Specifically,
$$\overline{R}_{g} = \frac{1}{T}\sum_{t=1}^{T} \mathit{GroupFlow}_{gt} \times R^{e}_{t+1} \tag{4}$$
where $R^{e}_{t+1}$ is the excess return on the HEX 25 index in month t + 1 and $\mathit{GroupFlow}_{gt}$ is the monthly group flow. The third measure accounts for the risk in the flow-weighted return strategy by dividing the average excess return from the flow-weighted strategy by the standard deviation of the strategy. We refer to this performance metric as the Sharpe ratio.24
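The second and third metrics can be sketched as below. The annualization convention (multiplying the mean by 12 and the ratio by the square root of 12) is an assumption, since the text does not state how the figures in Table 6 are annualized; function names are hypothetical.

```python
from statistics import mean, stdev

def flow_weighted_returns(group_flow, excess_ret):
    """Month t+1 excess return weighted by the month-t group flow.
    Both inputs are same-length monthly series indexed by t."""
    return [f * r for f, r in zip(group_flow[:-1], excess_ret[1:])]

def performance(group_flow, excess_ret, periods_per_year=12):
    """Average flow-weighted return and its ratio to the strategy's volatility."""
    strat = flow_weighted_returns(group_flow, excess_ret)
    avg, sd = mean(strat), stdev(strat)
    return {
        "annualized_return": avg * periods_per_year,      # assumed convention
        "sharpe": (avg / sd) * periods_per_year ** 0.5,   # assumed convention
    }

# Tiny example: flows of +1 / -1 that line up with next month's return sign
example = performance([1, -1, 1, 0], [0.0, 0.02, -0.01, 0.03], periods_per_year=1)
```

In the example the strategy earns 2%, 1%, and 3% in the three months, so the unannualized mean is 2% and the ratio is 2.0.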
Table 6 presents the results. The first row of Table 6 shows the correlations between group flows and market excess returns. Consistent with the previous results, successful timers in the first half of our sample are more likely to be successful in the second half of our sample. The top-performing quintile (Q1) has a correlation of 9%. The correlations decrease nearly monotonically from the top to bottom quintiles. For the worst timers, the correlation is –8%. Note that the passive buy-and-hold strategy has a zero correlation since it is analogous to investing €1 into the index in the first month and then cumulating that investment in subsequent months. The correlation for the Top-Bottom strategy is 25%, which is consistent with the R2 value in Table 5.
Table 6.
Second-half performance measures
| Timing measure | Q1 | Q2 | Q3 | Q4 | Q5 | Q1-Q5 | Passive |
|---|---|---|---|---|---|---|---|
| A. Performance based on group flows | | | | | | | |
| Correlation(flowt, HEX25t+1) | .09 | .05 | −.00 | −.00 | −.08 | .25** | .00 |
| Annualized flow-weighted excess return | 1.37 | 0.65 | −0.00 | −0.06 | −0.81 | 2.19** | 1.02 |
| Sharpe ratio | 0.30 | 0.20 | −0.00 | −0.02 | −0.25 | 0.88 | 0.04 |
| B. Individual portfolio total performance | | | | | | | |
| Average annualized excess return | 1.08 | 1.70 | 1.01 | 1.32 | 1.08 | −0.00 | −2.03*** |
| Average individual annual Sharpe ratio | 1.46 | 1.07 | 0.66 | −0.09 | −0.38 | 1.84*** | −0.91*** |
This table provides measures of performance in the second half of our sample for investors grouped by first-period performance. Panel A reports the performance of strategies that follow group flows. Row 1 presents the correlations between quintile group flows in month t and market excess returns in month t + 1. Row 2 presents the average flow-weighted return calculated according to Equation (4). This measure multiplies the return in month t + 1 by the group flow in month t. Row 3 presents the Sharpe ratio. The quintile group flows are calculated according to Equation (2). The quintile 1 minus quintile 5 (Q1-Q5) flow is calculated according to Equation (3). Panel B reports on the average total return of investors calculated as described in Section 2.5.2. Note that the passive averages in panel B are averages of a random sample of 30,000 accounts from the data, so those averages do not require that investors be active traders. The averages in panel B use actual individual stock returns to calculate portfolio performance.
We report the average annualized flow-weighted returns along with the returns generated by a passive buy-and-hold strategy in the second row. We find the flow-weighted returns for the top-performing quintile (Q1) outperform the buy-and-hold strategy in terms of average return by 35 bps per year. Even more striking is the difference in performance between past successful and past unsuccessful timers. The difference in average flow-weighted return is 2.19% per year (p-value = 0.02). In the third row of Table 6, we report the Sharpe ratio (or the flow-weighted return-volatility ratio). The successful timers’ ratio is 7.5 times the ratio of the passive buy-and-hold strategy. The ratio for the worst timers is negative and economically significant at –0.25. The ratio of the Top-Bottom strategy is extremely economically significant at 0.88, which is 22 times the passive strategy’s ratio.
To illustrate the nature of the timing strategy returns we plot the cumulative strategy returns with cumulative HEX25 returns in Figure 4. We also plot the cumulative strategy returns scaled up to have the same volatility as HEX 25 returns in Figure 5. The figures show that the volatility of the weights implied by the strategy is pretty small. Weights are positive in about half of the months. The average absolute value of the market weight of the strategy is about 6.5%. Scaling those weights up generates very large returns, but some months in the scaled-up plot have extreme weights (as low as -6, as high as 4). While it presumes unrealistically extreme portfolio weights, Figure 5 is useful because it shows what the strategy return time series looks like. There are several months when the strategy return seems particularly high. Omitting the 5 months with the highest returns leaves the strategy with a Sharpe ratio of 0.41. While the strategy has high returns in our sample, a longer sample period would be useful to determine if the strategy is truly this successful. The strategy return results suggest that market returns are predictable, and the market timing ability of individual investors is economically large and important.
Figure 4.
Cumulative return of timing strategy versus HEX25
This figure plots the cumulative return of the trading strategy that mimics the trades of past successful market timers and shorts the trades of past unsuccessful timers. The figure also plots the cumulative return of the HEX 25 index. The figure is plotted only for the second half of the sample, from April 2002 to June 2009.
Figure 5.
Cumulative strategy return scaled to have market volatility
This figure plots the cumulative return of a scaled trading strategy that mimics the trades of past successful market timers and shorts the trades of past unsuccessful timers. The strategy is scaled to have the same volatility as the HEX 25 index. The figure also plots the cumulative return of the HEX25 index. The figure is plotted only for the second half of the sample, from April 2002 to June 2009.
Our tests, so far, have examined a linear relationship between investor flows and future returns. We are also interested in whether investor flows can improve predictions of “bear” markets, which are defined here as a return at least half of one standard deviation below the sample average. We calculate the increase in probability of a “bear” market in the next period if there is a large (negative) difference between good and bad timers’ flows. We consider a difference in flows “large” if it is at least half of a standard deviation below its mean (labeled “Low Flow”). The discrepancy in flows is calculated in the same way as the Top-Bottom strategy in Equation (3). The Internet Appendix presents the results. We find that when good timers have lower flows than bad timers, this is a fairly good predictor of poor market returns. A “Low Flow” month nearly doubles the probability of negative market returns in the next month relative to the unconditional probability. Because we only have 87 months over which to calculate the probabilities and very few months with significant outflows, this number should be considered only as suggestive evidence.
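The conditional-probability comparison can be sketched as follows, using the half-standard-deviation cutoffs described above for both the “bear” and “Low Flow” indicators. The inputs are hypothetical series that are already aligned so that each flow observation is paired with the following month's excess return.

```python
from statistics import mean, stdev

def below_half_sd(series):
    """Flag observations at least half a standard deviation below the mean."""
    cutoff = mean(series) - 0.5 * stdev(series)
    return [v <= cutoff for v in series]

def bear_probabilities(top_bottom_flow, next_excess_ret):
    """Return (unconditional, conditional-on-Low-Flow) bear-market probabilities.
    next_excess_ret[k] is the excess return in the month after top_bottom_flow[k]."""
    low_flow = below_half_sd(top_bottom_flow)
    bear = below_half_sd(next_excess_ret)
    unconditional = sum(bear) / len(bear)
    hits = [b for lf, b in zip(low_flow, bear) if lf]
    conditional = sum(hits) / len(hits) if hits else float("nan")
    return unconditional, conditional

# Toy example: the single Low Flow month is followed by the single bear month
uncond, cond = bear_probabilities(
    [-3, 0, 0, 0, 0, 3],
    [-0.10, 0.01, 0.01, 0.01, 0.01, 0.01],
)
```

With few Low Flow months the conditional estimate rests on a handful of observations, which is why the text treats the result as suggestive.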
2.4 Characteristics of market timers
2.4.1 Market timing and stock picking
There have been many studies that document persistence in stock picking ability across individual investors (e.g., Seru, Shumway, and Stoffman 2010; Coval, Hirshleifer, and Shumway 2021; Che, Norli, and Priestley 2009; Grinblatt, Keloharju, and Linnainmaa 2012). In this subsection, we examine whether individuals that are better market timers are also better at stock picking. We calculate a stock selection measure and the monthly market timing measure over the entire sample period (14.5 years) for investors with at least 75 trades and 15 nonzero monthly flows in the first half of the sample. We find a Spearman rank correlation between the two measures of -1.24% (p-value of .05) and a pairwise correlation of 0.14% (p-value of .82). There is little evidence that good market timers are more likely to be good stock pickers when ability is calculated over a long time span. The lack of a positive correlation between the two skills could be due to noise in our measure of stock picking ability or due to investors specializing in one of the two skills. Kacperczyk, Van Nieuwerburgh, and Veldkamp (2013) provide evidence that skilled fund managers will focus on one of the two skills conditional on the business cycle. They find that fund managers that are good stock pickers in expansions are more likely to be good timers in recessions. In the Internet Appendix, we provide evidence that investors that are better timers in the bubble periods are also better timers during normal times. We do not measure stock picking or market timing ability during different periods of the business cycle, however. Instead, we show that, unconditionally, stock picking and market timing are relatively uncorrelated.
2.4.2 Demographics of market timers
In the Internet Appendix, we provide results analyzing which investor characteristics are related to market timing ability. We analyze investors along many dimensions: sex, age, education, other demographics, and trading behavior. We estimate three separate regressions where the dependent variable for investor skill is based on the investor’s monthly timing measure calculated over the entire 14.5-year period of interest. The Internet Appendix reports the regression results.
There is no simple explanation of which traders are good timers and which are poor timers. We find that middle-aged men from ZIP codes with higher population density are better timers than others. However, living around highly educated people (the percentage of individuals in a ZIP code with a university degree) significantly reduces the likelihood of being in the top 20%. Investors that speak the Finnish language (instead of Swedish) are marginally worse timers. Investors that trade options are marginally worse timers, and those that trade much more are also worse timers. Surprisingly, investors that have larger average trade size, a proxy for wealth, have worse timing performance. Overall, there is no obvious pattern of who appears to be a successful timer.
2.5 Additional results and robustness
2.5.1 Persistence in timing during “bubble” periods
We examine whether investors exhibit persistence in market timing skill specifically around the two major market peaks and crashes. For this analysis, we recalculate an investor’s timing ability only using the 25 months surrounding each market peak (February 2000 and July 2007). We find similar persistence in bubble-period timing as in the full sample. We find a similar relationship between quintile group flows and market returns in the second half as we find in Table 6. We also find persistence during normal, nonbubble periods. The Internet Appendix provides the results.
2.5.2 Total performance
To determine the extent to which our investors are actually outperforming the market, we make an attempt to calculate their total performance given our data. These calculations differ from all others in the paper because they incorporate the returns of actual stock holdings, using individual stock returns rather than just market returns. We have to make a few strong assumptions to do this, since we do not observe peoples’ entire wealth. Our strongest assumption is about how long people hold wealth in cash before they buy stocks and after they sell stocks. Since all of our timing performance comes from dynamic asset allocation, if we assume that people obtain cash right before each purchase and spend cash right after each sale then we will not capture their timing success. If we assume that people have all the cash they spend on stocks at the beginning of the sample period and that they hold all cash generated by sales until the end of the sample, we will likely overestimate their cash holdings. As a compromise, we assume that people have the cash associated with all of their trades in a calendar quarter. We calculate each quarter the amount of cash they would have to have on hand to make all their trades in that quarter. So, for example, if an investor buys €100 of a stock on August 15, we assume that investor has access to the present value of €100 of cash on July 1. Similarly, if an investor sells €100 of a stock on August 20 and has no more transactions, we assume that the investor holds that cash and earns a risk-free rate until September 30, the end of the calendar quarter. We assume that all cash holdings earn the Euribor rate and calculate total portfolio returns, using actual stock returns rather than just market index returns, for each investor. Panel B of Table 6 reports our results.
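The calendar-quarter cash assumption can be illustrated with a small sketch. The compounding convention (annual rate, actual days over 365) and the year used in the example are assumptions for illustration; only the quarter-boundary logic follows directly from the description above.

```python
from datetime import date, timedelta

def quarter_bounds(d):
    """First and last day of the calendar quarter containing date d."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    start = date(d.year, first_month, 1)
    if first_month == 10:
        next_start = date(d.year + 1, 1, 1)
    else:
        next_start = date(d.year, first_month + 3, 1)
    return start, next_start - timedelta(days=1)

def cash_needed_at_quarter_start(purchase_eur, trade_date, annual_rate):
    """Present value, at the start of the quarter, of cash spent on a purchase."""
    start, _ = quarter_bounds(trade_date)
    years = (trade_date - start).days / 365.0
    return purchase_eur / (1.0 + annual_rate) ** years

def sale_proceeds_at_quarter_end(sale_eur, trade_date, annual_rate):
    """Value, at the end of the quarter, of sale proceeds held in cash."""
    _, end = quarter_bounds(trade_date)
    years = (end - trade_date).days / 365.0
    return sale_eur * (1.0 + annual_rate) ** years

# The example from the text, placed in a hypothetical year
buy_window = quarter_bounds(date(2005, 8, 15))   # July 1 to September 30
pv = cash_needed_at_quarter_start(100.0, date(2005, 8, 15), 0.03)
fv = sale_proceeds_at_quarter_end(100.0, date(2005, 8, 20), 0.03)
```

A purchase on August 15 therefore ties up slightly less than €100 from July 1, and a sale on August 20 grows to slightly more than €100 by September 30.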
Table 6 reports both the average excess return and the average Sharpe ratio of active traders. For benchmarking, we also examine the performance of a random sample of 30,000 individuals in the original data set. These 30,000 investors do not need to have 15 months of nonzero portfolio flows, nor do they need to meet any other particular trading frequency filter, so most of them do not qualify as active traders. Our Sharpe ratios are calculated for each individual using their own time series of returns. To calculate a Sharpe ratio for an investor, we require them to have at least 10 quarters of portfolio returns in the second half of the sample. Note that this does not require any trading, just at least one valid holding for 10 quarters.
The average returns of the most successful timers are pretty close to the average returns of the least successful timers. However, the active traders have higher average excess returns than the randomly selected investors, who are not generally active. While the active traders have average annual excess (above the risk-free rate) returns of about 1%, the randomly selected traders have positive average returns but average excess returns of negative 2%. This is not extremely surprising since our active investors are a very nonrandom subset of all investors, and we measure simple returns without adjusting for transactions costs. The Sharpe ratios of our active investors are strongly related to timing ability. The average Sharpe ratios of the lower ability timing quintiles are actually negative, indicating that those with high returns also have relatively high standard deviations of returns in these groups. The average Sharpe ratio of the best timers is significantly higher than the average for the worst timers, indicating that the best timers indeed benefit from their market timing ability. We take this as suggestive evidence that our best timers are outperforming the market.
2.5.3 Survivorship
We purposefully select active investors by requiring at least 15 nonzero monthly flows in the first half of the sample, but we only require two flow months in the second half of the sample so we can calculate a correlation. This makes our analysis largely unaffected by survivorship. Nevertheless, we examine whether investors’ timing performance is related to future exit and any effect this may have on our results.
We examine whether investors learn about their abilities by analyzing how first period market timing performance affects the probability of an investor stopping active participation in the market. The results (presented in the Internet Appendix) are mixed. Investors in the worst market timing quintile are about 42% more likely to drop out of the sample than investors in the top market timing quintile and the difference is significant. Investors in the middle quintiles are more likely to drop out than either the top or bottom quintile. Thus, the relationship is nonlinear in a way that investors with average timing performance are most likely to drop out.
The nonlinearity eases concerns that any kind of survivorship may be affecting our results. We further examine whether our sample selection process affects our inferences by performing the same analysis with investors that are active (having at least 15 months with nonzero flows) in both sample halves instead of classifying based on just first-half activity. The results (presented in the Internet Appendix) are very similar to those in Table 3, so we conclude that our sample selection procedure is reasonable, and survivorship is not driving the observed market timing persistence.
2.5.4 Modifications to timing measure
In this subsection, we examine several possible explanations for our results besides heterogeneity in investor market timing ability. We adjust our timing measure in various ways to rule out alternative theories for the observed persistence and show the robustness of our results. We also examine the ability (or inability) of financial institutions to persistently time the market in our sample.
Finland is a relatively unique market in that Nokia makes up approximately 50% of the market capitalization during our sample period. Although the market weight of Nokia in our index is capped at 20% until August 1, 2001, and then capped at 10% thereafter, one possible explanation for our results is that investors are just timing movements in Nokia, and this is driving our results. To address this concern, we run two tests. First, we test whether investors can persistently time Nokia, by correlating investor monthly flows into and out of the market with Nokia returns over the next month. The Internet Appendix presents the results. The persistence is similar to the results using market returns, with a nearly monotonic relationship between first and second period performance. Because Nokia’s returns are correlated with market returns, this result may not be surprising and does not necessarily mean Nokia is driving our results. To determine whether investors are timing the market, not solely Nokia, we run a similar test, but omit flows into and out of Nokia and exclude Nokia’s returns from the index. The Internet Appendix provides the detailed results. Once again, we see a near-monotonic relationship between first and second period performance and very significant departures from the null of no timing, so it is highly unlikely Nokia alone is driving the timing persistence we observe.
Investors may time the market at various frequencies: daily, monthly, quarterly, or longer. We focus on monthly timing for two reasons. First, we cannot reliably estimate timing over longer frequencies due to the length of our sample. Second, a very small percentage of individuals trade on a daily basis in our sample and trading daily would be a costly activity for most individual investors. Goetzmann, Ingersoll, and Ivković (2000) show that estimating timing ability of daily timers at a monthly frequency will create downward bias in the estimation of timing skill when using a Henriksson and Merton (1981) returns-based measure. It is unclear if a similar bias exists with our timing measure and if the rank-order of investors would change significantly if we changed the frequency. As a robustness check, we reestimate our main sets of analysis with a quarterly timing measure. This measure is calculated following Equation (1), except we replace the market return in month t + 1 with the market return from the start of month t + 1 to the end of month t + 3. The Internet Appendix presents the results, which are very similar to those for the monthly measure. The monthly and quarterly measures are highly correlated across investors with a correlation of .75 (.66) in the first period (second period). This gives us confidence that our results are robust to measuring timing over different frequencies.
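The change from the monthly to the quarterly measure amounts to swapping the one-month-ahead return for a compounded three-month-ahead return. A minimal sketch follows, assuming simple monthly returns that are compounded geometrically; the names and inputs are hypothetical.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)

def forward_return(returns, t, horizon):
    """Compounded return from the start of month t+1 to the end of month t+horizon."""
    growth = 1.0
    for r in returns[t + 1:t + 1 + horizon]:
        growth *= 1.0 + r
    return growth - 1.0

def timing_measure(flows, returns, horizon=1):
    """Correlation of month-t flow with the return over months t+1..t+horizon.
    horizon=1 gives the monthly measure, horizon=3 the quarterly one."""
    usable = range(min(len(flows), len(returns) - horizon))
    return pearson(
        [flows[t] for t in usable],
        [forward_return(returns, t, horizon) for t in usable],
    )
```

Because consecutive three-month windows overlap, the quarterly measure uses serially dependent return observations, which is one more reason to rely on placebo or simulated distributions for inference.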
We also look for persistent timing ability among financial institutions, using the same kind of test that we perform for individuals. However, our test is limited by the relatively small number of active institutions in our data; only 324 institutions trade sufficiently in both halves of the sample to be included in our analysis. The Internet Appendix presents the results, which reveal no clear evidence of persistence in relative timing ability among these institutions. This is not too surprising given the small sample size of our tests, the objectives of most financial institutions, and the lack of control institutions have over their customers’ inflows and outflows. We would not expect, for example, market makers, index funds, or standard equity funds to display any timing ability (as captured by our measure) for these reasons.
3. Conclusion
We document significant persistence in the ability of individual investors to time the stock market, both in general and during market bubble periods. We find that some investors consistently time the market while others consistently mistime the market. We also find that information in investor flows is a better predictor of future market returns than commonly used economic variables. A trading strategy based on investor flows has a Sharpe ratio of 0.88, which is over 22 times the market ratio of 0.04 during our sample period.
The fact that some investors consistently time the market has implications for the way we model markets and investor behavior. It means, of course, that the market may not be perfectly informationally efficient and capital may be misallocated throughout the economy. It may alternatively mean that expected market returns vary substantially with time, and some market participants are willing to bear the additional risk associated with high risk premium periods. If there is a lot of dispersion in the skill of investors and people can learn, then people may rationally incur significant costs to improve their skills. They may trade in an experimental fashion to learn, or they may use financial products that do not make sense in a world characterized by perfectly efficient markets with stable risk premiums. Our evidence suggesting that some individual investors have the ability to time the market, and bubble periods in particular, matters for some of the most important questions in finance.
Appendix
A.1 A Model of Market Timing
A number of authors have considered the best way to measure market timing (for a review of the returns-based measurement literature, see Jagannathan and Korajczyk 2014). Most of the methods proposed by these authors involve explicit market forecasts or estimating time-varying portfolio betas using portfolio returns (e.g., Treynor and Mazuy 1966; Merton 1981; Henriksson and Merton 1981), and most use data generated by professional asset managers. These methods are not well suited to our setting. Instead, we focus on the main dimension in which individual investors will change their market exposure: time-varying asset allocation, measured with flows into and out of the market.
We present a simple model of a market timing investor to derive an appropriate measure of market timing. The model has two time points (t1 and t2) and a single investor who selects an optimal mean-variance portfolio at time t1:
| (A1) |
Objective (A1) involves the portfolio value at time t1 and a risk-aversion parameter, and the maximum is taken over trades at time t1. The investor trades one stock and a risk-free bond and follows the market index. The stock can be any risky asset or portfolio, including the market portfolio. The dynamics of the market index, the stock, and the bond are as follows:
| (A2) |
where μm, μs, σm, and σs are constant drift and volatility parameters, r is the risk-free rate, ϵm, ϵs, and a third shock are standard normal random variables, the correlation between ϵs and ϵm is ρs, and the third shock and ϵm are independent. We assume that ρs is positive without loss of generality. If ρs is zero, the investor cannot time the market by trading S.
The investor’s portfolio value at t1 and t2 is given by
| (A3) |
where hs and hb are the stock and bond holdings before the trades at time t1, Fs and Fb are the portfolio flows in terms of money into or out of stocks and bonds at time t1, and, for simplicity, there are no trades at time t2. We assume that the portfolio strategy is self-financing so that Fs + Fb = 0.
Finally, we assume that at time t1 the investor observes a noisy signal for the value of ϵm that is fully observed at time t2 as follows:
| (A4) |
where the signal Z depends on the realized value of ϵm at time t2, the parameter η determines the signal-to-noise ratio, and ϵz is a standard normal random variable independent of the other uncertainties. Since the prior distribution of ϵm is the standard normal distribution and the signal is normally distributed, the posterior distribution of ϵm is normal with mean and variance:
| (A5) |
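where the posterior moments depend on the realized value of Z. At one extreme of η, the signal is uninformative and the posterior distribution equals the prior distribution; at the other, η = 0, the investor knows the outcome of ϵm after observing the signal at time t1. We can therefore use η, the signal-to-noise ratio parameter, as a measure of market timing ability, with a lower η corresponding to a more precise signal.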
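To make the updating step concrete, the following worked example assumes one common specification of the signal, the additive form in which η scales independent noise. This specific form is an assumption made here for illustration; it reproduces the case η = 0 (perfect foresight) and yields an uninformative signal as η grows without bound. Standard normal-normal updating then gives the posterior moments:

```latex
% Assumed additive signal (illustrative specification, not taken from the text):
Z = \epsilon_m + \eta\,\epsilon_z, \qquad
\epsilon_m,\ \epsilon_z \sim N(0,1)\ \text{independent}, \qquad \eta \ge 0 .

% Since \operatorname{Cov}(\epsilon_m, Z) = 1 and \operatorname{Var}(Z) = 1 + \eta^2,
% the posterior of \epsilon_m given Z = z is normal with
\mathbb{E}[\epsilon_m \mid Z = z] = \frac{z}{1+\eta^2}, \qquad
\operatorname{Var}[\epsilon_m \mid Z = z] = \frac{\eta^2}{1+\eta^2} .

% Limiting cases under this assumed form:
% \eta = 0:        mean = z = \epsilon_m, variance = 0  (outcome known)
% \eta \to \infty: mean \to 0, variance \to 1           (posterior equals prior)
```

Under this assumed form, a smaller η means a more precise signal, which is the sense in which η indexes timing ability.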
In principle, there might be many ways for one to make inferences about market timing ability from the data. We focus on the correlation between portfolio flows and subsequent market returns as a measure because it is feasible to calculate given the data that we have. The following proposition establishes this correlation as a valid measure of market timing under our assumptions.
Proposition 1. Assume that (i) the investor maximizes objective function (A1), (ii) market dynamics follow (A2), (iii) the investor’s information is given by (A4), and (iv) the portfolio strategy is self-financing. Then the correlation between investor flows into the stock and subsequent changes in the value of the market portfolio is an increasing function of the investor’s market timing ability, that is, a decreasing function of η.
Proof. With the notation and assumptions above, the return of the portfolio can be written as
| (A6) |
Moreover, conditional on observing the signal, we can rewrite the model’s shocks as follows:
| (A7) |
where the additional noise term is a standard normal random variable. Then by (A1), the independence of the noise terms in (A7), and the self-financing condition, the investor solves the following optimization problem:
| (A8) |
where the variance term follows from the assumptions above. By the first-order condition, the optimal flow is given by
| (A9) |
Note that, by (A4) and (A9), before observing the signal, the mean variable in (A5) is a normally distributed random variable with mean zero. Therefore, before observing the signal, the optimal flow is also a random variable, with a mean and standard deviation that follow from (A9). Then by (A2) and (A9), the correlation between the optimal flow and the subsequent market return can be computed in closed form. Over the admissible domain of η, this is a monotonically decreasing function of η.
Thus, the correlation between portfolio flows into stocks and changes in the value of the market is a good measure of market timing ability. We use this measure for much of our empirical analysis. One important limitation of the model is that it does not allow for multiple risky assets with different betas. However, we can easily generalize our correlation measure to account for beta.
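The monotonicity result can be illustrated numerically. The Python sketch below is a hedged Monte Carlo check, not the model as specified in the appendix: it assumes an additive signal Z = ϵm + ηϵz and a flow that is linear, with positive slope, in the posterior mean of ϵm. Both assumptions, along with the function name `flow_return_correlation`, are hypothetical. Under them the flow-return correlation equals 1/sqrt(1 + η²), which falls as η rises.

```python
import numpy as np


def flow_return_correlation(eta, n=200_000, seed=0):
    """Simulated correlation between flow and the next-period market shock.

    Assumptions (illustrative only): additive signal z = eps_m + eta * eps_z,
    and a flow proportional to the posterior mean of eps_m given z.
    """
    rng = np.random.default_rng(seed)
    eps_m = rng.standard_normal(n)  # market shock realized at t2
    eps_z = rng.standard_normal(n)  # independent signal noise
    z = eps_m + eta * eps_z         # noisy signal observed at t1
    posterior_mean = z / (1.0 + eta**2)
    flow = posterior_mean           # any positive affine map gives the same correlation
    # The market return is a positive affine function of eps_m, so
    # corr(flow, return) equals corr(flow, eps_m).
    return float(np.corrcoef(flow, eps_m)[0, 1])


etas = [0.0, 0.5, 1.0, 2.0, 4.0]
simulated = [flow_return_correlation(e) for e in etas]
theoretical = [1.0 / np.sqrt(1.0 + e**2) for e in etas]

for e, s, t in zip(etas, simulated, theoretical):
    print(f"eta={e:3.1f}  simulated={s:.3f}  1/sqrt(1+eta^2)={t:.3f}")
```

Because correlation is invariant to positive affine transformations, the risk-aversion parameter and initial holdings do not affect the result in this simplified setting; only the precision of the signal does.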
Footnotes
See, for example, Goyal and Welch (2008) and Campbell and Yogo (2006).
Fund managers are often constrained by their charters. For example, an equity fund cannot generally hold large amounts of fixed-income assets.
We use an additional market timing measure that captures within-portfolio-market exposure. This measure correlates beta-weighted monthly cash flows with future monthly returns on the Finnish market.
During the time period, the autocorrelation in Finnish market returns is .20, which is less than the .26 autocorrelation in S&P 500 index returns during the same time period. This is suggestive evidence that the Finnish market is not significantly less efficient than the U.S. market during this time period.
See Jagannathan and Korajczyk (2014) for a review of the returns-based measurement of market timing literature. Henriksson (1984), Ferson and Schadt (1996), Becker et al. (1999), and Goetzmann, Ingersoll, and Ivković (2000), among others, find little evidence of market timing. Kacperczyk and Seru (2007) find little evidence of timing using returns and holdings data. Daniel et al. (1997) use returns, and Wermers (2000) uses holdings and returns to find little evidence of characteristic timing ability.
The data set contains information on the investor’s ZIP code, gender, firm sector code, firm legal form, firm postal code, firm country, language code, and registration date in the shareholder register. The data set also contains information on the type of transaction and the transaction registration basis.
We use the trading records of institutions to look for evidence of timing in one test, but the test results in no evidence of timing (see the Internet Appendix). Given the relatively small number of institutions in the data, we do not find the lack of evidence for institutions surprising.
We exclude transactions with codes of 0 (initial balance), 4 (transfer to own book-entry account), 55 (splits), 60 (creation of subscription rights), and 96 (transfer to own book-entry account to another operator).
Cash-out mergers generally result in substantial cash flows to investors, while exchange mergers often result in no cash flow.
It is important for us to keep most types of transactions in our set of trades to calculate accurate flows. While it is true that an individual may not actively choose to sell a stock when a company is being taken over or liquidated, it is also true that those transactions generate a cash flow in the individual’s account. Since most of our analysis considers the trades of fairly active traders, we assume that individuals are aware of such transactions, and thus omitting them would be a mistake. Suppose, for example, that a company is taken over, and €10,000 is deposited into an individual’s account. If that individual then spends €5,000 to purchase another stock, then the correct flow for that account in that month is -€5,000. If we omit the takeover transaction, then it appears that this individual’s flow is €5,000 this month, flipping the sign of the true flow.
Our results do not depend on the exact cutoff value chosen (e.g., with a cutoff of 12 flows, the correlation between first and second period monthly timing measures was .0714, which is significant at the 0.01% level). The cutoff value was chosen to optimize the trade-off between the sample size and capturing active investors.
Our sample of 1,386,540 people is approximately 27% of the Finnish population in the year 2002 (see http://www.stat.fi).
The number of flows in 2009 is smaller than 2008 because our sample ends in June 2009.
From November 1, 1995, to August 1, 2001, the index capped the weight of any individual stock at 20%. After August 1, 2001, the index caps the weight of any individual stock at 10%. The number of stocks capped varies over time. As an example, on November 3, 2003, four stocks were capped at 10%. In the Internet Appendix, we present results using a more general market index (HEX) and find that the results are very similar.
There are over 8,285 securities in our sample, of which 155 are stocks (identified by their existence in the Compustat Global database). Stocks account for 63.23% of the trades and 73.95% of the absolute flows. The rest of the securities are derivative instruments, bonds, and exchange-traded funds (ETFs). Only a fraction of the derivatives are traded in any given period since derivatives with different expiration dates have different identifiers. As a robustness check, we run our main analysis using only equities, and, as expected, the results are similar to those with all securities, though they are a little weaker. The reason we include all the securities in our main analysis is that investors could use derivative securities or corporate bonds in their market timing strategy. In sum, the results are not sensitive to the subset of securities used.
For robustness, we have rerun our main analysis using quarterly returns. This measure correlates monthly flows with the cash return on the HEX 25 over the 3 months beginning in month t+1 and ending in month t+3. The Internet Appendix presents the results, which are very similar to the results using the monthly market timing measure.
This limitation is especially unlikely to affect our results since we are concerned about cross-sectional variation across investors. Only if there was a cross-sectional bias in the correlation between savings and future market returns would the lack of wealth data potentially affect our results.
p-values for each cell in the cross-tabulation table and for the pairwise and Spearman rank correlations are calculated using a placebo test procedure. We discuss the procedure later in this section.
The null of 2,674 is similar to the average number we observe in our placebo tests. In our placebo tests, the average number of individuals in the top quintile in both periods is 2,700, and the average number of individuals in the bottom quintile in both periods is also 2,700.
There are 3,956 investors with a flow-return correlation greater than .1 in both periods. The average number in our placebo tests is 1,961.
We winsorize the standardized flows of the investors at the 1% and 99% levels to minimize the impact of outliers.
The significance of the coefficient dramatically increases if Newey-West standard errors with a 1-month lag are used. We report the uncorrected standard errors to be conservative. Bootstrapped standard errors are similar to those reported.
In unreported tests, we find the relationship between our flow variable and future market returns is not explained by aggregate dividend payments, including 2 or 3 months of lagged market returns or including an indicator variable for a market decline of -3.96% (the 25th percentile of returns in our sample). These results suggest that the difference in flows between good and bad timers is not explained by differences in dividend reinvestment behavior (e.g., Kaustia and Rantapuska 2012) or differences in trading behavior in response to past returns (e.g., Friesen and Sapp 2007).
Most Sharpe ratios are calculated for buy-and-hold strategies. The ratio we report is for a dynamic asset allocation strategy, which does not have a constant variance.
Contributor Information
Jussi Keppo, National University of Singapore.
Tyler Shumway, Brigham Young University.
Daniel Weagley, Georgia Institute of Technology.
We thank Noah Stoffman for helping us prepare the data. We thank Justin Birru, Stephen Dimmock, Madhu Kalimipalli, Juhani Linnainmaa, and Yuri Tserlukevich and seminar participants at the University of Colorado, the University of Massachusetts, the University of Michigan, National University of Singapore, the University of Manchester, Warwick University, Babson College, the 2016 American Economic Association Meetings, the Helsinki Finance Summit on Investor Behavior, the Ben Graham Centre’s Symposium on Intelligent Investing and the Finance Research Workshop, and IIM Calcutta for comments. Shumway is grateful for the financial support provided by National Institute on Aging [grant 2-P01-AG026571]. Keppo thankfully acknowledges financial support from the Singapore Ministry of Education [grant R-314-000-097-133]. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Singapore Ministry of Education. Send correspondence to Daniel Weagley, daniel.weagley@scheller.gatech.edu.
Authors have furnished an Internet Appendix, which is available on the Oxford University Press Web site next to the link to the final published paper online.
References
- Becker C., Ferson W., Myers D. H., Schill M. J. 1999. Conditional market timing with benchmark investors. Journal of Financial Economics 52:119–48.
- Bollen N. P. B., Busse J. A. 2001. On the timing ability of mutual fund managers. Journal of Finance 56:1075–94.
- Campbell J. Y., Yogo M. 2006. Efficient tests of stock return predictability. Journal of Financial Economics 81:27–60.
- Campbell J. Y., Viceira L. M. 2002. Strategic asset allocation. Oxford, UK: Oxford University Press.
- Che L., Norli O., Priestly R. 2009. Performance persistence of individual investors. Working Paper, BI Norwegian Business School.
- Che L., Norli O., Priestly R. 2012. Market timing by individual investors. Working Paper, BI Norwegian Business School.
- Coval J. D., Hirshleifer D., Shumway T. Forthcoming. Can individual investors beat the market? Review of Asset Pricing Studies.
- Daniel K., Grinblatt M., Titman S., Wermers R. 1997. Measuring mutual fund performance with characteristic based benchmarks. Journal of Finance 52:1035–58.
- Dichev I. D. 2007. What are stock investors’ actual historical returns? Evidence from dollar-weighted returns. American Economic Review 97:386–402.
- Elton E. J., Gruber M. J., Blake C. R. 2011. An examination of mutual fund timing ability using monthly holdings data. Review of Finance 16:619–45.
- Fama E. F., French K. R. 2010. Luck versus skill in the cross-section of mutual fund returns. Journal of Finance 65:1915–65.
- Ferson W. E., Schadt R. W. 1996. Measuring fund strategy and performance in changing economic conditions. Journal of Finance 51:425–61.
- Friesen G. C., Sapp T. R. 2007. Mutual fund flows and investor returns: An empirical examination of fund investor timing ability. Journal of Banking & Finance 31:2796–816.
- Goetzmann W. N., Ingersoll J. Jr., Ivković Z. 2000. Monthly measurement of daily timers. Journal of Financial and Quantitative Analysis 35:257–90.
- Goyal A., Welch I. 2008. A comprehensive look at the empirical performance of equity premium prediction. Review of Financial Studies 21:1455–508.
- Grinblatt M., Keloharju M. 2000. The investment behavior and performance of various investor types: A study of Finland’s unique data set. Journal of Financial Economics 55:43–67.
- Grinblatt M., Keloharju M. 2001a. What makes investors trade? Journal of Finance 56:589–616.
- Grinblatt M., Keloharju M. 2001b. How distance, language, and culture influence stockholdings and trades. Journal of Finance 56:1053–73.
- Grinblatt M., Keloharju M., Linnainmaa J. T. 2012. IQ, trading behavior, and performance. Journal of Financial Economics 104:339–62.
- Henriksson R. D. 1984. Market timing and mutual fund performance: An empirical investigation. Journal of Business 57:73–96.
- Henriksson R. D., Merton R. C. 1981. On market timing and investment performance II. Statistical procedures for evaluating forecasting skills. Journal of Business 54:513–33.
- Jagannathan R., Korajczyk R. A. 2014. Market timing. Working Paper, Northwestern University.
- Kacperczyk M. T., Seru A. 2007. Fund manager use of public information: New evidence on managerial skills. Journal of Finance 62:485–528.
- Kacperczyk M. T., Van Nieuwerburgh S., Veldkamp L. 2014. Time-varying fund manager skill. Journal of Finance 69:1414–41.
- Kaustia M., Knüpfer S. 2012. Peer performance and stock market entry. Journal of Financial Economics 104:321–38.
- Kaustia M., Rantapuska E. 2012. Rational and behavioral motives to trade: Evidence from reinvestment of dividends and tender offer proceeds. Journal of Banking & Finance 36:2366–78.
- Mamaysky H., Spiegel M., Zhang H. 2008. Estimating the dynamics of mutual fund alphas and betas. Review of Financial Studies 21:233–64.
- Merton R. C. 1981. On market timing and investment performance I. An equilibrium theory of value for market forecasts. Journal of Business 54:363–406.
- Seru A., Shumway T., Stoffman N. 2010. Learning by trading. Review of Financial Studies 23:705–39.
- Shiller R. J. 2000. Irrational exuberance. Princeton, NJ: Princeton University Press.
- Stambaugh R. F. 1999. Predictive regressions. Journal of Financial Economics 44:375–421.
- Treynor J., Mazuy K. 1966. Can mutual funds outguess the market? Harvard Business Review 54:131–6.
- Wermers R. 2000. Mutual fund performance: An empirical decomposition into stock-picking talent, style, transactions costs, and expenses. Journal of Finance 55:1655–95.