Abstract
Motivated by the growing convergence between news media and social media as dominant sources of information dissemination, this study examines the connection between textual sentiment and stock returns. Previous studies have examined the effect of sentiment extracted from these two sources on stock returns independently, without modelling how one source can confound the relationship between stock returns and the other source. We investigate this using data from four markets (USA, UK, South Africa and Brazil) and a sample period stretching from January 2016 to April 2023. Employing a suite of methods that encompass both simple parametric techniques and complex models designed to address nonlinearity, chaos and deviations from normality, the analysis uncovers a pronounced impact of social media sentiment on stock returns in the United States. This influence overshadows the effect of news media sentiment across the employed methods. Interestingly, in other markets, news media exhibits a greater effect on stock returns compared to social media sentiment. By emphasising the convergence of news media and social media, the study highlights the important interplay between these sources, offering valuable insights into understanding the complex dynamics of modern financial markets.
Keywords: Investor sentiment, Behavioural finance, Social media sentiment, Twitter
1. Introduction
In the contemporary digital age, the symbiotic relationship between social media and traditional news media has given rise to an intricate web of information dissemination, significantly altering the landscape of news consumption and its impact on various sectors like financial markets [1]. This convergence of social media and traditional news media has led to increased interconnectedness and the rapid dissemination of information across diverse media platforms [2]. As individuals increasingly rely on these sources to form opinions and make decisions, understanding the interplay between social media and traditional news media in shaping online investor sentiment and, consequently, stock market dynamics is important. In this study, we therefore investigate the effect of online investor sentiment on stock returns while acknowledging the potential information flows (and therefore the potential confounding effect) between these two sources of information.
Various empirical studies within the domain of behavioural finance have confirmed the influence wielded by news sentiment [3] and social media sentiment [4] on investor behaviour and market outcomes. Researchers have reported the predictive power embedded within the sentiments expressed through these channels, illuminating their potential to anticipate future stock returns. However, the existing body of literature predominantly treats social media and traditional news media as isolated entities, neglecting the interconnections and mutual influences that characterise their relationship. This oversight presents a significant research gap in the current understanding of the mechanisms guiding stock market behaviours within this complex information ecosystem. The present study seeks to bridge this critical gap by adopting a comprehensive and integrative approach to reveal the dynamics between social media and traditional news media in influencing stock returns. By highlighting the interplay between news and social media in influencing markets, the study can inform regulatory efforts aimed at maintaining market integrity and preventing manipulative practices.
We utilise a suite of methods ranging from simple parametric models to robust methods that accommodate stylised facts of financial variables. These methods include simple Pearson correlations, time-varying Granger causality and wavelet analyses. Our results show that traditional news media and social media sentiment are interconnected, and it is essential to concurrently model their effects on stock market outcomes. This can be observed, for example, in the results from partial wavelet coherence, which isolates the relationship between news sentiment and stock returns while discounting the influence of social media sentiment vis-à-vis results from ordinary wavelet coherence between news media sentiment and stock returns. The differences in the results from the partial wavelet coherence compared to ordinary wavelet coherence suggest that news sentiment (or social media sentiment) masks the connection between Twitter sentiment (or news media) and stock returns. This empirical observation lays the foundation for a fundamental understanding of the interconnectedness of news media and social media in shaping financial market outcomes.
By revealing the differential impact of sentiment across markets and sources, the study equips investors and analysts with more tools for analysing market trends and making informed investment decisions. This can lead to better risk management and potentially higher returns. The study emphasizes the convergence of news media and social media and the interplay between these sources in influencing stock returns. This suggests that isolating the effect of each source independently might not be sufficient for accurate understanding. Analysing them together provides a more holistic perspective.
The paper proceeds as follows. Section 2 gives a brief review of the empirical literature, Section 3 outlines the data and methodology, Section 4 presents the results, Section 5 discusses them, and Section 6 concludes.
2. Literature review
Several studies have been done to understand the effect of social media and news media on stock returns but using independent frameworks. Souza et al. [5] demonstrate the differences between the effect of social media sentiment and news media on stock returns using a sample of 5 retail stocks listed in the USA between November 1, 2013 and September 30, 2014. The relative sentiment extracted from social sentiment was reported as more influential on stock returns than the sentiment extracted from news. Interestingly, among the 5 sampled retail companies, the only stock whose returns had a significant causal relationship with news sentiment was GameStop Corp, the company that has recently been subject to the short squeeze during the COVID period caused by the irrational behaviour of retail investors on a social media platform, Reddit.
Lachana and Schröder [6] use a dataset of daily news and social media for S&P500 companies between 2006 and 2020 to directly compare how the sentiment extracted from the two sources of information is related to stock returns. Social media content was extracted from Seeking Alpha, an information-sharing platform that allows the interaction of individuals discussing investment-related content. In terms of news, Lachana and Schröder [6] utilized content from the Wall Street Journal (WSJ). The study reports empirical findings showing that investor sentiment proxied by social media content is superior at predicting daily stock returns compared to the sentiment that is extracted from traditional print media.
Xu et al. [7] split news sources into newspapers, internet media news and social media to compare how these sources of information are related to stock returns. Sentiment scores for newspapers, internet news and social media were created from content extracted from 8 momentous journals, 20 mainstream internet media news and Eastmoney. Monthly indexes were then correspondingly created for social media sentiment, newspaper sentiment and internet news sentiment. First, the study revealed that the predictive ability of the social media sentiment index as well as internet news sentiment was superior to several macroeconomic predictors. Macroeconomic predictors used in this context include the dividend price ratio, the rate of return on common shareholder's equity and the earnings price ratio among others. The predictive ability of newspapers was however reported as unsatisfactory compared to online news and social media. The rationale was that investor attention is motivated by the need to acquire information about the stock market within the limited time frames that investors have. Investors then turn to social media and financial websites that give them timely information compared to traditional newspapers that often publish lagged information.
Instead of concentrating on the social activity of mainly naïve investors on social platforms, Smith and O'Hare [8] concentrate on the Twitter sentiment from Chief Executive Officers (CEOs) of several companies and compare these with news sentiment and how they affect stock market features. They expand their study to include the effect of traditional news and Twitter activity of heads of government on benchmark indexes of major countries during the COVID-19 period. The financial news was extracted from Forbes. The findings from the study provide some evidence pointing to the ability of financial news to predict stock prices. Twitter posts from CEOs and heads of government were not significantly related to future stock prices.
Alomari et al. [9] investigate how sentiments expressed in news articles and social media influence how much the stock and bond markets fluctuate, and how the returns of these markets are connected over time. The findings suggest that news sentiment has a bigger impact on how much the markets fluctuate, while social media has a stronger influence on the correlation of the returns of the two markets. Additionally, the model that considered news sentiment was better at predicting future returns than the models that only considered social media sentiment or no sentiment at all.
COVID-19 is a global pandemic that came with uncertainty leading to disruptions in financial markets. Several studies have since been done to examine the relationships among financial variables in the context of the COVID-19 pandemic. One such study [10] compares the effects of official media sentiment and social media sentiment on Chinese stocks. The authors reported that on a day when COVID-19 was discussed positively, listed firms experienced higher returns on the following trading day for the whole sample period. Also, in terms of comparison between official news and weibos, the study revealed the effects were stronger with the former.
We depart from the above studies by acknowledging the potential information flow between news sentiment and social media sentiment. We achieve this first by conducting our analysis in a VAR framework that treats news sentiment, social media sentiment and stock returns as endogenous variables. Thus, we hypothesise that the relationship between news media (social media) sentiment and stock returns is confounded by social media (news media) sentiment. We also utilise other methods, such as partial wavelet coherence, which allow us to understand how news (social media) sentiment is related to stock returns while controlling for the effects of social media (news) sentiment.
3. Data and methods
3.1. Data
The study uses a population of firms listed on the Johannesburg Stock Exchange All Share Index (JALSH) [South Africa], Dow Jones Industrial Average (DJIA) [USA], London Stock Exchange (FTSE 100) [UK] and the Brazil 100 Index (IBrX 100) [Brazil] between January 1, 2016 and April 30, 2023. Several reasons have motivated the choice of the above-mentioned exchanges. These countries have been chosen because of the high quality of textual sentiment data extracted from Bloomberg Inc. for these specific countries. The countries are among the top users of social media platforms like Twitter and all of them are constituents of the top 20 largest stock exchanges by market capitalisation. Brazil and South Africa have the largest stock exchanges in Latin America and Africa respectively and host the highest number of social media users in their respective regions, while at the same time, Nyakurukwa and Seetharam [3] report that the Global South is underrepresented in studies examining the role of textual sentiment in financial markets. The inclusion of these countries, therefore, fills this gap and provides an avenue for comparisons with developed countries. Besides that, the four countries provide a fair dispersion of the sample in terms of geographical location since they represent the African continent, North America, Europe and Latin America respectively. Some regions like the Asia Pacific are not represented, mainly because social media platforms like Twitter are not common in these regions and therefore the quality of the data is not sufficient for this study. Also, in some Asian-Pacific countries where sizable numbers of social media users are found (such as Japan), the quality of sentiment scores is poor (most daily observations are missing values), a signal that participants could be discussing other issues on social media and not the stock market.
Social media sentiment scores and news media sentiment scores are extracted from Bloomberg Inc. Bloomberg only started incorporating social media sentiment data on January 1, 2015, but the quality of the sentiment scores only started improving in 2016 as the percentage of missing values decreased. We, therefore, restrict the study period to the period after December 31, 2015. Firms that are listed or delisted at any time in the middle of the sample period are not included in the study since Bloomberg Inc. only provides sentiment data for currently listed stocks. We filter the companies in several steps so as to retain stocks with complete, high-quality observations and to ensure robustness of results. First, only companies that have sentiment data from the start of the sample period (January 1, 2016) to the end of the sample period (April 30, 2023) are used in the study. This effectively filters out delisted companies (because they do not have sentiment data) as well as stocks that got listed midway in the sample period (because they do not possess all the observations needed for the time series analyses).
Second, to ensure consideration of stocks that are frequently mentioned on both media platforms, we only include stocks that have a median number of messages above the 50th percentile. Stocks mentioned a few times daily are likely to have biased sentiment scores compared to stocks that are frequently mentioned. This is because a frequently mentioned stock is likely to be subjected to the opinions of a variety of people and/or organisations leading to more representative sentiment scores than stocks that are infrequently mentioned. Third, a stock that is not mentioned on a particular date is given a missing value on the Bloomberg platform. To avoid arbitrary imputation of missing sentiment scores, only stocks mentioned on both platforms daily are included in the study. Finally, literature (such as [11]) has shown that when it comes to online investor sentiment, neutral sentiment (a sentiment score of zero) dominates the other sentiment states (i.e. positive and negative sentiment). To ensure that we remain with the real sentiment that has the potential to influence financial markets, we limit neutral sentiment for each stock to not more than 30 % of the total observations. This leads us to an ultimate dataset of 82 stocks of which 29, 30, 13 and 10 are listed in the USA, UK, SA and Brazil respectively (A full list of the sampled stocks is included in the Appendix in Table A1, Table A2, Table A3 and Table A4). Data is sampled at the daily interval, with each stock having 1907 daily observations and together, the sample constituting 156,374 firm-day observations.
3.2. Variables
Studies have shown that sentiment may be influenced by the same economic factors [12] and that asset-level sentiment may be correlated with market sentiment [13]. This implies that aggregate firm-level textual sentiment could be an accurate proxy for market sentiment. Most existing studies predominantly use the top-down approach to investor sentiment (e.g. Baker and Wurgler, 2006). However, the bottom-up strategy, which employs the sentiment of many different individual stocks, can capture a stock's mood that the top-down approach may not be able to [14]. The top-down approach scarcely accounts for every stock's sentiment. For instance, investors may be generally upbeat about the prospects of the market, but they may be gloomy about a particular stock. As a result, such pessimism might be overlooked by a market-wide sentiment measure and can only be discovered through a bottom-up approach. Yu [15] contends that the bottom-up strategy is superior to the top-down strategy since it has a lower signal-to-noise ratio. We therefore utilise average market variables constructed using the bottom-up approach and equal weighting following Guo et al. [16] as shown in Equation (1) and Equation (2):
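$$\overline{NS}_t = \frac{1}{N}\sum_{i=1}^{N} NS_{i,t} \tag{1}$$
$$\overline{SS}_t = \frac{1}{N}\sum_{i=1}^{N} SS_{i,t} \tag{2}$$
where $\overline{NS}_t$ and $\overline{SS}_t$ represent the average news media and average social media sentiment scores at time $t$; $SS_{i,t}$ and $NS_{i,t}$ represent the firm-level social media sentiment and news media sentiment scores of firm $i$ on day $t$ respectively; and $N$ is the number of sampled firms in the market. Average stock returns are calculated using the same method as shown in Equation (3):
$$\overline{RET}_t = \frac{1}{N}\sum_{i=1}^{N} RET_{i,t} \tag{3}$$
Where:
$\overline{RET}_t$ represents the average market returns at time $t$ and $RET_{i,t}$ is the stock return of firm $i$ at time $t$. Average measures of sentiment and returns are estimated relative to each country included in the sample for this study.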
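As a minimal sketch of the equal-weighted, bottom-up aggregation described above, the snippet below averages firm-level scores across firms for each day. The dictionary layout, the function name `equal_weighted_average` and the sentiment values are hypothetical illustrations, not the study's data.

```python
from statistics import fmean


def equal_weighted_average(panel):
    """Equal-weighted cross-sectional mean for each day.

    `panel` maps a date label to the list of firm-level values observed that day.
    """
    return {day: fmean(values) for day, values in panel.items()}


# Hypothetical firm-level news sentiment scores for three firms on three days.
news_panel = {
    "2016-01-04": [0.2, 0.0, 0.4],
    "2016-01-05": [-0.1, 0.3, 0.1],
    "2016-01-06": [0.4, 0.2, -0.3],
}

avg_news = equal_weighted_average(news_panel)
for day, value in avg_news.items():
    print(day, round(value, 4))
```

The same function applies unchanged to firm-level social media sentiment and to firm-level returns, since all three market-level series use the same equal weighting.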
3.3. Methods
3.3.1. Correlation, partial correlation and multiple correlations
To explore the relationships between the two proxies of textual sentiment and stock returns, we first use static measures of comovement, namely Pearson correlation, partial correlation and multiple correlation models. The Pearson correlation coefficient is used to measure the linear correlation between average stock returns vis-à-vis average textual sentiment (either news media or social media sentiment). The Pearson correlation coefficient is estimated as shown in Equation (4):
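$$r_{xy} = \frac{\sum_{t=1}^{n}(x_t-\bar{x})(y_t-\bar{y})}{\sqrt{\sum_{t=1}^{n}(x_t-\bar{x})^2}\,\sqrt{\sum_{t=1}^{n}(y_t-\bar{y})^2}} \tag{4}$$
Where:
$n$ is the sample size, $x_t$ and $y_t$ are the individual sample points of average sentiment scores and average stock returns respectively at time $t$, and $\bar{x}$ and $\bar{y}$ are the means of online textual sentiment and stock returns respectively.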
Since there is likely to be interdependence between social media and traditional media, the sentiment extracted from one source can be a confounding factor in the other's association with stock returns. Partial correlation alleviates this by removing the effect of the confounding variable. We, therefore, use partial correlation to estimate the linear association between stock returns and news media (social media) while removing the effect of social media (news media). The partial correlation between $x$ and $y$ while controlling for a third variable ($z$) is estimated as shown in Equation (5):
$$r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1-r_{xz}^2)(1-r_{yz}^2)}} \tag{5}$$
Where:
$r_{xy}$ is the correlation between $x$ and $y$
$r_{xz}$ is the correlation between $z$ (the third variable) and $x$
$r_{yz}$ is the correlation between $z$ (the third variable) and $y$
Mecklin [17] identifies three distinct scenarios with partial correlation estimations.
1. $r_{xy\cdot z} \approx r_{xy}$: a scenario where the partial correlation is approximately equal to the ordinary (zero-order) correlation.
2. $|r_{xy\cdot z}| < |r_{xy}|$: a scenario where the partial correlation is noticeably weaker than the ordinary correlation. In such a scenario, the ordinary correlation is deemed a spurious correlation; the apparent correlation between $x$ and $y$ is because of the confounding variable.
3. $|r_{xy\cdot z}| > |r_{xy}|$: a scenario where the partial correlation is noticeably stronger than the ordinary correlation. In such a scenario the third variable is called a suppressor variable as it is masking the true strength of the association between $x$ and $y$.
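The Pearson and first-order partial correlations described above can be sketched in a few lines of standard-library Python. The function names and the correlation values used in the example are illustrative assumptions; they are not estimates from the study.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical correlations illustrating two of the scenarios listed above.
weaker = partial_corr(0.50, 0.50, 0.50)    # partial is weaker than the ordinary 0.50
spurious = partial_corr(0.25, 0.50, 0.50)  # association vanishes once z is controlled for
print(round(weaker, 4), round(spurious, 4))
```

In the second call the ordinary correlation of 0.25 is entirely accounted for by the third variable, which corresponds to the spurious-correlation scenario.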
Finally, in our linear and parametric tests of the association between stock returns and online textual sentiment, we test the overall linear relationship between stock returns and the two proxies of online textual sentiment using multiple correlations. Multiple correlation ($R$) tests the overall association between some dependent variable $y$ and predictors ($x_1, x_2, \ldots, x_k$). The multiple correlation is the square root of the multiple $R^2$ statistic. The correlation between the response variable $y$ and the fitted values $\hat{y}$ that arise from a linear regression model is equal to the multiple correlation coefficient, as shown in Equation (6):
$$R = \sqrt{R^2} = \operatorname{corr}(y, \hat{y}) \tag{6}$$
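For the two-predictor case used here (news and social media sentiment), the multiple correlation can also be written in terms of the three pairwise correlations. The sketch below uses that standard identity; the function name and input values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def multiple_correlation(r_y1, r_y2, r_12):
    """Multiple correlation of y on two predictors from the three pairwise correlations."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Hypothetical inputs: returns vs news, returns vs social media, news vs social media.
R_uncorrelated = multiple_correlation(0.3, 0.4, 0.0)  # predictors carry distinct information
R_overlapping = multiple_correlation(0.3, 0.4, 0.5)   # predictors partly overlap
print(round(R_uncorrelated, 4), round(R_overlapping, 4))
```

With these positive inputs, the overlap between the two predictors lowers their combined explanatory power relative to the uncorrelated case.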
The different forms of correlations outlined above have some assumptions that might not be achievable with real-world data, e.g. linearity, normality, stationarity and absence of outliers. Additionally, in as much as they reveal associations between variables, they do not infer causality. As a result, in the next section, we extend the concepts to ameliorate the weaknesses emanating from the assumptions.
3.3.2. Time-varying Granger causality
Several studies have used Granger causality to understand the causal relationship between variables in a VAR framework. Granger causality can be illustrated by a bivariate VAR(m) model given by Equation (7) and Equation (8):
$$y_{1t} = \phi_{0}^{(1)} + \sum_{k=1}^{m}\phi_{11}^{(k)}\,y_{1,t-k} + \sum_{k=1}^{m}\phi_{12}^{(k)}\,y_{2,t-k} + \varepsilon_{1t} \tag{7}$$
$$y_{2t} = \phi_{0}^{(2)} + \sum_{k=1}^{m}\phi_{21}^{(k)}\,y_{1,t-k} + \sum_{k=1}^{m}\phi_{22}^{(k)}\,y_{2,t-k} + \varepsilon_{2t} \tag{8}$$
Where $y_{1t}$ and $y_{2t}$ represent the time series of interest. Variable $y_{1t}$ is said to Granger cause variable $y_{2t}$ if the past values of $y_{1t}$ have predictive power for the current values of $y_{2t}$, conditional on the past values of $y_{2t}$. The null hypothesis of no causality from $y_1$ to $y_2$ involves testing the joint significance of $\phi_{21}^{(k)}$ ($k = 1, \ldots, m$) using a Wald test. Granger causality may be supported across a single time frame but may however be fragile when different periods are taken into account, just like with other characteristics of structural stability [18]. Shi, Phillips and Hurn [19] and Shi, Hurn and Phillips [20] show that it is possible to assess the stability of causal relationships over time through stationary VAR and lag-augmented VAR (allowing for non-stationary variables) respectively. To examine the time-varying stability of the causal relationship between sentiment and stock returns, we, therefore, depend on the time-varying Granger causality framework of Shi, Hurn and Phillips [20]. This method allows for variation in Granger causal orderings and for date-stamping the timing of the changes using recursive methods. The method uses three algorithms that generate a sequence of test statistics, namely, the forward expanding (FE) window, the rolling (RO) window and the recursive evolving (RE) window.
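The static Wald test described at the start of this subsection can be sketched with NumPy as follows. This is an illustrative simplification, not the procedure used in the study: it estimates only the equation of the affected variable by OLS, assumes homoskedastic errors (the study's tests are heteroskedasticity-robust), and runs on simulated data. The function name `granger_wald` and all parameter values are assumptions.

```python
import numpy as np


def granger_wald(cause, effect, m):
    """Wald statistic for H0: `cause` does not Granger-cause `effect` in a bivariate VAR(m)."""
    cause, effect = np.asarray(cause, float), np.asarray(effect, float)
    T = len(effect)
    # Regressors of the `effect` equation: constant, m lags of `cause`, m lags of `effect`.
    lags_cause = np.column_stack([cause[m - k : T - k] for k in range(1, m + 1)])
    lags_effect = np.column_stack([effect[m - k : T - k] for k in range(1, m + 1)])
    X = np.column_stack([np.ones(T - m), lags_cause, lags_effect])
    y = effect[m:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # homoskedastic error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    idx = slice(1, m + 1)  # positions of the coefficients on lagged `cause`
    b = beta[idx]
    return float(b @ np.linalg.solve(cov[idx, idx], b))


# Simulated example: y1 leads y2 by one period, but not the other way round.
rng = np.random.default_rng(0)
n = 500
y1 = rng.standard_normal(n)
y2 = np.zeros(n)
y2[1:] = 0.5 * y1[:-1] + rng.standard_normal(n - 1)

w_forward = granger_wald(y1, y2, m=1)  # tests y1 -> y2
w_reverse = granger_wald(y2, y1, m=1)  # tests y2 -> y1
print(w_forward > w_reverse)
```

Under the null hypothesis the statistic is asymptotically chi-squared with m degrees of freedom, so a large value in one direction only is evidence of one-way Granger causality.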
Considering a sample of $T+1$ observations $\{y_0, y_1, \ldots, y_T\}$, a number $r$ such that $0 < r < 1$, and taking $[Tr]$ to denote the integer part of the product, $W_{f_2}(f_1)$ will denote a Wald test statistic computed over a subsample starting at $y_{[Tf_1]}$ and ending at $y_{[Tf_2]}$. The FE algorithm is a standard forward recursion that is based on Thoma [21]. The Wald test statistic is computed first for a minimum window length $\tau_0 = [Tr_0] > 0$, and the sample size then expands sequentially by one observation until the final test statistic is computed using the entire sample. At the conclusion of the FE algorithm, a sequence of Wald statistics $W_{f_2}(f_1)$, with $f_1 = 0$ and $f_2 \in [r_0, 1]$, is obtained. In the RE algorithm, for a given observation of interest, the algorithm computes a test statistic for every possible subsample of size $r_0$ or larger, with the observation of interest providing the common end point of all the subsamples. Phillips, Shi and Yu [22] propose that inference be based on a sequence of supremum norms of these statistics. The RE algorithm produces a sequence of test statistics $SW_{f}(r_0)$, with $f_1 \in [0, f_2 - r_0]$ and $f_2 = f$, and with $f \in [r_0, 1]$, which are the sup norms of the Wald statistics at each observation. In the RO algorithm [18,23], a window of fixed width is rolled through the sample, advancing one observation at a time, and a Wald statistic is computed for each window. We depend on the recursive evolving window approach as it provides higher power than the other algorithms [20] and is more favourable when performed in conjunction with a bootstrap engine for maintaining family-wise size control. We use a three-variable VAR framework for daily data involving social media sentiment, news media sentiment and stock returns.
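The three windowing schemes differ only in which subsamples feed the test statistic. The sketch below enumerates them using observation counts and half-open index pairs `(start, end)`, rather than the sample fractions used in the text. The function names are hypothetical, and `stat` stands in for any subsample statistic such as a Wald test.

```python
def forward_expanding(T, min_window):
    """Subsamples with a fixed start and an end point that grows by one observation."""
    return [(0, end) for end in range(min_window, T + 1)]


def rolling(T, window):
    """Subsamples of fixed width advanced one observation at a time."""
    return [(end - window, end) for end in range(window, T + 1)]


def recursive_evolving(T, min_window, stat):
    """For each end point, the supremum of `stat` over all admissible start points."""
    sup_stats = []
    for end in range(min_window, T + 1):
        starts = range(0, end - min_window + 1)  # leaves at least min_window observations
        sup_stats.append(max(stat(start, end) for start in starts))
    return sup_stats


# Toy illustration with T = 5 observations and a minimum window of 3.
fe = forward_expanding(5, 3)
ro = rolling(5, 3)
re = recursive_evolving(5, 3, stat=lambda start, end: start)  # placeholder statistic
print(fe, ro, re)
```

The placeholder statistic returns the start index, which makes it easy to see that the recursive evolving scheme searches over every start point that leaves at least the minimum window of data.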
Letting $x \rightarrow y$ represent that the direction of the Granger causality being tested runs from x to y, we test the following relationships:
-
1.
and
-
2.
and
using the Recursive Evolving Window algorithm (see Ref. [20]). We select lags in the VAR model based on the Schwarz and Akaike information criteria, use an initial estimation window of 20 % of the observations (software default) and control the size of the tests over one year (as in Ref. [20]). The tests are robust to heteroskedasticity. The sequence of RE statistics is graphed and compared with the bootstrap percentiles extracted from methods outlined in Shi, Hurn, and Phillips [20] and Shi, Phillips, and Hurn [19]. These estimates are used to identify periods in which the potential Granger causal relationships vary significantly. The estimated origination date of a change is determined as the first instance at which the test statistic exceeds its critical value. Subsequent changes are then identified similarly. The time-varying Granger causality framework described above is preferred over a static VAR framework, especially in a fast-changing contemporary environment where relationships among variables can change across time.
3.3.3. Wavelet analysis
While the time-varying Granger causality framework outlined above shows a causal relationship in the time domain, it does not reveal causal dependencies in the frequency domain. Wavelet analysis is gaining traction in the domain of economics and finance because of its ability to map causal dependencies between variables in a frequency-time space. Wavelets are mathematical, wave-like functions that are used to extract information from different types of data. The application of wavelets to time series data involves synthesising signals into various frequency components by decomposing the initial series into multiple time series. The resultant decomposed series exhibits unique features peculiar to a specific investment horizon. Several advantages of using wavelet analysis have been documented. These include capturing dependencies in nonlinear data, not requiring stationary data, giving the relationship in a time-frequency domain and not specifying any distributional characteristics of the data [24].
3.3.3.1. Wavelet coherence
In our first attempt to understand the evolution of the relationship between stock returns and textual sentiment, we use a bivariate wavelet coherence framework. We utilise the Continuous Wavelet Transform (CWT) to decompose the textual sentiment and stock return series. The CWT is built from basis wavelets obtained by translating and dilating the mother wavelet, thereby transforming the initial time series into a two-dimensional plane of time and frequency. The continuous wavelet transform $W_x(\tau, s)$ for a given time series $x(t)$ is obtained by projecting the examined time series onto the translated and dilated mother wavelet, where the basis wavelet is defined as shown in Equation (9):
$$\psi_{\tau,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right) \tag{9}$$
Where $1/\sqrt{s}$ is a normalisation factor, and $\tau$ and $s$ are the location parameter and scale dilation parameter of the wavelet respectively. Given a mother wavelet, the CWT is then defined as shown in Equation (10):
$$W_x(\tau,s) = \int_{-\infty}^{\infty} x(t)\,\psi^{*}_{\tau,s}(t)\,dt \tag{10}$$
Where $\psi^{*}_{\tau,s}(t)$ represents the complex conjugate of the basis wavelet $\psi_{\tau,s}(t)$. For this study, in line with Xu et al. [25], the Morlet wavelet is used in analysing the data on both amplitude and phase. The Morlet wavelet is a complex sine wave within a Gaussian envelope as shown in Equation (11):
$$\psi(t) = \pi^{-1/4}\left(e^{i\omega_0 t} - e^{-\omega_0^2/2}\right)e^{-t^2/2} \tag{11}$$
Where $\pi^{-1/4}$ guarantees that the wavelet function has unit energy and $e^{-\omega_0^2/2}$ guarantees the admissibility condition of a mother wavelet. The wavelet power spectrum of a time series is the squared modulus of the CWT ($|W_x(\tau,s)|^2$), which recovers the relative contribution at each time and each scale to the time series variance. The wavelet power spectrum can be integrated across $\tau$ and $s$ to recover the total variance in the investigated series, as shown in Equation (12):
$$\sigma_x^2 = \frac{1}{C_\psi}\int_{0}^{\infty}\int_{-\infty}^{\infty}|W_x(\tau,s)|^2\,\frac{d\tau\,ds}{s^2} \tag{12}$$
Where $C_\psi = \int_0^\infty \frac{|\Psi(f)|^2}{f}\,df$ is the admissibility constant and $\Psi(f)$ is the Fourier transform of $\psi(t)$. The cross-wavelet transform of two time series $x(t)$ and $y(t)$ is defined as $W_{xy}(\tau,s) = W_x(\tau,s)\,W_y^{*}(\tau,s)$. The cross-wavelet spectrum is correspondingly defined as $|W_{xy}(\tau,s)|$, implying the local comovement between $x(t)$ and $y(t)$. The wavelet coherency is used to measure the local strength of the association between two time series over time and across frequencies. It ranges from 0 to 1 with the former denoting low coherency and the latter denoting high coherency. The wavelet coherency coefficients are estimated using Equation (13):
$$R^2(\tau,s) = \frac{\left|S\!\left(s^{-1}W_{xy}(\tau,s)\right)\right|^2}{S\!\left(s^{-1}|W_x(\tau,s)|^2\right)\,S\!\left(s^{-1}|W_y(\tau,s)|^2\right)} \tag{13}$$
Where $S$ is the smoothing operator in time and scale. The lead-lag coherence relationships are interpreted using Table 1:
Table 1.
Interpretation of the lead-lag coherence relationships.
| Direction | Implication |
|---|---|
| are positively related | |
| are negatively related | |
Where is the second variable and is the first variable in .
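To make the continuous wavelet transform concrete, the sketch below evaluates a Morlet CWT by direct summation and computes the wavelet power of a synthetic sine wave. It is a simplified illustration under stated assumptions: a unit time step, the plain Morlet wavelet without the small admissibility correction term (negligible for $\omega_0 = 6$), and no smoothing, so coherence itself is not computed. The function name `morlet_cwt` and the test signal are hypothetical.

```python
import numpy as np


def morlet_cwt(x, scales, omega0=6.0):
    """Continuous wavelet transform of x with a Morlet mother wavelet (unit time step)."""
    x = np.asarray(x, float)
    n = len(x)
    t = np.arange(n)
    W = np.empty((len(scales), n), dtype=complex)
    for k, s in enumerate(scales):
        u = (t[None, :] - t[:, None]) / s  # u[tau, t] = (t - tau) / s
        psi = np.pi ** -0.25 * np.exp(1j * omega0 * u) * np.exp(-0.5 * u**2)
        # Project the series onto the conjugated, translated and dilated wavelet.
        W[k] = (x[None, :] * np.conj(psi)).sum(axis=1) / np.sqrt(s)
    return W


# Synthetic series with a single 32-observation cycle.
n = 256
signal = np.sin(2 * np.pi * np.arange(n) / 32)
scales = np.array([4, 8, 16, 32, 64])

power = np.abs(morlet_cwt(signal, scales)) ** 2  # wavelet power spectrum
best_scale = int(scales[power.mean(axis=1).argmax()])
print(best_scale)
```

Average power is concentrated at the scale closest to the cycle length, which is the property that lets wavelet methods attribute comovement to specific investment horizons. A cross-wavelet transform follows by multiplying one series' transform by the complex conjugate of the other's.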
3.3.3.2. Partial wavelet coherence
The Wavelet Coherence described in the above section shows how online textual sentiment and stock returns comove in time-frequency space. This bivariate illustration of the evolution of the relationship between textual sentiment and stock returns can be misleading if both variables used in the model depend on a third variable. Partial Wavelet Coherence was designed to ameliorate this weakness by modelling the relationship between $y$ and $x_1$ while eliminating the influence of the third variable $x_2$. Because there is likely to be a flow of information between traditional media and social media, these two sources of information might therefore depend on each other in influencing a third variable like stock returns. In this section, we, therefore, seek to model how stock returns comove with social media (traditional media) sentiment while controlling for traditional media (social media) sentiment. Mihanović et al. [26] introduced the concept of Partial Wavelet Coherence by measuring the WC of $y$ and $x_1$ when the influence of $x_2$ is excluded as shown in Equation (14):
$$RP^2(y, x_1, x_2) = \frac{\left|R(y,x_1) - R(y,x_2)\,R(x_1,x_2)^{*}\right|^2}{\left[1 - R^2(y,x_2)\right]\left[1 - R^2(x_1,x_2)\right]} \tag{14}$$
Where: $R(y,x_1)$, $R(y,x_2)$ and $R(x_1,x_2)$ are the WC between $y$ and $x_1$, $y$ and $x_2$, and $x_1$ and $x_2$ respectively, $R^2(\cdot) = |R(\cdot)|^2$, and the asterisk indicates the complex conjugate.
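At a single point in time-frequency space, partial wavelet coherence reduces to arithmetic on three complex coherency values. The sketch below writes it as the complex analogue of the partial correlation formula. The exact form, the function name and the input values are assumptions for illustration, and published implementations may differ in how they smooth and normalise.

```python
def partial_wavelet_coherence(r_y_x1, r_y_x2, r_x1_x2):
    """Squared partial coherence of y and x1 at one point, controlling for x2."""
    r_y_x1, r_y_x2, r_x1_x2 = complex(r_y_x1), complex(r_y_x2), complex(r_x1_x2)
    numerator = abs(r_y_x1 - r_y_x2 * r_x1_x2.conjugate()) ** 2
    denominator = (1 - abs(r_y_x2) ** 2) * (1 - abs(r_x1_x2) ** 2)
    return numerator / denominator


# Hypothetical coherency values at one (time, scale) point.
pwc_real = partial_wavelet_coherence(0.5, 0.5, 0.5)       # real inputs: squared partial correlation
pwc_unrelated = partial_wavelet_coherence(0.6j, 0.0, 0.0)  # x2 unrelated: plain |R(y, x1)|^2
pwc_spurious = partial_wavelet_coherence(0.25, 0.5, 0.5)   # coherence fully explained by x2
print(round(pwc_real, 4), round(pwc_unrelated, 4), round(pwc_spurious, 4))
```

When the third series is unrelated to both others, the partial coherence collapses to the ordinary squared coherence, mirroring the first partial correlation scenario in the parametric analysis.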
3.3.3.3. Multiple Wavelet Coherence
Partial Wavelet Coherence shows the dynamic relationship between two variables while excluding the influence of a third variable. Meanwhile, the concept of Multiple Wavelet Coherence seeks to explore the combined effect of two covariates on a dependent variable. We employ the concept of Multiple Wavelet Coherence to examine whether social media sentiment and traditional media sentiment complement each other to affect stock returns in a time-frequency domain. Given that $y$ represents stock returns and $x_1$ and $x_2$ represent news media and social media sentiment respectively, MWC is computed as shown in Equation (15):
$$RM^2(y, x_1, x_2) = \frac{R^2(y,x_1) + R^2(y,x_2) - 2\,\mathrm{Re}\!\left[R(y,x_1)\,R(y,x_2)^{*}\,R(x_1,x_2)\right]}{1 - R^2(x_1,x_2)} \tag{15}$$
The statistical significance of the MWC is estimated using Monte Carlo methods by generating a large set of surrogate data with the same AR(1) coefficients as the input datasets. The significance level for each wavelet scale is then estimated using values outside the cone of influence.
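Multiple wavelet coherence at a single time-frequency point can likewise be sketched as arithmetic on three complex coherency values, in direct analogy with the two-predictor multiple correlation. The functional form, the function name and the inputs below are illustrative assumptions. The Monte Carlo significance testing described above is not reproduced here.

```python
def multiple_wavelet_coherence(r_y_x1, r_y_x2, r_x1_x2):
    """Squared multiple coherence of y on x1 and x2 at one point in time-frequency space."""
    r_y_x1, r_y_x2, r_x1_x2 = complex(r_y_x1), complex(r_y_x2), complex(r_x1_x2)
    cross_term = (r_y_x1 * r_y_x2.conjugate() * r_x1_x2).real
    numerator = abs(r_y_x1) ** 2 + abs(r_y_x2) ** 2 - 2 * cross_term
    denominator = 1 - abs(r_x1_x2) ** 2
    return numerator / denominator


# Hypothetical coherency values at one (time, scale) point.
mwc_independent = multiple_wavelet_coherence(0.3, 0.4, 0.0)  # covariates unrelated to each other
mwc_overlapping = multiple_wavelet_coherence(0.5, 0.5, 0.5)  # covariates share information
print(round(mwc_independent, 4), round(mwc_overlapping, 4))
```

With unrelated covariates the multiple coherence is simply the sum of the two squared coherences. In the second call the overlap between the covariates means their joint contribution is smaller than that sum.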
4. Results
4.1. Parametric correlation between investor sentiment and stock returns
In this section, we report results on the relationship between online textual sentiment and stock returns first using basic methods and more complex methods later. We start by presenting the results of the static relationship between the two proxies of online textual sentiment and stock returns in each of the four markets using parametric correlations between the variables. Table 2 shows the results of the ordinary Pearson correlations as well as partial correlations between investor sentiment and stock returns. Though the results presented here are based on parametric methods, we use them to show the preliminary relationship between the variables. In the next sections, we use robust methods to establish the relationships across time and different frequency intervals. Table 2 Panel A shows each of the four markets, Panel B shows the ordinary correlations between news sentiment and stock returns, Panel C shows the ordinary correlations between social media sentiment and stock returns, Panel D shows the partial correlations between stock returns and news media sentiment while holding the effect of social media constant (News|Twitter) and Panel E shows the partial correlations between stock returns and social media while excluding the effect of news media (Twitter|News).
Table 2.
Parametric correlations.
| Market | News | Twitter | News\|Twitter | Twitter\|News |
|---|---|---|---|---|
| A | B | C | D | E |
| USA | 0.2615*** | 0.1296*** | 0.2379*** | 0.0664*** |
| UK | 0.1750*** | 0.1019*** | 0.1594*** | 0.07126*** |
| SA | 0.1131*** | 0.0944*** | 0.09760*** | 0.0750*** |
| Brazil | 0.1220*** | 0.1213*** | 0.1096*** | 0.1088*** |
Notes: News|Twitter represents the partial correlation between stock returns and news sentiment while holding the effect of Twitter sentiment constant, Twitter|News shows the partial correlation between stock returns and Twitter sentiment while holding the News sentiment constant; *** signifies statistically different from 0 at the 1 % level of significance.
From the results presented in Table 2, it can be observed that all the correlation coefficients are statistically significant at the 1 % level. This provides elementary evidence of potentially significant relationships between online investor sentiment and stock returns, which should be explored using more robust econometric specifications. Across all the markets, stock returns are more strongly correlated with news sentiment than with Twitter sentiment. The gap between the two is more pronounced in the USA and UK than in SA and Brazil. For example, in the USA, the correlation between stock returns and news is 0.2615 while the correlation between stock returns and social media is about half of that at 0.1296. In Brazil, on the other hand, the correlations of stock returns with news (0.1220) and with Twitter (0.1213) are almost the same.
Considering partial correlations, news seems to be a confounding variable in the relationship between stock returns and Twitter in the USA. The ordinary correlation between stock returns and news (0.2615) is not very different from the partial correlation between stock returns and news sentiment after eliminating the effect of Twitter sentiment (0.2379). On the other hand, the ordinary correlation between stock returns and Twitter (0.1296) falls by about half (to 0.0664) when the effect of news sentiment is eliminated. This suggests a possible flow of information between news and social media sentiment in the US market. In the remaining three markets, we do not see noticeable differences between the ordinary and the partial correlations of stock returns with online textual sentiment. This could imply a lack of information flow between the two sentiment proxies, as there is no evidence of one proxy confounding the relationship between the other proxy and stock returns. The results presented in this section therefore provide elementary evidence of possible relationships between online textual sentiment and stock returns. Their limitations are that they are static (they do not provide a time perspective), parametric (they assume normally distributed variables), assume investor homogeneity (investors have a single investing horizon) and do not show a causal relationship. In the next sections, we report results from methods that address these limitations.
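The difference between an ordinary and a partial correlation can be illustrated with a short Python sketch. The data are synthetic and the variable names are hypothetical. The example uses the standard first-order partial correlation formula and is not a reproduction of the estimates in Table 2.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation between x and y, holding z constant."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return float((r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2)))

# Synthetic data (assumption): Twitter sentiment partly echoes news sentiment,
# and only news sentiment is related to returns.
rng = np.random.default_rng(42)
n = 5000
news = rng.normal(size=n)
twitter = 0.6 * news + 0.8 * rng.normal(size=n)
returns = 0.3 * news + rng.normal(size=n)

ordinary_twitter = float(np.corrcoef(returns, twitter)[0, 1])
partial_twitter = partial_corr(returns, twitter, news)   # Twitter|News
partial_news = partial_corr(returns, news, twitter)      # News|Twitter

print(f"corr(returns, Twitter)        = {ordinary_twitter:.3f}")
print(f"corr(returns, Twitter | News) = {partial_twitter:.3f}")
print(f"corr(returns, News | Twitter) = {partial_news:.3f}")
```

In this constructed case the Twitter correlation shrinks towards zero once news is held constant, which is the pattern one would expect when news confounds the relationship between Twitter sentiment and returns.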
4.2. Time-varying Granger causality
In this section, we present the results of the causal relationship between online textual sentiment and stock returns using time-varying Granger causality. This introduces both a causal dimension and a time dimension into the relationship between textual sentiment and stock returns. Financial markets are often characterised by sudden changes caused by external shocks and, as a result, it is essential to understand how the lead-lag relationship between sentiment and stock returns evolves over time. The time-varying causal relationships between sentiment and stock returns in each of the markets are visualised in Figs. 1 to 4. The plots display the 90th (black dashed line) and 95th (red dotted line) percentiles of the empirical distribution of the bootstrap statistics, to be compared with the sequence of the Recursive Evolving Window test statistics (blue solid line). There is significant causality at the 5 % level at any point in time where the blue line is above the red dotted line. The Schwarz and Akaike lag-order selection statistics are used to select the appropriate lag length in each of the markets. We treat only the 5 % level as statistically significant and include the 10 % level for comparison.
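To make the idea of a recursive evolving window concrete, the following Python sketch computes, for each end date, the largest Wald statistic over all admissible start dates. It is a simplified illustration under stated assumptions: a single lag, homoskedastic errors, synthetic data and hypothetical function names. The bootstrap critical values used in the figures are not reproduced.

```python
import numpy as np

def wald_granger(y, x):
    """Wald statistic for H0: x does not Granger-cause y, using one lag:
    y_t = c + a*y_{t-1} + b*x_{t-1} + e_t, with H0: b = 0."""
    target = y[1:]
    design = np.column_stack([np.ones(len(target)), y[:-1], x[:-1]])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    sigma2 = resid @ resid / (len(target) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return float(beta[2] ** 2 / cov[2, 2])

def recursive_evolving(y, x, min_window=30):
    """For each end point, take the supremum of the Wald statistic over
    every start point that leaves at least min_window observations."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    stats = np.full(len(y), np.nan)  # undefined before the first full window
    for end in range(min_window, len(y) + 1):
        stats[end - 1] = max(
            wald_granger(y[start:end], x[start:end])
            for start in range(end - min_window + 1)
        )
    return stats

# Synthetic illustration: sentiment leads returns in the second half only.
rng = np.random.default_rng(0)
n = 120
sentiment = rng.normal(size=n)
returns = rng.normal(size=n)
returns[61:] += 0.8 * sentiment[60:-1]
stats = recursive_evolving(returns, sentiment)
```

Plotting `stats` against time, together with bootstrapped percentile lines, would give a picture analogous in spirit to the panels described here. Because every start point is considered, the statistic can pick up causality that is confined to part of the sample.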
Fig. 1.
USA - Time-varying Granger causality
Notes: Fig. 1 shows results of time-varying causality from news to returns (Panel A), from social media to returns (Panel B), from returns to news (Panel C) and from returns to social media (Panel D). These results are based on the USA market.
Fig. 2.
UK - Time-varying Granger causality
Notes: Fig. 2 shows results of time-varying causality from news to returns (Panel A), from social media to returns (Panel B), from returns to news (Panel C) and from returns to social media (Panel D). These results are based on the UK market.
Fig. 3.
SA - Time-varying Granger causality
Notes: Fig. 3 shows results of time-varying causality from news to returns (Panel A), from social media to returns (Panel B), from returns to news (Panel C) and from returns to social media (Panel D). These results are based on the SA market.
Fig. 4.
Brazil - Time-varying Granger causality
Notes: Fig. 4 shows results of time-varying causality from news to returns (Panel A), from social media to returns (Panel B), from returns to news (Panel C) and from returns to social media (Panel D). These results are based on the Brazilian market.
Starting with the time-varying Granger causality from news to returns for the USA market presented in Fig. 1 Panel A, there is no significant causal relationship for most of the period. This lack of a causal effect from news sentiment to stock returns is reinforced by the fact that, even at the 10 % level of significance, news sentiment remains insignificantly associated with future stock returns in the USA market. The only significant causal relationship from news sentiment to stock returns occurs momentarily at the beginning of 2022. In Panel B of Fig. 1, we visualise the causal relationship from social media sentiment to stock returns over time. Although the relationship is insignificant from the start of the sample period until 2020, social media sentiment significantly leads stock returns at the 5 % level from the first quarter of 2020 to the fourth quarter of 2022. The Recursive Evolving Window test statistic more than doubled in early 2020 to a value that is statistically significant at the 5 % level. This could reflect the uncertainty of the COVID-19 period, as statistical significance emerges around the time the World Health Organisation declared COVID-19 a global pandemic in March 2020.
Beyond examining whether textual sentiment Granger causes stock returns over time, we are also interested in the reverse causality between the variables. Some market participants may tend to discuss stock tickers on online platforms based on their performance in the previous trading period. Panel C and Panel D of Fig. 1 show the causal relationship from stock returns to online investor sentiment in the US market. Starting with news sentiment, we observe that stock returns significantly lead news sentiment for sustained periods, especially in 2018 and 2022 (with momentary causality in early 2020). However, when it comes to social media, the only notable significant causal relationship takes place momentarily in early 2022.
When it comes to the UK market (Fig. 2), a somewhat different picture from the USA market emerges. First, news sentiment is significantly associated with stock returns at the 5 % level for long periods, with the former Granger causing the latter from 2019 to the beginning of 2020. However, for the duration of the sample period, no causal relationship exists from social media to stock returns, even at the less stringent 10 % level of significance. This is the opposite of the results obtained for the USA market, where social media sentiment significantly led stock returns while no meaningful causal relationship was found from news sentiment to stock returns. Turning to the reverse causality, stock returns significantly Granger cause future news sentiment, as seen in the long period of significant causality between 2020 and 2022 (Panel C). However, for the causal effect from stock returns to social media sentiment (Panel D), no significant relationship can be observed for the duration of the sample period, even at the 10 % level of significance. An interesting pattern can also be observed in the dynamic relationship between news sentiment and stock returns between 2019 and 2022 in the UK. First, in 2019 and the first quarter of 2020, news sentiment significantly Granger causes stock returns, with no reverse causality observed during the same period. From 2020 to 2022, by contrast, stock returns significantly lead news sentiment, with no significant causality in the opposite direction. Thus, it seems that news sentiment can predict future returns during tranquil times in the UK. During the COVID-19 crisis (from 2020), market participants’ narratives on news media platforms seem to be a direct response to the performance of the stock market in the previous period.
We complete our presentation of the time-varying causal relationship between textual sentiment and stock returns by interpreting the results for the South African and Brazilian markets, shown in Figs. 3 and 4 respectively. Starting with South Africa, there is a significant causal effect from news sentiment to returns for most of the sample period, from 2018 to the end of the sample. Notably, the Recursive Evolving Window test statistic (the blue line in Panel A of Fig. 3) spikes in 2020 and then remains at that level until the end of the sample period. This signifies that the causal effect of news sentiment on stock returns in the SA market strengthened on the back of the COVID-19 pandemic. This could have been caused by uncertainty about how the pandemic would progress, which led investors to rely on traditional news media for narratives to shape their investing decisions. Conversely, social media sentiment only Granger causes stock returns momentarily in 2018, while for the rest of the sample period there is no significant causality, even at the 10 % level of significance.
Considering reverse causality in the SA market, Panel C (Fig. 3) shows no significant causality from stock returns to news sentiment for the duration of the sample period. On the other hand, stock returns significantly lead social media sentiment from the beginning of 2020 to the end of the sample period, again coinciding with the COVID-19 period. Thus, it can be cautiously inferred that traditional news media in South Africa contain narratives that can shape the evolution of future prices. Social media, on the other hand, does not seem to contain information that is significantly associated with the future prices of stocks. The significant causal relationship from returns to social media sentiment suggests that market participants in South Africa mainly discuss stock market outcomes in retrospect on social media platforms. A channel can therefore be hypothesised in which news sentiment affects stock returns, which in turn affect social media sentiment, as visualised in Fig. 5.
Fig. 5.
The hypothesised relationship between news, social media and stock returns in SA.
Finally, no statistically significant relationship is observed from stock returns to news sentiment. This suggests that traditional news platforms in South Africa do not discuss the stock market in retrospect, but rather with a future-oriented tone that can predict stock market outcomes. Coming to Brazil, Panel A and Panel B of Fig. 4 show that there is no causal relationship from either proxy of online investor sentiment to stock returns. Thus, social media sentiment and news media sentiment do not contain useful information that could be used to forecast stock returns across the sample period. Panel C and Panel D of Fig. 4 show reverse causality from stock returns to online investor sentiment over time. We observe no significant causality from stock returns to either proxy, except for brief causality in 2020 for social media sentiment and in 2021 for news media sentiment. In a nutshell, for Brazil there is no significant time-varying causal relationship between either proxy of textual sentiment and stock returns in either direction. The results in this section, though robust in other respects (for example, to nonstationary data), assume that investors are homogeneous and have a single investing horizon. In the next section, we disaggregate the investing horizon into multiple horizons to establish how different types of investors react to online investor sentiment in the stock market.
4.3. Wavelet analysis
The previous sections reported results on the static and time-varying relationships between textual sentiment and stock returns. The caveat was the assumption that the investment horizon is the same for every investor, in line with the EMH. In this section, we present the results on the relationship between online investor sentiment and stock returns in the time-frequency domain. This disaggregates the dynamic relationship between sentiment and stock returns into different investing horizons. In each diagram, time is represented on the horizontal axis. The vertical axis shows the period; lower period bands (higher frequencies) are shown near the top and higher period bands (lower frequencies) near the bottom. Lower bands would be of interest to investors with short-term horizons, whereas higher bands would be of interest to investors with longer-term horizons. The wavelet coherence plots highlight regions in the time-frequency space where the two series move together. Colours range from dark blue (0, no coherence) to red (1, strong coherence). Areas of statistical significance are marked by a thick black line.
The orientation of the arrows conveys two pieces of information: the sign of the correlation and the leading time series at a specific point. When an arrow points to the left, it indicates an anti-phase relationship, signifying a negative correlation between the two time series at that location. Conversely, a right-pointing arrow signifies an in-phase relationship, indicating a positive correlation between the time series. A downward arrow indicates that the first time series (returns) leads the second (sentiment), while an upward arrow indicates that the second time series (sentiment) leads the first (returns).6 In all the figures, lighter shades depict areas outside the cone of influence (the area below the curved white borderline), where results are less reliable. The cone shape arises because higher period bands require a greater amount of data for computation. We report the findings on the relationships between the two types of online investor sentiment and stock returns in the USA market in Fig. 6.
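The reading convention for the arrows can be summarised in a small Python helper. This is an illustrative sketch only: the function name is hypothetical, and the angle is assumed to be measured counter-clockwise from a right-pointing arrow (0 = right, pi/2 = up), with returns as the first series and sentiment as the second, as in the text.

```python
import math

def interpret_arrow(angle_rad, tol=1e-9):
    """Translate a phase-arrow angle into (correlation sign, lead-lag reading)."""
    horizontal = math.cos(angle_rad)  # > 0: arrow points right, < 0: left
    vertical = math.sin(angle_rad)    # > 0: arrow points up,    < 0: down

    if horizontal > tol:
        relation = "in-phase (positive)"
    elif horizontal < -tol:
        relation = "anti-phase (negative)"
    else:
        relation = "quadrature"

    if vertical > tol:
        lead = "sentiment leads returns"
    elif vertical < -tol:
        lead = "returns lead sentiment"
    else:
        lead = "no lead (contemporaneous)"

    return relation, lead

# An arrow pointing up and to the right: positive relationship, sentiment leads.
print(interpret_arrow(math.pi / 4))
# An arrow pointing down and to the left: negative relationship, returns lead.
print(interpret_arrow(-3 * math.pi / 4))
```

A perfectly horizontal arrow corresponds to the zero-phase case, where the two series move together without either one leading.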
Fig. 6.
USA – Wavelet-based relationships between stock returns and online sentiment
Notes: Fig. 6 shows relationships between sentiment and returns in the USA market. Panel A and Panel B are based on Wavelet Coherence, Panel C and Panel D are based on Partial Wavelet Coherence, and Panel E is based on Multiple Wavelet Coherence.
Panel A of Fig. 6 shows the Wavelet Coherence findings on the causal relationship between average stock returns and average news sentiment. We observe no notable relationship between the variables from the start of the sample period until early 2021. From 2021 to the end of the sample period, we observe significant islands of coherence, especially in scales between 2 and 16 and one significant coherence island in the 16–64 scale. For all the significant coherence islands reported in Panel A of Fig. 6, the relationships are in phase (arrows pointing to the right), signifying a positive causal relationship between the two variables within the sample period. In terms of the lead-lag dynamics, most arrows are pointing upwards, signifying that the second variable (sentiment) dominantly and positively leads the first variable (stock returns). In other words, news sentiment seems to positively predict stock returns at short-term and medium-term frequencies and this effect is more pronounced towards the end of the sample period.
In Panel B of Fig. 6, Twitter sentiment has a stronger dynamic relationship with stock returns compared to news sentiment. This can be observed from the relatively larger islands of significant coherence of the two variables across time and frequency. Also, it can be noticed that the relationship between Twitter sentiment and stock returns is in phase, again signifying a positive causal relationship. Across all the islands of significant coherence, Twitter sentiment seems to be leading stock returns across time and frequency, showing that discussions about the prospects of some stock tickers today can predict stock returns in the future. Most of the significant coherence islands in Panel B of Fig. 6 occur at medium and long-term intervals. In the short term (lower scales) no significant causal relationship between Twitter sentiment and stock returns can be observed.
Considering Partial Wavelet Coherence between the two proxies of online sentiment and stock returns in the USA, some interesting facts can be observed in Panels C and D of Fig. 6. First, the partial wavelet coherence between news sentiment and stock returns after eliminating the influence of social media sentiment (Panel C) differs from the ordinary wavelet coherence between news media sentiment and stock returns (Panel A). The same holds for the partial wavelet coherence between Twitter sentiment and stock returns while holding the effect of news constant (Panel D), as some of the significant coherence islands disappear. This means that news sentiment (social media sentiment) masks the true relationship between Twitter sentiment (news media sentiment) and stock returns. This provides rudimentary evidence of the interdependence of news media and social media in affecting financial market outcomes. This is further supported by the multiple coherence results in Panel E, which show how social media sentiment and news media sentiment collectively comove with stock returns. Bigger significant islands can be observed, which shows that, taken together, social media sentiment and news media sentiment significantly comove with stock returns for longer periods and at different frequency intervals. In short, the results in Fig. 6 show that social media sentiment has a stronger effect on future stock returns than news sentiment in the American market. Fig. 7 presents the wavelet coherence results for the United Kingdom market.
Fig. 7.
UK - Wavelet-based relationships between stock returns and online sentiment
Notes: Fig. 7 shows relationships between sentiment and returns in the UK market. Panel A and Panel B are based on Wavelet Coherence, Panel C and Panel D are based on Partial Wavelet Coherence, and Panel E is based on Multiple Wavelet Coherence.
As previously reported in the results for time-varying Granger causality, results for the UK market show some notable differences from the results reported for the USA market. First, there seem to be more significant islands of coherence between news media and stock returns (Panel A) than between social media and stock returns (Panel B). In Fig. 7 (Panel A) warmer islands showing a stronger causal relationship between news sentiment and stock returns can be observed in the medium and long-term frequency intervals. Most of the orienting arrows are pointing to the right, signifying that the variables are in phase, and therefore positively related. The majority of the arrows are also pointing upwards, which shows that for the significant coherence islands, news sentiment seems to lead stock returns for most of the periods. However, some downwards pointing arrows can be observed in Panel A, signifying that it is the stock returns that lead future news sentiment in some periods. This is particularly clear in the 32–64 scale around 2020 where stock returns negatively lead news sentiment. This relationship is contrary to expectation but could be explained by behavioural biases especially given that it occurs right at the start of the COVID-19 pandemic.
In Panel B, there is no consistent relationship between social media sentiment and stock returns across the sample period and at different frequencies. Turning to the partial wavelet coherences, there is not much difference between the ordinary wavelet coherences reported in Panel A and Panel B and the partial coherences reported in Panel C and Panel D. This shows that news media (social media) does not seem to be a significant confounding variable in the ability of social media (news media) to influence stock returns. However, Panel E shows that a combination of news sentiment and social media sentiment significantly comoves with stock returns for longer periods and at more frequencies. This strong comovement is probably dominated by news sentiment, as suggested by the results in Panel A. Finally, we end our interpretation of the wavelet coherence results with the South African and Brazilian markets, presented in Figs. 8 and 9 respectively.
Fig. 8.
SA - Wavelet-based relationships between stock returns and online sentiment
Notes: Fig. 8 shows relationships between sentiment and returns in the SA market. Panel A and Panel B are based on Wavelet Coherence, Panel C and Panel D are based on Partial Wavelet Coherence, and Panel E is based on Multiple Wavelet Coherence.
Fig. 9.
Brazil - Wavelet-based relationships between stock returns and online sentiment
Notes: Fig. 9 shows relationships between sentiment and returns in the Brazilian market. Panel A and Panel B are based on Wavelet Coherence, Panel C and Panel D are based on Partial Wavelet Coherence, and Panel E is based on Multiple Wavelet Coherence.
First, considering the SA market in Fig. 8, news media sentiment and stock returns are mostly in-phase (positively correlated) and, most of the time, the former leads the latter. Panel B shows various islands of statistical significance where social media sentiment leads stock returns. However, we also observe notable islands (for example in 2019 and 2020) where stock returns lead social media sentiment. This echoes the results previously reported for the time-varying Granger causality model, where stock returns significantly Granger cause social media sentiment for most of the second half of the sample period. The partial wavelet coherence results reported in Panel C and Panel D are not significantly different from the ordinary wavelet coherence results reported in Panel A and Panel B, signifying that neither proxy serves as a confounding variable in the effect of the other on stock returns. Coming to the Brazilian market, no notable causal relationships can be observed except for a few isolated islands of significant comovement in the short term, which can be attributed to statistical noise [11]. The relationship between textual sentiment and stock returns is mostly insignificant. In the few instances where there are statistically significant comovements, they are zero phase, signifying that the variables move together simultaneously. There is therefore no significant information content in news sentiment and social media sentiment that can be used to predict the future prices of stocks in the Brazilian market.
In the results shown above, we used average variables constructed using equal weights to understand the causal relationships between textual sentiment and stock returns in all four sampled markets. To understand whether the same results obtain at the firm level, we select stocks at random in each of the four markets and use the same variants of wavelet methods to understand how textual sentiment is associated with stock returns in a time-frequency space.7 The results reported at the firm level are mostly qualitatively similar to the findings reported at the aggregate level. For the USA, Twitter sentiment has a more pronounced effect on stock returns than news media sentiment. In the UK, it is news sentiment that has a more pronounced effect on stock returns than social media sentiment, while in SA and Brazil the time-frequency relationships are mostly zero phase. The results at the aggregate level, as well as at the firm level, show that in the USA, social media is an important platform whose narratives can help predict future stock returns. This is in contrast to the UK, SA and Brazil, where sentiment from traditional news seems to lead stock returns more strongly than sentiment from social media.
4.4. Out-of-sample forecasting
Out-of-sample predictions are typically generated using fixed, rolling, or recursive window approaches. In all these methods, an initial dataset of T observations is utilised to estimate the model parameters. In the fixed window approach, estimation occurs once using a sample of T observations, and forecasts for h steps ahead are based on those estimates. In the rolling window approach, the size of the estimation sample remains constant, but estimation is performed multiple times by shifting both the starting and ending points of the initial sample by the same interval. Out-of-sample forecasts are then calculated after each estimation. The recursive window method is akin to the rolling window method, but it keeps the starting period fixed while incrementing the ending period. We use VAR(1) models to compute one-step-ahead forecasts of stock returns using social media sentiment and news media sentiment in separate models. We use the model that includes social media sentiment as our benchmark model and the one that includes news media sentiment as our competitor model. We use the fluctuation test [27] to measure the local relative forecasting performance of the two models and analyse its stability over time using statistical tests. The test focuses on the entire time path of the models' relative performance, rather than a single measure of overall performance. We report the findings in Fig. 10.
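The three windowing schemes differ only in which observations enter each estimation sample. The Python sketch below makes this explicit for one-step-ahead forecasts. The function name and the index convention (estimation sample `obs[start:end]`, forecast target `obs[target]`) are assumptions made for illustration.

```python
def estimation_windows(n_obs, initial_size, scheme):
    """Yield (start, end, target) triples for one-step-ahead forecasting.

    The model is estimated on observations start..end-1 and used to
    forecast the observation at index `target`.
    """
    if scheme not in {"fixed", "rolling", "recursive"}:
        raise ValueError(f"unknown scheme: {scheme}")
    for target in range(initial_size, n_obs):
        if scheme == "fixed":
            # Estimate once on the first T observations and reuse the estimates
            yield 0, initial_size, target
        elif scheme == "rolling":
            # Constant sample size; start and end both move forward
            yield target - initial_size, target, target
        else:
            # Recursive: fixed start, end grows with each new observation
            yield 0, target, target

for scheme in ("fixed", "rolling", "recursive"):
    print(scheme, list(estimation_windows(n_obs=6, initial_size=3, scheme=scheme)))
```

With six observations and an initial sample of three, each scheme produces three forecasts. The rolling scheme drops the oldest observation each time, whereas the recursive scheme keeps it.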
Fig. 10.
Results from testing for accuracy of out-of-sample forecasting
Notes: Fig. 10 shows out-of-sample forecasting for the USA (Panel A), UK (Panel B), SA (Panel C) and Brazil (Panel D).
Fig. 10 shows bands representing the range within which the test statistic is expected to lie if the two models (social media sentiment and news media sentiment) forecast equally well. The bands correspond to a 95 % confidence level: if the two models perform equally well, the sequence of test statistics should remain inside the bands with 95 % probability. If the sequence of the test statistics falls entirely within the band, it suggests that both models perform similarly in terms of forecasting accuracy. If the sequence goes above the top line of the band, it indicates that the benchmark model (social media sentiment) outperforms the competitor model (news media sentiment). Conversely, if the sequence dips below the bottom line of the band, it suggests that the competitor model (news media sentiment) outperforms the social media model. For the USA (Panel A), we observe that although there is no statistically significant difference in the forecasting accuracy of the two proxies of sentiment for most of the period, social media sentiment outperforms news sentiment towards the end of the sample period. For the UK and Brazil, most of the period is marked by equivalent forecasting accuracy, although there are periods where news sentiment outperforms social media sentiment. For South Africa, the forecasting performance of the two models is not statistically different. We also repeat the methodology using a non-parametric VAR forecasting model that is robust in the presence of nonlinearity. The results are qualitatively similar, so we do not report them for brevity. However, the results are available from the author upon request.
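The logic of tracking relative forecasting performance over time can be sketched in Python as a scaled rolling sum of loss differentials. Several simplifying assumptions are made here that are not taken from this study: squared-error loss, a plain standard deviation in place of a heteroskedasticity- and autocorrelation-consistent variance estimator, and a differential defined as competitor loss minus benchmark loss so that positive values favour the benchmark, in line with the reading of Fig. 10. The function name is hypothetical and the critical values that define the bands are not reproduced.

```python
import numpy as np

def fluctuation_statistic(err_benchmark, err_competitor, m):
    """Scaled rolling sums of squared-error loss differentials.

    Each entry covers m consecutive forecast dates. Positive values mean the
    benchmark had the smaller loss over that window.
    """
    e_b = np.asarray(err_benchmark, dtype=float)
    e_c = np.asarray(err_competitor, dtype=float)
    diff = e_c ** 2 - e_b ** 2                 # competitor loss minus benchmark loss
    sigma = np.std(diff, ddof=1)               # simplification: no HAC correction
    rolling_sum = np.convolve(diff, np.ones(m), mode="valid")
    return rolling_sum / (sigma * np.sqrt(m))

# Synthetic forecast errors: the benchmark improves in the last third of the sample.
rng = np.random.default_rng(0)
err_b = rng.normal(size=300)
err_c = rng.normal(size=300)
err_b[200:] *= 0.5
path = fluctuation_statistic(err_b, err_c, m=60)
```

Comparing `path` with upper and lower critical lines would show when, rather than merely whether, one model forecasts better than the other.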
5. Discussion
This study has highlighted some salient features of the relationships that exist between textual sentiment and stock returns in different geographic regions. Overall, the results show that social media sentiment has a leading influence over future stock returns in the US market. This could be a result of the regulatory environment that governs the disclosure of important corporate information in the USA, where companies can disclose such information on social media as long as market participants are notified in advance that it may be shared on a specific platform. The regulatory guidance that allowed fair disclosure on social media followed a post by Reed Hastings, the chief executive officer of Netflix, on his personal social media account stating that the company had exceeded one billion hours in a month for the first time. This led to a rally in the shares of the company and, in his defence, the CEO argued that since he had more than 200,000 subscribers on his social media account, it could be interpreted as a public forum. Thus, in an American context, market-moving information is likely to be disclosed via social media and later picked up by traditional news media. As a result, social media is more likely than news media to influence future prices. These results could also be attributed to echo chambers “where social media repeats news but investors interpret the news as genuinely new information” [4]. The more pronounced effect of social media sentiment on stock returns in the USA market is also consistent with the existence of swarm intelligence in the market. It has been demonstrated that groups make some judgments more wisely than individuals do (Galton, 1907). Individual intelligence is inferior to collective intelligence, which is more than the sum of the actions of individual agents (Yaniv & Milyavsky, 2007).
When group members' actions are interconnected, such as when they buy and sell to one another, coordinated swarm behaviour emerges (Surowiecki, 2004). This occurs because each group member acts in a way that advances a shared objective. This kind of behaviour can also be seen on interactive online platforms where users with shared objectives usually converge, and the group intellect exhibited through interactions can have the potential to predict stock market features.
In other countries where there is no overt regulation on the disclosure of important corporate information on social media, we observe social media having a trivial effect on future stock returns. In the UK, SA and Brazil, it is actually news sentiment that has a more pronounced effect on stock returns than social media sentiment. Thus, we rule out any significant existence of swarm intelligence in these markets. The fact that disclosure of corporate information is allowed by regulatory authorities could have been a factor contributing to social media in the USA being a platform where “new” information which can move financial markets is disseminated. This is different from the rest of the countries where market-moving “new” information may only be disseminated through press releases and traditional news channels.
Another phenomenon that can be observed in the relationship between textual sentiment and stock returns in a time-frequency domain is the heterogeneous association at different frequencies across all the markets. In the UK, for example, in 2020 we observe no meaningful coherence between news sentiment and stock returns in the short term. In the medium term stock returns negatively lead investor sentiment while in the long term news sentiment positively leads stock returns. Thus, around 2020, short-term investors seem not to react to news in their investing decisions. Medium-term investors, on the other hand, tend to shape narratives based on previous stock market outcomes. Thus, medium-term investors discuss the stock market in retrospect. Long-term investors, on the other hand, positively react to investor sentiment as news sentiment positively leads stock returns during this time. This shows that investors react differently to information based on their different investing horizons. This is contrary to the EMH which assumes a homogenous reaction of investors to new information coming into the market. Thus, these results support the Heterogeneous Market Hypothesis that presumes that investors are heterogeneous and therefore react differently to information because of their different investment horizons and therefore different risk profiles.
The results from time-varying Granger causality as well as wavelet-based methods reveal that the relationship between textual sentiment and stock returns strengthened around the time COVID-19 was declared a global pandemic by the World Health Organisation in early 2020. This pattern is observed across all the markets. The results are consistent with the Novelty Narrative Hypothesis [28]. COVID-19 was a novel, black swan event that had never happened before, and no one knew exactly how the disease would progress. The Novelty Narrative Hypothesis (NNH) seeks to bring to the fore the role of information in an environment that has been exposed to a novel event. Where there is novelty, there is an increased tendency towards instability, which in turn increases the level of uncertainty. The defining characteristic of a novel event is that its timing and the level of uncertainty it is likely to impose on markets cannot be predicted ex-ante, let alone understood in hindsight. Thus, it is possible that during the COVID-19 pandemic investors deviated from their traditional quantitative models as they battled to establish the model that could give them the greatest confidence in predicting the future. Narratives formed because of the COVID-19 pandemic could have triggered investor reactions, leading to a more pronounced causal relationship between textual sentiment and stock returns during this time.
6. Conclusion
This study aimed to examine the relationship between textual sentiment and stock returns. We used a suite of methods that range from simple parametric methods to complex methods that are robust in the presence of nonlinearity, chaos and deviation from normality. Across all the methods used, we report a more pronounced effect of social media sentiment on stock returns compared to news media sentiment in the USA. In the remaining markets, news seems to be more influential in affecting stock returns than social media sentiment. We also observe heterogeneous effects of investor sentiment on stock returns depending on the frequencies used in the wavelet-based methods. The implication is that the EMH is not adequate in modelling the arrival and effect of information in the financial markets. Alternative hypotheses to the EMH, like the Heterogeneous Market Hypothesis, fill this gap as they incorporate the behaviour of different investors depending on their investment horizons in trying to understand the dynamics of financial markets. The results, especially around March 2020 when COVID-19 was declared a global pandemic, attest to the importance of narrative economics, especially during novel events. The fact that the strength of the causal relationship is not constant but changes across time, possibly due to changing market conditions, supports the Adaptive Market Hypothesis. A potential limitation of the study is the exclusion of other variables because of the computational limits of the methods we used. Future studies can therefore explore this phenomenon further by incorporating exogenous variables. Another limitation of the study is the small sample size caused by the lack of an adequate number of stocks with quality sentiment scores. Future studies can use sentiment scores from primary data rather than third-party suppliers to increase the number of stocks that can be included.
7. Implications
The findings underscore important implications for various stakeholders in financial markets. Investors need to acknowledge the substantial impact of social media sentiment on stock returns in the United States, which calls for a more refined approach to investment decision-making. Regulatory bodies can leverage these findings to enhance market surveillance techniques, recognising the differential influence of news media and social media sentiment across various markets. Furthermore, the study emphasises the importance of researchers employing diverse analytical methods to reach a more thorough understanding of the relationship between sentiment and stock returns. Lastly, global market participants can consider the varying influence of news media and social media sentiment in different countries when formulating strategies, recognising the need for adaptability in a dynamic financial landscape.
Data availability statement
The data used in this study can be acquired from Bloomberg Inc.
CRediT authorship contribution statement
Kingstone Nyakurukwa: Writing – original draft, Methodology, Investigation, Formal analysis, Conceptualization. Yudhvir Seetharam: Writing – review & editing, Validation, Supervision, Project administration, Investigation, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
We use the term “online investor sentiment” to collectively represent sentiment that is extracted from two sources; online news and social media platforms.
In this study we use the terms social media sentiment and Twitter sentiment interchangeably.
The 2022 statistics rank the USA, UK and Brazil among the top 5 users of Twitter as a percentage of population, while South Africa is in the top 20 (https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/).
The process used by Bloomberg to compute the sentiment scores is shown in the Appendix.
Interpretation of phase difference is in line with the suggestions of the software authors [29].
For brevity, we do not report the results, but they are available upon request.
Contributor Information
Kingstone Nyakurukwa, Email: knyakurukwa@gmail.com.
Yudhvir Seetharam, Email: yudhvir.seetharam@wits.ac.za.
Appendix.
Appendix A. Sampled stocks
Table A1.
Sampled stocks in the US market
| No. | SYMBOL | COMPANY | SECTOR |
|---|---|---|---|
| 1 | AAPL | Apple Inc. | Technology |
| 2 | AMGN | Amgen Inc. | Healthcare |
| 3 | AXP | American Express Company | Financials |
| 4 | BA | Boeing Company | Industrials |
| 5 | CAT | Caterpillar Inc. | Industrials |
| 6 | CRM | Salesforce Inc | Technology |
| 7 | CSCO | Cisco Systems Inc. | Technology |
| 8 | CVX | Chevron Corporation | Energy |
| 9 | DIS | Walt Disney Company | Communication Services |
| 10 | GS | Goldman Sachs Group Inc | Financials |
| 11 | HD | Home Depot Inc. | Consumer Discretionary |
| 12 | HON | Honeywell International Inc. | Industrials |
| 13 | IBM | International Business Machines | Technology |
| 14 | INTC | Intel Corporation | Technology |
| 15 | JNJ | Johnson & Johnson | Healthcare |
| 16 | JPM | JPMorgan Chase & Co | Financials |
| 17 | KO | Coca-Cola Company | Consumer Staples |
| 18 | MCD | McDonald's Corporation | Consumer Discretionary |
| 19 | MMM | 3M Company | Industrials |
| 20 | MRK | Merck & Co. Inc. | Healthcare |
| 21 | MSFT | Microsoft Corporation | Technology |
| 22 | NKE | NIKE Inc. Class B | Consumer Discretionary |
| 23 | PG | Procter & Gamble Company | Consumer Staples |
| 24 | TRV | Travelers Companies Inc | Financials |
| 25 | UNH | UnitedHealth Group Incorporated | Healthcare |
| 26 | V | Visa Inc. Class A | Financials |
| 27 | VZ | Verizon Communications Inc | Communication Services |
| 28 | WBA | Walgreens Boots Alliance Inc. | Healthcare |
| 29 | WMT | Walmart Inc. | Consumer Staples |
Table A2.
Sampled stocks in the UK market
| No. | SYMBOL | COMPANY | SECTOR |
|---|---|---|---|
| 1 | AAL | Anglo American plc | Basic materials |
| 2 | ABDN | Abrdn PLC | Financials |
| 3 | ANTO | Antofagasta plc | Basic materials |
| 4 | AZN | AstraZeneca plc | Healthcare |
| 5 | BA | BAE Systems plc | Industrials |
| 6 | BARC | Barclays PLC | Financials |
| 7 | BATS | British American Tobacco PLC | Consumer staples |
| 8 | BP | BP plc | Energy |
| 9 | BRBY | Burberry Group plc | Consumer discretionary |
| 10 | BTA | BT Group Plc | Communication services |
| 11 | CNA | Centrica PLC | Utilities |
| 12 | DGE | Diageo plc | Consumer staples |
| 13 | EXPN | Experian plc | Industrials |
| 14 | GLEN | Glencore PLC | Basic materials |
| 15 | GSK | GlaxoSmithKline plc | Healthcare |
| 16 | HL | Hargreaves Lansdown PLC | Financials |
| 17 | HSBA | HSBC Holdings plc | Financials |
| 18 | IAG | International Consolidated Airlines | Industrials |
| 19 | LLOY | Lloyds Banking Group PLC | Financials |
| 20 | LSEG | London Stock Exchange Group Plc | Financials |
| 21 | NWG | Natwest Group PLC | Financials |
| 22 | OCDO | Ocado Group PLC | Consumer staples |
| 23 | RIO | Rio Tinto plc | Basic materials |
| 24 | RR | Rolls-Royce Holdings PLC | Industrials |
| 25 | SBRY | J Sainsbury plc | Consumer staples |
| 26 | SHEL | Shell PLC | Energy |
| 27 | STAN | Standard Chartered PLC | Financials |
| 28 | TSCO | Tesco PLC | Consumer staples |
| 29 | UU | United Utilities Group PLC | Utilities |
| 30 | WPP | WPP PLC | Communication services |
Table A3.
Sampled stocks in the SA market
| No. | SYMBOL | COMPANY | SECTOR |
|---|---|---|---|
| 1 | ABG | Absa Group Ltd | Financials |
| 2 | AGL | Anglo American plc | Basic materials |
| 3 | AMS | Anglo American Platinum Ltd | Basic materials |
| 4 | ANG | AngloGold Ashanti Limited | Basic materials |
| 5 | BTI | British American Tobacco plc | Consumer staples |
| 6 | FSR | FirstRand Ltd | Financials |
| 7 | GFI | Gold Fields Limited | Basic materials |
| 8 | GLN | Glencore PLC | Basic materials |
| 9 | HAR | Harmony Gold Mining Company Ltd | Basic materials |
| 10 | INP | Investec plc | Financials |
| 11 | MTN | MTN Group Ltd | Communication services |
| 12 | NPN | Naspers Limited | Communication services |
| 13 | S32 | South32 Ltd | Basic materials |
Table A4.
Sampled stocks in the Brazilian market
| No. | SYMBOL | COMPANY | SECTOR |
|---|---|---|---|
| 1 | ABEV3 | Ambev SA | Consumer staples |
| 2 | BBDC3 | Banco Bradesco SA | Financials |
| 3 | BRFS3 | BRF SA | Consumer staples |
| 4 | BRKM5 | Braskem SA | Basic materials |
| 5 | CSNA3 | Companhia Siderurgica Nacional SA | Basic materials |
| 6 | ELET3 | Brazilian Electric Power Co | Utilities |
| 7 | EMBR3 | Embraer SA | Industrials |
| 8 | PETR3 | Petroleo Brasileiro SA Petrobras | Energy |
| 9 | VALE3 | Vale SA | Basic materials |
| 10 | VIVT3 | Telefonica Brasil SA | Communication |
Appendix B. Computed sentiment scores
The Twitter and news average sentiment scores for this study are extracted from Bloomberg Inc. The process Bloomberg Inc. uses to calculate the average sentiment starts with human experts manually analysing large datasets of tweets and news articles. Each tweet or news article is then labelled as positive, negative or neutral using the following question:
“if an investor having a long position in the security mentioned were to read this tweet/news article, would he/she be bullish, bearish or neutral on her holdings”
The manually classified feeds are then fed into machine learning models that are taught to imitate language experts in analysing text messages. The completed machine learning models are subsequently used to scrutinise new tweets and news tagged with tickers and assign each tweet/news a story-level sentiment score ranging from −1 to +1 in real-time. Bloomberg does not, however, disclose the details of the models used to determine the sentiment scores because of their proprietary nature. The average firm-level daily sentiment is then computed as the weighted average of the story-level sentiment scores collected over the previous 24 h from Twitter/StockTwits and more than 50,000 premium online news sources. It is updated every day 10 min before the market opens and is calculated as:
$$A_{i,T} = \frac{\sum_{k \in P(i,T)} S_k^i \, C_k^i}{N_{i,T}} \tag{1}$$
Where:
$S_k^i$ is the sentiment polarity score for tweet $k$ that references firm $i$,
$C_k^i$ is the confidence of tweet $k$ that references firm $i$,
$P(i,T)$ is the set of all non-neutral tweet feeds that reference firm $i$ in the 24 h-period $T$,
$N_{i,T}$ is firm $i$'s total number of positive or negative tweets during period $T$.
$A_{i,T}$ ranges from −1, the most negative sentiment, to +1, the most positive sentiment. This means that an average sentiment score of 0 denotes neutral sentiment.
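As an illustration only, the confidence-weighted average described above can be sketched in Python. Bloomberg's actual models and data formats are proprietary, so everything here is an assumption made for the sketch: the `Story` class, the `average_sentiment` function, the coding of polarity as −1, 0 or +1, confidence values in [0, 1], and the choice to return 0 when no non-neutral stories exist.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Story:
    """One tweet or news story referencing a firm (hypothetical structure)."""
    polarity: int      # assumed coding: -1 negative, 0 neutral, +1 positive
    confidence: float  # assumed to lie in [0, 1]


def average_sentiment(stories: Iterable[Story]) -> float:
    """Confidence-weighted average sentiment over non-neutral stories.

    Sums polarity * confidence over the non-neutral stories in the window
    and divides by the number of those stories.
    """
    non_neutral = [s for s in stories if s.polarity != 0]
    if not non_neutral:
        # Assumption: treat a window with no non-neutral stories as neutral.
        return 0.0
    weighted_sum = sum(s.polarity * s.confidence for s in non_neutral)
    return weighted_sum / len(non_neutral)


# Example 24 h window for one firm: one positive, one negative, one neutral.
window = [Story(+1, 0.8), Story(-1, 0.4), Story(0, 0.9)]
score = average_sentiment(window)
```

Because each confidence is at most 1 in this sketch and the denominator counts the non-neutral stories, the result stays within −1 and +1, in line with the range stated above.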
References
- 1. Russ-Mohl S., Nienstedt H.-W., Wilczek B. Journalism and Media Convergence: an Introduction. De Gruyter; 2013. Journalism and media convergence: an introduction; pp. 3–18.
- 2. Turcotte J., York C., Irving J., Scholl R.M., Pingree R.J. News recommendations from social media opinion leaders: effects on media trust and information seeking. J. Computer-Mediated Commun. 2015;20(5):520–535. doi: 10.1111/jcc4.12127.
- 3. Nyakurukwa K., Seetharam Y. From Shanghai to Wall Street: the influence of Chinese news sentiment on US stocks. J. Behav. Finance. 2023;0(0):1–14. doi: 10.1080/15427560.2023.2270100.
- 4. Jiao P., Veiga A., Walther A. Social media, news media and the stock market. J. Econ. Behav. Organ. 2020;176:63–90. doi: 10.1016/j.jebo.2020.03.002.
- 5. Souza T.T.P., Kolchyna O., Treleaven P.C., Aste T. Handbook of Sentiment Analysis in Finance. 2015. Twitter sentiment analysis applied to finance: a case study in the retail industry. https://arxiv.org/abs/1507.00784v3
- 6. Lachana I., Schröder D. Investor sentiment, social media and stock returns: wisdom of crowds or power of words? 2022. (SSRN Scholarly Paper 3842039)
- 7. Xu Y., Wang J., Chen Z., Liang C. Sentiment indices and stock returns: evidence from China. Int. J. Finance Econ. 2021:1–18. doi: 10.1002/ijfe.2463. Ahead-of-print.
- 8. Smith S., O'Hare A. Comparing traditional news and social media with stock price movements; which comes first, the news or the price change? Journal of Big Data. 2022;9(1):47. doi: 10.1186/s40537-022-00591-6.
- 9. Alomari M., Al Rababa’a A.R., El-Nader G., Alkhataybeh A., Ur Rehman M. Examining the effects of news and media sentiments on volatility and correlation: evidence from the UK. Q. Rev. Econ. Finance. 2021;82:280–297. doi: 10.1016/j.qref.2021.09.013.
- 10. Duan Y., Liu L., Wang Z. COVID-19 sentiment and the Chinese stock market: evidence from the official news media and Sina Weibo. Res. Int. Bus. Finance. 2021;58. doi: 10.1016/j.ribaf.2021.101432.
- 11. Nyakurukwa K., Seetharam Y. The wisdom of the Twitter crowd in the stock market: evidence from a fragile state. African Review of Economics and Finance. 2022;14(1):203–228. doi: 10.10520/ejc-aref_v14_n1_a7.
- 12. Li J., Chen Y., Shen Y., Wang J., Huang Z. Measuring China's stock market sentiment. 2019. (SSRN Scholarly Paper 3377684)
- 13. Rao L., Zhou L. The role of stock price synchronicity on the return-sentiment relation. N. Am. J. Econ. Finance. 2019;47:119–131. doi: 10.1016/j.najef.2018.12.008.
- 14. Guo J., Li Y., Zheng M. Bottom-up sentiment and return predictability of the market portfolio. Finance Res. Lett. 2019;29(C):57–60.
- 15. Yu J. Disagreement and return predictability of stock portfolios. J. Financ. Econ. 2011;99(1):162–183. doi: 10.1016/j.jfineco.2010.08.004.
- 16. Guo J., Li Y., Zheng M. Bottom-up sentiment and return predictability of the market portfolio. Finance Res. Lett. 2019;29:57–60. doi: 10.1016/j.frl.2019.03.008.
- 17. Mecklin C. Murray State University; 2021. Ordinary, Multiple and Partial Correlations. http://campus.murraystate.edu/academic/faculty/cmecklin/STA565/_book/correlations-multiple-and-partial.html
- 18. Swanson N.R. Money and output viewed through a rolling window. J. Monetary Econ. 1998;41(3):455–474. doi: 10.1016/S0304-3932(98)00005-1.
- 19. Shi S., Phillips P.C.B., Hurn S. Change detection and the causal impact of the yield curve. J. Time Ser. Anal. 2018;39(6):966–987. doi: 10.1111/jtsa.12427.
- 20. Shi S., Hurn S., Phillips P.C.B. Causal change detection in possibly integrated systems: revisiting the money–income relationship. J. Financ. Econom. 2020;18(1):158–180. doi: 10.1093/jjfinec/nbz004.
- 21. Thoma M. Subsample instability and asymmetries in money-income causality. J. Econom. 1994;64(1–2):279–306.
- 22. Phillips P.C.B., Shi S., Yu J. Testing for multiple bubbles: limit theory of real-time detectors. Int. Econ. Rev. 2015;56(4):1079–1134. doi: 10.1111/iere.12131.
- 23. Arora V., Shi S. Energy consumption and economic growth in the United States. Appl. Econ. 2016;48(39):3763–3773. doi: 10.1080/00036846.2016.1145347.
- 24. Ng E.K.W., Chan J.C.L. Geophysical applications of partial wavelet coherence and multiple wavelet coherence. J. Atmos. Ocean. Technol. 2012;29(12):1845–1853. doi: 10.1175/JTECH-D-12-00056.1.
- 25. Xu Y., Liu Z., Zhao J., Su C. Weibo sentiments and stock return: a time-frequency view. PLoS One. 2017;12(7). doi: 10.1371/journal.pone.0180723.
- 26. Mihanović H., Orlić M., Pasarić Z. Diurnal thermocline oscillations driven by tidal flow around an island in the Middle Adriatic. J. Mar. Syst. 2009;78:S157–S168. doi: 10.1016/j.jmarsys.2009.01.021.
- 27. Giacomini R., Rossi B. Forecast comparisons in unstable environments. J. Appl. Econom. 2010;25(4):595–620. doi: 10.1002/jae.1177.
- 28. Mangee N. Cambridge University Press; 2021. How Novelty and Narratives Drive the Stock Market: Black Swans, Animal Spirits and Scapegoats.
- 29. Gouhier T.C., Grinsted A., Simko V. Biwavelet: Conduct Univariate and bivariate wavelet analyses (0.20.21) 2021. https://cran.r-project.org/web/packages/biwavelet/index.html [Computer software]