2014 Feb 27;4:4213. doi: 10.1038/srep04213

Figure 1. Examples of when hourly changes in social media sentiment contain lead-time information about securities' hourly returns.

We refer to the percentage increase in Mutual Information between hourly changes in the social media sentiment data and securities' hourly returns at leading time-shifts, relative to zero time-shift, as the information surplus. Here, the social media sentiment data is offset such that it precedes the financial data, and the Mutual Information between the two time-series is compared with that available at no time-shift. If the information surplus is positive, then the sentiment data contains more Mutual Information about the financial data at an exploitable leading time-shift than in the no-offset configuration. We suggest that in such scenarios, hourly changes in the sentiment data contain lead-time information about securities' hourly returns, because they remove more uncertainty about the financial time-series ahead of time than when the two time-series are not offset. For social media to be considered eligible to lead financial data, three further conditions had to be met: the asset's Twitter Filter attracted a mean message volume of at least 60 messages per hour from our connection to Twitter's 10% Gardenhose feed; the information surplus values were greater when sentiment data preceded financial data than in the converse case (when financial data preceded sentiment data); and the observations were statistically significant at the 99% confidence level (relative to sentiments generated from randomly permuted data). In this manner, we identify twelve instruments for which hourly changes in the sentiments of social media messages contain lead-time information about securities' hourly returns. In this figure, we show the maximum information surplus observed at each time-shift. Of the permitted assets, Apple Inc. was the only company for which such an indication was visible using a Twitter Filter that searched solely for the asset's industry Ticker-ID (rather than the company name). Tweets on the remaining individual stocks were obtained by filtering Twitter for Company Names AND/OR their industry Ticker-IDs. Finally, the sentiments of string-unfiltered Tweets from the USA were shown to lead the returns of S&P500 Futures for one time-shift.
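
The information surplus described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the equal-width binning, the plug-in Mutual Information estimator, the permutation scheme, and all function names (`discretize`, `mutual_information`, `lagged_mi`, `information_surplus`, `permutation_p_value`) are assumptions introduced here for illustration only.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map each value to an equal-width bin index in [0, bins - 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in estimate of Mutual Information, in bits, from binned counts."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), summed over occupied cells
    return sum((c / n) * math.log2(c * n / (cx[i] * cy[j]))
               for (i, j), c in cxy.items())


def lagged_mi(sentiment, returns, shift, bins=8):
    """MI when sentiment precedes returns by `shift` hours (shift >= 0)."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    n = len(sentiment) - shift
    # Pair sentiment[t] with returns[t + shift]
    return mutual_information(sentiment[:n], returns[shift:], bins)


def information_surplus(sentiment, returns, shift, bins=8):
    """Percentage increase in MI at a leading shift, relative to zero shift."""
    base = lagged_mi(sentiment, returns, 0, bins)
    if base == 0:
        raise ValueError("zero-shift Mutual Information is zero")
    return 100.0 * (lagged_mi(sentiment, returns, shift, bins) - base) / base


def permutation_p_value(sentiment, returns, shift, n_perm=200, bins=8, seed=0):
    """Share of shuffled-sentiment MI values at least as large as the observed one."""
    rng = random.Random(seed)
    observed = lagged_mi(sentiment, returns, shift, bins)
    shuffled = list(sentiment)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any temporal alignment with returns
        if lagged_mi(shuffled, returns, shift, bins) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Given two equal-length lists of hourly sentiment changes and hourly returns, `information_surplus(sentiment, returns, shift)` is positive when the sentiment series shares more Mutual Information with returns `shift` hours later than with concurrent returns. Swapping the two arguments gives the converse case (financial data preceding sentiment), and `permutation_p_value` gives a rough significance check against shuffled sentiment, with a value below 0.01 corresponding to the 99% level.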