Abstract
We examine COVID-19-related topics discussed in the printed edition of the Wall Street Journal. Using text analytics and topic modeling algorithms, we discover 15 distinct topics and present differences in their sentiment (polarity) and hype (intensity of coverage) trends throughout 2020. Importantly, the hype of a topic, not its sentiment, relates to stock market returns. In particular, the hype scores for Debt market and Financial markets have the strongest positive relation to stock market performance.
Keywords: COVID-19, Media hype, Media sentiment, Stock market, Topic modeling, Text analytics
1. Introduction
The emergence of the Coronavirus (COVID-19) pandemic has impacted every aspect of our lives. At its peak in March 2020, the Coronavirus media coverage index showed that 84 percent of all news sources covered stories about COVID-19, and the index remained above the 70 percent mark through December 2020.¹ By March 23, 2020, the S&P 500 index had declined by 34% from its all-time high reached only a month earlier.
In this study, we analyze all COVID-19-related news articles published in the printed edition of the Wall Street Journal. We identify 15 distinct topics and document differences in their sentiment (polarity) and hype (intensity of coverage) trends throughout 2020. Importantly, we find that the hype scores of some topics (in particular, Debt market and Financial markets), but not the sentiment scores, significantly relate to the S&P 500 index returns.
This study extends two strands of literature. First, by examining sentiment and hype at the topic level, it contributes to the set of studies documenting the importance of media sentiment/coverage in predicting changes in asset prices and financial markets (see Tetlock, 2007, Bhattacharya et al., 2009, Fang and Peress, 2009, Shiller, 2015, Uhl et al., 2015), especially during times of economic or political uncertainty (Wisniewski and Lambe, 2013, Smales, 2014, Broadstock and Zhang, 2019, Shi and Ho, 2020). Second, the paper contributes to a rapidly growing literature on the market response to the COVID-19 pandemic and, in particular, to the studies utilizing sentiment and news media data (see O’Donnell et al., 2021, Ali et al., 2020, Al-Awadhi et al., 2020, Schell et al., 2020, Akhtaruzzaman et al., 2020, Zhang et al., 2020, Haroon and Rizvi, 2020, Costola et al., 2020, Baig et al., 2021).
2. Data and methodology
To identify major topics related to COVID-19, we collect articles published during 2020 in the printed edition of the Wall Street Journal. Our sample consists of 6552 articles, comprising over five million words in total.² By performing topic modeling based on Latent Semantic Analysis (LSA) (see Albright, 2004, Chakraborty et al., 2014, Alexander et al., 1998, Manning and Schütze, 1999),³ we identify 15 distinct and meaningful topics related to COVID-19.⁴ We label these topics by examining the 20 words with the highest weights in each topic. We further confirm our selection of the topic labels by manually and independently examining the top three articles for each topic.
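The LSA step described above can be illustrated with a small, self-contained sketch. It is not the pipeline used in the study (the cited references point to SAS text-mining tools); it only shows the general idea of LSA, namely taking the leading singular vectors of a weighted term-document matrix and reading off the highest-weight words per topic. The toy corpus, the TF-IDF weighting, the power-iteration solver, and every function name below are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def tfidf_matrix(docs):
    """Build a term-by-document TF-IDF matrix (assumed weighting scheme)."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    counts = [Counter(doc) for doc in docs]
    matrix = [
        [counts[d][w] * (math.log(n_docs / doc_freq[w]) + 1.0) for d in range(n_docs)]
        for w in vocab
    ]
    return vocab, matrix

def leading_topics(matrix, k, iters=200, seed=0):
    """Approximate the k leading singular directions by power iteration with deflation.

    Returns a list of (term_weights, doc_weights) pairs, one per topic.
    """
    rng = random.Random(seed)
    a = [row[:] for row in matrix]              # working copy that gets deflated
    n_terms, n_docs = len(a), len(a[0])
    topics = []
    for _ in range(k):
        u = [rng.uniform(-1.0, 1.0) for _ in range(n_terms)]
        for _ in range(iters):
            # One step of power iteration on A A^T: v = A^T u, then u = A v.
            v = [sum(a[t][d] * u[t] for t in range(n_terms)) for d in range(n_docs)]
            u = [sum(a[t][d] * v[d] for d in range(n_docs)) for t in range(n_terms)]
            norm = math.sqrt(sum(x * x for x in u))
            if norm == 0.0:
                raise ValueError("matrix has fewer than k non-zero singular values")
            u = [x / norm for x in u]
        # Document loadings for this topic (singular value times right vector).
        v = [sum(a[t][d] * u[t] for t in range(n_terms)) for d in range(n_docs)]
        topics.append((u, v))
        # Deflate: remove the component just found so the next pass finds a new one.
        for t in range(n_terms):
            for d in range(n_docs):
                a[t][d] -= u[t] * v[d]
    return topics

def top_words(vocab, term_weights, n=3):
    """Return the n terms with the largest absolute weight in a topic."""
    ranked = sorted(range(len(vocab)), key=lambda t: abs(term_weights[t]), reverse=True)
    return [vocab[t] for t in ranked[:n]]

# Hypothetical toy corpus: three finance-like and two health-like documents.
docs = [
    ["stock", "market", "investor", "bond"],
    ["stock", "market", "rally", "investor"],
    ["bond", "yield", "market", "investor"],
    ["patient", "hospital", "drug"],
    ["patient", "hospital", "doctor"],
]
vocab, matrix = tfidf_matrix(docs)
topics = leading_topics(matrix, k=2)
for index, (term_weights, _) in enumerate(topics):
    print(index, top_words(vocab, term_weights))
```

In this toy setting the two recovered topics separate the finance-like vocabulary from the health-like vocabulary. The per-document loadings returned alongside each topic play the role of "the weight of the topic in each article" that the hype score relies on.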
To estimate the media sentiment for each topic, we follow Hu and Liu (2004) and compute a topic-specific sentiment score as the difference between the number of positive words and the number of negative words, normalized by the length of the article. The sentiment score reflects the degree of polarity (i.e., negative, neutral, or positive) of a given article. In addition to the sentiment score, we compute a media hype score for each topic based on the number of articles belonging to that topic and the weight of the topic in each article. Thus, the hype score captures both the breadth and the intensity of coverage.
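A minimal sketch of the two scores may help fix ideas. The sentiment function follows the definition given above (positive minus negative word counts, divided by article length). The exact hype formula is not stated in the text, so the aggregation below (summing a topic's weights over the articles assigned to it) is an assumption, as are the tiny word lists, which merely stand in for a full opinion lexicon.

```python
# Hypothetical mini-lexicon; a real application would use a full opinion lexicon.
POSITIVE_WORDS = {"gain", "recovery", "rally", "effective"}
NEGATIVE_WORDS = {"death", "loss", "shortage", "bankruptcy"}

def sentiment_score(tokens, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """(number of positive words - number of negative words) / article length."""
    if not tokens:
        return 0.0
    n_pos = sum(1 for word in tokens if word in positive)
    n_neg = sum(1 for word in tokens if word in negative)
    return (n_pos - n_neg) / len(tokens)

def hype_score(topic_weights):
    """Assumed aggregation: sum of a topic's weights over its articles.

    The sum grows with the number of articles (breadth) and with the
    size of each weight (intensity).
    """
    return sum(topic_weights)

article = ["stock", "rally", "death", "shortage"]
print(sentiment_score(article))        # one positive, two negative, four tokens
print(hype_score([0.5, 1.5, 2.0]))     # three articles assigned to one topic
```

Under this assumed aggregation, a topic can gain hype either because more articles are assigned to it or because the same number of articles load more heavily on it.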
Table 1.
Topics, sentiment, and hype scores.
Panel A: Topics (columns 1–3). Panel B: Sentiment (Hype) Scores (columns 4–8).

| Topic label | Top 20 words | Percentage of articles | Mean | Median | Standard deviation | Maximum | Minimum |
|---|---|---|---|---|---|---|---|
| Family/social connections | home, family, feel, friend, live, room, husband, life, child, thing, love, house, want, daughter, space, wife, mother, work, restaurant, couple | 8.75% | 0.0254 | 0.0199 | 0.1080 | 0.4991 | 0.1691 |
| | | | (6.0439) | (3.2900) | (8.4512) | (29.3220) | (2.3230) |
| World economy and politics | chinese, country, world, government, global, foreign, political, economy, economic, european, policy, power, oil, international, trade, war, production, leader, export, epidemic | 8.73% | 0.0809 | 0.0774 | 0.0616 | 0.0338 | 0.2239 |
| | | | (6.1291) | (6.5920) | (2.3518) | (9.8140) | (0.0840) |
| Workplace safety | worker, hospital, plant, facility, production, employee, supply, nurse, factory, oil, equipment, company, patient, ventilator, meat, protective, demand, healthcare, shortage, care | 8.29% | 0.1285 | 0.1265 | 0.0815 | 0.0190 | 0.2682 |
| | | | (8.1610) | (8.6570) | (2.7418) | (13.7640) | (0.1530) |
| Retail business | sale, quarter, store, retailer, company, consumer, customer, revenue, share, demand, profit, product, shopper, business, analyst, brand, e-commerce, online, chain, shop | 8.26% | 0.1096 | 0.0990 | 0.1265 | 0.1018 | 0.7140 |
| | | | (7.8777) | (8.2240) | (2.5662) | (11.9070) | (0.5440) |
| Debt market | debt, investor, billion, fund, company, loan, property, york, bankruptcy, bank, firm, lender, asset, deal, court, bond, investment, city, sell, file | 8.03% | 0.0720 | 0.0762 | 0.0565 | 0.1433 | 0.1529 |
| | | | (5.6104) | (5.8820) | (1.9938) | (8.7740) | (0.0630) |
| Local pandemic | death, state, case, cuomo, york, city, gov, governor, infection, resident, johns, reopen, hopkins, new case, official, number, hospitalization, test, restriction, blasio | 7.79% | 0.2317 | 0.2248 | 0.2606 | 0.1411 | 1.7118 |
| | | | (11.5167) | (12.0490) | (3.1594) | (18.1800) | (2.0640) |
| Contact tracing | test, employee, information, user, infect, symptom, cdc, expert, contact, app., data, person, tech, company, virus, technology, apps, inc, software, google | 7.63% | 0.0615 | 0.0536 | 0.0995 | 0.1881 | 0.5362 |
| | | | (3.8971) | (3.5830) | (1.6783) | (7.9840) | (0.5480) |
| Treatment | patient, hospital, drug, treatment, study, doctor, antibody, treat, blood, disease, remdesivir, plasma, therapy, medical, hospitalize, researcher, care, symptom, clinical, physician | 7.03% | 0.1067 | 0.0963 | 0.0879 | 0.0747 | 0.4580 |
| | | | (6.2740) | (6.6940) | (2.0774) | (9.7610) | (0.3930) |
| Government funding | unemployment, bill, worker, job, loan, trillion, senate, tax, payment, federal, economist, economy, package, spending, benefit, aid, stimulus, employer, money, democrat | 6.85% | 0.0921 | 0.0861 | 0.0776 | 0.0577 | 0.3394 |
| | | | (7.5880) | (7.8730) | (2.7798) | (11.6140) | (0.1930) |
| Financial markets | investor, stock, market, s&p 500, index, bond, yield, oil, share, price, trade, investment, bank, gain, point, rally, composite, economy, recovery, average | 6.26% | 0.1152 | 0.1037 | 0.0983 | 0.0398 | 0.5030 |
| | | | (7.2272) | (7.1660) | (3.2504) | (13.9690) | (0.4790) |
| Presidential election | trump, biden, election, voter, campaign, democrat, vote, democratic, president, senate, presidential, republicans, ballot, house, joe, white, poll, win, party, president | 6.09% | 0.0230 | 0.0193 | 0.0742 | 0.1465 | 0.2129 |
| | | | (5.6460) | (6.2740) | (2.5003) | (9.5230) | (0.2320) |
| Vaccine | vaccine, dose, trial, shot, pfizer, moderna, drug, clinical, vaccinate, biontech, clinical trial, develop, study, fda, authorization, vaccination, pfizer inc, effective, candidate, astrazeneca | 4.87% | 0.0452 | 0.0487 | 0.0793 | 0.1216 | 0.3320 |
| | | | (7.0349) | (6.7450) | (3.2301) | (13.9590) | (0.2840) |
| Travel | airline, passenger, flight, travel, carrier, airlines, traveler, airport, fly, plane, air, trip, ship, aircraft, mask, hotel, cruise, crew, international, booking | 4.42% | 0.0738 | 0.0761 | 0.0467 | 0.0195 | 0.1563 |
| | | | (4.7669) | (5.1100) | (1.7780) | (8.0650) | (0.0530) |
| Education | student, school, teacher, learning, child, parent, in-person, district, class, classroom, instruction, education, remote, college, campus, remote learning, fall, online, grade, kid | 4.13% | 0.0821 | 0.0704 | 0.1456 | 0.1210 | 0.9342 |
| | | | (5.2641) | (4.9520) | (2.4170) | (10.6130) | (0.7730) |
| Sports | player, game, league, season, football, team, sport, play, coach, athlete, baseball, college, test, sports, fan, positive, mlb, protocol, tournament, ten | 2.87% | 0.0716 | 0.0598 | 0.0857 | 0.0119 | 0.5436 |
| | | | (4.3014) | (4.1250) | (1.7639) | (10.0160) | (0.5650) |
Notes: The table presents 15 major topics, along with top 20 words and the percentage of articles represented in each topic (Panel A) and descriptive statistics for the sentiment and hype (in parentheses) scores for each topic (Panel B). Topics are generated from 6552 articles published in the printed edition of the Wall Street Journal during January 1 – December 31, 2020 that mention the word “COVID”.
3. Results
Table 1 presents the top 20 words and the percentage of articles (Panel A), and summary statistics on the sentiment and hype scores (Panel B) for each of the 15 discovered topics. The diversity of the topics covered in the main business newsprint, ranging from financial markets to family, education, and sports, reflects the widespread impact of the pandemic that touched all facets of life. The identified topics listed from the most to least frequent are: Family/social connections, World economy and politics, Workplace safety, Retail business, Debt market, Local pandemic, Contact tracing, Treatment, Government funding, Financial markets, Presidential election, Vaccine, Travel, Education, and Sports.
The mean and median sentiment scores for all topics are negative, which is not surprising given the negative tone of COVID-19 pandemic coverage in the media. The sentiment scores for Family/social connections and Presidential election have the highest medians and means, though still negative at around −0.02. Local pandemic stands out among the 15 topics both in terms of sentiment and hype scores. Specifically, it shows the lowest mean (median) sentiment score of −0.2317 (−0.2248) and the highest mean (median) hype score of 11.5167 (12.0490).
Fig. 1 shows the time-series patterns in sentiment and hype scores for each of the 15 topics and highlights the diversity in topic time-series patterns. For some topics, such as Debt market, Treatment, and Presidential election, sentiment scores show smooth patterns without significant spikes and declines. For other topics, such as Retail business and Local pandemic, sentiment scores exhibit sharp increases and decreases. The sentiment for Family/social connections displays a downward trend, whereas the sentiment for Financial markets shows a somewhat consistent upward trend.
Fig. 1.
Sentiment and hype scores for 15 major topics mentioning COVID in the Wall Street Journal in 2020. Notes: The graphs present time-series patterns in sentiment and media hype scores for each of the 15 topics identified in Table 1. The sentiment score in a given month is calculated for each topic as the difference between the number of positive words and the number of negative words, normalized by the length of the document. The hype score in a given month is based on the weights and number of documents belonging to a particular topic.
The patterns of media hype scores are more striking. Specifically, hype scores for all topics, except Family/social connections, show a rapid surge from February to April. After that, hype scores tend to bounce up and down for most topics but show a steady decline for Financial markets and Vaccine. The patterns of hype scores for Family/social connections and Contact tracing are noticeably different from those of other topics. Specifically, the hype score for Family/social connections shows negative values and a slow increase at the start of the pandemic, followed by a rapid, near-exponential increase starting in August. The hype score for Contact tracing shows steady growth from February to December.
To illustrate the change in the degree of topic coverage, Fig. 2 presents the proportional distributions of the topics’ hype values in March 2020, the month of the drastic stock market drop caused by the COVID-19 pandemic, versus December 2020, the last month of our sample period, which represents the year-end recovery of the stock market. The height of each block within a stacked horizontal bar chart shows the proportion of a specific topic. The intensity of coverage of the Family/social connections topic increased drastically, going from a meager 3% in March 2020 to a sizeable 22% in December 2020. Among the topics that experienced a decrease in hype, Vaccine and Local pandemic had the largest declines of 5.8% and 4.7%, respectively.
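The proportional comparison described above amounts to dividing each topic's hype value by the month's total and then differencing the shares across the two months. The sketch below shows that arithmetic only; the hype values are hypothetical placeholders, not figures from the study, and the function names are assumptions. It also assumes a positive monthly total, which a share calculation requires.

```python
def hype_shares(hype_by_topic):
    """Convert one month's hype values into shares of that month's total."""
    total = sum(hype_by_topic.values())
    if total <= 0:
        raise ValueError("shares are only defined for a positive total")
    return {topic: value / total for topic, value in hype_by_topic.items()}

def share_changes(start_month, end_month):
    """Change in each topic's share between two months (end minus start)."""
    start = hype_shares(start_month)
    end = hype_shares(end_month)
    return {topic: end[topic] - start[topic] for topic in start}

# Hypothetical hype values for two topics in two months (illustration only).
march = {"Family/social connections": 1.0, "Vaccine": 3.0}
december = {"Family/social connections": 2.0, "Vaccine": 2.0}

changes = share_changes(march, december)
for topic, change in changes.items():
    print(f"{topic}: {change:+.2%}")
```

Because shares sum to one in each month, the changes sum to zero: a gain in one topic's share is necessarily offset by losses elsewhere.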
Fig. 2.
Comparison of topic media hype distribution in December 2020 versus March 2020. Notes: The chart presents the media hype distribution for the 15 topics identified in Table 1. The hype score is based on the number of articles belonging to a particular topic and the weight of the topic in each article.
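To examine the relation between the sentiment and hype of the 15 topics and the S&P 500 index, we regress the returns of the S&P 500 index on their first lag and on the topic sentiment and hype scores.⁵ Thus, for each topic i, we estimate the following model:

$$R_t = \beta_0 + \beta_1\,\mathrm{Sentiment}_{i,t} + \beta_2\,\mathrm{Hype}_{i,t} + \beta_3\,R_{t-1} + \varepsilon_t,$$

where R_t is the return of the S&P 500 index in week t, and t represents a week during a 47-week period from February 12 to December 31, 2020.⁶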
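The per-topic regression can be sketched with ordinary least squares solved through the normal equations. This is only an illustration under stated assumptions: the data are synthetic and noiseless (generated from known coefficients so the fit can be checked), the coefficients are unstandardized whereas Table 2 reports standardized ones, and all function and variable names are hypothetical. No p values are computed here.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def fit_topic_model(returns, sentiment, hype):
    """Regress returns[t] on a constant, sentiment[t], hype[t], and returns[t-1]."""
    rows = [[1.0, sentiment[t], hype[t], returns[t - 1]] for t in range(1, len(returns))]
    return ols(rows, returns[1:])

# Synthetic weekly series (illustration only): 48 values give 47 usable rows
# once the first lag is taken.
rng = random.Random(1)
n_weeks = 48
sentiment = [rng.uniform(-0.3, 0.0) for _ in range(n_weeks)]
hype = [rng.uniform(0.0, 10.0) for _ in range(n_weeks)]
true_beta = [0.1, 0.2, 0.3, 0.5]   # constant, sentiment, hype, lagged return
returns = [0.0]
for t in range(1, n_weeks):
    returns.append(true_beta[0] + true_beta[1] * sentiment[t]
                   + true_beta[2] * hype[t] + true_beta[3] * returns[t - 1])

beta = fit_topic_model(returns, sentiment, hype)
print([round(b, 6) for b in beta])
```

Because the synthetic series contains no noise, the fitted coefficients match the generating ones up to floating-point error. With real data, a statistics package that also reports standard errors and p values would be the more practical choice.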
Table 2 presents regression results for the six topics with significant standardized coefficients.⁷ The hype-score coefficients for World economy and politics, Workplace safety, Debt market, Local pandemic, Government funding, and Financial markets are positive, with significance levels ranging from 5% to 10%, whereas none of the sentiment-score coefficients is significant. Taken together, the regression results suggest two conclusions: (1) media hype, not sentiment, relates to stock market returns during the pandemic; (2) among different topics, media hype for topics related to business and the economy, such as World economy and politics, Debt market, and Financial markets, shows the strongest relation to stock market returns.
Table 2.
Topic sentiment and Hype scores and S&P 500 index returns.
| Topic label | Constant | Sentiment score | Hype score | Lagged S&P 500 index | R² |
|---|---|---|---|---|---|
| World economy and politics | 0.03 | 0.01 | 0.29⁎ | 0.11 | 0.087 |
| | (0.11) | (0.96) | (0.06) | (0.46) | |
| Workplace safety | 0.04 | 0.09 | 0.27⁎ | 0.11 | 0.073 |
| | (0.14) | (0.56) | (0.08) | (0.45) | |
| Debt market | 0.03 | 0.06 | 0.30⁎⁎ | 0.12 | 0.097 |
| | (0.15) | (0.67) | (0.05) | (0.42) | |
| Local pandemic | 0.05 | 0.17 | 0.26⁎ | 0.06 | 0.074 |
| | (0.11) | (0.28) | (0.10) | (0.71) | |
| Government funding | 0.04 | 0.06 | 0.28⁎ | 0.09 | 0.083 |
| | (0.12) | (0.69) | (0.06) | (0.55) | |
| Financial markets | 0.02 | 0.17 | 0.29⁎⁎ | 0.14 | 0.110 |
| | (0.32) | (0.25) | (0.05) | (0.36) | |
Notes: The table presents the results of the ordinary least squares regressions examining the relation between topic sentiment and hype scores and the S&P 500 index returns: $R_t = \beta_0 + \beta_1\,\mathrm{Sentiment}_{i,t} + \beta_2\,\mathrm{Hype}_{i,t} + \beta_3\,R_{t-1} + \varepsilon_t$, where R_t is the return of the S&P 500 index in week t, t represents a week during a 47-week period from February 12 to December 31, 2020, and i represents one of the 15 topics described in Table 1. The number of observations is 47 in each regression. Two-tailed p values are shown in parentheses below the estimated coefficients. To conserve space, the regression results are presented only for the six topics with significant coefficients on the hype score.
⁎ Significant at the 0.1 level.
⁎⁎ Significant at the 0.05 level.
4. Summary
This study offers new empirical evidence on the relation between COVID-19-related news and stock market returns. By examining media sentiment and hype at the topic level, we find that the hype scores, which reflect both the breadth and intensity of coverage, rather than the sentiment scores, which reflect polarity, are significantly related to the S&P 500 index returns during 2020. In particular, the hype scores for Debt market and Financial markets have the strongest positive relation to stock market performance. These results suggest that when the sentiment scores are generally negative, the intensity of the topic coverage is more informative for predicting stock market performance. The results also imply that business media can play a role in building investors’ confidence in the path to economic recovery by increasing the coverage of COVID-19-related topics. As the COVID-19 pandemic continues beyond 2020, more research is needed to confirm the persistence of the reported relation between the hype of topics and stock market returns.
Acknowledgment
Financial support from the CPA Ontario Center for Public Policy and Innovation in Accounting is gratefully acknowledged.
Footnotes
¹ See Ravenpack, available at https://coronavirus.ravenpack.com/.
² The first two articles that mention COVID appear in the Wall Street Journal on February 12, 2020.
³ In topic modeling, documents exhibiting similar patterns are captured using similarity metrics and dimension reduction procedures with the flexibility for researchers to explore the underlying number of topics and their distinctiveness through influential words.
⁴ As a robustness check, we considered different numbers of topics, ranging from 6 to 20.
⁵ In unreported analysis, we ran the regressions of the topic sentiment and hype scores separately with qualitatively similar results.
⁶ We note that the number of observations (equal to 47) is relatively small and, therefore, interpret the results with caution.
⁷ To conserve space, the regression results are omitted for other topics for which the coefficients on sentiment and hype scores are not significant.
References
- Akhtaruzzaman M., Boubaker S., Sensoy A. Financial contagion during COVID–19 crisis. Finance Res. Lett. 2020;37 doi: 10.1016/j.frl.2020.101604.
- Al-Awadhi A.M., Alsaifi K., Al-Awadhi A., Alhamadi S. Death and contagious infectious diseases: Impact of the COVID-19 virus on stock market returns. J. Behav. Exp. Finance. 2020;27 doi: 10.1016/j.jbef.2020.100326.
- Albright R. SAS Institute Inc.; NC, U.S.A: 2004. Taming Text with the SVD.
- Alexander L., Johnson R., Weiss J. Exploring Zipf’s law. Teach. Math. Appl.: Int. J. IMA. 1998;17(4):155–158.
- Ali M., Alam N., Rizvi S.A.R. Coronavirus (COVID-19)—An epidemic or pandemic for financial markets. J. Behav. Exp. Finance. 2020;27 doi: 10.1016/j.jbef.2020.100341.
- Baig A.S., Butt H.A., Haroon O., Rizvi S.A.R. Deaths, panic, lockdowns and US equity markets: The case of COVID-19 pandemic. Finance Res. Lett. 2021;38 doi: 10.1016/j.frl.2020.101701.
- Bhattacharya U., Galpin N., Ray R., Yu X. The role of the media in the internet IPO bubble. J. Financ. Quant. Anal. 2009;44(3):657–682.
- Broadstock D., Zhang D. Social-media and intraday stock returns: The pricing power of sentiment. Finance Res. Lett. 2019;30:116–123.
- Chakraborty D.G., Pagolu M., Garla S. SAS Institute; NC, U.S.A: 2014. Text Mining and Analysis: Practical Methods, Examples, and Case Studies using SAS.
- Costola M., Iacopini M., Santagiustina C.R. Google search volumes and the financial markets during the COVID-19 outbreak. Finance Res. Lett. 2020 doi: 10.1016/j.frl.2020.101884.
- Fang L., Peress J. Media coverage and the cross-section of stock returns. J. Finance. 2009;64:2023–2052.
- Haroon O., Rizvi S.A.R. COVID-19: media coverage and financial markets behavior–A sectoral inquiry. J. Behav. Exp. Finance. 2020 doi: 10.1016/j.jbef.2020.100343.
- Hu M., Liu B. Proceedings of AAAI Conference on Artificial Intelligence, Vol. 4. 2004. Mining opinion features in customer reviews; pp. 755–760. Available online. https://www.aaai.org/Papers/AAAI/2004/AAAI04-119.pdf.
- Manning C.D., Schütze H. MIT Press; 1999. Foundations of Statistical Natural Language Processing.
- O’Donnell N., Shannon D., Sheehan B. Immune or at-risk? Stock markets and the significance of the COVID-19 pandemic. J. Behav. Exp. Finance. 2021;30 doi: 10.1016/j.jbef.2021.100477.
- Schell D., Wang M., Huynh T.L.D. This time is indeed different: A study on global market reactions to public health crisis. J. Behav. Exp. Finance. 2020;27 doi: 10.1016/j.jbef.2020.100349.
- Shi Y., Ho K.Y. News sentiment and states of stock return volatility: Evidence from long memory and discrete choice models. Finance Res. Lett. 2020;38
- Shiller R.J. Princeton university press; 2015. Irrational Exuberance: Revised and Expanded Third Edition.
- Smales L.A. News sentiment and the investor fear gauge. Finance Res. Lett. 2014;11:122–130.
- Tetlock P.C. Giving content to investor sentiment: The role of media in the stock market. J. Finance. 2007;62(3):1139–1168.
- Uhl M.W., Pedersen M., Malitius O. What’s in the news? Using news sentiment momentum for tactical asset allocation. J. Portf. Manag. 2015;41(2):100–112.
- Wisniewski T.P., Lambe B. The role of media in the credit crunch: The case of the banking sector. J. Econ. Behav. Organ. 2013;85:163–175.
- Zhang D., Hu M., Ji Q. Financial markets under the global pandemic of COVID-19. Finance Res. Lett. 2020;36 doi: 10.1016/j.frl.2020.101528.



