PLOS One. 2020 Sep 30;15(9):e0240010. doi: 10.1371/journal.pone.0240010

Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter

Philipp Wicke 1,*, Marianna M Bolognesi 2
Editor: Panos Athanasopoulos
PMCID: PMC7526906  PMID: 32997720

Abstract

Doctors and nurses in these weeks and months are busy in the trenches, fighting against a new invisible enemy: Covid-19. Cities are locked down and civilians are besieged in their own homes, to prevent the spreading of the virus. War-related terminology is commonly used to frame the discourse around epidemics and diseases. The discourse around the current epidemic makes use of war-related metaphors too, not only in public discourse and in the media, but also in the tweets written by non-experts in mass communication. We hereby present an analysis of the discourse around #Covid-19, based on a large corpus of tweets posted on Twitter during March and April 2020. Using topic modelling, we first analyze the topics around which the discourse can be classified. Then, we show that the WAR framing is used to talk about specific topics, such as the virus treatment, but not others, such as the effects of social distancing on the population. We then measure and compare the popularity of the WAR frame to three alternative figurative frames (MONSTER, STORM and TSUNAMI) and a literal frame used as control (FAMILY). The results show that while the FAMILY frame covers a wider portion of the corpus, among the figurative frames WAR, a highly conventional one, is the frame used most frequently. Yet, this frame does not seem to be apt to elaborate the discourse around some aspects of the current situation. We therefore conclude, in line with previous suggestions, that a plethora of framing options, or a metaphor menu, may facilitate the communication of the various aspects involved in Covid-19-related discourse on social media, and thus support civilians in expressing their feelings, opinions and beliefs during the current pandemic.

Introduction

On December 31, 2019, Chinese authorities alerted the World Health Organization of pneumonia cases in Wuhan City, within the Hubei province in China. The cause, they initially said, was unknown, and the disease was first referred to as 2019-nCoV and then named COVID-19. The next day, the Huanan seafood market was closed, because it was suspected to be the source of the unknown disease, as some of the patients presenting with the pneumonia-like illness were dealers or vendors at that market. Since then, the disease has spread quickly throughout China, and from there to the rest of the world. SARS-CoV-2 is the name of the virus responsible for the coronavirus pandemic that was still unfolding while the present article was being written. The virus has so far spread throughout all the inhabited continents and affected millions of people, killing thousands of individuals. Schools have been shut down, kids are still at home in many countries, and many citizens are now working remotely, locked down in their houses and leaving only for reasons of primary necessity, such as shopping for groceries and going to medical appointments.

With many countries implementing lockdowns and promoting quarantines, suggesting or forcing citizens to stay inside their homes in order to avoid spreading the virus, millions of people are experiencing a global pandemic for the first time in their lives. The social distancing enforced by various governments stimulated many internet users to turn to social media to communicate and express their own concerns, opinions, beliefs and feelings in relation to this new reality. On Twitter, tweets with hashtags such as #coronavirus, #Covid-19 or #Covid pile up quickly (for instance, we accumulated around 16,000 tweets per hour). A variety of issues are debated on a daily basis on Twitter, in relation to the pandemic. These include, but are not limited to, the political and social consequences of various governmental decisions, the situation in hospitals, which grow more crowded every day, the interpretation of the numbers associated with the spreading of the pandemic, the problems that families face with homeschooling their children while working from home, and so forth. Among these issues, the discussion around the treatment and containment of the virus is surely a central topic.

The present article aims at describing how the discourse around Covid-19 is framed on Twitter. In particular, we present a study that elucidates the main topics of the discourse around Covid-19 on Twitter and the extent to which the treatment of the disease is framed figuratively. Because previous research has shown that various social and political issues addressed in public discourse are framed in terms of wars [1], we assumed that this tendency may also emerge on Twitter, in relation to the discourse around Covid-19. Although Twitter contains messages written by journalists and other experts in mass communication, most tweets are produced by non-expert communicators. We investigated to what extent Twitter users, and therefore non-expert communicators, frame Covid-19 in terms of a war, and whether other figurative framings arise from automated analyses of our corpus of tweets containing virus-related hashtags.

In particular, we addressed the following research questions:

  1. What type of topics are discussed on Twitter, in relation to Covid-19?

  2. To what extent are the WAR figurative frame and the conventional metaphor DISEASE TREATMENT IS WAR used to talk about Covid-19 on Twitter? Which lexical units are used within this metaphorical frame, and which are not?

  3. Are there alternative figurative frames used to talk about Covid-19 on Twitter? And how does their use compare to the use of the WAR frame?

These three questions are addressed in the remainder of this paper in this same order. For each question we present methods, results and discussion of specific corpus-based analyses.

The innovative aspect of this paper lies in the quantitative nature of our observations and analyses of figurative framings used in pandemic-related discourse, and in the method used: topic modelling. In particular, the WAR frame, which previous studies have identified as pervasive in many crisis-related texts, is hereby investigated by means of automated methods (topic modelling) applied to real-world data. This is a new approach in cognitive linguistics and metaphor studies, where the analysis of figurative frames is typically based on qualitative observations or small-scale corpus analyses. By answering the research questions outlined above, our study opens the path to further investigations that may take a longitudinal perspective on the current reality, to investigate how the discourse around Covid-19 changes as new phases of the pandemic develop. The present results and their future developments provide important information in the field of opinion mining and can be used to understand the current state of mind, beliefs and feelings of various communities.

Theoretical background

Mining the information encoded by private internet users in the short texts posted on Twitter (the tweets) is becoming an increasingly fruitful field of research. In relation to health discourse, tweets have been used by epidemiologists to access supplementary data about epidemics. For example, tweets about particular diseases have been compared to gold-standard incidence data, showing that there are positive correlations between the number of tweets discussing flu symptoms and official statistics about the virus spread, such as those published by the Centers for Disease Control and Prevention and the Health Protection Agency [2]. Already a decade ago, tweets were used in Brazil to track the spread of dengue fever, a mosquito-transmitted virus [3]. More recently, Pruss and colleagues [4] applied a topic model to a large corpus of tweets to automatically identify and extract the key topics of discussion about the Zika disease, a virus that spread mainly in the Americas in early 2015. The authors also found that rises in tweeting activity tended to follow major events related to the disease, and that Zika-related discussions were moderately correlated with the virus incidence. Moreover, it has been demonstrated that combining data collected from hospitals about specific diseases with data collected from social media can improve surveillance and forecasting of the disease [5, 6].

Besides providing a valuable tool for tracking the spread of epidemics, and thus helping experts to make more effective decisions, social media have been used to investigate public awareness, attitudes and reactions regarding specific diseases [7, 8]. As Pruss and colleagues report in their review [4], the 2013 measles outbreak in the Netherlands, for example, was analyzed from this perspective by Mollema and colleagues [9], who compared the number of tweets (and other messages posted on social media) with the number of online news articles, as well as with the number of reported measles cases. They found a strong correlation between social media messages and news articles, and a mild correlation between the number of tweets and the number of reported measles cases. Moreover, through a topic analysis and a sentiment analysis of the tweets, they found that the most common opinion expressed in the tweets was frustration regarding people who do not vaccinate for religious reasons (the measles outbreak in the Netherlands began among Orthodox Protestants, who often refuse vaccination for religious reasons).

The 2014 Ebola outbreak in Africa was also used as a case study to mine the attitudes, concerns and opinions of the public, expressed on Twitter. For example, Lazard and colleagues [10] analyzed user-generated tweets to understand the main topics that concerned the American public when widespread panic ensued on US soil after one case of Ebola was detected. The authors found that the main topics of concern for the American public were the symptoms and lifespan of the virus, the disease transfer and contraction, whether it was safe to travel, and how people could protect their bodies from the disease. In relation to the same outbreak, Tran and Lee [11] built Ebola-related information propagation models to mine Ebola-related tweets and the information encoded therein, focusing on the distribution over six topics, broadly defined as: 1. Ebola cases in the US, 2. Ebola outbreak in the world, 3. fear and prayer, 4. Ebola spread and warning, 5. jokes, swearing and disapproval of jokes, and 6. impact of Ebola on daily life. The authors found that the second topic had the lowest focus, while the fifth and sixth had the highest.

More recently, tweets have been mined to understand the discussion around the Zika epidemic. Miller and colleagues [12] used a combination of natural language processing and machine learning techniques to determine the distribution of topics in relation to four characteristics of Zika: symptoms, transmission, prevention, and treatment. The authors managed to outline the most persistent concerns or misconceptions regarding the Zika virus, and provided a complex map of the topics that emerged from the tweets posted within each of the four categories. For example, in relation to the issue of prevention they observed the emergence of the following topics: need for control and prevention of spread, need for money, ways to prevent spread, bill to get funds, and research. Vijaykumar and colleagues [13] analyzed how content related to the Zika disease spreads on Twitter, thanks to tweet amplifiers and retweets. The authors found that, of the 12 themes taken into account, Zika transmission was the most frequently discussed on Twitter. Finally, Pruss and colleagues [4] mined a corpus of tweets in three different languages (Spanish, Portuguese and English) with a multilingual topic model and identified key topics of discussion across the languages. The authors reported that the Zika outbreak was discussed differently around the world, and that the topics identified were distributed in different ways across the three languages.

In cognitive linguistics, and in particular in metaphor studies, public discourse is often analyzed in relation to different figurative and literal communicative frames. We “frame” a topic when we “select some aspects of a perceived reality and make them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” [14, p.53]. Metaphors are often used to talk about different aspects of diseases, such as their treatment, their outbreak and their symptoms. The framing power of metaphor is particularly relevant in health-related discourse, because it has been shown that it can impact patients’ general well-being. For example, in a seminal study Sontag [15] criticized the popular use of war metaphors to talk about cancer, a topic of research recently investigated also by Semino and colleagues [16]. As these authors explain, the military metaphor that we tend to use to talk about the development, spreading and cure of cancer inside the human body has been repeatedly rejected by cancer patients as well as by many relatives and doctors, who indicate that such framing provokes anxiety and a sense of helplessness that can have negative implications for cancer patients. In a series of experiments, for example, Hendricks and colleagues [17] found that framing a person’s cancer situation within the war metaphor, and therefore as a battle, has the consequence of making people believe that the patient may feel guilty in the case that the treatment does not succeed. Conversely, framing the cancer situation as a journey encourages the inference that the patient will experience less anxiety about her health condition.

The military metaphor commonly used to talk about diseases such as cancer is very common in public discourse [1]. According to Karlberg and Buell [18], 17% of all articles published in Time Magazine between 1981 and 2000 contained at least one war metaphor. The war metaphor is not used solely to frame the discourse around diseases, but also the discussion around political campaigns, crime, drugs and poverty. As explained in [1], war metaphors are pervasive in public discourse and span a wide range of topics because they provide a very effective structural framework for communicating and thinking about abstract and complex topics. Moreover, this frame is characterized by a strong negative emotional valence. In the specific case of diseases, the war metaphor is typically used to frame the situation relative to the treatment of the disease. As indicated in MetaNet, the Berkeley-based structured repository of conceptual metaphors and frames [19], the metaphor can be formalized as DISEASE TREATMENT IS WAR, or TREATING DISEASE IS WAGING WAR (https://metaphor.icsi.berkeley.edu/pub/en/index.php/Metaphor:DISEASE_TREATMENT_IS_WAR). Within this metaphor, a variety of mappings can be identified, including: the diseased cells are enemy combatants, medical professionals are the army of allies, the body is the battlefield, medical tools are weapons, and applying a treatment is fighting.

The figurative frame of WAR, used in discourses around diseases, is certainly a conventional one, frequently and often unconsciously used. As argued by [1], such a frame is handy and frequently used because it draws on basic knowledge that everyone has, even though for most people this is not knowledge coming from first-hand experience. Moreover, this frame expresses in an exemplary way the urgency associated with a very negative situation, and the necessity for actions to be taken in order to achieve a final outcome quickly. The outcome can be either positive or negative, in a rather categorical way. The inner structure of the frame is also relatively simple, with opposing forces clearly labelled as in-groups and out-groups, or allies and enemies. Each force has a strategy to achieve a goal, which involves risks and can potentially be lethal. For these reasons, this frame is arguably very well suited to appear in the discourse around Covid-19, as previously observed in relation to other diseases. The adversarial relationship between doctors and the virus, the different goals pursued by the two forces, and the human body as the battlefield for this operation are possible mappings that we seek to track down with our analysis of Covid-19-related tweets.

Despite the undebatable frequency with which public discourse around diseases uses war metaphors, this frame is sometimes not well received, as mentioned above, and war-related metaphors can be opposed for various reasons. In recent weeks, an increasing number of blog posts and articles for the general public have confronted and opposed the use of military language to talk about the pandemic, providing arguments that range from the blindness that war metaphors generate toward alternative ways of solving problems, to the xenophobia, fear and anxiety that these metaphors foster in the population. For example, [20] argued that “to adopt a wartime mentality is fundamentally to allow for an all-bets-are-off, anything-goes approach to emerging victorious. And while there may very well be a time for slapdash tactics in the course of weaponized encounters on the physical battlefield, this is never how one should endeavor to practice medicine.” [21] claimed that “using a war narrative to talk about COVID-19 plays into the hands of white supremacist groups. U.S. officials and the media should stop it.” [22] explained that using a WAR frame breeds fear and anxiety, divides communities, compromises democracies and may legitimize the use of actual military actions. [23] foreshadowed “shifts towards dangerous authoritarian power-grabs, as in Hungary, where Prime Minister Viktor Orbán seized wide-ranging emergency powers and the ability to rule by decree”, as a consequence of war-related language in the current situation.

As we will further elaborate in the Discussion section, in some cases the press deliberately opposes the war frame, advancing alternative figurative frames. Tracking down alternatives to the war frame in a qualitative manner is a recent endeavor initiated by scholars in cognitive linguistics and corpus analysis on Twitter. The hashtag #ReframeCovid (first proposed, to the best of our knowledge, by two Spanish scholars, Inés Olza and Paula Sobrino) has recently been used to harvest texts such as articles, advertisements and notes showing how the virus has been opposed and framed in alternative ways by a few journalists and writers. Notably, the discourse has been reframed using lexical units related to the domains of FOOTBALL, GAMES, STORMS and so forth. In this paper, we explored the structure and functioning of alternative frames too, in a corpus-based analysis of tweets about Covid-19, and compared them to the WAR frame as well as to a literal frame, the FAMILY frame. In this case, it should be mentioned that although FAMILY may be used as a metaphorical frame to talk, for example, about nations (“founding fathers”, “daughters of the American revolution”, “sending our sons to war” and so forth, see [24] for an extensive discussion), in the discourse around Covid-19 family-related words are typically used in their literal meaning, to talk for example about family members affected by the virus and family dynamics being disrupted by the measures taken in response to the pandemic (e.g., the lockdown).

Study design

To address our three research questions, we first explored the range of topics addressed in the discourse on Covid-19 on Twitter using a topic modelling technique. Next, we explored the actual usage of the WAR frame and examined which topics (among the topics identified in the first part of the study) are more frequently framed within the metaphor of WAR. To do so, we compiled a list of war-related lexical units and ran it against our corpus of Covid-19 tweets, observing and discussing which lexical units of the frame were used in the tweets, and within which topics. Finally, we explored alternative frames that could be used to frame the discourse around Covid-19 on Twitter. To do so, we compiled lists of lexical units for selected alternative frames (three figurative frames and one literal frame) and compared the percentages by which they appear in the corpus of tweets against the percentages by which the WAR frame is used. To conclude, we replicated our analysis on a new corpus of tweets collected, following the same criteria, in the weeks after the collection of the first corpus, as well as on an existing resource, the “Coronavirus Tweets Dataset” by Lamsal [25], which became available during the revision process. Lamsal’s dataset is a constantly updated repository of tweet IDs. The collection of those IDs is based on English tweets that include at least one of 90+ hashtags and keywords commonly used when referencing the pandemic.

Constructing the corpus of Covid-19 tweets

In order to identify tweets that relate to the Covid-19 epidemic, we defined a set of relevant hashtags used to talk about the virus: #covid19, #coronavirus, #ncov2019, #2019ncov, #nCoV, #nCoV2019, #2019nCoV, #COVID19. Using Twitter's official API in combination with the Tweepy python library (tweepy.org), for 14 days we collected 25,000 tweets per day that contained at least one of the hashtags and were not retweets. The tweets were collected in accordance with the Twitter terms of service. Two main restrictions of those terms of service motivated our decision to limit the extent of our corpus: firstly, the free streaming API only allows access to up to one week of Twitter's history; secondly, there is a limit of 180 requests per 15-minute window.
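As an illustration of this collection step, the sketch below shows how such a stream could be set up with Tweepy 3.x; the credentials, listener name and daily quota are placeholders, and this is not the authors' published script.

    # Minimal sketch of the collection step, assuming Tweepy 3.x and valid API
    # credentials; illustrative only.
    import tweepy

    HASHTAGS = ["#covid19", "#coronavirus", "#ncov2019", "#2019ncov",
                "#nCoV", "#nCoV2019", "#2019nCoV", "#COVID19"]

    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")  # placeholder credentials
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

    class CovidListener(tweepy.StreamListener):
        """Collects original (non-retweet) statuses until a daily quota is reached."""

        def __init__(self, limit=25_000):
            super().__init__()
            self.collected = []
            self.limit = limit

        def on_status(self, status):
            if hasattr(status, "retweeted_status"):
                return True  # skip retweets, keep listening
            text = (status.extended_tweet["full_text"]
                    if hasattr(status, "extended_tweet") else status.text)
            self.collected.append((status.user.id_str, status.created_at, text))
            return len(self.collected) < self.limit  # returning False stops the stream

    stream = tweepy.Stream(auth=auth, listener=CovidListener())
    stream.filter(track=HASHTAGS, languages=["en"])  # English tweets with the hashtags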

To balance our corpus, we needed to consider how much a single tweet or a single user weighs on the overall corpus. For example, a scientific analysis of fake news spread during the 2016 US presidential election showed that about 1% of users accounted for 80% of fake news, and reported other research suggesting that 80% of all tweets can be linked to the top 10% of most active tweeting users [26]. In other words, there are Twitter users who tweet a lot, and Twitter users who tweet seldom. This may be problematic when looking at the frequency distribution of word uses. For example, a specific Twitter user who is very fond of sci-fi issues might use the MONSTER framing very frequently in their tweets, also when tweeting about Covid-19. Keeping all the tweets by this user might have biased the frequency distribution of MONSTER-related words. For the purpose of our study, we were interested in exploring the relative uses of different frames in the discourse around Covid-19 on Twitter, rather than in the absolute percentages of use of the different frames on Twitter. Therefore, we constructed our corpus to be representative and balanced, as well as manageable from a computational perspective. To do so, we retained only one tweet per user and dropped retweets. Keeping only one tweet per user allowed us to balance compulsive tweeters and less involved Twitter users. Table 1 reports, for each day, the cumulative number of tweets collected and the cumulative number of tweets retained after discarding tweets from users who had already tweeted. This implies that only the first contribution of each user was retained: we kept the first tweet, on a given day, by users whose tweets we had not collected yet. Consequently, all 203,756 tweets are from unique tweeters. For example, the third data column of Table 1 shows that by 22.03.2020 we had collected 75,000 tweets in total, of which 57,073 were from unique tweeters.
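The per-user filtering described above can be expressed compactly; the following sketch is illustrative, and the field names are assumptions rather than the authors' actual data schema.

    def filter_one_per_user(tweets):
        """Keep only the first tweet seen from each user, in collection order."""
        seen_users = set()
        kept = []
        for tweet in tweets:  # each tweet: dict with 'user_id' and 'text' (assumed fields)
            if tweet["user_id"] in seen_users:
                continue  # this user already contributed a tweet
            seen_users.add(tweet["user_id"])
            kept.append(tweet)
        return kept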

Table 1. Dates of collection for tweets containing hashtags related to the Covid-19 epidemic.

Date 20.03.2020 21.03.2020 22.03.2020 23.03.2020 24.03.2020 25.03.2020 26.03.2020
Filtered / Collected 20,316/25,000 39,284/50,000 57,073/75,000 73,346/100,000 89,785/125,000 103,614/150,000 118,866/175,000
Date 27.03.2020 28.03.2020 29.03.2020 30.03.2020 31.03.2020 01.04.2020 02.04.2020
Filtered / Collected 132,995/200,000 146,654/225,000 156,775/250,000 167,847/275,000 180,234/300,000 191,278/325,000 203,756/350,000

The filtered tweets are single tweets per user; 25,000 tweets were collected per day, and both rows report cumulative totals.

As the streaming API started collecting tweets at 23:59 CET of each day and was limited to English, our corpus encompasses mainly tweets produced by users residing in the USA, where the time of data collection corresponds to awake hours and the targeted language corresponds to the first language of most US residents, according to the American Community Survey (ACS). The total number of collected tweets from unique tweeters over 14 days (20.03.2020 to 02.04.2020) is 203,756. Overall, 41.78% of the collected tweets came from users who had already tweeted and were therefore filtered out.

Given our research questions and our aims, we did not analyze the dynamics involved in retweetings and mentions on Twitter and neither did we provide an analysis of usernames, hashtags or URLs.

In compliance with the privacy rights of Twitter [27], we have only collected the tweet along with a timestamp. In order to comply with Twitter's content redistribution policy, it is not possible to make any information other than the Tweet IDs publicly available. We have therefore stored our data as tweet IDs. This dataset is publicly available in the online repository on OSF, retrievable at the following url: https://osf.io/bj5a6/?view_only=1644595a66dd4adebeeb6b2bb0449c89.

General corpus analytics

The corpus encompassed 203,756 tweets, in which the 30 most common words, excluding stopwords and online tags (e.g. “&amp”, “https”) are:

people (19153), us (13368), get (11270), like (10451), time (10263), help (10091), need (9993), cases (9205), home (9044), stay (8788), new (8752), one (8725), friends (8465), please (8232), pandemic (7614), support (7255), know (6931), going (6788), realdonaldtrump (6659), times (6462), world (6451), health (6449), day (6153), family (6010), go (5986), trump (5967), work (5862), would (5705), today (5602), take (5532)

In this list, the number in brackets represents the frequency of occurrence, also visualized in Fig 1, where a larger word print indicates a more frequent occurrence in the corpus. The most frequent word is “people” with about 19k occurrences, followed by “us” with about 13k occurrences. It should be mentioned that we cannot distinguish whether “us” means “United States” or the pronoun “us”.
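A frequency list of this kind can be reproduced with a simple counter; the sketch below is a minimal illustration, and the stopword source (Gensim's built-in list) is an assumption, since the authors also removed online tags such as “&amp” and “https”.

    from collections import Counter
    from gensim.parsing.preprocessing import STOPWORDS  # assumed stopword source

    def most_common_words(tweet_texts, n=30):
        counts = Counter()
        for text in tweet_texts:
            tokens = [t.lower().strip("#@.,!?:;\"'") for t in text.split()]
            counts.update(t for t in tokens
                          if t and t not in STOPWORDS and not t.startswith("http"))
        return counts.most_common(n)  # list of (word, frequency) pairs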

Fig 1. Word cloud of the most common words in the corpus of over 200k collected tweets with at least one hashtag relating to the covid19 epidemic.


What type of topics are discussed on Twitter, in relation to Covid-19?

Identifying topics in Covid-19 discourse on Twitter through topic modeling

A topic model is a generative statistical model for discovering “topics” that occur in a collection of documents. The topics identified by topic modelling are based on the occurrences of semantically related words. For example, “ball”, “strike”, “bat”, “catcher”, “hitter”, “diamond” and “fastball” are likely to appear in documents about baseball. A document typically concerns multiple topics in different proportions, and such proportions are reflected in the probability of the words related to each topic. The topics produced by topic modeling techniques are therefore probability clusters of related words, which are unlabeled (the model returns the cluster of words, not the name of such cluster and thus of the topic). Topics therefore need to be interpreted and labeled by the analysts.

By comparison, in communication sciences a frame is typically defined as consisting of two elements [28]: elements in a text, such as words, used as framing devices, and (latent) information used as reasoning devices, through which a problem, cause, evaluation, and/or treatment is implied. A topic operationalized by topic modelling, in this sense, corresponds to the first of these two components of a frame, that is, a list of semantically coherent words used as framing devices.

In order to extract and identify topics from the corpus, we used a Latent Dirichlet Allocation algorithm (henceforth, LDA) [29]. LDA is an unsupervised machine learning algorithm that aims to describe samples of data in terms of heterogeneous categories. It is mostly used to identify categories in documents of text and is thus appropriate for identifying topics within the Covid-19 corpus of tweets. The study reported by Pruss and colleagues on the corpus of tweets related to the Zika epidemic [4], for example, used the same algorithm to identify topics within the corpus. For the purpose of our study we used the Gensim LDA-Multicore algorithm, which allowed us to parallelize the training of our data on multiple CPUs. As an unsupervised learner, LDA needs to be given the number of topics that it will try to divide the data into. Our exploratory approach covered a search space of several different numbers of topics, thus varying the level of granularity represented within each topic. We hereby report the results obtained from the division of the data into a relatively small number of topics (N = 4) and a relatively large number of topics (N = 16), to show and compare a less granular and a more granular division of the data. We expected to find broader and more generic concepts listed in the first analysis (N = 4) and more specific concepts in the fine-grained topic analysis (N = 16). These two numbers of clusters were chosen by investigating the data and are backed up by our post-hoc analysis of the LDA coherence measures. The preprocessing phase encompassed the following steps:

  • converting each tweet into a list of tokens (using Gensim’s simple_preprocess function)

  • removing tokens with less than 3 characters (e.g. “aa”, “fo”, “#o”)

  • removing stopwords from the list of tokens (including updated stopwords from Stone et al. [30] and Twitter-specific stopwords)

  • removing Covid-19 words from the list of tokens (e.g. “covid”, “nCov”, “coronavirus” etc)

  • turning the tokens into a bag-of-words, i.e. a list of tuples with the token and its number of occurrences in the corpus

We excluded terms like “coronavirus”, “covid”, “corona”, “virus” or “nCov19” from the topic modeling because they do not add information about the topics themselves. The preprocessing resulted in a list of 415,329 tokens, that is, inflected word forms. We did not lemmatize the corpus, nor POS-tag it, for the purpose of our study, because different forms of a lemma can express different metaphor scenarios and should therefore be preserved. Hence, for example, gerundive forms of verbs, as well as plural forms of nouns, are present in the corpus, and the list of frame-related words is also composed of inflected word forms and not simple lemmas. Additionally, we trained another LDA model with a tf-idf (term frequency-inverse document frequency) version of the tokens. The tf-idf assigns a statistical relevance to each token based on how many times the token occurs and on the inverse document frequency (a measure of whether the token is rare or common in the corpus) of that token. As its results did not add any further insight to our research, we included this model in the online repository but do not discuss it further in the current paper.
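To make the pipeline concrete, the following is a minimal sketch of the preprocessing and training steps described above, assuming Gensim; the stopword set, the Covid-term exclusion list and the variable names are simplifications of, not a copy of, the authors' code.

    from gensim.utils import simple_preprocess
    from gensim.corpora import Dictionary
    from gensim.models import LdaMulticore
    from gensim.parsing.preprocessing import STOPWORDS

    COVID_TERMS = {"covid", "corona", "coronavirus", "virus", "ncov"}  # simplified exclusion list

    def preprocess(tweet):
        tokens = simple_preprocess(tweet)  # tokenize and lowercase
        return [t for t in tokens
                if len(t) >= 3 and t not in STOPWORDS and t not in COVID_TERMS]

    docs = [preprocess(t) for t in tweet_texts]          # tweet_texts: raw tweet strings
    dictionary = Dictionary(docs)
    bow_corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words representation

    lda_4 = LdaMulticore(bow_corpus, num_topics=4, id2word=dictionary, passes=6, workers=3)
    lda_16 = LdaMulticore(bow_corpus, num_topics=16, id2word=dictionary, passes=6, workers=3)
    print(lda_4.print_topics(num_words=10))              # word/weight lists per topic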

Topic model analysis

Dividing the corpus into four topics through the LDA, we obtained, for each topic, a list of words and their weights (importance). Fig 2 shows the word clouds, with larger words signaling greater significance. Except for topic #II, all of the other topics included the word “pandemic” among their most important words and showed a strong overlap. The weights (importance) and words for each topic allocated by the LDA model with N = 4 topics were the following:

Fig 2. Word clouds from the N = 4 LDA topic modeling, with larger words signaling greater significance.


LDA (N = 4, 6 passes):

  • Topic #I: 0.008 pandemic, 0.005 news, 0.004 data, 0.004 update, 0.004 world, 0.004 youtube, 0.003 information, 0.003 latest, 0.003 today, 0.003 april

  • Topic #II: 0.029 times, 0.028 friends, 0.027 family, 0.022 share, 0.021 italy, 0.021 trying, 0.014 support, 0.014 sign, 0.013 stand, 0.011 colleagues

  • Topic #III: 0.014 people, 0.013 cases, 0.011 trump, 0.009 realdonaldtrump, 0.008 like, 0.006 china, 0.006 world, 0.005 deaths, 0.005 pandemic, 0.004 going

  • Topic #IV: 0.010 home, 0.008 help, 0.008 stay, 0.007 people, 0.007 time, 0.007 need, 0.007 pandemic, 0.006 health, 0.006 work, 0.005 safe

The results for 16 topics, displayed in Fig 3, showed a much greater diversity among the classes.

Fig 3. Depiction of the word clouds for each of the 16 topics clustered by the LDA.


Discussion

As previously mentioned, the LDA algorithm does not provide labels for the topics. The interpretation of the topics is left to the analysts. The topics identified by LDA analyzed above and visualized in Fig 2 can be labeled as follows:

  • Topic #I: Communications and Reporting.

  • Topic #II: Community and Social Compassion.

  • Topic #III: Politics.

  • Topic #IV: Reacting to the epidemic.

The sixteen-topic LDA model provided a more fine-grained view of topics that can be related to the 4 general topics. In the field of Communication and Reporting, we observed finer distinctions in topics #4, #11 and partly #15. Topic #11, in particular, is more focused on “World”, “Trump” and “China”, while topic #4 specifically encompasses “News”, “Lockdown”, “Press”, and “Media”. In the domain of Community and Social Compassion, topic #3 is very close to topic #II. Topics #13, #16 and #5, in turn, relate to the quarantine, self-isolation and, in general, Reacting to the Epidemic (#IV).

There are also some novel topics around treatment and medical needs (#1, #6, #7), around testing (#10) and around working or studying from home (#2, #9 and parts of #12). Rather unrelated to the whole epidemic, a conglomerate of words can be found in topic #8 and in parts of #12.

Finally, we provide an interactive online tool to explore the results of the LDA models for the 4 topic LDA model (https://bit.ly/3dCczfr) and for the 16 topic LDA model (https://bit.ly/3gUx5tU). These renderings have been produced using pyLDAvis [31].
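For reference, renderings of this kind can be produced with a few lines of code; the sketch below assumes the models trained in the earlier sketch and the pyLDAvis Gensim adapter (named pyLDAvis.gensim in older releases and pyLDAvis.gensim_models in newer ones).

    import pyLDAvis
    import pyLDAvis.gensim_models as gensimvis  # pyLDAvis.gensim in older versions

    vis_4 = gensimvis.prepare(lda_4, bow_corpus, dictionary)
    pyLDAvis.save_html(vis_4, "lda_4_topics.html")  # interactive HTML visualization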

To what extent is the WAR figurative frame used to talk about Covid-19 on Twitter?

Determining lexical units associated with the WAR frame

To investigate to what extent users use the WAR frame to talk about Covid-19, we needed to assess the number of tweets in our corpus that use war language. To explore the lexical units associated with the WAR frame we took a double approach, using two tools. The first tool was the web service relatedwords.org. This web service provides a list of words (inflected word forms, not lemmas) related to a target word. The list is ranked through competing scores by several algorithms, one of which finds similar words in a word embedding [32], while another queries ConceptNet [33] to find words with meaningful relationships to the target word. Choi and Lee [34] used the same web service to expand the list of categories used to model conceptual representations for crisis-related tweets. The list of words retrieved on relatedwords was adapted to our purpose. As a matter of fact, the list featured words such as “franco-prussian war” or “aggression”. The former is a specific type of war and includes the term “war” itself; we dropped any specific war and any term that includes a compound of war, e.g. “state of war”. The latter term, “aggression”, is too broad and not closely related to the target word. Additionally, in cases of doubt, we checked the term in an online dictionary to verify its relation to the war framing. The second tool used to prepare the list of lexical units related to the WAR frame was the MetaNet repository of conceptual metaphors and frames housed at the International Computer Science Institute in Berkeley, California [19]. Here, from the WAR frame (https://metaphor.icsi.berkeley.edu/pub/en/index.php/Frame:War) we selected the 12 words that were not yet included in the selection of lexical units based on relatedwords. Moreover, we dropped compound units that included words already present in the list (e.g., “combat zone”, because we already featured the word “combat”) and two misspelled units (“seige” instead of “siege” and “beseige” instead of “besiege”). The total number of lexical units for the WAR framing was 91:

WAR (91): allied, allies, armed, armies, army, attack, attacks, battle, battlefield, battleground, battles, belligerent, bloodshed, bomb, captured, casualties, combat, combatant, combative, conflict, conquer, conquering, conquest, crusade, defeat, defend, defenses, destruction, disarmament, enemies, enemy, escalation, fight, fighter, fighting, foe, fortify, fought, grenade, guerrilla, gunfight, holocaust, homeland, hostilities, hostility, insurgency, invaded, invader, invaders, invasion, liberation, military, peace, peacetime, raider, rebellion, resist, resistance, riot, siege, soldier, soldiers, struggle, tank, threat, treaty, trench, trenches, troops, uprising, victory, violence, war, warfare, warrior, wars, wartime, warzone, weapon, alliance, ally, arsenal, blitzkrieg, bombard, front, line, minefield, troop, vanquish, vanquishment.

A methodological clarification should be made regarding the identification of lexical units used metaphorically in our corpus. In cognitive linguistics and metaphor studies, the procedure usually adopted for the reliable identification of words used metaphorically in linguistic corpora is MIPVU [37]. This procedure is applied manually, by multiple annotators, in content analyses where analysts make decisions about the metaphoricity of each lexical unit encountered in the text, based on information retrieved from dictionaries. Despite its high reliability, because it is performed manually this method cannot be applied to large corpora like the corpus of tweets on which the current study is based. For this reason, we opted for the following procedure: we assumed that war-related lexical entries would be used metaphorically rather than literally within tweets about Covid-19, and we qualitatively confirmed our intuition by manually inspecting a subsample of tweets. We acknowledge that in the tweets we have not manually checked, some war-related words may have been used literally, for example to talk about soldiers getting infected with Covid-19 while in service. However, we believe that this phenomenon characterizes a negligible number of tweets, compared to the number of tweets in which war-related lexical entries are used metaphorically. Nonetheless, we acknowledge this limitation of our approach and hope that further studies will account for this possibility, possibly comparing the metaphorical and literal uses of war-related terms in Covid-19 discourse.

In order to understand where in relation to our predicted LDA topics the WAR frame was located, we collected all tweets that mentioned at least one term of the WAR frame and asked the LDA model to predict its topic. This way, we could identify the topics with the most or the least terms related to WAR.
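A minimal sketch of this prediction step is given below, reusing the names from the earlier sketches (docs, dictionary, lda_4); the abbreviated WAR_TERMS set stands in for the full list reported above.

    import numpy as np

    WAR_TERMS = {"war", "fight", "battle", "enemy", "combat"}  # abbreviated; full list above

    war_docs = [d for d in docs if WAR_TERMS & set(d)]  # tweets with at least one WAR term
    topic_probs = np.zeros(lda_4.num_topics)
    for doc in war_docs:
        bow = dictionary.doc2bow(doc)
        for topic_id, prob in lda_4.get_document_topics(bow, minimum_probability=0.0):
            topic_probs[topic_id] += prob
    topic_probs /= len(war_docs)  # average predicted probability per topic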

WAR framing results

Analyzing all tweets from the database, a total of 10,846 tweets contained at least one term from the WAR framing, which is 5.32% of all tweets. Of these, 1,253 tweets had more than one war-related term. The 20 most common war terms found in our database are reported below, with their percentage relative to all war-term occurrences and their number of occurrences:

WAR: fight (29.76%, 3228), fighting (10.65%, 1155), war (10.08%, 1093), combat (5.89%, 639), threat (5.13%, 556), battle (4.19%, 454), front line (3.82%, 414), military (3.61%, 392), peace (3.43%, 372), attack (2.95%, 320), enemy (2.61%, 283), defeat (2.51%, 273), violence (2.12%, 230), attacks (1.44%, 156), struggle (1.34%, 145), resist (1.23%, 133), soldiers (1.23%, 133), weapon (1.20%, 130), victory (0.95%, 103), wars (0.95%, 103)

Words that were virtually absent (or had very limited usage) in the context of Covid-19 on Twitter were: combatant (2x), combative (2x), disarmament (2x), gunfight (2x), invader (2x), treaty (2x), bombard (2x), minefield (2x), belligerent (1x), guerilla (1x), insurgency (1x), vanquish (1x), conquest (0x), blitzkrieg (0x), vanquishment (0x).

LDA topic prediction of WAR tweets

The LDA model can predict the probability that a document belongs to a certain topic of the corpus. We therefore used this prediction method to investigate which topics are relevant for the tweets that feature WAR terms. To this end, we tokenized all tweets that contain at least one WAR term and used both of our LDA models to suggest which of the four and sixteen topics those WAR-related tweets most likely belong to. For the four-way topic model, the resulting distribution is reported in Fig 4.

Fig 4. LDA-predicted average probability of a WAR term contributing to one of 4 topics.


For the sixteen-way topic model, the resulting distribution is reported in Fig 5. This figure shows that lexical units belonging to the WAR domain, and therefore tweets that relate to the WAR frame, are most likely to be found in tweets that belong to topics IV and I, and partly III (in the macro distinction of topics), and in tweets that belong to topics 2, 7 and 10 in the fine-grained distinction.

Fig 5. LDA-predicted average probability of a WAR term contributing to one of 16 topics.


Discussion

The results show that 5.32% of all tweets contain war-related terms and are therefore likely to frame the discourse around Covid-19 metaphorically, in terms of a war. While it is hard to evaluate in absolute terms the impact that this frame has on the overall discourse around Covid-19, we show in the next sections how the WAR frame compares to the usage of three other figurative frames, as well as to a literal frame.

The specific words within the WAR frame that appeared most often in the tweets were “fight”, “fighting”, the word “war” itself, “combat”, “threat”, and “battle”. All these words carry a very negative valence and denote aspects of war that relate to actions and events. This is probably due to the stage of the pandemic that we are in, that is, the emergency situation, and the related urgent need to take action and confront the negative situation. We cannot exclude that this tendency may change once the pandemic moves into a different stage. In particular, once the emergency has passed and we move toward the next phase, leaving behind the peak of the death and infection rates, the most frequent words used in relation to the WAR frame might relate to the identification of strategies to keep ourselves safe and to defend our community from potential new attacks.

In relation to the topic modelling of the war-related tweets, we showed that tweets featuring war-related terms are most likely to belong to topics IV, I and III, rather than to topic II. Interestingly, topic IV addresses aspects related to the reactions to the epidemic, including the measures proposed by the governments and taken by the people, such as self-isolating, staying at home, protecting our bodies and so forth. Our analysis therefore suggests that war-related words are used to express aspects of the Covid-19 epidemic related to the measures needed to oppose (fight!) the virus. Moreover, tweets that feature war-related words are also often classified within topics I and III, which include aspects related to communications and reports about the virus, and to politics. We interpret these results as suggesting that public communications and political messages are likely to frame the discourse within the WAR framing. Finally, it may not come as a surprise that topic II, which encompasses aspects of the discourse related to the familiar sphere, the community and social compassion, does not relate well to the tweets containing war terms.

The fine-grained analysis into 16 topics shows some interesting trends as well. In particular, tweets containing war-related terms are particularly well represented in topics 2, 7, and 10. Topic 2 seems to relate to online learning and education; topic 7 encompasses aspects related to the treatment of the virus, with words such as “workers”, “health”, “care”, “help”, “thank”, “need”, “support”. Similarly, topic 10 relates to the diagnostics and treatment of the virus, with words such as “positive”, “death”, “cases”, “tested”, “people”, “confirmed”. Therefore, as the MetaNet WAR metaphor suggests, and as we described in the Theoretical Background of this paper, it is the discourse around the disease treatment and its diagnostics that is likely to be framed figuratively in terms of a war. Conversely, topic 3, which is characterized by words like “friends”, “share”, “trying”, “family”, “time” and therefore addresses intimate social relations and personal affective aspects related to Covid-19, is not related to the WAR frame: tweets addressing these aspects do not employ military lexical units.

We note that an LDA model uses randomness in its training and inference; training a new model with the same parameters will therefore always yield slightly different topic distributions, and there are different ways of, and limitations to, analyzing such variation [35].
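For reproducibility of a single run, Gensim's LDA accepts a seed; the line below is a sketch of that option, reusing the names of the earlier training sketch.

    # Fixing random_state makes one training run repeatable, although different seeds
    # (or different runs without a seed) still yield slightly different topics.
    lda_4 = LdaMulticore(bow_corpus, num_topics=4, id2word=dictionary,
                         passes=6, workers=3, random_state=42)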

Are there alternative figurative frames used to talk about Covid-19 on Twitter?

Search method for alternative framings and relevant lexical units therein

In order to identify whether or not the war framing is particularly relevant, we explored alternative framings used in discourses on viruses. For this purpose we used the metaphor exploration web service by [36], called MetaphorMagnet (http://bonnat.ucd.ie/metaphor-magnet-acl). Using the keywords “virus” and “epidemic”, we selected the following alternative frames, which could in principle be used to frame the discourse around Covid-19: STORM, MONSTER and TSUNAMI. These figurative frames have also been reported within the crowdsourced observations collected by the #ReframeCovid initiative on Twitter. Other possible figurative frames are reported within this initiative too; these include, for example, GAME and the sub-frame SOCCER GAME, used in the Spanish press according to the community of Spanish cognitive linguists. However, lexical units such as “game”, “football”, “soccer”, “game season”, and so on, are likely to be used literally in the tweets, to refer to the fact that all sport events and thus all games have been suspended due to the epidemic. Another frame that has been observed in the press and tagged as #ReframeCovid is the FLOOD frame. However, through a quick search on MetaphorMagnet, we realized that this frame shares too many lexical units with STORM and TSUNAMI, and it was therefore discarded. Moreover, we observed that the word lists of the frames STORM and TSUNAMI contain shared words. However, dictionary definitions of these two terms suggest that the two phenomena are quite different. For example, the Macmillan online dictionary defines STORM as “an occasion when a lot of rain falls very quickly, often with very strong wind or thunder and lightning”. Conversely, TSUNAMI is defined as “a very large wave or series of waves caused when something such as an earthquake moves a large quantity of water in the sea”. Because the two concepts denote different phenomena, the fact that they share a few words does not constitute a redundancy.

In order to select the lexical units within each of the alternative frames, we used the relatedwords tool, already used for the WAR frame, for consistency. However, because these alternative frames are arguably less conventionalized, none of them is included in the list of frames on MetaNet. Thus, relatedwords was the only tool we used to harvest lexical units for the alternative frames. We created three lists of lexical units:

STORM (57): thunderstorm, rain, lightning, snowstorm, blizzard, wind, hurricane, weather, rainstorm, typhoon, tempest, precipitation, beaufort, snow, cyclone, meteorology, hail, hailstorm, windstorm, flooding, thunder, tornado, monsoon, rainfall, rage, force, disaster, ice, storm, atmospheric, disturbance, wildfire, clouds, firestorm, ramp, tornadoes, fog, winds, rains, waves, landfall, thunderhead, duststorm, tides, gusts, floodwaters, wave, cloud, swells, cloudburst, anticyclone, downpour, sandstorm, stormy, whirlwinds, storms, oceanographic.

MONSTER (51): freak, demon, devil, giant, ogre, fiend, zombie, frankenstein, bogeyman, werewolf, horror, mutant, creature, dragon, superhero, goliath, behemoth, monstrosity, colossus, legend, evil, lusus, naturae, mouse, beast, boogeyman, leviathan, dracula, monstrous, teratology, villain, killer, ghost, gigantic, siren, superman, vampire, undead, psycho, monster, chimera, godzilla, fiction, mythology, mutation, demoniac, manatee, mermaid, monsters, spider, bug.

TSUNAMI (50): earthquake, disaster, tide, oceans, calamity, catastrophe, tragedy, wavelength, wind, period, cataclysm, flood, eruption, tidal, seiche, quake, thucydides, floods, floodwater, cyclone, devastation, ocean, surface, wave, coastlines, typhoon, waves, hurricane, magnitude, aftershock, mudslide, seafloor, richter, seawall, seismic, landslide, tsunamis, aftershocks, flooding, torrential, earthquakes, deepwater, triggering, tsunami, tremors, mudslides, riptide, rains, whirlpool, pacific.

As for the WAR frame, we ran these lists against our corpus and compared the frequency of occurrences within the corpus across the different framings.
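The matching itself amounts to checking, for every tweet, whether it contains at least one lexical unit of a frame. The sketch below illustrates this, where FRAMES is a hypothetical mapping from frame names to the word lists given in this section; multi-word units such as “front line” would additionally require substring matching on the raw text.

    def frame_coverage(docs, frame_terms):
        """Return the number and percentage of tokenized tweets containing a frame term."""
        frame_terms = set(frame_terms)
        hits = sum(1 for doc in docs if frame_terms & set(doc))
        return hits, 100 * hits / len(docs)

    for name, terms in FRAMES.items():  # e.g. {"WAR": [...], "STORM": [...], ...}
        n, pct = frame_coverage(docs, terms)
        print(f"{name}: {n} tweets ({pct:.2f}%)")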

The literal frame of FAMILY used as control

To evaluate the relevance of the figurative frames in the corpus of tweets, we compared the occurrence of the lexical units listed therein with those listed within a frame that we expected to occur in the literal sense: the FAMILY frame. The word list of lexical entries related to this frame encompasses the following words:

FAMILY (66): marriage, household, kin, house, kinfolk, home, lineage, kinship, parent, relative, clan, cousin, children, child, sister, mother, father, uncle, nephew, brother, grandson, son, grandfather, grandmother, kinsfolk, ancestor, consanguinity, tribe, sibling, subfamily, kindred, stepfamily, couple, family, sib, foster, parentage, menage, phratry, folk, daughter, kinsperson, aunt, grandma, granddaughter, grandaunt, stepbrother, niece, stepson, dad, stepdaughter, stepfather, wife, husband, daddy, parents, elder, daughters, mom, siblings, stepmother, grandpa, grandparents, relatives, widow, spouse.

Alternative framing results

The terms belonging to the frame STORM were found in 3,036 tweets (1.49% of all tweets). The terms in the MONSTER frame were found in 1,382 tweets (0.68% of all tweets). The terms in the TSUNAMI frame were found in 2,304 tweets (1.13% of all tweets). The terms in the literal frame (FAMILY) were found in 24,568 tweets (12.06% of all tweets). The difference between the frequencies of occurrence of the frames, and in particular of the sets of words related to each frame, is statistically significant (Cochran's Q test statistic = 47,226.72, df = 4, p < 0.001). We then looked at the distribution of the frequencies with which the terms within each framing were used and observed that they all tended to follow Zipf distributions (see Fig 6, where the term “fight” from the WAR frame has more than 3,000 occurrences in the tweets). In other words, within each frame there were a few words used very frequently and many words used rarely. Moreover, although this is not visible in the plot in Fig 6, the online repository stores the full list of lexical units within each frame. Among the top-ranked units for the FAMILY frame we found “home”, “family”, “house”, “children”, “parents”, “wife”, “son”, and “mom”. For the STORM frame, among the most frequently used lexical units we found “force”, “disaster”, “weather”, “ice”, “wave”, “storm”, “cloud”, and “rain”. For the MONSTER frame, among the most frequently used lexical units we found “evil”, “horror”, “killer”, “giant”, “monster”, “legend”, “ghost”, “zombie”, “devil”, “fiction”, “bug” and “beast”. Finally, for the TSUNAMI frame, among the most frequently used lexical units we found “period”, “disaster”, “wave”, “tragedy”, “catastrophe”, and “waves”.
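The significance test reported above can be reproduced, under the assumption that statsmodels' cochrans_q implementation is used, by arranging the data as a binary matrix with one row per tweet and one column per frame (FRAMES and docs as in the earlier sketches):

    import numpy as np
    from statsmodels.stats.contingency_tables import cochrans_q  # assumed implementation

    frame_names = list(FRAMES)
    indicator = np.array([[1 if set(FRAMES[f]) & set(doc) else 0 for f in frame_names]
                          for doc in docs])   # 1 = tweet contains at least one frame term
    result = cochrans_q(indicator)
    print(result.statistic, result.pvalue)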

Fig 6. Five histograms depicting the occurrences of terms for each frame within the corpus.


Also, the total number of words within each frame was different, with the WAR frame featuring more words than the other figurative frames. In order to compare how frequently the WAR frame was used in Covid-19 discourse relative to the other possible figurative frames (and the literal frame), it was necessary to have lists of lexical units of the same length, because longer lists could, in principle, have matched larger numbers of tweets than shorter lists. Therefore, we evaluated two subsets of the term lists for each framing, setting cutoffs at N = 30 and N = 50 terms on each list. In this way, we only considered the top 30 and then top 50 most relevant (i.e., most frequently used) terms within each frame. We then compared the number of tweets featuring words from these lists, which were now comparable in length.

Table 2 reports the number of tweets featuring at least one lexical unit related to a frame, and the general percentage of tweets in the corpus that can be related to these frames. Results showed that the literal frame FAMILY is substantially more frequently used in the discourse on Covid-19 than the figurative frames. However, among figurative frames, the WAR frame covered a higher portion of the tweets in our corpus than the other figurative frames. The table also shows that there is no substantial difference between the coverages of the corpus obtained using the 30 words and the 50 words lists of lexical units for each frame.

Table 2. Proportions of tweets that contain at least one of the terms from each of the frames with term list size N = 30 and N = 50.

Frame # of tweets with at least 1 word from 30-item list Percentage of tweets over the whole corpus (30 terms) # of tweets with at least 1 word from 50-item list Percentage of tweets over the whole corpus (50 terms) Total Tweets
WAR 10,107 4.96% 10,704 5.25% 203,756
FAMILY 24,269 11.91% 24,563 12.06% 203,756
STORM 3,017 1.48% 3,035 1.49% 203,756
MONSTER 1,348 0.66% 1,382 0.68% 203,756
TSUNAMI 2,217 1.09% 2,304 1.13% 203,756

Replication studies

Given the timeliness of this study, our first analysis was based on a corpus of tweets covering a two-week period. During the submission and review process, more data (more tweets) obviously became available. We therefore replicated our analysis comparing the different frames with new data. First, we constructed an additional corpus of tweets like the first one, with tweets produced in the two weeks that followed the timeframe of the first corpus. Second, we replicated our study using an external dataset with more than 1.2 million tweets, which became available during the revision process of the current article. The choice of tweets to be collected from the external dataset was constrained by the criteria that define our own corpus: no retweets, only one tweet per unique tweeter, English, from 20.03.2020 to 20.05.2020, and a maximum memory limit (due to hardware constraints). The external corpus included 1,213,420 tweets from Lamsal's Coronavirus Tweets Dataset [25] over two months. Fig 7 presents an overview of the comparison and Table 3 provides the descriptive statistics and the results of the Cochran Q tests.
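Because Lamsal's resource distributes only tweet IDs, building the external corpus requires “hydrating” those IDs and then applying the same filters as above. The sketch below illustrates one way to do this with Tweepy 3.x (statuses_lookup accepts up to 100 IDs per request); it is an assumption about the procedure, not the authors' code.

    import tweepy

    api = tweepy.API(auth, wait_on_rate_limit=True)  # auth as in the collection sketch

    def hydrate(tweet_ids):
        """Fetch full tweets for a list of IDs, keeping English originals only."""
        tweets = []
        for i in range(0, len(tweet_ids), 100):
            batch = tweet_ids[i:i + 100]
            for status in api.statuses_lookup(batch, tweet_mode="extended"):
                if status.lang == "en" and not hasattr(status, "retweeted_status"):
                    tweets.append((status.user.id_str, status.full_text))
        return tweets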

Fig 7. Comparison of the two corpora for five frames and two time spans (2 weeks, 2 months).


Each bar indicates the percentage of a frame within the respective corpus.

Table 3. Results of the comparison of the two corpora for five frames and two time spans.

Corpus (time span)  WAR  FAMILY  STORM  MONSTER  TSUNAMI  Total tweets  Cochran Q statistic (df = 4)
W&B (2 weeks)  5.32%  12.06%  1.49%  0.68%  1.13%  203,756  Q = 47,226.72, p < 0.0001
Lamsal (2 weeks)  6.94%  8.6%  1.06%  0.70%  1.05%  401,582  Q = 57,159.11, p < 0.0001
W&B (2 months)  5.54%  9.95%  1.67%  0.67%  1.25%  654,354  Q = 110,616.87, p < 0.0001
Lamsal (2 months)  6.55%  8.14%  1.24%  0.72%  1.24%  1,213,420  Q = 173,630.43, p < 0.0001
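The Cochran Q statistics in Table 3 test, within each corpus, whether the five frames are matched at equal rates across tweets. Below is a minimal sketch of how such a statistic can be computed from a tweets-by-frames binary indicator matrix; the data layout and function name are our own assumptions, and the original analysis may have relied on a statistics package instead.

```python
import numpy as np
from scipy.stats import chi2

def cochrans_q(indicators):
    """Cochran's Q for an (n_tweets x n_frames) 0/1 matrix, where
    indicators[i, j] = 1 if tweet i contains a term from frame j."""
    x = np.asarray(indicators, dtype=float)
    k = x.shape[1]                      # number of frames (here, 5)
    col_totals = x.sum(axis=0)          # tweets matched per frame
    row_totals = x.sum(axis=1)          # frames matched per tweet
    grand_total = x.sum()
    numerator = (k - 1) * (k * np.sum(col_totals ** 2) - grand_total ** 2)
    denominator = k * grand_total - np.sum(row_totals ** 2)
    q = numerator / denominator
    p = chi2.sf(q, df=k - 1)            # chi-square approximation, df = k - 1
    return q, p
```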

As Fig 7 shows, the distribution of the five frames across the first corpus of tweets is very similar to the distributions observed in the replication studies. The increase of about 0.22 percentage points in the WAR framing from our first corpus (W&B, 2 weeks) to the second corpus (W&B, 2 months) could be partially explained by new debates entering the discourse as the epidemic developed.

The comparison between our data and the data extracted from the “Coronavirus Tweets Dataset” by Lamsal over the same time frame shows that the relative order of proportions is the same as in our analysis: FAMILY > WAR > STORM, TSUNAMI and MONSTER. The differences in these proportions (the FAMILY proportion is 3.46 percentage points lower, the WAR proportion 1.62 points higher) can be explained by the different keywords used to acquire the Lamsal dataset. This dataset, in fact, has been constructed using keywords that we did not use to construct our dataset, notably the keyword “Corona”, arguably a more colloquial expression that we chose not to include in our set of keywords. Moreover, the keywords in Lamsal’s dataset changed multiple times during the data-mining process, whereas we kept the same set of keywords day after day.

Discussion

Our results show that the literal frame used as control (FAMILY) covers a wider portion of the tweets in the corpus, while the figurative frames cover substantially fewer tweets. This is not particularly surprising, as previous literature shows that metaphor-related words cover only a fraction of discourse overall, and that literal language remains prevalent. Steen and colleagues [37], for example, report that literal language covers 86.4% of lexical units, while metaphor-related words cover just 13.6%. Their analysis is based on a sub-corpus of the BNC encompassing 187,570 lexical units extracted from academic texts, conversations, fiction and news texts. All parts of speech are included in their analyses, including function words (such as prepositions and articles). Based on these statistics, we would expect around 13% of the lexical units in our corpus to be used metaphorically. This percentage would need to include the pervasive metaphorical uses of function words such as prepositions, as well as all words used metaphorically, whatever figurative frame they relate to. From this perspective, we believe that the percentage of use of the WAR frame reported in our study suggests that this frame is particularly frequent. In our study, which is however based on a limited number of possible figurative frames, lexical entries related to the WAR frame cover more than one third of all the words attributed to the metaphorical frames investigated here.

Within the FAMILY (literal) frame, the top words (i.e., the most frequent words) used in the tweets denote family members and family relations. Within the STORM frame, the most frequently used words seem to denote concrete entities that can typically be observed within a storm scenario. In general, from a qualitative standpoint, it can be observed that the different frames are used to tackle different aspects associated with Covid-19. Words in the STORM and TSUNAMI frames seem to relate to events and actions associated with the arrival and spreading of the pandemic (e.g., “wave”, “storm”, “tide”, “tsunami”, “disaster”, “tornado”). Words within the MONSTER framing, instead, are mostly nouns and are arguably used to frame the discourse about the behavior of the virus in a rather personified way, loaded with emotional content and extremely negative valence (e.g., “devil”, “demon”, “horror”, “monster”, “killer”). This phenomenon, overall, supports the idea that different frames are apt to elaborate the discourse around different aspects related to a topic, and that multiple frames are therefore more likely to enable the effective description and discussion of the different aspects of the Covid-19 reality.

Finally, the series of replication analyses conducted on new and alternative corpora shows that the results reported here are similar across corpora and therefore consistent. Small variations between our corpora and the resource provided by Lamsal may be due to the keywords used to construct the datasets. Conversely, differences between the initial 2-week corpus and the corpora covering a longer time span may be due to a change in the discourse, reflecting the natural evolution of the pandemic. From this perspective, future research, which we are currently pursuing, will trace longitudinally the development of the different topics and figurative frames used in the discourse around Covid-19, week after week.

General discussion and conclusion

In this study we explored the discourse around Covid-19 in its manifestation on Twitter. We addressed three specific research questions: 1. What are the topics around which the Twitter discourse revolves in relation to Covid-19? 2. To what extent is the WAR framing used to model the Covid-19 discourse on Twitter, and in relation to which specific topics does this figurative framing emerge? 3. How does the WAR framing compare to other potentially relevant figurative framings related to the discourse on viruses, and to the literal framing FAMILY?

In general, we found that the topics around which most of the Twitter discourse revolves, in relation to Covid-19, can be labelled as Communications and Reporting; Community and Social Compassion; and Politics and Reacting to the epidemic. A more fine-grained analysis brings to light topics related to the treatment of the disease, mentioning the people involved in this operation, such as doctors and nurses, and topics related to the diagnostics of the virus. We also found that these specific topics appear to be those in which the WAR frame is particularly relevant: most lexical units within this frame are found in tweets that are automatically classified within the specific topics of virus treatment and diagnostics. Moreover, in relation to the second research question, we observed that a small number of war-related lexical units are used very frequently, while the majority of war-related words are not used to frame the discourse around Covid-19. The more frequently used words refer to actions and events, such as “fighting”, “fight”, “battle”, and “combat”. As we anticipated, this might be a peculiarity of the stage of the pandemic we are currently living through, namely the peak of the emergency. We do not exclude that, as the pandemic develops and we move to the next phase (i.e., leaving the peak), the most frequent words used within the WAR frame will also change, exploiting new aspects of this frame that are relevant to the new situation. Finally, in relation to the third research question, we compared the frequency with which the WAR frame, the FAMILY literal frame and three other figurative frames are used. We found that while the FAMILY literal frame used as control covers a wider portion of the corpus, among the alternative figurative frames analyzed (MONSTER, STORM and TSUNAMI) the WAR frame is the most frequently used to talk about Covid-19, and thus, arguably, the most conventional one, as previous literature also suggests.

It should be mentioned that the current study is based on a corpus of tweets (and then replicated on other corpora of tweets) constructed on the basis of precise methodological criteria. Notably, we dropped retweets from our corpus and retained only one tweet per user, to avoid the bias introduced by super-tweeters, that is, users (sometimes bots) who tweet many times a day and who may have monopolized the sample of tweets used for our analyses. These two operations, which were motivated by methodological requirements, on the one hand made our corpus arguably more robust, balanced and representative of the phenomena under investigation, but on the other hand set aside some peculiarities of Twitter. As a matter of fact, Twitter as a social network typically encompasses retweets (that is, duplicated tweets, which can sometimes be retweeted thousands of times) and features super-tweeters. Additionally, information propagated by super-tweeters can “go viral” or gain popularity, as measured by likes and retweets. Therefore, a limitation of our study is that our findings may not reflect the actual distribution of topics and figurative frames on Twitter as a social media network per se. Rather, we argue, our findings reflect the way in which a wide selection of American-English speakers conceptualize and talk about Covid-19 on Twitter. In this sense, the approach adopted in the present study is embedded in common practices used in cognitive linguistics, discourse analysis and corpus linguistics. We acknowledge that in scientific fields such as social media monitoring, criteria such as preserving the dynamics that specifically characterize the Twitter platform, including retweets and super-tweeters, are particularly important. In these fields the construction of the sample of tweets (the corpus) would have been performed differently, including retweets and without controlling for super-tweeters.

Taken together, our results support what has previously been argued in discourse analysis, namely the relative pervasiveness of the WAR frame in shaping public discourse. In our study we show that this tendency also applies to the discourse on Covid-19, as previous literature would have predicted, given the frequent use of this frame in discourses on diseases and viruses. However, we have also found that this frame is used to talk about specific aspects of the current epidemic, such as its treatment and diagnostics; other aspects involved in the epidemic are not typically framed in terms of WAR. This point is particularly important. The WAR frame, like any other frame, is useful and apt for talking about some aspects of the pandemic, such as the treatment of the virus and the operations performed by doctors and nurses in hospitals, but not for talking about other aspects, such as the need to feel our family close to us while respecting the social distancing measures, or the collaborative efforts we should undertake in order to #flattenthecurve, that is, to dilute the spreading of the virus over a longer period of time so that hospital ICU departments can work efficiently without being saturated by incoming patients. In this sense, future studies could focus on the systematic identification of alternative figurative framings actually used in the Covid-19 discourse to tackle different aspects of the epidemic, but could also focus on the generation of additional frames that can help communities understand and express aspects of this situation that cannot be expressed by the WAR frame. A collection of different frames and metaphors that tackle different aspects of the current situation, or a Metaphor Menu (http://wp.lancs.ac.uk/melc/the-metaphor-menu/), as Semino and colleagues proposed in relation to cancer discourse [15], is arguably the most desirable set of communicative tools that, as language, communication, and computer scientists, we should aim to construct at the present time, as a service to our communities.

Acknowledgments

The authors would like to thank all doctors, nurses, health-care workers, grocery store workers and anyone else at the front line of this epidemic.

Data Availability

All data files are available from the open science framework repository (https://osf.io/bj5a6/?view_only=b46ed9663a98461dac3a9430e3954e10).

Funding Statement

The author(s) received no specific funding for this work. PW's affiliated University (UCD) provides the funding for the publication fees.

References

  • 1.Flusberg SJ, Matlock T, Thibodeau PH. War metaphors in public discourse. Metaphor and Symbol. 2018; 33(1): 1–18. 10.1080/10926488.2018.1407992 [DOI] [Google Scholar]
  • 2.Culotta A. Towards Detecting Influenza Epidemics by Analyzing Twitter Messages. Proceedings of the First Workshop on Social Media Analytics. SOMA’10. New York, NY, USA: ACM; 2010. p. 115–122.
  • 3.Gomide J, Veloso A, Meira W Jr, Almeida V, Benevenuto F, Ferraz F, et al. Dengue Surveillance Based on a Computational Model of Spatio-temporal Locality of Twitter. Proceedings of the 3rd International Web Science Conference. WebSci’11. New York, NY, USA: ACM; 2011. p. 3:1–3:8.
  • 4.Pruss D, Fujinuma Y, Daughton AR, Paul MJ, Arnot B, Szafir DA, et al. Zika discourse in the Americas: A multilingual topic analysis of Twitter. PloS one, 2019. 14(5). 10.1371/journal.pone.0216922 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Paul MJ, Dredze M, Broniatowski D. Twitter Improves Influenza Forecasting. PLoS Currents Outbreaks. 2014. 10.1371/currents.outbreaks.90b9ed0f59bae4ccaa683a39865d9117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Santillana M, Nguyen AT, Dredze M, Paul MJ, Nsoesie EO, Brownstein JS. Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. PLOS Comput Biol. 2015;11(10):e1004513 10.1371/journal.pcbi.1004513 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Ji X, Chun SA, Geller J. Monitoring public health concerns using Twitter sentiment classifications. IEEE International Conference on Healthcare Informatics; 2013. pp. 335–344.
  • 8.Smith M, Broniatowski DA, Paul MJ, Dredze M. Towards Real-Time Measurement of Public Epidemic Awareness: Monitoring Influenza Awareness through Twitter. AAAI Spring Symposium on Observational Studies through Social Media and Other Human-Generated Content; 2016.
  • 9.Mollema L, Harmsen IA, Broekhuizen E, Clijnk R, De Melker H, Paulussen T, et al. Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in the Netherlands in 2013. Journal of Medical Internet Research. 2015; 17(5): e128 10.2196/jmir.3863 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Lazard AJ, Scheinfeld E, Bernhardt JM, Wilcox GB, Suran M. Detecting themes of public concern: A text mining analysis of the Centers for Disease Control and Prevention’s Ebola live Twitter chat. American Journal of Infection Control. 2015. 10.1016/j.ajic.2015.05.025 [DOI] [PubMed] [Google Scholar]
  • 11.Tran T, Lee K. Understanding citizen reactions and Ebola-related information propagation on social media. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 2016. pp. 106–111.
  • 12.Miller M, Banerjee T, Muppalla R, Romine W, Sheth A. What Are People Tweeting About Zika? An Exploratory Study Concerning Its Symptoms, Treatment, Transmission, and Prevention. JMIR public health and surveillance. 2017; 3(2):e38 10.2196/publichealth.7157 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Vijaykumar S, Nowak G, Himelboim I, Jin Y. Virtual Zika transmission after the first U.S. case: who said what and how it spread on Twitter. American Journal of Infection Control. 2018; 10.1016/j.ajic.2017.10.015 [DOI] [PubMed] [Google Scholar]
  • 14.Entman R. M. (1993). Framing: Toward clarification of a fractured paradigm. Journal of Communication, 43(4), 51–58. 10.1111/j.1460-2466.1993.tb01304.x [DOI] [Google Scholar]
  • 15.Sontag S. Illness as Metaphor. London: Allen Lane; 1979. [Google Scholar]
  • 16.Semino E, Demjén Z, Demmen J, Koller V, Payne S, Hardie A, et al. The online use of Violence and Journey metaphors by patients with cancer, as compared with health professionals: a mixed methods study. BMJ Supportive & Palliative Care. 2017; 7(1): 60–66. 10.1136/bmjspcare-2014-000785 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Hendricks RK, Demjén Z, Semino E, Boroditsky L. Emotional implications of metaphor: Consequences of metaphor framing for mindset about cancer. Metaphor and Symbol. 2018; 33(4): 267–279. 10.1080/10926488.2018.1549835 [DOI] [Google Scholar]
  • 18.Karlberg M, Buell L. Deconstructing the ‘war of all against all’: The prevalence and implications of war metaphors and other adversarial news schema in TIME, Newsweek, and Maclean’s. Journal of Peace and Conflict Studies. 2005; 12(1): 22–39. [Google Scholar]
  • 19.Dodge EK, Hong J, Stickles E. MetaNet: Deep semantic automatic metaphor analysis. Proceedings of the Third Workshop on Metaphor in NLP. 2015.
  • 20.Wise A. Scientific American. 2020. https://blogs.scientificamerican.com/observations/military-metaphors-distort-the-reality-of-covid-19/
  • 21.Henderson K. Counterpunch. 2020. https://www.counterpunch.org/2020/04/24/trump-is-not-a-wartime-president-and-covid-19-is-not-a-war/
  • 22.Tisdall S. The Guardian. 2020 https://www.theguardian.com/commentisfree/2020/mar/21/donald-trump-boris-johnson-coronavirus
  • 23.Musu C. The Conversation. 2020. https://theconversation.com/war-metaphors-used-for-covid-19-are-compelling-but-also-dangerous-135406
  • 24.Lakoff G. Moral Politics: How Liberals and Conservatives Think. Chicago: University of Chicago Press; 1996.
  • 25.Lamsal R. Coronavirus (COVID-19) Tweets Dataset. IEEE Dataport. 2020. 10.21227/781w-ef42 [DOI] [Google Scholar]
  • 26.Grinberg N, Joseph K, Friedland L, Swire-Thompson B, Lazer D. Fake news on Twitter during the 2016 US presidential election. Science, 2019. 363(6425), 374–378. 10.1126/science.aau2706 [DOI] [PubMed] [Google Scholar]
  • 27.Williams ML, Burnap P, Sloan L. Towards an ethical framework for publishing Twitter data in social research: Taking into account users’ views, online context and algorithmic estimation. Sociology, 2017. 51(6), 1149–1168. 10.1177/0038038517708140 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Joris W, d’Haenens L, Van Gorp B. The euro crisis in metaphors and frames: Focus on the press in the Low Countries. European Journal of Communication. 2014. 29(5), 608–617. [Google Scholar]
  • 29.Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. Journal of machine Learning research, 2003. 3(Jan), 993–1022. [Google Scholar]
  • 30.Stone B, Dennis S, Kwantes PJ. Comparing methods for single paragraph similarity analysis. Topics in Cognitive Science 2011. 3(1), 92–122. 10.1111/j.1756-8765.2010.01108.x [DOI] [PubMed] [Google Scholar]
  • 31.Sievert C, Shirley K. LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces. 2014. pp. 63–70.
  • 32.Le Q, Mikolov T. Distributed representations of sentences and documents. In International conference on machine learning 2014 (pp. 1188–1196).
  • 33.Liu H, Singh P. ConceptNet—a practical commonsense reasoning tool-kit. BT technology journal, 2004. 22(4), 211–226. [Google Scholar]
  • 34.Choi WG, Lee KS. Conceptual Representation for Crisis-Related Tweet Classification. Computación y Sistemas. 2019. 23(4). [Google Scholar]
  • 35.Chang J, Gerrish S, Wang C, Boyd-Graber JL, Blei DM. Reading tea leaves: How humans interpret topic models In Advances in neural information processing systems. 2009. pp. 288–296. [Google Scholar]
  • 36.Veale T. A service-oriented architecture for metaphor processing. Proceedings of the Second Workshop on Metaphor in NLP. 2014. pp. 52–60.
  • 37.Steen GJ, Dorst AG, Herrmann JB, Kaal AA, Krennmayr T, Pasma T. A Method for Linguistic Metaphor Identification: From MIP to MIPVU. Amsterdam: John Benjamins; 2010. [Google Scholar]

Decision Letter 0

Panos Athanasopoulos

18 May 2020

PONE-D-20-10986

Framing COVID-19 How we conceptualize and discuss the pandemic on Twitter

PLOS ONE

Dear Mr. Wicke,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I have benefitted from reviews by five colleagues who are all leading experts in this area. As such, you will be delighted to see that the reviewers have provided incredibly constructive and clear feedback, not only on technical issues relating to methodology, but also on the interpretation of the findings in the grand scheme of things. I invite you to engage thoroughly with every one of the reviewers’ points, as I am sure this will result in a stronger contribution.

We would appreciate receiving your revised manuscript by Jul 02 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Panos Athanasopoulos, Ph.D

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements:

1.    Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please include additional information about your dataset and ensure that you have included a statement specifying whether the collection method complied with the terms and conditions for the websites from which you have collected data.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

Reviewer #4: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: I Don't Know

Reviewer #4: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper analyzes Tweets that are used to talk about Covid-19 in three studies. The first identifies topics of these tweets. The second quantifies the prevalence of WAR metaphors in general, and across the topics identified in the first study. The third explores alternative figurative frames for the virus, along with one non-figurative frame (FAMILY). It is a timely study with interesting methods and results. I enjoyed reading it. I have suggestions for a revision, along with some methodological questions detailed below.

Introduction

1. The introduction could be better connected with the methods used in and research questions explored in the studies. Why is it interesting and important to determine what topics are discussed on Twitter in relation to Covid-19 or how frequently WAR and other metaphors are used? For example, could the topic modeling help to improve surveillance and forecasting about the disease? What frequency of WAR metaphors should we expect? How and why were the specific alternative frames chosen? Are there relevant theories in cognitive linguistics that the current work informs?

2. The emphasis and major strength of the paper seems to be the focus on WAR metaphors. The prevalence of WAR metaphors is quantified for an important issue, by topic, and in the context of other metaphoric frames (and a non-metaphoric frame). To highlight these strengths, I would encourage the authors to emphasize the novelty of quantifying the WAR frame using automated methods and real world data. Maybe switching the order of studies 1 and 2 would be helpful? (The results of Study 1 are difficult to interpret on their own).

3. The “Theoretical Background” section describes past work that is certainly interesting and relevant, but it doesn’t really identify theoretically motivated questions that the current work is well-suited to address. In most cases, it emphasizes practical applications and real world issues and/or methods (e.g., the relationship between measuring public sentiment on Twitter and addressing public health issues). Maybe rename this section for clarity.

Smaller points related to the introduction:

4. On p. 3 the authors note that around “16K Tweets are posted by Twitter users every hour, containing a hashtag such as #coronavirus, #Covid-19 or #COVID.” Is this based on the current data collection or a metric that is computed by Twitter? What is the temporal window for this statistic?

5. It seems like there is a typo on p. 4: “Unlike the articles on magazines and journals typically used for corpus analyses of this kind, Twitter does contain messages written by journalists and other experts in mass communication, as most tweets are provided by non-expert communicators.” Maybe “as” should be “although”?

Methods

1. Please clarify the following questions:

a. How were the 25k tweets per day selected? It seems like the algorithm picked up the first 25k tweets that included one of the specified hashtags per day, but that’s not explicitly stated. How long did it typically take to get to 25k tweets?

b. Were retweets included?

c. For a given person who tweeted multiple times in a day about the virus, was it only their first tweet that was included? What was the filtering criteria/method?

d. Were Tweets in languages other than English included?

2. It should be possible to clarify some of the hedging in the sentence, “our corpus arguably encompasses mainly tweets produced by users residing in the USA, where the time of data collection corresponds to awake hours, and the targeted language corresponds to the first language of many (if not most) US residents” (p. 12). I realize that the location data was stripped from the text before it was stored, but the text is initially tagged with the location data, so maybe it is possible to estimate the percentage of tweets from the US. Language use questions are asked on the US census every 10 years and roughly 80% of US residents report speaking English at home. A citation would make the case stronger.

3. How were the topic numbers (n = 4 vs. 16) decided in the first study? I find the results of this study hard to interpret. I think they are more informative when presented in concert with the results of study 2.

4. Categorizing language as metaphorical or not is tricky. There is a fair amount of work on this and the most common approach uses expert coders (see, e.g., Steen et al, 2010, A Method for Linguistic Metaphor Identification). I think the approach taken in the paper is reasonable but I think some limitations should be acknowledged. For example, there would probably be some disagreement by coders over which conceptual metaphors individual instances of “fight” appeal to (war vs. boxing vs. games, etc) and even whether or not particular instances are metaphorical or not.

5. Include some discussion about the relationship between the STORM and TSUNAMI categories. At first glance it seems like all instances of TSUNAMI should also be instances of STORM, but the current approach establishes these categories as mostly (completely?) non-overlapping.

6. I like the comparison of the metaphorical frames to the FAMILY non-metaphorical frame, but the FAMILY frame also seems fairly different in that it is more of a topic than a frame (i.e. in some ways more akin to the topics identified in Study 1). I don’t think this needs to be changed, but it seems worthy of a little discussion.

7. Include a note on how comparisons will be made. No inferential statistics are presented, which is common in cognitive linguistics, but the question about whether the number of cases is meaningfully different by frame or topic type will likely arise for readers.

Results

1. It’s hard to interpret the results of Study 1. Are these topics the ones we would expect? Do they inform theory? Do they inform practice? Could they have come out any other way?

2. What inferences can we draw about the relationship between WAR metaphors and topics from the results of Study 2?

3. If space is an issue, I think the comparison of the 30 vs 50 terms approaches on pp 26-28 could be cut.

4. Small point: There is no General Discussion section, although it is alluded to on p. 24.

5. Small point: Include a citation (and, ideally, a more precise statistic) in the sentence “…as previous literature shows that metaphor-related words cover only a percentage of the discourse, and that literal language is still prevalent” (p. 28). What percentage? Are there other metaphors that might be prevalent?

6. It would be nice to ground some of the qualitative observations noted in the discussion. For example: “Words in the STORM and TSUNAMI frames seem to relate to events and actions associated with the arrival and spreading of the pandemic…” (p. 28).

7. The paper ends by introducing the idea of a Metaphor Menu, which is interesting but it doesn’t logically fall out of the current study in my opinion. Maybe this idea could be discussed a little more.

Reviewer #2: The authors examined the (metaphorical) content of tens of thousands of English tweets surrounding the Covid-19 pandemic, scraped from two recent weeks from largely American twitter users. Topic modeling revealed several common themes (4 and also 16; more on this below), and that war metaphors were somewhat common (~4% prevalence), for some topics more than others, and appeared more frequently than other metaphorical domains. They argue that this is consistent with other empirical and theoretical research on the use of war metaphors in public discourse, but they now provide evidence this extends to everyday lay discourse online.

In general, I thought this was a timely article dealing with an important topic of interest to a variety of scholars, and a nice extension of previous work and theoretical musings on the use and prevalence of war metaphors. I think the methods and analyses were thoughtful and for the most part sound (though, as I detail below, I was confused about some of the details), and the results were solid. That said, I have a variety of comments and concerns that I think the authors should address in a revision before the manuscript is considered for publication.

One overarching concern is that the paper feels like it was rushed to submission and therefore the writing and overall organization are not quite up to the standards of a publishable manuscript. I understand the authors’ sense of urgency in getting this paper out there while the global pandemic crisis is still at its peak, and that they literally wrote it over the past few weeks, but I think extra care needs to be taken during any revision process to make sure the writing is improved. There were many grammatical and punctuation errors throughout the paper, along with confusing sentences and shifts in tense (if they want to refer to “now,” they should stick with phrasing like “at the time of writing,” which they were not consistent with throughout the paper). At times it was difficult to follow the logic of their thinking or make sense of some of the details of the methods and analyses.

One of the issues is mostly organizational: the authors chose to frame their work as three “studies,” presenting the “methods” for each first, followed by the results for each, etc. I found this structure to be confusing and hard to follow, as I had to jump back and forth between methods and results and discussion sections to remember what was done (and why) as I proceeded. At one point they discuss the topic modeling results, for example, but it had been so long since they had discussed the methods, and then they waited until the subsequent section to actually give the topics meaningful labels (which they never do in the main text for the 16 topics). It was very hard to keep track of everything because of this structure.

As the research itself really strikes me as one single study, not three, but with many analytic components, I think the authors should restructure the paper in a more logical, linear fashion. For example, they could still preview the whole set of big questions and their approach in the introduction, and then the main sections could be each question in turn, with meaningful headings/subheadings rather than traditional “Study 1” and “methods” headings.

So, they could still start by describing the procedures for gathering the data from twitter and the organization of the dataset. A sub-heading in that section could be something like “Themes in the data: Topic modeling” where they go through all of the methods, results, AND discussion and labels (each with their own subheadings…) for the topic modeling. Then they can move on to a section about defining their WAR (AND alternative!) dictionaries and analyses and discussion, and then conclude with their general discussion. I think something like this would help make the flow of the paper clearer and more effective.

Some additional comments:

While the authors reviewed a good amount of research on war metaphors, they neglected to discuss any of the dozens of articles that have been written very recently about the war metaphor framing for Covid-19 (and its pluses and minuses), in both mainstream and independent outlets online. I think citing and discussing at least some of these would help situate the article in the present moment, provide additional context, and highlight the importance of the present research. Here are some examples:

https://grist.org/climate/no-more-war-on-coronavirus-in-search-of-better-ways-to-talk-about-a-pandemic/

https://www.vox.com/culture/2020/4/15/21193679/coronavirus-pandemic-war-metaphor-ecology-microbiome

https://time.com/5821430/history-war-language/

https://www.theguardian.com/commentisfree/2020/mar/21/donald-trump-boris-johnson-coronavirus

https://medium.com/@steve.howe_63053/were-at-war-the-language-of-covid-19-e3d4f4a1ae2e

https://www.counterpunch.org/2020/04/24/trump-is-not-a-wartime-president-and-covid-19-is-not-a-war/

https://www.afsc.org/blogs/news-and-commentary/how-to-talk-about-covid-19-pandemic

https://blogs.scientificamerican.com/observations/military-metaphors-distort-the-reality-of-covid-19/

https://theconversation.com/war-metaphors-used-for-covid-19-are-compelling-but-also-dangerous-135406

On Page 8, 167, the authors say “As explained in [1], war metaphors are pervasive in public discourse and span a wide range of topics because they provide a very effective structural framework for communicating and thinking about abstract and complex topics, notably BECAUSE of the emotional valence that these metaphors can convey” (emphasis added). This makes it sound like the emotional valence of WAR is part of its structural framework, but I think this is a bit confused. War provides both a structural schema as a source domain AND it conveys an emotional tone; these points are actually separated in the paper referenced in the sentence. The authors break this down on the following page, but this sentence was unclear. Again, this may be part of the broader need to edit and revise some of the language in the article.

P12, Line 243-4: “…and the targeted language [English] corresponds to the first language of many (if not most) US residents.” Look this up and cite a source instead of speculating.

P11-12. I was terribly confused by the whole data gathering and filtering procedure. It was unclear how many tweets there were vs. individual tweeters vs. used tweets. The table tracks cumulative tweets but didn’t say that, which was confusing. It was not explained how the filtering was done (i.e., how did you choose which tweet to keep from each user that posted multiple tweets? Did the same tweeters post on multiple days and how was that dealt with?). I think this whole section could be streamlined and made much clearer.

Lines 270-72: The authors note they expected to find broader and more generic topics when they included 4 as compared to 16 topics. Well, of course, how else could that have turned out? In general, I found the use of two sets of topics to be unnecessarily confusing and did not feel it added much to the overall message in the paper. I suggest the authors stick with one set of topics that have easily identifiable and meaningful labels/clusters of attributes. Perhaps they could split the difference and choose 8 or 10 topics. Whatever makes the most sense for interpreting the metaphor data later is fine. I should also note this was all very exploratory/arbitrary, which is OK, but perhaps should be noted in the text (they could add a footnote explaining that using different numbers of topics doesn’t fundamentally change the pattern of findings).

Lines 309-310: The authors write, “The term list includes the following 79 terms“… but no list was forthcoming yet until the authors discussed their other method for generating terms. Either separate out into two lists (79 + 12) or, better, use one list but BOLD the ones coming from tool two (metaNet), and do not say “the following terms…” until you are actually planning to list the terms.

The authors use FAMILY as their “literal” comparison, but it should be noted that family terms COULD be figurative (and indeed, Lakoff, for example, has written much about the figurative uses of FAMILY in describing governments…). For example, “all Americans are one family.” “the president is the father of the American household,” etc. Is there any way to check to make sure all of the instances of family terms in the dataset are indeed literal and not figurative (and to remove the latter)?

On lines 503-4, the authors note that war words have a “very negative valence, OF COURSE” [emphasis added]. But I am not so sure I agree with that. Some people might get excited and motivated by ‘FIGHTING” the virus (which feels much less negatively valenced than THREAT, for example). Especially in the United States, which comprises many subcultures that glorify guns and wars and the military, I think some of these terms may be quite positively valenced. Maybe draw on some empirical work and use actual ratings of emotional valence of these words (e.g., using Pennebaker’s LIWC or some other database)

Reviewer #3: This paper adopts a topic modelling approach to study a dataset consisting of just over 200,000 tweets about Covid-19 posted in English (and primarily from the USA) in March and April 2020. The approach is employed to: identify the main topics in the data (set at 4 and 16); study the prevalence of a WAR metaphorical framing; compare that framing with three alternative metaphorical framings and a literal topic; and investigate any correlations between the WAR framing and the topics that were automatically identified. The findings are relevant, if somewhat predictable: the WAR framing is more prevalent than the alternative metaphorical framings, and it tends to correlate with discussions of diagnosis and treatment.

Concerning the creation of the dataset, the authors provide some justification for limiting tweets from the same account to one. However, this makes it impossible to capture the actual prevalence of the various framings on Twitter. The consequences of this decision should therefore be explicitly acknowledged.

The labelling of the groups of terms associated with each automatically generated topic imposes more coherence on each set of words than is actually the case, especially in the version of the analysis that only involves four topics. This is typical of this kind of computational approach to discourse analysis, but it should minimally be pointed out as a methodological issue.

As for the alternative metaphorical frames, the terms under TSUNAMI are generally to do with natural disasters, rather than tsunamis specifically.

Finally, it should be acknowledged more explicitly that this kind of analysis cannot shed light on how the WAR framing, or any other framing, are actually used. For example, it cannot distinguish between cases where the WAR framing is adopted and where it is critiqued (as has also been the case on Twitter). Ideally, the subset of tweets that employ WAR-related vocabulary could have been subjected to a more fine-grained analysis, but this usually goes beyond the scope of studies such as this.

Reviewer #4: Thanks for the opportunity to review.

Interesting look at how discussion on Twitter may be framed using frames from the disease literature, and a brief discussion of results of topics models on a limited Twitter dataset. This is certainly a timely thing, so I recommend major revisions. With work I think this could bring value to the public health community as we endeavor to perform contact-tracing and subject to mis- and dis-information around this pandemic.

---------------------------

Main critiques

---------------------------

What I am missing is the theoretical and practical contribution. Specifically, how would the authors answer the "so what" if the tweets are framed like WAR, STORM, etc., and "so what" if they're not? (which, they're not - 90-95% of the posts are not according to the results.)

- Are relative frequencies of frames statistically different from each other, and do they happen often enough to be significant in general? Put another way, does this frame analysis work or matter on Twitter?

- How would the authors characterize the other 90% of the discussion, and why / how is it important? Are there any themes related to mis- or dis-information, or to political polarization?

Second, I have concerns about sampling bias. This amounts to a study of 12 days' worth of tweets, only a few thousand. Line 60 states 16k tweets are posted every hour (do the authors have a citation?), and yet the authors collected 25k tweets per day. This equation does not balance, even when accounting for a 1% sampling rate from the Twitter API. This uncertainty undermines the efficacy of this paper - either the collection has a problem or the statement is false.

Regardless, at the time of data collection multiple datasets of Twitter related to COVID-19 existed. I strongly recommend repeating this analysis in two ways to see if the results change or hold:

- one, now that the authors have been collecting more data for a while,

- and two, perhaps more pressingly, using one of the public open datasets for Twitter with millions of tweets. See e.g. this collection of resources: http://www.socialmediaforpublichealth.org/covid-19/resources/ "Twitter Data"

(This also suggests an opportunity to do temporal analysis, to see if the frames and discussion have changed and if so, how they are changing. This may help with a practical contribution - to answer if discussions are moving in a healthy or helpful direction, or the opposite, and why?

- For example, how often do these topics found happen over time, how often do these frames happen over time, and why is that important? How would we interpret these topics, and why might they be important? How do these frames correspond with hashtags or the literal discussion of the disease?)

Thirdly, please see critiques of the methods, related to LDA and Twitter pre-processing.

Fourthly, I also include more minor points and notes about statistical significance.

---------------------------

I also have concerning methods critiques that may undermine results:

---------------------------

On tweet processing decisions:

- I'm struggling to understand why the authors eliminated all but one tweet per user. This is a limitation. It looks like the methods and results are at the level of a tweet, not at the level of a user. In addition to the sampling bias, the authors could be discarding data that is important to their analysis. If the authors insist on retaining only one tweet per user, how was this performed? Was this random? If not, this could bias one's data again.

- I'm struggling to understand why the authors excluded retweets and mentions. How many retweets and mentions are there? Together, these choices severely limit the amount of analysis possible, to show how often the frame of the discourse is spreading, occurring, or changing. I understand wanting to exclude them initially, but what about repeating the analysis with them included to see how it changes?

On LDA implementation:

- Did the authors use Gibbs sampling or variational inference? Gibbs sampling has been shown to yield vastly superior topics. I'd recommend repeating the analysis if variational inference was used and seeing if the results hold.

Related:

- how did the authors choose 4 vs 16 topics? why not other numbers? did they check perplexity - what number of topics has the lowest perplexity? (most likely to explain the data)

- How did the authors handle hashtags and URLs and usernames? These may contain information, or not, depending on the design of the study. What happened? These may be useful to report if analyzing the discussion.

On LDA interpretation and results:

- the authors look at significant words in topics, but what about tweets most about those topics? it can make a difference, per the coming citation. I recommend evaluating topics in both ways, as it may affect results of lines 585-596. See https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=reading+tea+leaves+humans+interpret+topic&btnG=

- did the authors interpret all of the 16 topics like they did the 4 topics? (lines 450-453)

- for topic figures, I suggest putting names of topics in the figure axes where possible. without them it's inconvenient to remember which is which

On Twitter pre-processing, I'm worried about the authors' use of general-language tools on tweets which have been shown to use vastly different language structure.

- stopwords from 2012... check/justify that these are up-to-date and apply to Twitter? need to come up with domain-specific ones?

- along the same lines there's a twitter tokenizer (e.g., stanfordNLP, NLTK) that are custom-built for this... what about emoji, how were these handled?

- line 279: better would have been to use tf-idf and leave the common terms in... these would have been reduced by the weighting organically

On literal framing control:

- What about the literal frame "it's a disease"? The authors chose family as a literal frame; this may strongly coincide with incidence or deaths from the disease, which may not be exactly what the authors want to measure.

- In addition... how are the authors controlling by including this? Should this be used for normalization, or testing statistical significance of frequencies or of differences among frame?

---------------------------

On Results and discussion

---------------------------

table 2 - are these results statistically significant? This would give weight to the authors' statement about the relative amount.

lines 512-536, about topics predicting occurrences of frames... are these differences between frequencies statistically significant? are these frequencies high enough to matter?

Continuing down the path about frames vs. topics:

- How often do the family or alternative frames show up in the predicted topics? like lines 383-384 for the WAR frame.

- In addition, how many topics include words in the frames? This may be an indicator if the frames are even worth studying on this domain. (see 90% number and earlier comment about frequencies)

Lines 55-67 do the authors have any citations for any/all of these statements?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Paul Thibodeau

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Reviewer 5.docx

PLoS One. 2020 Sep 30;15(9):e0240010. doi: 10.1371/journal.pone.0240010.r002

Author response to Decision Letter 0


14 Jun 2020

Reply to reviewers: Framing COVID-19 How we conceptualize and discuss the pandemic on Twitter PONE-D-20-10986.

We are very thankful to all 5 reviewers for the constructive feedback and comments, which helped us improve our manuscript in many ways. We took all comments on board and hereby provide our reply to each and every point raised. We apologize for the length of this document, which matches the length of the actual article.

A note to all reviewers: The matching algorithm used to identify the target terms in the corpus has been improved (notably, it is no longer case-sensitive, so it can retrieve, for example, occurrences of “fighting” as well as “FiGHting”; this is quite relevant because on social media capital letters and mixes of lower and upper case are often used). We have therefore updated the numbers in the “Framing results” with more accurate values (none of which affect the topic modeling, interpretation or discussion). Accordingly, Fig. 6 and Table 2 were updated as well.
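As a small illustration of the kind of case-insensitive matching described above (the term and test strings are our own, not drawn from the corpus):

```python
import re

# "fighting", "FIGHTING" and "FiGHting" should all be retrieved.
pattern = re.compile(r"\bfighting\b", re.IGNORECASE)
examples = ["We are fighting this together", "FIGHTING the virus", "FiGHting back"]
assert all(pattern.search(text) for text in examples)
```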

____________________________________________________________

Reviewer #1:

The paper analyzes Tweets that are used to talk about Covid-19 in three studies. The first identifies topics of these tweets. The second quantifies the prevalence of WAR metaphors in general, and across the topics identified in the first study. The third explores alternative figurative frames for the virus, along with one non-figurative frame (FAMILY). It is a timely study with interesting methods and results. I enjoyed reading it. I have suggestions for a revision, along with some methodological questions detailed below.

Reply: Thank you.

Introduction

1. The introduction could be better connected with the methods used in and research questions explored in the studies. Why is it interesting and important to determine what topics are discussed on Twitter in relation to Covid-19 or how frequently WAR and other metaphors are used? For example, could the topic modeling help to improve surveillance and forecasting about the disease? What frequency of WAR metaphors should we expect? How and why were the specific alternative frames chosen? Are there relevant theories in cognitive linguistics that the current work informs?

Reply: Thank you for this comment. The introduction contains the following information: topic (first paragraph), how we operationalize the topic (through tweets, 2nd paragraph), aim (3rd paragraph) and RQs (4th paragraph). The information required by the reviewer, in our opinion, does not belong in the introduction, but rather in the Theoretical Background, which contains related work in cognitive linguistics, and in the Methods, where we explain why specific frames were used, etc. Moreover, we have already stated in the Introduction that we expect to find WAR-related metaphors (indicating a very specific expected frequency of metaphors appears to be out of place). We took this comment on board by adding a very short paragraph at the end of the introduction in which we highlight the relevance of this study, which is then developed in the next section.

2. The emphasis and major strength of the paper seems to be the focus on WAR metaphors. The prevalence of WAR metaphors is quantified for an important issue, by topic, and in the context of other metaphoric frames (and a non-metaphoric frame). To highlight these strengths, I would encourage the authors to emphasize the novelty of quantifying the WAR frame using automated methods and real world data. Maybe switching the order of studies 1 and 2 would be helpful? (The results of Study 1 are difficult to interpret on their own).

Reply: Thank you. We emphasized these strengths in the last paragraph of the introduction. Moreover, also following the suggestion of reviewer 2, we have restructured the paper (methods and analyses), so the three studies are now three parts of the same study and do not have to be interpreted on their own. This should improve the readability and address this comment too.

3. The “Theoretical Background” section describes past work that is certainly interesting and relevant, but it doesn’t really identify theoretically motivated questions that the current work is well-suited to address. In most cases, it emphasizes practical applications and real world issues and/or methods (e.g., the relationship between measuring public sentiment on Twitter and addressing public health issues). Maybe rename this section for clarity.

Reply: Thank you. The Theoretical Background section contains related works (theoretical background) in the following fields, which we bring together in our study: quantitative analyses based on Twitter data related to epidemics (paragraphs 1-4); cognitive linguistic studies of figurative framings and metaphors, including the WAR frame and metaphors that we tackle in our study (paragraphs 5-7) and alternative framings (paragraph 8). This section does contain a selected review of the literature in these two fields of research, which is functional to our argumentation. We therefore stick to the label Theoretical Background.

Smaller points related to the introduction:

4. On p. 3 the authors note that around “16K Tweets are posted by Twitter users every hour, containing a hashtag such as #coronavirus, #Covid-19 or #COVID.” Is this based on the current data collection or a metric that is computed by Twitter? What is the temporal window for this statistic?

Reply: Thank you. We have made this relation clearer by stating that “collecting every tweet containing a hashtag such as #coronavirus, #Covid-19 or #Covid, we accumulated around 16,000 tweets within an hour each day.” This means that the metric is in fact based on the current data collection. This ambiguity has now hopefully been resolved.

5. It seems like there is a typo on p. 4: “Unlike the articles on magazines and journals typically used for corpus analyses of this kind, Twitter does contain messages written by journalists and other experts in mass communication, as most tweets are provided by non-expert communicators.” Maybe “as” should be “although”?

Reply: Thank you. We solved this.

Methods

1. Please clarify the following questions:

a. How were the 25k tweets per day selected? It seems like the algorithm picked up the first 25k tweets that included one of the specified hashtags per day, but that’s not explicitly stated. How long did it typically take to get to 25k tweets?

b. Were retweets included?

c. For a given person who tweeted multiple times in a day about the virus, was it only their first tweet that was included? What was the filtering criteria/method?

d. Were Tweets in languages other than English included?

Reply: 1a) The algorithm sends a query to the Twitter API. Our query asks for 25,000 tweets from one day that contain one of the specified hashtags (#CoronaVirus, #Covid etc.) and are in English. Twitter restricts access in two ways: first, the query cannot specify a time of day, so we receive the first 25k tweets of that day; second, there is a rate limit on retrieving tweets. To abide by that rate limit, the algorithm has waiting periods. On average it takes about 2 hours and 45 minutes to collect the 25k tweets from one day.
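
To illustrate the query logic, a minimal sketch of such a collection step is given below. It is illustrative only: it assumes the tweepy library (version 4) and the standard v1.1 search endpoint, the credentials are placeholders, and the hashtag list is an abbreviated stand-in for the one we actually used.

    import tweepy

    # Placeholder credentials; wait_on_rate_limit makes the client sleep through
    # rate-limit windows, which is why the collection can take a few hours
    auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    query = "#coronavirus OR #covid19 OR #covid -filter:retweets"   # illustrative hashtag list
    tweets = []
    for status in tweepy.Cursor(api.search_tweets, q=query, lang="en",
                                tweet_mode="extended").items(25000):
        tweets.append({"user": status.user.id_str, "text": status.full_text})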

Reply: 1b) Retweets were not included (as we explained in the paper).

Reply: 1c) Yes, only the first tweet about the pandemic was retained for each user. In this way we constructed a balanced corpus with one tweet per user. By doing so, we avoided having, for example, hundreds of tweets from the same user who was particularly keen on using the WAR frame, which would have biased our results. Instead, by keeping one tweet per user we obtain a better overview of how people talk about Covid on Twitter. The algorithm picked the first tweet per user that it could find within the timeframe indicated. Collecting all tweets from all users and then randomly selecting one per user would have been computationally cumbersome and would have exceeded our database capacity.
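
A minimal sketch of this filtering step follows; it assumes a list of collected tweets as in the previous sketch, and the seen_users set would persist across days so that each user contributes at most one tweet to the corpus.

    # Illustrative input; in practice `tweets` is the list returned by the collection script
    tweets = [{"user": "1", "text": "We must fight this virus"},
              {"user": "2", "text": "Staying home with my family"},
              {"user": "1", "text": "Another tweet by the same user"}]

    seen_users = set()
    corpus = []
    for tweet in tweets:                       # tweets arrive in chronological order
        if tweet["user"] not in seen_users:    # keep only the first tweet per user
            seen_users.add(tweet["user"])
            corpus.append(tweet)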

Reply: 1d) see a), we have only selected English tweets through Twitter's API language filter.

2. It should be possible to clarify some of the hedging in the sentence, “our corpus arguably encompasses mainly tweets produced by users residing in the USA, where the time of data collection corresponds to awake hours, and the targeted language corresponds to the first language of many (if not most) US residents” (p. 12). I realize that the location data was stripped from the text before it was stored, but the text is initially tagged with the location data, so maybe it is possible to estimate the percentage of tweets from the US. Language use questions are asked on the US census every 10 years and roughly 80% of US residents report speaking English at home. A citation would make the case stronger.

Reply: In fact, a preliminary analysis of the tweets included a statistical analysis of the location data. To the best of our knowledge, the location data provided by the free Twitter API is self-reported by the user. Unfortunately, as the data included a great range of responses, from “Anywhere that has liquor” or “Here.” to “México!!!” or “Universe”, we found no feasible way to interpret the sample manually. We concluded that most of the locations indicated the US. Furthermore, the tweet time and targeted language seemed to be a more reliable basis for concluding that the majority of tweets come from US users. We have now added a citation supporting the claim that most US residents speak English at home.

3. How were the topic numbers (n = 4 vs. 16) decided in the first study? I find the results of this study hard to interpret. I think they are more informative when presented in concert with the results of study 2.

Reply: Thank you. We have added a sentence explaining that the numbers 4 and 16 were chosen on an empirical basis and then backed up by an analysis of the internal coherence of the clusters. We were interested in a small but meaningful number of clusters and opted for 4, because 1, 2 or 3 topics were in our opinion too limited; we were also interested in a much more granular solution, so we opted for 16 (4 x 4).

Reply: We have now reported the coherence scores for these clusters for different LDA inference algorithms (as suggested by Reviewer 4). A more detailed discussion of these scores, which is linked to the decision to choose 4 and 16 topics, is given in the response to Reviewer 4. In short, the methods used to evaluate the internal coherence of the various cluster solutions, from 1 cluster (1 topic) to 20 clusters (20 topics), show that the coherence is particularly good for 4 clusters (according to the elbow method of cluster coherence evaluation) and that the model plateaus at 16 topics, with little interpretable improvement to be expected beyond 16 topics.

Reply: Please note that the structure of the 3 studies, as indicated above, has now changed, and the three studies are 3 phases of one big study. This should improve the interpretability of the results.

4. Categorizing language as metaphorical or not is tricky. There is a fair amount of work on this and the most common approach uses expert coders (see, e.g., Steen et al, 2010, A Method for Linguistic Metaphor Identification). I think the approach taken in the paper is reasonable but I think some limitations should be acknowledged. For example, there would probably be some disagreement by coders over which conceptual metaphors individual instances of “fight” appeal to (war vs. boxing vs. games, etc) and even whether or not particular instances are metaphorical or not.

Reply: Thank you. The MIP (and MIPVU) procedures for metaphor identification in texts (Steen et al. 2010) work essentially around the distinction between contextual meanings and basic meanings. If, for a specific lexical unit in a given text, there is a cross-domain comparison between its contextual and basic meanings, then that unit is marked as an MRW (metaphor-related word). The procedure is performed by manual annotation, using dictionaries as tools to check the contextual and basic meanings. Because the dictionaries are the same, it is (rightly) argued that different annotators will converge to similar annotations of MRWs, and hence the procedure is reliable. In our case, we took a different approach. Firstly, our corpus is much larger than any corpus ever analyzed with the manual MIP annotation. Secondly, there are probably no good dictionaries that list Covid-related uses among the contextual meanings of a word. It could, however, be argued that we nevertheless preserved the distinction between contextual and basic meanings, in the following way: by selecting tweets that contain the hashtag #Covid we set the contextual meanings of the words in the tweet within the pandemic-related domain. Thus, in that domain, any word that has a basic meaning related to WAR, such as the words in our lists, retrieved from Lakoff’s database and the RelatedWords web service (both solidly theoretically motivated), is used metaphorically in the context of Covid.
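
In practice this amounts to checking, for each tweet in the pandemic-related corpus, whether it contains any term from a frame’s word list. A minimal sketch of this matching logic is given below; the word list is a small illustrative subset rather than our full list, and the tokenization is deliberately simplified.

    import re

    WAR_TERMS = {"war", "fight", "battle", "enemy", "frontline", "combat", "weapon"}  # illustrative subset

    def uses_frame(text, frame_terms):
        tokens = re.findall(r"[a-z]+", text.lower())     # crude tokenization for the sketch
        return any(token in frame_terms for token in tokens)

    corpus = [{"text": "We must fight this virus together"},
              {"text": "Staying home with my family"}]
    war_tweets = [t for t in corpus if uses_frame(t["text"], WAR_TERMS)]
    print(len(war_tweets) / len(corpus))                 # share of tweets using the WAR frame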

5. Include some discussion about the relationship between the STORM and TSUNAMI categories. At first glance it seems like all instances of TSUNAMI should also be instances of STORM, but the current approach establishes these categories as mostly (completely?) non-overlapping.

Reply: Thank you. The overlap between the two word lists is about 20%. It should be noted that the two frames STORM and TSUNAMI relate to quite different concepts. According to the Macmillan dictionary: STORM: an occasion when a lot of rain falls very quickly, often with very strong winds or thunder and lightning. TSUNAMI: a very large wave or series of waves caused when something such as an earthquake moves a large quantity of water in the sea. It makes sense that, denoting different natural phenomena, the two frames have quite different sets of related words.
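
One simple way to quantify such an overlap is sketched below, with illustrative placeholder lists and the shared terms measured relative to the smaller list; this is an illustration rather than the exact measure we used.

    storm = {"storm", "thunder", "lightning", "wind", "rain", "gale"}       # placeholder list
    tsunami = {"tsunami", "wave", "earthquake", "flood", "surge", "rain"}   # placeholder list

    shared = storm & tsunami
    overlap = len(shared) / min(len(storm), len(tsunami))
    print(f"shared terms: {sorted(shared)}, overlap: {overlap:.0%}")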

Reply: In general, when deciding on alternative frames and related word lists, we discarded, for example, the potential alternative frame FLOOD precisely because it had too much overlap with these two frames, STORM and TSUNAMI. Conversely, STORM and TSUNAMI are substantially different, and therefore both good candidates for alternative frames. Moreover, both frames have been indicated as prolific frames actually used in the discourse on Covid by the Reframe Covid initiative (#ReframeCovid: https://docs.google.com/spreadsheets/d/1TZqICUdE2CvKqZrN67LcmKspY51Kug7aU8oGvK5WEbA/edit#gid=268174477). This has now been added to the description of the methods related to the Alternative Frames analyses.

6. I like the comparison of the metaphorical frames to the FAMILY non-metaphorical frame, but the FAMILY frame also seems fairly different in that it is more of a topic than a frame (i.e. in some ways more akin to the topics identified in Study 1). I don’t think this needs to be changed, but it seems worthy of a little discussion.

Reply: Thank you. In traditional framing theory, framing is defined as “select[ing] some aspects of a perceived reality and mak[ing] them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993,p. 53).

Reply: FAMILY is a frame that, as another reviewer pointed out, can be used to talk about covid because it is an aspect of this reality. It can be used figuratively, as well as literally. Another reviewer mentioned: “family terms COULD be figurative (and indeed, Lakoff, for example, has written much about the figurative uses of FAMILY in describing governments” (for example, within the EU, one could say that Germany is the responsible father of the family, who scolded Italy for her behavior etc). We have chosen FAMILY deliberately to be a frame that is most likely not used figuratively, yet has comparable properties to a frame we expect to be used figuratively (e.g. WAR, STORM or MONSTER). In fact, we have checked qualitatively the tweets that feature one of the FAMILY terms in it, and it appears that they are used literally, not metaphorically. So, for the covid reality, FAMILY is used typically as a literal frame, which carves an aspect of the overall covid reality (the aspect related to family relations and family dynamics and family members).

Reply: Moreover, in communication sciences a frame is typically defined as consisting of two elements (Joris, d’Haenens, & Van Gorp, 2014, p. 609): framing devices, which are elements in a text and specific linguistic structures (for example, a list of words related to a frame), and reasoning devices, which are the (latent) information in a text through which the problem, cause, evaluation, and/or treatment is implied (a more conceptual-communicative dimension of a text). A topic operationalized by topic modelling corresponds to the first of these two components of a frame (i.e., a list of semantically related words). For example, one can say that sister, father, home, parenthood and brotherly love are all words that together represent the topic of FAMILY, which can be used as a frame in Covid discourse: by talking about family aspects one can direct the readers’ attention to the importance of family relations and stress how these may have been disrupted by the pandemic. In other words, a topic can also function as a frame if it is used to highlight a specific aspect of a situation for a specific communicative purpose.

Reply: We have added the definition of framing in the theoretical background, explaining that the FAMILY frame is used literally in this type of discourse (end of Theoretical Background section). We also added a paragraph on the relation between frames and topics (as intended in topic modelling) in the beginning of the section “Identifying topics in Covid-19 discourse on Twitter through Topic Modeling”.

7. Include a note on how comparisons will be made. No inferential statistics are presented, which is common in cognitive linguistics, but the question about whether the number of cases is meaningfully different by frame or topic type will likely arise for readers.

Reply: Thank you. We have now reported the results of a Cochran’s Q test to assess whether there are any significant differences between the 5 frames in the way they are represented in the corpus of tweets. To do so, we compiled a tweets-by-frames matrix that displays binary values for presence or absence of a frame-related term in each tweet. Cochran’s Q test statistic = 47,226.72, df = 4, p < 0.001. The post-hoc pairwise McNemar test is highly significant (p < 0.001) for all permutations (all frame pairs). We can therefore conclude that there are significant differences in the presence of target words between each pair of frames.
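
For readers interested in how such a test can be run, a minimal sketch is given below. It assumes Python with statsmodels and uses a randomly generated placeholder matrix, not our actual tweets-by-frames matrix; the frame proportions are arbitrary.

    import numpy as np
    from itertools import combinations
    from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

    frames = ["WAR", "MONSTER", "STORM", "TSUNAMI", "FAMILY"]
    rng = np.random.default_rng(0)
    # Placeholder binary matrix: one row per tweet, 1 = the tweet contains a term from that frame
    X = rng.binomial(1, [0.05, 0.005, 0.01, 0.005, 0.12], size=(5000, 5))

    print(cochrans_q(X))                                  # omnibus test across the five frames

    for (i, a), (j, b) in combinations(enumerate(frames), 2):
        table = np.array([[np.sum((X[:, i] == 1) & (X[:, j] == 1)),
                           np.sum((X[:, i] == 1) & (X[:, j] == 0))],
                          [np.sum((X[:, i] == 0) & (X[:, j] == 1)),
                           np.sum((X[:, i] == 0) & (X[:, j] == 0))]])
        print(a, "vs", b, mcnemar(table, exact=False))    # post-hoc pairwise comparison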

Reply: Additionally, we have followed a suggestion by Reviewer #4 and validated our results through a series of replication studies: we have repeated this analysis on a new corpus constructed using another two weeks of tweets; on the overall corpus of tweets encompassing 8 weeks worth of tweets, and on an external, already existing resource containing Twitter data. The results of the replication studies are reported in the section “Replication studies”. Results show that the findings are consistent across the various datasets.

Results

1. It’s hard to interpret the results of Study 1. Are these topics the ones we would expect? Do they inform theory? Do they inform practice? Could they have come out any other way?

Reply: Thank you. As a matter of fact, we have now restructured the paper so that the three studies are three parts of the same study, which are discussed at the end in terms of their relevance for metaphor studies and for communication sciences in general, that is, for how pandemics are understood and discussed by the general public.

2. What inferences can we draw about the relationship between WAR metaphors and topics from the results of Study 2?

Reply: Thank you. Please note that the paper has been restructured. The inferences about the relationship between war metaphors and topics are already elaborated in the section LDA Topic prediction of WAR tweets and discussed in the WAR framing discussion section, from which we report hereby an excerpt: “In relation to the topic modelling of the war-related tweets, we showed that tweets that feature war-related terms are most likely to belong to topics IV, I and III, rather than to topic II. Interestingly, topic IV addresses aspects related to the reactions to the epidemics, including the measures proposed by the governments and taken by the people, such as self-isolating, staying at home, protecting our bodies and so forth. Our analysis therefore suggests that using war-related words is a communicative phenomenon that we use to express aspects of the Covid-19 epidemic related to the measures needed to oppose (fight!) the virus. Moreover, tweets that feature war-related words are also often classified within the topics I and III, which include the aspects related to communications and reports about the virus, and politics. We interpret these results arguing that public communications and political messages are likely to frame the discourse in the WAR framing. Finally, it might not come as a surprise the fact that topic II, which encompasses aspects of the discourse related to the familiar sphere, the community and the social compassion, does not relate well with the tweets containing war terms.”

Reply: Moreover, as mentioned in reply to a previous comment, we have now clarified the definitions and distinctions between frame and topic.

3. If space is an issue, I think the comparison of the 30 vs 50 terms approaches on pp 26-28 could be cut.

Reply: Thank you.

4. Small point: There is no General Discussion section, although it is alluded to on p. 24.

Reply: Thank you. We apologize. We changed the last section (which was General Discussion and Conclusions) into simply Conclusions because of space reasons and forgot to update this sentence in the text. Thank you for pointing this out. We corrected it.

5. Small point: Include a citation (and, ideally, a more precise statistic) in the sentence “…as previous literature shows that metaphor-related words cover only a percentage of the discourse, and that literal language is still prevalent” (p. 28). What percentage? Are there other metaphors that might be prevalent?

Reply: Thank you. We clarified this point and related it to corpus-based literature that supports this claim. In particular, we refer to the analysis performed by Steen and colleagues, who report that literal language covers 86.4% of the lexical units, while metaphor-related words cover just 13.6% of the lexical units. Their analysis is based on a corpus of 187,570 lexical units, manually annotated one by one for metaphoricity in a formal content analysis.

6. It would be nice to ground some of the qualitative observations noted in the discussion. For example: “Words in the STORM and TSUNAMI frames seem to relate to events and actions associated with the arrival and spreading of the pandemic…” (p. 28).

Reply: Thank you. We added examples to ground our arguments.

7. The paper ends by introducing the idea of a Metaphor Menu, which is interesting but it doesn’t logically fall out of the current study in my opinion. Maybe this idea could be discussed a little more.

Reply: Thank you. As we explained with our analysis, the WAR frame appears to be used in relation to specific topics, and thus to frame specific aspects of the discourse around Covid. Metaphors related to war, for example, do not appear to be apt, and are therefore not used, to frame Covid topics “such as the need to feel our family close to us, while respecting the social distancing measures, or the collaborative efforts that we should undertake in order to #flattenthecurve, that is, diluting the spreading of the virus over a longer period of time, so that hospitals’ ICU departments can work efficiently without getting saturated by incoming patients” (excerpt taken from our Conclusions). Thus, it follows that there are aspects involved in the discourse around Covid that may benefit from different figurative frames. In this sense, the war frame and war metaphors alone do not appear to be sufficient to discuss all the aspects related to Covid. The idea of a metaphor menu, in our argumentation, evolves from this observation: it provides communicative tools that can be used to discuss also the other topics involved in the Covid discourse, where war metaphors are not observed.

____________________________________________________________

Reviewer #2

The authors examined the (metaphorical) content of tens of thousands of English tweets surrounding the Covid-19 pandemic, scraped from two recent weeks from largely American twitter users. Topic modeling revealed several common themes (4 and also 16; more on this below), and that war metaphors were somewhat common (~4% prevalence), for some topics more than others, and appeared more frequently than other metaphorical domains. They argue that this is consistent with other empirical and theoretical research on the use of war metaphors in public discourse, but they now provide evidence this extends to everyday lay discourse online.

In general, I thought this was a timely article dealing with an important topic of interest to a variety of scholars, and a nice extension of previous work and theoretical musings on the use and prevalence of war metaphors. I think the methods and analyses were thoughtful and for the most part sound (though, as I detail below, I was confused about some of the details), and the results were solid. That said, I have a variety of comments and concerns that I think the authors should address in a revision before the manuscript is considered for publication.

Reply: Thank you.

One overarching concern is that the paper feels like it was rushed to submission and therefore the writing and overall organization are not quite up to the standards of a publishable manuscript. I understand the authors’ sense of urgency in getting this paper out there while the global pandemic crisis is still at its peak, and that they literally wrote it over the past few weeks, but I think extra care needs to be taken during any revision process to make sure the writing is improved. There were many grammatical and punctuation errors throughout the paper, along with confusing sentences and shifts in tense (if they want to refer to “now,” they should stick with phrasing like “at the time of writing,” which they were not consistent with throughout the paper). At times it was difficult to follow the logic of their thinking or make sense of some of the details of the methods and analyses.

Reply: Thank you. We have now worked on the revised manuscript to improve its readability in various ways: 1) we have restructured the sections (see comment below), 2) we have revised the tenses, and 3) we have proofread the manuscript.

One of the issues is mostly organizational: the authors chose to frame their work as three “studies,” presenting the “methods” for each first, followed by the results for each, etc. I found this structure to be confusing and hard to follow, as I had to jump back and forth between methods and results and discussion sections to remember what was done (and why) as I proceeded. At one point they discuss the topics modeling results, for example, but it had been so long since they had discussed the methods, and then they waited until the subsequent section to actually give the topics meaningful labels (which they never do in the main text for the 16 topics). It was very hard to keep track of everything because of this structure.

As the research itself really strikes me as one single study, not three, but with many analytic components, I think the authors should restructure the paper in a more logical, linear fashion. For example, they could still preview the whole set of big questions and their approach in the introduction, and then the main sections could be each question in turn, with meaningful headings/subheadings rather than traditional “Study 1” and “methods” headings.

So, they could still start by describing the procedures for gathering the data from twitter and the organization of the dataset. A sub-heading in that section could be something like “Themes in the data: Topic modeling” where they go through all of the methods, results, AND discussion and labels (each with their own subheadings…) for the topic modeling. Then they can move on to a section about defining their WAR (AND alternative!) dictionaries and analyses and discussion, and then conclude with their general discussion. I think something like this would help make the flow of the paper clearer and more effective.

Reply: Thank you. The organization of the empirical part of our paper was presented as follows:

Methods

General design of the 3 studies

Constructing the corpus of Covid-19 tweets

Study 1: Identifying topics in Covid-19 discourse on Twitter through Topic Modeling

Study 2: Determining lexical units associated with the WAR frame

Study 3: Search method for alternative framings and relevant lexical units therein

The literal frame of FAMILY used as control

Results and Discussions

General Corpus Analytics

Study 1: Topic Model Analysis

Analysis of 4 Topics

Analysis of 16 Topics

Study 1: Topic Modelling Discussion

Study 2: WAR Framing Results

LDA Topic prediction of WAR tweets

Study 2: WAR Framing Discussion

Study 3: Alternative Framing Results

Study 3: Alternative Framing Discussion

Conclusion.

Reply: Indeed, we provided in a clear and structured way the overall set of methods, and then the section with all results and discussions. The reviewer asks to restructure the paper as one big study with several analytic components. Following the reviewer’s suggestions we have restructured the paper as follows:

Experimental design

Constructing the corpus of Covid-19 tweets

General Corpus Analytics

What type of topics are discussed on Twitter, in relation to Covid-19?

Identifying topics in Covid-19 discourse on Twitter through Topic Modeling

Topic Model Analysis

Discussion

To what extent is the WAR figurative frame used to talk about Covid-19 on Twitter?

Determining lexical units associated with the WAR frame

WAR Framing Results

LDA Topic prediction of WAR tweets

Discussion

Are there alternative figurative frames used to talk about Covid-19 on Twitter?

Search method for alternative framings and relevant lexical units therein

The literal frame of FAMILY used as control

Alternative Framing Results

Replication studies

Discussion

General Discussion and Conclusion.

Some additional comments:

While the authors reviewed a good amount of research on war metaphors, they neglected to discuss any of the dozens of articles that have been written very recently about the war metaphor framing for Covid-19 (and its pluses and minuses), in both mainstream and independent outlets online. I think citing and discussing at least some of these would help situate the article in the present moment, provide additional context, and highlight the importance of the present research. Here are some examples:

https://grist.org/climate/no-more-war-on-coronavirus-in-search-of-better-ways-to-talk-about-a-pandemic/

https://www.vox.com/culture/2020/4/15/21193679/coronavirus-pandemic-war-metaphor-ecology-microbiome

https://time.com/5821430/history-war-language/

https://www.theguardian.com/commentisfree/2020/mar/21/donald-trump-boris-johnson-coronavirus

https://medium.com/@steve.howe_63053/were-at-war-the-language-of-covid-19-e3d4f4a1ae2e

https://www.counterpunch.org/2020/04/24/trump-is-not-a-wartime-president-and-covid-19-is-not-a-war/

https://www.afsc.org/blogs/news-and-commentary/how-to-talk-about-covid-19-pandemic

https://blogs.scientificamerican.com/observations/military-metaphors-distort-the-reality-of-covid-19/

https://theconversation.com/war-metaphors-used-for-covid-19-are-compelling-but-also-dangerous-135406

Reply: Thank you. All these blog posts and articles are non-academic, addressed to the general public, and most of them were published online after our submission to PLOS ONE. These are the reasons why they were not included in the discussion. In fact, there has been an exponential growth of non-academic posts and articles for the general public on the language of Covid, especially in relation to the recent #ReframeCovid initiative, which however gained momentum after we submitted the current paper. We now gladly refer to some of these more recent press materials in our paper (we had already mentioned the rise of the ReframeCovid initiative), for the purpose of framing our article within the current debate, as the reviewer suggests. We added a long paragraph on this matter toward the end of the Theoretical Background section.

On Page 8, 167, the authors say “As explained in [1], war metaphors are pervasive in public discourse and span a wide range of topics because they provide a very effective structural framework for communicating and thinking about abstract and complex topics, notably BECAUSE of the emotional valence that these metaphors can convey” (emphasis added). This makes it sound like the emotional valence of WAR is part of its structural framework, but I think this is a bit confused. War provides both a structural schema as a source domain AND it conveys an emotional tone; these points are actually separated in the paper referenced in the sentence. The authors break this down on the following page, but this sentence was unclear. Again, this may be part of the broader need to edit and revise some of the language in the article.

Reply: Thank you, we rephrased this sentence as the reviewer suggested (as well as others throughout the manuscript, to improve readability).

P12, Line 243-4: “…and the targeted language [English] corresponds to the first language of many (if not most) US residents.” Look this up and cite a source instead of speculating.

Reply: Thank you. Corrected. We cited the American Community Survey (ACS), which reports this information.

P11-12. I was terribly confused by the whole data gathering and filtering procedure. It was unclear how many tweets there were vs. individual tweeters vs. used tweets. The table tracks cumulative tweets but didn’t say that, which was confusing. It was not explained how the filtering was done (i.e., how did you choose which tweet to keep from each user that posted multiple tweets? Did the same tweeters post on multiple days and how was that dealt with?). I think this whole section could be streamlined and made much clearer.

Reply: Thank you. We have edited the paragraph including an additional discussion about the individual tweeter filtering and improved the description of the table. We have simplified the interpretation by explaining that all 203,756 tweets are from individual tweeters, as each day tweets are filtered against the list of previous tweeters.

Lines 270-72: The authors note they expected to find broader and more generic topics when they included 4 as compared to 16 topics. Well, of course, how else could that have turned out? In general, I found the use of two sets of topics to be unnecessarily confusing and did not feel it added much to the overall message in the paper. I suggest the authors stick with one set of topics that have easily identifiable and meaningful labels/clusters of attributes. Perhaps they could split the difference and choose 8 or 10 topics. Whatever makes the most sense for interpreting the metaphor data later is fine. I should also note this was all very exploratory/arbitrary, which is OK, but perhaps should be noted in the text (they could add a footnote explaining that using different numbers of topics doesn’t fundamentally change the pattern of findings).

Reply: Thank you. We looked at a more compressed analysis that includes just 4 topics, and at a fine-grained analysis that encompasses 16 topics, and we were interested in observing what type of information is captured by these topics. The choice of these two numbers specifically is motivated by theoretical as well as technical reasons that we have now explained (as requested by Reviewer 4). Considering the topic modelling as a clustering task, the 4-way and 16-way solutions appear to be coherent cluster solutions. From a theoretical perspective, we were interested in exploring whether the more fine-grained solution (which delivers many smaller and more internally coherent clusters) provided intra-class distinctions and semantic specifications that could not emerge when looking at the condensed solution (which delivers a few large and less internally coherent clusters). In fact, we provide some qualitative observations in this respect in the discussion of these analyses. It should be noted that labelling topics in LDA is common practice, observed for example in Lazard et al., in Tran and Lee, and in Miller and colleagues (all these works are reviewed in the Theoretical Background section of our paper). Finally, again in the discussion of the topic modelling analysis, we discuss the 16-way topic analysis and describe which of these topics appear to be related to the use of the WAR figurative frame. Therefore, and in line with the feedback received from the other reviewers, we keep both the 4-way and the 16-way analyses.

Lines 309-310: The authors write, “The term list includes the following 79 terms“… but no list was forthcoming yet until the authors discussed their other method for generating terms. Either separate out into two lists (79 + 12) or, better, use one list but BOLD the ones coming from tool two (metaNet), and do not say “the following terms…” until you are actually planning to list the terms.

Reply: Thank you. We fixed this wording. However, we have not bolded the terms coming from metaNet or separated the two lists because it did not add value to our argumentation, and because some of the words appeared in both resources and therefore attributing them only to the first resource would have been misleading.

The authors use FAMILY as their “literal” comparison, but it should be noted that family terms COULD be figurative (and indeed, Lakoff, for example, has written much about the figurative uses of FAMILY in describing governments…). For example, “all Americans are one family.” “the president is the father of the American household,” etc. Is there any way to check to make sure all of the instances of family terms in the dataset are indeed literal and not figurative (and to remove the latter)?

Reply: Thank you. We manually checked the data in our corpus and indeed, as we expected, the FAMILY-related words used to construct the family frame appear in their literal sense. This makes sense because people on Twitter mention their personal situations in relation to Covid-19, which involve their family members, family dynamics affected by the pandemic and the lockdown, and so on. So, family-related words are used literally. We have also mentioned, at the end of the Theoretical Background section, that this frame can indeed be used metaphorically in other contexts (e.g., to talk about nations and governments, as Lakoff extensively demonstrated).

On lines 503-4, the authors note that war words have a “very negative valence, OF COURSE” [emphasis added]. But I am not so sure I agree with that. Some people might get excited and motivated by ‘FIGHTING” the virus (which feels much less negatively valenced than THREAT, for example). Especially in the United States, which comprises many subcultures that glorify guns and wars and the military, I think some of these terms may be quite positively valenced. Maybe draw on some empirical work and use actual ratings of emotional valence of these words (e.g., using Pennebaker’s LIWC or some other database)

Reply: We are grateful for the remark; nonetheless, our claim is confirmed by emotional valence ratings. Analyzing our WAR term list using Pennebaker’s LIWC (for social media, including Twitter) results in a negative-emotion score of 28.9 for our terms, whereas average texts have a score of 2.10 (the higher the score, the more negative the emotions). Our terms have a positive-emotion score of 1.1, whereas average texts have a score of 4.57. This indicates that war words do indeed have a very negative valence.
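
As LIWC is proprietary, readers who wish to run a comparable check could score the word list with an open-source sentiment lexicon instead. The sketch below uses VADER, which is a different tool from the one used in our analysis and is shown only as an illustration; the term list is an abbreviated placeholder.

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    war_terms = ["war", "fight", "battle", "enemy", "threat", "attack"]   # illustrative subset
    scores = {term: sia.polarity_scores(term)["compound"] for term in war_terms}
    print(sorted(scores.items(), key=lambda kv: kv[1]))                   # most negative terms first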

____________________________________________________________

Reviewer #3

This paper adopts a topic modelling approach to study a dataset consisting of just over 200,000 tweets about Covid-19 posted in English (and primarily from the USA) in March and April 2020. The approach is employed to: identify the main topics in the data (set at 4 and 16); study the prevalence of a WAR metaphorical framing; compare that framing with three alternative metaphorical framings and a literal topic; and investigate any correlations between the WAR framing and the topics that were automatically identified. The findings are relevant, if somewhat predictable: the WAR framing is more prevalent than the alternative metaphorical framings, and it tends to correlate with discussions of diagnosis and treatment.

Reply: Thank you. Although the reviewer mentions that the results may appear to be somewhat predictable, there was no quantitative empirical evidence published in support of this intuition. Therefore, our paper provides empirical support for this intuition (in addition to several innovative aspects, such as the relation between the WAR related terms and specific topics of this pandemic).

Concerning the creation of the dataset, the authors provide some justification for limiting tweets from the same account to one. However, this makes it impossible to capture the actual prevalence of the various framings on Twitter. The consequences of this decision should therefore be explicitly acknowledged.

Reply: Thank you. We limited the tweets to one tweet per user, to construct our balanced and representative corpus, precisely because we wanted to explore to what extent Twitter users use the various framings. By doing so we limited the bias of having for instance many tweets by the same user who, arguably, uses (or does not use) a specific frame because it is a frame that he/she finds apt and appropriate, or because he/she particularly likes it. For example, imagine that one specific Twitter user is very fond of sci-fi and monsters-related issues, he/she might use the MONSTER framing very frequently in their tweets, also about Covid-19. If we kept all the tweets by this user, this might have biased the distribution frequency of the MONSTER related words. Instead, by limiting one tweet per user, and dropping the retweets, we kept our corpus relatively manageable from a technical perspective (not too heavy) as well as representative and balanced from a theoretical perspective. In other words, we are not interested in the “actual prevalence” (the reviewer means the absolute prevalence?) of the various frames on Twitter, but the relative coverage of the different framings used to talk about Covid on Twitter. We have explicitly acknowledged these matters in the section “Constructing the corpus of Covid-19 tweets”.

The labelling of the groups of terms associated with each automatically generated topic imposes more coherence on each set of words than is actually the case, especially in the version of the analysis that only involves four topics. This is typical of this kind of computational approach to discourse analysis, but it should minimally be pointed out as a methodological issue.

Reply: Thank you. In topic modelling this is common practice. The clusters of words returned by the topic model are unlabeled, and are typically interpreted and labelled by the analysts (as in much of the literature mentioned in the Theoretical Background section: in Lazard et al, and in Tran and Lee, and in Miller and colleagues.) We have now explained, when introducing topic modelling, that this is the case, in the beginning of the section “Identifying topics in Covid-19 discourse on Twitter through Topic Modeling”.

As for the alternative metaphorical frames, the terms under TSUNAMI are generally to do with natural disasters, rather than tsunamis specifically.

Reply: Thank you. The word lists were constructed using two tools widely used and acknowledged in the scientific literature: the repository of metaphors and frames released by Lakoff and colleagues (hosted at Berkeley) and the RelatedWords web service. We have explained both extensively in the manuscript. The specific way in which the word lists were drafted is clearly and transparently explained in the manuscript, and backed up by solid scientific literature, not by personal intuitions.

Finally, it should be acknowledged more explicitly that this kind of analysis cannot shed light on how the WAR framing, or any other framing, are actually used. For example, it cannot distinguish between cases where the WAR framing is adopted and where it is critiqued (as has also been the case on Twitter). Ideally, the subset of tweets that employ WAR-related vocabulary could have been subjected to a more fine-grained analysis, but this usually goes beyond the scope of studies such as this.

Reply: Thank you. Indeed, as the reviewer concludes, a qualitative analysis of the subset of tweets that use the war-related words goes beyond the scope of this study. As a matter of fact, here we are talking about 9,502 tweets. We speculate that, because the WAR frame is particularly conventionalized, most of the WAR related words are used subconsciously and therefore adopted, rather than deliberately and consciously opposed and critiqued. But this intuition implies a different set of research questions, empirical analyses and hypotheses, as also the reviewer points out.

__________________________________________________________________

Reviewer #4

Thanks for the opportunity to review.

Interesting look at how discussion on Twitter may be framed using frames from the disease literature, and a brief discussion of results of topic models on a limited Twitter dataset. This is certainly a timely thing, so I recommend major revisions. With work I think this could bring value to the public health community as we endeavor to perform contact-tracing and are subject to mis- and dis-information around this pandemic.

Reply: Thank you.

---------------------------

Main critiques

---------------------------

What I am missing is the theoretical and practical contribution. Specifically, how would the authors answer the "so what" if the tweets are framed like WAR, STORM, etc., and "so what" if they're not? (which, they're not - 90-95% of the posts are not according to the results.)

- Are relative frequencies of frames statistically different from each other, and do they happen often enough to be significant in general? Put another way, does this frame analysis work or matter on Twitter?

- How would the authors characterize the other 90% of the discussion, and why / how is it important? Are there any themes related to mis- or dis-information, or to political polarization?

Reply: Thank you. We made explicit our goal at the end of the Introduction: “This paper reports and discusses a series of corpus-based quantitative analyses on the figurative framings used in pandemic-related discourse. In particular, the WAR frame, which previous studies have identified as pervasive in many crisis-related texts, is investigated by means of automated methods (topic modelling) applied to real-world data. By answering the research questions outlined above we claim that our data can help to improve surveillance and forecasting about the disease, as previous research on pandemics has shown (see next section).”

Reply: Thank you. We have now reported the results of a Cochran’s Q test to assess if there are any significant differences between the 5 frames, in the way they are represented in the corpus of tweets. To do so, we compiled a tweets-by-frames matrix that displays binary values for presence or absence of a frame-related term in each tweet. Cochran’s Q test statistic = 47,226.72 , df=4, p < 0.001. The post-hoc pairwise McNemar test is highly significant (p < 0.001) for all permutations (all frame pairs). We can therefore conclude that there are significant differences in the presence of target words between each pair of frames.

Reply: Additionally, we have replicated our analyses on an updated corpus of tweets collected over two months with the same criteria that we used to collect the original corpus, and we replicated our analysis on a new corpus, extracted from an existing dataset, the “Coronavirus Tweets Dataset” (Rabindra Lamsal. (2020). Coronavirus (COVID-19) Tweets Dataset. IEEE Dataport. http://dx.doi.org/10.21227/781w-ef42). We observed similar proportions across the same timeframe in the alternative dataset and in our updated corpus. We have included these results in the section “Replication studies”. The distributions of the frames are very similar across the three resources: the literal (control) frame is the most frequent overall, and the WAR frame is the most frequently used of the metaphorical frames considered, followed by the STORM, the TSUNAMI and finally the MONSTER frame.

Reply: The rest of the Covid tweets on Twitter are characterized by other frames, which can be literal or metaphorical. Arguably, they are mostly literal, because the WAR frame has been indicated as a pervasive metaphorical frame used to talk about diseases. Our analysis is the first contribution that tests this qualitative claim by running an analysis on an extensive corpus of tweets about Covid. This is new quantitative information, in cognitive linguistics and metaphor studies, which was not available before. As we now mention in the paper, in the Discussion of the alternative frames section, Steen and colleagues, for example, reported that literal language covers 86.4% of the lexical units in a corpus of 187,570 lexical units analyzed by them (a subcorpus of the BNC), while metaphor-related words cover just 13.6%. Their analysis encompasses all parts of speech, including for example prepositions, which are very commonly used metaphorically (e.g., IN, whenever it refers to a non-spatial relation). Assuming that this percentage applies to any text (around 13% of words used metaphorically, the rest literally), our findings provide an argument for the actual pervasiveness of the WAR frame, which covers more than one third of ALL the metaphorical language used in these texts.

Reply: Disinformation and political polarization are themes that are not related to the scope of this paper.

Second, I have concerns about sampling bias. This amounts to a study of 12 days' worth of tweets, only a few thousand. Line 60 states 16k tweets are posted every hour (do the authors have a citation?), and yet the authors collected 25k tweets per day. This equation does not balance, even when accounting for a 1% sampling rate from the Twitter API. This uncertainty undermines the efficacy of this paper - either the collection has a problem or the statement is false.

Reply: We could not collect every tweet related to the Covid-19 discourse on every day: the datasets that did so collected between 0.5 and 3.5 million tweets on a single day (depending on the keywords used), quickly accumulating unmanageable amounts of data that exceeded our technical limits. The remark that “16k tweets are posted every hour (do the authors have a citation?), and yet the authors collected 25k tweets per day. This equation does not balance” should be elaborated further: we queried 25k tweets on each day. The query collects every tweet containing the related keywords in chronological order and stops once 25,000 tweets have been reached. As the 25,000 tweets collected each day cover a span of about 1.5 hours, we can infer that around 16k tweets relating to the Covid-19 pandemic are posted every hour. We have rewritten this section to make this clear.

Reply: Regarding the sampling bias, we can now refer to the results from our updated corpus and from the alternative existing corpus (see next response). Notably, another major influence on the number of tweets in our sample is the set of restrictions (only individual tweeters, etc.) that we have discussed in the manuscript and in further responses to Reviewer 2. For example, for the same two weeks we collected 6x as many tweets per day from an alternative dataset (Lamsal; see next response), yet after applying the same restrictions we ended up with a corpus of just 2x the size. This implies that our sample is naturally much smaller than the available data, which consists largely of retweets and of multiple tweets from the same users, i.e., tweets that are of no use for our analysis.

Regardless, at the time of data collection multiple datasets of Twitter related to COVID-19 existed. I strongly recommend repeating this analysis in two ways to see if the results change or hold:

- one, now that the authors have been collecting more data for a while,

- and two, perhaps more pressingly, using one of the public open datasets for Twitter with millions of tweets. See e.g. this collection of resources: http://www.socialmediaforpublichealth.org/covid-19/resources/ "Twitter Data"

Reply: We have included a paragraph in the revised manuscript that shows how our results about the impact of WAR terms hold when we look at our updated corpus (59 days of collection) of 654,354 tweets (section “Replication studies”). Analyzing all tweets from the updated database, 5.54% of tweets contained at least one term from the WAR framing, compared to 5.32% previously. We argue that this increase of about 0.22 percentage points in the WAR framing can be partially explained by new debates entering the discourse. For example, we could identify the topics of increased domestic violence (as a consequence of the lockdown), increased cyber criminality with “attacks” exploiting the anxiety during the pandemic, and the increased involvement of the military in supporting and restricting the public. It would be highly interesting to separate those topics and observe them in another topic modelling analysis, yet we do not include such an analysis here as it would change the nature of the paper, which is focused on the early stage of the discourse.

Reply: We agree that the size of our initial dataset was limited and therefore it is crucial to replicate the results using more data. Yet, our criteria (individual tweeters, English only and the specific keywords) are very specifically tailored to and embedded in our theoretical considerations. Nonetheless, we replicated our analysis on a new corpus, extracted from an existing dataset, the “Coronavirus Tweets Dataset” (Rabindra Lamsal. (2020). Coronavirus (COVID-19) Tweets Dataset. IEEE Dataport. http://dx.doi.org/10.21227/781w-ef42). A comparison of the Lamsal dataset with our dataset over the same time-frame can be seen in the Figure below. Notably, the relative order of proportions is the same as in our analysis: Family > War > Storm + Tsunami + Monster. Differences (the Family proportion decreased by 3.46%, War increased by 1.62%) can be explained by the different keywords used to acquire the Lamsal dataset; for example, it includes “Corona”, arguably a more colloquial expression, which we did not use to select our corpus. Moreover, their keywords changed multiple times during the process of data mining, while we kept the same set of keywords. The dataset suggested in the online repository at www.socialmediaforpublichealth.org/covid-19/resources/ (“Twitter Data”), in contrast, includes all languages, which made it infeasible to apply language identification to such a large dataset.

Reply: Additionally, we have compared our updated corpus (two months of tweets) with the Lamsal corpus covering two months of tweets (also depicted in the Figure present in the docx of this response). We can observe that the distributions of tweets within each frame follows the same trends across the various corpora: the data collected on the 2 weeks corpora (ours and Lamsal’s) are very similar (compare bars with bright colors to bars with pastel colors). This figure along with the numerical results and a brief discussion have been added to the updated manuscript, in the section Replication studies.

(This also suggests an opportunity to do temporal analysis, to see if the frames and discussion have changed and if so, how they are changing. This may help with a practical contribution - to answer if discussions are moving in a healthy or helpful direction, or the opposite, and why?

- For example, how often do these topics found happen over time, how often do these frames happen over time, and why is that important?

Reply: Thank you. We did not identify a change in the distribution of the figurative frames within the discourse on Covid during our initial 14 days of observation. With our updated corpus of 2 months of tweets, we also did not see a change in the proportions with which the frames occur. The Figure below (present in the docx of this response) depicts the percentage of tweets containing at least one term from the respective frame over the course of 2 months (dashed lines indicate the projection over 3 missing days, due to technical issues with the Twitter API; similar gaps can be found in all other publicly available datasets). A thorough statistical analysis of these observations, and the correlation of this temporal analysis with the progression of government regulations, infection rates and news, is beyond the scope of the current paper. We are, however, currently pursuing this line of analysis for a different contribution, which will have a longitudinal perspective and will show how the different topics and frames change across time, week after week.

How would we interpret these topics, and why might they be important? How do these frames correspond with hashtags or the literal discussion of the disease?)

Reply: Thank you. In the manuscript we have provided a description of the topics identified by LDA, which is indeed our interpretation of the topics (including their labelling).

Reply: The only hashtags that we have considered in our analysis were the hashtags used to construct the corpus of tweets (#covid etc.). We did not analyze any hashtag related to the frames, because it was not in the scope of our analysis (see our research questions).

Thirdly, please see critiques of the methods, related to LDA and Twitter pre-processing.

Reply: Thank you. We replied to those.

Fourthly, I also include more minor points and notes about statistical significance.

Reply: Thank you. We replied to those.

---------------------------

I also have concerning methods critiques that may undermine results:

---------------------------

On tweet processing decisions:

- I'm struggling to understand why the authors eliminated all but one tweet per user. This is a limitation. It looks like the methods and results are at the level of a tweet, not at the level of a user. In addition to the sampling bias, the authors could be discarding data that is important to their analysis. If the authors insist on retaining only one tweet per user, how this was performed? Was this random? If not, this could bias one's data again.

- I'm struggling to understand why the authors excluded retweets and mentions. How many retweets and mentions are there? Together, these choices severely limit the amount of analysis possible, to show how often the frame of the discourse is spreading, occurring, or changing. I understand wanting to exclude them initially, but what about repeating the analysis with them included to see how it changes?

Reply: Thank you. We constructed a corpus following specific criteria that we motivated in the paper. A corpus is a balanced and representative collection of texts. As we explained in the paper, we limited the corpus to one tweet per user precisely to avoid a biased corpus in which super-tweeters could skew the frequency distribution of frames. Imagine, for example, a Twitter user who tweets many times per day and is very fond of sci-fi and monster-related issues: he/she might use the MONSTER framing very frequently in their tweets, also about Covid-19. If we kept all the tweets by this user, this might have biased the distribution frequency of the MONSTER-related words: we would have observed a high percentage of tweets with this frame, but all produced by the same person, who simply happens to tweet very often. This is not representative of the population. Instead, by limiting the corpus to one tweet per user, we kept it relatively manageable from a technical perspective (not too heavy) as well as representative and balanced from a theoretical perspective. We also dropped the retweets and mentions because these are duplicated texts, so the same datapoint would have been counted twice, which we argue is not desirable from a corpus linguistics perspective. We have explicitly acknowledged these matters in the section “Constructing the corpus of Covid-19 tweets”. To conclude, our choices for the configuration of the corpus are theoretically motivated and clearly explained in the paper. We do not expect the reviewer to necessarily like them, but, methodologically speaking, they are motivated and transparent. As explained later, we have added further analyses to replicate our results on an updated corpus constructed with the same criteria, collected by ourselves, and on a corpus constructed on the basis of existing resources (section “Replication studies”).

On LDA implementation:

- Did the authors use Gibbs sampling or variational inference? Gibbs sampling has been shown to yield vastly superior topics. I'd recommend repeating analysis if used variational inference and see if results hold.

- how did the authors choose 4 vs 16 topics? why not other numbers? did they check perplexity - what number of topics has the lowest perplexity? (most likely to explain the data)

Reply: In the manuscript we have used an online variational Bayes (VB) algorithm through Gensim’s LDA-Multicore implementation. The topic numbers 4 and 16 were chosen intentionally based on two factors. 1) We inspected the words in the 4-, 8- and 16-cluster solutions and identified 4 and 16 as the semantically most informative and insightful. 2) We compared the coherence values over 1 to 16 topics using Cv coherence (Syed, S., & Spruit, M., 2017). It is fair to assume that the slight increase in coherence after 16 topics is negligible, given that adding ever more topics may yield clusters that are more coherent but not more insightful or semantically meaningful. Therefore, we chose 16 topics at the higher end of the coherence scores, and for the smaller solution we avoided 2 topics (which had the worst coherence score) and settled on 4.

Reply: Additionally, we have now included a comparison of the coherence scores over 1-16 topics using Gibbs sampling (through Mallet’s Gensim LDA wrapper). We are grateful to the reviewer for bringing this to our attention and report that our results hold. Firstly, the coherence score does not change greatly between the two algorithms (see Figure present in the docx of this response). Secondly, we have inspected the actual topics (4 and 16) for both algorithms and, although some of them differ (as is to be expected), they yield no substantially different topics or insights. We have included the trained LDA models for 4 topics and 16 topics using Gibbs sampling in the online repository at Models/lda_N4_Gibbs.pkl and Models/lda_N16_Gibbs.pkl.
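
Reply: For transparency, the following minimal sketch illustrates the kind of coherence sweep we describe; the toy token lists and parameter values are illustrative only and do not reproduce our actual pipeline or settings.

    from gensim.corpora import Dictionary
    from gensim.models import LdaMulticore, CoherenceModel

    # toy pre-processed tweets (token lists); the real input is the full corpus
    texts = [["covid", "lockdown", "hospital", "nurses"],
             ["trump", "press", "briefing", "news"],
             ["family", "home", "school", "kids"],
             ["vaccine", "treatment", "trial", "cases"]]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    # train one LDA model per candidate number of topics and record Cv coherence
    coherence_by_k = {}
    for k in range(2, 17):
        lda = LdaMulticore(corpus=corpus, id2word=dictionary,
                           num_topics=k, passes=10, random_state=42)
        cm = CoherenceModel(model=lda, texts=texts,
                            dictionary=dictionary, coherence="c_v")
        coherence_by_k[k] = cm.get_coherence()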

- How did the authors handle hashtags and URLs and usernames? These may contain information, or not, depending on the design of the study. What happened? These may be useful to report if analyzing the discussion.

Reply: Thank you for your questions. We now make it more explicit in the paper by stating that “[i]n our study design, we are not interested in the dynamics within the social network and have not collected retweets and did not investigate usernames, hashtags or URLs.” We simply included usernames, hashtags and URLs in the frame analysis and LDA training data, which is, for example, why @realdonaldtrump appears in a topic word cloud.

On LDA interpretation and results:

- the authors look at significant words in topics, but what about tweets most about those topics? it can make a difference, per the coming citation. I recommend evaluating topics in both ways, as it may affect results of lines 585-596. See https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=reading+tea+leaves+humans+interpret+topic&btnG=

Reply: Thank you for this reference. Yes, we have looked at the most significant words in the topics (similar to word intrusion in the referenced paper). In our section “LDA Topic prediction of WAR tweets”, we are in fact looking at the tweets most about those topics. Admittedly, we only do this for tweets that contain a word from the WAR frame, and not for the topics themselves in order to validate their cohesion (similar to topic intrusion in the referenced paper). Overall, we now include a paragraph in the revised manuscript that is more critical about the proposed topics, highlighting the limitations of the modeling approach as it stands and suggesting improvements (see Discussion of LDA Topic prediction of WAR tweets). Arguably, there are many ways in which the topic modeling could be adapted and improved, yet our interpretation, and especially the analysis of the framing within those topics, has been done at the word and tweet level.

- did the authors interpret all of the 16 topics like they did the 4 topics? (lines 450-453)

Reply: We have interpreted all of the 16 topics in the sense that we relate them to the 4 topics (lines 454-463). Here, we also interpret novel topics (lines 461-463).

- for topic figures, I suggest putting names of topics in the figure axes where possible. without them it's inconvenient to remember which is which

Reply: Thank you. However, because the labels for these topics have been provided by us, rather than being returned by the model, we believe that indicating these labels in the figures would be misleading. In other words, the labels of the topics are our own interpretation of the results of the LDA analysis. The topic model provides numbers for the topics (1,2,3 and 4), and based on the clusters of words within each topic we provided a label.

On Twitter pre-processing, I'm worried about the authors' use of general-language tools on tweets, which have been shown to use a vastly different language structure.

- stopwords from 2012... check/justify that these are up-to-date and apply to Twitter? need to come up with domain-specific ones?

- along the same lines there are Twitter tokenizers (e.g., StanfordNLP, NLTK) that are custom-built for this... what about emoji, how were these handled?

Reply: Thank you. This is valuable feedback and we are grateful for these specific pointers. With respect to the general-language tools, we should explain that our pre-processing consisted of stopword removal, lowercasing and tokenization. Firstly, our stopword list was an updated version: we did not use only the original list from 2012, but added domain-specific stop “words” such as “http” and “&”, as well as new-format stop words such as “don’t”, “don`t”, “I'm”, “I`m”. The manuscript is now updated to clearly state that we have used an updated stop word list.
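
Reply: To illustrate the kind of extension described above (the items here are abbreviated examples, and NLTK's English list merely stands in for the list we actually used):

    import nltk
    nltk.download("stopwords", quiet=True)
    from nltk.corpus import stopwords

    # general-language stopwords extended with domain-specific and new-format items
    stop_words = set(stopwords.words("english"))
    stop_words |= {"http", "https", "amp", "don't", "don`t", "i'm", "i`m"}

    tweet_tokens = ["i'm", "staying", "home", "http", "during", "the", "lockdown"]
    kept = [t for t in tweet_tokens if t not in stop_words and len(t) >= 3]
    # kept -> ['staying', 'home', 'lockdown']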

Reply: As for the Twitter tokenizer, we were not aware of a tokenizer specific to Twitter for the language library (Gensim) with which we built the analysis. Given the concerns raised, we have investigated the source code of the NLTK TweetTokenizer (www.nltk.org/_modules/nltk/tokenize/casual.html#TweetTokenizer) in order to understand its possible improvements over our approach and whether or not they are negligible. The TweetTokenizer removes user handles (@); as mentioned before, our analysis is indifferent to those. It removes HTML entities, which we partially remove with our additional stop words. Notably, it shortens sequences of repeated characters (e.g., “heeeeey”), which we do not; however, we filter out tokens with a length < 3 (“as”, “aa”, “to”, “yo”, etc.). It also handles emoticons and emoji differently than we do. We agree that a Tweet tokenizer might reduce the number of tokens overall, but, having inspected our tokens, our pre-processing and stop words yield a comparable set of relevant tokens. We fully acknowledge this possible improvement for any future work.
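
Reply: For illustration, the difference between a plain split and the NLTK TweetTokenizer can be seen on a made-up tweet (the example text is ours):

    from nltk.tokenize import TweetTokenizer

    tweet = "@someuser heeeeey we will beat #covid!!"
    tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)

    print(tweet.lower().split())
    # plain split keeps the handle and the elongated word as-is
    print(tokenizer.tokenize(tweet))
    # TweetTokenizer drops '@someuser', shortens 'heeeeey', and splits off the '!!'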

- line 279: better would have been to use tf-idf and leave the common terms in... these would have been reduced by the weighting organically

Reply: Thank you. We are grateful for this advice. Irrespective of the removal of common terms, we did not see any improvement of the LDA with or without tf-idf, as mentioned in the manuscript.
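
Reply: For reference, the alternative the reviewer suggests would look roughly as follows in Gensim (a minimal sketch with toy data, not the configuration reported in the paper): the bag-of-words counts are weighted so that very common terms are down-weighted organically instead of being removed beforehand.

    from gensim.corpora import Dictionary
    from gensim.models import TfidfModel, LdaMulticore

    texts = [["covid", "cases", "rise"], ["stay", "home", "covid"],
             ["deaths", "vaccine", "trial"]]
    dictionary = Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(t) for t in texts]

    # tf-idf down-weights frequent terms (e.g. 'covid') rather than dropping them
    tfidf = TfidfModel(bow_corpus)
    tfidf_corpus = [tfidf[doc] for doc in bow_corpus]

    lda = LdaMulticore(corpus=tfidf_corpus, id2word=dictionary, num_topics=2, passes=5)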

On literal framing control:

- What about the literal frame "it's a disease"? The authors chose family as a literal frame- this may strongly coincide with incidence or deaths from the disease, which may not be exactly what the authors want to measure.

Reply: Thank you. We have considered several literal frames, including DISEASE and VIRUS. However, the words used within these frames are often related to the figurative frames used in the analysis (fight, defend, attack, for example). In this sense, frames like DISEASE or VIRUS are hyper-frames: they are at a different level of abstraction compared to WAR, STORM, FAMILY, etc. Within the DISEASE frame we could still find the frames of FAMILY, WAR, etc. We picked FAMILY because it appears, instead, to be at the same level of abstraction as the figurative frames.

- In addition... how are the authors controlling by including this? Should this be used for normalization, or testing statistical significance of frequencies or of differences among frames?

Reply: Thank you. The literal frame is used as a “control” in the sense that it is used to compare the frequency distributions of the figurative (metaphorical) frames (war, storm, etc.). As explained above, however, the statistical distribution of frequencies cannot be easily calculated. We opted for a couple of replications, to show in a descriptive manner that the percentages are substantially preserved, as is the ranking of the frames: the most frequent frame in the corpus (and in the second corpus collected for replication, and in the external corpus on which we ran our analysis for comparison) is FAMILY, followed by WAR (the most frequent metaphorical frame), followed by STORM, then TSUNAMI and finally MONSTER.

---------------------------

On Results and discussion

---------------------------

table 2 - are these results statistically significant? This would give weight to the authors' statement about the relative amount.

Reply: Thank you. In response to Reviewer #1 (Remark #7), we have provided statistical significance results for the relative differences of proportions (using Cochran’s Q, a multi-sample extension of the McNemar test).

Reply: We used Cochran’s Q test to assess whether there are any significant differences between the 5 frames in the way they are represented in the corpus of tweets. To do so, we compiled a tweets-by-frames matrix that displays binary values for the presence or absence of a frame-related term in each tweet. Cochran’s Q test statistic = 47,226.72, df = 4, p < 0.001. The post-hoc pairwise McNemar test is highly significant (p < 0.001) for all permutations (all frame pairs). We can therefore conclude that there are significant differences in the presence of target words between each pair of frames.
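
Reply: For the interested reader, the test can be reproduced along the following lines, using the cochrans_q and mcnemar functions from the statsmodels package (a sketch with simulated presence/absence data; the real matrix is derived from the corpus):

    import numpy as np
    from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

    rng = np.random.default_rng(0)
    # toy tweets-by-frames matrix: columns = WAR, FAMILY, STORM, TSUNAMI, MONSTER;
    # each cell flags whether a frame-related term occurs in that tweet
    X = rng.binomial(1, [0.06, 0.12, 0.02, 0.005, 0.003], size=(10000, 5))

    print(cochrans_q(X))  # omnibus test across the five frames

    # post-hoc pairwise McNemar tests on 2x2 cross-tabulations of frame columns
    for i in range(5):
        for j in range(i + 1, 5):
            table = [[np.sum((X[:, i] == a) & (X[:, j] == b)) for b in (0, 1)]
                     for a in (0, 1)]
            print(i, j, mcnemar(table, exact=False).pvalue)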

lines 512-536, about topics predicting occurrences of frames... are these differences between frequencies statistically significant? are these frequencies high enough to matter?

Reply: Thank you. We will clarify the statistical significance in this case: the reported frequencies are probability distributions inferred by our LDA model. To the best of our knowledge, it is non-trivial to provide a p-value for this prediction, because its accuracy relates to the goodness of the LDA model. The model itself provides a probability for each tweet to belong to a topic. As topics #1, #3 and #4 show a probability of >24% and topic #2 a probability of <8%, we have interpreted this prediction with respect to the quality (coherence) of our model and concluded that the difference is meaningful. If the reviewer is aware of any statistical test that can be applied here, we would greatly appreciate the feedback and include it.
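
Reply: To make the inference step concrete, the probabilities we report are obtained roughly as follows (a toy model is trained here only to keep the sketch self-contained; in the paper the trained 4-topic model is used):

    from gensim.corpora import Dictionary
    from gensim.models import LdaMulticore

    texts = [["fight", "virus", "vaccine"], ["family", "home", "school"],
             ["trump", "news", "press"], ["cases", "deaths", "numbers"]]
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = LdaMulticore(corpus=corpus, id2word=dictionary, num_topics=4, passes=10)

    # infer the topic distribution of a (pre-processed) WAR-frame tweet
    bow = dictionary.doc2bow(["fight", "virus", "home"])
    print(lda.get_document_topics(bow, minimum_probability=0.0))
    # -> [(0, p0), (1, p1), (2, p2), (3, p3)], one probability per topic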

Continuing down the path about frames vs. topics:

- How often do the family or alternative frames show up in the predicted topics? like lines 383-384 for the WAR frame.

Reply: We cannot provide absolute numbers for the occurrence of the family or alternative frames in the predicted topics, because we can only use the LDA model to infer a probability of each tweet (with family terms / alternative framings) belonging to a topic. We have provided a few more probability distributions below to show that the predicted topics are in fact different for the different frames, yet the difference is diluted when observing the 4 topics (see Figure present in the docx of this response). We have decided not to include a discussion of all possible distributions (4 frames x 2 topic sizes (4, 16) = 8 distributions), as it would neither support nor contradict our hypothesis.

- In addition, how many topics include words in the frames? This may be an indicator if the frames are even worth studying on this domain. (see 90% number and earlier comment about frequencies)

Reply: We think there is a conceptual misunderstanding here. A topic does not simply include or exclude a word. Each “topic” is a probability distribution over all tokens. That means that a token such as “trump”, “news” or “family” belongs to a topic with a certain probability. Therefore, all of the topics include words of the frames, some with high probability, some with low probability. To check the probability of a word within a frame, one performs an inference on that word; doing so for all words of a frame is what we have already done in the reply to the previous question. We can look at the probability from two sides, the topics or the frames, but the distribution is the same.

Lines 55-67: do the authors have any citations for any/all of these statements?

Reply: Thank you. We do not believe that the text on these lines, which informally describes the general situation generated by the current pandemic, requires references to scientific literature. This would only disrupt the reading flow.

__________________________________________________________________

Reviewer #5

General comments:

This article provides some interesting first insights into what topics are discussed in relation to the Covid-19 pandemic on Twitter and what metaphor frames are prevalent. It was slightly disappointing that the research questions are purely descriptive, but maybe that is inevitable for a study that must be one of the first on the topic.

Reply: Thank you.

What is headed 'theory section' in fact includes both a literature review on analysing tweets on epidemics and another part on metaphor theory, but the two parts are not linked well. Also, the results are not linked back to the studies featuring in the literature review.

Reply: Thank you. Indeed, both parts pertain to the Theoretical Background, and they are clearly separated because these are the two “ingredients” that we combine in our paper: social media analyses of pandemics and metaphor studies. They are not meant to be merged, which is why they are kept separate. We have, however, explained that in discourse analysis and metaphor studies, topics such as pandemics can be analyzed in terms of figurative framing, and mentioned the relevant literature.

The paper seems to have been written quickly and was clearly not proofread.

Reply: Thank you. We apologize for this limitation, which we have addressed in this revision. Indeed, we were probably too eager to submit, given the timely nature of the topic.

Specific comments:

• As for the corpus, how can the authors be sure that the corpus only contains tweets by non-experts?

Reply: Thank you. We have not argued that. What we have argued is that while in a corpus based on newspaper articles, for example, the texts have been written by professionals (journalists), on social media and therefore on Twitter, anyone with an account produces texts (the tweets).

• For Study 2, why did the authors decide against lemmatisation? There are good reasons not to use that process, e.g. because different forms of a lemma can express different metaphor scenarios, but the reasons should be spelled out.

Reply: Thank you. We added this specific motivation to the description of the corpus, as the reviewer suggests. Very much appreciated the constructive nature of this comment.

• The #ReframeCovid also includes storm, monster and tsunami source domains, and it does not only include examples from news media.

Reply: Thank you. It does now. We updated this information in the manuscript, in the section “Search method for alternative framings and relevant lexical units therein”. When we submitted the manuscript, the #ReframeCovid initiative had just been proposed as a hashtag on Twitter, and the crowdsourced collection of frames was about to start.

• What are possible labels for the 16 topics?

Reply: Thank you. The 16-way solution is a highly granular solution generated by the topic model. Because the labels are not provided by the model, they have to be deduced by the analysts in an empirical fashion, essentially by means of intuition. This is commonly done in topic modelling, but the more clusters the model generates, the harder it is to give each cluster (that is, each topic) an individual label. For this reason, we discussed and provided labels for the 4-way topic model, but we did not feel comfortable labelling each and every cluster within the 16-way analysis; our labels might have been easily criticized by the reviewers. Instead, we opted for showing the 16 topics returned by the analysis in the form of word clouds, discussed them as sub-categories of the 4 topics, showed which of these 16 topics contain war-related terms (topics 2, 7, and 10), and indicated to what type of arguments the other topics seem to refer.

• Why was family chosen as a literal control frame?

Reply: Thank you for this question, which gave us the opportunity to clarify this aspect, which was also pointed out by two other reviewers.

Reply: In traditional framing theory, framing is defined as “select[ing] some aspects of a perceived reality and mak[ing] them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993, p. 53).

Reply: FAMILY is a frame that, as another reviewer pointed out, can be used to talk about Covid (it is an aspect of this reality) both figuratively and literally. Another reviewer mentioned: “family terms COULD be figurative (and indeed, Lakoff, for example, has written much about the figurative uses of FAMILY in describing governments” (for example, within the EU, one could say that Germany is the responsible father of the family, who scolded Italy for her behavior, etc.). We have chosen FAMILY deliberately as a frame that is most likely not used figuratively, yet has properties comparable to the frames we expect to be used figuratively (e.g. WAR, STORM or MONSTER). In fact, we have checked qualitatively the tweets that feature one of the FAMILY terms, and it appears that they are used literally, not metaphorically. So, for the Covid reality, FAMILY is typically used as a literal frame.

Reply: Moreover, in communication sciences a frame is typically defined as consisting of two elements (Joris, d’Haenens, & Van Gorp, 2014, p. 609): framing devices, which are elements in a text and specific linguistic structures (for example, a list of words related to a frame), and reasoning devices, which are the (latent) information in a text through which the problem, cause, evaluation, and/or treatment is implied (a more conceptual-communicative dimension of a text). A topic operationalized by topic modelling corresponds to the first of these two components of a frame (i.e., a list of semantically related words). For example, one can say that sister, father, home, parenthood, and brotherly love are all words that together represent the topic of FAMILY, which can be used as a frame in Covid discourse: by talking about these family aspects of Covid, one could direct the readers’ attention to the importance of staying together and helping each other, rather than looking at one another as strangers and potential risks of contagion. In other words, a topic can also be used as a frame, if it serves to highlight a specific aspect of a situation for a specific communicative purpose.

Reply: Finally, one reviewer suggested using, as an alternative frame, something like DISEASE. In fact, before settling on FAMILY we did check DISEASE and VIRUS as literal frames. However, the lexical units within these two frames encompass words related to the figurative frames (fight, defenses, attack, etc.). We realized that DISEASE and VIRUS are hyper-frames, that is, they are at a higher level of abstraction compared to the frames WAR, STORM, or the literal FAMILY. We therefore opted for FAMILY because, we argue, it is at the same level of abstraction as WAR and the rest of the figurative frames.

Reply: We have added this discussion in the revised manuscript, by introducing the definition of framing in the theoretical background, explaining that the FAMILY frame is used literally in this type of discourse (end of Theoretical Background section), and by explaining the relation between framing and topics (as intended in topic modelling) in the beginning of the section “Identifying topics in Covid-19 discourse on Twitter through Topic Modeling”.

• "Previous literature shows that metaphor-related words cover only a percentage of the discourse": does this refer to Steen et al. (2010)?

Reply: Thank you. Indeed. Yes. We have added this reference and elaborated this aspect further. The edited text is hereby reported:

Reply: “previous literature shows that overall metaphor-related words cover only a percentage of the discourse, and that literal language is still prevalent (e.g., Steen et al. 2010). Steen and colleagues, for example, report that literal language covers 86.4% of the lexical units, while metaphor-related words cover just 13.6% of the lexical units. Their analysis is based on a sub-corpus of the BNC that encompasses 187,570 lexical units extracted from academic texts, conversations, fiction and news texts. All parts of speech are included in their analyses, including function words (such as prepositions and articles). Assuming that this percentage of figurative language compared to literal language use can be applied to the discourse around Covid-19, we would expect to find around 13% of the lexical units in our corpus to be used metaphorically (including pervasive metaphorical uses of function words). Many of these metaphor-related words pertain to figurative frames such as the frames analyzed in our paper. From this perspective, the percentage of use of the WAR frame reported in our study suggests that this frame is particularly pervasive, because it covers more than one third of all the metaphorical language typically used in texts.”

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Panos Athanasopoulos

31 Jul 2020

PONE-D-20-10986R1

Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter

PLOS ONE

Dear Dr. Wicke,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

I have heard back from 3 of the original reviewers. As you can see, once again the colleagues have made incredibly helpful suggestions for revisions. I invite you to consider each point carefully. Please pay particular attention to the clarity of language issues highlighted by reviewers 1 and 2. I agree with them that you should have your manuscript proof-read prior to submission of the next revision. And please especially focus on the technical issues with the data highlighted by reviewer 3. The reviewer has given no less than 4 possible ways forward, any of which I find entirely reasonable. At the very least, you should acknowledge the limitations of your data processing method, and modify your theoretical claims accordingly, as suggested by the 1st option the reviewer puts forward. I look forward to receiving your revision as per the guidelines below.

==============================

Please submit your revised manuscript by Sep 14 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Panos Athanasopoulos, Ph.D

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #4: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I appreciate the careful work of the authors to address the concerns raised in the previous round of review. The manuscript is clearer as a result and I think it will make a solid contribution to the literature. I have a few remaining comments and concerns that I describe below.

Introduction & Theoretical Background

The introductory sections do a better job of describing and situating the contribution of the current work. The paragraph that explicitly details “The innovative aspect of this paper…” (p. 5) is particularly helpful.

There is still an issue with the sentence I highlighted in my previous review, which now reads, “Unlike the articles on magazines and journals typically used for corpus analyses of this kind, Twitter contains messages written by journalists and other experts in mass communication: most tweets are provide by non-expert communicators” (p. 4). Maybe: “Although Twitter contains messages written by journalists and other experts in mass communication, most tweets are provided by non-expert communicators…”

I still find the heading “Theoretical Background” misleading. While that section certainly does discuss relevant research that helps situate the current work, it doesn’t discuss theoretical background per se (to my eye). But if the authors feel strongly that it does, that’s ok with me.

I feel more strongly that the heading “Experimental design” should be changed because the study is not an experiment. Maybe “Study design.”

Methods

I appreciate the revised structure of the methods and results, which makes the paper more readable. I still find the first section difficult to interpret. I think it would be useful to say explicitly in the set up or discussion of the topic modeling that the categories will be useful for understanding how the war frame is being used.

It’s not clear what it means to say that, “The two numbers of clusters were chosen on an empirical basis” (p. 17). What was the empirical method / criteria? How was it evaluated? Adding the LDA coherence measure is useful but seems different from what the authors mean when they say that the “clusters were chosen on an empirical basis.”

I appreciate the discussion of the MIP and MIPVU in the response letter, but I think some of these issues should be noted in the manuscript itself. Namely, this method of categorizing and quantifying metaphor is new, different from alternatives, and has its own benefits and limitations (as does any method of coding language). To be clear, I think the method being used is interesting and worthwhile. But it likely misses some instances of metaphor and it likely counts some words as metaphorical that are being used in a literal sense. Maybe, for example, someone tweeted about how COVID was spreading on the aircraft carrier the USS Theodore Roosevelt, which is a war ship. Given that one of the main contributions of the paper is the method, I think some attention should be given to the strengths and limitations of the method. Of note, a major strength of the method is that it can handle a lot of data. This point could also be made more explicitly. [The comment below also relates to a limitation of the coding method.]

I agree that the concept of a STORM is different from the concept of a TSUNAMI. However, if the same words (e.g., wave, flood, disaster) are being used to index both concepts, how do we know which concept people are drawing on? For example, the word “disaster” is the second most frequent marker of a TSUNAMI in the corpus and also the second most frequent marker of a STORM in the corpus. The word “wave” is also frequent in both. This means that some of the tweets are double counted as metaphorical in a sense, no? Do the analyses assume that the categories are mutually exclusive and exhaustive? It would be useful to report the percentage of tweets that included multiple categories of metaphor.

I understand and appreciate how the cluster of words related to FAMILY are being used, as a point of comparison with the figurative frames (WAR, STORM, TSUNAMI, MONSTER). But I still think there is a qualitative difference between this category and the others. The language of WAR (STORM and TSUNAMI and MONSTER) is being used to frame different aspects of the pandemic and response. In many cases, the issues being framed metaphorically could be discussed using an alternative metaphorical frame or in a literal sense. The tweets about FAMILY, on the other hand, seem to be about FAMILY per se. That strikes me as an important difference that should be discussed in the paper.

Results

See above for questions about tweets that were categorized as relating to multiple frames.

Adding the replication study strengthens the paper.

I recommend toning down some of the claims like, “…the percentage of use of the WAR frame reported in our study suggests that this frame is particularly pervasive, because it covers more than one third of all the metaphorical language typically used in texts” (pp. 33-34). The study restricts itself to four metaphorical frames of interest; the full range of metaphor typically used in texts is not quantified here.

Reviewer #2: I want to commend the authors on the impressive revisions to this paper. I think they have done just about everything they could to address all of the reviewers concerns, and I feel the paper is much stronger now and will make a nice addition to the literature. Indeed, I could see myself citing the findings in a future talk or article!

The only reason I am selecting "minor revision" rather than "accept" is that PLOS ONE does not copyedit accepted manuscripts, and I still feel the paper could use one more round of proofreading and editing. While the overall structure of the paper is greatly improved now, there were still grammatical errors and awkward phrases sprinkled throughout that hindered readability.

For example, lines 75-77 contained the sentence "Unlike the articles on magazines and journals typically used for corpus analyses of this kind, Twitter contains messages written by journalists and other experts in mass communications: most tweets are provided by non-expert communicators" This is very confusing to me as it seems to be saying two different things. A more mild example, but still one that should be edited for clarity, comes on lines 179-180: "The military metaphor thanks to which we frame diseases such as cancer is a very common one to be found in public discourse." And in the next sentence, the authors use the article "the" before Time Magazine, which is not necessary.

I am not going to go into every example of confusing writing, but I do want to recommend the authors go over the paper again and update the prose as necessary to enhance clarity. At that time, I will be prepared to accept!

Reviewer #4: Thank you to the authors for drastically improving many points about the paper. I enjoyed reading this interesting research even more this time. It is much cleaner to read and understand, including contributions, backing statistical analyses, and replication on other datasets. Other reviewers also brought up great points that I did not think of.

However, as is written, the study seems to have unacknowledged limitations and concerns that render contributions too broad about understanding general discourse on Twitter that I would recommend addressing before accepting this paper.

My concern stems from ambiguity in terms of unit of analysis that biases the representativeness of the data, which causes unacknowledged impact on results and contributions. Claims cannot be made about general discourse on Twitter, but they *can* be made about a biased yet useful subset - those tweets that originate from less-frequent tweeters.

Generally, any study that performs social monitoring needs to:

- make clear what its unit of analysis is,

- then accordingly determine what it means to be representative of these units in its sampling frame,

- gather units to analyze and analyze those same units

- acknowledge any limitations of representativeness

- and limit contributions accordingly

At minimum in this particular study,

- it should be made clear and consistent through data collection and analyses whether this particular study is analyzing use of frames by users, or use of frames in tweets, (I'm pretty sure it's tweets, right?)

- the limitation of a bias against super-tweeters should be made more explicit, and I would accordingly recommend explicitly walking back the study's claims, research questions, and contributions, because claims of general representativeness on Twitter are not defendable.

Specifically, if it is chosen to omit the vast majority of tweets by keeping one tweet per user and omitting retweets, claims cannot be made about how the general Twitter discourse discusses and frames COVID-19 (or to put it another way, the study's data is not representative of all tweets). Claims instead can be made about how those tweets *originating from less-frequent tweeters* discuss and frame COVID-19.

Either that, or I would recommend at least one of:

- switching to analysis at the level of a user

- more representatively sample from tweeters (as one tweet per user is usually not seen as representative of a user's content, whether they are among the top tweeters or not)

Please find details below - perhaps these can make the issue more clear if it helps.

First, the study has a bias against retweeters and sharers, in favor of original tweeters, as is stated. However, many users on Twitter retweet or share someone else's tweets - this is seen as a proxy for behavior and opinion, as usually people share things they agree with. Therefore, more accurate counts of frames might be *including* those frames that are retweeted. At the very least, one may analyze with retweets and without to see any differences. It's fine that the study disregards retweets, but this should be acknowledged as a limitation and claims about discourse should be modified accordingly.

And second, to paraphrase, data is being "limited to one tweet per user because of technical limitations and to make the corpus balanced and representative". I would reject the study's claim that the resulting corpus is representative - because the study is not clear about *how* it is representative. As designed, it seems like the study is confused about the unit of analysis that it is trying to represent: the tweet or the user.

- If the study wishes to accurately represent tweets, then by Twitter's nature there are super-tweeters, and frequencies of frames will be biased by super-tweeters who use frames more. This is an accurate representation of the Twitter discourse.

- If the study wishes to accurately represent users, then it should state this, and tweets should be aggregated or sampled representatively per user, and all subsequent analyses should be at the level of the user, not the level of the tweet.

However, as currently designed, the study starts at the level of a user by "retaining one tweet per user", but then moves back to analyze at the level of a tweet. This results in a bias against super-tweeters with "retaining one tweet per user". Therefore, all subsequent analysis at the level of a tweet omits the vast majority of tweets that compose Twitter discourse - exactly what it seems to want to analyze.

- Minor point: The study should address exactly how it "retained only one tweet per user". This was asked in the review. Was the earliest tweet retained? the most recent? the one with the most framing words? Is it simple random sampling? I ask to elucidate, not to be flippant, as ambiguity does not lead to reproducibility - it leads to concern. A brief mention would suffice.

I can recommend four courses of action, although the authors would know which would be more appropriate for their desired contributions:

1. Keep current data collection, methods, and results that seem to analyze at the level of a tweet, and acknowledge explicit bias against super-tweeters, to favor those users who tweet less often. This seems to require revisions to contributions and research questions, as the overall Twitter discourse is not being studied for framing, but the Twitter discourse being studied is biased towards discourse by less-frequent users. Is there a theoretical motivation for studying in this way?

2. Keep current data collection, analyze at the level of a user, and acknowledge explicit bias against super-tweeters, to favor those users who tweet less often. This would require more extensive revisions to analyses, but would result in a different contribution: instead of studying the entirety of framing discourse on Twitter, studying how *users*, with emphasis on less-frequent users, frame the discussion.

3. Decreasing bias against super-tweeters by gathering a representative amount of tweets per user, and analyze at the level of a user. This would result in a proper representative analysis of discourse on Twitter at the level of the user, and contributions on how users discuss and frame would follow.

4. Decreasing bias against super-tweeters by gathering a representative amount of tweets per user, and analyze at the level of a tweet. This would result in a proper representative analysis of discourse on Twitter at the level of the tweet, and defendable contributions on how tweets generally discuss and frame would follow.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Paul Thibodeau

Reviewer #2: No

Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 30;15(9):e0240010. doi: 10.1371/journal.pone.0240010.r004

Author response to Decision Letter 1


17 Aug 2020

To the Editor and Reviewers,

Once again, we are very thankful to the reviewers for taking the time to provide constructive and helpful comments on our paper, which has now significantly improved in quality. In this letter we address the remaining comments and indicate what we changed in our manuscript. We have also submitted the paper to a professional proofreader and editor, a native speaker of American English. We hope that this iteration has cleared up the remaining language issues highlighted by Reviewers 1 and 2.

As for the technical issues raised by Reviewer 4, we have provided a lengthy response and several adaptations to our paper. We are grateful for the detailed feedback, yet we have to acknowledge that some of the criticism will remain, which we attribute not to a wrong methodology but to a different one, arising from different research areas (Corpus Linguistics vs. Social Media Monitoring). Please find our detailed response below.

REVIEWER 1

Introduction & Theoretical Background

The introductory sections do a better job of describing and situating the contribution of the current work. The paragraph that explicitly details “The innovative aspect of this paper…” (p. 5) is particularly helpful.

Reply: Thank you.

There is still an issue with the sentence I highlighted in my previous review, which now reads, “Unlike the articles on magazines and journals typically used for corpus analyses of this kind, Twitter contains messages written by journalists and other experts in mass communication: most tweets are provide by non-expert communicators” (p. 4). Maybe: “Although Twitter contains messages written by journalists and other experts in mass communication, most tweets are provided by non-expert communicators…”

Reply: Indeed, we apologize, this was a typo that we didn’t catch because of the track changes, thank you for pointing this out, we corrected it.

I still find the heading “Theoretical Background” misleading. While that section certainly does discuss relevant research that helps situate the current work, it doesn’t discuss theoretical background per se (to my eye). But if the authors feel strongly that it does, that’s ok with me.

I feel more strongly that the heading “Experimental design” should be changed because the study is not an experiment. Maybe “Study design.”

Reply: Thank you. As explained in the previous letter, we do believe that the Theoretical Background discusses literature related to the topic addressed in the paper: quantitative analyses based on Twitter data related to epidemics (paragraphs 1-4); cognitive linguistic studies of figurative framings and metaphors, including the WAR frame and metaphors that we tackle in our study (paragraphs 5-7); and alternative framings (paragraph 8). This section contains a selected review of the literature in these two fields of research, which is functional to our argumentation. We therefore retain this heading. Thank you for your collaboration.

Reply: We changed the heading “Experimental design” into “Study design” as suggested by the reviewer.

Methods

I appreciate the revised structure of the methods and results, which makes the paper more readable. I still find the first section difficult to interpret. I think it would be useful to say explicitly in the set up or discussion of the topic modeling that the categories will be useful for understanding how the war frame is being used.

Reply: Thank you. In the setup of the study (Study Design) we explicitly mentioned: “To address our three research questions, first we explored the range of topics addressed in the discourse on Covid-19 on Twitter using a topic modelling technique. Consequently, we explored the actual usage of the WAR frame, and in which topics related to Covid-19 is the WAR frame more frequently used.” We have now made explicit in that paragraph the fact that the identification of the topics is functional to the investigation of the distribution of the war-related terms.

It’s not clear what it means to say that, “The two numbers of clusters were chosen on an empirical basis” (p. 17). What was the empirical method / criteria? How was it evaluated? Adding the LDA coherence measure is useful but seems different from what the authors mean when they say that the “clusters were chosen on an empirical basis.”

Reply: By “empirical basis” we mean that these numbers were initially chosen by looking at the data and then backed up by post hoc testing. Since 2 and 3 intuitively seemed too small a number of clusters to allow many observations to emerge, we opted for multiples of two (4, 8, 12, 16). We then tested with a post hoc analysis the coherence of the various cluster solutions obtained, and based on these measures we confirmed that 4 and 16 clusters were the best choice, given our goal of having a less granular (4 clusters) and a more granular (16 clusters) solution. Cluster numbers between 4 and 16 did not provide any meaningful semantic insight that was not already generalized by the 4-cluster solution or specified by the 16-cluster solution. The post hoc coherence analysis was conducted by evaluating the coherence score calculated for each cluster solution. The coherence score that we adopted, the “Cv measure”, is typically used in topic modelling with the LDA algorithm. It tells us how coherent a cluster solution (that is, a topic modelling analysis) based on a given number of clusters (= topics) is. The score is based on a one-set segmentation of the most important words, a sliding window, and an indirect confirmation measure; the latter uses normalized pointwise mutual information and cosine similarity (for more information, see: M. Röder, A. Both, and A. Hinneburg: Exploring the Space of Topic Coherence Measures. In Proceedings of the Eighth International Conference on Web Search and Data Mining, 2015).

I appreciate the discussion of the MIP and MIPVU in the response letter, but I think some of these issues should be noted in the manuscript itself. Namely, this method of categorizing and quantifying metaphor is new, different from alternatives, and has its own benefits and limitations (as does any method of coding language). To be clear, I think the method being used is interesting and worthwhile. But it likely misses some instances of metaphor and it likely counts some words as metaphorical that are being used in a literal sense. Maybe, for example, someone tweeted about how COVID was spreading on the aircraft carrier the USS Theodore Roosevelt, which is a war ship. Given that one of the main contributions of the paper is the method, I think some attention should be given to the strengths and limitations of the method. Of note, a major strength of the method is that it can handle a lot of data. This point could also be made more explicitly. [The comment below also relates to a limitation of the coding method.]

Reply: Thank you, we agree, and we have now acknowledged these strengths and limitations of our method in the paper, comparing our way of automatically identifying metaphor-related (frame-related) words in a very large corpus with manual methods such as MIPVU, as suggested by the reviewer. We added a paragraph explaining this trade-off at the end of the section “Determining lexical units associated with the WAR frame”.

I agree that the concept of a STORM is different from the concept of a TSUNAMI. However, if the same words (e.g., wave, flood, disaster) are being used to index both concepts, how do we know which concept people are drawing on? For example, the word “disaster” is the second most frequent marker of a TSUNAMI in the corpus and also the second most frequent marker of a STORM in the corpus. The word “wave” is also frequent in both. This means that some of the tweets are double counted as metaphorical in a sense, no? Do the analyses assume that the categories are mutually exclusive and exhaustive? It would be useful to report the percentage of tweets that included multiple categories of metaphor.

Reply: Thank you for this comment. The fact that there are lexical entries that appear in both domains tells us that the two domains are indeed semantically related, and therefore the same lexical entries can be used in both frames. This is semantically correct and also to be expected. The frames do not need to contain mutually exclusive sets of words: in fact, the more two frames are semantically close or similar to one another, the more lexical entries they will share. From an ontological perspective, the idea that words belong to one and only one frame cannot be defended. Our algorithm operates in a way that is motivated by these assumptions and therefore, as the reviewer correctly observes, can attribute a tweet to both frames, STORM and TSUNAMI, because the calculation is based on the frequency count of a lexical entry that may be listed in both frames, such as the word “disaster”. In this sense, each framing is measured independently from the other frames, and we agree that there might be overlaps. This is a sustainable perspective in terms of its ecological validity: we cannot really distinguish whether in the mind of the tweeter there was specifically a STORM frame or a TSUNAMI frame when they tweeted. It would be unrealistic to argue otherwise. Our analyses do not assume that the lists of lexical entries used for each figurative frame are mutually exclusive and exhaustive for that specific frame. They rather assume that the lists of lexical entries for each frame are theoretically motivated, meaningful and representative of the frame they stand for. Please note that the statistical test we applied in our analyses to compare the percentages by which the different frames are used takes into account the fact that a lexical entry may belong to more than one frame (in this sense, these categories are not mutually exclusive; the presence/absence of a lexical entry in a tweet, however, is a binary, mutually exclusive property: either the word is there, or it is not).

Reply: To clarify this further, our algorithm counted 1 hit whenever a lexical unit (a word from a list) was found in a tweet. Then, the numbers were added up within each frame. In this sense, our unit of analysis is the lexical units in the word lists, not the tweets. A tweet can be counted as many times as it has words related to a frame. For example, a tweet such as “This wave of #covid is hard to fight” will count 1 hit for the TSUNAMI frame (wave), 1 hit for the STORM frame (wave) and 1 hit for the WAR frame (fight). Therefore, reporting the percentage of tweets that were counted more than once does not seem to be useful. We have, however, reported, in the previous and in the current version of the paper, the number of tweets that encompass more than one war-related term (N=1253).
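
Reply: A minimal sketch of this counting follows; the word lists below are abbreviated examples, not the full lexica used in the paper.

    frames = {
        "WAR":     {"fight", "battle", "enemy", "frontline"},
        "STORM":   {"storm", "wave", "thunder", "disaster"},
        "TSUNAMI": {"tsunami", "wave", "flood", "disaster"},
    }

    def frame_hits(tokens, frames):
        # one hit per frame-related token; a tweet can add hits to several frames
        return {name: sum(tok in lexicon for tok in tokens)
                for name, lexicon in frames.items()}

    print(frame_hits("this wave of #covid is hard to fight".split(), frames))
    # -> {'WAR': 1, 'STORM': 1, 'TSUNAMI': 1}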

I understand and appreciate how the cluster of words related to FAMILY are being used, as a point of comparison with the figurative frames (WAR, STORM, TSUNAMI, MONSTER). But I still think there is a qualitative difference between this category and the others. The language of WAR (STORM and TSUNAMI and MONSTER) is being used to frame different aspects of the pandemic and response. In many cases, the issues being framed metaphorically could be discussed using an alternative metaphorical frame or in a literal sense. The tweets about FAMILY, on the other hand, seem to be about FAMILY per se. That strikes me as an important difference that should be discussed in the paper.

Reply: Thank you. Indeed, these are qualitatively different frames because WAR, STORM, TSUNAMI, and MONSTER are figurative, while FAMILY is literal. The reviewer mentions: “The language of WAR (STORM and TSUNAMI and MONSTER) is being used to frame different aspects of the pandemic and response. [...] The tweets about FAMILY, on the other hand, seem to be about FAMILY per se”. This is precisely because the former are used metaphorically (and therefore with a specific effect, generated by the metaphor used, which can be replaced by using another metaphor, or by using literal language instead), while the latter is literal language, and is thus used to talk about its literal referents. We have chosen FAMILY deliberately as a frame that is most likely not used figuratively, yet has properties comparable to the frames we expect to be used figuratively (e.g. WAR, STORM or MONSTER). In fact, we have checked qualitatively the tweets that feature one of the FAMILY terms, and it appears that they are used literally, not metaphorically. So, for the Covid reality, FAMILY is typically used as a literal frame. In any tweet that uses a term from FAMILY, the context is still the pandemic, due to the nature of the dataset.

Results

See above for questions about tweets that were categorized as relating to multiple frames.

Reply: Thank you. As explained above, our analyses do not require the lexical entries to belong to mutually exclusive lists. For this reason, the Cochran’s Q test that we applied to compare the percentages by which the different frames are used takes into account the fact that a lexical entry may belong to more than one frame (in this sense, the categories are not mutually exclusive; the presence/absence of a lexical entry in a tweet, however, is a binary, mutually exclusive property: either the word is there, or it is not). Cochran's Q test requires only a binary response (e.g. word is present/word is not present, or 1/0) and more than 2 groups of the same size (e.g. WAR, FAMILY, STORM, MONSTER, TSUNAMI). The binary response does not need to be mutually exclusive across the groups.

Adding the replication study strengthens the paper.

Reply: Thank you.

I recommend toning down some of the claims like, “…the percentage of use of the WAR frame reported in our study suggests that this frame is particularly pervasive, because it covers more than one third of all the metaphorical language typically used in texts” (pp. 33-34). The study restricts itself to four metaphorical frames of interest; the full range of metaphor typically used in texts is not quantified here.

Reply: Thank you, we toned down the argument and clarified it further, in relation to the percentages of metaphorical language reported by Steen and colleagues in their corpus analyses.

REVIEWER 2

I want to commend the authors on the impressive revisions to this paper. I think they have done just about everything they could to address all of the reviewers' concerns, and I feel the paper is much stronger now and will make a nice addition to the literature. Indeed, I could see myself citing the findings in a future talk or article!

Reply: Thank you, this is great news and very much appreciated.

The only reason I am selecting "minor revision" rather than "accept" is that PLOS ONE does not copyedit accepted manuscripts, and I still feel the paper could use one more round of proofreading and editing. While the overall structure of the paper is greatly improved now, there were still grammatical errors and awkward phrases sprinkled throughout that hindered readability.

Reply: Thank you, this is very helpful. In fact, we have now had a (paid) professional proofreader, a native speaker of American English, proofread this version of the manuscript.

For example, lines 75-77 contained the sentence "Unlike the articles on magazines and journals typically used for corpus analyses of this kind, Twitter contains messages written by journalists and other experts in mass communications: most tweets are provided by non-expert communicators". This is very confusing to me as it seems to be saying two different things.

Reply: This is indeed a typo due to the heavy editing and track changes. The revised statement now reads: “Although Twitter contains messages written by journalists and other experts in mass communication, most tweets are provided by non-expert communicators”.

A milder example, but still one that should be edited for clarity, comes on lines 179-180: "The military metaphor thanks to which we frame diseases such as cancer is a very common one to be found in public discourse." And in the next sentence, the authors use the article "the" before Time Magazine, which is not necessary.

Reply: Thank you for your collaboration; we acknowledge these typos and mistakes, and we corrected these specific cases even before sending the manuscript to the proofreader. The proofreader then took care of the whole document. Very much appreciated.

I am not going to go into every example of confusing writing, but I do want to recommend the authors go over the paper again and update the prose as necessary to enhance clarity. At that time, I will be prepared to accept!

Reply: Thank you.

REVIEWER 4

Thank you to the authors for drastically improving many points about the paper. I enjoyed reading this interesting research even more this time. It is much cleaner to read and understand, including contributions, backing statistical analyses, and replication on other datasets. Other reviewers also brought up great points that I did not think of.

However, as written, the study seems to have unacknowledged limitations and concerns that render its contributions about understanding general discourse on Twitter too broad; I would recommend addressing these before accepting this paper.

My concern stems from ambiguity about the unit of analysis, which biases the representativeness of the data and has an unacknowledged impact on the results and contributions. Claims cannot be made about general discourse on Twitter, but they *can* be made about a biased yet useful subset - those tweets that originate from less-frequent tweeters.

Reply: Thank you. This comment is related to our methodological choice to select one tweet per user and drop retweets, which we motivated in the previous reply to reviewers. We are here reporting our previous reply: “We constructed a corpus, which follows specific criteria that we motivated in the paper. A corpus is a balanced and representative collection of texts. As we explained, we limited ourselves to one tweet per user precisely to avoid having a biased corpus in which super-tweeters could have skewed the frequency distribution of frames. Imagine, for example, one specific Twitter user who tweets many times per day and is very fond of sci-fi and monster-related issues: he/she might use the MONSTER framing very frequently in their tweets, including those about Covid-19. If we kept all the tweets by this user, this might have biased the distribution frequency of the MONSTER-related words: we would have observed a high percentage of tweets with this frame, but they would all have been produced by the same person, who simply happens to tweet very often. This is not representative of the population. Instead, by limiting ourselves to one tweet per user, we kept our corpus relatively manageable from a technical perspective (not too heavy) as well as representative and balanced from a theoretical perspective. We also dropped the retweets and mentions because these are duplicated texts, so the same data point would have been counted twice. We argue that this is not desirable from a corpus linguistics perspective. We have explicitly acknowledged these matters in the section “Constructing the corpus of Covid-19 tweets”. To conclude, our choices for the configuration of the corpus are theoretically motivated and clearly explained in the paper. We do not expect the reviewer to necessarily like them, but, methodologically speaking, they are motivated and transparent. As explained later, we have added further analyses to replicate our results on an updated corpus constructed with the same criteria, collected by ourselves, and on a corpus constructed on the basis of existing resources (section “Replication studies”).”

Reply: The reviewer has not commented on our reply, which provides a rational motivation for our methodological choices. In this sense, the comment by the reviewer stating that our analyses are based on “a biased yet useful subset - those tweets that originate from less-frequent tweeters” is not fully correct. Our analyses are based on a corpus (and then replicated on another corpus) of tweets. Decades of published empirical research in corpus linguistics are based on corpora that are carefully constructed according to principles such as the representativeness and balance of the texts included therein. We therefore believe that our corpus of tweets, which we constructed based on a few theoretically motivated and clearly explained criteria, is representative of the way in which Twitter users conceptualize and talk about Covid on Twitter: we constructed a corpus that represents the language that people use on Twitter, as is common practice in corpus linguistics and cognitive linguistics, as well as in many disciplines within the language sciences. Please note that we do not only consider “tweets that originate from less-frequent tweeters”, as the reviewer mentions; rather, we consider one tweet per tweeter, thus also taking super-tweeters into account by including one of their tweets in the sample.

Generally, any study that performs social monitoring needs to:

- make clear what its unit of analysis is,

- then accordingly determine what it means to be representative of these units in its sampling frame,

- gather units to analyze and analyze those same units

- acknowledge any limitations of representativeness

- and limit contributions accordingly

Reply: We are frankly unfamiliar with the term “social monitoring”. We have found a definition for “social media monitoring”, which appears to be the active monitoring of social media channels for information about a company or organization, usually by tracking various social media content to determine the volume and sentiment of online conversation about a brand or topic. This is not what we present in our study.

Reply: The information requested by the reviewer has been provided in the paper as follows:

- units of analysis: the first study is based on topic modelling and adopts the methods described by the LDA algorithm; the other two studies are based on the lexical entries that belong to the various frames, as clearly stated multiple times in the manuscript. We believe that by “units of analysis” the reviewer may refer to the words (from the various lists of frame-related words) that we counted in the tweets. To clarify this further, our algorithm counted 1 hit whenever a lexical unit (a word from a list) was found in a tweet. Then, the numbers were added up within each frame. In this sense, our unit of analysis is the lexical units in the word lists, not the tweets. A tweet can be counted as many times as it has words related to a frame. For example, a tweet such as “This wave of #covid is hard to fight” will count 1 hit for the TSUNAMI frame (wave), 1 hit for the STORM frame (wave) and 1 hit for the WAR frame (fight). We then reported in the paper the total number of occurrences of the words in the tweets, as this paragraph explains: “Analyzing all tweets from the database, a total of 10,846 tweets contained at least one term from the WAR framing, which is 5.32% of all tweets. Of these, 1,253 tweets had more than one war-related term."

- representativeness: the representativeness of these units is backed up by the sources from which they have been retrieved (RelatedWords and the Frames in the Berkeley database of figurative framings).

- gather units and analyze those units: that is how we proceeded, indeed!

- Acknowledge limitations: we have acknowledged the limitations of our analysis, in particular the limitation due to our assumption that the lexical units used for our analysis are always used metaphorically throughout the corpus. This was necessary in order to perform an automated analysis. We added a paragraph explaining this trade-off at the end of the section “Determining lexical units associated with the WAR frame”.

- we toned down the claims about our contribution, in the General Discussion and Conclusion section.

At minimum in this particular study,

- it should be made clear and consistent through data collection and analyses whether this particular study is analyzing use of frames by users, or use of frames in tweets, (I'm pretty sure it's tweets, right?)

Reply: We analyze the presence of lexical entries related to specific frames in a balanced corpus of tweets that was constructed by taking one tweet per user, to avoid the bias of super-tweeters. In this sense, we do not monitor how Twitter works per se, but how people, in our case Twitter users, tend to talk about the pandemic on Twitter. Our study relates to principles and assumptions generally made in fields such as corpus linguistics, discourse analysis and cognitive linguistics, which may be substantially different from those made in the “social media monitoring” field suggested by the reviewer. As we explained, in line with common practice in the disciplines to which we relate, we started by constructing a balanced and representative corpus of tweets that features as many users as possible to preserve variability within the corpus (one tweet per user), that avoids the bias introduced by super-tweeters (explained above), and that provides a window onto the language used on Twitter to talk about the current epidemic.

- the limitation of a bias against super-tweeters should be made more explicit, and I would accordingly recommend explicitly walking back the study's claims, research questions, and contributions, because claims of general representativeness on Twitter are not defendable.

Specifically, if it is chosen to omit the vast majority of tweets by keeping one tweet per user and omitting retweets, claims cannot be made about how the general Twitter discourse discusses and frames COVID-19 (or to put it another way, the study's data is not representative of all tweets). Claims instead can be made about how those tweets *originating from less-frequent tweeters* discuss and frame COVID-19.

Reply: We have not “chosen to omit the vast majority of tweets by keeping one tweet per user and omitting retweets”, as the reviewer argues. Rather, we have constructed a corpus of tweets that is representative and balanced with respect to the language people use on Twitter to talk about Covid, which is the phenomenon we want to investigate. Note that any type of sampling would indeed “omit the vast majority of tweets”. The criteria used for sampling, in our case, have been clearly explained and motivated, in line with common practices in corpus linguistics, discourse analysis, and cognitive linguistics research.

Either that, or I would recommend at least one of:

- switching to analysis at the level of a user

- more representative sample from tweeters (as one tweet per user is usually not seen as representative of a user's content, whether they are among the top tweeters or not)

Please find details below - perhaps these can make the issue more clear if it helps.

First, the study has a bias against retweeters and sharers, in favor of original tweeters, as is stated. However, many users on Twitter retweet or share someone else's tweets - this is seen as a proxy for behavior and opinion, as usually people share things they agree with.

Reply: As previously mentioned, we constructed a corpus of tweets based on the principles of representativeness and balance, in relation to the language that people use to talk about Covid on Twitter. We find the reviewer’s statement “people share things they agree with” quite interesting and possibly controversial, as people may share things they strongly disagree with, by retweeting them and adding a very negative comment. But, as non-experts in social media monitoring, we limit our discussion to the linguistic phenomena hereby investigated.

Therefore, more accurate counts of frames might be *including* those frames that are retweeted. At the very least, one may analyze with retweets and without to see any differences. It's fine that the study disregards retweets, but this should be acknowledged as a limitation and claims about discourse should be modified accordingly.

Reply: Indeed, we have acknowledged the fact that we dropped retweets on purpose, and motivated our choice. We have accordingly modified our claims about the discourse around Covid on Twitter, explaining that our findings relate to the corpus of tweets that we constructed, and not to Twitter as a social network, which includes retweets and information propagation that can be strongly affected by super-tweeters. A new paragraph including these limitations of our study has been added to the General Discussion and Conclusion section.

And second, to paraphrase, data is being "limited to one tweet per user because of technical limitations and to make the corpus balanced and representative". I would reject the study's claim that the resulting corpus is representative - because the study is not clear about *how* it is representative. As designed, it seems like the study is confused about the unit of analysis that it is trying to represent: the tweet or the user.

Reply: As argued above, the units of analysis are the lexical entries, that is, the frame-related words, which have been carefully compiled on the basis of existing resources. We investigated how these lexical entries are used in a corpus of tweets about Covid, which was constructed by selecting tweets that contain specific hashtags about Covid, dropping retweets because they are technically repeated texts that are not original and thus, in our opinion, not very representative of the variability in semantic content that can be found on Twitter, and keeping one tweet per user to construct a balanced corpus that is not biased toward the texts tweeted by super-tweeters (or even Twitter bots, which are usually super-tweeters).

- If the study wishes to accurately represent tweets, then by Twitter's nature there are super-tweeters, and frequencies of frames will be biased by super-tweeters who use frames more. This is an accurate representation of the Twitter discourse.

- If the study wishes to accurately represent users, then it should state this, tweets should be aggregated or sampled representatively per user, and all subsequent analyses should be at the level of the user, not the level of the tweet.

However, as currently designed, the study starts at the level of a user by "retaining one tweet per user", but then moves back to analyze at the level of a tweet. This results in a bias against super-tweeters with "retaining one tweet per user". Therefore, all subsequent analysis at the level of a tweet omits the vast majority of tweets that compose Twitter discourse - exactly what it seems to want to analyze.

Reply: Thank you. This study, as mentioned above, is rooted in principles and assumptions that are usually starting points in corpus linguistics and cognitive linguistics. It is physically impossible to mine the whole of Twitter as it stands, because it is a dynamic archive of texts that grows on a daily basis, and therefore there are no claims that can be true for “Twitter” as a whole, technically speaking. So, we have to sample this archive and construct a corpus that works well for our purposes. In our case, we constructed a corpus that is balanced and representative for the aim of the study. Our aim is to mine the linguistic phenomenon of figurative framing, that is, the use of words related to specific frames, which suggests a specific conceptualization of the topic of Covid. Given this aim, we constructed a corpus that is suitable for this type of linguistic investigation and that does not distort the statistics of our findings related to the phenomena investigated.

Reply: Our choice to select one tweet per user does not go in the direction of a user-based analysis. Rather, it is done in order to have a selection of tweets that have been produced by as many different human minds as possible, rather than by a very limited number of minds (super-tweeters), which may in fact very likely include artificial minds (Twitter bots).

Reply: Regarding our methodological choice to remove duplicates (retweets), please note that even in the construction of corpora based on texts found on the web (e.g., UK WAC, IT WAC, etc.), duplicate texts are typically dropped, to favor the representativeness of the texts included in the corpus. Our choice for the construction of the corpus of tweets goes in the same direction as these previous studies. In fact, several tools have been implemented precisely to remove duplicate texts in the construction of web-based corpora. Twitter, too, is a web-based resource where creating duplicates is extremely easy (one click: retweet!), but the motivations behind this operation may differ (notably, agreement vs. disagreement with the content of the original tweet) and are hard to interpret in an automated manner.

- Minor point: The study should address exactly how it "retained only one tweet per user". This was asked in the review. Was the earliest tweet retained? the most recent? the one with the most framing words? Is it simple random sampling? I ask to elucidate, not to be flippant, as ambiguity does not lead to reproducibility - it leads to concern. A brief mention would suffice.

Reply: Thank you, we added the information that only the first contribution of each user was retained, meaning that we kept the first tweet on a given day from users whose tweets we had not collected yet. In the previous response, we explained that “The algorithm picked the first tweet per user that it could find, within the timeframe indicated.” To respond directly to the reviewer's questions: Was the earliest tweet retained? Yes, the earliest on that given day. The most recent? No. The one with the most framing words? No. Is it simple random sampling? No, since we stated in the previous response that “[c]ollecting all tweets from all users, and then randomly selecting one per each user would have been computationally cumbersome and would have exceeded our database capacity.”
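
For readers who want to reproduce this sampling criterion, the following is a minimal sketch of the selection procedure as described above (drop retweets, keep the first original tweet encountered per user, processing tweets in chronological order). The field names ('user_id', 'text', 'is_retweet') are illustrative placeholders, not the study's actual collection pipeline.

    # Minimal sketch of the sampling criterion described above, assuming tweets arrive
    # in chronological order; field names ('user_id', 'text', 'is_retweet') are illustrative.
    def build_corpus(tweets):
        """Keep the first original (non-retweet) tweet encountered for each user."""
        seen_users = set()
        corpus = []
        for tweet in tweets:
            if tweet["is_retweet"]:
                continue                      # retweets are duplicated texts and are dropped
            if tweet["user_id"] in seen_users:
                continue                      # only one tweet per user is retained
            seen_users.add(tweet["user_id"])
            corpus.append(tweet["text"])
        return corpus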

I can recommend four courses of action, although the authors would know which would be more appropriate for their desired contributions:

1. Keep current data collection, methods, and results that seem to analyze at the level of a tweet, and acknowledge explicit bias against super-tweeters, to favor those users who tweet less often. This seems to require revisions to contributions and research questions, as the overall Twitter discourse is not being studied for framing, but the Twitter discourse being studied is biased towards discourse by less-frequent users. Is there a theoretical motivation for studying in this way?

2. Keep current data collection, analyze at the level of a user, and acknowledge explicit bias against super-tweeters, to favor those users who tweet less often. This would require more extensive revisions to analyses, but would result in a different contribution: instead of studying the entirety of framing discourse on Twitter, studying how *users*, with emphasis on less-frequent users, frame the discussion.

3. Decreasing bias against super-tweeters by gathering a representative amount of tweets per user, and analyze at the level of a user. This would result in a proper representative analysis of discourse on Twitter at the level of the user, and contributions on how users discuss and frame would follow.

4. Decreasing bias against super-tweeters by gathering a representative amount of tweets per user, and analyze at the level of a tweet. This would result in a proper representative analysis of discourse on Twitter at the level of the tweet, and defendable contributions on how tweets generally discuss and frame would follow.

Reply: Thank you, we adopted the first course of action and acknowledged the limitations of our study accordingly, emphasizing our background in cognitive and corpus linguistics and where our methodological choices stem from.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Panos Athanasopoulos

8 Sep 2020

PONE-D-20-10986R2

Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter

PLOS ONE

Dear Dr. Wicke,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

I am happy to accept your paper pending a few minor modifications as outlined by reviewer 1 (see below). Please submit the final revision with a cover letter showing how you have incorporated these final minor revisions as per the instructions below. I will then process the paper for publication without another round of review. Thank you for your diligent effort to address all of the reviewers' helpful comments throughout. I look forward to receiving the final version of your paper in due course.

==============================

Please submit your revised manuscript by Oct 23 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Panos Athanasopoulos, Ph.D

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript, which explores metaphors used to frame COVID-19 on Twitter, has been revised to address concerns raised in review. Overall, I think the revision is very successful. The argument is much clearer, and I think the paper is nearly ready for publication. I have a few final comments that should be straightforward to address. I am recommending that the paper be accepted.

1. I still think the use of “empirical basis” is misleading and confusing (p. 17) because, in many research contexts, this term is used to refer to specific criteria or quantitative methods of comparison. Since the current approach is more informal, why not say in the paper what is said in the response letter? Namely, something like, “These two numbers of clusters were chosen by looking at the data and are backed up by post hoc testing.”

2. Please include some details about the criteria that Lamsal used to create their database of COVID-19-related Tweets.

3. I was a little confused by the term “individual” in “individual tweeters” on page 13. I think “unique tweeters” would be clearer, assuming that is the intended meaning of “individual tweeters.”

4. Using the Topic labels (e.g., “Communications and Reporting”), rather than the numeric codes (e.g., “Topic III”), would make the figures more readable.

Once again, I found this research really interesting and think that it will make a nice contribution to the literature. I appreciate all the careful work the authors have done throughout the review process.

Reviewer #2: I am satisfied the authors have addressed all critical reviewer points. The current manuscript is significantly stronger than the initial submission and makes a nice contribution to the literature.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 30;15(9):e0240010. doi: 10.1371/journal.pone.0240010.r006

Author response to Decision Letter 2


8 Sep 2020

REVIEWER 1

The manuscript, which explores metaphors used to frame COVID-19 on Twitter, has been revised to address concerns raised in review. Overall, I think the revision is very successful. The argument is much clearer, and I think the paper is nearly ready for publication. I have a few final comments that should be straightforward to address. I am recommending that the paper be accepted.

Reply: Thank you. We appreciate the constructive feedback throughout the review process; such a successful revision would not have been possible without such useful and detailed reviews.

1. I still think the use of “empirical basis” is misleading and confusing (p. 17) because, in many research contexts, this term is used to refer to specific criteria or quantitative methods of comparison. Since the current approach is more informal, why not say in the paper what is said in the response letter? Namely, something like, “These two numbers of clusters were chosen by looking at the data and are backed up by post hoc testing.”

Reply: Thank you. We have now changed the use of “empirical basis” and refer to the selection of clusters as mentioned by the reviewer.

2. Please include some details about the criteria that Lamsal used to create their database of COVID-19-related Tweets.

Reply: Thank you. We have now included the following criteria in the paper: Lamsal’s dataset is a constantly updated repository of Twitter IDs. The collection of those IDs is based on English-language tweets that include any of 90+ hashtags and keywords commonly used when referencing the pandemic.
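
For illustration, a keyword-based collection of this kind amounts to a simple membership test on the tweet text. The sketch below assumes a tiny illustrative keyword set, not Lamsal's actual list of 90+ terms.

    # Minimal sketch of keyword-based tweet collection; the keyword set below is a
    # tiny illustrative subset, not Lamsal's actual list of 90+ hashtags and keywords.
    COVID_KEYWORDS = {"#covid19", "#coronavirus", "covid", "pandemic", "sars-cov-2"}

    def mentions_covid(tweet_text, keywords=COVID_KEYWORDS):
        """Return True if the tweet contains any of the tracked hashtags or keywords."""
        text = tweet_text.lower()
        return any(keyword in text for keyword in keywords)

    print(mentions_covid("Staying home this week #COVID19"))   # True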

3. I was a little confused by the term “individual” in “individual tweeters” on page 13. I think “unique tweeters” would be clearer, assuming that is the intended meaning of “individual tweeters.”

Reply: Thank you. We have now changed the mention of “individual tweeters” to “unique tweeters”.

4. Using the Topic labels (e.g., “Communications and Reporting”), rather than the numeric codes (e.g., “Topic III”), would make the figures more readable.

Reply: Thank you. We have now updated the figure with the topic labels for the 4 topics as requested by the reviewer.

Once again, I found this research really interesting and think that it will make a nice contribution to the literature. I appreciate all the careful work the authors have done throughout the review process.

Reply: Thank you, we appreciate this comment and all the careful responses from the reviewer.

REVIEWER 2

I am satisfied the authors have addressed all critical reviewer points. The current manuscript is significantly stronger than the initial submission and makes a nice contribution to the literature.

Reply: Thank you. We appreciate the constructive feedback throughout the review process; such a successful revision would not have been possible without such useful and detailed reviews.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 3

Panos Athanasopoulos

18 Sep 2020

Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter

PONE-D-20-10986R3

Dear Dr. Wicke,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Panos Athanasopoulos, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Panos Athanasopoulos

22 Sep 2020

PONE-D-20-10986R3

Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter

Dear Dr. Wicke:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Panos Athanasopoulos

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Reviewer 5.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All data files are available from the open science framework repository (https://osf.io/bj5a6/?view_only=b46ed9663a98461dac3a9430e3954e10).

