American Behavioral Scientist. Published online 2023 May 29. doi: 10.1177/00027642231174329

Detecting COVID-19 Fake News on Twitter: Followers, Emotions, Relationships, and Uncertainty

Ming Ming Chiu 1, Alex Morakhovski 2, David Ebert 3, Audrey Reinert 3, Luke S Snyder 4
PMCID: PMC10227546

Abstract

Fake news about coronavirus disease 2019 (COVID-19) can discourage people from taking preventive measures (masks, social distancing), thereby increasing infections and deaths; thus, this study tests whether attributes of users or COVID-19 tweets can distinguish tweets of true news versus fake news. We analyzed 4,165 spell-checked English tweets posted worldwide over 1 year, each with a link to 1 of 20 matched COVID-19 news stories (10 true and 10 fake), via computational linguistics and advanced statistics. Tweets with common words, negative emotional valence, higher arousal, greater dominance, first person singular pronouns, or third person pronouns, or tweets by users with more followers, were more likely to be true news tweets. By contrast, tweets with second person pronouns, bald starts, or hedges were more likely to be fake news tweets. Accuracy (F1 score) was 95%. While some tweet attributes for detecting fake news might be universal (pronouns, politeness, followers), others might be topic specific (common words, emotions, hedges).

Keywords: deception, writing, politeness, social media, hedge


Fake news can kill. In April 2020 alone, 82 websites spreading coronavirus disease 2019 (COVID-19) misinformation received 460 million views (Avaaz, 2020). Many people believed the fake news and failed to take preventive measures (e.g., wearing masks, social distancing), contributing to unnecessary COVID-19 infections and 130,000 additional COVID-19 deaths by October 2020 (60% of 217,000 who died of COVID-19 in the United States at that time; Redlener et al., 2020). Hence, detection of fake COVID-19 news is critical for preventing its spread and saving lives.

Many past studies detected fake news accurately but atheoretically with artificial intelligence methods, such as machine learning and topic identification. Using human-coded data of true versus fake news, standard machine learning methods (e.g., random forests, support vector machines, naïve Bayes; Abdelminaam et al., 2021), transformer models (e.g., BERT, ALBERT, XLNET; Gundapu & Mamidi, 2021), and evolutionary classifications (e.g., particle swarm optimization, genetic algorithm, salp swarm algorithm; Al-Ahmad et al., 2021) all yield high classification accuracies (all F1 > 90%); however, all of them are atheoretical black boxes that cannot explain their criteria for identifying fake news and hence cannot inform detection of fake news in other domains. Also using human-coded data, exploratory topic identification methods group fake news by topics and enable researchers to scan these topics for patterns (Ceron et al., 2021), but they lack rigorous tests of statistical significance.

By contrast, statistical analyses test theoretical models (also on human-coded data) and suggest that user characteristics (e.g., numbers of friends or followers) or message attributes (personal relationship markers, emotion-eliciting words, vocabulary, or uncertainty markers) might be linked to disinformation. For example, Kwon et al. (2017) collected over 10,000 tweets between 2006 and 2009 that four human coders linked either to 60 events identified as fake news by snopes.com or urban-legends.about.com (e.g., Obama is Muslim) or to 51 comparable true news stories in times.com, nytimes.com, or cnn.com. Hence, this study proposes a multilevel model of these user characteristics and message attributes to distinguish tweets linked to fake news from those linked to true news, without human coding; we test our multilevel model via computational linguistics and statistical analyses of 4,165 COVID-19-related tweets posted from January 21, 2020, to January 2, 2021.

User and Tweet Attributes Predicting Fake News

To distinguish tweets advocating fake news from those advocating true news, we consider possible differences in their user profiles and their words.

User

A user who tweets false or misleading information might disappoint, annoy, or anger his or her audience, who might stop following the user. As a discovered fake news tweet is costlier to a user with more followers (indicating greater social media influence) than to others, such a user might tend to tweet true news much more often than fake news (cf. detrimental relationship outcomes of deception; Dunbar et al., 2016). Indeed, past studies have shown that users with more followers were less likely to tweet rumors (Kwon et al., 2017). Thus, we propose the following hypothesis.

  • H-1. Tweets by users with more followers are more likely than others to link to true news rather than fake news.

Tweet

Tweets of fake news versus tweets of true news might differ in their personal relationship markers, emotion-eliciting words, vocabulary, or uncertainty markers (Chiu & Oh, 2021).

Personal Relationships

Personal relationship markers include personal pronouns and politeness markers.

Pronouns

Second person pronouns, first person singular pronouns, and third person pronouns might differentiate fake news tweets from true news tweets. By writing second person pronouns (“you,” “your,” “yours”) in a tweet, a user suggests a closer, more intimate relationship with the audience, which encourages greater audience trust in the user and greater potential compliance (Roloff & Janiszewski, 1989). Consider this fake news tweet, Fake-1.

  • Fake-1: @Coyotecyb @AnthonySabatini @nmlinguaphile You should read this! Might change your mind! Robert F. Kennedy Jr: COVID-19 vaccine should be avoided at all cost | Principia Scientific Intl.

A user might write second person pronouns in a tweet with fake news to encourage the audience to believe it.

Also, a user tweet with a second person pronoun focuses attention on the audience’s beliefs or actions and places more accountability for any consequences of the tweet on the audience; conversely, such a tweet renders the user more passive and less responsible (Kahn, 2006), thereby reducing his or her cost of revealed fake news. As a tweet with a second person pronoun fosters audience credibility and responsibility, it is more useful for tweeting fake news than true news, so users might be more likely to use it when tweeting fake news rather than true news.

  • H-2a. Tweets with second person pronouns are less likely than others to link to true news rather than fake news.

By contrast, users writing first person pronouns (“I,” “me,” “my,” “mine”) in their tweets draw attention to themselves and bear more responsibility for them (Martínez, 2005). See true news tweet, True-2.

  • True-2: I saw that that’s a nasty little disease is that the second case reported in the US 2020/02/07/world/asia/coronavirus-china.html

If a fake news tweet with a first person pronoun is discovered, it might be costlier to the user than a similar tweet with a second person pronoun, which might discourage users sending fake news from writing first person pronouns. Hence, users sending tweets with fake news might be less likely than others to use first person pronouns.

  • H-2b. Tweets with first person pronouns are more likely than others to link to true news rather than fake news.

Lastly, third person pronouns (“he,” “she,” “they,” “him,” “her,” “them,” “his,” “their,” “hers,” or “theirs”) often describe external events unrelated to the user or audience. See True-3.

  • True-3: Patricia was in #covid19 #vaccines trials. She had a medical problem. Anti-vaccine activists used it to attack the vaccine, helped by a badly done Gofundme. The problem was not vaccine related, Patricia never got the vaccine. She was in the placebo group.

Such external events might be easier to verify than experiences of the user or the audience. Also, the audience expects reports of external events to be objectively true, unlike the subjective experiences of the user or the audience (Moore, 2001). Hence, users sending tweets with fake news might be less likely than others to use third person pronouns.

  • H-2c. Tweets with third person pronouns are more likely than others to link to true news rather than fake news.

Politeness

Greater formality entails more politeness, and greater familiarity enables less politeness, so people might show less politeness to suggest greater familiarity and a closer relationship (Eelen, 2014). Consider Fake-4.

  • Fake-4: Be Smart; inform yourself about #coronavirus

This tweet with commands might be rude to a stranger but is more acceptable to an audience familiar with the user. As a closer relationship with the audience encourages greater audience trust in and credibility of the user, a tweet with fake news might be more persuasive without politeness markers (Baxter, 1984).

  • H-3. Tweets with politeness markers are more likely than others to link to true news rather than fake news.

Emotion

Credible social institutions (e.g., Centers for Disease Control and Prevention [CDC], World Health Organization [WHO]) and medical professionals (e.g., scientists, doctors) often disseminate COVID-19 news regarding illness/death and the urgent need for the public to take preventive measures to protect their health. See True-5 and True-6.

  • True-5: Death toll rises to 41 in China virus outbreak, with more than 1,000 infected. #virus

  • True-6: Basic Points: 1. Wash hands frequently and thoroughly 2. Don’t go to work or school if you are ill 3. Wear a mask only if you are in a confined space with others who might be sick.

These tweets about COVID-19 elicit negative emotions (negative valence), ignite audience passion to act (high arousal), and direct them (high dominance) to take protective measures. Such emotional processing of information tends to solidify emerging understanding of the message content (Sherman & Kim, 2002); notably, fear-inducing messages are especially effective (though they show diminishing marginal returns; Cummings, 2012; Rhodes, 2017). Authoritative health sources often use directive messages, which increase acceptance of suggested behaviors (Umphress et al., 2008; Wogalter et al., 1995). Hence, news that emphasizes severe risks and urges the public to take specific directed actions increases public compliance (e.g., hurricane evacuations; Stein et al., 2013).

By contrast, fake news downplays the COVID-19 threat and highlights audience freedom. Consider Fake-7.

  • Fake-7: @pllchirps We have access to beaches, less than 10 active cases of Covid, and generally experience misty rains. We also have fun on Sounds like we’re the team for you

This tweet paints a lovely seaside scene to elicit positive emotions, minimizes risks to elicit greater audience ambiguity about COVID-19, reduces arousal about risks, and entices the audience, thereby highlighting its autonomy (unlike dominant commands).

  • H-4a. Tweets with words eliciting negative emotional valence are more likely than others to link to true COVID-19 news rather than fake COVID-19 news.

  • H-4b. Tweets with words eliciting high emotional arousal are more likely than others to link to true COVID-19 news rather than fake COVID-19 news.

  • H-4c. Tweets with words eliciting high emotional dominance are more likely than others to link to true COVID-19 news rather than fake COVID-19 news.

Vocabulary

The danger of COVID-19 illness and death obligates a user seeking to help an audience to clearly explain (a) COVID-19, (b) the benefits of corresponding preventive measures (e.g., mask, social distancing), and (c) how to execute them. Such a user makes it easier for his or her audience to understand the need for such actions (lower constraint recognition) and motivates them to comply (Grunig & Kim, 2017; Kim & Grunig, 2011). Hence, these users often write common words familiar to their audience rather than unfamiliar, unusual words (Brysbaert & New, 2009), which helps more of them understand COVID-19, have confidence in the prescriptions, and comply with the instructions to protect their health. Like all of the above true tweets, True-8 uses common, familiar words.

  • True-8: That ‘Miracle Cure’ You Saw on Facebook? It Won’t Stop the Coronavirus

Whereas users tweeting true COVID-19 news want their audience to act to protect themselves and their loved ones, users tweeting fake COVID-19 news want their audience to continue behaving in the same way. See Fake-9.

  • Fake-9: @Zebedee32 @bdragon74 @ianbrown the first time in the history of vaccination, the so-called last generation mRNA vaccines intervene directly in the genetic material of the patient and alter the individual genetic material, something that was forbidden and considered criminal.

This user writes less common, less familiar words (e.g., “generation mRNA vaccines intervene”) to increase the complexity of understanding the problem, which increases obstacles to action and facilitates inaction (heightened constraint recognition; Grunig & Kim, 2017; Kim & Grunig, 2011). Thus, users tweeting fake COVID-19 news might write less common, high status, technical words to confuse an audience into inaction.

  • H-5. Tweets with more common words are more likely than others to link to true COVID-19 news rather than fake COVID-19 news.

Uncertainty

Users trying to help their audience protect their health against COVID-19 are likely to state ideas and necessary actions with certainty (“must,” “have to”), to help their audience clearly understand the problem/threat and to enhance audience confidence in the user’s claims and recommended actions (Kim & Grunig, 2011). Consider True-10.

  • True-10: Having a #vaccine for #COVID19 is a good start. But we still have to deal with vaccine #misinformation and #disinformation campaigns #pandemic

By contrast, users tweeting fake news use words indicating uncertainty (e.g., “perhaps,” “might”). See Fake-11.

  • Fake-11: @thehill How about sterilization? Which you might not discover for years until you try to have a baby.

Such tweets use uncertainty words to raise doubts about COVID-19, reduce understanding and clarity of its danger, and discourage audience action (Gifford, 2011; Kim & Grunig, 2011). Also, uncertainty words reduce the clarity of the fake news, so skeptics cannot easily and definitively prove that unclear claims are false (cf. strategic ambiguity, Jarzabkowski et al., 2010). In addition, uncertainty words facilitate attribution of negative consequences to chance, and thereby help reduce the burden of responsibility on the tweeter (Leonhardt et al., 2011).

  • H-6. Tweets with fewer uncertainty words (e.g., hedges) are more likely than others to link to true COVID-19 news rather than fake COVID-19 news.

Figure 1 summarizes all six sets of hypotheses. In addition to common words, we reduce omitted variables bias (Kennedy, 2008) by including other writing and word attributes commonly used in similar computational linguistics studies: Flesch–Kincaid reading score (Kincaid et al., 1975), academic level (Coxhead, 2000), and concreteness (Brysbaert et al., 2014). Likewise, we also included the number of Twitter users that a person is following and the number of retweets (Kwon et al., 2017).

Figure 1. Theoretical model of attributes of user (followers) and tweets (vocabulary, emotion, relationship, and uncertainty) that might account for tweets of true versus fake COVID-19 news.

Method

Tweets with links to news stories often draw attention to them or use them to support claims. Hence, we determine which attributes of users or of tweets distinguish tweets linked to fake news stories from those linked to true news stories via computational linguistics and advanced statistics. (Some tweets might attack their linked news, but a spot check of our data did not reveal such cases.)

Data

We identified 10 COVID-19 news stories rated false or mostly false by an independent fact-checking website (https://www.snopes.com/fact-check-ratings/). These fake news stories include false claims about treatments or political accusations, such as Will gargling with salt water or vinegar “eliminate” the COVID-19 coronavirus? (false) or Was Charles Lieber arrested for selling the COVID-19 coronavirus to China? (mostly false). Then, we identified Twitter tweets that link to each of these original fake news sources (e.g., fake news website, Facebook, or Twitter post). For comparison, we identified 10 true news stories on the same issues from the New York Times, Reuters, BBC, or AP News.

We obtained COVID-related Twitter data collected by Chen et al. (2020) from January 21, 2020 to January 2, 2021 from GitHub (https://github.com/echen102/COVID-19-TweetIDs) and the full tweet metadata with Twitter’s lookup API (https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/get-statuses-lookup). From over 1 billion tweets, 16,897 were linked to fake or true news.
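To hydrate the released tweet IDs into full tweets, a minimal sketch against the v1.1 statuses/lookup endpoint cited above follows; the `hydrate` helper, batching, and credential handling are our own illustration, not the authors' code.

```python
import requests

def hydrate(tweet_ids, bearer_token):
    """Fetch full metadata for up to 100 tweet IDs per call (v1.1 lookup API)."""
    response = requests.get(
        "https://api.twitter.com/1.1/statuses/lookup.json",
        params={"id": ",".join(tweet_ids), "tweet_mode": "extended"},
        headers={"Authorization": f"Bearer {bearer_token}"},
        timeout=30,
    )
    response.raise_for_status()  # surface rate limits or bad credentials
    return response.json()       # list of tweet objects (text, user, counts)
```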

Of these 16,897 tweets, we removed 10,656 retweets and 608 tweets with only the URL link (no other words: 603 fake and 5 true), leaving 5,633 tweets. To determine whether each tweet was in English or another language, we used langid software (https://pypi.org/project/langid/). We removed 1,468 non-English tweets, retaining only the 4,165 English tweets (3,777 linked to fake news and 388 linked to true news). Our large proportion of tweets linked to fake news (rather than true news) is consistent with past studies (e.g., ISD, 2020). For α = .05 and a small effect size of 0.1, statistical power for both 2,358 users and 4,165 tweets exceeds .99 (Konstantopoulos, 2008).

For each tweet, we removed symbols (e.g., $, @, #, quotation marks) and extra spaces. We split each tweet into words, converted letters to lowercase, and lemmatized them with spaCy 3.0 software (https://spacy.io/). We used autocorrect for spelling errors (https://pypi.org/project/autocorrect/).
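A minimal sketch of this filtering and cleaning pipeline, assuming the en_core_web_sm spaCy model; the exact symbol list and step ordering are our reading of the description above, not the authors' released code.

```python
import re

import langid                    # language identification
import spacy                     # tokenization and lemmatization
from autocorrect import Speller  # spelling correction

nlp = spacy.load("en_core_web_sm")
spell = Speller(lang="en")

def preprocess(tweet):
    """Return lowercase lemmas for an English tweet, or None if non-English."""
    lang, _ = langid.classify(tweet)
    if lang != "en":
        return None                            # keep only English tweets
    text = re.sub(r'[$@#"]', " ", tweet)       # remove symbols
    text = re.sub(r"\s+", " ", text).strip()   # collapse extra spaces
    text = spell(text)                         # correct spelling errors
    return [t.lemma_.lower() for t in nlp(text) if not t.is_space]
```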

For robustness tests, we created four more data sets: (a) retained the data set of 5,633; (b) corrected spelling errors in it; (c) retained the 4,165 tweets without correcting their spelling errors; and (d) created a balanced data set of 776 tweets by including the 388 tweets with links to true news and the first 388 tweets with links to fake news.

Variables

A true news tweet is a dichotomous variable. Its value is 1 for a tweet with a link to a news story from the New York Times, Reuters, BBC, or AP News. Otherwise, the tweet has a link to a fake news article according to Snopes.com, and its value is 0.

User

User variables include number of total tweets, number of people following the user (followers), and number of other users that a user follows (follows).

Writing and Vocabulary

Message attributes include writing, emotion, pronouns, politeness, and uncertainty. For each tweet, we used the regex Python library to count its words and its sentences, and code from https://github.com/akkana/ to count its syllables. For each tweet, we computed its Flesch–Kincaid score (Kincaid et al., 1975).

Flesch–Kincaid = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) (1)
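Equation (1) translates directly into code; a sketch, taking the word, sentence, and syllable counts from the counters described above as given:

```python
def flesch_kincaid(words, sentences, syllables):
    # Equation (1): higher scores indicate easier reading.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```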

Brysbaert and New (2009) compiled 51 million words from English subtitles of available U.S. movies and television series (SUBTLEX-US corpus) and counted total instances of each word (word frequency). Using this measure, we computed:

Common word = log(word frequency + 1) (2)

Then, we computed its mean for all words in a tweet.
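A sketch of Equation (2) averaged over a tweet; `subtlex_freq` is a hypothetical dict mapping each word to its SUBTLEX-US frequency count, and log base 10 is an assumption (the SUBTLEX convention), as the text does not state the base.

```python
import math

def mean_common_word(lemmas, subtlex_freq):
    # Equation (2) per word, then the tweet mean;
    # unlisted words count as log(0 + 1) = 0.
    scores = [math.log10(subtlex_freq.get(word, 0) + 1) for word in lemmas]
    return sum(scores) / len(scores) if scores else 0.0
```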

Coxhead (2000) compiled a list of 570 word families that often occurred in academic writing (10% of 3.5 million words in over 400 books) but rarely in a similarly sized corpus of fiction (1%) and divided them into deciles by frequency, from 10 (most frequent) to 1 (least frequent). We replaced each word in a tweet with this decile (omitting words not in the list) and computed the tweet’s mean academic value.

Brysbaert et al. (2014) paid 4,237 people to rate subsets of 37,058 English words on a scale from 1 (abstract/language based) to 5 (concrete/experience based). We replaced each word in a tweet with this concreteness rating (omitting words not in the list) and computed the tweet’s mean concreteness.
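The academic and concreteness measures (and the emotion ratings in the next subsection) share one recipe: replace each listed word with its rating, omit unlisted words, and average. A sketch, with `ratings` standing in for any of these lexicons:

```python
def mean_rating(lemmas, ratings):
    """Mean lexicon rating over a tweet, omitting words not in the lexicon."""
    values = [ratings[word] for word in lemmas if word in ratings]
    return sum(values) / len(values) if values else None  # None if no match
```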

Emotion

Warriner et al. (2013) paid 1,827 anonymous U.S. residents on Amazon’s Mechanical Turk crowdsourcing website to rate the emotional sentiments of the 13,915 highest-frequency English words (64% nouns, 13% verbs, and 23% adjectives, excluding common stopwords like “a,” “the,” “to”). Participants rated each word along three dimensions, each on a nine-point scale (1–9); valence ranges from negative emotional tone (e.g., fury) to positive emotional tone (joy), arousal ranges from low emotional passion (ennui) to high emotional passion (zeal), and dominance ranges from low emotional confidence (serve) to high emotional confidence (control). Words with multiple meanings (polysemous) were not rated. We computed each tweet’s mean ratings of valence, arousal, and dominance.

Pronouns

We created dichotomous variables for the presence of pronouns in each tweet. If a tweet had “I,” “me,” “my,” or “mine,” the variable first person single was assigned a value of 1; otherwise, its value was 0. If a tweet had “we,” “us,” “our,” or “ours,” the variable first person plural was 1, otherwise 0. If a tweet had “you,” “your,” or “yours,” the variable second person was 1, otherwise 0. If a tweet had “he,” “she,” “they,” “him,” “her,” “them,” “his,” “their,” “hers,” or “theirs,” the variable third person was 1, otherwise 0.
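These dichotomous variables reduce to set membership tests over a tweet's words; a minimal sketch:

```python
PRONOUN_SETS = {
    "first_person_single": {"i", "me", "my", "mine"},
    "first_person_plural": {"we", "us", "our", "ours"},
    "second_person": {"you", "your", "yours"},
    "third_person": {"he", "she", "they", "him", "her", "them",
                     "his", "their", "hers", "theirs"},
}

def pronoun_flags(lemmas):
    words = set(lemmas)
    # 1 if the tweet contains any pronoun in the category, else 0
    return {name: int(bool(words & members))
            for name, members in PRONOUN_SETS.items()}
```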

Politeness

Danescu-Niculescu-Mizil et al. (2013) used the Stanford Dependency Parser (de Marneffe et al., 2006) and specialized lexicons to create convokit:politeness software to identify words or phrases within six politeness categories of apologize (e.g., “sorry”), deference (e.g., “excuse me”), question (e.g., “who . . .?”), gratitude (e.g., “thanks”), please (e.g., “if you please”), and bald/rude start (e.g., “So, will you”). We applied convokit:politeness to our tweets to create six eponymous dichotomous variables (e.g., if a tweet had an apologize word or phrase, its variable was 1; otherwise, 0).
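ConvoKit's PolitenessStrategies transformer extracts these markers (and the hedge and subjunctive flags used in the next subsection) after a dependency parse; a minimal sketch on one tweet, with illustrative utterance and speaker IDs:

```python
from convokit import Corpus, PolitenessStrategies, Speaker, TextParser, Utterance

# Wrap a single tweet as a one-utterance corpus.
corpus = Corpus(utterances=[
    Utterance(id="t0", speaker=Speaker(id="u0"),
              text="Sorry, could you please check this?"),
])
corpus = TextParser().transform(corpus)            # dependency parse (required)
corpus = PolitenessStrategies().transform(corpus)  # add binary strategy flags
flags = corpus.get_utterance("t0").meta["politeness_strategies"]
# `flags` marks strategies such as apologizing, deference, gratitude,
# please, direct (bald) starts, hedges, and subjunctives.
```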

Uncertainty

Convokit:politeness also identifies words or phrases in the categories of hedge (e.g., “suggest”) and subjunctive (e.g., “could”). Thus, we correspondingly used convokit:politeness to create these two eponymous variables.

Analysis

Analytic Issues and Statistics Strategies

Suitable analyses of these data must address issues involving outcomes and explanatory variables (see Table 1). Outcome issues include nested data, discrete outcomes, and infrequency.

Table 1.

Statistics Strategies to Address Each Analytic Difficulty.

Outcome variables
 Nested data (tweets within users): multilevel analysis (aka hierarchical linear modeling; Goldstein, 2011)
 Discrete variable (true vs. fake link): Logit/Probit and odds ratios (Kennedy, 2008)
 Infrequency (<25%): Logit bias estimator (King & Zeng, 2001)
Explanatory variables
 Indirect, multilevel mediation effects (X → M → Y): multilevel M-test (MacKinnon et al., 2004); multilevel structural equation model (Joreskog & Sorbom, 2018)
 Cross-level interactions (User × Tweet): random effects model (Goldstein, 2011)
 Many hypotheses’ false positives: two-stage linear step-up procedure (Benjamini et al., 2006)
 Compare effect sizes (β1 > β2?): Lagrange multiplier tests (Bertsekas, 2014)
 Consistency of results across data sets: separate multilevel, single-outcome models; analyses of subsets of the data (Kennedy, 2008); original (not estimated) data

As tweets by the same person likely resemble one another more than those by different users (nested data), an ordinary least squares regression underestimates the standard errors, so we use a multilevel analysis (Goldstein, 2011; also known as hierarchical linear modeling, Bryk & Raudenbush, 1992). For discrete outcomes (e.g., true vs. fake link), ordinary least squares regressions can bias the standard errors, so we use a Logit regression (Kennedy, 2008). To aid understanding of these results, we report the odds ratio of the regression coefficient, namely the percentage increase or decrease in the likelihood of the outcome (Kennedy, 2008). Infrequent events (less than 25% of the time) can bias logit regression results, so we estimate this bias and remove it (King & Zeng, 2001).

Explanatory variable issues include indirect effects, moderation effects, many hypotheses’ false positives, effect size comparisons, and robustness. Separate, single-level tests of indirect mediation effects on nested data can bias results, so we test for simultaneous multilevel mediation effects with a multilevel M-test (MacKinnon et al., 2004) and a multilevel structural equation model (ML-SEM, Little et al., 2012).

With nested data, incorrectly modeling moderation effects across levels (e.g., User × Tweet) can bias the results, so we use a random effects model (Goldstein, 2011). If the regression coefficient of an explanatory variable (e.g., β_vj = β_v0 + f_vj) differs significantly across users (f_vj ≠ 0?), then cross-level moderation might exist, and we model the regression coefficient with structural variables (e.g., number of followers of a user). Interaction terms are often correlated with their component variables and can yield unstable results, so we use residual centering to remove such correlations before testing for moderation effects in an ML-SEM (Crandall et al., 2012).

Testing many hypotheses increases the possibility of a false positive. So, we reduce its likelihood via the two-stage linear step-up procedure, which outperformed 13 other methods in computer simulations (Benjamini et al., 2006).
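statsmodels implements this two-stage linear step-up procedure as the 'fdr_tsbky' method; a sketch, where `pvals` is a placeholder array of the hypotheses' p values:

```python
from statsmodels.stats.multitest import multipletests

# Two-stage step-up FDR control (Benjamini et al., 2006).
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method="fdr_tsbky")
```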

When testing whether the effect sizes of explanatory variables differ, Wald and likelihood ratio tests do not apply at boundary points. Hence, we use Lagrange multiplier tests which apply to the entire data set and show greater statistical power than Wald or likelihood ratio tests for small deviations from the null hypothesis (Bertsekas, 2014).

Lastly, we test whether the results remain stable despite minor differences in the data or analyses (robustness, Kennedy, 2008). As a mis-specified equation for any outcome in a multivariate outcome model can introduce errors in otherwise correctly specified equations, we model each outcome variable separately. Then, we run subsets of the data separately.

Explanatory Model

To determine the antecedents of tweets that link to true news articles rather than fake ones, we model each tweet by each user with a multilevel binary Logit/Probit analysis, beginning with a variance components model to test for significant differences at each level: tweet and user (Goldstein, 2011).

P(True_link_ij) = F(β0 + f_j) + e_ij (3)

The probability that a True_link occurs at tweet i by user j is its expected value via the Logit link function (F) of the overall mean β0 and the unexplained components (residuals) at the user and tweet levels (f_j, e_ij).

P(True_link_ij) = F(β0 + f_j + β_t User_j + β_uj Writing_ij + β_vj Emotion_ij + β_wj Relationship_ij + β_xj Uncertainty_ij + β_yj Retweets_ij + β_zj Interactions_ij) + e_ij (4)

Time determines the entry order of the explanatory variables. User characteristics exist before message attributes, so we first enter a vector of User variables: total tweets, followers, following. As the log-likelihood difference chi-squared test is not reliable in multilevel analysis of dichotomous outcomes, Wald tests identify significant effects (Goldstein, 2011). Omitting non-significant isolated variables does not cause omitted variable bias, so we safely remove them to increase precision and reduce multicollinearity (Kennedy, 2008). We apply this procedure to each vector.

As understanding precedes other message content attributes, we next enter Writing variables: Flesch–Kincaid score, mean common, mean academic, and mean concrete. Our brain’s amygdala ignites emotional reactions before our cerebral cortex processes other information (Adolphs & Anderson, 2018), so Emotion variables follow: valence, arousal, and dominance. As people especially value their relationships, words indicating user Relationship with the audience are entered next: personal pronouns (first person singular, first person plural, second person, third person) and politeness markers (apologize, deference, question, gratitude, please, bald/rude start). Then, we enter Uncertainty variables: hedge, subjunctive. As the number of Retweets occur after the message, we enter it last.

We use multilevel mediation tests across the above vectors (MacKinnon et al., 2004). Then, a multilevel path analysis (Goldstein, 2011) creates an initial candidate for the ML-SEM (Joreskog & Sorbom, 2018). We remove non-significant variables to yield the final ML-SEM.

We also analyze residuals for influential outliers. To test whether these results are robust to distribution assumptions, we repeat the above analyses with the Probit link function rather than Logit (Kennedy, 2008). We compute the F1 ratio and the predictive accuracy (final model’s predicted vs. actual news link of each tweet; Chiu, 2008; Powers, 2011).
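Because the variance components model ultimately showed no user-level variance (see Results), the final model reduces to a single-level binary logit. A sketch of fitting it and computing the F1 score, assuming `X` (a 2-D array of the tweet features above) and `y` (1 = true news link) are already prepared:

```python
import statsmodels.api as sm
from sklearn.metrics import f1_score

X = sm.add_constant(X)                     # add an intercept column
logit = sm.Logit(y, X).fit(disp=False)     # binary logit model
probit = sm.Probit(y, X).fit(disp=False)   # Probit link, as a robustness check
predicted = (logit.predict(X) >= 0.5).astype(int)
print(logit.summary())                     # coefficients and Wald tests
print("F1:", f1_score(y, predicted))       # predicted vs. actual news link
```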

Results

Summary Statistics

As the attributes of all tweets and the English-only tweets largely align, henceforth we discuss only those in English (see Table 2). Reflecting the predominance of tweets linked to fake news rather than true news (ISD, 2020), only 9% of the tweets were linked to true news stories. These users’ mean followers exceeded the mean number of users they followed (1,946 > 1,263). Tweets were by users with a median of 6,876 total tweets, 307 followers, and 531 followed users. Means far exceeded medians (22,928 > 6,876 total tweets, 1,946 > 307 followers, and 1,263 > 531 followed users), so the small set of highly active users likely differs from typical users.

Table 2.

Summary Statistics.

Variable | All (N = 5,633): Mean, SD, Min, Median, Max | English only (N = 4,165): Mean, SD, Min, Median, Max
True news tweet 0.083 0.275 0 0 1 0.093 0.291 0 0 1
Total tweets 22,853 45,929 1 6,894 653,955 22,928 46,692 1 6,876 653,944
Follows 1,193 2,861 0 472 92,195 1,263 3,133 0 531 92,195
Followers 2,421 34,122 0 308 2,440,230 1,946 9,753 0 307 345,661
Flesch–Kincaid 12.712 84.234 0 29.14 194 26.912 66.497 0 36.62 164
Common word 2.690 2.000 0 3.73 6.06 3.248 1.759 0 3.89 5.72
Academic 1.028 2.075 0 0 10 1.258 2.181 0 0 10
Concreteness 1.897 1.420 0 2.47 5 2.245 1.236 0 2.59 5
Valence 3.730 2.860 1 5.33 8 4.583 2.514 1 5.58 8
Arousal 2.826 2.166 1 3.96 6.9 3.439 1.893 1 4.2 6.63
Dominance 3.542 2.683 1 5.17 7.86 4.328 2.339 1 5.42 7.86
Pronouns
 First person single 0.047 0.212 0 0 1 0.061 0.240 0 0 1
 First person plural 0.046 0.210 0 0 1 0.061 0.239 0 0 1
 Second person 0.077 0.267 0 0 1 0.103 0.304 0 0 1
 Third person 0.127 0.333 0 0 1 0.134 0.340 0 0 1
Politeness
 Apologize 0.001 0.030 0 0 1 0.001 0.035 0 0 1
 Deference 0.002 0.044 0 0 1 0.002 0.049 0 0 1
 Question 0.030 0.170 0 0 1 0.039 0.195 0 0 1
 Gratitude 0.005 0.069 0 0 1 0.006 0.080 0 0 1
 Please 0.010 0.097 0 0 1 0.013 0.112 0 0 1
 Bald/rude start 0.026 0.158 0 0 1 0.035 0.183 0 0 1
Uncertainty
 Hedge 0.107 0.309 0 0 1 0.140 0.347 0 0 1
 Subjunctive 0.002 0.042 0 0 1 0.002 0.049 0 0 1
Retweets 1.844 18.604 0 0 721 1.840 18.510 0 0 721

These tweets were hard to read. The low median Flesch–Kincaid score of 36 indicates an expected reader with some college education, and the even lower mean score of 27 indicates an expected reader with a college degree. Words in these tweets were more abstract than concrete (median = 2.6, mean = 2.2, range: 0–5) and rarely academic (median = 0, mean = 1, range: 0–10).

The emotion of these tweets differed across their medians and their means. They showed slightly positive median valence (5.6 on a 1–9 scale) but slightly negative mean valence (4.6). Arousal was low overall but much higher at the median (4.2) than the mean (3.4). Also, the median tweet was slightly more dominant (5.4), but the mean was slightly more submissive (4.3). Together, these emotion dimensions indicate that most tweets were moderately positive, moderately aroused, and moderately dominant, but a small subset were extremely negative, extremely passive, or extremely submissive.

Most tweets had no pronouns or politeness markers. Among these tweets, 30% used personal pronouns (5% had multiple pronouns): first person singular: 6%; first person plural: 6%; second person: 10%; third person: 13%. These tweets had few politeness markers: apologize: 0.1%; deference: 0.2%; question: 4%; gratitude: 0.6%; please: 1%. Also, 3.5% of tweets started rudely.

Tweets showed uncertainty much more through hedges (14%) than subjunctives (0.2%). These tweets were re-tweeted a median of 0 times (M = 1.8; SD = 18.5; range: 0–721).

Explanatory Model

The variance components model showed no significant variance across users, so multilevel SEM was not needed, and a single-level (tweets) binary logit analysis sufficed. All results describe first entry into the regression, controlling for all previously included variables. (See correlation-variance-covariance matrix in Appendix A.)

Attributes of users, vocabulary, emotion, audience relationship, and uncertainty were related to true news tweet (see Table 3). Users with a thousand more followers than the mean were a bit more likely (0.4%) to send a true news tweet (see Table 3, model 1), supporting hypothesis H-1. User attributes accounted for 0.7% of the differences in true news tweets (Table 3, model 1, bottom).

Table 3.

Summary Standardized Regression Coefficients (Standard Errors) and Standardized Odds Ratios (%) of Binary Logit Regressions Modeling Spell-checked, English Tweets Linked to True News Articles (Rather than Fake News Articles) (N = 4,165).

Explanatory variable | Model 1: User | Model 2: + Vocabulary | Model 3: + Emotion | Model 4: + Relationship | Model 5: + Uncertainty
Followers (thousands) | 0.016*** (0.004), +0.4% | 0.013*** (0.004), +0.3% | 0.013*** (0.004), +0.3% | 0.014*** (0.004), +0.3% | 0.013** (0.004), +0.3%
Common word | | 0.494*** (0.055), +12% | 0.539*** (0.071), +13% | 0.579*** (0.074), +14% | 0.613*** (0.073), +15%
Emotion
 Valence | | | −0.843*** (0.123), −20% | −0.838*** (0.125), −20% | −0.842*** (0.123), −20%
 Arousal | | | 0.358*** (0.091), +9% | 0.350*** (0.093), +9% | 0.384*** (0.094), +9%
 Dominance | | | 0.580*** (0.142), +14% | 0.573*** (0.144), +14% | 0.543*** (0.144), +13%
Relationship
 First person singular | | | | 0.900*** (0.170), +21% | 0.910*** (0.170), +21%
 Second person | | | | −0.914*** (0.202), −21% | −0.892*** (0.203), −21%
 Third person | | | | 0.444** (0.141), +11% | 0.467*** (0.141), +11%
 Bald/rude start | | | | −0.820* (0.322), −19% | −0.760* (0.323), −18%
Hedge | | | | | −0.676*** (0.172), −16%
McFadden’s R2 | 0.007 | 0.057 | 0.077 | 0.102 | 0.108

*p < .05. **p < .01. ***p < .001.

Tweets with more common words were more likely to be true news tweets, supporting H-5 (see Table 3, model 2). Commonness of words accounted for the most variance (5%) in true news tweets (Table 3, model 2: 0.057 − 0.007 = 0.050).

Emotion attributes (valence, arousal, dominance) were also related to true news tweets (Table 3, model 3). Tweets with one degree more negative valence (max: 9), one degree more arousal, or one degree more dominance than others were, respectively, 20%, 9%, or 14% more likely to be true news tweets, supporting H-4a, H-4b, and H-4c. Emotion attributes accounted for an extra 2% of the variance.

Audience relationship markers (pronouns, politeness) were related to true news tweet (Table 3, model 4). Tweets with first person singular or third person pronouns were, respectively, 21% or 11% more likely to be true news tweets, supporting H-2b and H-2c. By contrast, tweets with second person pronouns or bald/rude starts were less likely to be true news tweets, by 21% or 19% respectively, supporting H-2a and H-3. Relationship attributes showed the largest effect sizes, especially first person singular and second person pronouns, and accounted for an extra 2.5% of the variance.

Tweets with a hedge (uncertainty word) were 16% less likely than others to be true news tweets, supporting H-6 (Table 3, model 5). Uncertainty attributes accounted for an extra 0.6% of the variance.

The final model accounted for nearly 11% of the variance and had an F1 ratio of 0.95. All other variables were not significant. All mediation tests and interaction variables were not significant. With no significant mediation, SEM was not needed. Analyses of residuals showed no significant outliers. Analyses with Probit rather than Logit showed similar results. Analyses with the other four data sets showed similar results.

Discussion

Fake news can discourage people from taking preventive measures against COVID-19 and indirectly cause deaths, so this study tests for determinants of true news tweets versus fake news tweets. Our results showed that tweets by a person with more followers, with more common words, more negative emotional valence, higher arousal, greater dominance, first person singular pronouns, or third person pronouns are more likely to be true news tweets. Tweets with second person pronouns, bald starts, or hedges are more likely to be fake news tweets.

Theory Implications

These results suggest the importance of a multilevel model of fake news that includes both user attributes and tweet attributes. Tweets by users with more followers were more likely to have true news links, consistent with the view that a discovered fake news tweet can cause more harm to users with more followers, so such users were more likely to tweet true news (Kwon et al., 2017).

COVID-19 tweets with more common words were more likely to link to true news, accounting for the most variance (5%) and supporting the view that words familiar to an audience help them understand the user’s ideas and enhance their confidence in the user’s prescriptions, encouraging compliance that protects their health (lower constraint recognition; Grunig & Kim, 2017; Kim & Grunig, 2011). This vocabulary result reflects COVID-19 true news tweets seeking audience action and corresponding fake news tweets seeking audience inaction; other cases might yield different results. For example, true news tweets that seek audience inaction (e.g., acceptance of the results of the 2020 U.S. presidential election) and fake news tweets that seek audience action (e.g., the January 6, 2021 invasion of the U.S. Capitol) might show the opposite pattern.

COVID-19 tweets with negative emotion valence, greater arousal, or greater dominance were more likely to link to true news, consistent with past studies showing that fear-inducing, directive messages increase public compliance (Rhodes, 2017; Stein et al., 2013; Umphress et al., 2008). As with common words, these emotion valence, arousal, and dominance results might differ for alarming fake news tweets and disarming true news tweets; if confirmed in further studies, such results indicate the need for topic-specific theoretical models and analyses of vocabulary and emotion to distinguish fake news tweets from true news tweets.

Relationship markers (pronouns and politeness markers) also differed across tweets of true versus fake news. Tweets with second person pronouns were more likely to link to fake news, consistent with the views that a second person pronoun (a) suggests a closer relationship between the user and the audience, which encourages greater audience trust and compliance (Roloff & Janiszewski, 1989), and (b) focuses attention on the audience, giving them more responsibility for the tweet (Kahn, 2006). By contrast, tweets with first person singular pronouns were more likely to link to true news, supporting the view that first person pronouns draw attention to the user, who then bears more responsibility (Martínez, 2005). Also, tweets with third person pronouns were more likely to link to true news, supporting the view that third person pronouns describe external events expected to be true and verifiable, unlike the subjective experiences of the user (Moore, 2001). Tweets with bald/rude starts were more likely to link to fake news, consistent with the view that less politeness suggests greater familiarity, which can foster greater audience trust. Like number of followers, these relationship results (for pronouns and politeness) likely do not depend on the specific topic.

Lastly, tweets with hedges were more likely to link to fake news, supporting the view that uncertainty words raise doubts, reduce understanding, discourage audience action, and avoid responsibility (Gifford, 2011; Jarzabkowski et al., 2010; Kim & Grunig, 2011). Like common words and emotions, this hedge effect might be specific to COVID-19 and might differ for other topics.

Methodology Implications

The user and tweet results, and possibility of topic-specific results have corresponding methodological implications. The significant results at different levels (user, tweet) suggest the importance of including explanatory variables at each level (Goldstein, 2011), especially as omitting significant explanatory variables from a statistical model often biases the results (Kennedy, 2008). If future studies show topic-specific results for common words, emotions, and hedges, statistical models (and their theoretical models) must capture explanatory model differences across topics (e.g., by modeling topic as a separate level in a multilevel analysis; Goldstein, 2011).

Limitations and Future Studies

This study’s limitations include its sample, tweet links to news, and limited explanatory variables. This sample was limited to tweets about one topic in one language within a year, so future studies can use larger samples with multiple topics across multiple languages and longer timespans. Although tweets typically link to news stories to highlight them or to buttress claims, some true tweets might debunk linked fake news, and some false tweets might smear true news; future studies can filter out such tweets from the data. (Such mislabeled tweets can be viewed as measurement error that tends to inflate standard errors and reduce significance; hence, removing all measurement errors would only strengthen our significant results.) Whether fake news tweets with links to news sources resemble those without such links is a critical issue that future studies can test, using the above results as preliminary hypotheses.

This study examined a small set of user and tweet attributes, so future studies can examine more attributes at these levels and at additional levels such as topic, district, state, country, or time. For example, this study examined only simple counts of follows and retweets, so future studies can examine in greater depth whether following other users or retweeting is related to the likelihood of tweeting fake versus true news. Also, a word that is common in one context (e.g., “right-handed” in conversations about differences among children) might be unfamiliar in another context (e.g., “right-handed” in conversations about electricity signals advanced expertise); future studies can include topic-specific dictionaries to address this issue. Moreover, this study did not examine the content of these tweets, which future studies can examine via thematic analysis (Terry et al., 2017) to explore the nature of the authors, the source of their popularity, and the extent to which their followers trust them.

Conclusion

As fake COVID-19 news can discourage people from taking preventive measures, leaving them more likely to be infected and die, this study identified attributes that distinguished fake news tweets from true news tweets with 95% accuracy (F1). Tweets by a person with more followers, with more common words, more negative emotional valence, higher arousal, greater dominance, first person singular pronouns, or third person pronouns were more likely to link to true news. By contrast, tweets with second person pronouns, bald starts, or hedges were more likely to link to fake news. Some attributes for detecting fake news tweets might apply broadly (pronouns, politeness, followers), but others are likely topic specific (common words, emotions, hedges).

Author Biographies

Ming Ming Chiu is Chair (Distinguished) Professor of Analytics and Diversity at The Education University of Hong Kong. He invented statistical discourse analysis (SDA), multilevel diffusion analysis (MDA), artificial intelligence Statistician, and online detection of sexual predators. He studies fake news, inequalities, learning, international comparisons, and automatic statistical analyses.

Alex Morakhovski is a Data Scientist specializing in natural language processing, machine learning, and algorithms. His research mainly involves text analysis, text classification, topic modeling, and web scraping.

David Ebert is Associate Vice President for Research and Partnerships, Gallogly Chair Professor of electrical and computer engineering, Director, Data Institute for Societal Challenges, and IEEE Fellow. He researches visual analytics, visualization, interactive machine learning, explainable AI, human–computer teaming, predictive analytics, and procedural abstraction of complex, massive data.

Audrey Reinert is a Research Engineer with Aptima Inc., specializing in human factors engineering.

Luke S. Snyder is a Ph.D. student at the University of Washington, advised by Jeffrey Heer. His primary research interests are interactive data visualization and tools for scholarly communication.

Appendix A.

Ancillary Analyses. Correlations, variances, and covariances of outcome and explanatory variables in the lower left triangle, diagonal, and upper right triangle, respectively.

S. no. Variable 1 2 3 4 5 6 7 8 9 10 11
1 True news tweet .084 239 .080 .075 .066 .083 .008 −.002 .007 .000 −.002
2 Followers .084 959 1.185 1.405 .982 1.435 .005 .039 .139 .028 −.015
3 Easiness .157 .069 3.093 3.928 2.862 3.734 .072 .141 .044 .048 .147
Emotion
4 Valence .103 .057 .889 6.319 4.567 5.808 .063 .123 .034 .037 .137
5 Arousal .120 .053 .859 .960 3.584 4.231 .041 .073 .023 .022 .118
6 Dominance .122 .063 .908 .988 .955 5.471 .072 .131 .037 .040 .137
Pronouns
7 First person single .121 .002 .171 .105 .090 .128 .057 .008 .002 .009 .005
8 Second person −.024 .013 .264 .161 .127 .184 .108 .092 .003 .008 .012
9 Third person .071 .042 .074 .040 .035 .046 .023 .032 .116 .000 .004
10 Bald start (not polite) −.006 .016 .148 .080 .065 .094 .198 .148 −.001 .033 .005
11 Hedge −.025 −.004 .240 .157 .180 .168 .059 .118 .033 .087 .120

Diagonal values are variances.

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

  1. Abdelminaam D. S., Ismail F. H., Taha M., Taha A., Houssein E. H., Nabil A. (2021). COAID-DEEP: An optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter. IEEE Access, 9, 27840–27867.
  2. Adolphs R., Anderson D. J. (2018). The neuroscience of emotion: A new synthesis. Princeton University Press.
  3. Al-Ahmad B., Al-Zoubi A. M., Abu Khurma R., Aljarah I. (2021). An evolutionary fake news detection method for COVID-19 pandemic information. Symmetry, 13(6), 1091.
  4. Avaaz (2020). Facebook’s algorithm: A major threat to public health. Avaaz.
  5. Baxter L. A. (1984). An investigation of compliance-gaining as politeness. Human Communication Research, 10(3), 427–456. 10.1111/j.1468-2958.1984.tb00026.x
  6. Benjamini Y., Krieger A. M., Yekutieli D. (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3), 491–507. 10.1093/biomet/93.3.491
  7. Bertsekas D. P. (2014). Constrained optimization and Lagrange multiplier methods. Academic.
  8. Bryk A. S., Raudenbush S. W. (1992). Hierarchical linear models. Sage.
  9. Brysbaert M., New B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. 10.3758/BRM.41.4.977
  10. Brysbaert M., Warriner A. B., Kuperman V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3), 904–911. 10.3758/s13428-013-0403-5
  11. Ceron W., de-Lima-Santos M. F., Quiles M. G. (2021). Fake news agenda in the era of COVID-19: Identifying trends through fact-checking content. Online Social Networks and Media, 21, 100116.
  12. Chen E., Lerman K., Ferrara E. (2020). Tracking social media discourse about the COVID-19 pandemic: Development of a public coronavirus Twitter data set. JMIR Public Health and Surveillance, 6(2), e19273. 10.2196/19273
  13. Chiu M. M. (2008). Flowing toward correct contributions during groups’ mathematics problem solving: A statistical discourse analysis. Journal of the Learning Sciences, 17(3), 415–463. 10.1080/10508400802224830
  14. Chiu M. M., Oh Y. W. (2021). How fake news differs from personal lies. American Behavioral Scientist, 65(2), 243–258. 10.1177/0002764220910243
  15. Coxhead A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238. 10.2307/3587951
  16. Cummings L. (2012). Scaring the public: Fear appeal arguments in public health reasoning. Informal Logic, 32(1), 25–50. 10.22329/il.v32i1.3146
  17. Danescu-Niculescu-Mizil C., Sudhof M., Jurafsky D., Leskovec J., Potts C. (2013). A computational approach to politeness with application to social factors. arXiv preprint arXiv:1306.6078.
  18. de Marneffe M.-C., MacCartney B., Manning C. D. (2006). Generating typed dependency parses from phrase structure parses [Conference session]. LREC (Vol. 6, pp. 449–454). May 24-26, 2006. Genoa, Italy.
  19. Dunbar N. E., Gangi K., Coveleski S., Adams A., Bernhold Q., Giles H. (2016). When is it acceptable to lie? Interpersonal and intergroup perspectives on deception. Communication Studies, 67(2), 129–146. 10.1080/10510974.2016.1146911
  20. Eelen G. (2014). A critique of politeness theory. Routledge.
  21. Gifford R. (2011). The dragons of inaction: Psychological barriers that limit climate change mitigation and adaptation. American Psychologist, 66(4), 290–302.
  22. Goldstein H. (2011). Multilevel statistical models. Edward Arnold.
  23. Grunig J. E., Kim J.-N. (2017). Publics approaches to segmentation in health and risk messaging. In Parrott R. (Ed.), Encyclopedia of health and risk message design and processing. Oxford University Press.
  24. Gundapu S., Mamidi R. (2021). Transformer based automatic COVID-19 fake news detection system. arXiv preprint arXiv:2101.00180.
  25. Institute for Strategic Dialogue (ISD) (2020). Far-right exploitation of COVID-19.
  26. Jarzabkowski P., Sillince J. A., Shaw D. (2010). Strategic ambiguity as a rhetorical resource for enabling multiple interests. Human Relations, 63(2), 219–248. 10.1177/0018726709337040
  27. Joreskog K., Sorbom D. (2018). LISREL 10.1. Scientific Software International.
  28. Kahn M. (2006). The passive voice of science. In Fill A., Muhlhausler P. (Eds.), Ecolinguistics reader: Language, ecology and environment (p. 241).
  29. Kennedy P. (2008). A guide to econometrics. Wiley-Blackwell.
  30. Kim J.-N., Grunig J. E. (2011). Problem solving and communicative action: A situational theory of problem solving. Journal of Communication, 61(1), 120–149. 10.1111/j.1460-2466.2010.01529.x
  31. Kincaid J. P., Fishburne R. P., Jr., Rogers R. L., Chissom B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for navy enlisted personnel. Naval Technical Training Command Millington TN Research Branch.
  32. King G., Zeng L. (2001). Logistic regression in rare events data. Political Analysis, 9, 137–163.
  33. Konstantopoulos S. (2008). The power of the test for treatment effects in three-level cluster randomized designs. Journal of Research on Educational Effectiveness, 1(1), 66–88. 10.1080/19345740701692522
  34. Kwon S., Cha M., Jung K. (2017). Rumor detection over varying time windows. PLoS One, 12(1), e0168344. 10.1371/journal.pone.0168344
  35. Leonhardt J. M., Keller L. R., Pechmann C. (2011). Avoiding the risk of responsibility by seeking uncertainty: Responsibility aversion and preference for indirect agency when choosing for others. Journal of Consumer Psychology, 21(4), 405–413. 10.1016/j.jcps.2011.01.001
  36. Little T. D., Card N. A., Bovaird J. A., Preacher K. J., Crandall C. S. (2012). Structural equation modeling of mediation and moderation with contextual factors. In Bovaird A., Card N. A. (Eds.), Modeling contextual effects in longitudinal studies (pp. 207–230). Routledge.
  37. MacKinnon D. P., Lockwood C. M., Williams J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128.
  38. Martínez I. A. (2005). Native and non-native writers’ use of first person pronouns in the different sections of biology research articles in English. Journal of Second Language Writing, 14(3), 174–190. 10.1016/j.jslw.2005.06.001
  39. Moore M. E. (2001). Third person pronoun errors by children with and without language impairment. Journal of Communication Disorders, 34(3), 207–228. 10.1016/S0021-9924(00)00050-2
  40. Powers D. M. W. (2011). Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. Journal of Machine Learning Technologies, 2(1), 37–63.
  41. Redlener I., Sachs J. D., Hansen S., Hupert N. (2020). 130,000-210,000 avoidable COVID-19 deaths—and counting—in the US. National Center for Disaster Preparedness, Columbia University.
  42. Rhodes N. (2017). Fear-appeal messages: Message processing and affective attitudes. Communication Research, 44(7), 952–975. 10.1177/0093650214565916
  43. Roloff M. E., Janiszewski C. A. (1989). Overcoming obstacles to interpersonal compliance: A principle of message construction. Human Communication Research, 16(1), 33–61. 10.1111/j.1468-2958.1989.tb00204.x
  44. Sherman D. K., Kim H. S. (2002). Affective perseverance: The resistance of affect to cognitive invalidation. Personality and Social Psychology Bulletin, 28(2), 224–237. 10.1177/0146167202282008
  45. Stein R., Buzcu-Guven B., Dueñas-Osorio L., Subramanian D., Kahle D. (2013). How risk perceptions influence evacuations from hurricanes and compliance with government directives. Policy Studies Journal, 41(2), 319–342. 10.1111/psj.12019
  46. Terry G., Hayfield N., Clarke V., Braun V. (2017). Thematic analysis. In W. S. Rogers (Ed.), The SAGE handbook of qualitative research in psychology (Vol. 2, pp. 17–37).
  47. Umphress E. E., Simmons A. L., Boswell W. R., Triana M. D. C. (2008). Managing discrimination in selection: The influence of directives from an authority and social dominance orientation. Journal of Applied Psychology, 93(5), 982–993. 10.1037/0021-9010.93.5.982
  48. Warriner A. B., Kuperman V., Brysbaert M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191–1207. 10.3758/s13428-012-0314-x
  49. Wogalter M. S., Barlow T., Murphy S. A. (1995). Compliance to owner’s manual warnings: Influence of familiarity and the placement of a supplemental directive. Ergonomics, 38(6), 1081–1091. 10.1080/00140139508925175
