PLOS Computational Biology. 2020 Mar 12;16(3):e1007633. doi: 10.1371/journal.pcbi.1007633

The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic

Michele Tizzoni 1,*, André Panisson 1, Daniela Paolotti 1, Ciro Cattuto 1
Editor: Matthew (Matt) Ferrari 2
PMCID: PMC7067377  PMID: 32163409

Abstract

In recent years, many studies have drawn attention to the important role of collective awareness and human behaviour during epidemic outbreaks. A number of modelling efforts have investigated the interaction between the disease transmission dynamics and human behaviour change mediated by news coverage and by information spreading in the population. Yet, given the scarcity of data on public awareness during an epidemic, few studies have relied on empirical data. Here, we use fine-grained, geo-referenced data from three online sources—Wikipedia, the GDELT Project and the Internet Archive—to quantify population-scale information seeking about the 2016 Zika virus epidemic in the U.S., explicitly linking such behavioural signal to epidemiological data. Geo-localized Wikipedia pageview data reveal that visiting patterns of Zika-related pages in Wikipedia were highly synchronized across the United States and largely explained by exposure to national television broadcast. Contrary to the assumption of some theoretical epidemic models, news volume and Wikipedia visiting patterns were not significantly correlated with the magnitude or the extent of the epidemic. Attention to Zika, in terms of Zika-related Wikipedia pageviews, was high at the beginning of the outbreak, when public health agencies raised an international alert and triggered media coverage, but subsequently exhibited an activity profile that suggests nonlinear dependencies and memory effects in the relation between information seeking, media pressure, and disease dynamics. This calls for a new and more general modelling framework to describe the interaction between media exposure, public awareness and disease dynamics during epidemic outbreaks.

Author summary

Despite its importance for public health policy-makers, understanding the impact of media coverage on collective attention during disease outbreaks remains an elusive research task, due to the lack of available data, especially at high spatial granularity. In this paper, we study the dynamics of collective attention received by the 2016 Zika epidemic in the USA and its interplay with the media coverage of the outbreak, at the level of US states and cities. We measure the attention to Zika through geo-localized Wikipedia pageview data, and we compare it with mentions of Zika in US news outlets and TV shows. We also compare the collective attention received by the outbreak with the incidence of Zika reported by the US Centers for Disease Control and Prevention in each state. We find that the attention dynamics were highly synchronized across states, irrespective of the local risk of transmission of the virus. By building a linear regression model, we show that the dynamics of collective attention are highly predictable, even at state level, based only on the national media coverage received by the outbreak.

Introduction

The advent of the digital era has radically changed the way individuals search for information, and this is particularly relevant for health-related information [1]. A 2013 study [2] found that 59% of U.S. adults had looked for health information on the Web in the previous year and that about one in three U.S. adults use the Internet to figure out what medical condition they have. The consumption of news sources, either traditional such as television, radio and newspapers, or digital such as Web news or online social networks, has become crucial in how health information is delivered and it can play a fundamental role in shaping opinions, awareness and behaviours. In the past ten years, several studies have addressed the impact of awareness and information spread during epidemic outbreaks and it has been reported that the degree of public attention and concern induced by an epidemic threat might affect the disease transmission dynamics [3-7]. However, modeling efforts have been mostly theoretical, and a large-scale empirical characterization of the adoption of health protective behaviours, whether induced by information spread or by media exposure, and of its interplay with the disease dynamics during an epidemic outbreak has been elusive so far due to the lack of available data [8].

Here, we study a large-scale dataset on spatio-temporally resolved accesses to Wikipedia pages on the 2015-2016 Zika virus (ZIKV) epidemic, regarded as a proxy for collective attention to this emerging health threat. The epidemic started in Brazil in 2015 and spread to other parts of South and North America in 2016. This study focuses on attention patterns in the United States throughout 2016, and on their relation to media coverage of the epidemic.

ZIKV is an RNA virus from the Flaviviridae family which is mainly transmitted by infected Aedes mosquitoes, although there have been cases of sexual and perinatal transmission. Infection is mostly asymptomatic or associated with mild symptoms [9] but it can lead to serious and sometimes fatal neurological defects in neonates born to ZIKV-infected women. In particular, following the association between ZIKV and a cluster of microcephaly cases in Brazil [10], the World Health Organization (WHO) declared the ZIKV epidemic a Public Health Emergency of International Concern (PHEIC) on February 1st, 2016 [11]. The emergency lasted until November 18th, 2016, when the WHO declared the PHEIC to be over [12]. As of March 2017, ZIKV had spread worldwide to 79 countries with evidence of ongoing vector-borne virus transmission. The most affected region has been the American continent with 47 countries or territories reporting local ZIKV transmission, due to the extensive presence of Aedes mosquitoes in almost all the region’s countries [13]. In this epidemiological context, the ZIKV epidemic has posed peculiar communication challenges to the public due to its association with microcephaly in newborns, its transmission modalities, and its prevalence in areas where the virus had never been detected before and that were suddenly characterized by intense international travel due to the 2016 Summer Olympics [13-15].

The communication challenges posed by ZIKV are well exemplified in a manual released in 2017 by the WHO European Region [16] on how to deal with the complex communication challenges of ZIKV and mosquito-borne disease outbreaks in general. Among the main communication issues listed in the manual, besides Dealing with uncertainty and Rumor management, there are Increase in information demand and Managing evolving information. The last two are particularly interesting as they refer to the massive demand for information from the public, and from media as well, and to the necessity of adapting public health communication as more is learned about the outbreak and scientific knowledge about Zika virus and its complications evolves.

Public polls conducted in the United States revealed a lack of knowledge about ZIKV in the general population and more specifically in groups at risk, such as pregnant women [17]. The novelty of the disease and the lack of previous knowledge of it in the affected areas make the 2016 ZIKV epidemic an ideal case study to characterize collective attention patterns, identify their drivers and test traditional modeling assumptions. Intuitively, mass media coverage represents the main driver of public attention during an epidemic. Indeed, several peculiarities of media narratives around public health hazards [18] and infectious diseases [19] have been elucidated, but a general and quantitative understanding of how public opinion responds to media exposure during an emerging epidemic threat is still lacking. The majority of modeling studies assume that media exposure drives behavioural changes, hence media exposure effects are incorporated into some kind of media function that modulates individual behaviors and may affect disease dynamics [20-22]. The general assumption is that as the number of cases increases and is reported by mass media, the susceptibility of individuals will decrease due to increasing awareness and the associated behavioral changes [21]. However, for most disease outbreaks, such modeling assumptions have never been supported by direct empirical evidence.

Our study analyses time-resolved and geo-localized Wikipedia pageview counts to investigate the dynamics of public attention in the United States, during the 2016 ZIKV epidemic. Accesses to Wikipedia pages represent a signal of information seeking behavior, defined as the deliberate process in which individuals actively aim to acquire new knowledge by searching for information on a specific topic [23]. In our study, we considered such indicator to be a proxy for measuring the collective attention to the outbreak. Specifically, we considered the daily pageview counts on 128 different Zika-related Wikipedia articles (96 languages) in the U.S. to be an unambiguous indicator of collective attention to the epidemic. We investigated the temporal and spatial patterns of pageviews in relation to the timeline of ZIKV incidence reported by the US Centers for Disease Control and Prevention (CDC), and in relation to the coverage of the ZIKV epidemic by local and national media sources. In particular, we focused on news coverage of the ZIKV epidemic by online media and television in 2016, available in digital format through the GDELT project and the Internet Archive (see Methods for a full description of the data under study).

Results

Temporal profile of collective attention and news coverage

Public attention and media coverage of the 2016 ZIKV epidemic showed a distinct and synchronous temporal pattern, as seen in Fig 1. The daily timeline of Wikipedia pageviews (Fig 1A) highlights two distinct peaks of attention in 2016: the first, at the beginning of February 2016, corresponding to the international alert raised by the WHO and, at national level, by the CDC; the second, in August 2016, corresponding to the Summer Olympics in Rio de Janeiro, which attracted significant attention due to the health concerns for athletes and the related risk of case importation. These spikes of attention visibly correspond to similar spikes in the media coverage profiles, both in the TV coverage of the epidemic (Fig 1B) and in the Web news coverage of Zika (Fig 1C). The three time series are indeed highly correlated, with a Pearson’s correlation coefficient r = 0.74 (p < 10^-4) for the Wikipedia pageviews and the Web news time series, and r = 0.80 (p < 10^-4) for the Wikipedia pageviews and the TV coverage (see S2 Table in the Supporting Information).
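As an illustration of the correlation measure used here, the following minimal sketch computes Pearson's r between two daily time series. The function name `pearson_r` and the toy counts are hypothetical and are not taken from the study's data; the significance test (p-value) reported in the text is not reproduced.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT the study's data): daily counts over one week.
pageviews = [120, 950, 700, 400, 260, 180, 150]
tv_mentions = [10, 80, 65, 30, 22, 15, 12]
r = pearson_r(pageviews, tv_mentions)
```

A value of r close to 1 indicates that the two series rise and fall together. The function is undefined for a constant series, whose variance is zero.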

Fig 1. Attention, media coverage and disease incidence of the Zika virus in the USA in 2016.

Fig 1

(A) Daily Wikipedia pageview counts of Zika-related pages. (B) Daily mentions of the word “Zika” in TV programs broadcast in the U.S., extracted from the TV Internet Archive. (C) Daily number of Web news items mentioning “Zika”, extracted from the GDELT project. (D) Weekly incidence of the Zika virus reported by the CDC. Originally reported case counts were smoothed with a biweekly rolling average.

By contrast, while the profile of Wikipedia pageviews shows a temporal pattern that is very similar to the one displayed by mentions of Zika in media outlets, the temporal profile of the disease incidence is qualitatively very different. The number of new ZIKV cases in the United States reported by the CDC every week (Fig 1D) gradually increased from the beginning of 2016 until the summer, with a peak between the end of August and the beginning of September. Notifications of new cases declined afterwards. Summer 2016 was also characterized by the first reports of local ZIKV transmission in Florida and in Texas, events that were also responsible for increased news coverage of Zika. However, the surge of reported ZIKV cases in the United States did not result in an increased level of attention with respect to the initial spike observed at the beginning of the outbreak. The epidemic profile is indeed not correlated with the timeline of Wikipedia pageviews (r = −0.15, p = 0.26) or the media coverage profiles (r = 0.04, p = 0.80 for TV and r = 0.10, p = 0.47 for Web news). These dynamics of public attention can be ascribed to the initial novelty of the outbreak, which was presented as a novel and serious health threat even in the presence of a small number of imported cases. As the extent of the outbreak and the associated risks became clearer to the public, the interest of Americans in looking for additional information on Wikipedia faded over the course of the year, with relatively smaller increments linked to important events such as the Olympics. This observation is consistent with the presence of a memory effect in the dynamics of Wikipedia pageviews: individuals retain information for some time before their attention toward a topic is elicited again by novel events or anniversaries [24, 25].

Spatial patterns of collective attention

The available spatial granularity of Wikipedia pageview data allowed us to further inspect how the above picture changes when moving from a national perspective to States and U.S. cities. Notably, the temporal dynamics of attention to the Zika-related Wikipedia pages in 2016 were highly synchronized across all 50 States. Although the relative risk of case importation and local transmission varied significantly from state to state, with the Southern States more at risk due to the vector’s presence and abundance [26], the Wikipedia pageview timelines were all highly correlated, as shown in Fig 2. The Pearson correlation coefficients in the cross-correlation matrix of Wikipedia pageview time series by State range from r = 0.77 (p < 10^-4) for Delaware and Montana, to r = 0.99 (p < 10^-4) for New York and New Jersey. Overall, the correlation of the Wikipedia pageviews in each state with the national timeline was always higher than r = 0.88 (p < 10^-4), indicating a high degree of spatial uniformity across the country. Given the above-mentioned correlation of Wikipedia pageviews with the TV coverage of the epidemic and the mentions of Zika on the Web, the attention patterns at State level were also highly correlated with the national media coverage, suggesting a fundamental role of news exposure as a driver of public attention at all geographic scales.

Fig 2. Correlation of public attention timelines by state.

Fig 2

Pearson correlation matrix of the daily Wikipedia pageviews time series of the 50 states and the District of Columbia.

One could argue that local patterns of attention may be influenced by local news and local epidemic events, such as case importations or a local increase of disease prevalence. We tested these hypotheses by comparing Wikipedia pageview counts in each state to Web news mentioning the word “Zika” and the name of the state, and to the local ZIKV incidence profiles. Attention profiles in each state were generally positively correlated with Web news mentioning the name of the state; however, the degree of correlation ranged from r = 0.004 (p = 0.98) in Wisconsin to r = 0.75 (p < 10^-4) in Texas, showing significant spatial differences across the country (see S3 Table in the Supporting Information). Interestingly, ZIKV incidence in each state could explain such geographic variations as a negative driver of attention. On the one hand, local patterns of attention in each state were generally not correlated with disease incidence, with the exception of Montana (r = 0.33, p = 0.02), as shown in S4 Table of the Supporting Information. On the other hand, Web news covering Zika in each state was positively correlated with the local incidence profiles (r > 0.20) in only 13 states out of 50 (see S5 Table in the Supporting Information) and, at the same time, these states showed the smallest degree of correlation between news and attention. A direct comparison of the 50 states ranked by degree of correlation between news and ZIKV incidence, and between news and attention, showed a negative rank correlation: weighted Kendall’s τ = −0.25 [27]. Overall, in those states where local news followed the local epidemic patterns more closely, the dynamics of public attention were not driven much by news. Instead, local attention patterns followed the state news more closely where the latter was more similar to the national news and less correlated with the local ZIKV epidemiology.
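The rank comparison described above can be sketched as follows. Note that this sketch swaps in the plain (unweighted) Kendall τ-a rather than the weighted Kendall τ used in the study, so it illustrates the idea of a rank correlation without reproducing the reported value. The function name and the five toy values are hypothetical.

```python
def kendall_tau_a(x, y):
    """Unweighted Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both variables order it the same way.
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-state correlations (NOT the study's values):
# news vs incidence, and news vs attention, for five states.
news_vs_incidence = [0.05, 0.10, 0.25, 0.40, 0.60]
news_vs_attention = [0.70, 0.75, 0.50, 0.45, 0.20]
tau = kendall_tau_a(news_vs_incidence, news_vs_attention)
```

A negative τ means that states ranking high on one correlation tend to rank low on the other, which is the pattern described in the text.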

It is natural to ask whether correlations between patterns of attention and disease risk may change when looking at different spatial resolutions. To answer this question, we examined the attention to ZIKV in 788 cities of the United States with a population larger than 40,000 and compared it to their total Wikipedia viewership. By ranking the U.S. cities based on their total volume of Wikipedia pageviews in 2016, and comparing this ranking with the one based on pageviews of Zika-related articles only, we identified locations where the attention to ZIKV was higher than expected. As shown in Fig 3A, cities on the East Coast of Florida showed the highest relative attention to ZIKV, when compared to their overall Wikipedia activity. Other relevant outliers with high attention were cities in Texas and in the Northeast. By contrast, the lowest attention to ZIKV was observed in cities in California and in the Midwest (Fig 3B). These results suggest that increases in public attention at city level may be explained by risk perception due to the presence of the vector (as in Florida and Texas). However, the high level of attention in other places, such as Union City, NJ, cannot be easily explained by epidemiological risk factors and it may be due to specific events, such as one or more case importations, that do not appear in our dataset.
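The ranking comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the city names and counts are hypothetical placeholders, and the helper names `rank_positions` and `rank_shift` are not from the study.

```python
def rank_positions(totals):
    """Map each city to its rank (1 = largest total)."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {city: position for position, city in enumerate(ordered, start=1)}


def rank_shift(all_views, zika_views):
    """Difference between overall rank and Zika-specific rank for each city.

    A positive shift means the city ranks higher for Zika-related pages
    than for Wikipedia as a whole (more attention than expected).
    """
    overall = rank_positions(all_views)
    zika = rank_positions(zika_views)
    return {city: overall[city] - zika[city] for city in overall}


# Hypothetical yearly pageview counts (NOT the study's data).
all_views = {"City A": 9_000_000, "City B": 5_000_000,
             "City C": 2_000_000, "City D": 1_000_000}
zika_views = {"City A": 4_000, "City B": 1_500,
              "City C": 5_200, "City D": 900}
shifts = rank_shift(all_views, zika_views)
most_attentive = max(shifts, key=shifts.get)
```

In this toy example, "City C" is third by overall viewership but first by Zika-related viewership, so it stands out as a location with higher attention than expected.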

Fig 3. Spatial patterns of attention.

Fig 3

Cities of the United States, with population higher than 40,000, where the volume of attention to ZIKV-related pages was higher (panel A) or lower (panel B) than expected based on the total volume of pageviews to Wikipedia in 2016. The maps only show the 50 cities with the largest positive (panel A) or negative (panel B) difference in their pageview rankings, based on ZIKV-related pages and the full Wikipedia. The labels on the maps highlight the 10 cities with the highest (panel A) or lowest (panel B) attention.

Time series analysis

The correlation analysis based on the Pearson’s coefficient could be influenced by autocorrelations in both the dependent and independent variables under consideration. To better assess the mutual relationships between news coverage, Wikipedia pageviews and Zika incidence, we turn to a Vector Autoregression (VAR) model, a technique that is commonly used in the analysis of multivariate time series [28]. In a VAR(L) model, each variable can have an influence on all other variables with a maximum time lag equal to L (see Materials and methods).
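For reference, a VAR(L) model is commonly written in the following standard form. The notation here is an assumption, since the Materials and methods section is not reproduced in this excerpt: y_t collects the K variables observed at time t (for example Web news, TV mentions and Wikipedia pageviews), c is a vector of intercepts, each A_ℓ is a K × K coefficient matrix for lag ℓ, and ε_t is a white-noise error term.

```latex
\mathbf{y}_t = \mathbf{c} + \sum_{\ell=1}^{L} A_\ell \, \mathbf{y}_{t-\ell} + \boldsymbol{\varepsilon}_t
```

In this form, variable j is said to Granger-cause variable i when the (i, j) entries of the matrices A_1, ..., A_L are not all zero, that is, when past values of j help predict i beyond what the past of the other variables already explains.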

First, we build a VAR model using daily time series of Web news, TV captions and Wikipedia pageviews, through the vars package in R [29]. By computing the Schwarz-Bayes information criterion (BIC) and the Hannan-Quinn criterion for different VAR(L) models, with L ranging between 1 and 40, we identify the optimal time lag to be L = 8 days. A Granger causality test based on the best VAR model shows that each variable does Granger-cause the other two variables in the model, with a smaller effect of Web news (p < 0.05) with respect to TV (p < 10^-4) and Wikipedia pageviews (p < 10^-4). On the other hand, a Wald-type test supports a relationship of instantaneous causality among all three variables: Web news instantaneously cause Wikipedia pageviews and TV mentions (χ^2 = 86.321, df = 2, p < 10^-6); TV mentions instantaneously cause Wikipedia pageviews and Web news (χ^2 = 97.335, df = 2, p < 10^-6); Wikipedia pageviews instantaneously cause Web news and TV mentions (χ^2 = 89.23, df = 2, p < 10^-6).

To include the ZIKV incidence timeline in our analysis, we build another VAR model with 4 weekly time series: Zika-related pageviews, TV mentions, Web news and ZIKV incidence. For this model we identify the optimal time lag to be L = 2 weeks, based on the Schwarz-Bayes information criterion. A Granger causality test using the VAR(2) model identifies a causal relationship between Wikipedia pageviews and the other variables (p < 10^-3), and a causal relationship between Web news and the other variables (p < 0.05). Again, based on a Wald test, an instantaneous causal relationship between TV captions, Web news and Wikipedia pageviews and the other variables is supported (p < 10^-3). ZIKV incidence, instead, does not Granger-cause the other time series, nor is there an instantaneous causal relationship between the epidemic profile and the other variables (p = 0.27).

While a preferential causal direction between media coverage and Wikipedia pageviews does not emerge from the Granger causality analysis, our results support the idea that the media signal and the pageview timeline are highly synchronized, so that the hypothesis of instantaneous causality is supported, both at daily and weekly scale. By contrast, the epidemiological ZIKV curve shows no predictive power, either in a Granger-causal framework or as an instantaneous driver of public attention.

An equal-time predictive model of collective attention

Prompted by the results of the time series analysis, we consider the following task: now-casting the number of Zika-related Wikipedia pageviews in each state based on the volume of nationwide media coverage.

We begin by building an equal-time regression model that predicts the weekly number of Zika-related Wikipedia pageviews for each state, rescaled by state population, based exclusively on the frequencies of Zika-related mentions in Web news and TV closed captions. That is, we assume that information seeking behavior in Wikipedia is driven, at any given point in time, by same-week exposure to media sources. Since our goal is uncovering drivers of collective attention, rather than achieving optimal prediction of the empirical time series, we choose an equal-time modeling approach over standard time series modeling techniques (e.g., autoregressive models). More specifically, we start with a linear regression model that predicts population-rescaled pageview counts for a given week and a given state using only national Web news and TV data for the same week. We focus on 43 states with population in excess of 1 million, comprising more than 98% of the U.S. population according to 2016 United States Census Bureau estimates [30].

These states are also those where the epidemic ZIKV activity in 2016 was the highest. In S7 Table, we report results including all states with the only exception of Alaska, where no ZIKV cases were reported in 2016. We train the model via state-wise cross-validation and evaluate its performance using the determination coefficient R^2, the Pearson’s correlation coefficient r and the Spearman’s ρ. Despite its simplicity, this equal-time linear regression demonstrates that both media signals, taken independently, are already quite informative of the Zika population-rescaled pageview time series: using exclusively TV closed captions we obtain R^2 = 0.61 and r = 0.80, while using only Web news we obtain R^2 = 0.52 and r = 0.78. Combining both features, the linear model achieves R^2 = 0.63 and r = 0.82. Similar results are obtained when considering the Spearman’s ρ as a measure of performance (see Table 1). As model performance is evaluated via state-wise cross-validation, these results highlight that national-level media signals are highly informative of state-level pageview time series, once they are rescaled to take into account population size. As a reference, we compare the results obtained with media features with an equal-time linear regression informed only by the epidemiological signal, that is the ZIKV incidence in each state. As shown in Table 1, first row, the predictive performance of ZIKV incidence is generally poor with R^2 = −0.398, r = −0.032 and ρ = 0.152, in line with the time series analysis presented above.
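The equal-time regression with state-wise cross-validation can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a single national feature, ordinary least squares, and leave-one-state-out splits instead of the 10-fold scheme reported in Table 1. All names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """Determination coefficient; negative if worse than predicting the mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def leave_one_state_out(national_signal, state_series):
    """Train on all states but one, then score the held-out state."""
    scores = {}
    for held_out in state_series:
        train_x, train_y = [], []
        for state, series in state_series.items():
            if state != held_out:
                # Same-week national signal paired with state pageviews.
                train_x.extend(national_signal)
                train_y.extend(series)
        intercept, slope = fit_line(train_x, train_y)
        predicted = [intercept + slope * x for x in national_signal]
        scores[held_out] = r_squared(state_series[held_out], predicted)
    return scores


# Hypothetical weekly data (NOT the study's data): national TV mentions and
# population-rescaled Zika pageviews for three made-up states.
tv = [5.0, 40.0, 25.0, 12.0, 8.0, 20.0]
states = {
    "State A": [1.1, 8.2, 5.0, 2.6, 1.5, 4.1],
    "State B": [0.9, 7.9, 5.3, 2.3, 1.8, 3.9],
    "State C": [1.0, 8.0, 4.8, 2.5, 1.7, 4.2],
}
scores = leave_one_state_out(tv, states)
```

Because R^2 is computed on held-out data, it can be negative whenever the predictions are worse than the held-out state's own mean, which is how a value such as R^2 = −0.398 can arise for a poorly informative feature.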

Table 1. Comparison of model performance with 10 different feature combinations.

For each feature combination, the average R^2, Pearson r, Spearman ρ and AIC, computed over 43 states, are reported. Average values of R^2, r and ρ are computed under k-fold cross-validation (k = 10). The standard deviation is reported in parentheses. AIC is computed for the best set of parameters for each model. The last column reports Δ_i = AIC_i − AIC_min for model comparison.

Features | R^2 | Pearson r | Spearman ρ | AIC | Δ_i
ZIKV | -0.398 (0.032) | -0.032 (0.030) | 0.1515 (0.055) | -917.21 | 131.35
TV | 0.6091 (0.040) | 0.8011 (0.021) | 0.7833 (0.030) | -1001.90 | 46.67
Web | 0.5244 (0.046) | 0.7780 (0.026) | 0.7886 (0.030) | -986.01 | 62.54
TV, Web | 0.6266 (0.044) | 0.8209 (0.023) | 0.8105 (0.030) | -1003.92 | 44.65
TV, m(TV) | 0.7062 (0.047) | 0.8675 (0.025) | 0.7677 (0.029) | -1025.90 | 22.66
Web, m(Web) | 0.6352 (0.051) | 0.8178 (0.027) | 0.7419 (0.029) | -1004.53 | 44.03
TV, Web, m(TV) | 0.7318 (0.052) | 0.8761 (0.026) | 0.7963 (0.031) | -1033.91 | 14.64
TV, Web, m(Web) | 0.7603 (0.050) | 0.8942 (0.026) | 0.7546 (0.026) | -1045.90 | 2.66
TV, Web, m(TV), m(Web) | 0.7602 (0.050) | 0.8942 (0.026) | 0.7860 (0.028) | -1044.64 | 3.92
TV, Web, m(TV), m(Web), state_news | 0.7638 (0.045) | 0.8937 (0.024) | 0.7868 (0.027) | -1048.56 | 0.0

To take into account the possibility of memory effects in the response to media exposure, we enrich the feature space of the regression model with additional features (time series) obtained by filtering the Web news and TV time series with an exponential memory kernel (see Methods for a complete description). The characteristic time τ of the memory kernel, describing news persistence in the attention response, is a new hyper-parameter of the model to be set via cross-validation. Table 1 summarizes the performance of the model, in terms of determination coefficient R^2, Pearson’s r, and Spearman’s ρ, for 10 different sets of features. The introduction of a memory kernel increases the determination coefficient by about 20%, reaching an average R^2 = 0.76. This is obtained for a characteristic time scale τ of about 2 weeks, over which collective attention is affected by past media exposure.
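One simple way to build such a memory feature is sketched below. The exact kernel is defined in the Methods, which are not reproduced here, so the form used in this sketch is an assumption: an unnormalised causal filter m(x)_t = Σ_{s ≤ t} exp(−(t − s)/τ) x_s, with time measured in weeks. The function name and the toy series are hypothetical.

```python
from math import exp


def exponential_memory(series, tau):
    """Filter a series with a causal exponential kernel of time scale tau.

    Assumed form: m_t = sum over s <= t of exp(-(t - s) / tau) * x_s,
    computed recursively as m_t = x_t + exp(-1 / tau) * m_{t-1}.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    decay = exp(-1.0 / tau)
    filtered = []
    memory = 0.0
    for value in series:
        # Past exposure fades by a constant factor at each time step.
        memory = value + decay * memory
        filtered.append(memory)
    return filtered


# Hypothetical weekly TV mentions (NOT the study's data): a single spike.
weekly_tv = [0.0, 100.0, 0.0, 0.0, 0.0]
m_tv = exponential_memory(weekly_tv, tau=2.0)
```

With τ = 2 weeks, a single burst of coverage keeps contributing to the feature for several weeks, decaying by a factor of exp(−1/2) ≈ 0.61 per week. In a model like the one described above, τ would be chosen by cross-validation.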

We also considered state-level news features obtained by counting the weekly number of mentions of each state in Web news. However, adding these features does not significantly improve the model predictions (Table 1, bottom row), although it yields the best performance according to the Akaike Information Criterion (AIC). Overall, by computing the AIC for each model and averaging over all states, the three best-ranked linear models, which combine TV and Web news with memory features and, in one case, state news, can be considered equally likely, taking Δ_i = AIC_i − AIC_min < 4 as the threshold for comparable support.
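The model comparison can be reproduced directly from the AIC column of Table 1, as in the short sketch below. The AIC values are copied from the table; the helper name `aic_deltas` is hypothetical.

```python
# AIC values copied from Table 1 (averaged over 43 states).
aic = {
    "ZIKV": -917.21,
    "TV": -1001.90,
    "Web": -986.01,
    "TV, Web": -1003.92,
    "TV, m(TV)": -1025.90,
    "Web, m(Web)": -1004.53,
    "TV, Web, m(TV)": -1033.91,
    "TV, Web, m(Web)": -1045.90,
    "TV, Web, m(TV), m(Web)": -1044.64,
    "TV, Web, m(TV), m(Web), state_news": -1048.56,
}


def aic_deltas(aic_by_model):
    """Delta_i = AIC_i - AIC_min for every model."""
    best = min(aic_by_model.values())
    return {model: value - best for model, value in aic_by_model.items()}


deltas = aic_deltas(aic)
# Models within the threshold used in the text (Delta_i < 4).
comparable = [model for model, delta in deltas.items() if delta < 4]
```

Recomputing the differences from the rounded AIC values can differ from the tabulated Δ_i in the last digit, since the published values were presumably computed before rounding.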

Discussion

Our study demonstrates that the temporal dynamics of Wikipedia pageviews in the United States during the 2016 ZIKV epidemic were highly predictable, even at state level, based on the volume of national and international news sources mentioning Zika and the United States. Collective attention to the ZIKV outbreak thus seems to have been mainly driven by news exposure and much less by the disease transmission dynamics, although the epidemic profile of ZIKV infections varied significantly from state to state and the risk of local transmission was not uniform across the country. This picture describes a scenario where the awareness of the epidemic in the country is globally present, while local effects, such as those due to the local spreading of awareness, play a less important role, following the terminology first introduced by Funk et al. [31].

Media outlets in the U.S. have a prominent role in defining the on-line public discourse [32]. The impact of media exposure on the collective awareness and risk perception during epidemic outbreaks has been investigated in previous works [18, 19, 33]; however, only a few studies have attempted to quantitatively measure the effect of media engagement on epidemic awareness using empirical data from Web sources on a large scale [22, 34, 35]. While previous studies have focused on newspaper coverage of epidemics [36], we investigated the relationship between the exposure to TV coverage and online news, and the attention to Wikipedia pages. Our study confirms the high sensitivity of Wikipedia searches to breaking news and official announcements, in particular in the case of disaster events, as found by previous studies [37, 38]. On the other hand, the temporal dynamics of Wikipedia pageviews during the 2016 ZIKV epidemic showed a nonlinear dependence on media coverage: the Wikipedia page activity was high in the initial phases of the outbreak, but it declined more quickly than media coverage. This can be explained by the fact that information on Wikipedia is rather static: users view Wikipedia pages immediately after the news breaks but do not return in the following days, unless more recent events renew their attention [37].

Although it is plausible to assume that collective attention follows media coverage, a Granger-causality analysis did not reveal a preferential causal relationship between pageviews and news sources. On the other hand, our data support an instantaneous causal relationship between Wikipedia readership and media coverage, suggesting that nowcasting Wikipedia pageviews based on the volume of media coverage is feasible.

During the 2016 ZIKV outbreak, different aspects of the epidemic, and the risks associated with the disease, received different levels of coverage by traditional media sources, depending on their level of newsworthiness. In general, media coverage was influenced by several factors, and only to a lesser extent by the progression of the epidemic. Although, from a journalist’s perspective, it may be expected that news did not necessarily follow the number of ZIKV cases, this result highlights the limitations of several behavioral epidemic models that incorporate media effects. Indeed, choices made by journalists regarding the newsworthiness of specific aspects related to ZIKV infection affected the level of knowledge and familiarity in the US population [39]. Also, the media narrative around ZIKV evolved over time, as found in a recent study by Yotam Ophir [40]. The content of the ZIKV news coverage shifted from focusing on scientific themes, at the beginning of the outbreak, to the description of social disruptions during the Summer Olympics. Since we did not examine the textual content of news items in our dataset, further research should identify specific themes emerging from media outlets that are most significantly associated with Wikipedia viewership during epidemic outbreaks. Also, we relied on the automatic tagging system provided by GDELT to identify news items mentioning Zika. However, the classification algorithm may have limitations and some items could be less relevant than others. Further research should assess the quality of the GDELT news tagging algorithm in the specific case of epidemic outbreaks [41]. Another limitation of our analysis is that we considered the content of mass media outlets only, and did not look into other news sources, such as online social networks. However, social media often provide an amplification channel for traditional media sources and online users consume information that is not very different from what appears on mass media [42].

From an epidemiological standpoint, our results are consistent with the recent findings of Bragazzi et al. [43], who analysed various data streams to measure the global reaction to the 2015-2016 ZIKV outbreaks in different countries. Similarly, we did not find any statistically significant correlation between the viewership of Wikipedia pages and the ZIKV incidence data in the U.S. The correlation between ZIKV incidence and media coverage was also mild, and varied from state to state, confirming that media coverage was only relatively influenced by the actual progression of the epidemic over time.

The spatial granularity of Wikipedia pageview data would in principle allow for a more detailed geo-spatial analysis of time series data, in particular regarding languages and behavioral response by geographic area. On the other hand, increasing the spatial resolution of the pageview data analysis could expose sensitive information about Wikipedia users’ locations and preferences. In our study, we decided to limit the risks associated with such an analysis by looking only at cumulative pageview data in US cities.

Our results have implications from a public health perspective. The importance of mass media coverage in eliciting public attention to announcements by the CDC at the beginning of the Zika outbreak was highlighted by a recent study by Southwell et al. [44]. Consistently, we found that Wikipedia page activity was highest in conjunction with the alert raised by the CDC, suggesting that media coverage of official communications by public health authorities could effectively capture public attention and elicit information seeking.

One might argue that our results may not generalize to all epidemic outbreaks. Indeed, the peculiar characteristics of the ZIKV infection, such as its association with mild symptoms and the relatively small size of the population at risk, due to the spatial distribution of the vector, may have influenced the attention dynamics during the outbreak.

As originally defined by Sandman [45], individual perception of risk is the result of a combination of the actual hazard and the emotional response in terms of concern, fear or anger. Sandman’s theory is often exemplified by the equation Risk = Hazard + Outrage, and it was one of the first attempts at ascribing to the public a role in risk communication, since in his view it is the interplay between external threats and personal aspects that shapes the meaning of risk. Individual perception of risk during the 2016 ZIKV outbreak probably decreased quickly, after it became clear that the infection did not pose an immediate threat to most individuals.

Moreover, survey-based studies have found that the perceived risk of infection for oneself decreased considerably over the course of an outbreak of chikungunya, another vector-borne disease similar to ZIKV [46].

From the point of view of media coverage, the fact that most ZIKV cases were asymptomatic may have reduced newsworthiness, as media tend to pay more attention to visually compelling topics [39].

Epidemic outbreaks caused by different pathogens, possibly characterized by a higher transmissibility and more evident clinical symptoms, such as the Ebola virus or pandemic influenza, may lead to different attention patterns by media outlets, raising substantially more concern, and more persistently, in the population. However, it is reasonable to believe that media coverage would be, in any case, the main driver of collective attention, as it was during the 2014 West African Ebola virus epidemic [34, 47].

The increasing availability of novel data streams, such as social media, Web search queries and participatory surveillance data, provides an invaluable resource to measure and quantify the complex interplay between the spread of information, collective attention and the epidemiology of infectious diseases [48, 49]. Recently, Wikipedia pageview data have been increasingly used by researchers in epidemiology and infectious disease modeling [50, 51]. The overall value of Wikipedia data to measure and forecast the dynamics of infectious diseases has been debated [52] and, in general, Wikipedia-based forecasting models have proved successful in the case of endemic or seasonal diseases, such as influenza, dengue or tuberculosis [51]. On the other hand, our study demonstrates that Wikipedia page viewership can provide a temporally resolved measure of collective attention during epidemic outbreaks caused by novel emerging diseases, at a high spatial granularity. Previous works have investigated the effects of external events on the activity of Wikipedia editors and on the number of pageviews [53, 54]. More generally, several studies have characterized the usage of Wikipedia as a source of information and as a proxy for measuring global attention to real-world events [24, 38, 55, 56]. The results of our study add further evidence of the value of Wikipedia data in the field of digital epidemiology, especially for capturing information seeking behavior and attention patterns during disease outbreaks [57].

We showed that Wikipedia data can capture collective attention during outbreaks. However, we did not link this signal with a measure of individual behavioral response or with the adoption of health protecting behaviors in the population. Detecting health-related behavioral changes from Web sources remains a challenging task. Previous studies have used TV viewing data to infer the behavioral response during the 2009 A/H1N1 pandemic in Mexico [58]. More recently, Poletto et al. [35] showed that increased collective attention was correlated with changes in the hospital management of MERS-CoV patients, reducing the time from admission to isolation. Further research is needed to infer causal patterns between collective attention and behavioral responses, and to identify the most suitable approach to integrate them into disease-behavior models.

Materials and methods

Data sources

Wikipedia pageview counts

We collected hourly pageview data of the English Wikipedia pages “Zika virus” (https://en.wikipedia.org/wiki/Zika_virus) and “Zika fever” (https://en.wikipedia.org/wiki/Zika_fever) and their counterparts in 96 different Wikipedia projects. The complete list of the 128 monitored Wikipedia pages is provided in S1 Table of the Supporting Information.

The “Zika virus” and “Zika fever” pages are the only two pages in the English Wikipedia that provide information on the disease (“Zika fever”; note that “Zika” redirects to “Zika fever”) and on the pathogen causing the disease (“Zika virus”). They were, by far, the most accessed pages among all Zika-related articles in the English Wikipedia: in 2016, the “Zika virus” page totalled almost 8 million views worldwide and the “Zika fever” page about 800,000.

While aggregate hourly and daily pageview data for Wikipedia articles by language are released by the Wikimedia Foundation in the form of data dumps (https://dumps.wikimedia.org/other/pageviews/readme.html) and APIs (https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews), the geographic breakdown of these data is not made publicly available for privacy reasons. The Wikimedia Foundation discards raw traffic data after a short retention window, but it collects and retains aggregate historical pageview counts with a geographic breakdown, dating back to 2015 (https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly). The pageview data with geographical aggregations used in this study provide the total view counts and the following information for each page: hour, day, year, city, subdivision, country. Geo-location is based on standard industry methods, which provide 90% accuracy at the state level in the US and 86% accuracy for US cities within a 50 km radius (https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Geolocation).

Access to these nonpublic pageview data was granted by the Wikimedia Foundation under a non-disclosure agreement as part of its formal collaboration policy. For the analysis conducted in this study, we first selected the pageview counts that were localized in the United States only, from January 1, 2016 until December 31, 2016. We then aggregated all pageview counts for the 128 monitored Wikipedia pages at daily and weekly timescales.
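The selection and aggregation steps described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the record layout, the example values and the use of ISO calendar weeks are all assumptions made here.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (year, month, day, hour, country, page, views).
records = [
    (2016, 2, 1, 9, "United States", "Zika_virus", 120),
    (2016, 2, 1, 17, "United States", "Zika_fever", 30),
    (2016, 2, 3, 11, "United States", "Zika_virus", 80),
    (2016, 2, 3, 12, "Brazil", "Zika_virus", 500),  # dropped: not in the US
    (2016, 2, 9, 8, "United States", "Zika_virus", 45),
]

def aggregate(records, country="United States", year=2016):
    """Sum hourly counts over all pages into daily and weekly totals."""
    daily, weekly = Counter(), Counter()
    for y, m, d, _hour, ctry, _page, views in records:
        if ctry != country or y != year:
            continue
        day = date(y, m, d)
        daily[day] += views
        # Assumption: weeks are ISO calendar weeks (Monday to Sunday).
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += views
    return daily, weekly

daily, weekly = aggregate(records)
```

Note that, with ISO weeks, the first days of January 2016 fall in the last ISO week of 2015; a different week convention would shift the weekly bins accordingly.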

Web news

Data were downloaded from the Global Database of Events, Language and Tone (GDELT, http://www.gdeltproject.org), available on the Google Cloud Platform. GDELT is created from the real-time translation of worldwide news in 65 languages and is updated every 15 minutes. Whenever GDELT detects a news report breaking anywhere in the world, the report is translated and processed to identify all events, counts, quotes, people, organisations, locations, themes, emotions, relevant imagery, video, and embedded social media posts. All the information is made available through an API.

In our study, we collected all news items published online in 2016 that mentioned the words “Zika” and “United States”, through the Google Cloud Platform. More specifically, we selected those items matching TAX_DISEASE_ZIKA as a Theme and United_States as a Location. Themes (V2Themes) and Locations (V2Locations) are automatically identified by the platform based on the textual content of the items, and each item can be assigned multiple themes or locations. The complete query is provided in the S1 Appendix of the Supporting Information. The dataset contains a total of 112,706 news items from 7,737 different Web news outlets, in any of the 65 languages covered by the GDELT platform. News outlets are not necessarily based in the United States; the only constraint is that each news item mentions the United States. The time series analysed in our study report the number of news items citing Zika that appeared each day in 2016. Multiple mentions of the word “Zika” in the same article are not counted separately; we therefore consider the volume of individual articles mentioning Zika. The metadata associated with each news item make it possible to select only news mentioning a specific geographic entity below the national level, such as States or counties. Each item can be associated with multiple States; therefore, in the State-level analysis the same item can be counted in the time series of different States at the same time.
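The counting rules described above (each article counted once per day at national level, but once for every State it mentions at State level) can be illustrated with a short sketch. The item identifiers, dates and State sets below are hypothetical and do not come from the GDELT dataset.

```python
from collections import Counter, defaultdict

# Hypothetical items: (item_id, ISO date, set of US States mentioned).
items = [
    ("a1", "2016-02-01", {"Florida"}),
    ("a2", "2016-02-01", {"Florida", "Texas"}),
    ("a3", "2016-02-02", set()),  # mentions the US but no specific State
]

national = Counter()             # articles per day, each counted once
by_state = defaultdict(Counter)  # articles per day, for each State

for _item_id, day, states in items:
    national[day] += 1
    for state in states:
        # The same article contributes to every State it mentions.
        by_state[state][day] += 1

state_total = sum(counts["2016-02-01"] for counts in by_state.values())
```

As a consequence of this rule, the State-level series for a given day can sum to more than the national count for that day, as `state_total` shows in this toy example.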

TV captions

Data were downloaded from the TV News Archive (https://archive.org/details/tv) which is a research library service launched in September 2012. The service is provided by the Internet Archive which, among other sources, collects and preserves television news. The TV News Archive repurposes closed captioning to enable users to search, quote and borrow U.S. TV news programs. For this study, we collected TV news items by searching all mentions of the word “Zika” in the closed captions of any TV News show aired in the United States in 2016, available from the Archive. For each item, the following information is provided: time, TV station, TV program, text snippet of the caption. In total, the dataset comprises 23,855 timestamped mentions of the word “Zika” from 1,410 different TV programs, both in English and Spanish, aired by 64 U.S. TV stations. We did not limit our query to those languages, since the TV News Archive includes broadcasting networks in other languages too. However, English and Spanish were the only languages resulting from our search. Time series analyzed in our study report the number of mentions of “Zika” that appeared each day in 2016, thus including multiple mentions of the word in the same program.

Zika case notification data

Incidence data for the Zika virus in the United States were collected from the weekly reports published by the CDC. The reports and the associated data were made publicly available by the CDC on GitHub (https://github.com/cdcepi/zika). The CDC epidemiological reports provide the cumulative number of Zika cases by State, starting from February 24, 2016. Additional case counts for January and February 2016 were extracted from official CDC media releases and included in the dataset.
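Since the reports provide cumulative counts, a weekly incidence series has to be obtained by differencing consecutive reports. The sketch below shows one way to do this; the example numbers are invented, and clipping negative differences at zero (which arise when cumulative counts are revised downward) is an assumption made here rather than a documented step of the study.

```python
def weekly_incidence(cumulative):
    """Difference consecutive cumulative counts to get new cases per report."""
    incidence = []
    previous = 0
    for total in cumulative:
        # Assumption: downward revisions are treated as zero new cases.
        incidence.append(max(total - previous, 0))
        previous = total
    return incidence

# Hypothetical cumulative case counts for one State, one value per weekly report.
cumulative_cases = [2, 5, 5, 9, 8, 14]
new_cases = weekly_incidence(cumulative_cases)
```

The first value of the resulting series equals the first cumulative count, so it aggregates all cases reported before the first report.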

Vector Autoregression models and Granger causality test

We built the Vector Autoregression (VAR) models using the R package vars [29]. The VAR model has the following form:

$$y_t = A_1 y_{t-1} + \cdots + A_L y_{t-L} + CD_t + TD_t + u_t \tag{1}$$

where $y_t$ is the vector of endogenous variables, which has dimension 3 × 1 for daily time series (Wikipedia pageviews, Web news, TV captions) and 4 × 1 for weekly time series (with the addition of ZIKV incidence), $A_1, \dots, A_L$ are the coefficient matrices, $L$ is the lag order and $u_t$ is a spherical disturbance term of the same dimension as $y_t$. The model also includes a constant and a trend regressor, represented by $CD_t$ and $TD_t$, respectively. The model is fit by OLS, equation by equation. We determine the optimal lag length $L$ by comparing the Schwarz-Bayes information criterion (BIC) and the Hannan-Quinn criterion for lags up to 40 days for daily time series and up to 8 weeks for weekly time series.
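For reference, the two criteria can be written in their textbook form, following the presentation of Lütkepohl [28]. This is a sketch: the symbols $K$ (the dimension of $y_t$, here 3 or 4), $T$ (the number of observations used in the fit) and $\tilde{\Sigma}_u(L)$ (the residual covariance matrix of the model with $L$ lags) are introduced here for illustration, and a specific software implementation may also count the deterministic terms in the penalty.

```latex
\mathrm{BIC}(L) = \ln\det\tilde{\Sigma}_u(L) + \frac{\ln T}{T}\, L K^2,
\qquad
\mathrm{HQ}(L) = \ln\det\tilde{\Sigma}_u(L) + \frac{2 \ln\ln T}{T}\, L K^2,
\qquad
\tilde{\Sigma}_u(L) = \frac{1}{T} \sum_{t=1}^{T} \hat{u}_t \hat{u}_t^{\top}
```

The selected lag order is the value of $L$ that minimises the chosen criterion. Since $\ln T > 2 \ln\ln T$, the BIC penalises additional lags more heavily than the Hannan-Quinn criterion and tends to select shorter lags.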

Using the R package vars, we perform two causality tests for both models, at the daily and weekly scale, testing the causal hypothesis for each variable in the model. The first is an F-type Granger-causality test. The second is a Wald-type test of instantaneous causality, which tests for nonzero correlation between the error processes of the cause and effect variables.
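The null hypotheses of the two tests can be stated compactly by partitioning the vector of endogenous variables into a candidate cause $y_{1t}$ and the remaining variables $y_{2t}$. The block notation $A_{jk,i}$ below is introduced here for illustration, following the standard textbook formulation [28], and is not taken from the original analysis.

```latex
\begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix}
= \sum_{i=1}^{L}
\begin{pmatrix} A_{11,i} & A_{12,i} \\ A_{21,i} & A_{22,i} \end{pmatrix}
\begin{pmatrix} y_{1,t-i} \\ y_{2,t-i} \end{pmatrix}
+ CD_t + TD_t +
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}

% Granger non-causality of y_1 for y_2:
H_0:\; A_{21,i} = 0 \quad \text{for } i = 1, \dots, L

% No instantaneous causality between y_1 and y_2:
H_0:\; \mathrm{E}\!\left(u_{1t}\, u_{2t}^{\top}\right) = 0
```

Rejecting the first hypothesis indicates that past values of $y_{1t}$ improve the prediction of $y_{2t}$; rejecting the second indicates that the two sets of variables are correlated within the same time step.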

Equal-time regression model

We model the weekly number of pageviews of Zika-related Wikipedia pages in each state with a linear regression of the form

$$\widehat{PV}_s(w) = \sum_{i}^{n} b_i X_i \tag{2}$$

where $\widehat{PV}_s(w)$ is the Wikipedia pageview count in state $s$ on week $w$, rescaled by the state population. The rescaling of pageview data takes the form:

$$\frac{\widehat{PV}_s(w)}{N_s^{\beta}} \tag{3}$$

where $N_s$ is the state population and $\beta = 1.1397$ is a scaling exponent independently estimated on the total volume of pageviews in each state by adopting the probabilistic framework of Leitão et al. [59] (details are reported in the S2 Appendix of the Supporting Information). By K-fold (k = 10) and leave-one-out cross validation, we tested the performance of the model considering different linear combinations of features $X_i$. Specifically, we considered as model features the weekly media timelines $Y(w)$, where $Y$ = TV, Web or $\mathrm{Web}_{state}$, and $\mathrm{Web}_{state}$ represents the selection of Web news mentioning only a specific state name together with the word “Zika”. We also considered as a reference the case $Y(w) = \mathrm{ZIKV}_s(w)$, where $\mathrm{ZIKV}_s(w)$ is the weekly number of reported ZIKV cases in state $s$. To take into account the saturation effect due to media exposure, we also considered an exponentially decaying function of the media timelines $Y$ ($Y$ = TV, Web):

$$m(Y) = \sum_{\Delta t = 1}^{\Delta t_{\max}} e^{-\Delta t/\tau}\, Y(w - \Delta t) \tag{4}$$

where $\tau$ is a free parameter, setting the memory time scale, and $\Delta t_{\max}$ is defined by the total length of the time series up to week $w$ ($\Delta t_{\max} = w$). Thus, the full model with all the 5 media features under consideration takes the following form:

$$\widehat{PV}_s(w) = a \cdot \mathrm{TV}(w) + b \cdot \mathrm{Web}(w) + c \cdot m(\mathrm{Web}) + d \cdot m(\mathrm{TV}) + e \cdot \mathrm{Web}_{state}(w). \tag{5}$$
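The memory term of Eq (4) and the right-hand side of Eq (5) can be computed as in the following sketch. It assumes that weeks are indexed from 0 and that the sum in Eq (4) runs over all previous weeks of the series; the function names, the example timelines and the coefficient values are hypothetical and are not the fitted estimates.

```python
import math

def memory(series, w, tau):
    """m(Y) at week w: sum over dt = 1..w of exp(-dt / tau) * Y(w - dt)."""
    return sum(math.exp(-dt / tau) * series[w - dt] for dt in range(1, w + 1))

def full_model(w, tv, web, web_state, coef, tau):
    """Right-hand side of Eq (5) for week w, with coef = (a, b, c, d, e)."""
    a, b, c, d, e = coef
    return (a * tv[w] + b * web[w]
            + c * memory(web, w, tau) + d * memory(tv, w, tau)
            + e * web_state[w])

# Hypothetical weekly timelines (three weeks) and coefficients.
tv = [10.0, 0.0, 0.0]
web = [4.0, 2.0, 1.0]
web_state = [1.0, 0.0, 3.0]

m_tv = memory(tv, w=2, tau=1.0)  # only week 0 contributes, weighted by exp(-2)
prediction = full_model(2, tv, web, web_state, coef=(1, 1, 0, 0, 1), tau=1.0)
```

A small value of `tau` makes the memory term depend almost only on the previous week, while a large value spreads the weight over the whole past of the series.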

A list of the best estimates for the model’s coefficients and the 10 feature combinations of Table 1 is reported in the S6 Table.
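The rescaling of Eq (3) and the cross-validation procedure can be illustrated for the simplest case of a single feature. This sketch makes several assumptions that are not stated in the text: the regression has no intercept (as Eq (2) is written), folds are contiguous blocks of weeks, and performance is measured by the mean squared error. The input values are invented.

```python
def rescale(pageviews, population, beta=1.1397):
    """Divide pageview counts by the state population raised to beta."""
    return [pv / population ** beta for pv in pageviews]

def fit_slope(x, y):
    """Least-squares slope b for the single-feature model y = b * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def kfold_mse(x, y, k=10):
    """Mean squared out-of-sample error over k contiguous folds."""
    n = len(x)
    errors = []
    for fold in range(k):
        test = set(range(fold * n // k, (fold + 1) * n // k))
        train = [i for i in range(n) if i not in test]
        b = fit_slope([x[i] for i in train], [y[i] for i in train])
        errors.extend((y[i] - b * x[i]) ** 2 for i in test)
    return sum(errors) / len(errors)

# Hypothetical weekly feature (e.g. TV mentions) and a perfectly linear response.
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi for xi in x]

mse_10fold = kfold_mse(x, y, k=10)
mse_loo = kfold_mse(x, y, k=len(x))  # leave-one-out is k-fold with k = n
```

With several features, `fit_slope` would be replaced by a multivariate least-squares fit, while the fold logic stays the same.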

Supporting information

S1 Appendix. GDELT query.

SQL code used to query the GDELT platform through the Google BigQuery API.

(PDF)

S2 Appendix. Wikipedia pageview scaling.

(PDF)

S1 Table. Wikipedia pages under study.

Full list of the 128 Wikipedia pages whose page view counts were monitored in the study. The field language refers to the language codes defined by ISO 639-1 and ISO 639-3.

(PDF)

S2 Table. Correlations between Wikipedia pageviews, the Web news mentioning Zika and TV closed captions in 2016.

The table reports the Pearson’s correlation coefficient r for the Wikipedia page view counts, the Web news mentioning Zika and the TV closed captions at national level. All values of r are statistically significant at $p < 10^{-4}$.

(PDF)

S3 Table. Correlations between Wikipedia pageviews and news mentioning Zika by state.

All states are ranked by Pearson’s r values, in descending order.

(PDF)

S4 Table. Correlations between Wikipedia pageviews and ZIKV incidence by state.

All states are ranked by Pearson’s r values, in descending order.

(PDF)

S5 Table. Correlations between news mentioning Zika and ZIKV incidence by state.

All states are ranked by Pearson’s r values, in descending order.

(PDF)

S6 Table. Parameters of the best fitted models for all selected features.

(PDF)

S7 Table. Comparison of model performance for 49 states and D.C.

(PDF)

Acknowledgments

We gratefully acknowledge the Wikimedia Foundation for supporting this work through their formal collaboration and open access policies. We thank Dario Taraborelli for his early support of this study.

Data Availability

Data are available from the Zenodo repository (https://doi.org/10.5281/zenodo.3603916).

Funding Statement

MT, AP, DP and CC acknowledge the support by the Lagrange Project of the ISI Foundation funded by the CRT Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Jacobs W, Amuta AO, Jeon KC. Health information seeking in the digital age: An analysis of health information seeking behavior among US adults. Cogent Social Sciences. 2017;3(1):1302785. doi:10.1080/23311886.2017.1302785
  • 2. Fox S, Duggan M. Health online 2013. Washington, DC: Pew Internet & American Life Project; 2013.
  • 3. Ferguson N. Capturing human behaviour. Nature. 2007;446(7137):733. doi:10.1038/446733a
  • 4. Poletti P, Caprile B, Ajelli M, Pugliese A, Merler S. Spontaneous behavioural changes in response to epidemics. Journal of theoretical biology. 2009;260(1):31–40. doi:10.1016/j.jtbi.2009.04.029
  • 5. Funk S, Salathé M, Jansen VAA. Modelling the influence of human behaviour on the spread of infectious diseases: a review. Journal of The Royal Society Interface. 2010;7(50):1247–1256. doi:10.1098/rsif.2010.0142
  • 6. Bauch CT, Galvani AP. Social factors in epidemiology. Science. 2013;342(6154):47–49. doi:10.1126/science.1244492
  • 7. Perra N, Balcan D, Gonçalves B, Vespignani A. Towards a characterization of behavior-disease models. PLoS One. 2011;6(8):e23084. doi:10.1371/journal.pone.0023084
  • 8. Funk S, Bansal S, Bauch CT, Eames KT, Edmunds WJ, Galvani AP, et al. Nine challenges in incorporating the dynamics of behaviour in infectious diseases models. Epidemics. 2015;10:21–25. doi:10.1016/j.epidem.2014.09.005
  • 9. Lessler J, Chaisson LH, Kucirka LM, Bi Q, Grantz K, Salje H, et al. Assessing the global threat from Zika virus. Science. 2016;353(6300):aaf8160. doi:10.1126/science.aaf8160
  • 10. Mlakar J, Korva M, Tul N, Popović M, Poljšak-Prijatelj M, Mraz J, et al. Zika virus associated with microcephaly. N Engl J Med. 2016;374(10):951–958. doi:10.1056/NEJMoa1600651
  • 11. World Health Organization. WHO Director-General summarizes the outcome of the Emergency Committee regarding clusters of microcephaly and Guillain-Barré syndrome; 2016. Available from: http://www.who.int/mediacentre/news/statements/2016/emergency-committee-zika-microcephaly/en/.
  • 12. World Health Organization. Fifth meeting of the Emergency Committee under the International Health Regulations (2005) regarding microcephaly, other neurological disorders and Zika virus; 2016. Available from: http://www.who.int/mediacentre/news/statements/2016/zika-fifth-ec/en/.
  • 13. Malone RW, Homan J, Callahan MV, Glasspool-Malone J, Damodaran L, Schneider ADB, et al. Zika virus: medical countermeasure development challenges. PLoS Negl Trop Dis. 2016;10(3):e0004530. doi:10.1371/journal.pntd.0004530
  • 14. Ferguson NM, Cucunubá ZM, Dorigatti I, Nedjati-Gilani GL, Donnelly CA, Basáñez MG, et al. Countering the Zika epidemic in Latin America. Science. 2016;353(6297):353–354. doi:10.1126/science.aag0219
  • 15. Zhang Q, Sun K, Chinazzi M, y Piontti AP, Dean NE, Rojas DP, et al. Spread of Zika virus in the Americas. Proceedings of the National Academy of Sciences. 2017;114(22):E4334–E4343. doi:10.1073/pnas.1620161114
  • 16. World Health Organization Regional Office for Europe. Zika virus and emerging mosquito-borne diseases: The European emergency risk communication challenge. A response guide; 2017. Available from: http://www.euro.who.int/en/health-topics/emergencies/zika-virus/technical-reports-and-guidelines-on-zika-virus/emergency-risk-communications/zika-virus-and-emerging-mosquito-borne-diseases-the-european-emergency-risk-communication-challenge.-a-response-guide-2017.
  • 17. Harvard T H Chan School of Public Health. Many U.S. families considering pregnancy don’t know Zika facts; 2016. Available from: https://www.hsph.harvard.edu/news/press-releases/zika-virus-awareness-pregnant-women/.
  • 18. Shih TJ, Wijaya R, Brossard D. Media coverage of public health epidemics: Linking framing and issue attention cycle toward an integrated theory of print news coverage of epidemics. Mass Communication & Society. 2008;11(2):141–160. doi:10.1080/15205430701668121
  • 19. Young ME, Norman GR, Humphreys KR. Medicine in the popular press: the influence of the media on perceptions of disease. PLoS One. 2008;3(10):e3552. doi:10.1371/journal.pone.0003552
  • 20. Collinson S, Heffernan JM. Modelling the effects of media during an influenza epidemic. BMC public health. 2014;14(1):376. doi:10.1186/1471-2458-14-376
  • 21. Collinson S, Khan K, Heffernan JM. The effects of media reports on disease spread and important public health measurements. PLoS One. 2015;10(11):e0141423. doi:10.1371/journal.pone.0141423
  • 22. Mitchell L, Ross JV. A data-driven model for influenza transmission incorporating media effects. Royal Society open science. 2016;3(10):160481.
  • 23. Lewis N. Information seeking and scanning. The international encyclopedia of media effects. 2017; p. 1–10. doi:10.1002/9781118783764.wbieme0156
  • 24. García-Gavilanes R, Mollgaard A, Tsvetkova M, Yasseri T. The memory remains: Understanding collective memory in the digital age. Science Advances. 2017;3(4):e1602368. doi:10.1126/sciadv.1602368
  • 25. Ferron M, Massa P. Beyond the encyclopedia: Collective memories in Wikipedia. Memory Studies. 2014;7(1):22–45. doi:10.1177/1750698013490590
  • 26. Shacham E, Nelson EJ, Hoft DF, Schootman M, Garza A. Potential High-Risk Areas for Zika Virus Transmission in the Contiguous United States. American journal of public health. 2017;107(5):724–731. doi:10.2105/AJPH.2017.303670
  • 27. Vigna S. A weighted correlation index for rankings with ties. In: Proceedings of the 24th international conference on World Wide Web. International World Wide Web Conferences Steering Committee; 2015. p. 1166–1176.
  • 28. Lütkepohl H. New introduction to multiple time series analysis. Springer Science & Business Media; 2005.
  • 29. Pfaff B. VAR, SVAR and SVEC Models: Implementation Within R Package vars. Journal of Statistical Software. 2008;27(4). doi:10.18637/jss.v027.i04
  • 30. United States Census Bureau. Annual Estimates of the Resident Population for the United States, Regions, States, and Puerto Rico: April 1, 2010 to July 1, 2016. Available from: https://www2.census.gov/programs-surveys/popest/tables/2010-2016/state/totals/nst-est2016-01.xlsx.
  • 31. Funk S, Gilad E, Watkins C, Jansen VAA. The spread of awareness and its impact on epidemic outbreaks. Proceedings of the National Academy of Sciences. 2009;106(16):6872–6877. doi:10.1073/pnas.0810762106
  • 32. King G, Schneer B, White A. How the news media activate public expression and influence national agendas. Science. 2017;358(6364):776–780. doi:10.1126/science.aao1100
  • 33. Young ME, King N, Harper S, Humphreys KR. The influence of popular media on perceptions of personal and population risk in possible disease outbreaks. Health, risk & society. 2013;15(1):103–114. doi:10.1080/13698575.2012.748884
  • 34. Towers S, Afzal S, Bernal G, Bliss N, Brown S, Espinoza B, et al. Mass media and the contagion of fear: the case of Ebola in America. PLoS One. 2015;10(6):e0129179. doi:10.1371/journal.pone.0129179
  • 35. Poletto C, Boëlle PY, Colizza V. Risk of MERS importation and onward transmission: a systematic review and analysis of cases reported to WHO. BMC infectious diseases. 2016;16(1):448. doi:10.1186/s12879-016-1787-5
  • 36. Smith KC, Rimal RN, Sandberg H, Storey JD, Lagasse L, Maulsby C, et al. Understanding newsworthiness of an emerging pandemic: International newspaper coverage of the H1N1 outbreak. Influenza and other respiratory viruses. 2013;7(5):847–853. doi:10.1111/irv.12073
  • 37. Geiß S, Leidecker M, Roessing T. The interplay between media-for-monitoring and media-for-searching: How news media trigger searches and edits in Wikipedia. New Media & Society. 2016;18(11):2740–2759. doi:10.1177/1461444815600281
  • 38. García-Gavilanes R, Tsvetkova M, Yasseri T. Dynamics and biases of online attention: the case of aircraft crashes. Royal Society open science. 2016;3(10):160460. doi:10.1098/rsos.160460
  • 39. Ophir Y, Jamieson KH. The Effects of Zika Virus Risk Coverage on Familiarity, Knowledge and Behavior in the US–A Time Series Analysis Combining Content Analysis and a Nationally Representative Survey. Health Communication. 2018;35(1):35–45. doi:10.1080/10410236.2018.1536958
  • 40. Ophir Y. Coverage of epidemics in American newspapers through the lens of the crisis and emergency risk communication framework. Health security. 2018;16(3):147–157. doi:10.1089/hs.2017.0106
  • 41. Balashankar A, Dugar A, Subramanian L, Fraiberger S. Reconstructing the MERS disease outbreak from news. In: Proceedings of the Conference on Computing & Sustainable Societies. ACM; 2019. p. 272–280.
  • 42. Conway BA, Kenski K, Wang D. The rise of Twitter in the political campaign: Searching for intermedia agenda-setting effects in the presidential primary. Journal of Computer-Mediated Communication. 2015;20(4):363–380. doi:10.1111/jcc4.12124
  • 43. Bragazzi NL, Alicino C, Trucchi C, Paganino C, Barberis I, Martini M, et al. Global reaction to the recent outbreaks of Zika virus: Insights from a Big Data analysis. PLoS One. 2017;12(9):e0185263. doi:10.1371/journal.pone.0185263
  • 44. Southwell BG, Dolina S, Jimenez-Magdaleno K, Squiers LB, Kelly BJ. Zika virus–related news coverage and online behavior, United States, Guatemala, and Brazil. Emerging infectious diseases. 2016;22(7):1320. doi:10.3201/eid2207.160415
  • 45. Sandman PM. Responding to community outrage: Strategies for effective risk communication. AIHA; 1993.
  • 46. Raude J, McColl K, Flamand C, Apostolidis T. Understanding health behaviour changes in response to outbreaks: findings from a longitudinal study of a large epidemic of mosquito-borne disease. Social Science & Medicine. 2019;230:184–193. doi:10.1016/j.socscimed.2019.04.009
  • 47. Alicino C, Bragazzi NL, Faccio V, Amicizia D, Panatto D, Gasparini R, et al. Assessing Ebola-related web search behaviour: insights and implications from an analytical study of Google Trends-based query volumes. Infectious diseases of poverty. 2015;4(1):54. doi:10.1186/s40249-015-0090-9
  • 48. Salathé M, Bengtsson L, Bodnar TJ, Brewer DD, Brownstein JS, Buckee C, et al. Digital epidemiology. PLoS Computational Biology. 2012;8(7):e1002616. doi:10.1371/journal.pcbi.1002616
  • 49. Althouse BM, Scarpino SV, Meyers LA, Ayers JW, Bargsten M, Baumbach J, et al. Enhancing disease surveillance with novel data streams: challenges and opportunities. EPJ Data Science. 2015;4(1):17. doi:10.1140/epjds/s13688-015-0054-0
  • 50. McIver D, Brownstein J. Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time. PLoS Computational Biology. 2014;10(4):e1003581. doi:10.1371/journal.pcbi.1003581
  • 51. Generous N, Fairchild G, Deshpande A, Del Valle SY, Priedhorsky R. Global disease monitoring and forecasting with Wikipedia. PLoS Computational Biology. 2014;10(11):e1003892. doi:10.1371/journal.pcbi.1003892
  • 52. Priedhorsky R, Osthus D, Daughton AR, Moran KR, Generous N, Fairchild G, et al. Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda. In: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. ACM; 2017. p. 1812–1834.
  • 53. Keegan B, Gergle D, Contractor N. Hot off the wiki: dynamics, practices, and structures in Wikipedia’s coverage of the Tōhoku catastrophes. International Symposium on Wikis. 2011; p. 105–113.
  • 54. Ratkiewicz J, Fortunato S, Flammini A, Menczer F, Vespignani A. Characterizing and modeling the dynamics of online popularity. Physical Review Letters. 2010;105(15):158701. doi:10.1103/PhysRevLett.105.158701
  • 55. Osborne M, Petrović S, McCreadie R, Macdonald C, Ounis I. Bieber no more: First story detection using Twitter and Wikipedia. Proceedings of the Workshop on Time-aware Information Access. 2012.
  • 56. Georgescu M, Kanhabua N, Krause D, Nejdl W, Siersdorfer S. Extracting event-related information from article updates in Wikipedia. In: European Conference on Information Retrieval. Springer; 2013. p. 254–266.
  • 57. Tausczik Y, Faasse K, Pennebaker JW, Petrie KJ. Public anxiety and information seeking following the H1N1 outbreak: blogs, newspaper articles, and Wikipedia visits. Health communication. 2012;27(2):179–185. doi:10.1080/10410236.2011.571759
  • 58. Springborn M, Chowell G, MacLachlan M, Fenichel EP. Accounting for behavioral responses during a flu epidemic using home television viewing. BMC Infectious Diseases. 2015;15(1):21. doi:10.1186/s12879-014-0691-0
  • 59. Leitão JC, Miotto JM, Gerlach M, Altmann EG. Is this scaling nonlinear? Royal Society open science. 2016;3(7):150649. doi:10.1098/rsos.150649
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007633.r001

Decision Letter 0

Rob J De Boer, Matthew (Matt) Ferrari

12 Aug 2019

Dear Dr Tizzoni,

Thank you very much for submitting your manuscript 'The impact of news exposure on collective attention during epidemics: case study of the 2016 U.S. Zika outbreak' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Matthew (Matt) Ferrari

Associate Editor

PLOS Computational Biology

Rob De Boer

Deputy Editor

PLOS Computational Biology


The reviewers were positive in their reviews, but raised a number of issues that should be addressed in revision. All 3 reviewers, in some form, raised the issue of interpreting correlations as causative and the direction of that causation. R3, in particular, raises several issues about whether what is measured is information seeking or simply increased attention, and specifically increased media attention. This should be discussed further in a revision.

R1 and R3 raise several questions about the inclusion of articles in the study (and the possibility of duplicates). If it is possible to address these by looking at the sensitivity of the results to alternate inclusion criteria, this should be done. If this is not possible, then the authors should at least provide text in the main text that answers the reviewers' questions.

The reviewers also raise a number of small technical questions that should be clarified.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have submitted an enjoyable, thoughtful, and timely paper. I enjoyed assessing it. The manuscript and the study harbor some important weaknesses, however, that should be addressed before moving forward.

Although the paper is generally well written, the piece would benefit from some additional copyediting. For example, there appears to be an errant question mark in brackets on page 11 of the PDF. The title also may be somewhat misleading, as I believe this is a study of behavior in the US relative to the Zika outbreak but the 2016 outbreak wasn't limited to the US nor was it really centered in the US, per se. The transmission that actually occurred in the US was quite limited. The authors might just want to revisit how exactly they are framing the phenomenon being studied. There was US transmission but much of the news coverage driving the search behavior reflected international events, unless I am mistaken.

The authors also miss an opportunity to connect this work to other highly relevant results that have appeared in other journals. Consider, for example, Southwell et al. in Emerging Infectious Diseases, a piece that connects news coverage, search data, and social media data regarding the Zika outbreak discussed here. The story is remarkably similar to what we found and so that would be an important foundational citation for the discussion here. Of course, the outcome measures of information seeking are somewhat different but nonetheless the story of ephemeral effect driven by news coverage that was not completely tied to actual epidemiological patterns is consistent. Beyond that, the authors also have an opportunity to connect this piece with a limited but nonetheless important subset of the communication research literature which has looked at communication as a process unfolding over time and which as a result has included time-based analysis of news effects on behavior.

On a related note, the other major limitation of the paper is the mismatch between theoretically longitudinal concerns and correlational data. Chances are good that time series analysis will tell us a similar story as the correlational analyses here, but nonetheless, because cases are ordered in time, there is a risk that the analysis presented unfairly capitalizes on lurking autocorrelation in the data. The authors should comment on that and either present their findings using time series analysis or at least assure readers, in a detailed note, that such analysis provides a similar story.
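The autocorrelation concern can be illustrated with a minimal sketch on synthetic data. It is not the authors' analysis: the AR(1) model, the coefficient 0.9, the series length and the helper names (`pearson`, `ar1`, `false_positive_rate`) are all assumptions chosen for illustration. The script draws pairs of series that are independent by construction and counts how often their Pearson correlation exceeds the usual large-sample 5% cutoff, which assumes independent observations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ar1(n, phi, rng):
    """Simulate x_t = phi * x_{t-1} + noise; phi = 0 gives white noise."""
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        series.append(prev)
    return series


def false_positive_rate(phi, n=100, trials=500, seed=0):
    """Share of independent series pairs whose |r| exceeds the naive cutoff."""
    rng = random.Random(seed)
    # Approximate two-sided 5% cutoff for r, valid only for independent samples.
    threshold = 1.96 / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        r = pearson(ar1(n, phi, rng), ar1(n, phi, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / trials


# Both settings use series that are unrelated to each other by construction.
white_rate = false_positive_rate(phi=0.0)
ar_rate = false_positive_rate(phi=0.9)
print(f"white noise: {white_rate:.2f}, AR(1) with phi=0.9: {ar_rate:.2f}")
```

Under these assumptions the strongly autocorrelated pairs cross the cutoff far more often than the white-noise pairs, even though no pair is truly related. This is the risk that a time series analysis is meant to control.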

If the authors can address these concerns, they will very likely improve the contribution of the piece.

Reference

Southwell, B. G., Dolina, S., Jimenez-Magdaleno, K., Squiers, L. B., & Kelly, B. J. (2016). Zika virus–related news coverage and online behavior, United States, Guatemala, and Brazil. Emerging Infectious Diseases, 22(7), 1320-1321.

Reviewer #2: This paper attempts to measure the drivers leading to information seeking in Wikipedia during the 2016 U.S. Zika outbreak. Specifically, the authors applied various statistical approaches to compare Wikipedia page view data against U.S. news outlets and TV shows at various spatial resolutions (i.e., national, state, and city level). The authors found that Wikipedia searches were driven by national media coverage and not by the magnitude of the outbreak. Although the methodology used is simple, the results are important for understanding the impact of media exposure on collective population behavior.

Major Comments

• Overall, very limited results are presented for the state and city-level analyses. Given the geo-localized Wikipedia pageview data available to the authors, the paper could be strengthened by providing time series at the state and city level as well as more detailed analyses that showcase the various behavioral responses by geography and language.

• The authors considered daily pageviews on 128 different Zika-related Wikipedia articles in 96 languages. However, it is not clear if the Web news captured all 96 languages. Similarly, the TV captions appear to only include English and Spanish. Therefore, it is not clear if the analyses performed are correct given that the different data streams analyzed may be measuring different populations. In short, the same languages should be captured in all the data streams.

• A discussion on why the authors decided to only use Zika and Zika virus for most of the articles analyzed is warranted. Similarly, a justification on why they decided to focus on 43 states with population in excess of 1 million is needed.

• Are duplicate news articles included or removed from the analyses (GDELT)?

• What is the scaling exponent for city-level analyses? Does the scaling exponent (beta) change for each state or does it remain constant across all the states?

• The authors should consider including tables for the correlations between Wikipedia and TV shows in the supplementary material.

• It’s hard to read the states in Figure 3; consider increasing the font size.

Minor Comments:

• A reference is missing on page 11.

Reviewer #3: This is a very important, well-articulated, and thorough study, creatively using preexisting novel data to evaluate the relationships between Zika cases, news coverage, and Wikipedia visits. I have several reservations regarding the analysis and would strongly suggest adding some missing literature (see below), but I believe these may be addressed through a revision. I applaud the authors for the meticulous and creative methodological work. My specific comments are as follows:

1) I suggest you add Sandman's perspective of risk into your literature and into the interpretation of the results. Sandman in his work on risk perception did a pretty good job explaining why experts perceive risk differently from health organizations such as the CDC. He has a list of what he called outrage factors - like novelty (see Sandman, 1987. Risk Communication: Facing Public Outrage), that could really explain why people in your study seem to have lost interest with time. Paul Slovic had some similar work that could also be implemented here - in any case, I think you should add a discussion of risk perception of laypersons and how it is different from that of experts - both Sandman and Slovic could be useful here. Following their perspective - your results are expected and make sense.

2) Information seeking - you seem to shift back and forth between collective attention and information seeking. First, I think some additional literature is needed re health info seeking - Nehama Lewis recently wrote an entry on information seeking and scanning (2017) for the International Encyclopedia of Media Effects that could be useful. Second, be consistent with your main DV. Do you measure attention or information seeking?

3) Your analysis will also benefit from discussing a recent study that looked at the relationships between Zika news coverage and public knowledge, familiarity and information sharing in the US - using a large national survey. The paper is: Ophir & Jamieson (2018). The Effects of Zika Virus Risk Coverage on Familiarity, Knowledge and Behavior in the U.S. – A Time Series Analysis Combining Content Analysis and a Nationally Representative Survey. This seems especially relevant to your inquiry.

4) on p3 - you need to elaborate more on why ZIKV is a communication challenge - the explanations you bring are for why it was a public health challenge, not necessarily a communication one. Use risk or crisis communication literature here.

5) p4 - CDC should be Centers for Disease Control and Prevention (add "and Prevention")

6) p5 - I'm very concerned with the use of Pearson correlation for variables that will definitely be influenced by autocorrelation. In such a case, where I expect autoregression in both observed and unobserved variables that could be correlated with the dependent variable, I strongly prefer the use of vector autoregression (VAR) models. VAR models could be coupled with Granger causality tests to support a causal direction as well.

7) also on p5 - it has to do with the literature but I really don't think you should have expected a linear relationship between number of new cases and public reaction. First of all, while it's true that new cases were added, other prior cases were resolved... and in almost all cases - Zika came and went without leaving any harm. So it completely makes sense to me that people lost interest in Zika after a while (with an exception for the Rio Olympics where people probably considered its effects on visitors and athletes-- which was also prevalent in news coverage).

8) Similarly for the media - we have strong reasons to expect the media NOT to follow the number of cases. Again, I think some literature is missing here - something on newsworthiness (Galtung & Ruge, 1965 or something similar). For various reasons, Zika was just not interesting for journalists. It gained some attention when it was new and mostly unknown, but other than its effects on golfers etc. it wasn't a big deal for journalists. You can see more about changes in content in Ophir (2018). Coverage of epidemics in American newspapers through the lens of the Crisis and Emergency Risk Communication Framework.

9) p6 - how reliable do you believe the geographic data is? Is it based on IP addresses? These are often inaccurate.

10) still on p6 - you measure correlations, but by doing that you assume the public follows the media. While plausible, it could also be that journalists "feel" the public interest and opinion (for example in social media) and are therefore affected by people's attention to the disease. You should be careful about the causal assumption here and explain that it could be the other way around. Again - it would be helpful to use autoregressive models with Granger causality to provide additional support for direction, including the optimal lag.

11) p11 - line 254 - there is a question mark where a citation number should appear - "disease similar to ZIKV [?]"

12) p12 - I think you're too harsh on yourself saying you didn't measure public behavior. Information seeking is a behavior - you could say that you didn't measure behavior on the individual level, or did not measure health-related behavior, but information seeking is definitely a behavior and Wikipedia seems like a reasonable proxy for that.

13) I'm worried about your decision to limit articles to those including both Zika and United States - Do you really think all relevant articles will include the term "United States"? For example, people interested in the Rio Olympics in Brazil will look at articles that do not use the term US. Also - you assume that only articles that explicitly connect the disease to the States will have an impact, which might not be the case. Anyway - I would remove the US condition from your search and look at all US media mentions of Zika. But if you decide to stick with your decision, at least explain why you did so in the discussion section and how it affects your conclusions.

14) notice that figures are in low resolution. Please provide sharper versions in the revision (e.g., hard to read states' names, etc).
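Comments 6 and 10 above suggest autoregressive models with Granger causality tests. The idea can be sketched on synthetic data as follows. This is a simplified illustration, not the analysis performed in the paper: it assumes a single fixed lag of one step, simulated `news` and `views` series in which news drives views by construction, and hypothetical helper names (`solve`, `rss`, `granger_f`). In practice an established statistics library would normally be used and the lag order selected from the data.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, target):
    """Residual sum of squares of an OLS fit of target on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(cause, effect, lag=1):
    """F statistic testing whether lags of cause improve the forecast of effect."""
    times = range(lag, len(effect))
    target = [effect[t] for t in times]
    # Restricted model: intercept plus the effect's own lags.
    own = [[1.0] + [effect[t - k] for k in range(1, lag + 1)] for t in times]
    # Unrestricted model: additionally includes lags of the candidate cause.
    full = [row + [cause[t - k] for k in range(1, lag + 1)]
            for row, t in zip(own, times)]
    rss_r, rss_u = rss(own, target), rss(full, target)
    dof = len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic example: views depend on yesterday's news, not the other way round.
rng = random.Random(1)
news, views = [0.0], [0.0]
for _ in range(300):
    views.append(0.5 * views[-1] + 0.8 * news[-1] + rng.gauss(0, 1))
    news.append(0.7 * news[-1] + rng.gauss(0, 1))

f_news_to_views = granger_f(news, views)
f_views_to_news = granger_f(views, news)
print(f"news -> views F = {f_news_to_views:.1f}, views -> news F = {f_views_to_news:.1f}")
```

A large F statistic in one direction and a small one in the other is consistent with the simulated direction of influence. On real data, such a result supports but does not prove a causal direction, since unobserved common drivers can produce the same pattern.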

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: No: The Wikipedia page view was provided under a non-disclosure agreement.

Reviewer #3: No: The authors stated some data will not become available due to restrictions. They did, however, provide the correlation matrix upon which the heatmap was built.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Brian Southwell

Reviewer #2: No

Reviewer #3: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007633.r003

Decision Letter 1

Rob J De Boer, Matthew (Matt) Ferrari

6 Jan 2020

Dear Dr Tizzoni,

We are pleased to inform you that your manuscript 'The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic' has been provisionally accepted for publication in PLOS Computational Biology.

Both reviewers were appreciative of the efforts made by the authors to address their concerns. Reviewer 3 did make a few small suggestions to language. I would encourage the authors to consider these changes as the suggested language may make arguments more acceptable to readers. These changes can be made at the proofs stage and I see no reason to delay the editorial process any further. 

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Matthew (Matt) Ferrari

Associate Editor

PLOS Computational Biology

Rob De Boer

Deputy Editor

PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed earlier review comments adequately. Thank you for the effort.

Reviewer #3: The authors took my and other reviewers' comments seriously and improved the paper accordingly. The result is a more careful analysis that takes into consideration the methodological and theoretical points raised by the reviewers, while better connecting to recent findings in other studies. I appreciate the hard work given to this manuscript in its original and revised versions. I believe the revised version is of considerable theoretical and practical importance for the fields of health, risk, and crisis communication.

I have very minor suggestions for the authors, but believe these could be communicated directly with the editor and do not require another round of peer-review:

1) remove the word "always" from line 212 on page 9

2) In line 297 of page 12, change the term "we could prove" to "our data support" (or at least change prove to support)

3) p13 - change "associated to the disease" to "associated with the disease"

4) p13, line 305 - change "may be obvious" to "may be expected"

5) notice Ophir & Jamieson (2018) was published in Volume 35 Issue 1 of Health Communication, pages 35-45 (https://www.tandfonline.com/doi/full/10.1080/10410236.2018.1536958)

6) I would find more cautious language for "social media provide an amplification channel for traditional media sources"-- research on the topic is not as consistent as this sentence suggests (e.g., Harder et al., 2017. Intermedia agenda setting in the social media age: https://journals.sagepub.com/doi/10.1177/1940161217704969)

Once again, I commend the authors' meticulous and rigorous work and look forward to reading the final publication.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Brian Southwell

Reviewer #3: Yes: Yotam Ophir, Ph.D.

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007633.r004

Acceptance letter

Rob J De Boer, Matthew (Matt) Ferrari

19 Feb 2020

PCOMPBIOL-D-19-00953R1

The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic

Dear Dr Tizzoni,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. GDELT query.

    SQL code has been used to query the GDELT platform through the Google BigQuery API.

    (PDF)

    S2 Appendix. Wikipedia pageview scaling.

    (PDF)

    S1 Table. Wikipedia pages under study.

    Full list of the 128 Wikipedia pages whose page view counts were monitored in the study. The field language refers to the language codes defined by ISO 639-1 and ISO 639-3.

    (PDF)

    S2 Table. Correlations between Wikipedia pageviews, the Web news mentioning Zika and TV closed captions in 2016.

    The table reports the Pearson’s correlation coefficient r for the Wikipedia page view counts, the Web news mentioning Zika and the TV closed captions at national level. All values of r are statistically significant at p < 10^-4.

    (PDF)

    S3 Table. Correlations between Wikipedia pageviews and news mentioning Zika by state.

    All states are ranked by Pearson’s r values, in descending order.

    (PDF)

    S4 Table. Correlations between Wikipedia pageviews and ZIKV incidence by state.

    All states are ranked by Pearson’s r values, in descending order.

    (PDF)

    S5 Table. Correlations between news mentioning Zika and ZIKV incidence by state.

    All states are ranked by Pearson’s r values, in descending order.

    (PDF)

    S6 Table. Parameters of the best fitted models for all selected features.

    (PDF)

    S7 Table. Comparison of model performance for 49 states and D.C.

    (PDF)

    Attachment

    Submitted filename: Response_to_reviewers.pdf

    Data Availability Statement

    Data are available from the Zenodo repository (https://doi.org/10.5281/zenodo.3603916).


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
