Key Points
Question
Can nonprobability survey data accurately track institutionally confirmed COVID-19 cases in the US and provide estimates of unaccounted infections when rapid at-home tests are popularized and institutionalized testing is discontinued?
Findings
In this survey study conducted among 306 799 residents aged 18 years or older across 50 US states and the District of Columbia, the proportion of individuals reporting a positive COVID-19 infection in a longitudinal nonprobability survey closely tracked the institutionally reported proportions in the US (15.9% self-reported a test-confirmed COVID-19 infection), as well as nationally aggregated wastewater SARS-CoV-2 viral concentrations, from April 2020 to February 2022. Survey estimates suggest that a high number of confirmed infections may have been unaccounted for in official records starting in February 2022, when large-scale distribution of rapid at-home tests occurred; this finding was further confirmed by viral concentrations in wastewater.
Meaning
This study suggests that nonprobability online surveys can serve as an effective complementary method to monitor infections during an emerging pandemic and provide an alternative for estimating infections in the absence of institutional testing when at-home tests are widely available.
Abstract
Importance
Identifying and tracking new infections during an emerging pandemic is crucial to design and deploy interventions to protect populations and mitigate the pandemic’s effects, yet it remains a challenging task.
Objective
To characterize the ability of nonprobability online surveys to longitudinally estimate the number of COVID-19 infections in the population both in the presence and absence of institutionalized testing.
Design, Setting, and Participants
Internet-based online nonprobability surveys were conducted among residents aged 18 years or older across 50 US states and the District of Columbia, using the PureSpectrum survey vendor, approximately every 6 weeks between June 1, 2020, and January 31, 2023, for a multiuniversity consortium—the COVID States Project. Surveys collected information on COVID-19 infections with representative state-level quotas applied to balance age, sex, race and ethnicity, and geographic distribution.
Main Outcomes and Measures
The main outcomes were (1) survey-weighted estimates of new monthly confirmed COVID-19 cases in the US from January 2020 to January 2023 and (2) estimates of uncounted test-confirmed cases from February 1, 2022, to January 1, 2023. These estimates were compared with institutionally reported COVID-19 infections collected by Johns Hopkins University and wastewater viral concentrations for SARS-CoV-2 from Biobot Analytics.
Results
The survey spanned 17 waves deployed from June 1, 2020, to January 31, 2023, with a total of 408 515 responses from 306 799 respondents (mean [SD] age, 42.8 [13.0] years; 202 416 women [66.0%]). Overall, 64 946 respondents (15.9%) self-reported a test-confirmed COVID-19 infection. National survey-weighted test-confirmed COVID-19 estimates were strongly correlated with institutionally reported COVID-19 infections (Pearson correlation, r = 0.96; P < .001) from April 2020 to January 2022 (50-state correlation mean [SD] value, r = 0.88 [0.07]). This was before the government-led mass distribution of at-home rapid tests. After January 2022, correlation was diminished and no longer statistically significant (r = 0.55; P = .08; 50-state correlation mean [SD] value, r = 0.48 [0.23]). In contrast, survey COVID-19 estimates correlated highly with SARS-CoV-2 viral concentrations in wastewater both before (r = 0.92; P < .001) and after (r = 0.89; P < .001) January 2022. Institutionally reported COVID-19 cases correlated (r = 0.79; P < .001) with wastewater viral concentrations before January 2022, but poorly (r = 0.31; P = .35) after, suggesting that both survey and wastewater estimates may have better captured test-confirmed COVID-19 infections after January 2022. Consistent correlation patterns were observed at the state level. Based on national-level survey estimates, approximately 54 million COVID-19 cases were likely unaccounted for in official records between January 2022 and January 2023.
Conclusions and Relevance
This study suggests that nonprobability survey data can be used to estimate the temporal evolution of test-confirmed infections during an emerging disease outbreak. Self-reporting tools may enable government and health care officials to implement accessible and affordable at-home testing for efficient infection monitoring in the future.
This survey study characterizes the ability of data from nonprobability online surveys to longitudinally estimate the number of COVID-19 infections in the population both in the presence and in the absence of institutionalized testing.
Introduction
Identifying and tracking new infections during the earlier and most intense phases of the COVID-19 pandemic were crucial for the design of mitigation strategies; however, this was extremely challenging due to the novel nature of the pathogen.1,2 The significant number of asymptomatic COVID-19 infections, the limited availability of resources to identify and treat infections across locations, and people’s lack of trust and willingness to seek medical attention were among the most important challenges of estimating incidence numbers.3,4,5,6 Multiple approaches to characterize the incidence of COVID-19 in the population were deployed in the US as infections spread.7 These approaches included (1) clinical-based individual testing (via polymerase chain reaction [PCR] or rapid tests)8; (2) tracking the number of patients presenting at hospitals with COVID-19 symptoms, such as fever, cough, sore throat, and anosmia (referred to as syndromic surveillance)9; (3) continuous monitoring of the presence of antibodies against SARS-CoV-2 in blood serum samples in a population (referred to as serosurveillance)3,10; and (4) measuring the SARS-CoV-2 viral concentration in wastewater (WW) samples shed by infected individuals.4,11,12
Among all these approaches, widespread institutional individual testing was the most heavily relied-on indicator to assess the severity of local outbreaks, allocate resources, and deploy or lift nonpharmaceutical mitigation interventions. Throughout the pandemic, however, testing availability and reporting were inconsistent in the US.13 For example, the COVID-19 tests designed by the US Centers for Disease Control and Prevention were recalled due to a faulty reagent14 during the earlier months of 2020; heterogeneous state policies regarding access to free institutional testing led to inconsistencies in interpreting case count data15; and the massive government-led distribution of rapid at-home tests starting in January 2022, without a concurrent deployment of a centralized infection reporting system, meant that official case counts captured only a fraction of test-confirmed infections.
Here, we study the ability of data collected from large US-based nonprobability surveys—the COVID States Project (CSP)—to estimate the number of COVID-19 infections from January 2020 to January 2023 at national and state levels. Multiple studies have investigated how surveys can be used to monitor infections, people’s behaviors, and trust in vaccines during specific periods and in particular geographical locations during the COVID-19 pandemic.16,17 In this study, we further sought to assess the extent to which carefully analyzed survey data could have been used to monitor the number of COVID-19 infections continuously and longitudinally at national and state levels during the first 3 years of the pandemic in the US.
Methods
Study Design
We used data collected by an ongoing large-scale internet-based nonprobability survey conducted by an academic consortium approximately every 6 weeks from June 1, 2020, to January 31, 2023, inclusive of all 50 states and the District of Columbia. Survey participants were individuals aged 18 years or older who resided in the US. Before agreeing to participate in the survey, respondents were not aware that the survey would include questions related to the COVID-19 pandemic, to minimize selection bias. The survey used national and state-level representative quotas for sex, age, and race and ethnicity (Asian, Black, Hispanic, White, and another race [no specific races or ethnicities were specified for “another race”]) to represent the US population in the most recent census data. Participants were recruited using PureSpectrum, an online survey panel aggregator, and they provided informed consent online before survey access. The study protocol was reviewed and approved by the institutional review board of Harvard University as exempt as only deidentified data were used and no participant contact was required. This study followed the American Association for Public Opinion Research (AAPOR) reporting guideline.
From the fifth survey wave (June 2020) onward, the surveys asked 2 questions to identify the COVID-19 test frequency of participants, positive test results, and the month when they experienced symptoms. The precise wording of the questions can be found in the eMethods in Supplement 1.
Measures
All respondents were asked if they had been tested for COVID-19 in the past (not distinguishing between PCR test or antigen test in some waves), and those who indicated a positive test result were asked when they experienced symptoms. To estimate the number of infections happening in each month, we aggregated the number of respondents who indicated having a positive test result and were sick in each month, using only the immediately subsequent survey wave after each individual’s infection to minimize potential participants’ recall errors. Approximately 16% of respondents participated in multiple survey waves, and if they reported multiple infections in different months, we included their health status in each month that they reported an infection. Sensitivity analyses were conducted to test whether including more than 1 infection per respondent would yield different results compared with including at most only 1 (randomly selected) infection per respondent. The aggregated responses were demographically reweighted to represent the most recent US Census and normalized by the sample size to estimate the proportion of infected individuals at the national and state levels. The sample sizes and percentages of respondents who were sick in each month at the national level are shown in eTable 3 in Supplement 1. Institutionally confirmed COVID-19 infections were obtained from state and local governments and health departments by the Coronavirus Resource Center of Johns Hopkins University (JHU) and compiled by the New York Times. Finally, as an additional and independent measure of COVID-19 prevalence in the US, we used monthly aggregated WW SARS-CoV-2 viral concentrations from Biobot Analytics.18
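As an illustration of the reweighting and normalization step, the sketch below applies simple post-stratification weights (census share divided by sample share within each demographic cell) to self-reported infections. The cell definitions and numbers are hypothetical, and the actual survey weighting balances more variables than this single pass:

```python
from collections import Counter

def weighted_infection_rate(respondents, census_shares):
    """Estimate the proportion infected after post-stratification.

    respondents: list of (demographic_cell, reported_positive) pairs.
    census_shares: dict mapping demographic_cell -> population share.
    """
    n = len(respondents)
    sample_counts = Counter(cell for cell, _ in respondents)
    # Weight for a cell = its census share / its share of the sample.
    weights = {cell: census_shares[cell] / (count / n)
               for cell, count in sample_counts.items()}
    total = sum(weights[cell] for cell, _ in respondents)
    positive = sum(weights[cell] for cell, sick in respondents if sick)
    return positive / total

# Toy sample: the younger cell is overrepresented (8 of 10 respondents)
# relative to a 50/50 census split, so its responses are down-weighted.
sample = ([("18-44", True)] * 3 + [("18-44", False)] * 5
          + [("45+", True)] + [("45+", False)])
rate = weighted_infection_rate(sample, {"18-44": 0.5, "45+": 0.5})  # 0.4375
```

The unweighted positivity in this toy sample is 40%, but the underrepresented older cell reported a higher positivity rate, so the weighted estimate rises to 43.75%.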
Statistical Analysis
We conducted our statistical analyses in 2 different and nonoverlapping time periods within the first 3 years of the pandemic in the US, selected a priori. The first period was from April 1, 2020, to January 31, 2022, a time when institutional efforts to test individuals were most active, according to the number of daily PCR COVID-19 tests conducted (eFigure 10 in Supplement 1). The second period was from February 1, 2022, to January 1, 2023, a time when rapid at-home tests were massively distributed to the general public by the federal government. During this period, there was not a centralized system to record the rapid test outcomes, and an overall decrease in governmental resources allocated to monitor COVID-19 infections gradually occurred, culminating with the federal public health emergency for COVID-19 expiring in May 2023.
Correlation Analysis
We calculated pairwise correlation coefficients between the proportion of infected individuals as inferred by survey data (referred to as CSP) and the institutional numbers reported in the JHU COVID-19 dashboard for the 2 time periods at the national and state levels. We also calculated pairwise correlations between SARS-CoV-2 viral load in WW and both the CSP estimates and the JHU reported infections during the 2 distinct time periods. This was done at the national and state levels.
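The pairwise comparison reduces to the Pearson correlation between two monthly series; a minimal pure-Python version is sketched below, with hypothetical stand-ins for the CSP and JHU values:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical monthly infection estimates (percent of population).
csp = [1.2, 2.5, 4.0, 3.1, 1.8]
jhu = [1.0, 2.4, 3.8, 3.0, 1.9]
r = pearson_r(csp, jhu)  # close to 1: the two series track each other
```

In practice a statistics library (eg, scipy.stats.pearsonr) would also supply the P value associated with each coefficient.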
Survey Mean Estimates and SEs
We measured the distance between the official numbers and our survey estimates as multiples of the survey-based SEs (SD of the mean) of our estimates. We report both these standardized differences and the 95% CIs in the Figure for the national level.
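These standardized differences and normal-approximation CIs amount to a short calculation; the values below are illustrative, not figures from the study:

```python
def se_distance(survey_estimate, survey_se, official_value):
    """Distance between an official count and the survey estimate,
    expressed in multiples of the survey standard error."""
    return abs(survey_estimate - official_value) / survey_se

def ci95(survey_estimate, survey_se):
    """Normal-approximation 95% CI for the survey estimate."""
    return (survey_estimate - 1.96 * survey_se,
            survey_estimate + 1.96 * survey_se)

# Illustrative monthly values (percent infected): survey says 4.0% with
# SE 0.25%, official reporting shows 2.0%, so the official figure sits
# 8 SEs below the survey estimate.
d = se_distance(4.0, 0.25, 2.0)  # 8.0
lo, hi = ci95(4.0, 0.25)         # (3.51, 4.49)
```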
Figure. National Confirmed COVID-19 Infections, as Captured by Institutional and Survey Data, and Viral Concentrations of SARS-CoV-2 in Wastewater.
The percentage of respondents in our survey who reported having a confirmed COVID-19 infection in each month is shown in orange (CSP), the institutionally reported percentage of individuals infected in each month as monitored by Johns Hopkins University (JHU) is shown in dark blue, and the wastewater viral concentration of SARS-CoV-2 is shown in light blue. A vertical dotted line shows the time when at-home rapid tests were widely available in February 2022. NYT indicates the New York Times.
Excess Infection Estimates
We estimated the number of infections that were likely unaccounted for by institutional surveillance systems starting in February 2022 using 2 approaches. The first approach involved calculating the cumulative infections estimated from survey data from February 1, 2022, until January 1, 2023. This was achieved by multiplying the percentage of weight-corrected self-reported infected participants in each month by the total population of the US, and then by adding all these estimates across time. We then simply subtracted the cumulative number of infections reported by JHU from the survey’s cumulative estimates. A second approach involved calibrating a linear regression model to map CSP’s estimated incidence of COVID-19 infection onto JHU’s reported COVID-19 incidence from April 2020 to January 2022, to identify the association between the 2 models in the first 2 years when they closely tracked each other. We then used this model to estimate confirmed infections after February 2022, with the assumption that we had access only to CSP information. We computed cumulative values for the estimated JHU-reported incidence that would have been observed in the absence of any disruption (intervention) to the institutional COVID-19 surveillance, and then we compared these estimates with the cumulative cases calculated from JHU data. This method is detailed by De Salazar et al19 in the context of COVID-19 vaccine effectiveness and is frequently referred to as interrupted time-series analysis.19,20
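Both approaches can be sketched in a few lines. The monthly series below are hypothetical (in millions of cases); the real analysis used the weighted survey estimates described above and monthly data from April 2020 onward:

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Pre-intervention period: reported (JHU) cases closely track survey (CSP)
# estimates, so a linear map between the two can be calibrated.
csp_pre = [1.0, 2.0, 3.0, 4.0]
jhu_pre = [0.9, 1.9, 2.9, 3.9]
slope, intercept = fit_line(csp_pre, jhu_pre)  # ~ (1.0, -0.1)

# Post-intervention period: project what would have been reported absent
# any disruption to institutional surveillance, then compare with the
# much lower actual reports (interrupted time-series logic).
csp_post = [5.0, 6.0]
jhu_post = [2.0, 2.5]
expected = [slope * c + intercept for c in csp_post]  # ~ [4.9, 5.9]
unaccounted = sum(expected) - sum(jhu_post)           # ~6.3 million

# First approach, for comparison: cumulative survey-implied cases minus
# cumulative reported cases (survey series already scaled to millions).
unaccounted_direct = sum(csp_post) - sum(jhu_post)    # ~6.5 million
```

The two approaches agree when the calibrated slope is near 1, as it was nationally in the study (JHU = 0.96 × CSP − 0.1).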
Results
The survey spanned 17 waves deployed from June 2020 to January 2023, with a total of 408 515 responses from 306 799 respondents (mean [SD] age, 42.8 [13.0] years; 202 416 respondents [66.0%] identified as women, and 104 383 respondents [34.0%] identified as men). A total of 16 715 respondents (5.4%) identified as Asian, 33 234 (10.8%) as Black, 24 938 (8.1%) as Hispanic, 219 448 (71.5%) as White, and 12 464 (4.1%) as another race. Overall, 64 946 respondents (15.9%) self-reported a test-confirmed COVID-19 infection. In aggregate and at the national level, COVID-19 case counts inferred from CSP surveys were highly correlated with JHU reports from April 2020 to January 2022 (Pearson r = 0.96; P < .001), as seen in the Figure and Table, with a mean (SD) state-level Pearson correlation coefficient of r = 0.88 (0.07). After February 2022, soon after at-home rapid tests were massively distributed by the federal government, and up to January 2023, the Pearson correlation between CSP and JHU case counts decreased to r = 0.55 (P = .08), with a mean (SD) state-level Pearson correlation coefficient of r = 0.48 (0.23). A sensitivity analysis (eFigure 11 in Supplement 1) yielded very similar temporal infection curves when repeat participants were included only once (selected at random). In addition, the Table shows that the concentrations of SARS-CoV-2 in WW data closely correlate with CSP surveys’ case counts both between April 2020 and January 2022 (r = 0.92; P < .001) and between February 2022 and January 2023 (r = 0.89; P < .001). Although SARS-CoV-2 in WW correlates (r = 0.79; P < .001) with JHU case counts before January 2022, this correlation decreased to r = 0.31 (P = .35) between February 2022 and January 2023. Consistent correlation patterns were observed at the state level and are reported in eTables 1 to 5 and eFigures 4 and 9 in Supplement 1.
Table. National-Level Pairwise Pearson Correlations Between Survey Test-Confirmed Infection Estimates, Institutionally Reported COVID-19, and WW SARS-CoV-2 Viral Concentrations.
| Data source | April 2020-January 2022 (pre–rapid test period), r | P value | February 2022-January 2023 (rapid test period), r | P value | April 2020-January 2023 (full period), r | P value |
|---|---|---|---|---|---|---|
| CSP-JHU | 0.96 | <.001 | 0.55 | .08 | 0.78 | <.001 |
| CSP-WW | 0.92 | <.001 | 0.89 | <.001 | 0.87 | <.001 |
| JHU-WW | 0.79 | <.001 | 0.31 | .35 | 0.74 | <.001 |
Abbreviations: CSP, survey test-confirmed infections estimates; JHU, Johns Hopkins University; WW, wastewater.
Using the first approach to calculate cumulative infections from February 1, 2022, to January 1, 2023 (after at-home test distribution), at the national level, our survey estimates suggest that approximately 79 million (95% CI, 71 million to 86 million) confirmed cases may have occurred compared with 25 million reported in the JHU data. This estimate indicates that 54 million cases, more than twice as many as those reported, were likely unaccounted for in institutional surveillance. At the state level, the number of potentially unaccounted cases varied between 59 000 in Wyoming and 6.3 million in California. Our second (interrupted time-series) approach, a linear regression (JHU = 0.96 × CSP − 0.1) calibrated nationally on data from April 2020 to January 2022, yielded consistent results, estimating the cumulative number of positive cases from February 1, 2022, to January 1, 2023, at 73 million (95% CI, 65 million to 81 million) at the national level. State-level results are consistent and shown in eTable 4 and eFigure 3 in Supplement 1.
For 22 months, from April 2020 (when reliable institutionalized testing in the US accelerated) until February 2022, case counts in the official data fell within 2 to 3 SEs of our estimates, except for the months of January, November, and December 2021. However, from February 2022 onward (when the distribution of rapid at-home tests started), officially reported cases diverged significantly from our survey estimates, with distances ranging from 6 SEs in February 2022 to 16 SEs during the peak of July 2022 (eFigure 1 in Supplement 1).
Discussion
Our results support the hypothesis that nonprobability surveys serve as a reliable and complementary method to monitor the proportion of test-confirmed infections in real time during a public health crisis. Specifically, by analyzing data from nonprobability surveys deployed approximately every 6 weeks during the first 2 years of the COVID-19 pandemic in the US, we found that the COVID-19 infections inferred from survey data closely tracked institutionally reported infections when institutional testing was at its best in the US. When institutional efforts to monitor COVID-19 infections diminished and rapid at-home tests were made widely accessible—with no centralized system to collect at-home test results—survey data suggested that a high proportion of test-confirmed infections were unaccounted for in institutional reports. When comparing with COVID-19 activity estimates obtained from SARS-CoV-2 concentrations in WW data, we found high consistency with the COVID-19 trends observed in surveys throughout this study.
The alignment of the 3 COVID-19 activity estimates—JHU, CSP, and WW—before January 2022 suggests that these surveillance systems were consistent and compatible with one another before the mass distribution of at-home tests. After February 2022, the consistency between CSP and WW COVID-19 activity, and the pronounced discordance between these 2 sources and JHU cases, suggest that both (1) CSP and WW data may have continued properly capturing COVID-19 infections trends and (2) the introduction of at-home rapid tests and the discontinuation of institutional testing disrupted institutional efforts (JHU) to track COVID-19 trends. Similar alignment between the 3 data sources was observed before January 2022 at the state level. There also were clear discrepancies between both CSP and WW data and JHU data after February 2022 at the state level, as shown in eFigures 2, 4, 5, and 6 in Supplement 1. Neither political affiliation, population size, nor health care spending per capita can explain the number of unreported infections per 100 000 inhabitants across states (eTable 5 in Supplement 1).
Although there have been multiple attempts to monitor or estimate the number of confirmed COVID-19 cases using alternative internet-based data sources, such as digital internet traces (eg, general population’s internet search queries and clinicians’ searches, among others21), human mobility data from smartphones,22,23,24 self-test reporting systems,25 and surveys such as ours,26 this study presents one of the most comprehensive assessments of the quality of COVID-19 activity estimates using nonprobability surveys, at the national and state levels, for the first 3 years of the pandemic.
Other attempts to track COVID-19 cases have used cross-sectional (or limited-period) surveys starting in the early stages of the pandemic. Most attempts concluded that only a small fraction of COVID-19 cases were captured by institutional testing, consistent with our findings after February 2022—and perhaps also during the early weeks of the pandemic.5 For example, a study by Gallup27 suggested that the number of COVID-19 infections on April 3, 2020, would be 2.5 times more than what the official numbers had suggested at that time if more people had undergone official testing. Another survey-based study conducted by Qasmieh et al28 fielded between March 14 and 16, 2022, asked 1030 adult residents of New York City about COVID-19 testing and related outcomes from January 2022 onward. They applied representative survey weights as in our method to estimate the number of infections in New York City. They estimated that 1.8 million adults (95% CI, 1.6 million to 2.1 million adults) had a COVID-19 infection from January 1 to March 16, 2022, compared with 1.1 million cases that our survey numbers suggest for the same period in New York state.
In another online survey-based (N = 97 707) study by Rader et al,29 researchers estimated that “2.6 million cases (95% CI, 1 874 549-3 853 341) were diagnosed by at-home tests and not included in the official case count” over the period from March 20 to May 21, 2022. Our surveys’ lower temporal resolution does not allow for a direct comparison with the daily estimates shown in the study by Rader et al.29 However, when scaling our monthly estimates (March estimate × 1/3 + April estimate + May estimate × 2/3), we estimate that approximately 6 million infections nationally were not included in reported case counts in the same period. These estimates, while both imperfect, confirm that millions of test-confirmed infections were missed in this overlapping time period.
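The pro-rating above is simple weighted arithmetic over partial months; with hypothetical monthly totals (in millions), it looks like:

```python
# Scale monthly estimates onto the March 20-May 21 window: roughly the
# last third of March, all of April, and the first two-thirds of May.
march, april, may = 3.0, 6.0, 4.5   # hypothetical monthly totals, millions
window_total = march * (1 / 3) + april + may * (2 / 3)  # ~10.0
```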
In another survey-based study, Qasmieh et al30 estimated the cumulative incidence of COVID-19 cases during the preceding 14-day period (April 23 to May 8) to be 31 times the official case count: 1.5 million (95% CI, 1.3 million to 1.8 million) vs 50 000. In comparison, our estimates for April to May 2022 suggest that there were 1.2 million cases in New York state, much closer to the estimates in the study by Qasmieh et al.30 Government-led efforts aimed at centralizing information about individual at-home test results include a National Institutes of Health initiative tasked with developing a self-test reporting standard.
We identified large discrepancies in COVID-19 estimates among all 3 data sources—CSP, JHU, and WW—before May 2020. Both CSP and WW data showed significantly higher estimates of COVID-19 activity than those reported by JHU. Although testing was very sparse and inconsistent during this period, our estimates aligned with other attempts, for example,21 that have used statistical corrections and multiple complementary data streams to estimate the total number of COVID-19 infections before April 2020. Specifically, Lu et al5 found the cumulative number of suspected (symptomatic, either test confirmed or not) infections as of April 4, 2020, to be as many as 2.3 million to 4.8 million cases, or approximately 25 times the number of institutionally reported cases in the US. Our estimates do not point to such high numbers because, by design, our goal was to track only test-confirmed infections.
Because our surveys were not specifically designed to attract the participation of individuals with COVID-19 symptoms or particularly interested in COVID-19 more broadly, our infection rate estimates should be less biased than those obtained from COVID-19–specific surveys, such as the “Facebook-CMU” (Carnegie Mellon University) survey or “Outbreaks Near Me.”31,32 It has been documented that people who are experiencing symptoms are more motivated to report their experience in surveys,33,34 and thus incidence rates tend to be inflated in disease-specific surveys because fewer healthy individuals participate in them.
One strength of survey-based infection surveillance is that it allows for the multivariate collection of disease activity information in parallel with other sociodemographic variables. Institutional data collection, in contrast, rarely allows for access to demographic details of those reported to be infected and thus precludes examining subgroup infection rates. Future studies of the survey data should closely analyze infection trends in different sociodemographic groups.
In future public health crises, survey-based approaches to monitor confirmed infections should be deployed in conjunction with either widely available institutional testing or diagnostic at-home tests. In the absence of that, no criterion standard will exist to assess the historical validity of survey-based approaches, and, thus, their robustness and generalizability may be limited.
Limitations
Our study has multiple limitations. The first is potential participants’ recall error; to mitigate recall bias, we used answers from the most contemporaneous wave to estimate the number of infections in a month. Another potential limitation is entry error in low-frequency responses. An expected low-frequency response in our study was a report of infection at the very beginning of the pandemic.35 Entry errors for low-frequency responses in an approximately 20 000-respondent wave (considered large) could have inflated our test-confirmed infection estimates in the first 3 months of the pandemic.
Although we have robust statistical power nationally and within large states as shown in eTables 2, 4, and 5 and eFigures 7 and 8 in Supplement 1, our state-level analyses are far less precise, especially for states where we had smaller sample sizes. Many would argue that a probability sampling approach to estimating unreported infections would be preferable because such approaches are often deemed more representative of the population being studied.36,37 More work is needed to compare nonprobability and probability surveys in arriving at disease surveillance estimates. A key challenge for probability samples is recent unit nonresponse bias, which appears to underrepresent particular groups of people.38 More importantly, probability samples are extremely expensive and logistically difficult. An advantage of our approach is that its simpler logistics and lower cost enable both spatial and temporal coverage, which are vital for careful and consistent disease surveillance.
Conclusions
Our study supports the potential for applying surveys to complement government-led disease surveillance in future public health crises, despite some limitations that may be addressable in future deployments. Self-reporting tools may enable government and health care officials to implement accessible and affordable at-home testing for efficient infection monitoring in the future.
eMethods.
eFigure 1. Top panel: The percent of respondents in our survey (CSP) who reported having a confirmed COVID-19 infection in each month is shown in red, the institutionally reported percent of individuals infected in each month as monitored by JHU is shown in black, and the wastewater viral concentration of SARS-CoV-2 is shown in blue. A vertical green dashed line shows the time when at-home rapid tests were widely delivered in February 2022. Bottom panel: differences between CSP and JHU new monthly infections as multiples of the standard error (of means) of the CSP estimates.
eTable 1. Average Pearson correlation for all US states between survey test-confirmed infections estimates (CSP), Institutionally reported COVID-19 (JHU), and Wastewater SARS-CoV-2 viral concentrations (WW) in two time periods
eFigure 2. COVID-19 case estimates from CSP data (with confidence intervals, in red), concentrations of SARS-CoV-2 in wastewater (blue), and institutional confirmed cases from JHU (black) for all US states
eFigure 3. Scatter plots of JHU COVID-19 infections vs CSP infections from April 2020 to January 2022 was used for training (blue) and from February 2022 to January 2023 (red)
eFigure 4. Scatter plots of JHU COVID-19 infections vs CSP infections from April 2020 to January 2022 was used for training (blue) and from February 2022 to January 2023 (red) at the State-level
eFigure 5. COVID-19 case estimates from Covid States Project data (with confidence interval) and official data source for all states
eFigure 6. Interrupted time-series approach: COVID-19 case estimates from JHU (black) and predictions using interrupted linear regression relating official source and Covid States Project for all states during April 2020 and January 2022
eFigure 7. Data availability from Biobot’s wastewater sequencing per month in each state
eFigure 8. Data availability from COVID states project survey per month in each state
eTable 2. State-level pairwise Pearson correlation and p-values between survey test-confirmed infections estimates (CSP), Institutionally reported COVID-19 (JHU), and Wastewater SARS-CoV-2 viral concentrations (WW) in three time periods
eTable 3. Monthly observed COVID-19 cases estimated from survey data for the multiple survey deployments
eTable 4. The differences in the number of cases recorded during the period after rapid tests were deployed (Feb ’22 to Dec ’22) between the official data source (New York Times), the COVID States survey, and the prediction obtained by training a linear regression on COVID States data
eTable 5. Number of state-level unreported infection per 100,000 individuals as calculated from survey data
eFigure 9. Case estimation performed by BioBot using viral concentration in wastewater
eFigure 10. Daily COVID-19 tests administered in the United States per thousand people
eFigure 11. Sensitivity analysis of infection curves obtained by including repeat respondents only once (chosen randomly) in our longitudinal analysis
Data Sharing Statement
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
eMethods.
eFigure 1. Top panel: The percentage of respondents in our survey (CSP) who reported a confirmed COVID-19 infection in each month is shown in red, the institutionally reported percentage of individuals infected in each month as monitored by JHU is shown in black, and the wastewater viral concentration of SARS-CoV-2 is shown in blue. A vertical green dashed line marks February 2022, when at-home rapid tests became widely distributed. Bottom panel: Differences between CSP and JHU new monthly infections, expressed as multiples of the standard error (of means) of the CSP estimates.
eTable 1. Average Pearson correlation across all US states between survey test-confirmed infection estimates (CSP), institutionally reported COVID-19 cases (JHU), and wastewater SARS-CoV-2 viral concentrations (WW) in two time periods
eFigure 2. COVID-19 case estimates from CSP data (with confidence intervals, in red), concentrations of SARS-CoV-2 in wastewater (blue), and institutional confirmed cases from JHU (black) for all US states
eFigure 3. Scatter plots of JHU COVID-19 infections vs CSP infections, with data from April 2020 to January 2022 used for training (blue) and data from February 2022 to January 2023 (red)
eFigure 4. Scatter plots of JHU COVID-19 infections vs CSP infections, with data from April 2020 to January 2022 used for training (blue) and data from February 2022 to January 2023 (red), at the state level
eFigure 5. COVID-19 case estimates from COVID States Project data (with confidence intervals) and the official data source for all states
eFigure 6. Interrupted time-series approach: COVID-19 case estimates from JHU (black) and predictions from an interrupted linear regression relating the official source to the COVID States Project survey, for all states, between April 2020 and January 2022
eFigure 7. Data availability from Biobot’s wastewater sequencing per month in each state
eFigure 8. Data availability from the COVID States Project survey per month in each state
eTable 2. State-level pairwise Pearson correlations and P values between survey test-confirmed infection estimates (CSP), institutionally reported COVID-19 cases (JHU), and wastewater SARS-CoV-2 viral concentrations (WW) in three time periods
eTable 3. Monthly observed COVID-19 cases estimated from survey data for the multiple survey deployments
eTable 4. Differences in the number of cases recorded after rapid tests were widely deployed (February 2022 to December 2022) between the official data source (New York Times), the COVID States survey, and predictions obtained by training a linear regression on COVID States data
eTable 5. Number of state-level unreported infections per 100,000 individuals, as calculated from survey data
eFigure 9. Case estimation performed by Biobot using viral concentrations in wastewater
eFigure 10. Daily COVID-19 tests administered in the United States per thousand people
eFigure 11. Sensitivity analysis of infection curves obtained by including repeat respondents only once (chosen randomly) in our longitudinal analysis
Data Sharing Statement
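As a rough illustration only (not the authors' code), the interrupted-regression idea described in eFigure 6 and eTable 4 can be sketched as follows: fit official case counts (JHU) against survey estimates (CSP) over the pre-rapid-test period, then project that relationship forward; a positive gap between the projection and official counts after February 2022 would suggest infections unaccounted for in official records. All data and variable names below are synthetic, invented for the sketch.

```python
# Minimal sketch with synthetic monthly data: 22 "pre-intervention" months
# (April 2020-January 2022) and 11 "post" months (February-December 2022).
import numpy as np

rng = np.random.default_rng(0)

csp = rng.uniform(1.0, 8.0, size=33)                  # survey-estimated % infected
jhu = np.where(np.arange(33) < 22,
               0.9 * csp + rng.normal(0, 0.2, 33),    # pre: official data track the survey
               0.4 * csp + rng.normal(0, 0.2, 33))    # post: official data undercount

# Ordinary least squares fit on the pre-intervention period only.
pre = slice(0, 22)
slope, intercept = np.polyfit(csp[pre], jhu[pre], 1)

# Project the pre-period relationship forward; a positive shortfall in the
# post period indicates cases missing from the official record.
predicted = intercept + slope * csp
shortfall = predicted[22:] - jhu[22:]
print(f"mean post-period shortfall: {shortfall.mean():.2f} percentage points")
```

The same fitted-slope comparison underlies the blue/red split in eFigures 3 and 4: training points (pre period) cluster along the regression line, while post-period points fall systematically below it.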

