Abstract
Background
Understanding how infectious disease transmission varies from person to person, including associations with age and contact behavior, can help design effective control strategies. Within households, transmission may be highly variable because of differing transmission risks by age, household size, and individual contagiousness. Our aim was to disentangle those factors by fitting mathematical models to SARS-CoV-2 household survey and serologic data.
Methods
We surveyed members of 3,381 Utah households from January-April 2021 and performed SARS-CoV-2 antibody testing on all available members. We paired these data with a probabilistic model of household importation and transmission composed of a novel combination of transmission variability and age- and size-structured heterogeneity. We calculated maximum likelihood estimates of mean and variability of household transmission probability between household members in different age groups and different household sizes, simultaneously with importation probability and probabilities of false negative and false positive test results.
Results
12.8% of individual participants, residing in 17.4% of the participating households, showed serologic evidence of prior infection or reported a prior positive test on the survey. Serologically positive individuals in younger age groups were less likely than older adults to have tested positive during their infection according to our survey results. Our model results suggested that adolescents and young adults (ages 13–24) acquired SARS-CoV-2 infection outside the household at a rate substantially higher than younger children and older adults. Our estimate of the household secondary attack rate (HSAR) among adults aged 45 and older exceeded HSARs to and/or from younger age groups at a given household size. We found lower HSAR in households with more members, independent of age differences. The age-specific HSAR patterns we found could not be explained by age-dependent biological susceptibility and transmissibility alone, suggesting that age groups contacted each other at different rates within households.
Conclusions
We disentangled several factors contributing to age-specific infection risk, including non-household exposure, within-household exposure to specific age groups, and household size. Within-household contact rate differences played a significant role in driving household transmission epidemiology. These findings provide nuanced insights for understanding community outbreak patterns and mechanisms of differential infection risk.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12889-024-20308-z.
Keywords: SARS-CoV-2, Serology, Transmission, Household, Age-structure, Infection risk, Mathematical model, Parameter estimation
Background
The spread of an infectious disease through a community can occur in complicated patterns, especially when the person-to-person transmission potential of infected individuals is highly variable [1]. Such variability can occur physiologically, through varying duration, severity, and contagiousness of human infections, as well as socially, through varying social settings and contact behaviors that provide transmission opportunities [2]. These variabilities may be partially explained by correlating transmission risk with observable demographic variables such as age and number of contacts in primary social settings such as households [3]. Understanding the mechanisms and associations along which transmission variabilities arise helps improve the efficiency of infection control via targeted pharmaceutical and social interventions [4].
Transmission mechanisms and variabilities could be studied by directly observing the transmissions and transmission opportunities for each infected person, but such data are difficult and costly to collect. Cross-sectional prevalence data, e.g. from testing serological samples for the presence of antibodies among individuals who have had contact with each other, can be used to indirectly estimate transmission rates and patterns. When such data are paired with mechanistically derived mathematical models, insights can be derived about the source of transmission variabilities using appropriate analytic and computational techniques [5, 6]. In this work, our goal was to use such a model-based approach to study transmission variabilities of SARS-CoV-2, with a focus on age-based associations effects, using household seroprevalence data.
SARS-CoV-2 poses high risk of transmission from person to person in indoor public spaces and households [7], where spread of SARS-CoV-2 has been well documented [8]. Studying transmission at the household level is convenient relative to other indoor locations with more transient populations, and household transmission research can provide illuminating insights into risk factors and mechanisms of transmission. Furthermore, households provide an intergenerational mixing ground with contact across age groups, and both household and age structure can be crucially important for accurate characterization of community-wide transmission [3].
Studying differences in susceptibility, transmissibility, and contact by age has been an important area of research with implications for public health policy for many infectious diseases. During the COVID-19 pandemic, younger individuals have been less likely to experience severe disease but are a potential source of SARS-CoV-2 contagion [9]. Infected children with mild or no symptoms may nevertheless transmit to other more vulnerable contacts, including to highly vulnerable aged adults in multi-generational households [10]. Some SARS-CoV-2 and influenza studies have found that households with more members have lower transmission rates [11–13], and it is unclear the extent to which that observation might be attributed to different age configurations of larger households or other factors. Understanding differences in transmission among and between different age groups, and the mechanisms by which those differences might arise, can shed light on the benefits of age-group targeting of transmission-interruption measures, including vaccination [14].
Serological surveys, in which blood samples from individuals in a population are tested for the presence of antibodies, can be used to infer which individuals had experienced prior infection, including those who had not been diagnosed [15]. When performed over entire households, serological surveys offer one way to assess household transmission: the number of household members testing positive provides information to infer the distribution of the total number of household members infected during household outbreaks that ran their course prior to data collection. Quantitative model-based techniques [16] can be used to extract information from these datasets on the transmissibility of the pathogen under conditions of sustained close contact. When such datasets include metadata on household member demographics, they can help answer questions about the relative susceptibility to or transmissibility of infection across demographics such as age [17].
In this work, we developed a novel mathematical model of household transmission that incorporates transmission variability [18] and heterogeneity over age and household size. A central goal achieved by our model-based findings was to estimate quantitative age differences in household COVID-19 epidemiology using detailed serological data from Utah households and elucidate the extent to which these estimates reveal potential mechanisms of within-household transmission differences by age.
Methods
Data collection from Utah households
Our data was collected via the Utah Health & Economic Recovery Outreach (HERO) Project [19], which recruited participants from households in the state of Utah during several distinct phases in 2020 [20] and 2021 [21]. Members of participating households completed surveys and were subsequently invited to a mobile testing site to contribute samples for testing. Participants contributing the specific subset of the Utah HERO Project data analyzed and modeled for this manuscript were recruited during two different phases of the Utah HERO Project in 2021: “school district testing” and “hotspot testing” (see Additional file 1 for recruitment details). In contacted households, all household members aged 12 years or older were asked to complete an individual survey, adults were asked to complete the individual survey on behalf of children < 12 years of age, and the household was asked to complete an overall household survey. Each household member > 6 months of age was asked to provide a blood sample at a mobile testing site.
The following data from the surveys were retained for our analysis. From the household survey, we used the household size (number of people living in the household). From the individual surveys, we used the date of birth, whether the individual had previously tested positive for COVID-19, the date of a previous positive test, the number of prior COVID-19 vaccine doses received, and the date of any prior vaccine doses. We calculated the age of surveyed participants on the day of serological testing, using the date of birth from the survey. Prior test result data from the survey were classified as prior positive reported (P) or no prior positive reported (N).
We excluded households from our analysis if the number of people living in the house (household size) or the age of any household member were missing. We also excluded households for which the reported size was inconsistent with the individual-level data (i.e., the reported household size was less than the number of individuals contributing data for that household).
Whole blood was collected from participants at mobile testing facilities, via finger prick for participants aged between 6 months and 12 years and via venous puncture for participants aged 12 years or older. Serum specimens from blood collected via venous puncture were analyzed using all three of the following tests: Abbott SARS-CoV-2 IgG assay performed on an Abbott Architect i2000 instrument (Abbott Laboratories), Euroimmun Anti-SARS-CoV-2 ELISA (IgG) assay, and Siemens SARS-CoV-2 IgG (Cov2G) assay. Only the Euroimmun test was used to analyze specimens from finger prick samples.
Methodology and criteria for a positive antibody result were defined according to the manufacturers’ instructions [22–24] and as in prior studies [25, 26]. Antibody test results for each test were classified according to negative (N), positive (P), or not performed / indeterminate (X). Because the Euroimmun and Siemens assays detect anti-SARS-CoV-2 IgG for the S1 domain of viral spike protein (anti-S1), which can be produced by vaccination, participants who reported having received at least one dose of COVID-19 vaccine prior to data collection and tested positive on either assay were automatically labeled as indeterminate (X) for purposes of informing prior infection. The Abbott assay detects anti-SARS-CoV-2 IgG for the nucleocapsid protein (anti-N), which we assumed was not impacted by prior vaccination and that vaccinated patients testing Abbott-positive had tested positive for prior infection [27]. We defined an antibody test result as being “informative” for potential prior infection status when the result was negative or positive for the Abbott test, negative or positive for Euroimmun or Siemens for participants reporting no prior vaccination, or negative for Euroimmun or Siemens for participants reporting any prior vaccination.
Each member of a household retained for analysis was labeled according to their combination of five data points: reported prior positive test (N or P); Abbott test result (N, P, or X); Euroimmun test result (N, P, or X); Siemens test result (N, P, or X). For example, a participant labeled “NPXN” reported no prior positive test, tested positive by Abbott, had an indeterminate Euorimmun test, and tested negative by Siemens. Further labels were added indicating age group and household size, as described in later sections. Then we created a table tallying the number of participants with each label in each household: a column for each unique label that at least one participant received and a row for each household. Each table element is the number of participants in each household with each label. We then tallied the number of unique rows, C, and labeled the sets of numbers in each unique row as vectors . Finally, the entire dataset is represented by the vector , where frequency is the number of households described by the corresponding .
The following sections describe the mathematical model designed to produce a likelihood function to quantify the probability of observing the dataset under a given set of epidemiological assumptions. The de-identified data and codes, written in R version 4.0.3, to produce , calculate and optimize , and produce the results in this manuscript, are publicly available at https://github.com/damontoth/householdTransmission2021.
Household transmission model
First, we define the probability that a single infected person transmits infection to different numbers of people living in the same household. We separate a household’s population into age groups that can differ in both susceptibility to acquisition from infected housemates and in transmissibility to susceptible housemates from each group. For simplicity, we limited the number of transmission age groups to three, named U, V, and W. We use the notation to denote the probability that a single infected person with susceptible housemates in each group directly transmits infection to of them, respectively.
To derive our formula for , we assume that the probability of transmission to a u-group person is a random variable taking the form , where is a gamma-distributed random variable with shape and rate . The gamma distribution models the mean and variability of infected individuals’ transmissibility to housemates, where the parameter is equivalent to the dispersion parameter often used to model transmission variability and superspreading. The value of can be considered the product of the total dose of exposure experienced by an infected person’s susceptible housemate and the per-dose probability of infection, as in the exponential dose-response model. Then, the probabilities of transmission to a person in group V and W is and , where the coefficients and alter the probability of infection relative to a person in group U, from a given infected housemate.
From the above assumptions, the following formula for applies (see Additional file 1 for derivation):
Because we also model differential transmissibility from each group, we use three sets of parameters governing transmission: , , and , which are the parameters used when a person from group U, V, and W is the potential transmitter, respectively. Using those parameters, we can also define the directional household secondary attack rates (HSARs) , the mean transmission probability from an infected person from group i to a susceptible person from group j in the same household:
For clarity, we report our estimates for the set of transmission parameters using the above nine directional attack rates, plus the dispersion parameters . The values of and can then be calculated by inverting the above equations.
With the above function defined for transmissions from one infectious individual in a household, we now derive formulae for multiple potential transmitters. First, let be the probability that, in a household with infected members from each group, those infected persons transmit infection directly to a total of of susceptible housemates from each group. First, we define the trivial case where : no infected household members to transmit:
Next, we have already defined the cases where (exactly one infected household member):
Then we iteratively calculate for each ( in increasing sequence of :
With in hand, we then calculate : the probability that already infected (u-group, v-group, w-group) household members lead to a total of transmissions to susceptible household members from each respective group. In other words, is the probability that the vector of final household outbreak sizes of each group is given that were initially infected and were initially susceptible. The probability accounts for transmissions directly from initially infected members and in any subsequent transmission generations.
We calculate values of iteratively, starting with trivial case of no initial infections in the household:
Then, starting with and then for increasing in sequence, we iteratively calculate:
Total household infection size model
The probability of infections within a household also depends on the likelihood of members acquiring infection from non-household members (called “community” acquisitions). We calculated those probabilities for each household member based on characteristics that could extend beyond membership in the U, V, and W transmission groups described in the last section.
Let be the number of household members in each of the three transmission groups. Let be the infection indicators for each household member, where the values equal 1 if infected and 0 if uninfected. Let be indicators for whether each household member belongs to transmission group U (1 if yes, 0 if no), and similarly and are indicators for membership in groups V and W. Then we define .
Let be the community acquisition probability of each household member. Now we define the likelihood for household members indicated by the values ending up infected by either community acquisition or household transmission.
The values . The values depend on both the household transmission parameters defined in the prior section and the community acquisition probabilities.
Data accuracy model
Applying the formula from the prior section directly to our data would be problematic because the true infection history of each household member is not known with certainty. Our data include up to 4 sources of COVID-19 test information by which prior infection status of individual household members can be probabilistically inferred: results from up to 3 different antibody tests (Abbot, Siemens, and Euroimmun) and results from surveys in which participants could report results of a prior test.
Antibody test results are subject to imperfect sensitivity and specificity due to false negatives and false positives. To account for these, we added 6 additional parameters to our likelihood model: , , and are the antibody test sensitivities for Abbot, Siemens, and Euroimmun, respectively, and , , and are the specificities. Prior test results for SARS-CoV-2 reported on the survey also do not perfectly identify those with prior infections. Among people who had a prior infection, a portion of them never tested positive because they did not receive a test while their infection was identifiable or because they falsely tested negative. Because the degree of this under-ascertainment has been known to vary widely by age, we modeled it using a separate probability for each household member: are the probabilities that each of the n surveyed persons in a household reported receiving a positive test for the virus had they been infected. The parameter is the probability that any surveyed person with no prior infection did not report receiving a positive test.
For each person in a household of size , we considered the overall combination of that person’s test results to calculate and , the probability of that test result combination occurring if the person had been infected and never infected, respectively. The following provide illustrated examples.
Example 1: result combination “NNNN”: no prior positive reported on survey, and negative by all 3 antibody tests:
Example 2: result combination “NNPN”: no prior positive reported on survey, negative by Abbot, positive by Euroimmun, and negative by Siemens:
Example 3: result combination “PNXN”: prior positive reported on survey, negative by Abbot, no or indeterminate Euroimmun result, and negative by Siemens:
With the and values for each household member in hand, the following formula specifies the overall likelihood of observing the entire household’s set of test results together:
Age- and size-based parameter values
The formulas in the previous sections included three different components that we assumed could depend on the characteristics of household member . First, the “transmission group” specified by ; second, the community acquisition probability ; and third, the prior infection ascertainment probability . We modeled each of these dependencies based on the individual’s age at the time of data collection. Specifically, we categorized ages using combinations of age ranges for terms defined by Medical Subject Headings (MeSH) [28] (Table 1). Because we had limited data from the youngest individual age groups, we combined the newborn, infant, and preschool child groups into one category (aged 0 to 5 years). Other MeSH-defined age categories and sub-categories were retained separately, for a total of 8 categories.
Table 1.
Age categories and age-specific model parameters
| MeSH age category names | Age range (years) | Community acquisition parameter | Prior infection ascertainment parameter |
|---|---|---|---|
| Newborn/Infant/Preschool Child | 0–5 | ||
| Child | 6–12 | ||
| Adolescent | 13–18 | ||
| Young adult | 19–24 | ||
| Adult (excluding young adult) | 25–44 | ||
| Middle aged | 45–64 | ||
| Aged (excluding 80 and over) | 65–79 | ||
| Aged, 80 and over | 80+ |
MeSH Medical Subject Headings
The community acquisition parameter for each individual in a household was set to one of the 8 values through according to age group, and similarly each individual’s prior infection ascertainment parameter was set to a value among through . For the transmission groupings, because we limited our model to three groups, we tested different ways to combine our 8 MeSH age groups from Table 1 into three groups of adjacent MeSH age groups. We designated transmission group “U” as the oldest age group, “V” as the middle age group, and “W” as the youngest.
We also allowed within-household transmission to vary by total household size (number of household members) using two additional parameters: and scale the probability of acquiring infection from an infected housemate of a person living in a size 3–4 or size 5-or-more household, respectively, relative to a person living in a size-2 household. These parameters multiply each of the parameters that model age differences, so that each age-to-age transmission probability is assumed to scale by household size in the same way. Thus, we calculate three sets of T values quantifying household transmission chain probabilities, for a given set of epidemiological parameters:
In our full model we sought to use our data to simultaneously estimate the 22 epidemiological parameters
and the 15 data accuracy parameters
using a maximum likelihood estimate (MLE).
The log likelihood of the dataset described above with variable set is
Likelihood optimization and uncertainty
We maximized the log likelihood over the unknown parameters using the observations for each household, to produce the MLE . The log likelihood maximization was performed using an iterative scheme in R, in which R’s native “optim” function was applied alternately to the set of epidemiological parameters and then to the set of data accuracy parameters , holding the other set constant at the previous solution and continuing back and forth until consecutive maximal log likelihood values achieved for the estimates converged to within 0.01. While it is possible to supply the entire set of parameters to the optimization function in one step, in early testing we found that the function performed better when the number of parameters to be optimized simultaneously was reduced for each function call.
We first tested models with simpler assumptions for the dispersion parameters , , and . The value 1 is a special case representing exponentially-distributed (special case of gamma-distributed) transmissibility by an infected person. The special case represents no variance in transmissibility. We tested models assuming and against one that optimized those values along with other parameters; if one of the simpler models is favored, we could move forward with other model comparisons described next using less computationally costly formulae.
We also tested different age-group choices for the household transmission groups U, V, and W described above, and we selected the grouping that produced the MLE with the highest likelihood. We refer to this result as the MLE of the “full model” to which alternate, simpler models were compared, to test the extent to which the full model MLE might suffer from overfitting.
For the full model, we derived approximate confidence interval boundaries for an individual parameter using the likelihood ratio test, using the statistic , where consists of freely varying and the other elements of held at their optimal value. We defined a 95% confidence interval boundary where produces a value for this statistic equal to the 95th percentile of the chi-squared distribution with 1 degree of freedom.
Alternate model comparisons
We compared the MLE of the full model against simpler models with fewer free parameters. To calculate P-values at which a simpler model could be rejected in favor of one with more free parameters and of which the simpler model is a special case, we used the chi-squared distribution with degrees of freedom equal to difference in the number of free parameters between the two models. We quantified and compared the parsimony of different models using the Akaike information criterion (AIC), which aims to balance goodness of fit with model simplicity. A lower AIC score is favored for model selection.
For the set of average HSARs, which consists of 9 free age-group-to-age-group values in the full model, we tested alternate models that use fewer parameters via simplifying assumptions. We tested an “equal contact” model, which assumes that the extent to which household members contact each other is independent of age, but that susceptibility and/or transmissibility per contact can differ by age. The equal contact model corresponds to the assumption that and . Thus, in this model the 9 HSARs are governed by 5 free parameters: 2 relative susceptibility () parameters and 3 average transmissibility parameters.
Next, we tested a “reciprocal transmission” model, which assumes that the average household transmission probability between an infected and susceptible person from two different age groups is the same regardless of which person is infected, i.e., , , and . With those 3 constraints, the reciprocal transmission model is governed by 6 free parameters to determine the 9 HSARs.
To seek a maximally parsimonious model, we tested an additional series of models applying a variety of simplifying assumptions, as defined by optimal (minimal) AIC. These included assumptions of age-independence of transmission to and/or from each of the individual transmission age groups. For example, assuming enforces transmission from age group U to be equally likely to a housemate of any age. Likewise, assuming enforces acquisition of infection by age group U to be equally likely from an infected housemate of any age, and assuming enforces an equal HSAR between age group U and all ages in both directions. We repeated the MLE procedure for each of these models and each age group, as well as combinations of them for multiple age groups in cases where multiple sets of assumptions produced favorable fits by the AIC. We also tested models that collapse the different age-group-based community acquisition parameters through and prior infection ascertainment parameters through into fewer age groupings, as well as models that assume no effect of household size on transmission rates. We combined any of the above simplifications that were favored by the AIC criteria to arrive at our maximally parsimonious model.
We summarized the components of our model and parameter estimation procedure against the data (Fig. 1), including a mapping to Tables where results from different versions of the model can be found.
Fig. 1.
Diagram summary of model components and parameter estimation procedure. Individual data from study participants are used to iteratively optimize two sets of parameters using MLE (maximum likelihood estimation). Various model components are used to determine the exact definition and number of the parameters to be estimated. The component model boxes specify the Tables where MLE results for each version of the model can be found in the Results section or in Additional file 2
Finally, we analyzed implications of our results for the most parsimonious model. We explored potential mechanisms of household transmission difference by age that are consistent with estimated age differences in HSAR, including age-specific differences in household contact rates and per-contact transmissibility or susceptibility. We also calculated the overall age-specific infection risk for each group implied by the model results and the age configurations of households in the data to provide a summary of the estimated rate of infection from household and non-household sources for each age group.
Results
Data summary
Of the 96,547 total households contacted across both the school district and hotspot testing phases, we compiled serological test data and associated survey data from 3,381 (3.5%) of the households and 7,707 individuals (Table A in Additional file 2).
Of the 3,381 total households, we retained 2,514 (74.4%) for our analysis. The 867 excluded households were removed because of unknown household size (n = 55), a reported household size inconsistent with individual data (n = 57), or incomplete age data obtained for the household members (n = 755). In the 2,514 retained households, there were 5,855 (2.33 per household) household members. The 755 excluded households of known size averaged 3.67 members per household.
Among the 5,855 retained participants, 1,337 (22.8%) reported having received at least one dose of a COVID-19 vaccine prior to serological data collection, and 727 (12.4%) reported having received two vaccine doses. For those reporting a plausible date for their first vaccine dose (n = 1274), the reported date occurred a median of 32.5 days prior to serological sample collection (range 0–112 days). For those reporting plausible dates for two vaccine doses (n = 654), the reported date of the second dose occurred a median of 29 days prior to data collection (range 0–91 days). Participants reporting prior vaccination tested antibody positive by Euroimmun and Siemens tests at highly elevated rates compared to those not reporting prior vaccination (Table B in Additional file 2), consistent with the presence of vaccine-induced antibodies; these positive results were labeled non-informative (‘X’) when calculating prior SARS-CoV-2 infection likelihood in our model.
5,626 (96.1%) of the retained participants provided a blood sample that produced at least one informative antibody test result, and 2,848 (48.6%) of the retained participants received informative results for all three antibody tests. Of those, 2,412 (84.7%) were all three negative, 230 (8.1%) were all three positive, and 206 (7.2%) were a mixture of positive and negative results. The most common conflicting result combinations among the latter group were Abbott-negative conflicting with Euroimmun-positive and Siemens-positive results (n = 99) and Euroimmun-positive conflicting with Abbott-negative and Siemens-negative results (n = 63). We tallied other result combinations, including indeterminate (non-informative) results marked by ‘X’ as well as prior-positive test reporting for participants in each antibody test result group (Table C in Additional file 2).
In all, 748 (12.8%) of the 5,855 participants were positive on at least one informative antibody test or reported prior test. These 748 participants lived in 438 of the 2514 households, thus 17.4% of the participating households had at least one informative positive result. By comparison, among the 867 excluded households, 21.7% had at least one informative positive participant, and 16.3% of the individuals in the excluded households produced at least one informative positive result.
Among those 438 households with at least 1 informative positive participant, 246 had exactly 1 and 192 had more than 1, ranging from 2 to 8. A crude HSAR estimate, i.e., assuming exactly 1 importation in each of the 438 households and calculating the fraction of housemates of an importer who were also informative positive, was 33.9%. Our likelihood model relaxed those assumptions in several ways, including accounting for the probability of multiple household importations, false negative and false positive tests, and other features, as detailed in the following sections.
Maximum likelihood estimates
Preliminary model comparisons: transmission dispersion and age groupings
We compared models that produced estimates of each individual transmission dispersion parameter to simpler models of transmission variability. First, the simpler model with exponentially distributed transmissibility ( 1) produced similar maximal likelihood to the full model and a superior (lower) AIC, while the model with no variability in transmissibility () produced inferior maximal likelihood (Table D in Additional file 2). For simplicity we fixed the most parsimonious 1 assumption for all following models.
We also ran our MLE procedure for each possible configuration of the (U, V, W) transmission groups (Table E in Additional file 2). The best configuration producing the maximal log-likelihood occurred when the oldest group, reference group U, included middle aged and aged individuals (age 45+), the middle group V included adults excluding young adults (age 25–44), and the youngest group W included children, adolescents, and young adults (age 0–24). We used those three age ranges for the three transmission age groups in all following models.
Full model results
The model with the assumptions defined in the prior sub-section and no other simplifying assumptions is called our “full model” and includes 34 free parameters. We derived the MLE and confidence intervals for all 34 parameters (Table 2).
Table 2.
Full model MLE results with confidence intervals
| Parameter | MLE (95% CI) |
|---|---|
| Mean community acquisition probability | |
| – infant/pre-school 0–5 ( | 3.2% (0.6 – 6.8%) |
| – child 6–12 ( | 2.0% (0.6 – 4.1%) |
| – adolescent 13–18 ( | 15.1% (11.4 – 19.4%) |
| – young adult 19–24 ( | 12.6% (8.8 – 17.2%) |
| – adult 25–44 ( | 6.7% (5.5 – 8.0%) |
| – middled aged 45–64 ( | 6.7% (5.3 – 8.4%) |
| – aged 65–79 ( | 5.2% (3.7 – 7.1%) |
| – aged, 80 and over ( | 0.0% (0.0 – 3.4%) |
| HSAR to age 45 + adults in 2-person households | |
| – From middle aged and older adults aged 45+ () | 68% (56 – 77%) |
| – From adults aged 25–44 () | 45% (11 – 66%) |
| – From children, adolescents, and young adults aged 0–24 () | 42% (23 – 57%) |
| HSAR to age 25–44 adults in 2-person households | |
| – From middle aged and older adults aged 45+ () | 52% (22 – 73%) |
| – From adults aged 25–44 () | 44% (29 – 58%) |
| – From children, adolescents, and young adults aged 0–24 () | 63% (50 – 74%) |
| HSAR to age 0–24 individuals in 2-person households | |
| – From middle aged and older adults aged 45+ () | 34% (11 – 53%) |
| – From adults aged 25–44 () | 47% (31 – 60%) |
| – From children, adolescents, and young adults aged 0–24 () | 45% (33 – 56%) |
| Altered transmission coefficient to members of larger households | |
| – with 3 or 4 members () | 0.43 (0.33–0.57) |
| – with 5 or more members () | 0.22 (0.16–0.29) |
| Probability that prior infected person reported receiving a prior positive test | |
| – infant / pre-school, 0–5 () | 16% (6 – 32%) |
| – child, 6–12 () | 36% (23 – 49%) |
| – adolescent, 13–18 () | 53% (42 – 63%) |
| – young adult 19–24 () | 62% (47 – 75%) |
| – adult 25–44 () | 59% (52 – 65%) |
| – middled aged 45–64 () | 73% (65 – 80%) |
| – aged 65–79 () | 73% (61 – 83%) |
| – aged, 80 and over () | 100% (38 – 100%) |
| Probability that prior infected person (any age) tested positive for antibodies | |
| – Abbot () | 72% (68 – 76%) |
| – Euroimmun () | 99.2% (97.5 – 99.98%) |
| – Siemens () | 87% (83 – 90%) |
| Probability that uninfected person did not report receiving a prior positive test | |
| – any age () | 99.2% (98.9 – 99.4%) |
| Probability that uninfected person (any age) tested negative for antibodies | |
| – Abbot () | 99.7% (99.4 – 99.8%) |
| – Euroimmun () | 98.1% (97.6 – 98.6%) |
| – Siemens () | 99.85% (99.65 – 99.95%) |
MLE Maximum likelihood estimate, 95% CI Confidence interval, derived from comparing the likelihood ratio test statistic to the chi-squared distribution with one degree of freedom
For community acquisition probability, the highest estimates occurred for adolescents: 15.1% (11.4 – 19.4%) and young adults: 12.6% (8.8 – 17.2%). The confidence intervals for those estimates did not overlap with those of any other age group, younger or older. The community acquisition probability for children aged 6–12: 2.0% (0.1 – 4.1%) was substantially lower than the adjacent age group of adolescents, and was also lower than the result for adults aged 25–44: 6.7% (5.5 – 8.0%) and middle-aged adults: 6.7% (5.3 – 8.4%). The result for the oldest age group, 80 and over: 0.0% (0.0 – 3.4%), was substantially lower than any other adult or adolescent age group.
Our results also exhibit age differences in directional household transmission probabilities (HSARs) at given household sizes. The highest estimated average HSAR among all the 9 possible directional age group pairings in this model was from middle-aged and older adults (age 45+) to other middle-aged and older adults: 68% (56 – 77%) in size-2 households. The lowest estimate for size-2 households was from middle-aged and older adults to children, adolescents, and young adults (age 0–24): 34% (11 – 53%). We also found significantly smaller acquisition probability for those living in larger households versus smaller ones: 0.43 (0.33–0.57), and 0.22 (0.16–0.29). This means that our model estimates that each age-to-age HSAR estimate for size-2 households (Table 2) is lower for size 3–4 households and lower still for size 5+ households. For example, our MLE HSAR estimates among age 45+ adults living together in the three different household size groups are 68%, 43%, and 31%.
We found age-based differences in the probability that individuals with prior infections tested positive during their infection (as reported on our survey). Infants and preschoolers (age 0–5) had the lowest probability estimate: 16% (6 – 33%), with a confidence interval that overlapped only with that of children aged 6–12: 36% (23 – 49%), reflective of lower infection ascertainment for young children. The MLE for each other age group was above 50%, with middle aged and older individuals having the highest estimates at 73% or higher. The false-positive rate estimate for survey-reported prior test results among individuals with no prior infection was 0.8%: 99.2% (98.9 – 99.4%).
We also estimated sensitivity and specificity parameters for each of the antibody tests. Our sensitivity estimates were substantially different for the three tests, with Euroimmun highest at 99.2% (97.5 – 99.98%) followed by Siemens, 87% (83 – 90%), and Abbott, 72% (68 – 76%). Our sensitivity estimate for Euroimmun was higher than what the test manufacturer found from samples of individuals who had tested PCR positive 14+ days earlier. By contrast, our sensitivity estimates for Siemens and Abbott were lower than the respective test manufacturer estimates.
For antibody test specificity, our results produced the lowest estimate for Euroimmun, 98.1% (97.5 – 98.6%) and substantially higher estimates for Abbott, 99.7% (99.4 – 99.8%) and Siemens, 99.85% (99.65 – 99.95%). Those specificity intervals overlap those obtained by each test manufacturer from testing stored, presumed negative samples collected prior to the COVID-19 pandemic: 99.0% (98.4 – 99.4%) for Euroimmun [22], 99.6% (99.1 – 99.9%) for Abbott [23], and 99.89% (99.66 – 99.98%) for Siemens [24].
Equal contact and reciprocal transmission models
The “equal contact” model, in which the assumptions and are enforced, produces a MLE result (Table 3) in which the oldest transmission age group has the highest transmissibility, the middle transmission age group has the lowest transmissibility and the highest susceptibility, and the youngest age group has the lowest susceptibility. However, the maximal likelihood of this estimate is much poorer than that of the full model, such that the equal contact model can be rejected by the likelihood ratio test (P = 0.01) and has a poorer AIC score (5.0 higher) despite using 3 fewer parameters than the full model. By contrast, the “reciprocal transmission” model, in which the assumptions , , and are enforced, cannot be rejected by the likelihood ratio test and produces a superior AIC score (4.2 lower) to the full model (Table 3) due to using 3 fewer parameters while achieving a similar likelihood at the MLE.
Table 3.
Comparison of equal contact and reciprocal transmission models to full model
| Log likelihood | P value | Par | ΔAIC | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full model | 3.2% | 45% | 63% | 42% | 0.43 | 16% | 99.2% | −3021.0 | Ref | 34 | Ref | |
| 2.0% | 47% | 44% | 45% | 0.22 | 36% | 72% | 99.7% | |||||
| 15.1% | 34% | 52% | 68% | 53% | 99.2% | 98.1% | ||||||
| 12.6% | 62% | 87% | 99.85% | |||||||||
| 6.7% | 59% | |||||||||||
| 6.7% | 73% | |||||||||||
| 5.2% | 73% | |||||||||||
| 0.0% | 100% | |||||||||||
| Equal contact model | 3.6% | 44% | 57% | 53% | 0.42 | 16% | 99.2% | −3026.5 | 0.03 | 30 | 3.0 | |
| 2.1% | 37% | 50% | 46% | 0.22 | 36% | 72% | 99.7% | |||||
| 15.0% | 50% | 62% | 58% | 52% | 99.2% | 98.1% | ||||||
| 12.6% | 62% | 87% | 99.85% | |||||||||
| 6.6% | 58% | |||||||||||
| 6.8% | 73% | |||||||||||
| 5.3% | 73% | |||||||||||
| 0.0% | 100% | |||||||||||
| Reciprocal transmission model | 2.7% | 39% | 54% | 37% | 0.45 | 16% | 99.1% | −3021.8 | 0.62 | 31 | −4.2 | |
| 1.7% | 54% | 45% | 49% | 0.24 | 35% | 72% | 99.7% | |||||
| 14.5% | 37% | 49% | 67% | 53% | 99.2% | 98.1% | ||||||
| 12.4% | 62% | 87% | 99.85% | |||||||||
| 7.0% | 59% | |||||||||||
| 6.8% | 73% | |||||||||||
| 5.2% | 73% | |||||||||||
| 0.0% | 100% |
The equal contact model estimates 1.17 and 0.73 are used when calculating the 9 HSAR values shown. The reciprocal transmission model enforces,, and. P values are for rejection of the model in favor of the full model, by the likelihood ratio test with degrees of freedom equal to the difference in the number of parameters estimated (Par). ΔAIC = difference in Akaike information criterion value versus the reference (Ref) model; a negative ΔAIC means the model’s AIC score was lower and thus favored over the reference model
Most parsimonious model
We tested a series of models using simplifying assumptions determining the age-specific HSAR values, using fewer free parameters to determine the 9 values. The models enforcing equal HSAR to and/or from age group V (age 25–44) produced better (lower) AIC scores than those that did not (Table F in Additional file 2). When combining this with other favored assumptions, the best AIC score was produced by a 5-parameter model that used 3 HSAR values to parameterize the 3-by-3 matrix of directional HSAR by age group for a particular household size group, plus the 2 size-effect parameters and (Table F in Additional file 2). This model retains the assumption of equal AR between age group V (age 25–44) and all age groups in either direction, and also equal HSAR between age group W (age 0–24) and age groups W and U (age 45+) in either direction, which leaves the HSAR among age group U as the third independent value for a particular household size group.
We also tested simpler models that used two rather than three transmission age groups by consolidating adjacent groups of our three-group model (Table F in Additional file 2). The most parsimonious among those two-group models used ages 0–44 and 45 + as the two groups, in addition to assuming reciprocal transmission – it produced a likelihood inferior to the best three-age-group model just described, with each estimating 5 transmission parameters. Our testing also included even simpler models that used fewer than 5 transmission parameters (Table G in Additional file 2), which could each be rejected in favor of the most parsimonious model (P ≤ 0.03). Those rejected models include models with no age-difference assumptions on transmission (household size being the only correlate with HSAR) or no household size effect ( 1). The simplest model with no age- or size-structure produced a single HSAR estimate of 27% applying to all ages and household sizes, but that result could be strongly rejected (P < 0.0001).
We tested whether adjacent age groups could be consolidated to use fewer free parameters for community acquisition and case ascertainment while achieving comparable model fit. Guided by comparing confidence intervals from the full model MLE results, we tested a model that combined the age groups for community acquisition to reduce the number of groups from 8 to 4, where the 4 groups are ages 0–12 (), 13–24 (), 25–79 (), and 80+ (). For case ascertainment, we reduced the number of groups from 8 to 3: ages 0–12 (), 13–44 (), and 45+ (). Compared to the full model, the MLE for the model with those new groupings for both community acquisition and case ascertainment could not be rejected by the likelihood ratio test (P = 0.46) and produced superior (lower) AIC score, 9.2 less than the full model (Table H in Additional file 2). Models using alternate groupings using the same or fewer numbers of age groups produced lower maximal likelihood and inferior AIC scores.
The model combining all the simplifying assumptions producing the best AIC scores in the above comparisons is the most parsimonious overall model we found, producing the best overall AIC score. The most parsimonious model includes 3 household transmission parameters for age-specific HSAR, 2 for household size-adjustment coefficients for HSAR, 4 age-specific importation parameters, 3 age-specific prior case ascertainment parameters, and 7 age-independent sensitivity and specificity parameters (Table 4). The HSAR values shown apply only between members of 2-person households: 66% HSAR between age 45 + individuals residing together, 47% between age 25–44 individuals and anyone else, and 35% between remaining pairings involving age 0–24 individuals with each other or with age 45 + individuals. For households with 3 or 4 residents, the model adjusts those HSAR estimates down to 49%, 31%, and 21%, respectively, and for households with 5 or more residents, the respective HSAR estimates are 35%, 21%, and 13%.
Table 4.
Most parsimonious model with confidence intervals
| Parameter | MLE (95% CI) |
|---|---|
| Mean community acquisition probability | |
| – infant / pre-school / child 0–12 ( | 2.1% (0.8 – 3.8%) |
| – adolescent / young adult 13–24 ( | 13.7% (10.9 – 16.7%) |
| – adult / middle aged / aged 25–79 ( | 6.6% (5.8 – 7.4%) |
| – aged, 80 and over ( | 0.0% (0.0 – 3.4%) |
| Mean HSAR in 2-person households | |
| – Among age 45+ () | 66% (54 – 75%) |
| – From or to age 25–44 () | 47% (41 – 54%) |
| – Among age 0–24 and between age 0–24 and age 45+ () | 35% (27 – 42%) |
| Altered exposure / susceptibility of members of larger households | |
| – with 3 or 4 members () | 0.51 (0.39–0.67) |
| – with 5 or more members () | 0.29 (0.21–0.38) |
| Probability that prior infected person reported receiving a prior positive test | |
| – infant / pre-school / child, 0–12 () | 28% (19 – 39%) |
| – adolescent / young adult / adult, 13–44 () | 57% (52 – 63%) |
| – middled aged and older, 45+ () | 73% (67 – 79%) |
| Probability that prior infected person (any age) tested positive for antibodies: | |
| – Abbot () | 72% (68 – 76%) |
| – Euroimmun () | 99.3% (97.6 – 100%) |
| – Siemens () | 87% (83 – 90%) |
| Probability that uninfected person did not report receiving a prior positive test | |
| – any age () | 99.2% (98.9 – 99.4%) |
| Probability that uninfected person (any age) tested negative for antibodies | |
| – Abbot () | 99.7% (99.4 – 99.8%) |
| – Euroimmun () | 98.1% (97.6 – 98.6%) |
| – Siemens () | 99.85% (99.65 – 99.95%) |
MLE Maximum likelihood estimate, 95% CI Confidence interval, derived from comparing the likelihood ratio test statistic to the chi-squared distribution with one degree of freedom
Mechanisms of household transmission differences
The three different age-specific HSARs at a given household size estimated by the most parsimonious model could be explained by a combination of age-based differences in transmissibility, susceptibility, and the amount of household contact. We cannot uniquely estimate the age differences for each of those three mechanisms simultaneously, but we can use our HSAR estimates to constrain their values and explore possible combinations that are consistent with our findings (Fig. 2). We previously showed that assuming an equal amount of household contact between all age groups is inconsistent with our results, as age differences in transmissibility and susceptibility alone cannot reproduce the attack rate matrix. Thus, all depicted possibilities exhibit age-based contact differences (Fig. 2). However, uncertainty remains about the potential role of age-based transmissibility and susceptibility differences in combination with different magnitudes of contact differences.
Fig. 2.
Possible mechanisms of transmission differences consistent with model estimates. W: age group 0–24; V: age group 25–44; U: age group 45+; labels ‘XY’ in relative-amount-of-contact figures refer to the amount of contact between age groups X and Y when X is infected and Y is susceptible. A Relative amount of contact when assuming no age difference in susceptibility nor transmissibility; (B) results when assuming equal relative age differences in both susceptibility and transmissibility and no age differences in within-age-group contact; (C) results when assuming no age difference in transmissibility nor in within-age-group contact; (D) results when assuming no age differences in susceptibility nor in within-age-group contact
One possibility consistent with our results is that transmissibility and susceptibility per contact are independent of age, and that attack rate differences are entirely explained by contact differences (Fig. 2A). With this scenario, we estimate the amount of contact between age 25–44 adults and their housemates of any age would have to be about 47% the amount that age 45 + adults contact each other. The amount of contact between age 0–24 and age 0–24 or 45 + would be about 28% the amount of contact within the over-45 age group.
Another possibility is that within-age-group contacts are the same, and the increase of within-group attack rates is explained by both susceptibility and transmissibility increasing with age by equal proportions (Fig. 2B). With this scenario, we estimate that susceptibility and transmissibility of those aged 25–44 and 0–24 are about 68% and 53% less than those aged 45+, respectively. This scenario also requires that the highest amount of contact occurred between age 25–44 and age 0–24 individuals.
It is also possible that age groups have equal transmissibility but different susceptibility (Fig. 2C) or vice-versa (Fig. 2D). However, those scenarios require that contact behavior depends on the age group of the infected person in an infected-susceptible pairing. For example, if aged 25–44 adults are more susceptible than children but equally transmissible, the amount of household contact in a potential transmission scenario between adults and children must be higher when adults were infected and lower when children were infected.
The depicted scenarios illustrate certain boundary cases and are not exhaustive of all possibilities. We derived a full mathematical description of the possible mechanistic parameter combinations consistent with our MLE results (Additional file 1).
Infection risk by age group
Estimates from the most parsimonious model exhibit notable age differences in the overall risk of infection prior to data collection, as well as in relative contribution of non-household and household sources (Fig. 3). Members of the youngest (0–12 years) and oldest (80 + years) age groups experienced higher probability of within-household acquisition compared to their risk of acquiring infection from non-household sources. Those in the intermediate age groups experienced most of their acquisition from non-household sources, but within-household sources still contributed between 23% and 41% of their overall infection probability.
Fig. 3.
Estimated infection probability of individuals from five age groups from non-household (black) and within-household (grey) sources. Estimates were derived using parameters from the most parsimonious model (Table 4)
Children and adolescents were estimated to have had the highest absolute probability of acquiring infection within the household (6–7%), despite being members of the transmission age group with the lowest HSAR estimates. This was because they lived in larger households than an average adult and were more likely to be exposed to an introduction. Relative to other age groups, adolescents experienced high acquisition risk from both non-household and within-household sources.
The age 25–44 and 45–79 groups experienced similar overall within-household acquisition risk despite the differences in their transmission probabilities; the higher HSAR of age 45 + individuals among themselves was balanced out by their lower rate of acquisition from the youngest age groups. Finally, the age 80 + age group were estimated to have negligible non-household acquisition risk and the lowest within-household risk, the latter due to their living with individuals in the younger groups less frequently.
Discussion
Our novel mathematical model of household infection, transmission, and test data accuracy can simultaneously incorporate multiple highly important features of SARS-CoV-2: variability in transmissibility of infected people and heterogeneity in non-household acquisition, case ascertainment, and household transmission across age groups and household sizes. Our model could produce nuanced estimates for household transmission dynamics from data and provide novel insights about potential mechanisms of disparate infection rates across households of different sizes and age combinations.
Our findings show strong evidence that adolescents and young adults aged 13–24 were more likely than younger children and older adults to acquire SARS-CoV-2 infection outside the household prior to data collection in early 2021 in Utah. Children aged 12 and younger were less likely to acquire outside the household than any adult group other than those aged 80 and over. We found little evidence of any acquisitions by the age 80 + group from outside the household, suggesting that the primary risk to these elderly individuals living in households may have been transmission from their housemates. Our data did not include elderly residing in congregate living facilities such as nursing homes, where residents likely experienced a much different risk profile compared to aged individuals covered by our sampling of households [29].
The results also suggest that there were differences in household transmission rates across age group pairings. The highest age-specific HSAR for a given household size was from middle-aged and older adults (age 45+) to other middle-aged and older adults. The next-highest HSAR estimate was for adults aged 25–44, from and to all age groups. The lowest HSAR estimate was for those younger than 25 among each other and to and from those aged 45 and older. That age pattern of attack rates was inconsistent with a model that assumes an age-independent rate of contact among infected and susceptible pairs of housemates. With that equal-contact model, the age group most susceptible to acquiring infection for a given amount of infectious contact would experience the highest attack rate from all three age groups. However, our findings showed that, while the 45-and-older group experienced the highest attack rate from age 45 + housemates, they experienced a lower attack rate from the under-25 group compared to that experienced by the 25–44 group. Similarly, while infected 45-and-older individuals exhibited the highest attack rate to other 45-and-older housemates, they transmitted at a lower rate to under-25 housemates compared to the corresponding attack rate from the 25–44 group. Therefore, our findings suggest that contact within households was not uniform across age groups.
Our finding that the attack rate estimates were similar in both directions for each given age-group pairing is consistent with a hypothesis that there are not substantial age differences in susceptibility or transmissibility for a given rate of contact. For example, if 45-and-older individuals were substantially more susceptible to acquiring infection from a given contact compared to those under 25, one might expect the under-25 to over-45 attack rate to be higher than the over-45 to under-25 attack rate. However, it is possible that an equal attack rate in both directions could be explained by multiple mechanistic differences that cancel each other out, such as age-dependent susceptibility, transmissibility, or contact behavior changes during infection.
Additional data on age-structured, transmission-relevant contact behavior in households would help further constrain the possible mechanisms driving the age differences in transmission we found. For example, if data showed that contact between younger parents and younger children (i.e. between “V” and “W” groups in our model) was more substantial than contact between other age pairings within households, and especially if that contact persisted during illness because of the necessities of childcare [30], then our findings would suggest stronger evidence that younger age groups had lower susceptibility and/or transmissibility per contact compared to older age groups (Fig. 2 panels B-D).
We also found lower HSARs for larger household sizes, even when controlling for our age difference findings. This finding is consistent with other household transmission studies [11–13] that favored models with transmission rates scaled down with increasing household size, e.g. via frequency-dependent-like transmission assumptions. The precise mechanisms by which transmission probabilities for a given age-to-age pairing might decrease with increasing household size are unclear. There could be contact rate differences among members of larger vs. smaller families [12], or risk differences due to domicile size and layout or demographic factors other than age that correlate with household size. Differences in precaution behavior to prevent transmission might also play a role, particularly in later generations of infection after a given household introduction, which would be a larger factor for more populous households where multiple rounds of transmission are possible.
In combination, our findings for age-dependency and household size-dependency of household transmission result in dramatically HSAR depending on the scenario. Our model’s HSAR estimates range from 66% for middle-aged or older adults living alone together, down to 13% among age 0–24 individuals or between age 0–24 and 45 + individuals living together in households of size 5 or more. This finding suggests that standard estimates of a single averaged HSAR can be highly misleading if used to judge the risk of transmission in all households.
Our estimates quantifying transmission variability are consistent with our prior finding that the low-variability binomial transmission assumption traditionally used in household models can be rejected in favor of models incorporating higher variability (lower value of a dispersion parameter) for individual transmissibility [18]. Notably, our model factors out potential sources of variability that could correlate with age and/or household size. Yet we still found that incorporating additional variability for individual transmissibility within members of the same age/size groups was warranted, suggesting that other sources of variability beyond those determined by age and household size are important to represent [2].
The dispersion parameter, k, that we used to quantify transmission variability is analogous to the dispersion parameter k commonly used in negative binomial transmission models, in that it derives from the shape parameter of gamma-distributed individual transmissibility [1]. However, in the negative binomial model, k represents all potential contributions to variability, including variability in number of people contacted [31] and variation in non-contact-based characteristics that might correlate with age. Our model explicitly represents those features and leaves k to represent any remaining contributions to variability. I.e., we use k to represent variability of transmission probability to each housemate of a given age group, from a given age group, in a household of a given size. Therefore, if contact degree and and/or age/size-based differences contribute significantly to overall transmission variation, we would expect our estimate of k to be higher (lower variability) than estimated k from a negative binomial model. Indeed, a systematic review [32] found that 93% of 28 studies estimated k < 1 for the negative binomial model fit to COVID-19 transmission data, whereas we did not find strong support for k < 1 in our model (k = 1 could not be rejected and produced the most parsimonious model fit). However, our rejection of the lower variance binomial assumption () suggests the possibility that a large portion of overall transmission variability could be attributed to factors other than number of contacts or age/size-associated differences, such as the duration of infectiousness, organism shedding rate [33], or individual contact timing and intensity.
Our detailed data including results from up to three different antibody tests also presented a unique opportunity for comparing test accuracies against each other and against data from the test manufacturers or other sources. The rates of disagreement we found among the three tests and the survey data on prior test results (see Table C in Additional File 2) provided strong support for independently estimating sensitivity and specificity of the antibody tests, and those estimates were quite stable across widely different models of the underlying household epidemiology (see Table 3). Thus, we did not need to rely on assuming independent test accuracy estimates such as those from the test manufacturers, though comparing our results to others is illuminating. Our independent estimates of the antibody test specificities were remarkably consistent with results from independent data. However, our results for antibody test sensitivities included some disagreements with other sources. We found lower sensitivity estimates for the Aboott and Siemens tests compared to the test manufacturer, which might be explained by our data containing more participants who had experienced a longer interval between infection and serological testing during which antibodies could have waned subjects with a longer duration. Further research could illuminate why our Euroimmun sensitivity estimates were higher, perhaps by building a model that utilizes the quantitative Euroimmun results to investigate whether the manufacturer-suggested positive/indeterminate/negative thresholds were less appropriate for our implementation. Our findings suggest that, in epidemiologic studies that do not have multiple sources of data with which to directly estimate test accuracies, it would be reasonable to assume manufacturer-derived estimates of test specificity, but estimates of test sensitivity should be treated with much more caution.
Our extensive antibody test data also allowed us to estimate case ascertainment across age groups. By comparing survey data about whether antibody-positive individuals had ever tested positive prior to serological testing, we found a high level of age dependence: younger individuals with a prior infection were substantially less likely to have tested positive during their infection compared to older age groups. This finding provides evidence that case ascertainment in Utah had been substantially lower for children aged 0–12 compared to adolescents and adults, and somewhat lower for those aged 13–44 compared to those middle-aged and older, perhaps suggesting that those younger groups were less likely to experience symptoms that warranted seeking testing [34].
This study has several limitations. Our mathematical models contain many parameters, and the many-dimensional parameter space cannot feasibly be fully explored to calculate the likelihood in all regions. We are confident that we arrived at the global optimal parameter estimates because they were reached from many different starting points in parameter space by the maximization algorithm, but we cannot be 100% sure without mathematical proof. Further work on mathematical analysis and numerical methods could improve rigor and speed of the estimation procedure.
We did not test factors other than age composition and household size that might affect the way community acquisition risk or within-household transmission varies by household. Some households could have experienced higher collective community acquisition risk than others of similar age composition due to living in higher-risk locations, common work or school exposure, or common social exposure. Household transmission could also be driven by non-age-based properties of households such as age-independent contact behavior, health composition of household members, physical properties of the domicile such as size and ventilation, or other properties that could increase infection or transmission risk of all household members together. It is possible that further expansion of our model to include correlates with non-age data and/or household-level variability distributions for community acquisition or household transmission could provide further insights, but we suspect that these additions would push the limits of parameter identifiability against our data.
Some members of participating households might have been actively or very recently infected with SARS-CoV-2 at the time of sampling for serological testing. When ongoing or recent household outbreaks involved multiple infections, some of the infected members could have tested seropositive while others had yet to become infected or had yet to seroconvert on the day of sampling. For that scenario, our assumption that household outbreaks had reached their final size would be violated, contributing to underestimates of HSAR. We suspect that the proportion of those households was low, given the short time window in which those conditions would apply in a multi-person outbreak, and households might have been unlikely to choose to travel to a testing site while experiencing or recovering from symptomatic infection.
Our findings may contain biases related to non-participation rates of households that were selected and approached for inclusion in the study. Our data collection in the school district testing phase included a complicated sampling design with weights introduced partly to account for different rates of nonresponse across demographic strata. For simplicity we ignored these details and sampling weights for the analysis presented here. In the hotspot testing phase households were selected for invitation by simple random sample of addresses, and those choosing to participate may not be representative of those areas. Also, while the included sampling phases and locations cover a broad area of the most populous region in Utah, there may be important differences in households both within those areas and from the areas of Utah not covered. Thus, households with higher COVID-19 risk may be overrepresented or underrepresented in our data relative to their frequency in the broader population of households in Utah.
Our decision to exclude households with missing age data might have introduced some bias. The excluded households were larger on average (3.67 members vs. 2.32), likely due to households with more members having more difficulty completing full data collection. The excluded households also had somewhat higher rates of prior COVID-19 or antibody positivity among participating members. However, because our model included household transmission adjustment factors by household size, and larger households would be expected to experience more SARS-CoV-2 importations due to having more individuals at risk of community acquisition, it’s possible that our parameter estimates would be largely unchanged if those households had produced complete data.
We used participants’ prior vaccination information only to indicate whether positive test results for the two spike-protein-based antibody tests might have been triggered by the vaccine rather than prior SARS-CoV-2 infection; we did not attempt to assess whether the vaccine produced a protective affect against community and/or within-household acquisition in the period after receiving one or two vaccine doses. Our raw data show that vaccinated individuals tested positive by the Abbott (nucleocapsid) test at a lower rate than non-vaccinated individuals, particularly in the 65–79 age group, which had received higher vaccination coverage than younger age groups. Thus, it is possible that including a vaccination effect parameter on community acquisition or household transmission probability would produce a result suggesting a reduced infection probability. However, such a result would not be conclusive of a causal effect of the vaccine on acquisition risk, for the following reasons.
As vaccines had only recently become available in early 2021, only 12% of participants had received the full two-dose series prior to serological sampling, and about half of those had received their second dose less than one month prior. Thus, even the minority of those participants who were considered fully vaccinated at the time of testing had spent only a small portion of their cumulative time at risk since the onset of the pandemic in a potentially protected state. The subset of participants who had experienced both prior infection and vaccination could include both those who were infected prior to vaccination as well as break-through infections post vaccination, and inferring the relative timing of these events would be difficult with our data. Also, inferring a potential causal effect of vaccination on infection risk would be hampered by confounders, including risk-based vaccination eligibility, the potential reverse causal effect of known prior infection on the decision to vaccinate, and potential correlations between exposure risk and vaccine access and/or refusal among those eligible. If the vaccinated participants did experience substantial COVID-19 exposures between their vaccination and serological testing, and the vaccine substantially reduced their likelihood of infection by those exposures, then our results for the relative acquisition probabilities of older age groups (who had higher vaccine coverage) might be biased low.
Our dataset’s collection period ended on April 10, 2021, just prior to the widespread emergence of the Delta variant of SARS-CoV-2, which was first detected in Utah in mid-April 2021 and spread rapidly in the following weeks [35]. Thus, our estimates may not accurately reflect infection and transmission characteristics of dominant SARS-CoV-2 strains circulating in the U.S. and elsewhere from late Spring 2021 to the present, particularly because different strains have been found to exhibit different household transmission rates [36]. Also, as the state of Utah has a younger population, larger average household size, and other demographic differences compared to U.S. national averages [18], our household-based epidemiological estimates might be atypical. However, as we expressed our household transmission estimates as per-capita transmission probabilities from and to particular age groups and household sizes, our findings could serve as tool to study how overall household transmission patterns would differ in communities with different household size and age distributions.
Conclusions
We found evidence of important age differences in COVID-19 epidemiology in Utah households during 2020 and early 2021, including differences in case ascertainment, non-household acquisition, and within-household transmission. Household transmission differences by age occurred in patterns that suggest substantial differences in household contact rates between different age groups. A better understanding of those differences might be crucial to deciphering community-wide transmission patterns and mitigation efforts.
Supplementary Information
Acknowledgements
The authors acknowledge David Swerdlow, Fred Angulo, Jessica Atwell, and Farid Khan for inspiration for this data modeling work and comments on this manuscript. We acknowledge Stephen Alder, Adam Looney, Tom Greene, Brian Orleans, Yue Zhang, and other members of the Utah HERO Project team and the University of Utah Study Design & Biostatistics Center who helped design and/or implement the household participant recruitment and survey effort. We acknowledge Julio Delgado for directing the validation and implementation of the antibody assays at ARUP Laboratories and overseeing the validation of finger prick sample collection, as well as the staff at ARUP Laboratories who processed and tested sample specimens to produce the antibody data used in our analysis. LK acknowledges funds provided by the University of Utah Immunology, Inflammation, & Infectious Disease Initiative (26798 Seed Grant).
Abbreviations
- HSAR
Household secondary attack rate
- Utah HERO Project
Utah Health & Economic Recovery Outreach Project
- P
Positive antibody test result or prior positive reported on survey
- N
Negative antibody test result or no prior positive reported on survey
- X
Antibody test not performed or indeterminate for indicating prior infection
- MeSH
Medical Subject Headings
- MLE
Maximum likelihood estimate
- AIC
Akaike information criterion
Authors’ contributions
D.T. led the conception and design of the mathematical models, parameter estimation techniques, and computer code implementation, and led the writing of the manuscript. T.S. analyzed the household data, compiled tables, figures, and literature search, and designed model components. A.B. designed components of the mathematical model and tested components of the parameter estimation scheme. S.A. compiled data and led the writing of the data methods sections. NS worked on study participant recruitment and relayed study data to the modeling team. J.L. helped conceive the model comparison approach and performed background literature search and writing. L.K. collaborated on modeling study design, model specification, and manuscript writing. M.S. was a leader of the Utah HERO Project data collection, led the compilation of the data components used for parameter estimation, and helped conceive the modeling approach and organize the results. All authors contributed to writing and editing and reviewed and approved the final manuscript.
Funding
NS and MS received funding from the State of Utah (https://www.utah.gov/) under contract number AR3473 to collect the data analyzed in this manuscript. DJAT received funding from Pfizer, Inc. (https://www.pfizer.com) to perform initial analyses that led to the modeling results in this manuscript. DT, TS, AB, SA, JL, LK, and MS were supported to complete this work by the Centers for Disease Control and Prevention (https://www.cdc.gov/) under SHEPheRD 2021 Domain 1-A015, grant number 75D30121F00003.
Data availability
The data and codes, written in R version 4.0.3, used to produce the results in this manuscript are publicly available at https://github.com/damontoth/householdTransmission2021.
Declarations
Ethics approval and consent to participate
The University of Utah Institutional Review Board reviewed the surveillance project that produced the data analyzed in this manuscript and determined it as non-research public health surveillance, waived the requirement for documented consent, and determined that use of these data for analysis to understand the dynamics of SARS-CoV-2 transmission was exempt from further review (IRB_00132598). Individuals were informed of the project procedures and that participation was voluntary. Participants provided their agreement to participate and were given the chance to opt out of having their data used for future research. The data were analyzed anonymously for this manuscript.
Consent for publication
Not applicable.
Competing interests
D.T. reports receiving grants from Pfizer, Inc. A.B. reports receiving graduate student funding from bioMérieux. All other authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438(7066):355–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.VanderWaal KL, Ezenwa VO. Heterogeneity in pathogen transmission: mechanisms and methodology. Funct Ecol. 2016;30(10):1606–22. [Google Scholar]
- 3.Pellis L, Cauchemez S, Ferguson NM, Fraser C. Systematic selection between age and household structure for models aimed at emerging epidemic predictions. Nat Commun. 2020;11(1):906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Buckee C, Noor A, Sattenspiel L. Thinking clearly about social aspects of infectious disease transmission. Nature. 2021;595(7866):205–13. [DOI] [PubMed] [Google Scholar]
- 5.Ma KC, Menkir TF, Kissler S, Grad YH, Lipsitch M. Modeling the impact of racial and ethnic disparities on COVID-19 epidemic dynamics. Elife. 2021;10:10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Ferguson NM, Donnelly CA, Anderson RM. Transmission dynamics and epidemiology of dengue: insights from age-stratified sero-prevalence surveys. Philos Trans R Soc Lond B Biol Sci. 1999;354(1384):757–68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Meyerowitz EA, Richterman A, Gandhi RT, Sax PE. Transmission of SARS-CoV-2: a review of viral, host, and environmental factors. Ann Intern Med. 2021;174(1):69–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Qian H, Miao T, Liu L, Zheng X, Luo D, Li Y. Indoor transmission of SARS-CoV-2. Indoor Air. 2021;31(3):639–45. 10.1111/ina.12766. [DOI] [PubMed]
- 9.Yonker LM, Neilan AM, Bartsch Y, Patel AB, Regan J, Arya P, et al. Pediatric severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2): clinical presentation, infectivity, and Immune responses. J Pediatr. 2020;227:45–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ghosh AK, Venkatraman S, Soroka O, Reshetnyak E, Rajan M, An A, et al. Association between overcrowded households, multigenerational households, and COVID-19: a cohort study. Public Health. 2021;198:273–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Hart WS, Abbott S, Endo A, Hellewell J, Miller E, Andrews N, et al. Inference of the SARS-CoV-2 generation time using UK household data. Elife. 2022;11:11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Endo A, Uchida M, Kucharski AJ, Funk S. Fine-scale family structure shapes influenza transmission risk in households: insights from primary schools in Matsumoto city, 2014/15. PLoS Comput Biol. 2019;15(12):e1007589. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Cauchemez S, Carrat F, Viboud C, Valleron AJ, Boelle PY. A bayesian MCMC approach to study transmission of influenza: application to household longitudinal data. Stat Med. 2004;23(22):3469–87. [DOI] [PubMed] [Google Scholar]
- 14.Bubar KM, Reinholt K, Kissler SM, Lipsitch M, Cobey S, Grad YH, et al. Model-informed COVID-19 vaccine prioritization strategies by age and serostatus. Science. 2021;371(6532):916–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Metcalf CJ, Farrar J, Cutts FT, Basta NE, Graham AL, Lessler J, et al. Use of serological surveys to generate key insights into the changing global landscape of infectious disease. Lancet. 2016;388(10045):728–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Longini I, Koopman JS. Household and community transmission parameters from final distributions of infections in households. Biometrics. 1982;38(1):115–26. [PubMed] [Google Scholar]
- 17.Bi Q, Lessler J, Eckerle I, Lauer SA, Kaiser L, Vuilleumier N, et al. Insights into household transmission of SARS-CoV-2 from a population-based serological survey. Nat Commun. 2021;12(1):3643. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Toth DJA, Beams AB, Keegan LT, Zhang Y, Greene T, Orleans B, et al. High variability in transmission of SARS-CoV-2 within households and implications for control. PLoS ONE. 2021;16(11):e0259097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Utah health and economic recovery outreach home - Utah hero. 2021. Available from: https://eccles.utah.edu/utah-hero/.
- 20.Samore MH, Looney A, Orleans B, Greene T, Seegert N, Delgado JC, et al. Probability-based estimates of severe acute respiratory syndrome coronavirus 2 seroprevalence and detection fraction, Utah, USA. Emerg Infect Dis. 2021;27(11):2786–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Hershow RB, Wu K, Lewis NM, Milne AT, Currie D, Smith AR, et al. Low SARS-CoV-2 transmission in elementary schools - Salt Lake County, Utah, December 3, 2020-January 31, 2021. MMWR Morb Mortal Wkly Rep. 2021;70(12):442–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Anti-SARS-CoV-2 ELISA (IgG) - instructions for use. 2022. Available from: https://www.fda.gov/media/137609/download.
- 23.SARS-CoV-2 IgG architect - instructions for use. 2020. Available from: https://www.fda.gov/media/137383/download.
- 24.Dimension vista SARS-CoV-2 IgG (COV2G) - instructions for use. Available from: https://www.fda.gov/media/145020/download.
- 25.Tang MS, Hock KG, Logsdon NM, Hayes JE, Gronowski AM, Anderson NW, et al. Clinical performance of two SARS-CoV-2 Serologic assays. Clin Chem. 2020;66(8):1055–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Rychert J, Couturier MR, Elgort M, Lozier BK, La’ulu S, Genzen JR, et al. Evaluation of 3 SARS-CoV-2 IgG antibody assays and correlation with neutralizing antibodies. J Appl Lab Med. 2021;6(3):614–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Demmer RT, Baumgartner B, Wiggen TD, Ulrich AK, Strickland AJ, Naumchik BM, et al. Identification of natural SARS-CoV-2 infection in seroprevalence studies among vaccinated populations. Mayo Clin Proc. 2022;97(4):754–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Age Groups - MeSH - NCBI. 2023. Available from: https://www.ncbi.nlm.nih.gov/mesh/68009273.
- 29.Cronin CJ, Evans WN. Nursing home quality, COVID-19 deaths, and excess mortality. J Health Econ. 2022;82:102592. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Paul LA, Daneman N, Schwartz KL, Science M, Brown KA, Whelan M, et al. Association of age and pediatric household transmission of SARS-CoV-2 infection. JAMA Pediatr. 2021;175(11):1151–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Bansal S, Grenfell BT, Meyers LA. When individual behaviour matters: homogeneous and network models in epidemiology. J R Soc Interface. 2007;4(16):879–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Wegehaupt O, Endo A, Vassall A. Superspreading, overdispersion and their implications in the SARS-CoV-2 (COVID-19) pandemic: a systematic review and meta-analysis of the literature. BMC Public Health. 2023;23(1):1003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Ke R, Martinez PP, Smith RL, Gibson LL, Mirza A, Conte M, et al. Daily longitudinal sampling of SARS-CoV-2 infection reveals substantial heterogeneity in infectiousness. Nat Microbiol. 2022;7(5):640–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Davies NG, Klepac P, Liu Y, Prem K, Jit M, group CC-w, et al. Age-dependent effects in the transmission and control of COVID-19 epidemics. Nat Med. 2020;26(8):1205–11. [DOI] [PubMed] [Google Scholar]
- 35.Keegan LT, Truelove S, Lessler J. Analysis of vaccine effectiveness against COVID-19 and the emergence of Delta and other variants of concern in Utah. JAMA Netw Open. 2021;4(12):e2140906. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Madewell ZJ, Yang Y, Longini IM Jr, Halloran ME, Dean NE. Household secondary attack rates of SARS-CoV-2 by variant and vaccination status: an updated systematic review and meta-analysis. JAMA Netw Open. 2022;5(4):e229317. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The data and codes, written in R version 4.0.3, used to produce the results in this manuscript are publicly available at https://github.com/damontoth/householdTransmission2021.



