An integrated risk and epidemiological model to estimate risk-stratified COVID-19 outcomes for Los Angeles County: March 1, 2020—March 1, 2021

Abigail L Horn; Lai Jiang; Faith Washburn; Emil Hvitfeldt; Kayla de la Haye; William Nicholas; Paul Simon; Maryann Pentz; Wendy Cozen; Neeraj Sood; David V Conti

doi:10.1371/journal.pone.0253549

. 2021 Jun 24;16(6):e0253549. doi: 10.1371/journal.pone.0253549

Editor: Martial L Ndeffo Mbah⁵

PMCID: PMC8224896 PMID: 34166416

Abstract

The objective of this study was to use available data on the prevalence of COVID-19 risk factors in subpopulations and epidemic dynamics at the population level to estimate probabilities of severe illness and the case and infection fatality rates (CFR and IFR) stratified across subgroups representing all combinations of the risk factors age, comorbidities, obesity, and smoking status. We focus on the first year of the epidemic in Los Angeles County (LAC) (March 1, 2020–March 1, 2021), spanning three epidemic waves. A relative risk modeling approach was developed to estimate conditional effects from available marginal data. A dynamic stochastic epidemic model was developed to produce time-varying population estimates of epidemic parameters including the transmission and infection observation rate. The epidemic and risk models were integrated to produce estimates of subpopulation-stratified probabilities of disease progression and CFR and IFR for LAC. The probabilities of disease progression and CFR and IFR were found to vary as extensively between age groups as within age categories combined with the presence or absence of other risk factors, suggesting that it is inappropriate to summarize epidemiological parameters for age categories alone, let alone the entire population. The fine-grained subpopulation-stratified estimates of COVID-19 outcomes produced in this study are useful in understanding disparities in the effect of the epidemic on different groups in LAC, and can inform analyses of targeted subpopulation-level policy interventions.

1 Introduction

Health disparities have emerged with the COVID-19 epidemic because the risk of exposure to infection and the prevalence of risk factors for severe outcomes given infection vary within and between populations and over time [1–7]. For public health policy makers to better address the pandemic, models reporting stratified estimates are necessary to investigate the potential outcomes of policy scenarios targeting specific subpopulations. However, estimated epidemic quantities such as rates of severe illness and death, the case fatality rate (CFR), and the infection fatality rate (IFR) are often expressed in terms of aggregated population-level estimates or by age group alone due to the lack of epidemiological data at the refined subpopulation level [8–10]. While data may be available for single risk factor strata such as by age [11], data on subpopulations representing individuals with combinations of risk factors are not reported or available. Conventionally, estimates of risk effects and outcomes given combinations of conditions are obtained through access to individual-level data and the application of multiple regression techniques [5, 12]. At the time of this study, individual-level COVID-19 data were not widely available nor sampled in an appropriate manner to avoid substantial bias [13].

In this paper we develop a model that produces stratified estimates of the probability of disease progression and death for subpopulations representing individuals with combinations of risk factors important for COVID-19 using dynamic epidemiological data at the aggregated population level [14], published studies on the risk of individual risk factors on illness severity, and prevalences of risk factors in the general population. In the absence of access to individual-level data, we apply a statistical technique developed for joint analysis of marginal summary statistics (JAM) [15] to obtain estimates of the conditional effects of combinations of COVID-19 risk factors on the probability of developing severe illness and death, using data from published studies reporting the marginal effects of individual risk factors [2, 3]. We consider the risk factors age, existing comorbidities, obesity, and smoking. Separately, we develop a stochastic epidemic model and use Bayesian methods to estimate time-varying probabilities of hospitalization, ICU admission, and death given infection at the population level. We integrate the conditional risk effects and the population-level probabilities, together with available dynamic data on the prevalence of infections and deaths stratified by age, to estimate the probability of disease progression, CFRs and IFRs, stratified across all plausible combinations of the modeled risk factors. This approach allows us to produce risk-stratified estimates without access to either individual-level data on disease progression, or subpopulation-level dynamics of infections, hospitalizations, ICU admissions, and deaths by risk groups.

Focusing on Los Angeles County (LAC), the most populous and one of the most diverse counties in the United States, we analyze the estimated overall and risk-stratified time-varying disease progression probabilities, CFRs, and IFRs in relation to the epidemic timecourse and implemented policy decisions through the first year of the epidemic, from March 1, 2020 through March 1, 2021. Our analysis is framed in terms of three epidemic waves experienced in LAC: a first wave from March 1 to May 6, 2020, diminished through a strict lockdown; a second and larger wave, from May 7 to October 14, 2020, which peaked at the end of July; and a third, substantially larger wave that began on October 15, 2020, peaked in mid-January, and had subsided by March 1, 2021.

The integrated model allows the comparison of dynamic outcomes and parameters across the overall population, age groups, and more fine-grained subpopulations in LAC representing age and combinations of other risk factors for severe COVID-19 illness. Such fine-grained results can be useful in understanding disparities in the effect of the epidemic on different groups in LAC and can inform studies involving targeted subpopulation-level policy interventions [16].

2 Methods

We developed a single-population stochastic dynamic epidemic model that accounts for observed and unobserved transmission of COVID-19 and trajectories through the healthcare system with hospitalization, ICU admission, and death. Using Bayesian methods for parameter estimation and uncertainty quantification, we estimated the population-average time-varying probabilities of transitions between the infected, hospitalized, ICU, death, and recovery compartments, and the resulting population-average time-varying case fatality rate (CFR, defined as deaths over observed infections) and infection fatality rate (IFR, defined as deaths over all infections) (Section 2.1). In parallel, we used available data from published studies on the marginal effects of individual risk factors (age, existing comorbidities, obesity, smoking) to calculate conditional risk effects estimates for three models: (1) hospitalization given infection, (2) ICU admission given hospitalization, and (3) death given ICU admission. The conditional risk estimates were integrated with the corresponding probability estimates $\hat{α_{t}}$ , $\hat{κ_{t}}$ , and $\hat{δ_{t}}$ from the dynamic epidemic model to create a risk model (Section 2.2). The risk model enables us to estimate, stratified across 54 combinations of the levels of the modeled risk factors (i.e. risk profiles), the probability of each stage of disease given infection within LAC. Finally, we integrate the time-varying stratified probability of each stage of disease with the timeseries of observed infections, estimated total infections including observed and unobserved, deaths, together with available data on the prevalence of infections stratified by age, to estimate the risk profile-stratified CFR and IFR across time (Section 2.2.5).

2.1 Epidemic model

We develop a model of COVID-19 transmission in a single, homogeneously-mixed population divided into nine compartments representing different disease states (Fig 1). Compartments relating to the transmission of infection are the widely-used susceptible (S), exposed (latent but not yet infectious) (E), infectious and observed (I), and recovered (R) classes. By including the exposed compartment, we are able to model the delay between individuals being exposed to infection and becoming infectious. We also include a compartment representing infectious individuals with unobserved and/or unconfirmed infections (A). I represents cases of infection that have tested positive for the SARS-CoV-2 virus and are confirmed in the official register of infection case data. A represents cases that are infectious but do not appear in the confirmed case data, whether because they are asymptomatic, are symptomatic and do not get tested, or get tested and have a false negative result. We model healthcare utilization and outcome at a more granular level by including compartments representing individuals that are in hospital (H), in ICU care (Q), and that die (D). H includes individuals that are receiving care services in skilled nursing facilities (SNFs). D represents only deaths that are confirmed as being COVID-19 related. Each individual can only be in one state at each point in time.

Fig 1 — Model compartments with available data are represented as square compartments.

Our model applies the following logic and assumptions. Susceptible individuals will become exposed and develop infection (emphasizing that exposure to the virus is not a sufficient condition for developing an infection) and move to the exposed but latent state E, meaning they will become, but are not yet, infectious. The transfer of susceptible individuals into the exposed state happens at a per capita rate β_t, the transmission rate, defined as the average number of individuals that an infected individual will infect per day. β_t controls the rate of disease spread and decreases following behavior modifications, including non-pharmaceutical interventions (NPIs). From the exposed and latent state, individuals will transition to one of the two active infection states: a time-varying fraction r_t of these cases will transfer to the observed infectious state I, and the remaining 1 − r_t will transfer to the unobserved infectious state A. Therefore, the parameter r_t represents the fraction of all infectious cases that are observed and confirmed. We assume that new infections are created only by individuals in the infected classes (I and A), and that individuals in all other compartments, including in hospital, do not contribute to transmission. Individuals transfer from the exposed to infectious and observed (I) or unobserved (A) compartments at a rate equal to the inverse of the mean latency period, d_EI.

Infectious cases may either move directly into the recovered state (R), or into hospitalization (H) if further care is required. Of all observed and infectious cases (I), we assume that individuals will require hospitalization with probability α_t, equal to P_t(H|I). Infectious individuals transition into the hospitalized state at a rate equal to the inverse of the time between infectiousness and hospitalization, d_IH, or move directly to recovery at a rate equal to the inverse of the mean time of infection given that hospitalization is not required, d_IR. Hospitalized individuals will require ICU care with a probability κ_t, equal to P_t(Q|H), and transfer into Q at a rate equal to the inverse of the mean time in hospital given that ICU care will be required, d_HQ. With probability 1 − κ_t they will recover and move into R at a rate equal to the inverse of the mean time in hospital given that ICU care will not be required before recovery, d_HR. Individuals in the ICU will recover with probability 1 − δ_t, moving to R at a rate equal to the inverse of the mean time in ICU before recovery, d_QR, or will die with probability δ_t, equal to P_t(D|Q), moving to D at a rate equal to the inverse of the mean time in ICU care given a fatal case, d_QD. We assume that all unobserved infections (A) will recover directly, since admission to hospital would entail a COVID-19 test. These cases transition to R at the same rate as observed infectious (I) individuals, 1/d_IR. We assume recovered individuals cannot be reinfected due to immunity, and cannot infect others. While the dynamics of transitions within the healthcare setting would change if hospital and ICU capacity were reached, we do not model this condition. The only route to death is through an observed infection followed by hospitalization and ICU care, meaning we do not model individuals that die from COVID-19 illness at home rather than at a point-of-care. We justify this assumption because the majority of confirmed COVID-19 deaths result from individuals who die in SNF, hospital, or following a stay in hospital; analysis of death certificate data from California indicates that 4%-9% of official COVID-19 deaths have occurred at home, across the three epidemic waves (E. Garcia, personal communication based on unpublished data for the state of California, April 20, 2021). Furthermore, we do not model a route to death for individuals without a confirmed COVID-19 infection (A), since record of confirmed COVID-19 infection (or probable based on clinical evidence) is needed to be classified as COVID-19 mortality [17].

To model this dynamical state system we employ a discrete-time approximation to the corresponding stochastic continuous-time Markov process in which transitions of individuals between disease stages are seen as stochastic movements between the corresponding population compartments with random transition rates [18, 19]. This model keeps track of the number of individuals in each compartment and the flows of individuals transitioning between compartments through a set of coupled discrete-time multinomial counting processes with transmission rates defined by Poisson processes. To simulate from this system we employ an Euler numerical scheme for Markov process models [18]. For more details see Section 1.5 in S1 Appendix.
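To make the compartment flows concrete, the following is a minimal Python sketch of one discrete-time stochastic update. It is an illustration under stated assumptions, not the implementation used in the study: each outflow is drawn as a binomial with exit probability 1 − exp(−rate × dt), competing exits are split in proportion to their rates, and all parameter values and function names (`binom`, `competing_exits`, `step`) are hypothetical placeholders.

```python
import math
import random

def binom(rng, n, p):
    """Binomial(n, p) draw as a sum of Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def competing_exits(rng, n, rate_a, rate_b, dt):
    """Draw how many of n individuals leave during dt, then split them
    between two competing exits in proportion to the exit rates."""
    total = rate_a + rate_b
    leaving = binom(rng, n, 1.0 - math.exp(-total * dt))
    to_a = binom(rng, leaving, rate_a / total)
    return to_a, leaving - to_a

def step(s, p, rng, dt=1.0):
    """Advance the compartment counts S, E, I, A, H, Q, R, D by one step."""
    living = sum(s.values()) - s["D"]
    force = p["beta"] * (s["I"] + s["A"]) / living  # only I and A transmit
    s_e = binom(rng, s["S"], 1.0 - math.exp(-force * dt))
    e_i, e_a = competing_exits(rng, s["E"], p["r"] / p["d_EI"],
                               (1 - p["r"]) / p["d_EI"], dt)
    i_h, i_r = competing_exits(rng, s["I"], p["alpha"] / p["d_IH"],
                               (1 - p["alpha"]) / p["d_IR"], dt)
    a_r = binom(rng, s["A"], 1.0 - math.exp(-dt / p["d_IR"]))
    h_q, h_r = competing_exits(rng, s["H"], p["kappa"] / p["d_HQ"],
                               (1 - p["kappa"]) / p["d_HR"], dt)
    q_d, q_r = competing_exits(rng, s["Q"], p["delta"] / p["d_QD"],
                               (1 - p["delta"]) / p["d_QR"], dt)
    return {
        "S": s["S"] - s_e,
        "E": s["E"] + s_e - e_i - e_a,
        "I": s["I"] + e_i - i_h - i_r,
        "A": s["A"] + e_a - a_r,
        "H": s["H"] + i_h - h_q - h_r,
        "Q": s["Q"] + h_q - q_d - q_r,
        "R": s["R"] + i_r + a_r + h_r + q_r,
        "D": s["D"] + q_d,
    }

# Placeholder values for illustration only (not the paper's estimates).
params = {"beta": 0.4, "r": 0.2, "alpha": 0.1, "kappa": 0.3, "delta": 0.5,
          "d_EI": 5.0, "d_IH": 10.0, "d_IR": 7.0, "d_HQ": 1.0,
          "d_HR": 12.0, "d_QR": 7.0, "d_QD": 8.0}
state = {"S": 990, "E": 5, "I": 3, "A": 2, "H": 0, "Q": 0, "R": 0, "D": 0}
rng = random.Random(1)
for _ in range(60):
    state = step(state, params, rng)
```

Because every flow is subtracted from one compartment and added to exactly one other, the total number of individuals is conserved at each step, which is a useful sanity check on any implementation of the scheme.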

The reproductive number, R_t, defined as the mean number of secondary cases generated by a typical infectious individual on each day in a fully susceptible population [20], is a function of model parameters including the transmission rate β_t [21], and like β_t, changes in time with behavior and interventions. We use the Next Generation Matrix approach to solve for the reproductive number (Section 1.6 in S1 Appendix) and find that,

$$R_{t} = β_{t} \left[ \frac{r_{t}}{\frac{α_{t}}{d_{IH}} + \frac{1 - α_{t}}{d_{IR}}} + (1 - r_{t})\, d_{IR} \right] \tag{1}$$

Thus, R_t is a function of the transmission rate β_t as well as other model parameters.

As the pandemic continues and the susceptible population decreases, it becomes important to study the effective reproductive number R_eff,t, equal to R_t multiplied by the fraction of the population that is susceptible at time t, S(t)/P(t). When R_eff,t < 1, the epidemic will begin to decrease (although stuttering chains of epidemic growth may still occur in a stochastic model).
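As a worked illustration of Eq 1 and of the effective reproductive number, the short Python sketch below evaluates R_t from the model parameters, inverts the relation to recover β_t from a given R_t, and scales R_t by the susceptible fraction. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def reproductive_number(beta, r, alpha, d_IH, d_IR):
    """Eq 1: R_t as a function of beta_t, r_t, alpha_t and the mean times."""
    # Mean time spent in I, given exit rates alpha/d_IH and (1 - alpha)/d_IR.
    mean_time_observed = 1.0 / (alpha / d_IH + (1.0 - alpha) / d_IR)
    return beta * (r * mean_time_observed + (1.0 - r) * d_IR)

def beta_from_reproductive_number(R, r, alpha, d_IH, d_IR):
    """Invert Eq 1: R_t is linear in beta_t, so divide by the bracket."""
    return R / reproductive_number(1.0, r, alpha, d_IH, d_IR)

def effective_reproductive_number(R, susceptible, population):
    """R_eff,t = R_t * S(t) / P(t)."""
    return R * susceptible / population

# Placeholder inputs (not the paper's estimates).
R_t = reproductive_number(beta=0.4, r=0.2, alpha=0.1, d_IH=10.0, d_IR=7.0)
beta_t = beta_from_reproductive_number(R_t, r=0.2, alpha=0.1, d_IH=10.0, d_IR=7.0)
R_eff = effective_reproductive_number(R_t, susceptible=8.0e6, population=1.0e7)
```

When α_t = 0, the bracket in Eq 1 collapses to d_IR, so R_t reduces to β_t d_IR, the familiar product of transmission rate and infectious period.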

2.1.1 Parameter estimation

All transition rate parameters (e.g., the inverse of the mean time between exposure and infectiousness, 1/d_EI) are modeled as fixed values taken directly from published literature (Table 2 in S1 Appendix). The model has five unknown parameters, θ = {β_t, r_t, α_t, κ_t, δ_t}, which we estimate from COVID-19 data for LAC (Table 3 in S1 Appendix).

Due to the relationships between the five interacting model parameters {β_t, r_t, α_t, κ_t, δ_t} in the model formulation, a tractable likelihood function was not possible for our model and a likelihood-free method of parameter estimation was required. Furthermore, the formulation of the model allows for multiple parameter solutions to exist. This means that estimated posterior distributions will be multimodal if allowed to vary over a wide prior parameter space. We use a two-step likelihood-free sampling process to define unimodal posterior distributions and achieve convergence in parameter estimates, using a broad grid search followed by approximate Bayesian computation (ABC) sampling. We first perform a broad grid search to identify possible regions for each parameter, from which we decide on a single mode. External data sources were used to specify the parameter range for the grid search (discussed below). Second, we use ABC sampling to estimate the final posterior distribution for each parameter with a prior distribution informed by the chosen mode from the grid search step. Specifically, we define a prior distribution for ABC as a normal distribution with 95% of its values lying within ±25% of the mean value of the chosen mode; for example, if the mean of a chosen mode for parameter X is determined to be 0.1, then the prior distribution for X will be a normal distribution with mean 0.1 and standard deviation of approximately 0.013 (0.025/1.96), chosen such that Pr(0.075 < X < 0.125) ≈ 95%.
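The prior construction described above can be checked numerically. The Python sketch below derives the standard deviation implied by the rule that 95% of the prior mass lies within ±25% of the mode mean; the function name `abc_prior` and its arguments are hypothetical, and only the 95% and ±25% figures come from the text.

```python
from statistics import NormalDist

def abc_prior(mode_mean, rel_halfwidth=0.25, coverage=0.95):
    """Normal prior with `coverage` of its mass within +/- rel_halfwidth of the mean."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # two-sided normal quantile
    sd = rel_halfwidth * mode_mean / z
    return NormalDist(mu=mode_mean, sigma=sd)

prior = abc_prior(0.1)
# Probability mass between 0.075 and 0.125 under the constructed prior.
mass = prior.cdf(0.125) - prior.cdf(0.075)
```

For a mode mean of 0.1 the half-width is 0.025, and dividing by the two-sided 95% normal quantile (about 1.96) gives a standard deviation of roughly 0.013.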

Because prior information from existing studies on the reproductive number R_t is more readily available than for β_t, we estimate R_t from data and then use Eq 1 to obtain β_t from R_t. We first estimate R_0, the initial value of the reproductive number before any interventions. The grid search parameter space for R_0 is informed by values estimated from previous published studies on COVID-19 [22, 23]. We use geolocation trace data from smartphones, i.e. mobility data, to inform the magnitude and the timing of changes in the distribution of R_t over time from the initial R_0 value. We incorporate data for LAC provided by Unacast [24] on reductions in distances travelled and encounter rates [25]. Notably, this data source diverges from the observed trend in infections during the third epidemic wave, showing a decrease in mobility activity as the epidemic surge took off. Thus, for dates after October 15, 2020, we do not use mobility data to inform the grid search space for R_t and instead constrain it to 1 < R_t < R_0 for dates corresponding to increasing infection trends. The grid search parameter space for the fraction of observed cases out of all infections, r_t, was informed by results of a CDC study reporting seroprevalence surveys across 10 communities in March through early May 2020 for t within that time period, and was allowed to vary more widely for dates after May 2020 [26]. Grid search ranges for the parameters representing the probabilities of disease stage progression, α_t, κ_t, and δ_t, were informed by the ratios of the observed numbers of infections, hospitalizations, and deaths in LAC. Prior distributions used in parameter estimation are specified in Section 2.1 in S1 Appendix.

The model was fit to the daily and cumulative count of observed infections and deaths, and current numbers in-hospital and in-ICU, coming from the GitHub page of the Los Angeles Times (LA Times) Data and Graphics Department [14]. The infection and death data is sourced from reports logged by LA Times reporters and editors based on reports from the LAC Department of Public Health. The in-hospital and in-ICU data was sourced by the LA Times directly from the California Department of Public Health’s Open Data Portal [27]. We use the total of both confirmed and suspected COVID-19 patients in hospital or ICU.

Using ABC on multiple parameters simultaneously produces a joint posterior distribution over all parameters. We simulate the model with each set of jointly-estimated values to produce estimated timeseries of all state variables, as well as to estimate the time-varying case fatality rate, CFR_t, and infection fatality rate, IFR_t at each model run. These are calculated as estimated deaths (D) over estimated cumulative observed infections (I) or estimated cumulative total infections (I+A), respectively, on date t. Specifically, we simulate the model over 100 jointly estimated parameter sets and 20 stochastic realizations for each set, resulting in 2000 total realizations. We pool together all simulated model trajectories and report their median and 95% credible intervals (CI) as the 2.5th/97.5th quantiles of realizations. This procedure quantifies uncertainty from two sources: variability due to posterior parameter distributions, and variability due to the stochastic variation between model runs with the same parameter values.
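The pooling and summary procedure can be sketched in Python as follows. This is an illustrative sketch rather than the study's code: `simulate` stands in for a full run of the epidemic model and is a hypothetical callable, and the percentile interpolation rule used by `statistics.quantiles` is an assumption, since the text does not specify one.

```python
import random
import statistics

def cfr_ifr(cum_deaths, cum_observed, cum_unobserved):
    """CFR = D / I and IFR = D / (I + A), using cumulative counts on a date."""
    cfr = cum_deaths / cum_observed
    ifr = cum_deaths / (cum_observed + cum_unobserved)
    return cfr, ifr

def pool_realizations(simulate, parameter_sets, n_stochastic, rng):
    """Run n_stochastic stochastic realizations for every parameter set."""
    return [simulate(theta, rng)
            for theta in parameter_sets
            for _ in range(n_stochastic)]

def summarize(values):
    """Median with the 2.5th and 97.5th percentiles of pooled realizations."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return statistics.median(values), cuts[0], cuts[-1]

# Toy stand-in for the epidemic model: 100 parameter sets x 20 realizations.
toy_sets = [0.01 * i for i in range(100)]
pooled = pool_realizations(lambda theta, rng: theta + rng.random(),
                           toy_sets, 20, random.Random(0))
median, lower, upper = summarize(pooled)
```

Pooling before summarizing means the reported interval reflects both posterior parameter uncertainty and run-to-run stochastic variation, matching the two sources of uncertainty named above.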

2.2 Risk model

The risk model produces estimates of the probability of disease progression (infection to hospitalization, ICU admission, and death), and the CFR and IFR, stratified across 54 risk profiles q ∈ Q, representing all combinations of the different levels of the risk factors age, comorbidities, obesity, and smoking status. The profile-stratified estimates are tied to the LAC population through multiple inputs: they are mean-centered on the overall population epidemic model estimated parameters α_t, κ_t, and δ_t, take into account the prevalence of each risk factor in the LAC population, and are designed to match the prevalence of each risk profile over infections and deaths in LAC. The profile-stratified probabilities of disease progression, CFR, and IFR are estimated through six steps, described below and summarized in terms of data inputs, modeling or analysis steps, and outputs or estimates in the flow diagram in Fig 2.

Fig 2 — The diagram shows data inputs, modeling or analysis steps, and outputs or estimates.

1. Estimate the population-average probability that individuals in LAC who acquire infection are admitted to hospital, $\hat{α_{t}}$ , who are in hospital require admittance to the ICU, $\hat{κ_{t}}$ , and who are in ICU will die, $\hat{δ_{t}}$ , using the epidemic model and Approximate Bayesian Computation as described in the previous section. We also use the epidemic model to estimate the timeseries of the numbers of the LAC population that are infected and observed, infected and unobserved, in hospital and ICU, and deaths.
2. Calculate conditional relative risk (RR) estimates for three risk factors (existing comorbidities, obesity, smoking) conditional on one another and on age for three models (m): hospitalization given infection, (H|I); ICU admission given hospitalization, (Q|H); and death given ICU admission, (D|Q). This is done using available data from published studies on the marginal RR of the four factors on COVID-19 illness at each stage of disease, the correlation between these factors in a population resembling that of LAC, and a statistical method called the Joint Analysis of Marginal summary statistics (JAM).
3. Estimate the prevalence of each risk profile q ∈ Q in the infected population over time. Data inputs to this step are the prevalence of each marginal risk factor in the overall LAC population, the correlation between the risk factors in a population resembling that of LAC, and the proportion of each marginal age group out of all COVID-19 infections in LAC.
4. Estimate the probability of disease stage progression across all risk profiles over time, $\underline{\hat{P_{t} (H | I)}}$ , $\underline{\hat{P_{t} (Q | H)}}$ , and $\underline{\hat{P_{t} (D | Q)}}$ such that they are mean centered on the probability estimates from the epidemic model $\hat{α_{t}}$ , $\hat{κ_{t}}$ , and $\hat{δ_{t}}$ , respectively. These estimates are produced by integrating $\hat{α_{t}}$ , $\hat{κ_{t}}$ , and $\hat{δ_{t}}$ (Step 1) with the corresponding conditional risk estimates for each model (H|I), (Q|H), and (D|Q) (Step 2), and the prevalence of each risk profile in the infected population (Step 3) within a logistic model. This step also produces estimates of the frequency of each profile q ∈ Q at each stage of disease. This is similar in spirit to epidemiologic approaches that combine risk estimates with baseline hazard rates from external sources to estimate absolute risk [28].
5. Estimate the conditional RR of age, conditional on the other three risk factors, by iteratively adjusting the conditional RR for each age group input to the logistic risk model (Step 4) until the frequency of each age group over deaths (an output from Step 4) matches the observed age-over-deaths distribution for LAC.
6. Estimate the CFR_q,t and IFR_q,t for each risk profile using the profile-stratified frequency in infections (Step 3) and deaths (Step 4), and the timeseries of observed infections, unobserved infections, and deaths for LAC overall (Step 1). This is produced by projecting the frequency of each risk profile at each stage of disease (Step 4) onto estimated timeseries of I, I + A, and D, and dividing to obtain the CFR_q,t and IFR_q,t ratios.

Below, we provide an overview of the methodology used in each step besides Step 1, which was described in Section 2.1. Further details on the methodologies employed in each step are provided in S1 Appendix, Part II. A summary of the mathematical notation used in the risk model is provided in Table 8 in S1 Appendix.

2.2.1 Step 2: Conditional RR for BMI, smoking, and comorbidities

The risk factors p ∈ P included in our analysis are age, body mass index (BMI), smoking status (smoking), and any comorbidity (comorbidity). The comorbidities included are diabetes, hypertension, chronic obstructive pulmonary disease (COPD), hepatitis B, coronary heart disease, stroke, cancer and chronic kidney disease. We modeled age and BMI as ordinal variables and assume an additive effect of both age and BMI on the three outcomes. Age was categorized into five groups: 0–18, 19–49, 50–64, 65–79, and 80+. BMI was categorized into three groups according to obesity classes: Class 1 (no obesity), $\mathrm{BMI} < 30\ \mathrm{kg/m^{2}}$ ; Class 2 (obesity), $30 \leq \mathrm{BMI} \leq 40\ \mathrm{kg/m^{2}}$ ; Class 3 (severe obesity), $\mathrm{BMI} > 40\ \mathrm{kg/m^{2}}$ . Any comorbidity and smoking status were modeled as binary variables.

We estimate the conditional RR for BMI, smoking, and comorbidity, conditional on age, for the three models (H|I), (Q|H), and (D|Q) using marginal RR estimates available from reported studies and a method called the Joint Analysis of Marginal Summary Statistics (JAM) [15]. JAM uses two pieces of information: (i) the marginal RR between risk factors and the outcome and (ii) a reference correlation structure between the risk factors, Σ. For information informing (i) we obtain the marginal log RR between individual risk factors and COVID-19 illness severity from peer-reviewed published COVID-19 studies [2, 3] (left column of Table 1). For (ii), we estimate the correlation structure Σ using individual-level data from the National Health and Nutrition Examination Survey (NHANES) from 2017–2018 [29], weighted by race/ethnicity proportions to create a population resembling that of LAC (Section 6 in S1 Appendix).

Table 1. The marginal relative risk of each stage of disease collected from published studies on COVID-19 and conditional relative risk estimated by the risk model for each risk factor on rates of hospitalization given infection, (H|I); ICU admission given hospitalization, (Q|H); and death given ICU admission, (D|Q) (95% credible interval).

The reference group is individuals with no comorbidity, $\mathrm{BMI} < 30\ \mathrm{kg/m^{2}}$ , and non-smoking.

Risk factor	Marginal RR (95% CI)	Conditional RR (95% CI)
(H|I)
Ordinal BMI	2.98 (2.61, 3.39)	1.82 (1.06, 3.15)
Smoker	1.40 (0.90, 2.17)	1.76 (0.21, 14.52)
Any comorbidity	3.18 (2.42, 4.18)	1.50 (0.59, 3.84)
(Q|H)
Ordinal BMI	1.01 (0.86, 1.18)	1.05 (0.65, 1.69)
Smoker	1.71 (0.87, 3.38)	1.61 (1.45, 1.79)
Any comorbidity	1.34 (0.87, 2.06)	1.02 (0.86, 1.20)
(D|Q)
Ordinal BMI	1†	1.12 (0.73, 1.71)
Smoker	1†	1.96 (1.33, 2.89)
Any comorbidity	1.64 (0.81, 3.32)	1.05 (0.78, 1.43)

†We set the marginal RR for ordinal BMI and smoker to 1 because we did not find a reported association between obesity class or smoking status and the likelihood of death given ICU admission, (D|Q), due to COVID-19 in the published literature.

Using the marginal summary statistics from (i), specifically the marginal log relative risks $ψ_{p,m}^{\mathrm{Marg}}$ for risk factor p and model m, JAM obtains conditional log relative risks $ψ_{p,m}^{\mathrm{Cond}}$ for each factor. To accomplish this, JAM first expresses the relationship between an outcome m (hospitalization given infection, ICU admission given hospitalization, or death given ICU admission) and the risk factors p ∈ P as a normal linear model, m ∼ N(Pψ, τ²I). For such a model, the conditional or adjusted estimates of effect are given by $\hat{ψ}^{\mathrm{Cond}} = (P'P)^{-1} P'm$ . To fit this model without access to individual-level data, we replace P′P with an estimate constructed from the correlation Σ between the risk factors, obtained from external NHANES data as specified in (ii). P′m defines the mean value of the outcome for each of the corresponding values of the risk factors; it can be constructed from the marginal log relative risks $ψ_{p,m}^{\mathrm{Marg}}$ and the frequencies of each risk factor in the population (see Section 5 in S1 Appendix for more details).
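The core linear-algebra identity behind this step can be illustrated with a short Python sketch. It is a simplified stand-in for JAM, not the method as published: it assumes centered risk factors, so that each marginal estimate equals P_j′m / P_j′P_j and therefore P′m can be rebuilt as diag(P′P) ψ^Marg, and it ignores the uncertainty quantification that JAM provides. The marginal RRs are the (H|I) values from Table 1; the correlation matrix and standard deviations are invented placeholders, not NHANES estimates.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def conditional_from_marginal(marg, corr, sd):
    """psi_cond = (P'P)^{-1} P'm, with P'P proportional to the covariance
    implied by `corr` and `sd`, and P'm rebuilt as diag(P'P) * psi_marg."""
    n = len(marg)
    cov = [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]
    rhs = [cov[j][j] * marg[j] for j in range(n)]
    return solve(cov, rhs)

# Marginal log RRs for (H|I) from Table 1: ordinal BMI, smoker, any comorbidity.
psi_marg = [math.log(2.98), math.log(1.40), math.log(3.18)]
# Hypothetical correlation structure and standard deviations (placeholders).
corr = [[1.0, 0.05, 0.30], [0.05, 1.0, 0.10], [0.30, 0.10, 1.0]]
sd = [0.7, 0.35, 0.5]
psi_cond = conditional_from_marginal(psi_marg, corr, sd)
rr_cond = [math.exp(v) for v in psi_cond]
```

When the risk factors are uncorrelated, the conditional and marginal estimates coincide; correlation between factors is what moves the conditional RRs away from their marginal counterparts.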

2.2.2 Step 3: Prevalence of each risk profile in the infected population

We estimate the time-varying frequency of each risk profile in the infected population, f_t,q,I. First, we estimate the frequency of the risk profiles q in the overall LAC population, l_q, by simulating a sample population based on the prevalence of each individual risk factor in LAC, and the weighted correlation structure between the risk factors Σ obtained from NHANES data from Step 2. The prevalence of each age group comes from the American Community Survey via the tidycensus R package [30]. The prevalence of obesity, smoking and all comorbidities besides cancer come from the Los Angeles County Health Survey (LACHS), study year 2018 [31]. The prevalence of cancer comes from the California Health Interview Survey (CHIS) [32]. Using the vector of prevalences of each risk factor, l_p, and correlation structure Σ, we generate a simulated population χ by sampling from a multivariate normal, χ ∼ N(x; l_p, Σ), where x is the number of samples. An "any comorbidity" variable is constructed as an indicator of whether any of the comorbidities is present. We then calculate the vector of the frequencies of each risk profile in the overall LAC population, l_q, as its relative frequency in the simulated population χ.
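One way to picture the simulation of a correlated population is the Python sketch below, restricted to two binary risk factors for brevity. The text does not spell out how the multivariate normal draws are converted into categorical risk factors, so the thresholding rule used here (a factor is present when its latent normal falls below the quantile matching its prevalence) is an assumption, as are the function name and all numerical values.

```python
import math
import random
from collections import Counter
from statistics import NormalDist

def simulate_profiles(prev_a, prev_b, rho, n, rng):
    """Relative frequency of each (factor_a, factor_b) profile in a
    simulated population with latent correlation rho."""
    cut_a = NormalDist().inv_cdf(prev_a)
    cut_b = NormalDist().inv_cdf(prev_b)
    counts = Counter()
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z_a = e1
        z_b = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2  # correlated with z_a
        counts[(z_a < cut_a, z_b < cut_b)] += 1
    return {profile: c / n for profile, c in counts.items()}

# Placeholder prevalences and correlation (not LACHS or NHANES values).
freq = simulate_profiles(prev_a=0.13, prev_b=0.40, rho=0.2,
                         n=20000, rng=random.Random(7))
prevalence_a = sum(f for (a, _), f in freq.items() if a)
```

The simulated marginal prevalences should reproduce the inputs up to sampling error, while the joint profile frequencies reflect the assumed correlation.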

Second, we use COVID-19 infection counts by age group [14] together with the estimated frequency of each risk profile in the overall LAC population, l_q, to estimate the frequency of each profile within the infected population on each date. We anchor our estimate of the frequency of each profile over infections on the frequency over ages, as age is the only risk factor with observed infection prevalence data in LAC.

Infection timeseries data by age group come from the LA Times. We use the age group infection numbers from the state of California because data for LAC are not available. We checked that the distribution for aggregate age groups in this California data resembles that reported by the LAC Department of Public Health [33]. To estimate the frequency of each risk profile q in the infected population, f_t,q,I, the frequency of each age group over infections is stratified across the risk profiles according to the relative frequency of each profile in the baseline LAC population.
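The stratification described above amounts to splitting each age group's share of infections across its risk profiles in proportion to the baseline profile frequencies. A minimal Python sketch, with hypothetical profile labels and placeholder frequencies, is:

```python
def profile_frequency_in_infections(age_freq_infected, baseline_profile_freq):
    """f_q,I = (age group share of infections) * l_q / (sum of l_q' in that age group).

    baseline_profile_freq maps (age_group, other_factors) -> l_q.
    """
    age_totals = {}
    for (age, _), l_q in baseline_profile_freq.items():
        age_totals[age] = age_totals.get(age, 0.0) + l_q
    return {
        (age, other): age_freq_infected[age] * l_q / age_totals[age]
        for (age, other), l_q in baseline_profile_freq.items()
    }

# Placeholder inputs for a single date (not LAC data).
baseline = {
    ("19-49", "none"): 0.20, ("19-49", "smoker"): 0.05,
    ("65-79", "none"): 0.10, ("65-79", "smoker"): 0.15,
}
age_share = {"19-49": 0.7, "65-79": 0.3}
f_infected = profile_frequency_in_infections(age_share, baseline)
```

By construction the profile frequencies within each age group sum to that age group's observed share of infections, which is the sense in which the estimate is anchored on age.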

2.2.3 Step 4: Risk-profile-stratified probabilities of disease stage progression

We construct a logistic model to estimate the probability of disease stage progression across all risk profiles over time, $\underline{\hat{P_{t} (H | I)}}$ , $\underline{\hat{P_{t} (Q | H)}}$ , and $\underline{\hat{P_{t} (D | Q)}}$ . Specifically, the model combines the 54 risk profiles as linear combinations of the risk factors specified in a mean-centered design matrix, X; their corresponding conditional log-RR obtained from JAM, $\underline{\hat{ψ}}$ ; and intercepts set to the estimated probabilities from the epidemic model (Section 2.1), $\hat{α_{t}}$ , $\hat{κ_{t}}$ , and $\hat{δ_{t}}$ , respectively. For example, to estimate the vector of probabilities of hospitalization given infection for all risk profiles we use $\underline{\hat{P_{t} (H | I)}} = \operatorname{expit}(\hat{α_{t}} + X \underline{\hat{ψ}})$ . The reference profile is individuals aged 19 − 49 with no comorbidity, $\mathrm{BMI} < 30\ \mathrm{kg/m^{2}}$ , and non-smoking.
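
The logistic step can be illustrated with a small sketch. Two points are assumptions of this sketch rather than statements from the text: the population-level probability is placed on the logit scale before being used as the intercept, and the design matrix is centered on the frequency-weighted mean of each column. The log-RR values and profile frequencies are placeholders.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def profile_probabilities(design, freq, log_rr, population_prob):
    """Progression probability for each profile from a mean-centered design.

    design: one row of 0/1 risk-factor indicators per profile
    freq: frequency of each profile at the current disease stage
    log_rr: conditional log relative risk per risk factor (psi)
    population_prob: population-level probability from the epidemic model
    """
    n_factors = len(log_rr)
    col_means = [sum(f * row[j] for f, row in zip(freq, design)) for j in range(n_factors)]
    intercept = logit(population_prob)  # ASSUMPTION: intercept on the logit scale
    probs = []
    for row in design:
        linear = intercept + sum((row[j] - col_means[j]) * log_rr[j] for j in range(n_factors))
        probs.append(expit(linear))
    return probs


# Four hypothetical profiles: columns are (obesity, any comorbidity).
design = [(0, 0), (1, 0), (0, 1), (1, 1)]
freq = [0.5, 0.2, 0.2, 0.1]
log_rr = [math.log(1.4), math.log(2.0)]  # placeholder effect sizes
p_h_given_i = profile_probabilities(design, freq, log_rr, population_prob=0.153)
```

Centering means that the frequency-weighted average of the linear predictor equals the intercept, so the profile-level estimates stay anchored to the population-level estimate from the epidemic model.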

The mean-centered design matrix is based on the frequency of each risk profile at each stage of disease. The frequency of each profile in infections comes from Step 3. The frequency at subsequent stages is calculated recursively for each stage of disease (in hospital, in ICU, and deceased) using the frequencies in the previous stage of disease and the calculated incoming probabilities.
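
The recursion over disease stages can be sketched as below, assuming that the frequency of a profile in the next stage is proportional to its frequency in the current stage multiplied by its probability of progressing. All values are placeholders.

```python
def next_stage_frequency(freq_current, prob_progress):
    """Profile frequencies in the next disease stage, renormalized to sum to 1."""
    weighted = [f * p for f, p in zip(freq_current, prob_progress)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Three hypothetical profiles ordered from lowest to highest risk.
freq_infected = [0.6, 0.3, 0.1]
p_h_given_i = [0.05, 0.15, 0.40]
p_q_given_h = [0.10, 0.20, 0.35]
p_d_given_q = [0.30, 0.50, 0.70]

freq_hospitalized = next_stage_frequency(freq_infected, p_h_given_i)
freq_icu = next_stage_frequency(freq_hospitalized, p_q_given_h)
freq_deceased = next_stage_frequency(freq_icu, p_d_given_q)
```

In this toy example the highest-risk profile accounts for 10% of infections but a progressively larger share of each later stage, which is the pattern the recursion is designed to capture.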

2.2.4 Step 5: Conditional RR for age

Rather than estimate the conditional RR for age using the same methodology as for the other factors described in Step 2, we estimate the conditional RR of each age group separately, since we have observed data on the distribution of each age group over deaths for LAC. Given this information, we aim to find the solution set that minimizes the distance between the distribution over deaths produced by the logistic model and the observed distribution. Specifically, we choose the conditional RR for age such that the distance between the frequency of each age group over deaths produced in Step 4 and the observed distribution of each age group over deaths in LAC is minimized. This is done through an iterative optimization process in which we vary, over a wide search space, the conditional RR for each age group for each model that is input to the logistic model (Step 4) and find the minimizing values for the conditional RR.
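
The search can be illustrated with a toy exhaustive grid search. The distance measure (sum of squared differences), the candidate grid, and the simplified uncentered logistic chain are all assumptions of this sketch; the text does not specify the optimizer or distance used. The same RR is applied at every stage for each age group.

```python
import itertools
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def death_age_distribution(infected_freq, age_rr, stage_probs):
    """Share of deaths by age group implied by a set of age RRs."""
    weights = []
    for freq, rr in zip(infected_freq, age_rr):
        weight = freq
        for p in stage_probs:  # (H|I), (Q|H), (D|Q) for the reference group
            weight *= expit(logit(p) + math.log(rr))
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]


def fit_age_rr(infected_freq, observed_death_freq, stage_probs, candidates):
    """Grid search for the age RRs; the first age group is the reference (RR = 1)."""
    best_rr, best_loss = None, float("inf")
    for combo in itertools.product(candidates, repeat=len(infected_freq) - 1):
        age_rr = (1.0,) + combo
        modelled = death_age_distribution(infected_freq, age_rr, stage_probs)
        loss = sum((m - o) ** 2 for m, o in zip(modelled, observed_death_freq))
        if loss < best_loss:
            best_rr, best_loss = age_rr, loss
    return best_rr, best_loss


# Synthetic check: generate "observed" deaths from known RRs, then recover them.
infected_freq = [0.6, 0.3, 0.1]
stage_probs = (0.10, 0.20, 0.30)
true_rr = (1.0, 2.5, 8.0)
observed = death_age_distribution(infected_freq, true_rr, stage_probs)
candidates = [0.25 * k for k in range(1, 81)]  # 0.25, 0.50, ..., 20.0
fitted_rr, fitted_loss = fit_age_rr(infected_freq, observed, stage_probs, candidates)
```

Because the synthetic deaths are generated from RRs that lie on the grid, the search recovers them exactly; with real data the minimum would generally be non-zero and the grid resolution would limit precision.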

2.2.5 Step 6: Risk-profile-stratified CFR_q,t and IFR_q,t

To calculate the time-varying CFR_q,t and IFR_q,t for each risk profile, the estimated frequencies of each profile in the infected population and in the deceased population (obtained from Step 4) are multiplied by the estimated cumulative number of observed infections (I) or total infections (I+A), and deaths (D), respectively, obtained from each realization of the epidemic model. We find the CFR_q,t and IFR_q,t for each model realization as the number of deaths over observed infections, and the number of deaths over total infections, respectively. Repeating this calculation across the 2000 model realizations and computing summary statistics yields the median and 95% CI. This process therefore accounts for the uncertainty in the estimated parameters and the stochasticity in the epidemic model, but not for uncertainty in the risk model estimates.
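
A minimal sketch of this calculation for one risk profile on one date follows. The realizations here are random placeholders rather than output of the epidemic model, and the percentile helper uses simple linear interpolation; both are assumptions made for illustration.

```python
import random
import statistics


def percentile(ordered, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (pos - low) * (ordered[high] - ordered[low])


def profile_fatality_rates(freq_infected, freq_deceased, realizations):
    """CFR and IFR of one profile for each (I, A, D) realization."""
    cfr, ifr = [], []
    for observed, unobserved, deaths in realizations:
        profile_deaths = freq_deceased * deaths
        cfr.append(profile_deaths / (freq_infected * observed))
        # The same profile frequency is applied to unobserved infections.
        ifr.append(profile_deaths / (freq_infected * (observed + unobserved)))
    return cfr, ifr


def summarize(values):
    ordered = sorted(values)
    return statistics.median(ordered), percentile(ordered, 0.025), percentile(ordered, 0.975)


# Placeholder realizations of cumulative I, A, and D (hypothetical values).
rng = random.Random(1)
realizations = []
for _ in range(2000):
    observed = rng.uniform(90_000, 110_000)
    unobserved = observed * rng.uniform(2.0, 5.0)
    deaths = rng.uniform(2_500, 3_500)
    realizations.append((observed, unobserved, deaths))

cfr, ifr = profile_fatality_rates(0.05, 0.30, realizations)
cfr_median, cfr_low, cfr_high = summarize(cfr)
ifr_median, ifr_low, ifr_high = summarize(ifr)
```

The interval for the IFR reflects the spread in both deaths and unobserved infections across realizations, while the profile frequencies are held fixed, mirroring the statement that uncertainty in the risk model is not propagated.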

3 Results

3.1 Model and parameter estimates

3.1.1 Model fits

Fig 3 summarizes the epidemic model fit with COVID-19 data for LAC from March 1, 2020 through March 1, 2021 for all disease states across multiple views: New cases, representing new daily incidence; the current number in a compartment at a specific date, relevant for understanding current prevalence rates and comparing with healthcare capacity limitations; and cumulative counts until a specific date. Observed data for available compartments are plotted as black dots, evidencing the day-to-day variability in case and death counts. The figure demonstrates that good model fits are achieved in all compartments across time. Close-ups of the timeseries of the numbers in infection states plotted against available data are provided in Section 3.1 in S1 Appendix.

Section 3.2 in S1 Appendix provides values of the five estimated parameters at two-week intervals from March 1, 2020 through March 1, 2021. The two-step parameter estimation approach (broad grid search to select a single mode of each parameter, followed by approximate Bayesian computation (ABC) using a prior distribution specified around that single mode) achieved convergence in posterior densities. Convergence is not reached for the broad grid search step, with multi-modal distributions returned for each parameter (not shown). By specifying a narrow prior distribution around a mode chosen from the broad grid search sampling, convergence around a dominant single mode is achieved in the final posterior density returned by the ABC sampling step (see Section 3.3 in S1 Appendix for density plots of prior and posterior distributions).
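
The two-step idea can be illustrated on a toy problem. The sketch below uses a one-parameter exponential growth simulator and a plain rejection sampler; the simulator, distance, tolerance, and prior width are all hypothetical choices made for illustration and are not the study's epidemic model or ABC settings.

```python
import math
import random
import statistics


def simulate(rate, n_days, rng=None, noise_sd=0.0):
    """Toy daily counts growing exponentially, with optional multiplicative noise."""
    series = []
    for day in range(n_days):
        noise = rng.gauss(0.0, noise_sd) if rng is not None else 0.0
        series.append(10.0 * math.exp(rate * day) * (1.0 + noise))
    return series


def distance(simulated, observed):
    """Mean absolute relative error between two series."""
    errors = [abs(s - o) / o for s, o in zip(simulated, observed)]
    return sum(errors) / len(errors)


def coarse_grid_mode(observed, grid):
    """Step 1: choose the grid value whose noise-free simulation fits best."""
    return min(grid, key=lambda rate: distance(simulate(rate, len(observed)), observed))


def abc_rejection(observed, mode, half_width, n_draws, tolerance, seed=0):
    """Step 2: rejection ABC with a narrow uniform prior around the chosen mode."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(mode - half_width, mode + half_width)
        simulated = simulate(rate, len(observed), rng=rng, noise_sd=0.02)
        if distance(simulated, observed) < tolerance:
            accepted.append(rate)
    return accepted


observed = simulate(0.2, n_days=10)
mode = coarse_grid_mode(observed, grid=[k / 20 for k in range(1, 11)])
posterior = abc_rejection(observed, mode, half_width=0.05, n_draws=5000, tolerance=0.05)
posterior_mean = statistics.fmean(posterior)
```

Restricting the prior to a neighbourhood of one mode trades generality for a posterior that concentrates around a single solution, which is the trade-off described in the paragraph above.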

3.1.2 Epidemic timecourse in LAC

The LA City Mayor’s Office distinguishes between three stages of the COVID-19 epidemic in LA City and County relating to policy response measures implemented following the orders of the County Health Officer: Stage I, March 19—May 7: the initial shutdown; Stage II, May 8—June 11: the first steps towards reopening; Stage III, June 12 and beyond: greater reopening followed by “modifications” closing higher risk settings (including bars and indoor seating in restaurants) [34, 35]. The start of the school year on August 18, although virtual, marked a change in activity level and is also depicted. We characterize three waves of the epidemic occurring across these stages: a first wave, March 1—May 6, 2020, occurring between Stage I and the beginning of Stage II and peaking on April 1; a second and larger wave, May 7—October 14, 2020, beginning with Stage II and peaking on July 30 during Stage III; and a third and more than five times larger wave that began on October 15, 2020 and subsided by March 1 2021. Fig 4b–4f characterize the estimated model parameters relative to these policy stages and epidemic waves; a full time course of the epidemic and policy decisions in LAC can be found at [36].

Fig 4. Model-estimated median curves are plotted along with the 50% CI (dark shading) and 95% CI (light shading). (a) R_t, the time-varying reproduction number. (b) R_eff,t, the effective reproduction number. (c) r_t, the proportion of infections observed. (d) α_t, the probability of hospitalization given infection; κ_t, probability of ICU admission given hospitalization; and δ_t, probability of death given admission to the ICU. (e) Population-average case fatality rate, *CFR*_t. (f) Population-average infection fatality rate, *IFR*_t.

We estimate that for most of the first wave, which coincides with Stage I, the overall observation rate was r_t = 0.19 (95% CI: 0.12, 0.26) of all infections observed. Beginning in mid-April 2020, the observation rate steadily increased through the second wave and Stages II and III until levelling off at r_t = 0.5 (0.34, 0.64) of infections observed by August 15, 2020. In the initial period of the outbreak, before public behavior began to change and policy interventions were implemented, we estimate the basic reproduction number in LAC was R0 = 3.69 (3.6, 3.82). From March 12 to March 27, 2020, beginning just before the Stage I lockdown was implemented, we estimate a reduction to an R_t of 0.88 (0.77, 0.95). The corresponding reduction in transmission led to a levelling off at 36,000 (14,000, 79,000) estimated current total infections (including observed and unobserved) on April 2, 2020.

R_t remained below 1 until the end of April, reaching 1.26 (1.06, 1.39) just as the Stage II reopening began. The rise of R_t above 1 facilitated the increase in infections and the second wave of the epidemic, which peaked at 105,000 (41,000, 200,000) current total infections. Following the decrease in the susceptible population due to the sizeable number of cases accrued from the second wave, R_eff,t began to diverge appreciably from R_t. By mid-July R_eff,t had dropped below 1, whereas R_t based on behavior alone took two weeks longer to drop below 1. R_eff,t reached its lowest value of 0.76 (0.58, 0.91) in mid-August, around the time the school year began virtually, where it remained through mid-October.
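
The divergence between the two reproduction numbers can be illustrated under the standard compartmental-model relationship in which the effective reproduction number equals R_t multiplied by the susceptible fraction of the population. That relationship is assumed here, since this section does not state the formula, and the population size and infection counts are round placeholder values.

```python
def effective_reproduction_number(r_t, susceptible, population):
    """ASSUMPTION: R_eff,t = R_t * S_t / N, the standard SIR-type relationship."""
    return r_t * susceptible / population


def susceptible_fraction_threshold(r_t):
    """Susceptible fraction below which R_eff,t < 1 for a given R_t."""
    return 1.0 / r_t


population = 10_000_000  # round placeholder population size
r_behavior = 1.05        # hypothetical R_t based on behavior alone

r_eff_by_depletion = {
    cumulative_infected: effective_reproduction_number(
        r_behavior, population - cumulative_infected, population
    )
    for cumulative_infected in (0, 500_000, 1_000_000, 2_000_000)
}
threshold_at_2_4 = susceptible_fraction_threshold(2.40)
```

With no prior infections the two quantities coincide; as the susceptible pool shrinks, the effective value falls below the behavior-based value. Under this assumed relationship, an R_t of 2.40 is compatible with an effective value below 1 only when less than roughly 42% of the population remains susceptible.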

Following the further re-opening of personal service businesses, malls, and outdoor drinking establishments in late September through October, the third wave began. We estimate an R_t (based on behavior alone) of 2.03 (1.87, 2.22) from October 15, 2020 to January 1, 2021, and a final spike of 2.40 (2.15, 2.55) from January 1 to January 5, before R_t decreased to 1.06 (0.97, 1.17) by January 15. Meanwhile, with soaring infections, R_eff,t dropped back below 1 by January 5, 2021, even as the R_t based on behavior alone was spiking at values above 2. R_eff,t remained below 1 through March 1, 2021. Current total infections peaked at approximately 580,000 (275,000, 850,000) on January 13, meaning that at this time over 5% of the LAC population was currently infected, compared with just over 1% of the population with current observed infections. By March 1, 2021, current observed infections had dropped back down to the pre-surge levels of October 15, 2020. When unobserved cases are accounted for, we estimate that 40%–60% of the LAC population had been infected with COVID-19 by March 1, 2021.

We identify three phases of the probabilities of disease stage progression, α_t, κ_t, and δ_t, across the three waves. The highest values of α_t and κ_t were observed during the first wave, which as we will see below was marked by a proportionally higher fraction of infections from the higher-risk elderly population, in particular those 80+, many of whom were residents of skilled nursing facilities (SNFs) or care homes; in this wave α_t = 0.349 (0.322, 0.38), κ_t = 0.327 (0.309, 0.348), and δ_t = 0.506 (0.452, 0.565). The probabilities of hospitalization and ICU admission decreased starting at the beginning of May as infections started to increase, this time with a higher proportion coming from younger, lower-risk populations. From mid-May through mid-October, 2020, α_t = 0.153 (0.118, 0.19), κ_t = 0.196 (0.143, 0.234), and δ_t = 0.526 (0.503, 0.543), with a brief increase to values 0.184 (0.142, 0.228), 0.246 (0.179, 0.293), and 0.579 (0.553, 0.597), respectively, during the height of the second wave in mid-late July. With increasing infections in the third wave, α_t and κ_t further decreased to values 0.1 (0.077, 0.124) and 0.157 (0.114, 0.187), respectively. Meanwhile, δ_t increased to its highest value yet of 0.737 (0.704, 0.76), suggesting that while the infected population had become either lower risk for hospitalization or less inclined to seek treatment, the probability of survival was low for those reaching the final stages of hospitalized care. Despite the drop in α_t during the third wave, the surge of infections meant that hospital capacity was surpassed for the first time during this wave (Fig 4 in S1 Appendix). ICU capacity was not surpassed at any point, although it was approached during the third wave.

The population-wide CFR_t and IFR_t are also characterized by three phases: peaking at the beginning of May 2020 before the probabilities of disease progression began to drop, leveling out through the summer of 2020, and decreasing further with the third wave as the initial influx of cases outpaced the deaths, which later started to catch up. On May 15, 2020, marking the majority of deaths that could have come from the end of the first epidemic wave, CFR_t = 5.56% (4.35%, 6.3%) and IFR_t = 1.1% (0.41%, 1.81%); on October 15, 2020, marking the majority of deaths that could have come from the end of the second epidemic wave, CFR_t = 2.74% (2.08%, 3.39%) and IFR_t = 0.55% (0.22%, 0.96%); and on March 1, 2021, marking the majority of deaths that could have come from the end of the third wave, CFR_t = 1.65% (1.29%, 2.06%) and IFR_t = 0.32% (0.16%, 0.55%). We report values at the end of each wave to allow the number of deaths from each wave to catch up with the number of infections.

3.2 Conditional relative risks (RR) for risk factors

Table 1 displays, as mean and 95% CI, the marginal relative risks (RR) extracted from the literature (left column) and the RR for BMI, smoking, and comorbidity conditional on age estimated by JAM for the three risk models representing increasing disease progression: hospitalization given infection, (H|I); ICU admission given hospitalization, (Q|H); and death given ICU admission, (D|Q). We observe that the independent effects of comorbidities and obesity attenuate with increasing severity of disease; the effect of smoking may increase with severity, although a very wide confidence interval for (H|I) makes this conclusion tentative.

Separately, the conditional RR for each age group, estimated such that the distribution of age groups in the infected and deceased populations produced by the model matches that observed in LAC, is shown in Table 2. Our modeling approach estimates that the conditional risk of advancing to the next stage of illness relative to the reference population is equivalent across the three models, i.e. (H|I) = (Q|H) = (D|Q); we report this single set of values for each age category in Table 2. The independent effect of age is much higher than that of any other risk factor, and increases exponentially with age. Fig 11 in S1 Appendix shows the frequency of the age groups in the deceased population estimated by the model compared with the observed frequency, i.e. the distributions featured in the objective function used to estimate the conditional RR for the age groups for the three models in Step 5 of the risk model methodology.

Table 2. The estimated conditional relative risk (RR) for each age group relative to the reference age group of 19–49.

The estimated conditional RR of advancing to the next stage of illness is equivalent across the three models, (H|I), (Q|H), and (D|Q). The RR are conditional on the other risk factors and estimated from LAC infection and death data stratified by age, using the combination of the epidemic and risk models.

Age group	(H|I) = (Q|H) = (D|Q)
0 − 18	0.14
19 − 49	1
50 − 64	2.59
65 − 79	6.69
80+	18.17

3.3 Risk-profile-stratified probabilities of disease stage progression

Tables 4, 9–11 in S1 Appendix show the estimated probabilities of disease stage progression for each of the three models, $\underline{\hat{P_{t} (H | I)}}$ , $\underline{\hat{P_{t} (Q | H)}}$ , and $\underline{\hat{P_{t} (D | Q)}}$ , as well as the estimated frequency in the overall LAC population, stratified across each risk profile characterizing a unique combination of age group, BMI range, smoking status, and any comorbidity. Profile-stratified probabilities are shown for dates every two weeks from May 15, 2020 through March 1, 2021. The probabilities of hospitalization given infection, ICU admission given hospitalization, and death given ICU admission vary extensively across the risk profiles. Notably, the risks within specific marginal factor groups also vary extensively. To demonstrate the variability in disease stage progression probability across risk profiles, we show in Fig 5a–5c the range of values that can be taken on by profiles falling within each age group. Specifically, these figures show the mean (as a point), and minimum and maximum (error bar) of the probabilities across the composing risk profiles within each age group 0 − 18, 19 − 49, 50 − 64, 65 − 79, 80+, under each of the three risk models. The figures demonstrate that the difference in probability between profiles within an age group may be wider than the difference across adjacent age groups, in particular for the probability of hospitalization given infection $\underline{\hat{P_{t} (H | I)}}$ ; the factor increase between adjacent age groups ranges from 1.25 to 2, while the ratio of the maximum to the minimum profile within an age group ranges from two to four.
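
The within-age-group summary shown in the figures can be sketched as follows. The profile labels and probabilities are placeholders chosen for illustration, not values from the S1 Appendix tables.

```python
from collections import defaultdict
from statistics import fmean


def summarize_by_age(profile_probs):
    """Mean, min, max, and max/min ratio of profile probabilities per age group."""
    grouped = defaultdict(list)
    for (age, _label), prob in profile_probs.items():
        grouped[age].append(prob)
    return {
        age: {
            "mean": fmean(values),
            "min": min(values),
            "max": max(values),
            "max_over_min": max(values) / min(values),
        }
        for age, values in grouped.items()
    }


# Placeholder probabilities of hospitalization given infection.
profile_probs = {
    ("19-49", "no risk factors"): 0.05,
    ("19-49", "obesity"): 0.10,
    ("19-49", "obesity + comorbidity"): 0.20,
    ("50-64", "no risk factors"): 0.08,
    ("50-64", "obesity"): 0.16,
    ("50-64", "obesity + comorbidity"): 0.30,
}
summary = summarize_by_age(profile_probs)
adjacent_factor = summary["50-64"]["mean"] / summary["19-49"]["mean"]
```

Comparing `max_over_min` within an age group against `adjacent_factor` between neighbouring age groups is the comparison made in the last sentence of the paragraph above.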

Fig 5. Each figure shows the mean (as a point), and minimum and maximum (error bar) probability for each age group 0 − 18, 19 − 49, 50 − 64, 65 − 79, 80+, under each risk model. (a) Range of the probabilities of ICU admission given hospitalization across each risk profile, $\underline{\hat{P_{t} (Q | H)}}$ , summarized for each age group. (b) Range of the probabilities of death given ICU admission across each risk profile, $\underline{\hat{P_{t} (D | Q)}}$ , summarized for each age group. (c) Range of probabilities of disease stage progression for the three models $\underline{\hat{P_{t} (H | I)}}$ , $\underline{\hat{P_{t} (Q | H)}}$ , and $\underline{\hat{P_{t} (D | Q)}}$ across all risk profiles within each age group.

3.4 Risk-profile-stratified CFR_q,t and IFR_q,t

Tables 12 and 13 in S1 Appendix show the median and the 95% CI of the estimated risk-profile-stratified case fatality rates CFR_q,t and infection fatality rates IFR_q,t across dates every two weeks from May 15 2020—March 1 2021. To facilitate interpretation of the variability in these quantities across risk profiles, we show in Fig 6a and 6b the range of values that can be taken on by profiles falling within each age group; the mean (point), and minimum and maximum (error bar) of the median CFR_q,t and IFR_q,t across the composing risk profiles within each age group are shown. The maximum IFR_q,t for each age group comes from individuals with at least one comorbidity, a smoking history, and severe obesity, while the minimum comes from individuals that have no comorbidities, do not smoke, and have a healthy BMI.

As with the probabilities of disease progression, the CFR_q,t and IFR_q,t vary extensively both across age groups and across profiles representing different combinations of risk factors within a given age group. The factor differential between the risk profiles decreases with age, varying from a 3- to 30-fold difference across profiles within each age group. There are also changes in the CFR_q,t and IFR_q,t across epidemic waves for the higher age groups. On May 15, 2020, marking the end of deaths from the first wave, median IFR_q,t ranged from 0.01% (0.006%, 0.019%) to 0.27% (0.14%, 0.44%) across profiles for ages 19 − 49; on March 1, 2021, marking the end of deaths from the third wave, the IFR_q,t for 19 − 49 remained the same. For ages 50 − 64 on May 15, 2020, median IFR_q,t ranged from 0.14% (0.075%, 0.23%) to 1.9% (1.0%, 3.14%); these values remained approximately constant through March 1, 2021. For ages 65 − 79 on May 15, 2020, median IFR_q,t ranged from 1.25% (0.65%, 2.04%) to 7.7% (4.02%, 12.6%); the minimum and maximum median IFR_q,t values decreased to 0.90% (0.48%, 1.41%) and 4.22% (2.23%, 6.65%), respectively, by March 1, 2021. For ages 80+ on May 15, 2020, the IFR_q,t ranged from 6.5% (3.4%, 10.7%) to 18.3% (9.5%, 29.9%), but by March 1, 2021, the minimum and maximum had dropped to 3.5% (1.9%, 5.5%) and 8.3% (4.4%, 13.3%), respectively.

We also plot the estimated median and the 95% CI for the most prevalent risk profile within each age group in Fig 6c and 6d, to compare with the extremes presented in Fig 6a and 6b. The CI for the IFR_q,t are wider than for the CFR_q,t, since the former account for uncertainty in the infection observation rate, r_t. Although wide, the CI are non-overlapping between the most-prevalent profiles for each age group visualized in the figures.

3.5 Frequency of risk factor groups at each stage of disease

Fig 7 shows the model estimated frequency of risk factor groups (age groups, obesity class groups, any comorbidity, and smoking) in the population of individuals in each stage of disease (infected, hospitalized, admitted to ICU, and deceased), compared with their frequency in the overall LAC population at two week intervals from May 15, 2020 through March 1, 2021.

The figure illustrates the effect of each risk factor on its frequency in the population at each stage of disease. Age greater than that of the reference group of 19 − 49 years has the largest effect on each probability of disease progression and the most apparent increase in frequency at each stage of disease. Both comorbidities and obesity have a much larger effect on the probability of hospitalization given infection than on death given hospitalization, which corresponds to a large increase in the proportion of individuals with comorbidities and obesity between the infected and hospitalized populations, but little difference between the hospitalized, in-ICU, and deceased populations. Smoking shows the smallest effect of all the risk factors, in part because this risk factor is infrequently represented in the risk profiles more prevalent in the infected population.

Most notably, Fig 7a makes apparent the decrease in the frequency of higher age groups in the infected population between the first and third epidemic waves, and the effect this change had on the composition of the hospitalized, in-ICU, and deceased populations. For example, individuals 80+ show a steep increase in the fraction of the population progressing to each stage of disease; between May 15, 2020 (marking the end of deaths that could have come from the end of the first epidemic wave), and March 1, 2021 (marking the end of deaths that could have come from the third wave), the observed frequency of individuals 80+ in the infected population decreased from 10% to 3% and the prevalence of this group in the hospitalized population changed from 25% to 12.5%, in the in-ICU population from 50% to 35%, and in the deceased population from 65% to 40%. Meanwhile, individuals aged 19 − 49 show a steep decrease in the fraction of the population as disease stage progresses; increasing from 50% to 62% of the infected population, 24% to 36% of the hospitalized population, 7% to 12% of the in-ICU population, and 2% to 6% of the deceased population between May 15, 2020, and March 1, 2021.

4 Discussion

This work has developed a framework for using available data on COVID-19 epidemic dynamics and prevalences of COVID-19 risk factors at the population level to estimate time-varying subpopulation-stratified probabilities of disease progression and CFR and IFR during three epidemic waves in Los Angeles County from March 1, 2020, through March 1, 2021. In the absence of individual-level data, the technical contribution of this work was to integrate a dynamic epidemic model with a risk modeling approach to estimate conditional effects from available marginal data and to subsequently produce time-varying subpopulation-stratified estimates for LAC. To reflect the uncertain knowledge of many parameters and the understanding that in non-linear systems small variations to specific parameters can result in large impacts in outputs [37], we account for uncertainty in all results through the use of a stochastic epidemic model and a Bayesian approach to parameter estimation. The epidemic modeling framework produces estimates with confidence intervals of the population-wide reproductive number, case observation fraction, probabilities of disease progression, and CFR and IFR. On its own, the risk model estimates the conditional effects of each risk factor and therefore the overall effect of risk factors in combination. These adjusted effects have not been typically reported in observational studies on COVID-19, yet help to understand more precisely what subpopulations are at highest risk of advancing to each stage of disease. Integration of the risk model with the epidemic model allows the comparison of dynamic outcomes and parameters across the overall population, age groups, and more fine-grained subpopulations in LAC representing age and combinations of other risk factors for severe COVID-19 illness. Such fine-grained results can be useful in understanding disparities in the effect of the epidemic on different groups in LAC, and can inform studies involving targeted subpopulation-level policy interventions [16].

We focus our modeling framework on the risk factors age, comorbidities, obesity, and smoking status, as these demographic and medical conditions have consistently been identified across various studies as factors increasing the probability of progressing to severe illness given COVID-19 infection [8]. We do not include race/ethnicity as a factor because, although it is strongly predictive of the risk of overall mortality from COVID-19 [17], it has been shown that increased exposure risk and not race per se explains racial disparities in COVID-19 health outcomes [38, 39].

Analyses demonstrate that the risk of severe illness and death from COVID-19 infection has decreased over time and moreover varies tremendously across subpopulations representing combinations of the four modeled risk factors, which we call risk profiles, suggesting that it is inappropriate to summarize epidemiological parameters for the entire population and epidemic time period. This includes variation not only across age groups, but also within age strata combined with other risk factors analyzed in this study. The highest IFR for each age stratum comes from profiles including comorbidities, obesity Class 2 or 3, and current smoking status. The factor differential between the risk profiles with the highest and lowest IFR within each age stratum decreases with age. At the end of the first epidemic wave, we find median IFR ranging from 0.01% to 0.27% across risk profiles for the age group 19 − 49, an almost 30-fold difference; from 0.14% to 1.9% across profiles within age group 50 − 64, a 14-fold difference; from 1.25% to 7.7% for ages 65 − 79, a 6-fold difference; and from 6.5% to 18.3% for ages 80+, a 3-fold difference.
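
The fold differences quoted above follow directly from the reported minimum and maximum median IFR within each age group, as the short calculation below shows. Only the values stated in the text are used; the variable names are illustrative.

```python
# Minimum and maximum median IFR (%) across risk profiles at the end of the
# first wave, as reported in the text.
ifr_range_pct = {
    "19-49": (0.01, 0.27),
    "50-64": (0.14, 1.9),
    "65-79": (1.25, 7.7),
    "80+": (6.5, 18.3),
}

fold_difference = {
    age: highest / lowest for age, (lowest, highest) in ifr_range_pct.items()
}
rounded = {age: round(value, 1) for age, value in fold_difference.items()}
```

The ratios are approximately 27, 13.6, 6.2, and 2.8, consistent with the rounded figures in the paragraph and with the statement that the within-age differential narrows as age increases.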

Our age-stratified IFR estimates during dates corresponding to the first and second epidemic waves in LAC (May to October, 2020) are comparable to those found in recent notable reviews and modeling studies, including a meta-regression of seroprevalence data from 11 European countries and 12 U.S. locations [8], a study comparing mortality data from 45 countries with 22 seroprevalence studies [40], and a model-based analysis estimating IFR during New York City's large first epidemic wave (March to May, 2020) [10].

A feature of our IFR estimates for the higher age groups (65+) is that they decreased in the third epidemic wave; median IFR for ages 65 − 79 ranged from 1.25% to 7.7% after the first wave compared with 0.90% to 4.22% after the third wave; for ages 80+ the median IFR ranged from 6.5% to 18.3% after the first wave and had dropped to a range of 3.5% to 8.3% after the third wave. The decrease in IFR for the same profiles during the third wave may be explained by three factors. First, the first and second waves were characterized by a large number of outbreaks in nursing homes/SNFs; it has been demonstrated that when high rates of infection have occurred among nursing home residents, IFRs for the same age group (and the overall population-average IFR) will be significantly greater than when cases in care-home-aged populations have been in the general community, due to greater frailty in care home populations [40]. Second, there may have been improvements in medical treatment over the course of the epidemic [41, 42]. Third, a limitation of our model-based analysis for older risk strata is that we assume unobserved infections are equally distributed across all risk profiles, whereas there are likely to be far fewer unobserved or asymptomatic infections for those at higher risk of severe outcomes. For risk profiles including individuals age 65+, and for dates during the third wave when the number of infections spiked especially among younger age groups, our IFR may therefore be underestimated and the true values may lie between the IFR and CFR estimates. For the most prevalent profile within age group 65 − 79, the CFR was 6.2% (95% CI: 4.2%, 9.3%), and for the most prevalent profile within age group 80+, the CFR was 21.7% (14.9%, 32.3%). More generally, in interpreting our results for policy implications, emphasis should be placed on the relative differences in IFR across risk profiles and the understanding that the IFR for a specific age stratum represents an average across a wide variation given the presence or absence of other risk factors.

Our overall IFR estimate for LAC at the end of the first epidemic wave of 1.11% (95% CI: 0.41%, 1.81%) is similar to the overall IFR estimated in the NYC study after the first large wave, when only confirmed deaths are accounted for, of 1.10% [10]. IFR estimates at the end of the second wave are similar to the global estimate of 0.5% as of September, 2020 from the study by O'Driscoll et al. (2020) utilizing mortality and seroprevalence data across 45 countries [40]. The decrease in IFR between the first and second waves follows from the decrease in the prevalence of populations aged 65+ in observed infections from approximately 23% on April 15, 2020 to 12% and lower as of July 15, 2020 [14], but may also reflect other changes in the demographic composition of infected individuals, including other at-risk subpopulations for which stratified data in LAC are not available (e.g., individuals with comorbidities). The further decrease in overall IFR at the end of the third wave, to 0.32% (0.16%, 0.55%), may reflect the decrease in age groups 65+ in the observed infected population from 12% on July 15, 2020, to 10% on December 12, 2020, and a proportional increase in unobserved infections from younger age groups.

Our estimates may misrepresent the true IFR from COVID-19 in LAC because we account only for underascertainment of infections and not of deaths [43, 44]. Although we assume that the underascertainment of deaths is much lower than for infections, the percentage is likely to have been the highest during the third wave given that the percentage of documented at-home deaths increased from 4% in the first wave to 9% in the third wave (E. Garcia, personal communication based on unpublished data from the state of California, April 20, 2021); this may most affect estimates for older age groups in which unaccounted for deaths are likely to be the highest [40, 45]. Even at the lowest overall IFR estimated for LAC, a key finding is that COVID-19 is substantially more deadly than seasonal influenza, which has a population-average IFR of approximately 0.05% [8, 46].

A critical factor determining our IFR estimates is the fraction of infections that are detected, r_t. We estimate that this was 19% (95% CI: 12%, 26%) in the first wave of the outbreak and had stabilized to 50% (34%, 64%) in the late summer through the fall, following the peak of the second wave. There is insufficient serological information for LAC to provide confirmatory evidence behind these estimates, and CDC studies of serology carried out in various settings throughout the USA (not including LAC) during the first epidemic wave vary from as low as 2.3% of infections observed to as high as 30% of infections detected [26]. An additional piece of evidence that supports low detection rates is the low fraction of infections for which medical attention is sought, since this indicates how many infections are of at least moderate severity. A recent study used serological studies, participatory surveillance systems, and mathematical modeling to estimate the underdetection of infections in France and found that only 31% of individuals with COVID-19-like symptoms consulted a doctor in the study period. Although the context is different, this result suggests that large numbers of symptomatic COVID-19 cases do not seek medical advice and therefore many of these likely do not show up in the official register of cases [47].

This study is prone to the typical limitations of modeling epidemiological dynamics in the context of a rapidly evolving infectious disease outbreak. We model the major epidemic trends across the three waves using time-varying parameters; however, this approach does not capture all of the complexity of the changing epidemic. Due to the model specification and the concurrent estimation of multiple time-varying parameters, multiple joint parameter solutions exist, resulting in multimodal posterior distributions. We attempted to address this by employing a two-step parameter estimation approach, first using a broad grid search to identify and choose a mode of a posterior distribution, and second using approximate Bayesian computation to identify the shape of the distribution around the chosen mode. This process involved the use of "expert opinion" to guide the choice of parameters towards most correctly representing the major trends and peaks at each epidemic wave, at the expense of accurately capturing deviations from the major trendlines.

Data informing the conditional effect estimates within the risk model were aggregated across early, large, retrospective studies from China (for comorbidities and smoking) [3] and NYC (for BMI) [2] reporting the fractions of hospitalization, ICU admission, and death by individual risk factor. We chose these studies because, in a PubMed search at the start of this study, few others reported marginal or conditional risk effects for the same cohort across the three modeled stages of disease progression. While we attempt to reframe these results for the demographic composition of the LAC population through regional data on the prevalence of risk factors and the correlation structure between risk factors, there may be differences in the underlying study population or treatment setting between China, NYC, and LAC that would lead to heterogeneity in effect estimates. However, we believe that the estimates from the Chinese studies do represent population-based estimates, as these samples avoid some of the biases present in other potentially available studies that rely on highly selected samples.

While this work has focused on demonstrating the substantial heterogeneity in risk probabilities and IFR across subpopulations, it employs a single-population epidemic model. LAC is a large county consisting of many constituent cities and communities, each with its own epidemic process unfolding at a different rate [48]. Extreme disparities in infection incidence and mortality have been observed across communities within LAC. These include incidence rates up to 15 times higher in low-income neighborhoods in East LA with high percentages of essential workers than in affluent communities in West LA [14], and COVID deaths as a proportion of the typical total deaths 11.6 times higher for young, foreign-born Latinx than for young, U.S.-born, non-Hispanics (for California) [17]. These large differences in infection incidence and death will undoubtedly translate into large differences in probabilities of disease progression and IFR. However, when we began this study we did not have the data to formally model subpopulation-specific probabilities of exposure, or the data on hospitalization and death counts for different groups necessary to fit the parameters of a multi-population model. The approach we developed is a way to use commonly available population-level epidemic timeseries data to model multiple groups in a single population, and to combine these population-level estimates with prevalence rates of risk factors to produce stratified estimates for different subpopulations, specific to the region of LAC. This adaptive approach allowed us to provide epidemic trends and risk estimates that informed the LAC Department of Public Health and other decision-makers in real time during the emerging epidemic.
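One simple way to combine a population-level probability with risk-factor prevalence and relative risks is to rescale a baseline probability so that the prevalence-weighted average over risk profiles reproduces the population-level value. The sketch below illustrates that idea only; it is not the exact procedure used in this study, and the function name `stratified_probabilities`, the three profiles, their frequencies, and their relative risks are all hypothetical.

```python
import numpy as np


def stratified_probabilities(pop_prob, freqs, rel_risks):
    """Split a population-level probability across risk profiles.

    pop_prob  : population-level probability (e.g. hospitalization given illness)
    freqs     : share of each risk profile among those at risk (must sum to 1)
    rel_risks : relative risk of each profile versus the baseline profile

    Returns per-profile probabilities p_k = p0 * RR_k, where the baseline p0
    is chosen so that sum_k freqs_k * p_k equals pop_prob.
    """
    freqs = np.asarray(freqs, dtype=float)
    rel_risks = np.asarray(rel_risks, dtype=float)
    if not np.isclose(freqs.sum(), 1.0):
        raise ValueError("profile frequencies must sum to 1")
    baseline = pop_prob / np.dot(freqs, rel_risks)
    probs = baseline * rel_risks
    if np.any(probs > 1.0):
        raise ValueError("inputs imply a probability above 1")
    return probs


# Hypothetical inputs: three risk profiles (low, medium, high risk).
freqs = [0.5, 0.3, 0.2]
rel_risks = [1.0, 2.0, 5.0]
probs = stratified_probabilities(0.05, freqs, rel_risks)

for f, rr, p in zip(freqs, rel_risks, probs):
    print(f"share {f:.0%}, RR {rr:.1f}: probability {p:.3f}")
```

Because the stratified values are tied to the population-level estimate, any time variation in that estimate passes directly through to every profile, while the ratios between profiles stay fixed at the relative risks.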

Future work will develop multi-population models that estimate subpopulation-stratified probabilities of infection, of illness progression, and IFR, accounting for key risk factors of both exposure to infection and severe illness given infection. Risk factors for exposure are not limited to age and health conditions, but also include more diverse socioeconomic factors including occupation and essential worker status, neighborhood of residence, housing overcrowding, multigenerational households, economic status, and access to PPE [48–52]. In the meantime, the subpopulation-stratified estimates of disease progression and IFR produced using the framework presented here can be used to evaluate policy decisions that may involve both population-wide interventions and interventions that target specific subpopulations at risk of developing severe illness given infection, for example isolating or prioritizing vaccination for the elderly or those with other health-related risk factors [16].

Supporting information

S1 Appendix

(PDF)


Acknowledgments

We would like to acknowledge data processing and visualization support from Claire Jacquillat.

Data Availability

The data used in the study are openly available to the public, without application or screening for access, from the GitHub page of the Los Angeles Times Data and Graphics Department. In addition, these data are provided in a public repository at: https://github.com/AbigailHorn/COV2-LA/tree/master/data.

Funding Statement

ALH was supported by an NIH Ruth L Kirschstein National Research Service Award (NRSA) Institutional Training Grant T32 5T32CA009492-35. DVC was supported by NIH P01CA196569 and NCI P30CA014089. EH was supported by NIH P01CA196569. DVC and ALH are also funded by the COVID-19 Keck Research Fund from the University of Southern California.

References

1. Stokes EK, Zambrano LD, Anderson KN, Marder EP, Raz KM, Felix SEB, et al. Coronavirus disease 2019 case surveillance—United States, January 22–May 30, 2020. Morbidity and Mortality Weekly Report. 2020;69(24):759. doi: 10.15585/mmwr.mm6924e2
2. Petrilli CM, Jones SA, Yang J, Rajagopalan H, O’Donnell LF, Chernyak Y, et al. Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City. medRxiv. 2020.
3. Guan Wj, Ni Zy, Hu Y, Liang Wh, Ou Cq, He Jx, et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine. 2020;382(18):1708–1720. doi: 10.1056/NEJMoa2002032
4. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet. 2020. doi: 10.1016/S0140-6736(20)30566-3
5. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430–436. doi: 10.1038/s41586-020-2521-4
6. U.S. Department of Health and Human Services. Social Determinants of Health [online]; 2020. Available from: https://www.healthypeople.gov/2020/topics-objectives/topic/social-determinants-of-healthexternal.
7. Egede LE, Walker RJ. Structural Racism, Social Risk Factors, and Covid-19—A Dangerous Convergence for Black Americans. New England Journal of Medicine. 2020;383(12):e77. doi: 10.1056/NEJMp2023616
8. Levin AT, Hanage WP, Owusu-Boaitey N, Cochran KB, Walsh SP, Meyerowitz-Katz G. Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. European Journal of Epidemiology. 2020; p. 1–16.
9. Verity R, Okell LC, Dorigatti I, Winskill P, Whittaker C, Imai N, et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases. 2020. doi: 10.1016/S1473-3099(20)30243-7
10. Yang W, Kandula S, Huynh M, Greene SK, Van Wye G, Li W, et al. Estimating the infection-fatality risk of SARS-CoV-2 in New York City during the spring 2020 pandemic wave: a model-based analysis. The Lancet Infectious Diseases. 2020. doi: 10.1016/S1473-3099(20)30769-6
11. Los Angeles County Department of Public Health. Los Angeles County Case Counts, Crude Rates by Selected Region; 2020. Available from: https://github.com/AbigailHorn/COV2-LA/tree/master/data.
12. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. Lippincott Williams & Wilkins; 2008.
13. Griffith G, Morris TT, Tudball M, Herbert A, Mancano G, Pike L, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. medRxiv. 2020. doi: 10.1038/s41467-020-19478-2
14. Los Angeles Times Data and Graphics Department. california-coronavirus-data GitHub; 2021. Available from: https://github.com/datadesk/california-coronavirus-data.
15. Newcombe PJ, Conti DV, Richardson S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genetic Epidemiology. 2016;40(3):188–201. doi: 10.1002/gepi.21953
16. Horn AL, Jiang L, Washburn F, Hvitfeldt E, de la Haye K, Nicholas W, et al. Estimation of COVID-19 risk-stratified epidemiological parameters and policy implications for Los Angeles County through an integrated risk and stochastic epidemiological model. medRxiv. 2020. doi: 10.1101/2020.12.11.20209627
17. Garcia E, Eckel SP, Chen Z, Li K, Gilliland FD. COVID-19 mortality in California based on death certificates: disproportionate impacts across racial/ethnic groups and nativity. Annals of Epidemiology. 2021;58:69–75. doi: 10.1016/j.annepidem.2021.03.006
18. Bretó C, He D, Ionides EL, King AA, et al. Time series analysis via mechanistic models. The Annals of Applied Statistics. 2009;3(1):319–348.
19. Mode CJ, Sleeman CK. Stochastic processes in epidemiology: HIV/AIDS, other infectious diseases, and computers. World Scientific; 2000.
20. Keeling MJ, Rohani P. Modeling infectious diseases in humans and animals. Princeton University Press; 2011.
21. Diekmann O, Heesterbeek J, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society Interface. 2010;7(47):873–885. doi: 10.1098/rsif.2009.0386
22. Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. Journal of Travel Medicine. 2020. doi: 10.1093/jtm/taaa021
23. Bertozzi AL, Franco E, Mohler G, Short MB, Sledge D. The challenges of modeling and forecasting the spread of COVID-19. Proceedings of the National Academy of Sciences. 2020;117(29):16732–16738. doi: 10.1073/pnas.2006520117
24. Unacast. The Unacast Social Distancing Scoreboard; 2020. Available from: https://unacast.com/post/the-unacast-social-distancing-scoreboard.
25. Oliver N, Lepri B, Sterly H, Lambiotte R, Deletaille S, De Nadai M, et al. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Science. 2020. doi: 10.1126/sciadv.abc0764
26. Havers FP, Reed C, Lim T, Montgomery JM, Klena JD, Hall AJ, et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA Internal Medicine. 2020. doi: 10.1001/jamainternmed.2020.4130
27. California Department of Public Health. California Department of Public Health Open Data Portal; 2021. Available from: https://data.ca.gov/dataset/covid-19-hospital-data1.
28. Pfeiffer RM, Gail MH. Absolute risk: methods and applications in clinical management and public health. CRC Press; 2017.
29. Centers for Disease Control and Prevention (CDC). National Health and Nutrition Examination Survey Data (NHANES); 2017-2018. Available from: https://www.cdc.gov/nchs/nhanes/index.htm.
30. U.S. Census Bureau via R package tidycensus. Selected age groups, 2014-2018 American Community Survey 5-year estimates; 2018. Available from: https://CRAN.R-project.org/package=tidycensus.
31. Los Angeles County Department of Public Health. Los Angeles County Health Survey (LACHS); 2018. Available from: http://publichealth.lacounty.gov/ha/hasurveyintro.htm.
32. UCLA Center for Health Policy Research, Los Angeles, CA. California Health Information Survey; 2018. Available from: http://ask.chis.ucla.edu.
33. Los Angeles County Department of Public Health. LA County Daily COVID-19 Data Dashboard; 2020. Available from: http://publichealth.lacounty.gov/media/coronavirus/data/index.htm.
34. Los Angeles County Department of Public Health. Order of the Health Officer, Reopening safer at work and in the community for control of COVID-19; October 6, 2020. Available from: http://www.ph.lacounty.gov/media/Coronavirus/docs/HOO/2020_10_06_HOO_Safer_at_Home.pdf.
35. Los Angeles Mayor’s Office. Safer L.A. COVID-19; November 15, 2020. Available from: https://corona-virus.la/SaferLA.
36. Los Angeles Mayor’s Office. Safer L.A. COVID-19 Timeline; November 24, 2020. Available from: https://lh3.googleusercontent.com/-awbM5oar9Bk/X71VEq0xBmI/AAAAAAAAJ1w/fWXuO3IaEYolw8fMxqC3VLAgE1m6vTeNQCK8BGAsYHg/s0/2020-11-24.png.
37. Adam D. Simulating the pandemic: What COVID forecasters can learn from climate models. Nature. 2020. doi: 10.1038/d41586-020-03208-1
38. Hawkins D. Differential occupational risk for COVID-19 and other infection exposure according to race and ethnicity. American Journal of Industrial Medicine. 2020;63(9):817–820. doi: 10.1002/ajim.23145
39. Zelner J, Trangucci R, Naraharisetti R, Cao A, Malosh R, Broen K, et al. Racial disparities in coronavirus disease 2019 (COVID-19) mortality are driven by unequal infection risks. Clinical Infectious Diseases. 2021;72(5):e88–e95. doi: 10.1093/cid/ciaa1723
40. O’Driscoll M, Dos Santos GR, Wang L, Cummings DA, Azman AS, Paireau J, et al. Age-specific mortality and immunity patterns of SARS-CoV-2. Nature. 2021;590(7844):140–145. doi: 10.1038/s41586-020-2918-0
41. Wetsman N. Doctors are better at treating COVID-19 patients now than they were in March; 2020. Available from: https://www.theverge.com/2020/7/8/21317128/improved-covid-treatment-hospitals-remdesivir-dexamethasone.
42. Centers for Disease Control and Prevention (CDC). Interim clinical guidance for management of patients with confirmed 2019 novel coronavirus (2019-nCoV) infection. 2020;12.
43. Woolf SH, Chapman DA, Sabo RT, Weinberger DM, Hill L, Taylor DD. Excess deaths from COVID-19 and other causes, March-July 2020. JAMA. 2020;324(15):1562–1564. doi: 10.1001/jama.2020.19545
44. Bilinski A, Emanuel EJ. Covid-19 and excess all-cause mortality in the US and 18 comparison countries. JAMA. 2020. doi: 10.1001/jama.2020.20717
45. Our World in Data. A pandemic primer on excess mortality statistics and their comparability across countries; 2020. Available from: https://ourworldindata.org/covid-excess-mortality.
46. Centers for Disease Control and Prevention (CDC). Medical Visits, Hospitalizations, and Deaths in the United States—2017–2018 influenza season [online]. 2019.
47. Pullano G, Di Domenico L, Sabbatini CE, Valdano E, Turbelin C, Debin M, et al. Underdetection of COVID-19 cases in France threatens epidemic control. Nature. 2020.
48. Harris JE. Los Angeles County SARS-CoV-2 Epidemic: Critical Role of Multi-generational Intra-household Transmission. Journal of Bioeconomics. 2021;23(1):55–83. doi: 10.1007/s10818-021-09310-2
49. Tai DBG, Shah A, Doubeni CA, Sia IG, Wieland ML. The disproportionate impact of COVID-19 on racial and ethnic minorities in the United States. Clinical Infectious Diseases. 2020.
50. Chen YH, Glymour M, Riley A, Balmes J, Duchowny K, Harrison R, et al. Excess mortality associated with the COVID-19 pandemic among Californians 18-65 years of age, by occupational sector and occupation: March through October 2020. medRxiv. 2021.
51. Baker MG, Peckham TK, Seixas NS. Estimating the burden of United States workers exposed to infection or disease: a key factor in containing risk of COVID-19 infection. PLoS ONE. 2020;15(4):e0232452. doi: 10.1371/journal.pone.0232452
52. Mutambudzi M, Niedwiedz C, Macdonald EB, Leyland A, Mair F, Anderson J, et al. Occupation and risk of severe COVID-19: prospective cohort study of 120 075 UK Biobank participants. Occupational and Environmental Medicine. 2021;78(5):307–314. doi: 10.1136/oemed-2020-106731

PLoS One. doi: 10.1371/journal.pone.0253549.r001

Decision Letter 0

Martial L Ndeffo Mbah

18 Feb 2021

PONE-D-21-00882

An integrated risk and epidemiological model to estimate risk-stratified COVID-19 outcomes and policy implications for Los Angeles County

PLOS ONE

Dear Dr. Horn,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 04 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Martial L Ndeffo Mbah, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information

Additional Editor Comments:

Please thoroughly address the reviewers' comments, especially those of Reviewer #2. Doing so should greatly improve the readability and quality of the manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript was a pleasure to read, and the supplement was also well organized and helped me understand some of the statements made in the manuscript. Unfortunately I have not stayed up to date on the clinical epidemiology or policy issues surrounding COVID-19 so I will only be able to make comments on technical points.

Substantive comments:

1a. Line 72: Can the authors please describe why there is no A->D transition? I imagine COVID diagnosis obtained only through autopsy is not common, but some explanation here would help.

1b. It may be worth stating upfront that the binary classification of infections as A or I only represents detection/entering the health system, and not asymptomatic/symptomatic, as in other modeling studies.

2. Line 77: I would appreciate some more explanation of what "proxy with error" means, with respect to the Q->V transition. In particular, what real process when a patient enters the ICU does that compartment represent? Does it mean that someone is only recorded as an ICU COVID patient when they are given a ventilator?

3. Section 2.1.1: if beta(t) is the parameter being estimated by ABC, what is the relationship of that estimated curve to mu(t) in the previous paragraph (starting on line 90)?

4. Line 124: I would appreciate some comments from the authors on convergence (or not) of the ABC sampling for the 6 parameters in the main text. If time permits, perhaps trace plots or histograms in the appendix would be a nice addition.

5. Line 161: For those of us unfamiliar with JAM, a sentence describing the assumptions needed to combine the correlation structure and marginal effects to get conditional effects would be a valuable addition.

6. Line 187: I understand that this accounts for uncertainty in estimated parameters and stochastic variation in trajectories, but does it also account for uncertainty in the estimates from the risk model?

7. Line 192: How does changing R(t) adjust beta(t) and mu(t)? I'm still somewhat unclear on the relationship of these quantities.

Minor comments:

1. Abstract: CFR is used before it is defined

2. Line 70: please remind readers which compartments S,E,R are.

3. Line 105: there is an inconsistent use of subscript t and function of t for parameters alpha, kappa, delta between text and Figure 1. Also, p_v is not defined anywhere, what is it?

4. Line 204: text references Fig 4a but looks like it should say Fig 4b.

5. Line 228: text references Fig 4b but looks like it should say Fig 4a.

6. Tables 2 and 3 are very hard to read (too small), could they be enlarged somewhat?

Reviewer #2: In this study, the authors present an analysis of the COVID-19 transmission dynamics in Los Angeles County (LAC). They aim to estimate the probability of severe illness depending on the risk profile of the individuals, and analyse the impact different control measures could have had on the number of infected individuals. In order to carry out this analysis, the authors developed the following workflow:

1 - Estimate the population-wide proportion of hospitalised cases, proportion of hospitalisations leading to Intensive Care Unit (ICU) admission, proportion of deaths given ICU admission, proportion of reported cases, and reproduction number over time using a compartmental model and Approximate Bayesian Computation.

2 - Estimate the conditional relative risk of different factors on the proportion of hospitalisations, ICU admissions and deaths using marginal relative risks from the previous literature and Joint Analysis of Marginal summary statistics (JAM).

3 - Estimate the proportion of cases from different risk profiles using the marginal risk factor in LAC, and the age distribution of the reported cases.

4 - Use the population-wide estimates from step 1, the conditional risk estimates from step 2 and the distribution of the risk profiles of the infected population from step 3 to deduce the risks of hospitalisation given infection, ICU admission given hospitalisation, and death given ICU admission for each risk profile.

5 - Generate simulations of outbreaks using different scenarios of Non-Pharmaceutical Interventions.

This paper mixes different complex methods and uses different publicly available data sources. I appreciate the time and effort the authors have spent to provide the code and make their analysis reproducible on a GitHub repository. I also want to highlight the thorough Appendix detailing the approach used in every step of the analysis.

Overall, I believe this is an interesting piece, which can provide important contributions to the field. Nevertheless, I think there needs to be some clarification on some stages of the study. It took me time to understand the justifications behind each step of the workflow, notably what outputs were needed for the final results, and the uncertainty of some of the estimated parameters.

Before getting to major and minor points, I had an overarching comment on the paper:

Because of the number of different stages and models developed in the study, I think the message / workflow sometimes gets lost, or at least I got confused a number of times. I wonder whether the authors would consider splitting this analysis into two papers: one focused on estimating the distribution of risk profiles in infected cases and the probabilities of hospitalisation (ICU and death) associated with each profile (i.e., steps 2 to 4, and the first objective of this paper), and one focused on the simulations of different scenarios of NPIs and vaccine coverage using the risk profiles (mostly Step 5). This way, the different scenarios of NPI included in the simulations could be deepened and made more realistic, and the authors could give more information in the Main Paper on the JAM method and logistic regression they implemented (and use the prior distribution on alpha, kappa and delta for their estimations in Step 4). I do not think this is a requirement for publication, but I believe this would make it easier for the reader to follow the arguments the authors are presenting.

Major points

1/ Summarise the overall workflow in a Figure.

In the summary of my review, I tried to summarise the workflow the authors implemented from the Main text and Appendix. Although I hope I understood it correctly, it took me time and a few read-throughs to figure out how and why the authors went from one stage to another. I think the authors should add a figure summarising each stage of the analysis, along with their input and output, and how they connect to one another. I believe this would be very helpful for the readers, and would prevent a lot of confusion.

Along with that, the authors use a lot of different notations in each section of the Appendix. I believe they should summarise all the notations of Section 2 in a table (similar to what they did in Table 1, 2, and 3 of the Appendix). This would facilitate the reading and general understanding of their analysis.

2/ The implementation of the compartmental model should be clearer

I am not sure I understand why Approximate Bayesian Computation (ABC) was needed in the first step. I thought the authors could have fitted a deterministic model to the daily number of new infections / hospitalisations / ventilations / deaths by generating a likelihood function from these measures and running a Monte Carlo Markov Chain to estimate the parameters. What made the ABC approach more relevant?

From the Appendix tables 4, 5 and 6, the authors use very informative priors on most of their parameters (especially alpha, delta, and kappa). I think the authors should compare the prior to the posterior distribution for these parameters to highlight whether the fitting procedure was different from the prior assumptions.

The authors mention that Figure 2 “demonstrates that good model fits are achieved in all compartments across time.” I am not certain I find all the panels of Figure 2 convincing (for example the time series of “New Deaths” and “New in Hospital” show a lot of daily variations, which makes it harder to evaluate whether the fit is convincing). Could the authors aggregate the data and the simulation by week and show the match between the weekly time series? This could remove part of the dispersion observed in Figure 2 and make it easier for the reader to compare the inferred time series to the data.
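The weekly aggregation suggested here could be done along the following lines. This is only a sketch: the helper `weekly_totals` and the example counts are hypothetical, and an incomplete trailing week is simply dropped.

```python
import numpy as np


def weekly_totals(daily):
    """Sum a daily count series into consecutive 7-day totals.

    Days left over after the last complete week are discarded.
    """
    daily = np.asarray(daily, dtype=float)
    n_weeks = daily.size // 7
    return daily[: n_weeks * 7].reshape(n_weeks, 7).sum(axis=1)


# Hypothetical daily death counts covering 15 days (two full weeks plus one day).
daily_deaths = [3, 0, 5, 2, 4, 1, 6, 2, 7, 3, 0, 5, 4, 2, 9]
print(weekly_totals(daily_deaths))
```

Applying the same function to both the observed series and each simulated trajectory would put data and model output on the same weekly scale before comparing them.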

In the Appendix (subsection 2.3.1), the authors explain the summary statistics used for their ABC approach. I am not sure I understood the last notation. Did they use the total number of cases infected, hospitalised, ventilated and deceased (before 15th-25th March), along with the number of cases that recovered before 4th April? If so, why did they only include the number of recovered cases early in the outbreak?

3/ Some clarifications on the risk model and the uncertainty of the estimates are needed

I am not familiar with the JAM method the authors used to compute the conditional risk effects. I believe they should add a couple of sentences to explain how this method matches the two inputs it uses.

In line 316, the authors state that “The independent effect of comorbidities and obesity attenuate with increasing severity of disease, while that of age and smoking increase”. I believe the authors’ conclusions should reflect that the 95% CIs are quite large (especially for H|I), which makes this comparison seem excessive (e.g., the CI of the conditional RR of smoking on H|I is between 0.21 and 14.52).

The authors consider that BMI and age are ordinal variables, did the authors explore the idea of using different RR for each age category (ie the RR between 21-40 and 0-19 would be different from the RR between 41-59 and 21-40)? Would it be possible (and worth testing) that the risks of severe illness abruptly increase for the highest age group / BMI?

Finally, I thought the tables 2 and 3 were very hard to read and interpret. I do not really see what conclusions to draw from these. I think the authors should consider using a graphical representation rather than a table, or greatly reduce the number of rows / columns. Furthermore, the authors only show the median estimates, whereas they reported very large confidence intervals for some of the conditional RRs. I think they should report and reflect on the confidence intervals of these estimates.

4/ The scenario implemented could be more realistic

The authors currently consider 9 scenarios representing a combination of isolating a fraction of the individuals older than 65 years old (0%, 50% or 100%), and adopting different levels of NPIs (None, Moderate or Observed). I believe this idea is relevant, especially in the context of vaccination campaigns aimed at certain age groups, but most of the scenarios implemented here are unrealistic. Indeed, I think complete isolation of older people is improbable (multi-generational households, care homes, etc.), and imagining a situation where uncontrolled transmission would trigger absolutely no change in behaviour (or policy) is also unthinkable. I believe there would be great value in a more consistent exploration of the impact of a gradually increasing proportion of older individuals being protected (or isolated), mixed with different moderate values of stringency of control measures.

In line 44 the authors state, “Results highlight […] the efficacy of targeted subpopulation-level policy interventions in LAC.” I do not think this sentence is in agreement with the results shown by the authors. Indeed, there was only one set of simulations where a lockdown was not implemented and the number of deaths was similar to the data, and it came at the cost of overwhelming hospitals. Furthermore, this result relied on a complete isolation of those 65+, which is unrealistic. I would argue the results highlight the efficacy of a complete lockdown in limiting transmission, which is also what the authors write in the abstract and in the discussion. Therefore, I think they should remove this sentence.

Minor points

Footnote P5: “The susceptible population does not decrease sizeably during the time period considered in this study.” According to the first panel of Figure 2, up to 25% of the population was infected between March and November. Do the authors think this could potentially impact the effective reproduction number? If not, what decrease would they consider to be sizeable?
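To make the scale of this question concrete, the effect of susceptible depletion can be sketched under the standard homogeneous-mixing assumption that the effective reproduction number is proportional to the susceptible fraction S/N. That relation is an assumption here, not something taken from the manuscript, and the function name and the starting value of 2.0 are hypothetical.

```python
def effective_r(r_without_depletion, fraction_infected):
    """Scale a reproduction number by the remaining susceptible fraction.

    Assumes homogeneous mixing and that everyone previously infected is
    no longer susceptible (both are simplifying assumptions).
    """
    if not 0.0 <= fraction_infected <= 1.0:
        raise ValueError("fraction_infected must lie in [0, 1]")
    return r_without_depletion * (1.0 - fraction_infected)


# Hypothetical example: a reproduction number of 2.0 with 25% of the
# population already infected.
r_eff = effective_r(2.0, 0.25)
```

Under these assumptions, a 25% cumulative attack rate lowers the reproduction number by a quarter, which suggests the depletion may not be negligible.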

L288-290: The authors mention the hospital and ICU capacity limits, I think it may be relevant to add these thresholds to the right panel of Figure 3, in order to facilitate the comparison between the simulated number of hospitalisations and the maximum capacity.

L345-351: I think the explanation of how the risk profiles were grouped should come before Table 2 is described, since this is one of the columns of Table 2. I would therefore suggest moving these sentences up.

In the compartmental model, the authors estimate the parameter Pv, representing the proportion of hospitalised cases who need ventilation. I could not find the prior distribution or the estimated values of Pv in the Main Text or in the supplement.

I did not find any plot of the values of r(t) estimated by the model. I think the authors should plot the distributions of all the estimated parameters.

In the Appendix, Subsection 2.3.1, does “D” stand for the data or for the death time series? The letter seems to be used for both here. Is this a mistake?

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 24;16(6):e0253549. doi: 10.1371/journal.pone.0253549.r002

Author response to Decision Letter 0

12 May 2021

(Please see Response to Reviewers file in pdf)

Dear reviewers,

Thank you for the extremely careful review of our manuscript and for providing such comprehensive and constructive comments. We greatly appreciate the time and thoroughness devoted to the review, which has given us these suggestions for improvement.

We have attempted to address every suggestion made by the two reviewers in the main text and the Appendix. We have documented the changes in the following point-by-point letter that responds to each comment raised by the two reviewers.

We would like to note that we followed Reviewer 2’s suggestion to split the original manuscript into two papers. The Part I paper, this paper, focuses on estimating the distribution of risk profiles among infections, hospital and ICU admission, and deaths, as well as the CFR and IFR associated with each profile. The Part II paper, which will be submitted at a later date, will focus on policy analyses that utilize the risk estimates from this (Part I) paper as parameter inputs. Splitting our manuscript into two papers allows us to focus each paper on a single objective, as a result making each paper easier for readers to follow. In the case of this Part I paper, focusing on the risk-profile-stratified estimates has allowed us to respond to many of the comments from both the reviewers and more clearly outline and describe our multi-step analysis plan in the methods section, including both the epidemic model and risk model.

Given that we will not be changing the methods or models underlying our analysis, but rather splitting the analysis and text into two papers, we still plan to submit our Part II paper for revision as part of the review process for the original manuscript (PONE-D-21-00882). Because of the great time and attention that both reviewers have already invested in understanding our methods, we hope that both manuscripts may be re-reviewed by the two reviewers. Please advise on how we should handle the re-submission or submission of Part II of the paper.

In the following, we begin by summarizing the major changes made in our revised manuscript. We then provide the point-by-point response letter. We note that we have repeated some of the content from the summary of major changes in the point-by-point responses, and apologize for the resulting length of this review letter; we thought it would be most clear for the editor and reviewers to follow the changes made to the manuscript by first reading a summary of all changes, and second reading the more detailed responses to each point.

We look forward to your review and thank you in advance for your time and attention to our revised manuscript.

Kind regards,

Abigail Horn, Ph.D

David Conti, Ph.D

Summary of major changes (beyond splitting the paper into Part I and Part II)

Epidemic model (methods and results)

With more data available since the review, we have expanded the time frame of analysis to include the full first year of the epidemic in Los Angeles County (LAC), from March 1, 2020 through March 1, 2021, which spans 3 epidemic waves.

We switched to a data source from the Los Angeles Times that provides cumulative infections and deaths, and the current number of COVID-19 patients in-hospital and in-ICU. Now, with direct observations of numbers in-ICU, we no longer need to include the ventilation compartment in the model and we can drop the parameter for the probability of ventilation (pV).

We have provided a more detailed explanation of our parameter estimation framework, which involved a two-step process of first using broad grid search and second approximate Bayesian computation (ABC) sampling.

This approach was necessary because multiple parameter solutions to fit the model exist, meaning that estimated posterior distributions will be multimodal if allowed to vary over a wide prior parameter space. The two-step process was employed to define unimodal posterior distributions and achieve convergence in parameter estimates. The broad grid search step was used to identify possible regions for each parameter, from which we decided on a single mode. External data sources were used to specify the parameter range for the grid search. The ABC sampling step was used to estimate the final posterior distribution for each parameter, with a prior distribution informed by the chosen mode from the grid search step.

We have added comments on the convergence of the ABC sampling step for the model parameters in the main text. We have also provided plots of the prior and posterior density distributions used in the ABC step and commented on their similarity / difference in the Appendix. The distribution ranges for the broad grid search step are referenced in the main text, with details and full specification provided in the Appendix.

Risk model (methods and results)

Based on the reviewers’ suggestion for more clarity, we have restructured our explanation of the risk model methods in the main text by breaking the explication into 6 steps, explaining each of the 6 steps separately, and adding a flow diagram illustrating each step (model/analysis) and its inputs (as data or a previous modeling step) and outputs. Supporting details and the mathematical specification of each step have been provided in the Appendix. As part of this revision, in Step 2 we provide a paragraph describing how the JAM method combines the correlation structure and marginal effects to get conditional effects, with further mathematical detail provided in the Appendix.

We have recategorized the age factor, and now model the categories 0-18, 19-49, 50-64, 65-79, and 80+. These categories can now be modeled because observed data on them are available from the LA Times data source; in the COVID-19 infections-by-age data source used in the original version of the manuscript, data on these more fine-grained categories were not available and we were not able to distinguish ages 60-79 from 80+.

We have added a step to the risk modeling approach whereby we estimate the conditional relative risk for age (conditional on the other three risk factors), for each of the three models of disease progression (H|I), (Q|H), (D|Q), by calibrating the risk model to observed COVID-19 data for LAC on deaths by age group. Now, our estimates of the frequency of each age group over infections and over deaths are both matched to observed data. This means we are no longer using JAM to estimate the conditional RR for age groups. Please see Step 5 of the risk model for details.

To visualize our estimates of the risk-profile-stratified probabilities of disease progression and CFR and IFR, we have removed Tables 2 and 3 which both reviewers commented were difficult to read and/or interpret. The purpose of Tables 2 and 3 was to convey the range of values that can be taken on by the risk-profile-stratified probabilities of disease progression and CFR / IFR. In the revision, we now convey this information graphically and tabularly by:

Including in the main text new figures that show the range of values that can be taken on by profiles falling within each age group; specifically, the mean, minimum, and maximum (as an error bar) of the probabilities/CFR/IFR across the composing risk profiles within each of the 5 age groups.

Including in the main text a figure of the estimated median and the 95% CI of the CFR and IFR for the most populous (in the overall LAC population) risk profile within each age group.

Including in the Appendix, as standard numerical tables, the median and the 95% CI of the risk-profile-stratified probabilities of disease progression in Appendix Tables 8-10, and the median and the 95% CI of the risk-profile-stratified case fatality rates and infection fatality rates in Appendix Tables 11-12, all across dates every two weeks from May 15 2020 - March 1 2021.

Reviewer #1:

The manuscript was a pleasure to read, and the supplement was also well organized and helped me understand some of the statements made in the manuscript. Unfortunately I have not stayed up to date on the clinical epidemiology or policy issues surrounding COVID-19 so I will only be able to make comments on technical points.

Thank you for your careful read of our manuscript and very thorough and helpful comments in review. We are very glad to hear the manuscript was a pleasure to read, and that the supplement was helpful. We have attempted to address all of your thoughtful comments and questions in the main text and Appendix, and in the following provide a point-by-point explanation of all the changes that were made in response to your comments. Implementing your comments has greatly contributed to the readability and clarity of the methods used and conclusions drawn from the paper.

Substantive comments:

R1.1a. Line 72: Can the authors please describe why there is no A->D transition? I imagine COVID diagnosis obtained only through autopsy is not common, but some explanation here would help.

Thank you for asking for clarification on this point; we agree that although COVID-19 diagnosis through autopsy is rare, an explanation justifying our assumption is warranted.

In our model, the only route to death is through an observed infection, followed by hospitalization and ICU care, meaning we do not model individuals who die from COVID-19 illness at home rather than at a point-of-care. We justify this assumption because the majority of confirmed COVID-19 deaths are of individuals who die in a SNF, in hospital, or following a stay in hospital; this evidence comes from personal communication with a colleague, Professor Erika Garcia in the Department of Preventive Medicine at USC, who has analyzed COVID-19 mortality data for the state of California (see Garcia et al. 2021, Annals of Epidemiology, https://doi.org/10.1016/j.annepidem.2021.03.006). Although not included in that paper, Professor Garcia has analyzed the mortality data for CA to provide us with the estimate that 4%-9% of official COVID-19 deaths have occurred at home, across the three epidemic waves.

Furthermore, we do not model a route to death for individuals without a confirmed COVID-19 infection, since record of confirmed COVID-19 infection (or probable based on clinical evidence) is needed to be classified as COVID-19 mortality.

We have added this statement to the text under Section 2.1, beginning approximately line 125.

R1.1b. It may be worth stating upfront that the binary classification of infections as A or I only represents detection/entering the health system, and not asymptomatic/symptomatic, as in other modeling studies.

Thank you for this suggestion. We had provided this explanation in the Appendix but agree that it is an important distinction and needs to be included in the main text. We have included the following sentences to clarify this point, ~ line 78:

We also include a compartment representing infectious individuals with unobserved and/or unconfirmed infections (A). I represents cases of infection that have tested positive for the SARS-CoV-2 virus and are confirmed in the official register of infection case data. A represents cases that are infectious but do not appear in the confirmed case data, whether because they are asymptomatic, are symptomatic and do not get tested, or get tested and have a false negative result.

R1.2. Line 77: I would appreciate some more explanation of what "proxy with error" means, with respect to the Q->V transition. In particular, what real process when a patient enters the ICU does that compartment represent? Does it mean that someone is only recorded as an ICU COVID patient when they are given a ventilator?

In the revision, we have removed the ventilation compartment from the model. We had originally included the ventilation compartment because we had available data on the numbers of patients on ventilation but not in the ICU, whereas our risk model probabilities were based on the progression from general hospital admittance to being advanced to critical care (in ICU). As noted in the Summary at the beginning of this letter, we have since obtained data on the number of COVID-19 patients in Los Angeles County in the ICU, and so removed the ventilation compartment.

R1.3. Section 2.1.1: if beta(t) is the parameter being estimated by ABC, what is the relationship of that estimated curve to mu(t) in the previous paragraph (starting on line 90)?

Thank you for asking for clarification on the relationship between beta(t) and mu(t). This comment and comment R1.7 below helped us to see that this relationship was not made sufficiently clear in the main text. To address this, in the revision we have removed the parameter mu(t) and consider only the time-varying infection rate, beta(t), and time-varying reproductive number, R(t). Now, when interventions are put in place that change the transmission rate, instead of estimating modifications to mu(t) to represent modifications in beta(t) and R(t), we discuss modifications to the time-varying parameters beta(t) and R(t) directly.

R1.4. Line 124: I would appreciate some comments from the authors on convergence (or not) of the ABC sampling for the 6 parameters in the main text. If time permits, perhaps trace plots or histograms in the Appendix would be a nice addition.

We have added comments on the convergence of the ABC sampling for the model parameters in the main text (Section 3.1) and also provided density plots of the final distributions in the Appendix (Section 3.3).

This was a very helpful suggestion, as it helped us to see that the parameter estimation framework was not fully explicated in the paper. In the revision we have provided more context on the parameter estimation process, as follows below (in Methods Section 2.1.1):

Due to the relationships between parameters in the model formulation, multiple parameter solutions to fit the model exist. This means that estimated posterior distributions will be multimodal if allowed to vary over a wide prior parameter space. We use a two-step process to define unimodal posterior distributions and achieve convergence in parameter estimates by using broad grid search followed by approximate Bayesian computation (ABC) sampling. We first perform the broad grid search to identify possible regions for each parameter, from which we decide on a single mode. External data sources were used to specify the parameter range for the grid search (Appendix Section 2.1). Second, we use ABC sampling to estimate the final posterior distribution for each parameter with a prior distribution informed by the chosen mode from the grid search step. Specifically, we define the ABC prior as a normal distribution with 95% of its values lying within 25% of the mean value of the chosen mode; for example, if the mean of a chosen mode for parameter X is determined to be 0.1, then the prior distribution for X will be a normal distribution with a standard deviation of approximately 0.013 (0.025/1.96), chosen such that Pr(0.075 < X < 0.125) ≈ 95%.
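The rule for constructing the ABC prior can be sketched as follows. This is an illustration of the stated 95%-within-25% rule only, not the authors' implementation; the function name `abc_prior` and its arguments are hypothetical.

```python
from statistics import NormalDist


def abc_prior(mode_mean, rel_halfwidth=0.25, coverage=0.95):
    """Normal prior centred on the chosen mode, with `coverage` of its mass
    lying within +/- rel_halfwidth * mode_mean of the centre."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95%
    sigma = rel_halfwidth * mode_mean / z
    return NormalDist(mu=mode_mean, sigma=sigma)


# Example from the text: a chosen mode with mean 0.1.
prior = abc_prior(0.1)
mass_in_interval = prior.cdf(0.125) - prior.cdf(0.075)
```

For a mode mean of 0.1, this rule gives a standard deviation of roughly 0.0128, and `mass_in_interval` recovers the 95% coverage by construction.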

Then, in the main text Results Section 3.1, we added the following comments on convergence of the ABC sampling for the model parameters, and provided density plots of the final distributions in the Appendix (Section 3.3):

The two-step parameter estimation approach (broad grid search to select a single mode of each parameter, followed by approximate Bayesian computation (ABC) using a prior distribution specified around that single mode) achieved convergence in posterior densities. Convergence is not reached for the broad grid search step, with multi-modal distributions returned for each parameter (not shown). By specifying a narrow prior distribution around a mode chosen from the broad grid search sampling, convergence around a dominant single mode is achieved in the final posterior density returned by the ABC sampling step (see Appendix Section 3.3 for density plots of prior and posterior distributions).

R1.5. Line 161: For those of us unfamiliar with JAM, a sentence describing the assumptions needed to combine the correlation structure and marginal effects to get conditional effects would be a valuable addition.

One major change in our revision was to restructure our explanation of the risk model methods in the main text by breaking it into 6 steps, explaining each of the 6 steps separately, and adding a flow diagram illustrating each step (model/analysis) and its inputs (as data or a previous modeling step) and outputs. Supporting details and the mathematical specification of each step have been provided in the Appendix. As part of this revision, in Step 2 we provide a couple of sentences describing how JAM combines the correlation structure and marginal effects to get conditional effects; please see the main text Section 2.2.1 ~line 300, and the Appendix Section 5.3.

R1.6. Line 187: I understand that this accounts for uncertainty in estimated parameters and stochastic variation in trajectories, but does it also account for uncertainty in the estimates from the risk model?

This is a good question and the answer is no, it does not. We estimate the risk-profile-stratified probabilities of disease progression as fixed values (that are mean-centered on the means of the population-wide alpha, kappa, and delta), so the uncertainty in the risk-profile-stratified CFR and IFR comes only from the stochasticity in the epidemic model estimates of the population-wide infection and death timeseries. We have added the following line to the text to make this clear to the reader:

This process [of estimating the risk-profile-stratified CFR and IFR] therefore accounts for the uncertainty in the estimated parameters and stochasticity in the epidemic model, but not for the uncertainty in the risk model estimates.
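How a median and 95% interval of the CFR and IFR could be derived from repeated stochastic runs is sketched below. It is an assumption-laden illustration: the run totals are invented, the helper names are hypothetical, and CFR is taken as deaths over observed infections while IFR is deaths over all infections.

```python
from statistics import median


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an ascending list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize_ratio(numerators, denominators):
    """Return (median, 2.5th percentile, 97.5th percentile) of per-run ratios."""
    ratios = sorted(n / d for n, d in zip(numerators, denominators) if d > 0)
    return median(ratios), percentile(ratios, 2.5), percentile(ratios, 97.5)


# Hypothetical cumulative totals from five stochastic runs of an epidemic model.
deaths = [100, 110, 95, 105, 120]
observed_infections = [5000, 5200, 4900, 5100, 5500]
all_infections = [20000, 21000, 19000, 20500, 23000]

cfr = summarize_ratio(deaths, observed_infections)
ifr = summarize_ratio(deaths, all_infections)
```

Because the spread comes only from run-to-run variation in the simulated totals, an interval computed this way reflects epidemic-model uncertainty alone, consistent with the statement above.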

R1.7. Line 192: How does changing R(t) adjust beta(t) and mu(t)? I'm still somewhat unclear on the relationship of these quantities.

As noted in response to your comment R1.3 above, this comment helped us to see that the relationship between R(t), beta(t), and mu(t) was not made sufficiently clear in the main text, and so in the revision we have removed the parameter mu(t) and instead deal only with the time-varying infection rate, beta(t), and time-varying reproductive number, R(t). Now, when interventions are put in place that change the transmission rate, instead of estimating modifications to mu(t) to represent modifications in beta(t) and R(t), we discuss modifications to the time-varying parameters beta(t) and R(t) directly.

Minor comments:

R1m.1. Abstract: CFR is used before it is defined

Thank you for pointing this out, we have corrected it.

R1m.2. Line 70: please remind readers which compartments S,E,R are.

We have added much detail to our explanation of the epidemic model in the main text, which we believe has sufficiently clarified the meaning of each compartment, including S, E, and R; please see the main text section 2.1.

R1m.3. Line 105: there is an inconsistent use of subscript t and function of t for parameters alpha, kappa, delta between text and Figure 1.

We have corrected this in the revision and now use subscript t for all parameters, while saving function of t for all state variables.

Also, p_v is not defined anywhere, what is it?

As noted above, we have removed the ventilation compartment from the model and the associated probability of ventilation p_v. We apologize for the lack of clarity in the original version of the manuscript.

R1m.4. Line 204: text references Fig 4a but looks like it should say Fig 4b.

In the revision, we have updated all figures, their captions, and in-text references.

R1m.5. Line 228: text references Fig 4b but looks like it should say Fig 4a.

We have updated all figures, their captions, and in-text references.

R1m.6. Tables 2 and 3 are very hard to read (too small), could they be enlarged somewhat?

We agree Tables 2 and 3 were hard to read and interpret. Reviewer 2 suggested that a graphical representation could be used to convey the information in Tables 2 and 3. In our revision, we have replaced Tables 2 and 3 with both graphical representations and standard numerical tables as follows.

The purpose of Tables 2 and 3 was to convey the range of values that can be taken on by the risk-profile-stratified probabilities of disease progression and CFR / IFR. To convey this information graphically, in the main text we have included Figures 5a-c and 6a-d. We show in Figures 5a - 5c the range of values that can be taken on by profiles falling within each age group. Specifically, these figures show the mean, minimum, and maximum (as an error bar) of the probabilities across the composing risk profiles within each of the 5 age groups, under each of the three risk models. Figures 6a - b show the mean, minimum, and maximum of the median CFR and IFR values that can be taken on by the risk profiles within each age group. We also plot the estimated median and the 95% CI for the most populous (in the overall LAC population) risk profile within each age group in Figures 6c - d.

We have also included, as standard numerical tables, the median and the 95% CI of the risk-profile-stratified probabilities of disease progression in Appendix Tables 8-10, and the median and the 95% CI of the risk-profile-stratified case fatality rates and infection fatality rates in Appendix Tables 11-12, all across dates every two weeks from May 15 2020 - March 1 2021.
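The per-age-group summary shown in those figures (mean, minimum, and maximum across the risk profiles within an age group) can be sketched as below. The profile labels and values are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (age group, risk profile label, median IFR for that profile).
rows = [
    ("65-79", "no risk factors", 0.010),
    ("65-79", "obese, smoker", 0.030),
    ("65-79", "comorbidity", 0.020),
    ("80+", "no risk factors", 0.050),
    ("80+", "comorbidity", 0.110),
]


def summarize_by_age(rows):
    """Mean, minimum, and maximum of the profile-level values within each age group."""
    grouped = defaultdict(list)
    for age_group, _profile, value in rows:
        grouped[age_group].append(value)
    return {
        age: {"mean": mean(vals), "min": min(vals), "max": max(vals)}
        for age, vals in grouped.items()
    }


summary = summarize_by_age(rows)
```

The minimum and maximum would serve as the error-bar limits around the mean for each age group.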

Reviewer #2:

In this study, the authors present an analysis of the COVID-19 transmission dynamics in Los Angeles county (LAC). They aim to estimate the probability of severe illness depending on the risk profile of the individuals, and analyse the impact different control measures could have had on the number of infected individuals. In order to carry out this analysis, the authors developed the following workflow:

3 - Estimate the proportion of cases from different risk profiles using the marginal risk factor in LAC, and the age distribution of the reported cases.

5 - Generate simulations of outbreaks using different scenarios of Non-Pharmaceutical Interventions.

This paper mixes different complex methods and uses different publicly available data sources. I appreciate the time and effort the authors have spent to provide the code and make their analysis reproducible on a Github repository. I also want to highlight the thorough Appendix detailing the approach used in every step of the analysis.

Thank you for your complimentary words, and moreover for the great time and attention put into your extremely thorough and constructive review of our manuscript, resulting from your very careful read of both the main text and the Appendix. We have addressed all of your suggestions, which we believe have contributed in a major way to improving the quality of our article.

Before getting to major and minor points, I had an overarching comment on the paper:

R2.0/ Because of the number of different stages and models developed in the study, I think the message / workflow sometimes gets lost, or at least I got confused a number of times. I wonder whether the authors would consider splitting this analysis into two papers: One focused on estimating the distribution of risk profiles in infected cases and the probabilities of hospitalisation (ICU and death) associated with each profile (ie steps 2 to 4, and the first objective of this paper), and one focused on the simulations of different scenarios of NPIs and vaccine coverage using the risks profiles (mostly Step 5). This way, the different scenarios of NPI included in the simulations could be deepened and more realistic, and the authors could give more information in the Main Paper on the JAM method and logistic regression they implemented (and use the prior distribution on alpha, kappa and delta for their estimations in Step 4). I do not think this is a requirement for publication, but I believe this would make it easier for the reader to follow the arguments the authors are presenting.

After carefully considering this suggestion, we decided to split the original manuscript into two papers, exactly as you suggested: The first paper, which is presented in this submitted revision, focuses on estimating the distribution of risk profiles among infected cases, hospitalized and ICU admitted patients, and deaths, as well as the CFR and IFR associated with each profile. The second paper, which we are still working on, will focus on policy analyses that utilize the risk estimates from the first paper (this paper) as parameter inputs.

We agree that splitting our manuscript into two papers will allow us to focus each paper on a single objective, as a result making each paper easier for the reader to follow. In the case of this paper, focusing on the risk-profile-stratified estimates has allowed us to more clearly outline and describe our multi-step analysis plan in the methods section, as well as the findings from both the epidemic model and risk model.

Major points

R2.1/ Summarise the overall workflow in a Figure.

R2.1a/ In the summary of my review, I tried to summarise the workflow the authors implemented from the Main text and Appendix. Although I hope I understood it correctly, it took me time and a few read-throughs to figure out how and why the authors went from one stage to another. I think the authors should add a figure summarising each stage of the analysis, along with their input and output, and how they connect to one another. I believe this would be very helpful for the readers, and would prevent a lot of confusion.

You have provided an excellent summary of our workflow. Thank you for your very careful read and attention to the details of our paper that was required to understand all steps of our method. We have taken this point to heart and substantively edited the methods section describing the risk model in the main text (Section 2.2) and the Appendix (Appendix Part II) to outline, describe, and visualize in Figure 2 a workflow of all steps taken in our methodology.

R2.1b/ Along with that, the authors use a lot of different notations in each section of the Appendix. I believe they should summarise all the notations of Section 2 in a table (similar to what they did in Table 1, 2, and 3 of the Appendix). This would facilitate the reading and general understanding of their analysis.

Thank you for this helpful suggestion. We have included in Table 8 of the Appendix a table summarizing all notations used in developing the risk model, i.e. Part II of the Appendix. In the table we have distinguished between which notations represent definitions, observed data, and model-produced estimates. We agree with the reviewer that this will facilitate the reader’s understanding of the model.

R2.2/ The implementation of the compartmental model should be clearer

R2.2a/ I am not sure I understand why Approximate Bayesian Computation (ABC) was needed in the first step. I thought the authors could have fitted a deterministic model to the daily number of new infections / hospitalisations / ventilations / deaths by generating a likelihood function from these measures and running Markov chain Monte Carlo to estimate the parameters. What made the ABC approach more relevant?

We respond to this comment in two parts. First, we motivate our choice of employing a stochastic model instead of a deterministic model. We felt a stochastic model was needed to represent the stochasticity in the infection process especially when numbers are small in certain compartments. LAC is a large county made up of many composing cities and communities, each with their own epidemic processes unfolding at different rates with different peaks. The unified trend across these many cities/communities is one of substantive day-to-day variation in counts of new infections and deaths, since it is the composite of multiple different trends. Furthermore, it has been well established that there is stochasticity in the observation and reporting of infections, and to a lesser extent hospitalizations and deaths. Since we do not attempt to model these observation/reporting processes themselves, our model is not capable of mechanistically identifying the “true” trend of each state variable from the observed data. We therefore believe a stochastic model is merited, which is able to represent a range of scenarios and confidence intervals by simulating it across stochastic runs.
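The kind of run-to-run variability a stochastic compartmental model produces can be sketched with a deliberately simplified example. The chain-binomial SIR model below is not the model used in the paper (which has additional compartments and time-varying parameters); all names and parameter values are hypothetical.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) as a sum of Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sir(beta, gamma, s0, i0, days, seed):
    """Daily chain-binomial SIR simulation; returns new infections per day and final state."""
    rng = random.Random(seed)
    n = s0 + i0
    s, i, r = s0, i0, 0
    new_cases = []
    for _ in range(days):
        p_infect = 1.0 - math.exp(-beta * i / n)  # per-susceptible daily infection probability
        p_recover = 1.0 - math.exp(-gamma)        # per-infectious daily recovery probability
        infections = binomial(rng, s, p_infect)
        recoveries = binomial(rng, i, p_recover)
        s -= infections
        i += infections - recoveries
        r += recoveries
        new_cases.append(infections)
    return new_cases, (s, i, r)


# Two runs with identical parameters but different seeds generally give
# different trajectories, especially while counts are small.
run_a, final_a = simulate_sir(beta=0.4, gamma=0.2, s0=990, i0=10, days=30, seed=1)
run_b, final_b = simulate_sir(beta=0.4, gamma=0.2, s0=990, i0=10, days=30, seed=2)
```

Repeating such runs many times yields a distribution of trajectories from which intervals can be read off, which is what a single deterministic solution cannot provide.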

Second, we motivate our choice of ABC sampling to estimate parameters of our stochastic model. First and foremost, a tractable likelihood function was not possible for our model, which involves five free and interacting parameters (or six, if the starting time is included). Therefore, a likelihood-free method of parameter estimation was required. ABC provides a number of benefits that make it particularly well suited to this application, not only allowing estimation of model parameters when a likelihood function is intractable but also providing a suitable framework when: prior information and/or assumptions are available about the distribution or range of values each parameter may take; multiple data features are available for model estimation, e.g. infection, hospitalization, and death counts; and data is missing, partially observed, or uncertain, such as unreliable early infection surveillance data.

Finally, we note that while ABC is a computation-heavy approach, simulating our stochastic compartmental model is computationally cheap, and so we are able to employ ABC without any computational issues in this setting.
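
As a minimal illustration of the likelihood-free idea described above (a sketch under stated assumptions, not the authors' implementation), the following Python example runs ABC rejection sampling on a toy chain-binomial SIR model. The model, the single free parameter `beta`, the uniform prior range, the distance function, and the rule of keeping the closest 10% of draws are all hypothetical choices made for illustration; the model in the paper has more compartments and five or six free parameters.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) using only the standard library."""
    return sum(rng.random() < p for _ in range(n))


def simulate(beta, rng, n_days=20, population=300, seed_infected=5, gamma=0.2):
    """Toy chain-binomial SIR; returns (daily new infections, daily removals)."""
    s, i = population - seed_infected, seed_infected
    infections, removals = [], []
    for _ in range(n_days):
        p_inf = 1.0 - math.exp(-beta * i / population)  # per-susceptible daily risk
        new_inf = binomial(rng, s, p_inf)
        new_rem = binomial(rng, i, 1.0 - math.exp(-gamma))
        s, i = s - new_inf, i + new_inf - new_rem
        infections.append(new_inf)
        removals.append(new_rem)
    return infections, removals


def distance(sim, obs):
    """Sum of absolute differences over every data feature (both series)."""
    return sum(
        abs(a - b)
        for sim_series, obs_series in zip(sim, obs)
        for a, b in zip(sim_series, obs_series)
    )


def abc_rejection(observed, n_draws=300, keep_fraction=0.1, seed=1):
    """Draw beta from the prior, simulate, and keep the closest draws."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_draws):
        beta = rng.uniform(0.1, 1.0)  # hypothetical prior range
        scored.append((distance(simulate(beta, rng), observed), beta))
    scored.sort()
    n_keep = max(1, round(n_draws * keep_fraction))
    return [beta for _, beta in scored[:n_keep]]


# Synthetic "observed" data generated with a known beta, for illustration only.
observed = simulate(0.5, random.Random(0))
posterior = abc_rejection(observed)
print(len(posterior), sum(posterior) / len(posterior))
```

No likelihood is evaluated anywhere: the only requirement is that the model can be simulated cheaply, which is the property the response relies on.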

R2.2b/ From the Appendix tables 4, 5 and 6, the authors use very informative priors on most of their parameters (especially alpha, delta, and kappa). I think the authors should compare the prior to the posterior distribution for these parameters to highlight whether the fitting procedure was different from the prior assumptions.

This is a fair point; we agree that comments on the similarity between the prior and posterior distributions from ABC parameter estimation should be provided. First, it is important to note (as we have now noted in the text) that our parameter estimation process began with a broad grid search of each parameter over a wide range. This step returned the modes around which we defined prior distributions for the ABC sampling step. The final posterior distributions for each parameter were quite distinct from the initial range of values sampled in the broad grid search step.

With that said, we felt it would still be instructive to compare the prior distributions used in the ABC sampling step to the final posterior distributions produced for each parameter. In the Appendix Section 3.3, we have provided density plots of the prior and posterior distributions from ABC parameter estimation. Comparing the distributions, it can be seen that the prior distributions used in the ABC step are narrow (with 95% of a prior parameter's values lying within 25% of the mode chosen from the broad grid search), but not so narrow that they prevent the posterior distributions from taking a different shape: all posterior distributions differ slightly from their priors. The mean of the posterior is not exactly aligned with the mean of the prior, and the standard deviation becomes smaller. We have provided these observations in the text in Appendix Section 3.3.
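
To make the stated prior width concrete, here is a short worked example. It assumes a normal prior, which is an assumption made here for illustration (the response does not state the distribution family), and the value of `mode` is hypothetical. Under that assumption, placing 95% of the prior mass within 25% of the mode fixes the standard deviation at roughly 0.25 × mode / 1.96.

```python
from statistics import NormalDist

mode = 0.4  # hypothetical mode returned by the broad grid search

# Two-sided 95% quantile of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Standard deviation such that 95% of the mass lies within +/-25% of the mode.
prior_sd = 0.25 * mode / z
prior = NormalDist(mu=mode, sigma=prior_sd)

# Check: prior mass inside the interval [0.75 * mode, 1.25 * mode].
mass = prior.cdf(1.25 * mode) - prior.cdf(0.75 * mode)
print(prior_sd, mass)
```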

R2.2c/ The authors mention that Figure 2 “demonstrates that good model fits are achieved in all compartments across time.” I am not certain I find all the panels of Figure 2 convincing (for example the time series of “New Deaths” and “New in Hospital” show a lot of daily variations, which makes it harder to evaluate whether the fit is convincing). Could the authors aggregate the data and the simulation by week and show the match between the weekly time series? This could remove part of the dispersion observed in Figure 2 and make it easier for the reader to compare the inferred time series to the data.

First, as noted in the Summary, we have added a new data source that provides current numbers in-hospital and in-ICU; trends for these parameters are undispersed and clearly interpretable for the reader.

Regarding aggregating infections and deaths when presenting plots to the reader, this is the one point on which we do not agree with the reviewer. We believe that it is necessary to present the true stochasticity of the data in our figures, as these are the stochastic data to which we fit our model. As noted in our response to comment R2.2a, the observed and reported data, with their stochasticity, are the closest we have to the “truth” of the infection process, short of estimating the error in the observation and reporting processes, which is not a feature of our modeling framework. The stochasticity in the data also reflects the fact that the epidemic trend in LAC is a composite of many epidemic trends occurring in the cities and communities that make up the county. For these reasons, we felt it necessary to use a stochastic model, to fit that model to the stochastic data, and to present the stochastic data to the reader in the plots. In sum, an aggregate statistic computed on the data could add bias or error to the data being fit, and so we choose to use the raw data, which the stochastic model is capable of fitting.

That said, we have produced for this review figures that show model estimates of new infections and new deaths plotted against the data as a 7-day running average (see below). However, for the reasons given above, we would prefer to provide the reader with the view of the raw observed data used to fit the model.
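
For readers unfamiliar with the smoothing mentioned here, a 7-day running average can be computed as in the sketch below. The response does not say whether the window is trailing or centered, so the trailing window and the shorter averages over the first few days are assumptions of this example.

```python
def running_average(values, window=7):
    """Trailing mean over the last `window` values.

    For the first `window - 1` entries, the mean is taken over the
    values available so far (an assumption of this sketch).
    """
    averaged = []
    total = 0.0
    for idx, value in enumerate(values):
        total += value
        if idx >= window:
            total -= values[idx - window]  # drop the value leaving the window
        averaged.append(total / min(idx + 1, window))
    return averaged


# Hypothetical daily counts with a weekly reporting dip, for illustration only.
daily_counts = [10, 12, 11, 13, 12, 4, 3, 11, 13, 12, 14, 13, 5, 4]
print(running_average(daily_counts))
```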

R2.2d/ In the Appendix (subsection 2.3.1), the authors explain the summary statistics used for their ABC approach. I am not sure I understood the last notation. Did they use the total number of cases infected, hospitalised, ventilated and deceased (before 15th-25th March), along with the number of cases that recovered before 4th April? If so, why did they only include the number of recovered cases early in the outbreak?

Thank you for this question, which identified a mistake in the Appendix in Section 2.3.1 that was left over from an earlier draft of the manuscript. Recovered cases are not used in estimating the model and should not be included in the summary statistic function; we have removed this mention.

R2.3/ Some clarifications on the risk model and the uncertainty of the estimates are needed

R2.3a/ I am not familiar with the JAM method the authors used to compute the conditional risk effects. I believe they should add a couple of sentences to explain how this method matches the two inputs it uses.

Step 2 out of our 6-step restructuring of the methods section describing the risk model deals with the implementation of JAM to estimate the conditional relative risks. In the description of Step 2 in both the main text and the Appendix, we have provided a paragraph describing how JAM combines the correlation structure and marginal effects to get conditional effects; please see the main text Section 2.2.1 ~line 300, and the Appendix Section 5.3 for additional mathematical formalization.

R2.3b/ In line 316, the authors state that “The independent effect of comorbidities and obesity attenuate with increasing severity of disease, while that of age and smoking increase”. I believe the authors’ conclusions should reflect that the 95% CIs are quite large (especially for H|I), which makes this comparison seem excessive (eg: the CI of the condition RR of smoking on H|I is between .21 and 14.52).

This is a fair point. We have tempered our statement by replacing the text quoted above with the following:

We observe that the independent effect of comorbidities and obesity attenuate with increasing severity of disease; smoking may increase with age, however a very wide confidence interval for (H|I) makes this conclusion tentative.

R2.3c/ The authors consider that BMI and age are ordinal variables, did the authors explore the idea of using different RR for each age category (ie the RR between 21-40 and 0-19 would be different from the RR between 41-59 and 21-40)? Would it be possible (and worth testing) that the risks of severe illness abruptly increase for the highest age group / BMI?

This is a great question and in fact we now do estimate a different conditional RR for each age category. This was achieved by adding a step to the risk modeling approach (Step 5) whereby we estimate the conditional relative risk for age for each of the three models of disease progression (H|I), (Q|H), (D|Q) separately from estimating the conditional RR for BMI, smoking, and any comorbidity. Specifically, we estimate the conditional RR for age by calibrating the risk model to observed COVID-19 data for LAC on deaths by age group. Now, our estimates of the frequency of each age group over infections and over deaths are both matched to observed data. In this way, we are able to estimate a different RR for each category.

We have also added a 5th age category to our age category set, and now model the categories 0-18, 19-49, 50-64, 65-79, and 80+. This enables us to model the sharp increase in risk for individuals in the higher age groups.

R2.3d/ Finally, I thought the tables 2 and 3 were very hard to read and interpret. I do not really see what conclusions to draw from these. I think the authors should consider using a graphical representation rather than a table, or greatly reduce the number of rows / columns. Furthermore, the authors only show the median estimates, whereas they reported very large confidence intervals for some of the conditional RRs. I think they should report and reflect on the confidence intervals of these estimates.

We agree that Tables 2 and 3 were hard to read and interpret and in our revision, we have replaced Tables 2 and 3 with both graphical representations and standard numerical tables as follows.

The purpose of Tables 2 and 3 was to convey the range of values that can be taken on by the risk-profile-stratified probabilities of disease progression and CFR / IFR. To convey this information graphically, in the main text we have included Figures 5a-c and 6a-d. We show in Figures 5a - 5c the range of values that can be taken on by profiles falling within each age group. Specifically, these figures show the mean, minimum, and maximum of the probabilities across the composing risk profiles within each of the 5 age groups, under each of the three risk models. Figures 6a - b show the mean, minimum, and maximum of the median CFR and IFR values that can be taken on by the risk profiles within each age group.
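
The per-age-group summary described above (mean, minimum, and maximum across the risk profiles within each age group) can be sketched as follows. The profile labels and probabilities are hypothetical placeholders, not values from the paper, and the unweighted mean is an assumption of this example.

```python
from collections import defaultdict


def summarize_by_age(profile_probs):
    """Mean, min, and max of a probability across risk profiles, per age group.

    `profile_probs` maps (age_group, profile_label) to a probability.
    """
    grouped = defaultdict(list)
    for (age_group, _profile), prob in profile_probs.items():
        grouped[age_group].append(prob)
    return {
        age: {"mean": sum(p) / len(p), "min": min(p), "max": max(p)}
        for age, p in grouped.items()
    }


# Hypothetical probabilities of hospitalization given infection, (H|I).
toy_probs = {
    ("65-79", "no other risk factors"): 0.10,
    ("65-79", "obese, smoker, comorbidity"): 0.30,
    ("19-49", "no other risk factors"): 0.02,
}
summary = summarize_by_age(toy_probs)
print(summary)
```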

We have also reported and commented on the confidence intervals of the median estimates. We plot the estimated median and the 95% CI (vs. the minimum and maximum profile by median only, as in 6a-b) for the most populous (in the overall LAC population) risk profile within each age group in Figures 6c - d. In the text describing Figures 6c-d, we have commented on the confidence intervals observed in the figures.

R2.4/ The scenario implemented could be more realistic

R2.4a/ The authors currently consider 9 scenarios representing a combination of isolating a fraction of the individuals older than 65 years old (0%, 50% or 100%), and adopting different levels of NPIs (None, Moderate or Observed). I believe this idea is relevant, especially in the context of vaccination campaigns aimed at certain age groups, but most of the scenarios implemented here are unrealistic. Indeed, I think complete isolation of older people is improbable (multi-generational households, care homes..), and imagining a situation where uncontrolled transmission would trigger absolutely no change in behaviour (or policy) is also unthinkable. I believe there would be great value in a more consistent exploration of the impact of a gradually increasing proportion of older individuals being protected (or isolated), mixed with different moderate values of stringency of control measures.

As discussed, we have removed the policy analysis from this Part I paper. It will be the focus of the Part II paper, and we will be sure to take these comments into consideration. We will implement more realistic scenarios in terms of both NPIs and protecting those at higher risk.

Minor points

R2.4b/ In line 44 the authors state, “Results highlight […] the efficacy of targeted subpopulation-level policy interventions in LAC.” I do not think this sentence is in agreement with the results shown by the authors. Indeed, there was only one set of simulations where a lockdown was not implemented and the number of deaths was similar to the data, and it came at the cost of overwhelming hospitals. Furthermore, this result relied on a complete isolation of those 65+, which is unrealistic. I would argue the results highlight the efficacy of a complete lockdown in limiting transmission, which is also what the authors write in the abstract and in the discussion. Therefore, I think they should remove this sentence.

Thank you for pointing this out. Although this sentence and the discussion of the policy analysis are no longer included in this Part I paper, we agree with the reviewer on this point and will not include the highlighted sentence or conclusion in the Part II paper.

R2m.1/ Footnote P5 “The susceptible population does not decrease sizeably during the time period considered in this study.” According to the first panel of Figure 2, Up to 25% of the population was infected between March and November, do the authors think this could potentially impact the effective reproduction number? If not, what decrease would they consider to be sizeable?

In the revision, we have plotted and reflected on both the R(t) based on behavior alone and the effective R(t) (R_eff(t)) that takes into account the diminishing size of the susceptible population. The comparison of R(t) and R_eff(t) allows us to say that an appreciable divergence between the two began as early as the beginning of July, following the first half of the second wave. This was an important observation, since R_eff(t) had dropped back below 1 by mid-July, two weeks earlier than R(t) did.

Furthermore, as the cumulative infected population grew with the major surge of the third epidemic wave, we estimate this fraction grew to 40-60% by March 1, 2021, which would be sizeable by anyone’s definition.
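
The relationship between the two reproduction numbers can be illustrated with the standard homogeneous-mixing relation R_eff(t) = R(t) × S(t)/N, where S(t)/N is the fraction of the population still susceptible. This relation is an assumption of the sketch below; the paper's own computation may differ in detail, and the input values are illustrative only.

```python
def effective_r(r_t, cumulative_infected_fraction):
    """Scale R(t) by the remaining susceptible fraction S(t)/N.

    Assumes homogeneous mixing and that everyone previously infected
    is no longer susceptible.
    """
    if not 0.0 <= cumulative_infected_fraction <= 1.0:
        raise ValueError("cumulative_infected_fraction must be in [0, 1]")
    return r_t * (1.0 - cumulative_infected_fraction)


# Illustrative inputs: R(t) slightly above 1 with a quarter of the population infected.
print(effective_r(1.2, 0.25))
```

With these inputs, R(t) is above 1 while R_eff(t) falls below 1, which is the kind of divergence the response describes.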

R2m.2/ L288-290: The authors mention the hospital and ICU capacity limits, I think it may be relevant to add these thresholds to the right panel of Figure 3, in order to facilitate the comparison between the simulated number of hospitalisations and the maximum capacity.

Thank you for this suggestion. We have plotted the hospital and ICU capacity limits in the figure showing current cases in-hospital and in-ICU, which we have moved to the Appendix Figure 5.

R2m.3/ L345-351: I think the explanation of how the risk profiles were grouped should come before Table 2 is described, since this is one of the columns of Table 2. I would therefore suggest moving these sentences up.

We agree that this would have been a helpful change. However, in our revision we have omitted the grouping of the risk profiles into the 5 risk categories, as we thought it added little value to the overall interpretation of the calculated risk values.

R2m.4/ In the compartmental model, the authors estimate the parameter Pv, representing the proportion of hospitalised cases who need ventilation. I could not find the prior distribution or the estimated values of Pv in the Main Text, or in the supplement.

We have removed the ventilation compartment from the model, along with the associated probability Pv. We had originally included the ventilation compartment because we had data on the number of patients on ventilation but not in the ICU, whereas our risk model probabilities were based on the progression from general hospital admission to critical care (in the ICU). We have since obtained data on the number of COVID-19 patients in the ICU in Los Angeles County, and so removed the ventilation compartment.

R2m.5/ I did not find any plot of the values of r(t) estimated by the model, I think the authors should plot the distribution of all the parameters estimated.

We have included a plot of the model-estimated r(t) in the revision (Figure 4c).

R2m.6/ In the Appendix, Subsection 2.3.1, does “D” stand for the Data or the death time series, I think the letter applies to both here, is it a mistake?

Thank you for this question and for noticing this mistake. In the highlighted section, “D” was meant to indicate the data, but we agree that this notation cannot be used since “D” is reserved for the death timeseries. In the revision we have used the notation Φ to represent the data.

Attachment

Submitted filename: LETTER_TO_REVIEWERS_PONE-S-21-01098_REVISION.pdf

PLoS One. doi: 10.1371/journal.pone.0253549.r003

Decision Letter 1

Martial L Ndeffo Mbah

8 Jun 2021

An integrated risk and epidemiological model to estimate risk-stratified COVID-19 outcomes for Los Angeles County: March 1, 2020 - March 1, 2021

PONE-D-21-00882R1

Dear Dr. Horn,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Martial L Ndeffo Mbah, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

Reviewer #3: All comments and criticisms raised in the previous review round have been addressed by authors, who should be commended for their efforts.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

PLoS One. doi: 10.1371/journal.pone.0253549.r004

Acceptance letter

Martial L Ndeffo Mbah

15 Jun 2021

PONE-D-21-00882R1

An integrated risk and epidemiological model to estimate risk-stratified COVID-19 outcomes for Los Angeles County: March 1, 2020 - March 1, 2021

Dear Dr. Horn:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Martial L Ndeffo Mbah

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Appendix

(PDF)

Data Availability Statement

[pone.0253549.ref001] 1. Stokes EK, Zambrano LD, Anderson KN, Marder EP, Raz KM, Felix SEB, et al. Coronavirus disease 2019 case surveillance—United States, January 22–May 30, 2020. Morbidity and Mortality Weekly Report. 2020;69(24):759. doi: 10.15585/mmwr.mm6924e2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref002] 2. Petrilli CM, Jones SA, Yang J, Rajagopalan H, O’Donnell LF, Chernyak Y, et al. Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City. MedRxiv. 2020;. [Google Scholar]

[pone.0253549.ref003] 3. Guan Wj, Ni Zy, Hu Y, Liang Wh, Ou Cq, He Jx, et al. Clinical characteristics of coronavirus disease 2019 in China. New England journal of medicine. 2020;382(18):1708–1720. doi: 10.1056/NEJMoa2002032 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref004] 4. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. The Lancet. 2020;. doi: 10.1016/S0140-6736(20)30566-3 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref005] 5. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430–436. doi: 10.1038/s41586-020-2521-4 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref006] 6.U S Department of Health and Human Services. Social Determinants of Health [online]; 2020. Available from: https://www.healthypeople.gov/2020/topics-objectives/topic/social-determinants-of-healthexternal.

[pone.0253549.ref007] 7. Egede LE, Walker RJ. Structural Racism, Social Risk Factors, and Covid-19—A Dangerous Convergence for Black Americans. New England Journal of Medicine. 2020;383(12):e77. doi: 10.1056/NEJMp2023616 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref008] 8. Levin AT, Hanage WP, Owusu-Boaitey N, Cochran KB, Walsh SP, Meyerowitz-Katz G. Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. European journal of epidemiology. 2020; p. 1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref009] 9. Verity R, Okell LC, Dorigatti I, Winskill P, Whittaker C, Imai N, et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet infectious diseases. 2020;. doi: 10.1016/S1473-3099(20)30243-7 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref010] 10. Yang W, Kandula S, Huynh M, Greene SK, Van Wye G, Li W, et al. Estimating the infection-fatality risk of SARS-CoV-2 in New York City during the spring 2020 pandemic wave: a model-based analysis. The Lancet Infectious Diseases. 2020;. doi: 10.1016/S1473-3099(20)30769-6 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref011] 11.Los Angeles County Department of Public Health. Los Angeles County Case Counts, Crude Rates by Selected Region; 2020. Available from: https://github.com/AbigailHorn/COV2-LA/tree/master/data.

[pone.0253549.ref012] 12. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. Lippincott Williams & Wilkins; 2008. [Google Scholar]

[pone.0253549.ref013] 13. Griffith G, Morris TT, Tudball M, Herbert A, Mancano G, Pike L, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. medRxiv. 2020;. doi: 10.1038/s41467-020-19478-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref014] 14.Data LAT, Department G. california-coronavirus-data GitHub; 2021. Available from: https://github.com/datadesk/california-coronavirus-data.

[pone.0253549.ref015] 15. Newcombe PJ, Conti DV, Richardson S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genetic epidemiology. 2016;40(3):188–201. doi: 10.1002/gepi.21953 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref016] 16. Horn AL, Jiang L, Washburn F, Hvitfeldt E, de la Haye K, Nicholas W, et al. Estimation of COVID-19 risk-stratified epidemiological parameters and policy implications for Los Angeles County through an integrated risk and stochastic epidemiological model. medRxiv. 2020;. doi: 10.1101/2020.12.11.20209627 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref017] 17. Garcia E, Eckel SP, Chen Z, Li K, Gilliland FD. COVID-19 mortality in California based on death certificates: disproportionate impacts across racial/ethnic groups and nativity. Annals of Epidemiology. 2021;58:69–75. doi: 10.1016/j.annepidem.2021.03.006 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref018] 18. Bretó C, He D, Ionides EL, King AA, et al. Time series analysis via mechanistic models. The Annals of Applied Statistics. 2009;3(1):319–348. [Google Scholar]

[pone.0253549.ref019] 19. Mode CJ, Sleeman CK. Stochastic processes in epidemiology: HIV/AIDS, other infectious diseases, and computers. World Scientific; 2000. [Google Scholar]

[pone.0253549.ref020] 20. Keeling MJ, Rohani P. Modeling infectious diseases in humans and animals. Princeton University Press; 2011. [Google Scholar]

[pone.0253549.ref021] 21. Diekmann O, Heesterbeek J, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. Journal of the Royal Society Interface. 2010;7(47):873–885. doi: 10.1098/rsif.2009.0386 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref022] 22. Liu Y, Gayle AA, Wilder-Smith A, Rocklöv J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. Journal of travel medicine. 2020;. doi: 10.1093/jtm/taaa021 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref023] 23. Bertozzi AL, Franco E, Mohler G, Short MB, Sledge D. The challenges of modeling and forecasting the spread of COVID-19. Proceedings of the National Academy of Sciences. 2020;117(29):16732–16738. doi: 10.1073/pnas.2006520117 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref024] 24.Unacast. The Unacast Social Distancing Scoreboard; 2020. Available from: https://unacast.com/post/the-unacast-social-distancing-scoreboard.

[pone.0253549.ref025] 25. Oliver N, Lepri B, Sterly H, Lambiotte R, Deletaille S, De Nadai M, et al. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Science. 2020;. doi: 10.1126/sciadv.abc0764 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref026] 26. Havers FP, Reed C, Lim T, Montgomery JM, Klena JD, Hall AJ, et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA Internal Medicine. 2020;. doi: 10.1001/jamainternmed.2020.4130 [DOI] [PubMed] [Google Scholar]

[pone.0253549.ref027] 27.of Public Health CD. California Department of Public Health Open Data Portal; 2021. Available from: https://data.ca.gov/dataset/covid-19-hospital-data1.

[pone.0253549.ref028] 28. Pfeiffer RM, Gail MH. Absolute risk: methods and applications in clinical management and public health. CRC Press; 2017. [Google Scholar]

[pone.0253549.ref029] 29.Centers for Disease Control and Prevention (CDC). National Health and Nutrition Examination Survey Data (NHANES); 2017-2018. Available from: https://www.cdc.gov/nchs/nhanes/index.htm.

[pone.0253549.ref030] 30.U S Census Bureau via R package tidycensus. Selected age groups, 2014-2018 American Community Survey 5-year estimates; 2018. Available from: https://CRAN.R-project.org/package=tidycensus.

[pone.0253549.ref031] 31.of Public Health LACD. Los Angeles County Health Survey (LACHS); 2018. Available from: http://publichealth.lacounty.gov/ha/hasurveyintro.htm.

[pone.0253549.ref032] 32.UCLA Center for Health Policy Research CA Los Angeles. California Health Information Survey; 2018. Available from: http://ask.chis.ucla.edu.

[pone.0253549.ref033] 33.of Public Health LACD. LA County Daily COVID-19 Data Dashboard; 2020. Available from: http://publichealth.lacounty.gov/media/coronavirus/data/index.htm.

[pone.0253549.ref034] 34.Los Angeles County Department of Public Health. Order of the Health Officer, Reopening safer at work and in the community for control of COVID-19; October 6, 2020. Available from: http://www.ph.lacounty.gov/media/Coronavirus/docs/HOO/2020_10_06_HOO_Safer_at_Home.pdf.

[pone.0253549.ref035] 35.Los Angeles Mayor’s Office. Safer L.A. COVID-19; November 15, 2020. Available from: https://corona-virus.la/SaferLA.

[pone.0253549.ref036] 36.Los Angeles Mayor’s Office. Safer L.A. COVID-19 Timeline; November 24, 2020. Available from: https://lh3.googleusercontent.com/-awbM5oar9Bk/X71VEq0xBmI/AAAAAAAAJ1w/fWXuO3IaEYolw8fMxqC3VLAgE1m6vTeNQCK8BGAsYHg/s0/2020-11-24.png.

[pone.0253549.ref037] 37. Adam D. Simulating the pandemic: What COVID forecasters can learn from climate models. Nature. 2020;. doi: 10.1038/d41586-020-03208-1 [DOI] [PubMed] [Google Scholar]

[pone.0253549.ref038] 38. Hawkins D. Differential occupational risk for COVID-19 and other infection exposure according to race and ethnicity. American journal of industrial medicine. 2020;63(9):817–820. doi: 10.1002/ajim.23145 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref039] 39. Zelner J, Trangucci R, Naraharisetti R, Cao A, Malosh R, Broen K, et al. Racial disparities in coronavirus disease 2019 (COVID-19) mortality are driven by unequal infection risks. Clinical Infectious Diseases. 2021;72(5):e88–e95. doi: 10.1093/cid/ciaa1723 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0253549.ref040] 40. O’Driscoll M, Dos Santos GR, Wang L, Cummings DA, Azman AS, Paireau J, et al. Age-specific mortality and immunity patterns of SARS-CoV-2. Nature. 2021;590(7844):140–145. doi: 10.1038/s41586-020-2918-0 [DOI] [PubMed] [Google Scholar]

[pone.0253549.ref041] 41.Wetsman N. Doctors are better at treating COVID-19 patients now than they were in March; 2020. Available from: https://www.theverge.com/2020/7/8/21317128/improved-covid-treatment-hospitals-remdesivir-dexamethasone.

[pone.0253549.ref042] 42. Centers for Disease Control and Prevention (CDC). interim clinical guidance for management of patients with confirmed 2019 novel coronavirus (2019-nCoV) Infection. 2020;12. [Google Scholar]

[pone.0253549.ref043] 43. Woolf SH, Chapman DA, Sabo RT, Weinberger DM, Hill L, Taylor DD. Excess deaths from COVID-19 and other causes, March-July 2020. Jama. 2020;324(15):1562–1564. doi: 10.1001/jama.2020.19545 [DOI] [PMC free article] [PubMed] [Google Scholar]

44. Bilinski A, Emanuel EJ. Covid-19 and excess all-cause mortality in the US and 18 comparison countries. JAMA. 2020. doi: 10.1001/jama.2020.20717

45. Our World in Data. A pandemic primer on excess mortality statistics and their comparability across countries; 2020. Available from: https://ourworldindata.org/covid-excess-mortality.

46. Centers for Disease Control and Prevention (CDC). Medical Visits, Hospitalizations, and Deaths in the United States—2017–2018 influenza season [online]. 2019.

47. Pullano G, Di Domenico L, Sabbatini CE, Valdano E, Turbelin C, Debin M, et al. Underdetection of COVID-19 cases in France threatens epidemic control. Nature. 2020.

48. Harris JE. Los Angeles County SARS-CoV-2 Epidemic: Critical Role of Multi-generational Intra-household Transmission. Journal of Bioeconomics. 2021;23(1):55–83. doi: 10.1007/s10818-021-09310-2

49. Tai DBG, Shah A, Doubeni CA, Sia IG, Wieland ML. The disproportionate impact of COVID-19 on racial and ethnic minorities in the United States. Clinical Infectious Diseases. 2020.

50. Chen YH, Glymour M, Riley A, Balmes J, Duchowny K, Harrison R, et al. Excess mortality associated with the COVID-19 pandemic among Californians 18-65 years of age, by occupational sector and occupation: March through October 2020. medRxiv. 2021.

51. Baker MG, Peckham TK, Seixas NS. Estimating the burden of United States workers exposed to infection or disease: a key factor in containing risk of COVID-19 infection. PLoS ONE. 2020;15(4):e0232452. doi: 10.1371/journal.pone.0232452

52. Mutambudzi M, Niedwiedz C, Macdonald EB, Leyland A, Mair F, Anderson J, et al. Occupation and risk of severe COVID-19: prospective cohort study of 120 075 UK Biobank participants. Occupational and Environmental Medicine. 2021;78(5):307–314. doi: 10.1136/oemed-2020-106731

